filesystem::u8path should be undeprecated
Section: D.21 [depr.fs.path.factory] Status: Open Submitter: Daniel Krügler Opened: 2022-12-10 Last modified: 2024-01-29
Priority: 3
Discussion:
The filesystem::u8path function became deprecated with the adoption of P0482R6, but the rationale for that change is rather thin:
"The C++ standard must improve support for UTF-8 by removing the existing barriers that result in redundant tagging of character encodings, non-generic UTF-8 specific workarounds like u8path."
The u8path function is still useful if my original string source is a char sequence and I do know that the encoding of this sequence is UTF-8. The deprecation note suggests that I should use std::u8string instead, which costs me an additional transformation and doesn't work without reinterpret_cast.
Even in the presence of char8_t, legacy code bases often are still ABI-bound to char.
In the future we may solve this problem using the tools provided by P2626 instead, but right now this is not part of the standard and it wasn't at the time when u8path became deprecated.
This is in my opinion a good reason to undeprecate u8path now and decide later on the appropriate time to deprecate it again (if it really turns out to be obsolete by alternative functionality).
Billy O'Neal provides a concrete example where the current deprecation status causes pain:
Example: vcpkg-tool files.cpp#L21-L45
Before p0482, we could just call std::u8path and it would do the right thing on both POSIX and Windows. After compilers started implementing '20, we have to make assumptions about the correct 'internal' std::path encoding because there is no longer a way to arrive to std::path with a char buffer that we know is UTF-8 encoded and get the correct results. It's one of the reasons we completely ripped out use of std::filesystem on most platforms from vcpkg, so you won't see this in current sources.
[2023-01-06; Reflector poll]
Set priority to 3 after reflector poll. Set status to LEWG.
[2023-05-30; status to "Open"]
LEWG discussed this in January and had no consensus for undeprecation.
Proposed resolution:
This wording is relative to N4917.
Restore the u8path declarations to 31.12.4 [fs.filesystem.syn], header <filesystem> synopsis, as indicated:
    namespace std::filesystem {
      // 31.12.6 [fs.class.path], paths
      class path;

      // 31.12.6.8 [fs.path.nonmember], path non-member functions
      void swap(path& lhs, path& rhs) noexcept;
      size_t hash_value(const path& p) noexcept;

      // [fs.path.factory], path factory functions
      template<class Source>
        path u8path(const Source& source);
      template<class InputIterator>
        path u8path(InputIterator first, InputIterator last);

      // 31.12.7 [fs.class.filesystem.error], filesystem errors
      class filesystem_error;
      […]
    }
Restore the previous sub-clause [fs.path.factory] by copying the contents of D.21 [depr.fs.path.factory] to a new sub-clause [fs.path.factory] between 31.12.6.8 [fs.path.nonmember] and 31.12.6.10 [fs.path.hash] and without Note 1 as indicated:
[Drafting note: As an additional stylistic adaptation we replace the obsolete Requires element by a Preconditions element plus a Mandates element (similar to that of 31.12.6.5.1 [fs.path.construct] p5).
As a second stylistic improvement we replace the now more unusual "if […]; otherwise" construction in bullets by "Otherwise, if […]" constructions.]
? Factory functions [fs.path.factory]
    template<class Source>
      path u8path(const Source& source);
    template<class InputIterator>
      path u8path(InputIterator first, InputIterator last);
-?- Mandates: The value type of Source and InputIterator is char or char8_t.
-?- Preconditions: The source and [first, last) sequences are UTF-8 encoded.
-?- Returns:
(?.1) — If value_type is char and the current native narrow encoding (31.12.6.3.2 [fs.path.type.cvt]) is UTF-8, return path(source) or path(first, last).
(?.2) — Otherwise, if value_type is wchar_t and the native wide encoding is UTF-16, or if value_type is char16_t or char32_t, convert source or [first, last) to a temporary, tmp, of type string_type and return path(tmp).
(?.3) — Otherwise, convert source or [first, last) to a temporary, tmp, of type u32string and return path(tmp).
-?- Remarks: Argument format conversion (31.12.6.3.1 [fs.path.fmt.cvt]) applies to the arguments for these functions. How Unicode encoding conversions are performed is unspecified.
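The wording leaves the conversion mechanism unspecified. As an illustration only, the fallback bullet (convert to a u32string temporary and return path(tmp)) could be sketched as below. The names utf8_to_utf32 and path_via_utf32 are hypothetical. The decoder is deliberately minimal: it assumes reasonably well-formed input and does not reject overlong encodings or surrogate code points.

```cpp
#include <cstddef>
#include <filesystem>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical helper (not standard wording): decode UTF-8 code units
// into UTF-32 code points. Overlong forms and surrogates are NOT rejected.
inline std::u32string utf8_to_utf32(std::string_view in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size();) {
        const auto b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        // The lead byte determines the sequence length and the initial bits.
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        // Each continuation byte contributes six more bits.
        for (std::size_t k = 1; k < len; ++k) {
            const auto b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Mirrors the shape of the fallback bullet: build a u32string temporary,
// then let path's constructor convert it to the native encoding.
inline std::filesystem::path path_via_utf32(std::string_view utf8) {
    const std::u32string tmp = utf8_to_utf32(utf8);
    return std::filesystem::path(tmp);
}
```

A real implementation would more likely reuse its existing conversion facilities. The point here is only that the intermediate UTF-32 string lets path's constructor handle the step to the native encoding.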
-?- [Example 1: A string is to be read from a database that is encoded in UTF-8, and used to create a directory using the native encoding for filenames:
    namespace fs = std::filesystem;
    std::string utf8_string = read_utf8_data();
    fs::create_directory(fs::u8path(utf8_string));
For POSIX-based operating systems with the native narrow encoding set to UTF-8, no encoding or type conversion occurs.
For POSIX-based operating systems with the native narrow encoding not set to UTF-8, a conversion to UTF-32 occurs, followed by a conversion to the current native narrow encoding. Some Unicode characters may have no native character set representation. For Windows-based operating systems a conversion from UTF-8 to UTF-16 occurs. — end example]
Delete sub-clause D.21 [depr.fs.path.factory] in its entirety.