filesystem::u8path should be undeprecated
Section: D.22.1 [depr.fs.path.factory] Status: Open Submitter: Daniel Krügler Opened: 2022-12-10 Last modified: 2024-01-29
Priority: 3
Discussion:
The filesystem::u8path function became deprecated with the adoption of
P0482R6, but the rationale for that change is rather thin:
"The C++ standard must improve support for UTF-8 by removing the existing barriers that result in redundant tagging of character encodings, non-generic UTF-8 specific workarounds like u8path."
The u8path function is still useful if my original string source is a char
sequence and I do know that the encoding of this sequence is UTF-8.
The deprecation implies that I should use std::u8string instead, which costs me
an additional transformation and doesn't work without reinterpret_cast.
Even in the presence of char8_t, legacy code bases often are still ABI-bound to char.
In the future we may solve this problem using the tools provided by P2626 instead,
but right now this is not part of the standard and it wasn't at the time when u8path became
deprecated.
This is in my opinion a good reason to undeprecate u8path now and decide later on the
appropriate time to deprecate it again (if it really turns out to be obsolete by alternative
functionality).
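To illustrate the cost described above, here is a minimal sketch of what a caller holding a UTF-8 char sequence has to write once u8path is deprecated. The helper name path_from_utf8 is hypothetical and is not part of the standard or of the proposed wording. The pre-C++20 branch is included only so that the snippet also compiles where char8_t is unavailable.

```cpp
#include <filesystem>
#include <string>
#include <string_view>

namespace fs = std::filesystem;

// Hypothetical helper (not part of the standard or of this issue's wording):
// builds a path from a char sequence that the caller knows to be UTF-8.
inline fs::path path_from_utf8(std::string_view utf8) {
#if defined(__cpp_lib_char8_t)
    // C++20 route implied by the deprecation: copy every char into a
    // temporary char8_t string, then let path convert from UTF-8.
    std::u8string tmp(utf8.begin(), utf8.end());
    return fs::path(tmp);
#else
    // Before char8_t existed, u8path accepted the char sequence directly.
    return fs::u8path(utf8.begin(), utf8.end());
#endif
}
```

The temporary std::u8string is the "additional transformation" mentioned above. Avoiding it would mean reinterpreting the char buffer as char8_t, which is where the reinterpret_cast comes in.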
Billy O'Neal provides a concrete example where the current deprecation status causes pain:
Example: vcpkg-tool files.cpp#L21-L45
Before p0482, we could just call std::u8path and it would do the right thing on both POSIX and Windows. After compilers started implementing '20, we have to make assumptions about the correct 'internal' std::path encoding because there is no longer a way to arrive to std::path with a char buffer that we know is UTF-8 encoded and get the correct results. It's one of the reasons we completely ripped out use of std::filesystem on most platforms from vcpkg, so you won't see this in current sources.
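The kind of platform assumption described in this comment might look roughly like the following sketch. It is not the vcpkg code referenced above. The function name utf8_to_native_path is hypothetical, the Windows branch uses the Win32 MultiByteToWideChar call, and the other branch simply assumes that the native narrow encoding is UTF-8, which the standard does not guarantee.

```cpp
#include <cstddef>
#include <filesystem>
#include <string>
#include <string_view>

#if defined(_WIN32)
#include <windows.h>
#endif

namespace fs = std::filesystem;

// Hypothetical helper: turns a char buffer known to hold UTF-8 into a path
// without calling u8path, by hard-coding per-platform encoding assumptions.
inline fs::path utf8_to_native_path(std::string_view utf8) {
#if defined(_WIN32)
    if (utf8.empty()) return fs::path();
    const int in_len = static_cast<int>(utf8.size());
    // First call: ask how many UTF-16 code units the result needs.
    const int out_len =
        MultiByteToWideChar(CP_UTF8, 0, utf8.data(), in_len, nullptr, 0);
    if (out_len <= 0) return fs::path();  // conversion failed
    std::wstring wide(static_cast<std::size_t>(out_len), L'\0');
    // Second call: perform the UTF-8 to UTF-16 conversion.
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(), in_len, wide.data(), out_len);
    return fs::path(wide);
#else
    // Assumption: the native narrow encoding is UTF-8, so the bytes are
    // passed through unchanged.
    return fs::path(utf8.begin(), utf8.end());
#endif
}
```

With u8path available, the whole function body collapses to a single call, and the choice of conversion is left to the implementation.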
[2023-01-06; Reflector poll]
Set priority to 3 after reflector poll. Set status to LEWG.
[2023-05-30; status to "Open"]
LEWG discussed this in January and had no consensus for undeprecation.
Proposed resolution:
This wording is relative to N4917.
Restore the u8path declarations to 31.12.4 [fs.filesystem.syn], header
<filesystem> synopsis, as indicated:
namespace std::filesystem {
  // 31.12.6 [fs.class.path], paths
  class path;

  // 31.12.6.8 [fs.path.nonmember], path non-member functions
  void swap(path& lhs, path& rhs) noexcept;
  size_t hash_value(const path& p) noexcept;

  // [fs.path.factory], path factory functions
  template<class Source>
    path u8path(const Source& source);
  template<class InputIterator>
    path u8path(InputIterator first, InputIterator last);

  // 31.12.7 [fs.class.filesystem.error], filesystem errors
  class filesystem_error;
  […]
}
Restore the previous sub-clause [fs.path.factory] by copying the contents of D.22.1 [depr.fs.path.factory] to a new sub-clause [fs.path.factory] between 31.12.6.8 [fs.path.nonmember] and 31.12.6.10 [fs.path.hash] and without Note 1 as indicated:
[Drafting note: As an additional stylistic adaptation we replace the obsolete Requires element by a Preconditions element plus a Mandates element (similar to that of 31.12.6.5.1 [fs.path.construct] p5).
As a second stylistic improvement we replace the now unusual "if […]; otherwise" construction in bullets with "Otherwise, if […]" constructions.]
? Factory functions [fs.path.factory]
template<class Source>
  path u8path(const Source& source);
template<class InputIterator>
  path u8path(InputIterator first, InputIterator last);
-?- Mandates: The value type of Source and InputIterator is char or char8_t.
-?- Preconditions: The source and [first, last) sequences are UTF-8 encoded.
-?- Returns:
(?.1) — If value_type is char and the current native narrow encoding (31.12.6.3.2 [fs.path.type.cvt]) is UTF-8, return path(source) or path(first, last).
(?.2) — Otherwise, if value_type is wchar_t and the native wide encoding is UTF-16, or if value_type is char16_t or char32_t, convert source or [first, last) to a temporary, tmp, of type string_type and return path(tmp).
(?.3) — Otherwise, convert source or [first, last) to a temporary, tmp, of type u32string and return path(tmp).
-?- Remarks: Argument format conversion (31.12.6.3.1 [fs.path.fmt.cvt]) applies to the arguments for these functions. How Unicode encoding conversions are performed is unspecified.
-?- [Example 1: A string is to be read from a database that is encoded in UTF-8, and used to create a directory using the native encoding for filenames:
  namespace fs = std::filesystem;
  std::string utf8_string = read_utf8_data();
  fs::create_directory(fs::u8path(utf8_string));
For POSIX-based operating systems with the native narrow encoding set to UTF-8, no encoding or type conversion occurs.
For POSIX-based operating systems with the native narrow encoding not set to UTF-8, a conversion to UTF-32 occurs, followed by a conversion to the current native narrow encoding. Some Unicode characters may have no native character set representation. For Windows-based operating systems a conversion from UTF-8 to UTF-16 occurs. — end example]
Delete sub-clause D.22.1 [depr.fs.path.factory] in its entirety.