std::formatter<std::filesystem::path>
Section: 31.12.6.9.2 [fs.path.fmtr.funcs] Status: Open Submitter: Jonathan Wakely Opened: 2024-04-19 Last modified: 2025-10-10
Priority: 2
Discussion:
31.12.6.9.2 [fs.path.fmtr.funcs] says:
If charT is char, path::value_type is wchar_t, and the literal encoding is UTF-8, then the escaped path is transcoded from the native encoding for wide character strings to UTF-8 with maximal subparts of ill-formed subsequences substituted with U+FFFD REPLACEMENT CHARACTER per the Unicode Standard [...]. Otherwise, transcoding is implementation-defined.
This seems to mean that the Unicode substitutions are only done
for an escaped path, i.e. when the ? option is used. Otherwise, the form
of transcoding is completely implementation-defined.
However, this makes no sense.
An escaped string will have no ill-formed subsequences, because they will
already have been replaced as per 28.5.6.5 [format.string.escaped]:
Otherwise (X is a sequence of ill-formed code units), each code unit U is appended to E in order as the sequence \x{hex-digit-sequence}, where hex-digit-sequence is the shortest hexadecimal representation of U using lower-case hexadecimal digits.
So only unescaped strings can have ill-formed sequences by the time
we do transcoding to char, but whether or not any U+FFFD substitution
occurs is just implementation-defined.
I believe we want to specify the substitutions are done when transcoding an unescaped path (and it doesn't matter whether we specify it for escaped paths, because it's a no-op if escaping happens first, as is apparently intended).
It does matter whether we escape first or perform substitutions first.
If we escape first then every code unit in an ill-formed sequence is
individually escaped as \x{hex-digit-sequence}.
So an ill-formed sequence of two wchar_t values will be escaped as
two \x{...} strings, which are then transcoded to UTF-8.
If we transcode first (with substitutions) then the entire
ill-formed sequence is replaced with a single replacement character,
which will then be escaped as \x{fffd}.
SG16 should be asked to confirm that escaping first is intended,
so that an escaped string shows the original invalid code units.
For a non-escaped string, we want the ill-formed sequence to be
formatted as �, which the proposed resolution tries to ensure.
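The difference between the two orderings can be illustrated with a minimal sketch. It assumes a 16-bit native wide encoding (UTF-16), modelled here with char16_t so that it behaves the same on every platform, and it models only the ill-formed code unit rule from [format.string.escaped] (no quoting, no escaping of control characters). The helpers append_utf8, transcode_with_substitution and escape_ill_formed are hypothetical names invented for this illustration, not library functions.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: append one Unicode scalar value as UTF-8.
inline void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

inline bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
inline bool is_low_surrogate(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Hypothetical transcoder: UTF-16 to UTF-8. Each unpaired surrogate is
// replaced with U+FFFD (an assumption of this sketch).
inline std::string transcode_with_substitution(std::u16string_view in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char16_t u = in[i];
        if (is_high_surrogate(u) && i + 1 < in.size() && is_low_surrogate(in[i + 1])) {
            // Well-formed surrogate pair: decode to a supplementary code point.
            const char32_t hi = static_cast<char32_t>(u) - 0xD800;
            const char32_t lo = static_cast<char32_t>(in[i + 1]) - 0xDC00;
            append_utf8(out, 0x10000 + (hi << 10) + lo);
            ++i;
        } else if (is_high_surrogate(u) || is_low_surrogate(u)) {
            append_utf8(out, 0xFFFD);  // ill-formed: substitute
        } else {
            append_utf8(out, u);
        }
    }
    return out;
}

// Hypothetical escaper: only ill-formed code units are rewritten, each one
// individually as \x{hex-digit-sequence}; everything else is copied through.
inline std::u16string escape_ill_formed(std::u16string_view in) {
    static constexpr char16_t digits[] = u"0123456789abcdef";
    std::u16string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char16_t u = in[i];
        if (is_high_surrogate(u) && i + 1 < in.size() && is_low_surrogate(in[i + 1])) {
            out += u;
            out += in[++i];
        } else if (is_high_surrogate(u) || is_low_surrogate(u)) {
            std::u16string hex;  // shortest lower-case hexadecimal form
            unsigned v = u;
            do {
                hex.insert(hex.begin(), digits[v & 0xF]);
                v >>= 4;
            } while (v != 0);
            out += u"\\x{";
            out += hex;
            out += u'}';
        } else {
            out += u;
        }
    }
    return out;
}
```

With an input consisting of 'a', the lone code unit 0xD800, and 'b', calling escape_ill_formed before transcode_with_substitution keeps the original code unit visible as \x{d800}, whereas transcoding the unescaped input yields U+FFFD in its place.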
[2024-05-08; Reflector poll]
Set priority to 2 after reflector poll.
Previous resolution [SUPERSEDED]:
This wording is relative to N4981.
Modify 31.12.6.9.2 [fs.path.fmtr.funcs] as indicated:
template<class FormatContext> typename FormatContext::iterator format(const filesystem::path& p, FormatContext& ctx) const;
-5- Effects: Let s be p.generic_string<filesystem::path::value_type>() if the g option is used, otherwise p.native(). Writes s into ctx.out(), adjusted according to the path-format-spec. If charT is char, path::value_type is wchar_t, and the literal encoding is UTF-8, then the [-escaped path-] {+(possibly escaped) string+} is transcoded from the native encoding for wide character strings to UTF-8 with maximal subparts of ill-formed subsequences substituted with U+FFFD REPLACEMENT CHARACTER per the Unicode Standard, Chapter 3.9 U+FFFD Substitution in Conversion. If charT and path::value_type are the same then no transcoding is performed. Otherwise, transcoding is implementation-defined.
Modify the entry in the index of implementation-defined behavior as indicated:
transcoding of a formatted path when charT and path::value_type differ {+and not converting from wchar_t to UTF-8+}
[2025-06-11; SG16 comments and improves wording]
The "and not converting from wchar_t to UTF-8" wording added to the index of implementation-defined
behavior by the current proposed resolution should be changed to "and the literal encoding is not UTF-8".
An optional change is to insert "ordinary" before "literal encoding" as well. Once that is done,
I'll have SG16 confirm they are content with the new proposed resolution.
Previous resolution [SUPERSEDED]:
This wording is relative to N5008.
Modify 31.12.6.9.2 [fs.path.fmtr.funcs] as indicated:
template<class FormatContext> typename FormatContext::iterator format(const filesystem::path& p, FormatContext& ctx) const;
-5- Effects: Let s be p.generic_string<filesystem::path::value_type>() if the g option is used, otherwise p.native(). Writes s into ctx.out(), adjusted according to the path-format-spec. If charT is char, path::value_type is wchar_t, and the {+ordinary+} literal encoding is UTF-8, then the [-escaped path-] {+(possibly escaped) string+} is transcoded from the native encoding for wide character strings to UTF-8 with maximal subparts of ill-formed subsequences substituted with U+FFFD REPLACEMENT CHARACTER per the Unicode Standard, Chapter 3.9 U+FFFD Substitution in Conversion. If charT and path::value_type are the same then no transcoding is performed. Otherwise, transcoding is implementation-defined.
Modify the entry in the index of implementation-defined behavior as indicated:
transcoding of a formatted path when charT and path::value_type differ {+and the ordinary literal encoding is not UTF-8+}
[2025-07-30; SG16 meeting]
SG16 unanimously approved new wording produced during the discussion. The group concluded that the intended behavior would be best specified by introducing additional names to denote the sequence of transformations that produce the intended effect. Status updated SG16 → Open.
Proposed resolution:
This wording is relative to N5014.
Modify 31.12.6.9.2 [fs.path.fmtr.funcs] as indicated:
template<class FormatContext> typename FormatContext::iterator format(const filesystem::path& p, FormatContext& ctx) const;
-5- Effects: Let s be p.generic_string<filesystem::path::value_type>() if the g option is used, otherwise p.native(). {+Let s2 be s adjusted according to the path-format-spec. Let s3 be defined as follows:+}
{+- (5.1) If charT is char, path::value_type is wchar_t, and the ordinary literal encoding is UTF-8, s3 is the result of transcoding s2 from the native encoding for wide character strings to UTF-8 with maximal subparts of ill-formed subsequences substituted with U+FFFD REPLACEMENT CHARACTER per the Unicode Standard, Chapter 3.9 U+FFFD Substitution in Conversion.+}
{+- (5.2) If charT and path::value_type are the same, then s3 is the same as s2.+}
{+- (5.3) Otherwise, s3 is the result of an implementation-defined transcoding of s2.+}
{+Writes s3 into ctx.out().+}
[-Writes s into ctx.out(), adjusted according to the path-format-spec. If charT is char, path::value_type is wchar_t, and the literal encoding is UTF-8, then the escaped path is transcoded from the native encoding for wide character strings to UTF-8 with maximal subparts of ill-formed subsequences substituted with U+FFFD REPLACEMENT CHARACTER per the Unicode Standard, Chapter 3.9 U+FFFD Substitution in Conversion. If charT and path::value_type are the same then no transcoding is performed. Otherwise, transcoding is implementation-defined.-]
Modify the entry in the index of implementation-defined behavior as indicated:
transcoding of a formatted path when charT and path::value_type differ {+and the ordinary literal encoding is not UTF-8+}
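The three cases that define s3 in the proposed resolution are tested in order, which a small sketch may make easier to see. The enum s3_rule and the function select_s3_rule are hypothetical names used only for illustration, and passing the "ordinary literal encoding is UTF-8" condition as a bool parameter is an assumption of this sketch; an implementation would know that property at compile time.

```cpp
#include <type_traits>

// Hypothetical labels for the three bullets that define s3.
enum class s3_rule {
    utf8_with_substitution,  // (5.1)
    identity,                // (5.2)
    implementation_defined   // (5.3)
};

// Hypothetical selector: CharT is the formatter's charT, ValueT stands in
// for path::value_type. The bool parameter is an assumption of this sketch.
template <class CharT, class ValueT>
constexpr s3_rule select_s3_rule(bool ordinary_literal_encoding_is_utf8) {
    if (std::is_same_v<CharT, char> && std::is_same_v<ValueT, wchar_t> &&
        ordinary_literal_encoding_is_utf8) {
        return s3_rule::utf8_with_substitution;  // (5.1)
    }
    if (std::is_same_v<CharT, ValueT>) {
        return s3_rule::identity;  // (5.2)
    }
    return s3_rule::implementation_defined;  // (5.3)
}

// Formatting a wchar_t path into a char context with a UTF-8 ordinary
// literal encoding falls under (5.1).
static_assert(select_s3_rule<char, wchar_t>(true) == s3_rule::utf8_with_substitution);
```

Under this reading, the same char/wchar_t combination with a non-UTF-8 ordinary literal encoding falls through to (5.3), which matches the revised index entry.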