std::format
Section: 28.5.2.2 [format.string.std] Status: Resolved Submitter: Mark de Wever Opened: 2021-08-01 Last modified: 2023-03-23
Priority: 2
Discussion:
The paper P1868 "width: clarifying units of width and precision in std::format" added optional Unicode support to the format header.
This paper didn't update the definition of the fill character, which is defined as "The fill character can be any character other than { or }."
This wording means the fill is a character and not a Unicode grapheme cluster. Based on the current wording the range of available fill characters depends on the char_type of the format string. After P1868 the determination of the required padding size is Unicode aware, but it's not possible to use a Unicode grapheme cluster as padding. This looks odd from a user's perspective and has already led to implementation divergence between libc++ and MSVC STL:
The WIP libc++ implementation stores one char_type, strictly adhering to the wording of the Standard.
MSVC STL stores one code point, regardless of the char_type used. This is already better from a user's perspective; all 1 code point grapheme clusters are properly handled.
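To make the difference between the two implementations concrete, the following sketch counts how many char code units the first code point of a UTF-8 string occupies. The helper names utf8_first_code_point_length and fill_fits_in_one_char are hypothetical and are not taken from libc++ or MSVC STL; the sketch assumes non-empty, well-formed UTF-8 input.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not part of any standard library): returns the number
// of char code units used by the first code point of a UTF-8 string.
// Assumes the input is non-empty, well-formed UTF-8.
inline std::size_t utf8_first_code_point_length(std::string_view s) {
    assert(!s.empty());
    const unsigned char lead = static_cast<unsigned char>(s.front());
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    assert((lead & 0xF8) == 0xF0);        // 11110xxx
    return 4;
}

// True when the fill fits in a single char, which is all that an
// implementation storing exactly one char_type can represent.
inline bool fill_fits_in_one_char(std::string_view fill) {
    return utf8_first_code_point_length(fill) == 1;
}
```

With char as the char_type, a fill such as U+00E9 (two code units in UTF-8) does not fit in one char_type, but it does fit when one code point is stored.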
For the width calculation the width of a Unicode grapheme cluster is estimated to be 1 or 2. Since padding with a 2 column width can't properly pad an odd number of columns the grapheme cluster used should always have a column width of 1.
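The arithmetic behind this can be sketched as follows. PaddingPlan and plan_padding are hypothetical names used only for illustration; the sketch assumes the estimated width of the fill is 1 or 2, as described above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type: how many copies of the fill are written and how
// many columns remain unfilled afterwards.
struct PaddingPlan {
    std::size_t fill_count;
    std::size_t leftover_columns;
};

// Splits `columns` of padding into whole copies of a fill whose estimated
// column width is `fill_width`.
inline PaddingPlan plan_padding(std::size_t columns, std::size_t fill_width) {
    assert(fill_width == 1 || fill_width == 2);  // the estimate is 1 or 2
    return PaddingPlan{columns / fill_width, columns % fill_width};
}
```

For 5 columns of padding, a width-1 fill gives 5 copies with nothing left over, while a width-2 fill gives 2 copies and leaves 1 column that cannot be filled.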
The precondition can be validated either in the library or by the user. It would be possible to do the validation at compile time and make the code ill-formed when the precondition is violated. For the following reasons I think it's better not to validate the width:
P1868 14. Implementation
"More importantly, our approach permits refining the definition in the future if there is interest in doing so. It will mostly require researching the status of Unicode support on terminals and minimal or no changes to the implementation."
When an estimated width of 1 is required it means that improving the Standard may make previously valid code ill-formed after the improvement.
P1868 13. Examples
The example of the family grapheme cluster is only rendered properly on the macOS terminal. So even when the library does proper validation, it's not certain the output will be rendered properly.
Changing the fill type changes the size of the std::formatter and thus will be an ABI break.
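A minimal illustration of the size change, using two hypothetical layouts. SpecWithCharFill and SpecWithCodePointFill are invented for this sketch; the members of real std::formatter specializations are implementation-specific. The comparison assumes a typical platform with 8-bit bytes.

```cpp
#include <cstddef>

// Hypothetical parsed-spec layout that stores the fill as one char.
struct SpecWithCharFill {
    char fill;
    char align;
};

// Hypothetical layout that stores the fill as one code point instead.
struct SpecWithCodePointFill {
    char32_t fill;
    char align;
};

// Widening the fill member makes the object larger on typical platforms, so
// code compiled against the old layout no longer agrees with the new one.
static_assert(sizeof(SpecWithCodePointFill) > sizeof(SpecWithCharFill),
              "widening the fill member grows the object");
```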
Previous resolution [SUPERSEDED]:
This wording is relative to N4892.
Modify 28.5.2.2 [format.string.std] as indicated:
-2- [Note 2:
Deleted: The fill character can be any character other than { or }.
Inserted: For a string in a Unicode encoding, the fill character can be any Unicode grapheme cluster other than { or }. For a string in a non-Unicode encoding, the fill character can be any character other than { or }. The output width of the fill character is always assumed to be one column.
Unchanged: The presence of a fill character is signaled by the character following it, which must be one of the alignment options. If the second character of std-format-spec is not a valid alignment option, then it is assumed that both the fill character and the alignment option are absent. — end note]
[2021-08-09; Mark de Wever provides improved wording]
[2021-08-20; Reflector poll]
Set priority to 2 and status to "SG16" after reflector poll.
Previous resolution [SUPERSEDED]:
This wording is relative to N4892.
Modify 28.5.2.2 [format.string.std] as indicated:
-1- […] The syntax of format specifications is as follows:
[…] fill: any Unicode grapheme cluster or character other than { or } […]
-2- [Note 2: The presence of a fill character is signaled by the character following it, which must be one of the alignment options. If the second character of std-format-spec is not a valid alignment option, then it is assumed that both the fill character and the alignment option are absent. — end note]
[Note 2: The fill character can be any character other than { or }. For a string in a Unicode encoding, the fill character can be any Unicode grapheme cluster other than { or }. For a string in a non-Unicode encoding, the fill character can be any character other than { or }. The output width of the fill character is always assumed to be one column.
[2021-08-26; SG16 reviewed and provides alternative wording]
[2023-01-11; LWG telecon]
P2572 would resolve this issue and LWG 3639.
Previous resolution [SUPERSEDED]:
This wording is relative to N4892.
Modify 28.5.2.2 [format.string.std] as indicated:
-1- […] The syntax of format specifications is as follows:
[…] fill:
Deleted: any character other than { or }
Inserted: any codepoint of the literal encoding other than { or }
[…]
-2- [Note 2:
Deleted: The fill character can be any character other than { or }.
Inserted: The fill character can be any codepoint other than { or }.
Unchanged: The presence of a fill character is signaled by the character following it, which must be one of the alignment options. If the second character of std-format-spec is not a valid alignment option, then it is assumed that both the fill character and the alignment option are absent. — end note]
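The rule in this note can be sketched as a small parser that treats the fill as one code point. This is an illustration only, assuming a UTF-8 literal encoding and well-formed input; FillAlign, is_align, code_point_length and parse_fill_align are hypothetical names and do not come from any implementation. Following the grammar, the sketch also accepts an alignment option that appears without a fill.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical result type for this sketch.
struct FillAlign {
    std::string_view fill;  // empty when no fill is present
    char align;             // '<', '>', '^', or '\0' when absent
    std::size_t consumed;   // code units consumed from the spec
};

inline bool is_align(char c) { return c == '<' || c == '>' || c == '^'; }

// Length in code units of the UTF-8 code point starting at s[0].
// Assumes s is non-empty, well-formed UTF-8.
inline std::size_t code_point_length(std::string_view s) {
    assert(!s.empty());
    const unsigned char lead = static_cast<unsigned char>(s.front());
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    return 4;
}

// Parses the optional fill and alignment at the start of a std-format-spec.
inline FillAlign parse_fill_align(std::string_view spec) {
    if (spec.empty()) return {{}, '\0', 0};
    const std::size_t n = code_point_length(spec);
    // A fill is present only if the first code point is followed by an
    // alignment option and is neither '{' nor '}'.
    if (n < spec.size() && is_align(spec[n]) && spec[0] != '{' && spec[0] != '}')
        return {spec.substr(0, n), spec[n], n + 1};
    // Otherwise the first character may itself be an alignment option.
    if (is_align(spec[0])) return {{}, spec[0], 1};
    return {{}, '\0', 0};
}
```

For the spec "*^9" this yields fill "*" and alignment '^'; for ">5" it yields no fill and alignment '>'; for "5" neither is present and nothing is consumed.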
[2023-03-22 Resolved by the adoption of P2572R1 in Issaquah. Status changed: SG16 → Resolved.]
Proposed resolution: