3576. Clarifying fill character in std::format

Section: 28.5.2.2 [format.string.std] Status: Resolved Submitter: Mark de Wever Opened: 2021-08-01 Last modified: 2023-03-23

Priority: 2

Discussion:

The paper P1868 "width: clarifying units of width and precision in std::format" added optional Unicode support to the format header. This paper didn't update the definition of the fill character, which is defined as

"The fill character can be any character other than { or }."

This wording means the fill is a character and not a Unicode grapheme cluster. Based on the current wording, the range of available fill characters depends on the char_type of the format string. After P1868 the determination of the required padding size is Unicode aware, but it's not possible to use a Unicode grapheme cluster as padding. This looks odd from a user's perspective and has already led to implementation divergence between libc++ and MSVC STL.
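To illustrate why the choice of char_type limits the fill, the following sketch counts UTF-8 code units and code points in a few candidate fills. The helper count_code_points and the three constants are hypothetical names introduced here; the input is assumed to be well-formed UTF-8, and byte escapes are used so the example does not depend on the source file encoding.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: counts code points in well-formed UTF-8 by
// skipping continuation bytes (those of the form 10xxxxxx).
constexpr std::size_t count_code_points(std::string_view s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}

// '*' is one code unit, so it fits in a single char.
constexpr std::string_view star = "*";
// U+00E9 is one code point but two UTF-8 code units.
constexpr std::string_view e_acute = "\xC3\xA9";
// 'e' followed by U+0301 (combining acute accent) is one grapheme
// cluster made of two code points and three code units.
constexpr std::string_view e_combining = "e\xCC\x81";
```

Only the first candidate can be stored in one char; the other two need a fill that is wider than a single code unit when char_type is char.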

For the width calculation the width of a Unicode grapheme cluster is estimated to be 1 or 2. Since padding with a 2 column width can't properly pad an odd number of columns the grapheme cluster used should always have a column width of 1.
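The arithmetic behind this can be sketched as follows. PaddingResult and pad_with are hypothetical names, and the fill width is assumed to be the estimated column width of 1 or 2 mentioned above.

```cpp
#include <cstddef>

struct PaddingResult {
    std::size_t fill_count;    // how many copies of the fill fit
    std::size_t columns_left;  // columns that cannot be filled
};

// Hypothetical helper: splits padding_columns into whole copies of a
// fill that occupies fill_width columns (assumed to be 1 or 2).
constexpr PaddingResult pad_with(std::size_t padding_columns,
                                 std::size_t fill_width) {
    return {padding_columns / fill_width, padding_columns % fill_width};
}
```

For example, a one-column value formatted in a field of width 6 needs 5 columns of padding. pad_with(5, 1) fills all of them, whereas pad_with(5, 2) yields two copies of the fill and leaves one column that cannot be padded.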

The precondition can be validated either by the library or by the user. It would be possible to do the validation at compile time and make the code ill-formed when the precondition is violated. For the following reason I think it's better not to validate the width:

Changing the fill type changes the size of the std::formatter and thus will be an ABI break.
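A minimal sketch of the size concern, using two hypothetical layouts for the parsed format specification. Neither struct is taken from a real implementation, and the four-byte fill (enough for one UTF-8 code point) is an assumption made for illustration.

```cpp
#include <cstdint>

// Hypothetical layout that stores the fill as a single code unit.
struct SpecWithCharFill {
    std::uint32_t width;
    char fill;
    char align;
};

// Hypothetical layout that stores the fill as up to four code units.
struct SpecWithWiderFill {
    std::uint32_t width;
    char fill[4];
    char align;
};

// The wider fill enlarges the object that stores it.
static_assert(sizeof(SpecWithWiderFill) > sizeof(SpecWithCharFill),
              "a wider fill changes the object size");
```

If a formatter specialization stores such a structure as a data member, code compiled against the old layout and code compiled against the new one would disagree about the size of the formatter.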

The proposed resolution probably needs some additional changes since the Unicode and output width are specified later in the standard, specifically 28.5.2.2 [format.string.std]/9 - 12.

Previous resolution [SUPERSEDED]:

This wording is relative to N4892.

  1. Modify 28.5.2.2 [format.string.std] as indicated:

    -2- [Note 2: The fill character can be any character other than { or }. For a string in a Unicode encoding, the fill character can be any Unicode grapheme cluster other than { or }. For a string in a non-Unicode encoding, the fill character can be any character other than { or }. The output width of the fill character is always assumed to be one column. The presence of a fill character is signaled by the character following it, which must be one of the alignment options. If the second character of std-format-spec is not a valid alignment option, then it is assumed that both the fill character and the alignment option are absent. — end note]

[2021-08-09; Mark de Wever provides improved wording]

[2021-08-20; Reflector poll]

Set priority to 2 and status to "SG16" after reflector poll.

Previous resolution [SUPERSEDED]:

This wording is relative to N4892.

  1. Modify 28.5.2.2 [format.string.std] as indicated:

    -1- […] The syntax of format specifications is as follows:

    […]
    fill:
                 any Unicode grapheme cluster or character other than { or }
    […]
    

    -2- [Note 2: The fill character can be any character other than { or }. For a string in a Unicode encoding, the fill character can be any Unicode grapheme cluster other than { or }. For a string in a non-Unicode encoding, the fill character can be any character other than { or }. The output width of the fill character is always assumed to be one column.

    [Note 2: The presence of a fill character is signaled by the character following it, which must be one of the alignment options. If the second character of std-format-spec is not a valid alignment option, then it is assumed that both the fill character and the alignment option are absent. — end note]

[2021-08-26; SG16 reviewed and provides alternative wording]

[2023-01-11; LWG telecon]

P2572 would resolve this issue and LWG 3639.

Previous resolution [SUPERSEDED]:

This wording is relative to N4892.

  1. Modify 28.5.2.2 [format.string.std] as indicated:

    -1- […] The syntax of format specifications is as follows:

    […]
    fill:
                    any [removed: character] [added: codepoint of the literal encoding] other than { or }
    […]
    

    -2- [Note 2: The fill character can be any [removed: character] [added: codepoint] other than { or }. The presence of a fill character is signaled by the character following it, which must be one of the alignment options. If the second character of std-format-spec is not a valid alignment option, then it is assumed that both the fill character and the alignment option are absent. — end note]
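The rule in Note 2 can be sketched for a char-based format string as follows. This is an illustration only, not the wording or any library's implementation: FillAlign, is_align, utf8_length and parse_fill_align are hypothetical names, the literal encoding is assumed to be UTF-8, the input is assumed to be well-formed, and spec is assumed to be the std-format-spec portion that follows the colon.

```cpp
#include <cstddef>
#include <string_view>

struct FillAlign {
    std::string_view fill;  // code units of the fill; empty if absent
    char align;             // '<', '^', '>' or '\0' if absent
    std::size_t consumed;   // code units consumed from the spec
};

constexpr bool is_align(char c) { return c == '<' || c == '^' || c == '>'; }

// Length in code units of the UTF-8 sequence that starts with lead
// (assumes well-formed input; returns 1 for anything unexpected).
constexpr std::size_t utf8_length(unsigned char lead) {
    if (lead >= 0xF0) return 4;
    if (lead >= 0xE0) return 3;
    if (lead >= 0xC0) return 2;
    return 1;
}

constexpr FillAlign parse_fill_align(std::string_view spec) {
    if (spec.empty()) return {{}, '\0', 0};
    const std::size_t n = utf8_length(static_cast<unsigned char>(spec[0]));
    // A fill is present only if the code point is followed by an
    // alignment option, and the fill is neither '{' nor '}'.
    if (n < spec.size() && is_align(spec[n]) &&
        spec[0] != '{' && spec[0] != '}') {
        return {spec.substr(0, n), spec[n], n + 1};
    }
    // Otherwise the first character may itself be the alignment option.
    if (is_align(spec[0])) return {{}, spec[0], 1};
    // Neither fill nor alignment is present.
    return {{}, '\0', 0};
}
```

With this sketch, parse_fill_align("*>6") reports the fill "*" and the alignment '>', parse_fill_align(">6") reports only the alignment, and parse_fill_align("06") reports that both are absent.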

[2023-03-22 Resolved by the adoption of P2572R1 in Issaquah. Status changed: SG16 → Resolved.]

Proposed resolution: