3780. format's width estimation is too approximate and not forward compatible

Section: 28.5.2.2 [format.string.std] Status: Resolved Submitter: Corentin Jabot Opened: 2022-09-15 Last modified: 2023-03-23

Priority: 3


Discussion:

For the purpose of width estimation, std::format uses ranges of code points that were initially derived from an implementation of wcwidth, with modifications (see P1868R1).
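As a rough illustration of this range-based approach, the following C++17 sketch hard-codes the ranges listed as (12.1) through (12.14) in the wording quoted later in this issue and treats every other code point as width 1. It assumes the caller has already split the string into extended grapheme clusters (UAX #29) and passes only the first code point of each cluster, as a std::u32string_view. The names old_estimated_width, old_string_width and kOldWideRanges are hypothetical and not part of any standard library.

```cpp
#include <cstddef>
#include <string_view>

// Inclusive code point range whose estimated width is 2.
struct WideRange {
    char32_t first;
    char32_t last;
};

// Ranges copied from items (12.1) through (12.14) of the wording in this issue.
constexpr WideRange kOldWideRanges[] = {
    {0x1100, 0x115F},   {0x2329, 0x232A},   {0x2E80, 0x303E},
    {0x3040, 0xA4CF},   {0xAC00, 0xD7A3},   {0xF900, 0xFAFF},
    {0xFE10, 0xFE19},   {0xFE30, 0xFE6F},   {0xFF00, 0xFF60},
    {0xFFE0, 0xFFE6},   {0x1F300, 0x1F64F}, {0x1F900, 0x1F9FF},
    {0x20000, 0x2FFFD}, {0x30000, 0x3FFFD},
};

// Hypothetical helper: width 2 if cp lies in any listed range, otherwise 1.
constexpr int old_estimated_width(char32_t cp) {
    for (const WideRange& r : kOldWideRanges) {
        if (cp >= r.first && cp <= r.last) {
            return 2;
        }
    }
    return 1;
}

// Hypothetical helper: sums the widths of the first code points of the
// extended grapheme clusters. UAX #29 segmentation is assumed to have been
// done by the caller, so each element is one cluster's first code point.
constexpr std::size_t old_string_width(std::u32string_view cluster_firsts) {
    std::size_t total = 0;
    for (char32_t cp : cluster_firsts) {
        total += static_cast<std::size_t>(old_estimated_width(cp));
    }
    return total;
}
```

Under these ranges, for example, U+1F680 lies outside both (12.11) and (12.12), so it is estimated as width 1, while U+1F600 falls inside (12.11) and is estimated as width 2.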

This, however, presents a number of challenges:

Instead, we propose to

Note that per UAX #11

This change:

For the following code points, the estimated width used to be 1, and is 2 after the suggested change:

For the following code points, the estimated width used to be 2, and is 1 after the suggested change:

[2022-10-12; Reflector poll]

Set priority to 3 after reflector poll. Send to SG16.

Previous resolution [SUPERSEDED]:

This wording is relative to N4917.

  1. Modify 28.5.2.2 [format.string.std] as indicated:

    -12- For a string in a Unicode encoding, implementations should estimate the width of a string as the sum of estimated widths of the first code points in its extended grapheme clusters. The extended grapheme clusters of a string are defined by UAX #29. The estimated width of the following code points is 2:

    1. (12.1) — U+1100 – U+115F

    2. (12.2) — U+2329 – U+232A

    3. (12.3) — U+2E80 – U+303E

    4. (12.4) — U+3040 – U+A4CF

    5. (12.5) — U+AC00 – U+D7A3

    6. (12.6) — U+F900 – U+FAFF

    7. (12.7) — U+FE10 – U+FE19

    8. (12.8) — U+FE30 – U+FE6F

    9. (12.9) — U+FF00 – U+FF60

    10. (12.10) — U+FFE0 – U+FFE6

    11. (12.11) — U+1F300 – U+1F64F

    12. (12.12) — U+1F900 – U+1F9FF

    13. (12.13) — U+20000 – U+2FFFD

    14. (12.14) — U+30000 – U+3FFFD

    15. (?.1) — Any code point with the East_Asian_Width="W" or East_Asian_Width="F" Derived Extracted Property as described by UAX #44

    16. (?.2) — U+4DC0 – U+4DFF (Yijing Hexagram Symbols)

    17. (?.3) — U+1F300 – U+1F5FF (Miscellaneous Symbols and Pictographs)

    18. (?.4) — U+1F900 – U+1F9FF (Supplemental Symbols and Pictographs)

    The estimated width of other code points is 1.
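Items (?.1) through (?.4), read as replacing the fixed ranges (12.1) through (12.14), can be sketched in C++17 as follows. Because (?.1) depends on the East_Asian_Width property from the Unicode Character Database, the sketch takes that lookup as a caller-supplied predicate instead of embedding Unicode data. The function names are hypothetical, and toy_is_w_or_f is a deliberately incomplete stand-in that only recognizes the CJK Unified Ideographs block (U+4E00 to U+9FFF); it is not real UCD data. As above, UAX #29 segmentation is assumed to happen elsewhere.

```cpp
#include <cstddef>
#include <string_view>

// Estimated width of one code point per items (?.1) through (?.4).
// is_w_or_f is a caller-supplied predicate that should return true when the
// code point's East_Asian_Width property (UAX #44) is "W" or "F".
template <class Pred>
constexpr int new_estimated_width(char32_t cp, Pred is_w_or_f) {
    if (is_w_or_f(cp)) return 2;                    // (?.1) East_Asian_Width W or F
    if (cp >= 0x4DC0 && cp <= 0x4DFF) return 2;     // (?.2) Yijing Hexagram Symbols
    if (cp >= 0x1F300 && cp <= 0x1F5FF) return 2;   // (?.3) Misc. Symbols and Pictographs
    if (cp >= 0x1F900 && cp <= 0x1F9FF) return 2;   // (?.4) Supplemental Symbols and Pictographs
    return 1;                                       // every other code point
}

// Sums the widths of the first code points of the extended grapheme
// clusters; the caller is assumed to have segmented the string per UAX #29.
template <class Pred>
constexpr std::size_t new_string_width(std::u32string_view cluster_firsts, Pred is_w_or_f) {
    std::size_t total = 0;
    for (char32_t cp : cluster_firsts) {
        total += static_cast<std::size_t>(new_estimated_width(cp, is_w_or_f));
    }
    return total;
}

// Hypothetical, deliberately incomplete stand-in for real UCD data: it only
// recognizes the CJK Unified Ideographs block U+4E00..U+9FFF.
constexpr bool toy_is_w_or_f(char32_t cp) {
    return cp >= 0x4E00 && cp <= 0x9FFF;
}
```

Passing the property lookup in keeps the fixed part of the rule, items (?.2) through (?.4), separate from the data that changes between Unicode versions; a real implementation would generate the predicate from EastAsianWidth.txt in the Unicode Character Database.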

[2023-03-22 Resolved by the adoption of P2675R1 in Issaquah. Status changed: SG16 → Resolved.]

Proposed resolution: