3314. Is stream insertion behavior locale dependent when Period::type is micro?

Section: 30.5.11 [time.duration.io] Status: C++20 Submitter: Tom Honermann Opened: 2019-11-04 Last modified: 2021-02-25

Priority: 2


Discussion:

30.5.11 [time.duration.io] states:

template<class charT, class traits, class Rep, class Period>
  basic_ostream<charT, traits>&
    operator<<(basic_ostream<charT, traits>& os, const duration<Rep, Period>& d);

[…]

-3- The units suffix depends on the type Period::type as follows:

  1. […]

  2. (3.5) — Otherwise, if Period::type is micro, the suffix is "µs" ("\u00b5\u0073").

  3. […]

[…]

-4- If Period::type is micro, but the character U+00B5 cannot be represented in the encoding used for charT, the unit suffix "us" is used instead of "µs".

[…]

Which encoding is intended by "the encoding used for charT"? There are two candidates:

  1. The associated execution character set, as defined by 5.3.1 [lex.charset] p3, which is used to encode character and string literals (e.g., the "execution wide-character set" for wchar_t).

  2. The locale dependent character set used by the std::locale ctype and codecvt facets as specified in 28.3.4.2 [category.ctype], sometimes referred to as the "native character set".
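A minimal sketch of what the second, locale-dependent reading would mean in code, assuming the check is made with the locale's std::ctype<wchar_t> facet. The helper representable_in_locale is hypothetical. std::ctype<wchar_t>::narrow returns the caller's default character when it cannot map a wide character to a single char, so the same question can receive different answers depending on the locale passed in.

```cpp
#include <cassert>
#include <locale>

// Hypothetical helper modelling candidate 2: asks the ctype<wchar_t> facet of
// `loc` whether the wide character `wc` has a single-char narrow form.
// ctype::narrow returns the supplied default when no mapping exists; '\0' is
// used as that sentinel, so `wc` itself must not be L'\0'.
bool representable_in_locale(wchar_t wc, const std::locale& loc) {
    assert(wc != L'\0');
    const auto& ct = std::use_facet<std::ctype<wchar_t>>(loc);
    return ct.narrow(wc, '\0') != '\0';
}
```

On common implementations, representable_in_locale(L'\u00b5', loc) is true for a locale using ISO-8859-1, where µ is the single byte 0xB5, but typically false for the "C" locale and for a UTF-8 locale, where µ needs two bytes and so has no single-char narrow form. That variation is exactly the locale dependence the issue argues against.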

The behavior should not depend on the locale and should therefore be specified in terms of the execution character sets.

The execution character set is implementation-defined, and some implementations allow it to be selected with a compiler option or determine it from the locale active when the compiler is run. For example, the Microsoft compiler, when run on a Windows system with regional language settings configured for "English (United States)", uses Windows-1252 as the execution character set, but allows this choice to be overridden with the /execution-charset compiler option. The bytes produced for the suffix therefore depend on how the compiler is invoked: Windows-1252 encodes µ (U+00B5, MICRO SIGN) as the single byte 0xB5, so by default the string contents would be "\xb5\x73", whereas /execution-charset:utf-8 and /execution-charset:.437 would yield "\xc2\xb5\x73" and "\xe6\x73" respectively (UTF-8 encodes U+00B5 as two bytes, and Windows code page 437 maps it to 0xE6). An execution character set that lacks µ altogether would instead produce "us".
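A minimal sketch for inspecting which bytes a compiler produced for the narrow literal "\u00b5s". The names micro_s and literal_bytes are hypothetical. The conversion happens at compile time, so the result follows the execution character set (selected, for example, with GCC's -fexec-charset or MSVC's /execution-charset) and not the run-time locale. With a UTF-8 execution character set, the GCC and Clang default, it yields c2 b5 73; Windows-1252 gives b5 73 and code page 437 gives e6 73.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Narrow literal, converted to the execution character set at compile time.
constexpr char micro_s[] = "\u00b5s";

// Returns the literal's bytes (excluding the terminating NUL) as lowercase
// hex separated by spaces, e.g. "c2 b5 73" for a UTF-8 execution character set.
std::string literal_bytes() {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::size_t i = 0; i + 1 < sizeof micro_s; ++i) {
        const unsigned char b = static_cast<unsigned char>(micro_s[i]);
        if (!out.empty()) out += ' ';
        out += digits[b >> 4];
        out += digits[b & 0xF];
    }
    return out;
}
```

Building the same file with different execution-charset options and comparing literal_bytes() makes the compile-time nature of the choice visible without involving any locale.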

This resolution relies on the character set of the locale used at run time being compatible with the execution character set if the produced string is to be displayed correctly when written to a terminal or console. This is a typical requirement for character and string literals, but it is especially relevant to this issue because µ has no representation in many character sets. Additionally, if the stream is imbued with a std::codecvt facet, that facet must support converting the character for the behavior to be well defined.
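The run-time character set matters only once the bytes leave the library, at a terminal or through a conversion facet. A minimal sketch, assuming a narrow char stream; the names format_with_locale and underscore_grouping are hypothetical. It shows that imbuing a std::ostringstream with a different locale changes how the count is formatted but not the bytes of the suffix, which were fixed when the literal was compiled. A std::basic_filebuf, by contrast, passes characters through the imbued locale's std::codecvt facet, which performs no conversion for char but a real one for wchar_t.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical numpunct facet that groups digits in threes with '_', used
// only to give the imbued locale a visible effect on number formatting.
struct underscore_grouping : std::numpunct<char> {
protected:
    char do_thousands_sep() const override { return '_'; }
    std::string do_grouping() const override { return "\3"; }
};

// Writes a count followed by the "µs" suffix to a string stream imbued with
// `loc`. std::basic_stringbuf applies no codecvt conversion, so the suffix
// bytes are exactly those the compiler produced for the literal; the locale
// changes only how the number is formatted.
std::string format_with_locale(long long count, const std::locale& loc) {
    std::ostringstream os;
    os.imbue(loc);
    os << count << "\u00b5s";
    return os.str();
}
```

With std::locale(std::locale::classic(), new underscore_grouping), the count 1234567 is written as 1_234_567, while the suffix bytes stay identical to those produced under the classic locale.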

[2019-11 Priority to 2 during Tuesday morning issue processing in Belfast.]

Previous resolution [SUPERSEDED]:

This wording is relative to N4835.

  1. Modify 30.5.11 [time.duration.io] as indicated:

    [Drafting note: "implementation's native character set" is used in 28.3.4.2.2 [locale.ctype] and 28.3.4.2.5 [locale.codecvt] to refer to the locale dependent character encoding.]

    template<class charT, class traits, class Rep, class Period>
      basic_ostream<charT, traits>&
        operator<<(basic_ostream<charT, traits>& os, const duration<Rep, Period>& d);

    […]

    -3- The units suffix depends on the type Period::type as follows:

    1. […]

    2. (3.5) — Otherwise, if Period::type is micro, the suffix is "µs" ("\u00b5\u0073").

    3. […]

    […]

    -4- If Period::type is micro, but the character U+00B5 lacks representation in the execution character set for charT, the unit suffix "us" is used instead of "µs". If "µs" is used but the implementation's native character set lacks representation for U+00B5 and the stream is associated with a terminal or console, or if the stream is imbued with a std::codecvt facet that lacks conversion support for the character, then the result is unspecified.

    […]

[2019-11-12; Tom Honermann improves wording]

[2020-02 Status to Immediate on Thursday night in Prague.]

Proposed resolution:

This wording is relative to N4835.

  1. Modify 30.5.11 [time.duration.io] as indicated:

    template<class charT, class traits, class Rep, class Period>
      basic_ostream<charT, traits>&
        operator<<(basic_ostream<charT, traits>& os, const duration<Rep, Period>& d);

    […]

    -3- The units suffix depends on the type Period::type as follows:

    1. […]

    2. (3.5) — Otherwise, if Period::type is micro, it is implementation-defined whether the suffix is "µs" ("\u00b5\u0073") or "us".

    3. […]

    […]

    -4- [Deleted:] If Period::type is micro, but the character U+00B5 cannot be represented in the encoding used for charT, the unit suffix "us" is used instead of "µs".

    […]
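Under the adopted wording the choice between "µs" and "us" is implementation-defined rather than tied to a representability test. A minimal sketch of one possible implementation policy, assuming a narrow char stream: the helpers narrow_literal_is_utf8_micro, micro_suffix, write_micro and format_micro are hypothetical, and the policy itself (use "µs" only when the narrow execution character set is UTF-8) is an illustrative assumption, not what any particular standard library does. Because the decision is made at compile time, it cannot vary with the run-time locale. A compiler whose execution character set lacks U+00B5 may reject the "\u00b5" literals or substitute another character.

```cpp
#include <cassert>
#include <chrono>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical policy check: true when the compiler encoded U+00B5 in the
// narrow execution character set as the UTF-8 sequence 0xC2 0xB5.
constexpr bool narrow_literal_is_utf8_micro() {
    constexpr char lit[] = "\u00b5";
    return sizeof lit == 3
        && static_cast<unsigned char>(lit[0]) == 0xC2
        && static_cast<unsigned char>(lit[1]) == 0xB5;
}

// Implementation-defined choice, fixed at compile time: "µs" for a UTF-8
// narrow execution character set, "us" otherwise.
constexpr const char* micro_suffix() {
    return narrow_literal_is_utf8_micro() ? "\u00b5s" : "us";
}

// Hypothetical stand-in for the standard inserter, restricted to microseconds:
// writes the count followed by the selected suffix.
std::ostream& write_micro(std::ostream& os, std::chrono::microseconds d) {
    return os << d.count() << micro_suffix();
}

// Convenience wrapper returning the formatted text as a string.
std::string format_micro(std::chrono::microseconds d) {
    std::ostringstream os;
    write_micro(os, d);
    return os.str();
}
```

Whatever policy an implementation picks, the point of the resolution is that the result is documented by the implementation and does not depend on the stream's locale.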