4043. "ASCII" is not a registered character encoding

Section: 30.6.2.2 [text.encoding.general] Status: New Submitter: Jonathan Wakely Opened: 2024-01-23 Last modified: 2024-01-23 14:14:18 UTC

Priority: Not Prioritized

Discussion:

The IANA Character Sets registry does not contain "ASCII" as an alias of the "US-ASCII" encoding. This is apparently for historical reasons, because there used to be some ambiguity about exactly what "ASCII" meant. I don't think those historical reasons are relevant to C++26, but the absence of "ASCII" in the IANA registry means that it's not a registered character encoding as defined by 30.6.2.2 [text.encoding.general].

This means that the encoding referred to by notes in the C++ standard (31.12.6.2 [fs.path.generic], 30.4.4.1.3 [facet.numpunct.virtuals]) and by an example in the std::text_encoding proposal (P1885) isn't actually usable in portable code. So std::text_encoding("ASCII") creates an object with mib() == std::text_encoding::other, which is not the same encoding as std::text_encoding("US-ASCII"). This seems surprising.
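The mismatch can be illustrated without a C++26 library. The following is only a minimal sketch, not the specified algorithm: normalize and lookup_mib are hypothetical helpers, normalize is a simplified stand-in for text_encoding::comp-name (it ignores case and non-alphanumeric characters but omits the rule about runs of '0'), and the alias list is an assumption about the registry's current US-ASCII entry that should be checked against the registry itself.

```cpp
#include <array>
#include <cctype>
#include <string>
#include <string_view>

// Hypothetical, simplified stand-in for text_encoding::comp-name:
// drop non-alphanumeric characters and fold case. The real comparison
// also ignores certain runs of '0'; that rule is omitted here.
std::string normalize(std::string_view name) {
    std::string out;
    for (unsigned char c : name) {
        if (std::isalnum(c)) {
            out.push_back(static_cast<char>(std::tolower(c)));
        }
    }
    return out;
}

// Assumed content of the IANA entry for US-ASCII (MIBenum 3):
// the primary name followed by its aliases. Note that "ASCII" is absent.
constexpr std::array<std::string_view, 10> us_ascii_names = {
    "US-ASCII", "iso-ir-6", "ANSI_X3.4-1968", "ANSI_X3.4-1986",
    "ISO_646.irv:1991", "ISO646-US", "us", "IBM367", "cp367", "csASCII",
};

constexpr int mib_other = 1;     // mirrors text_encoding::id::other
constexpr int mib_us_ascii = 3;  // mirrors text_encoding::id::ASCII

// Hypothetical lookup: returns the MIBenum for a name equivalent to one of
// the US-ASCII names, and mib_other for anything else.
int lookup_mib(std::string_view name) {
    const std::string key = normalize(name);
    for (std::string_view alias : us_ascii_names) {
        if (normalize(alias) == key) {
            return mib_us_ascii;
        }
    }
    return mib_other;
}
```

With this table, lookup_mib("US-ASCII") and lookup_mib("us_ascii") both return 3, while lookup_mib("ASCII") returns 1, because "ascii" is not equivalent to "usascii" or "csascii" under the comparison.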

Proposed resolution:

This wording is relative to N4971.

  1. Modify 30.6.2.2 [text.encoding.general] as indicated:

    -1- A registered character encoding is a character encoding scheme in the IANA Character Sets registry.

    [Note 1: The IANA Character Sets registry uses the term “character sets” to refer to character encodings. — end note]

    The primary name of a registered character encoding is the name of that encoding specified in the IANA Character Sets registry.

    -2- The set of known registered character encodings contains every registered character encoding specified in the IANA Character Sets registry except for the following:

    (2.1) – NATS-DANO (33)
    (2.2) – NATS-DANO-ADD (34)

    -3- Each known registered character encoding is identified by an enumerator in text_encoding::id, and has a set of zero or more aliases.

    -4- The set of aliases of a known registered character encoding is an implementation-defined superset of the aliases specified in the IANA Character Sets registry. The set of aliases for US-ASCII includes "ASCII". No two aliases or primary names of distinct registered character encodings are equivalent when compared by text_encoding::comp-name.
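The last sentence of paragraph 4 constrains any added alias, including "ASCII": it must not be equivalent to a name of a different encoding. The following sketch shows one way such a check might look. Encoding, normalize, has_cross_encoding_clash and sample_table are hypothetical names, the table is a tiny illustrative subset rather than the registry, and normalize is again a simplified stand-in for text_encoding::comp-name that omits the rule about runs of '0'.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical record: one registered encoding with its primary name
// and aliases, keyed by its MIBenum.
struct Encoding {
    int mib;
    std::vector<std::string_view> names;
};

// Simplified stand-in for text_encoding::comp-name (zero handling omitted).
std::string normalize(std::string_view name) {
    std::string out;
    for (unsigned char c : name) {
        if (std::isalnum(c)) {
            out.push_back(static_cast<char>(std::tolower(c)));
        }
    }
    return out;
}

// Returns true if a name of one encoding is equivalent to a name of a
// different encoding, which paragraph 4 does not allow.
bool has_cross_encoding_clash(const std::vector<Encoding>& table) {
    for (std::size_t i = 0; i < table.size(); ++i) {
        for (std::size_t j = i + 1; j < table.size(); ++j) {
            if (table[i].mib == table[j].mib) {
                continue;  // same encoding: equivalent names are harmless
            }
            for (std::string_view a : table[i].names) {
                for (std::string_view b : table[j].names) {
                    if (normalize(a) == normalize(b)) {
                        return true;
                    }
                }
            }
        }
    }
    return false;
}

// Illustrative subset only: US-ASCII with the proposed "ASCII" alias, and UTF-8.
std::vector<Encoding> sample_table() {
    return {
        {3, {"US-ASCII", "ASCII", "csASCII"}},
        {106, {"UTF-8", "csUTF8"}},
    };
}
```

For this subset, has_cross_encoding_clash(sample_table()) is false, so adding "ASCII" introduces no ambiguity here. Whether that holds for the full registry plus an implementation's own aliases is something the implementation would need to verify.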