"ASCII" is not a registered character encoding
Section: 28.4.2.2 [text.encoding.general] Status: WP Submitter: Jonathan Wakely Opened: 2024-01-23 Last modified: 2024-04-02
Priority: Not Prioritized
Discussion:
The IANA Character Sets registry does not contain "ASCII" as an alias of the "US-ASCII" encoding. This is apparently for historical reasons, because there used to be some ambiguity about exactly what "ASCII" meant. I don't think those historical reasons are relevant to C++26, but the absence of "ASCII" in the IANA registry means that it's not a registered character encoding as defined by 28.4.2.2 [text.encoding.general].
This means that the encoding referred to by notes in the C++ standard (31.12.6.2 [fs.path.generic], 28.3.4.4.1.3 [facet.numpunct.virtuals]) and by an example in the std::text_encoding proposal (P1885) isn't actually usable in portable code. So std::text_encoding("ASCII") creates an object with mib() == std::text_encoding::other, which is not the same encoding as std::text_encoding("US-ASCII"). This seems surprising.
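The behaviour described above can be observed with a small probe. This is only a sketch: the helper names same_encoding, mib_is_other and check_us_ascii_spelling are hypothetical, the code assumes the caller passes names in the basic character set no longer than std::text_encoding::max_name_length, and it returns std::nullopt when the library does not provide <text_encoding> (detected through the __cpp_lib_text_encoding feature-test macro). What it reports for "ASCII" depends on whether the implementation already includes the alias from the proposed resolution.

```cpp
#include <cassert>
#include <optional>
#include <string_view>
#if __has_include(<version>)
#  include <version>
#endif
#ifdef __cpp_lib_text_encoding
#  include <text_encoding>
#endif

// Hypothetical helper: do two names denote the same encoding according to
// std::text_encoding? Yields std::nullopt if <text_encoding> is unavailable.
inline std::optional<bool> same_encoding(std::string_view a, std::string_view b) {
#ifdef __cpp_lib_text_encoding
    return std::text_encoding(a) == std::text_encoding(b);
#else
    (void)a;
    (void)b;
    return std::nullopt;
#endif
}

// Hypothetical helper: does the name fail to match any known registered
// character encoding, so that mib() is text_encoding::id::other?
inline std::optional<bool> mib_is_other(std::string_view name) {
#ifdef __cpp_lib_text_encoding
    return std::text_encoding(name).mib() == std::text_encoding::id::other;
#else
    (void)name;
    return std::nullopt;
#endif
}

// Usage example: two spellings of the registered name must agree,
// whenever the facility is available at all.
inline void check_us_ascii_spelling() {
    if (auto same = same_encoding("US-ASCII", "us-ascii")) {
        assert(*same);
    }
}
```

Calling mib_is_other("ASCII") and same_encoding("ASCII", "US-ASCII") on a given implementation shows whether "ASCII" is treated there as an alias of US-ASCII or as an unregistered name.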
[2024-03-12; Reflector poll]
SG16 approved the proposed resolution. Set status to Tentatively Ready after seven votes in favour during reflector poll.
[Tokyo 2024-03-23; Status changed: Voting → WP.]
Proposed resolution:
This wording is relative to N4971.
Modify 28.4.2.2 [text.encoding.general] as indicated:
-1- A registered character encoding is a character encoding scheme in the IANA Character Sets registry.
[Note 1: The IANA Character Sets registry uses the term “character sets” to refer to character encodings. — end note]
The primary name of a registered character encoding is the name of that encoding specified in the IANA Character Sets registry.
-2- The set of known registered character encodings contains every registered character encoding specified in the IANA Character Sets registry except for the following:
- (2.1) – NATS-DANO (33)
- (2.2) – NATS-DANO-ADD (34)
-3- Each known registered character encoding is identified by an enumerator in text_encoding::id, and has a set of zero or more aliases.
-4- The set of aliases of a known registered character encoding is an implementation-defined superset of the aliases specified in the IANA Character Sets registry. The set of aliases for US-ASCII includes "ASCII". No two aliases or primary names of distinct registered character encodings are equivalent when compared by text_encoding::comp-name.
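To illustrate how an added alias interacts with name comparison, here is a minimal, self-contained sketch of an alias lookup. It is not the library's implementation: comp_name_sketch only approximates the exposition-only text_encoding::comp-name (it ignores non-alphanumeric characters, letter case, and zeros that do not follow a nonzero digit, and it assumes an ASCII-compatible execution character set), and the table kAliases is a tiny hypothetical excerpt rather than the full registry.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Reduce a name to the characters that matter for comparison.
inline std::string normalize_name(std::string_view name) {
    std::string out;
    bool numeric_prefix = false;  // true after a digit 1-9, until the next letter
    for (char c : name) {
        if (c >= 'a' && c <= 'z') c = static_cast<char>(c - 'a' + 'A');
        if (c >= 'A' && c <= 'Z') {
            numeric_prefix = false;
            out.push_back(c);
        } else if (c >= '0' && c <= '9') {
            if (c == '0' && !numeric_prefix) continue;  // leading zero is ignored
            numeric_prefix = true;
            out.push_back(c);
        }
        // Any other character (hyphen, underscore, dot, ...) is ignored.
    }
    return out;
}

// Approximation of comp-name: equal after normalization.
inline bool comp_name_sketch(std::string_view a, std::string_view b) {
    return normalize_name(a) == normalize_name(b);
}

struct AliasEntry {
    std::string_view alias;
    std::string_view primary;
};

// Hypothetical excerpt of an alias table; a real one covers the whole registry.
constexpr AliasEntry kAliases[] = {
    {"US-ASCII", "US-ASCII"},
    {"ANSI_X3.4-1968", "US-ASCII"},
    {"csASCII", "US-ASCII"},
    {"ASCII", "US-ASCII"},  // the alias required by the resolution above
    {"UTF-8", "UTF-8"},
    {"csUTF8", "UTF-8"},
};

// Find the primary name whose alias matches the given name, if any.
inline std::optional<std::string_view> primary_name_for(std::string_view name) {
    for (const AliasEntry& entry : kAliases) {
        if (comp_name_sketch(entry.alias, name)) return entry.primary;
    }
    return std::nullopt;
}

// Usage example: "ASCII" is a distinct name from "US-ASCII" under the
// comparison, so it only resolves because it is listed as an alias.
inline void alias_lookup_example() {
    assert(!comp_name_sketch("ASCII", "US-ASCII"));
    assert(primary_name_for("ascii").value() == "US-ASCII");
}
```

Because the comparison does not equate "ASCII" with "US-ASCII", the alias has to be listed explicitly, and the uniqueness requirement in paragraph 4 is what keeps a lookup like this unambiguous.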