Annex E (informative) Conformance with UAX #31 [uaxid]

E.1 General [uaxid.general]

This Annex describes the choices made in application of UAX #31 (“Unicode Identifier and Pattern Syntax”) to C++ in terms of the requirements from UAX #31 and how they do or do not apply to C++.
In terms of UAX #31, C++ conforms by meeting the requirements R1 “Default Identifiers” and R4 “Equivalent Normalized Identifiers”.
The other requirements, also listed below, are either alternatives not taken or do not apply to C++.

E.2 R1 Default identifiers [uaxid.def]

E.2.1 General [uaxid.def.general]

UAX #31 specifies a default syntax for identifiers based on properties from the Unicode Character Database, UAX #44.
The general syntax is
<Identifier> := <Start> <Continue>* (<Medial> <Continue>+)*
where <Start> has the XID_Start property, <Continue> has the XID_Continue property, and <Medial> is a list of characters permitted between continue characters.
For C++ we add the character U+005f low line, or _, to the set of permitted <Start> characters, the <Medial> set is empty, and the <Continue> characters are unmodified.
In the grammar used in UAX #31, this is
<Identifier> := <Start> <Continue>*
<Start> := XID_Start + U+005f
<Continue> := <Start> + XID_Continue
This is described in the C++ grammar in [lex.name], where an identifier is formed either from an identifier-start or from an identifier followed by an identifier-continue.
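To make the profile concrete, the following is a minimal sketch that checks a UTF-32 string against <Start> <Continue>*. The names matches_identifier, ascii_xid_start, ascii_xid_continue, cpp_start, and cpp_continue are hypothetical. The ASCII-only predicates are rough stand-ins for the real XID_Start and XID_Continue properties, which an implementation would derive from the Unicode Character Database. This is an illustration of the grammar, not a description of how any particular implementation realizes [lex.name].

```cpp
#include <string_view>

// Checks the profile <Identifier> := <Start> <Continue>*.
// is_start / is_continue are caller-supplied predicates; a real
// implementation would consult the Unicode Character Database.
template <class IsStart, class IsContinue>
bool matches_identifier(std::u32string_view s, IsStart is_start,
                        IsContinue is_continue) {
    if (s.empty() || !is_start(s.front()))
        return false;                 // must begin with one <Start>
    for (char32_t c : s.substr(1))
        if (!is_continue(c))
            return false;             // every later code point is <Continue>
    return true;                      // <Medial> is empty: nothing else to check
}

// Hypothetical ASCII-only approximations of XID_Start / XID_Continue.
constexpr bool ascii_xid_start(char32_t c) {
    return (c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z');
}
constexpr bool ascii_xid_continue(char32_t c) {
    return ascii_xid_start(c) || (c >= U'0' && c <= U'9') || c == U'_';
}

// C++ profile: <Start> := XID_Start + U+005F (LOW LINE, '_').
constexpr bool cpp_start(char32_t c) {
    return ascii_xid_start(c) || c == U'_';
}
// <Continue> := <Start> + XID_Continue.
constexpr bool cpp_continue(char32_t c) {
    return cpp_start(c) || ascii_xid_continue(c);
}
```

With these stand-ins, _tmp matches under the C++ profile but not under the unmodified default syntax (ascii_xid_start, ascii_xid_continue), which is the effect of adding U+005f to <Start>.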

E.2.2 R1a Restricted format characters [uaxid.def.rfmt]

If an implementation of UAX #31 wishes to allow format characters such as U+200d zero width joiner or U+200c zero width non-joiner, it must define a profile allowing them or describe precisely which combinations are permitted.
C++ does not allow format characters in identifiers, so this does not apply.

E.2.3 R1b Stable identifiers [uaxid.def.stable]

An implementation of UAX #31 may choose to guarantee that identifiers are stable across versions of the Unicode Standard: once a string qualifies as an identifier, it does so in all future versions.
C++ does not make this guarantee, except to the extent that UAX #31 guarantees the stability of the XID_Start and XID_Continue properties.

E.3 R2 Immutable identifiers [uaxid.immutable]

An implementation may choose to guarantee that the set of identifiers will never change by fixing the set of code points allowed in identifiers forever.
C++ does not make this guarantee.
As scripts are added to Unicode, additional characters in those scripts may become available for use in identifiers.

E.4 R3 Pattern_White_Space and Pattern_Syntax characters [uaxid.pattern]

UAX #31 describes how formal languages such as computer languages should describe and implement their use of whitespace and syntactically significant characters during the processes of lexing and parsing.
C++ does not claim conformance with this requirement.

E.5 R4 Equivalent normalized identifiers [uaxid.eqn]

UAX #31 requires that implementations describe how identifiers are compared and considered equivalent.
C++ requires that identifiers be in Normalization Form C, and therefore identifiers that compare the same under NFC are equivalent.
This is described in [lex.name].
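As an illustration, the sketch below spells the same visible word two ways: with the precomposed U+00e9 (the NFC form) and with U+0065 followed by U+0301 combining acute accent (a decomposed form that is not NFC). Under the rule in [lex.name] only the first spelling is a valid identifier, so an implementation can decide equivalence by plain code-point comparison without normalizing at lookup time. The helper same_identifier and the use of UTF-32 strings are assumptions made for this example only.

```cpp
#include <string_view>

// "café" with precomposed U+00E9: already in NFC, a valid spelling.
constexpr std::u32string_view composed = U"caf\u00E9";
// "café" as 'e' + U+0301 COMBINING ACUTE ACCENT: not in NFC, so this
// spelling would make an identifier ill-formed.
constexpr std::u32string_view decomposed = U"cafe\u0301";

// Hypothetical helper: since every valid identifier is already in NFC,
// equivalence reduces to comparing code point sequences.
constexpr bool same_identifier(std::u32string_view a, std::u32string_view b) {
    return a == b;
}
```

The two spellings differ in length (4 versus 5 code points) and compare unequal, which is why the NFC requirement rather than the comparison step is what makes such identifiers unambiguous.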

E.6 R5 Equivalent case-insensitive identifiers [uaxid.eqci]

C++ considers case to be significant in identifier comparison and performs no case folding.
This requirement does not apply to C++.
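A minimal illustration of case significance: the three declarations below differ only in letter case, and each introduces a distinct variable. The variable names are arbitrary and chosen for this example only.

```cpp
// Identifiers that differ only in case name distinct entities;
// no case folding is applied when the names are compared.
int value = 1;
int Value = 2;
int VALUE = 3;
```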

E.7 R6 Filtered normalized identifiers [uaxid.filter]

If any characters are excluded from normalization, UAX #31 requires a precise specification of those exclusions.
C++ does not make any such exclusions.

E.8 R7 Filtered case-insensitive identifiers [uaxid.filterci]

C++ identifiers are case sensitive, and therefore this requirement does not apply.

E.9 R8 Hashtag identifiers [uaxid.hashtag]

There are no hashtags in C++, so this requirement does not apply.