5 Lexical conventions [lex]

5.13 Literals [lex.literal]

5.13.3 Character literals [lex.ccon]

encoding-prefix: one of
    u8  u  U  L
basic-c-char:
    any member of the translation character set except the U+0027 APOSTROPHE, U+005C REVERSE SOLIDUS, or new-line character
simple-escape-sequence-char: one of
    ' " ? \ a b f n r t v
conditional-escape-sequence-char:
    any member of the basic character set that is not an octal-digit, a simple-escape-sequence-char, or the characters u, U, or x
A non-encodable character literal is a character-literal whose c-char-sequence consists of a single c-char that is not a numeric-escape-sequence and that specifies a character that either lacks representation in the literal's associated character encoding or that cannot be encoded as a single code unit.
A multicharacter literal is a character-literal whose c-char-sequence consists of more than one c-char.
The encoding-prefix of a non-encodable character literal or a multicharacter literal shall be absent or L.
Such character-literals are conditionally-supported.
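As a rough illustration of these constraints, the sketch below checks the type of an ordinary multicharacter literal. It assumes an implementation that supports multicharacter literals (they are conditionally-supported, and compilers that accept them commonly warn). The name `kTag` is hypothetical. Nothing is asserted about the value, since it is implementation-defined.

```cpp
#include <type_traits>

// Ordinary multicharacter literal: more than one c-char, no encoding-prefix.
// Assumption: the implementation supports it (conditionally-supported).
constexpr auto kTag = 'ab';  // hypothetical name, for illustration only

// Table 9 gives the type as int. The value is implementation-defined,
// so it is deliberately not checked here.
static_assert(std::is_same_v<decltype('ab'), int>,
              "an ordinary multicharacter literal has type int");

// Ill-formed: the encoding-prefix of a multicharacter literal shall be
// absent or L, so the following lines are left commented out.
// constexpr auto bad8  = u8'ab';
// constexpr auto bad16 = u'ab';
// constexpr auto bad32 = U'ab';
```

Code that needs a portable value would normally avoid multicharacter literals altogether and build the value explicitly from single-character literals.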
The kind of a character-literal, its type, and its associated character encoding are determined by its encoding-prefix and its c-char-sequence as defined by Table 9.
The special cases for non-encodable character literals and multicharacter literals take precedence over their respective base kinds.
[Note 1:
The associated character encoding for ordinary and wide character literals determines encodability, but does not determine the value of non-encodable ordinary or wide character literals or ordinary or wide multicharacter literals.
The examples in Table 9 for non-encodable ordinary and wide character literals assume that the specified character lacks representation in the ordinary literal encoding or wide literal encoding, respectively, or that encoding the character would require more than one code unit.
— end note]
Table 9: Character literals [tab:lex.ccon.literal]

Encoding prefix  Kind                                      Type      Associated character encoding  Example
none             ordinary character literal                char      ordinary literal encoding      'v'
none             non-encodable ordinary character literal  int       ordinary literal encoding      '\U0001F525'
none             ordinary multicharacter literal           int       ordinary literal encoding      'abcd'
L                wide character literal                    wchar_t   wide literal encoding          L'w'
L                non-encodable wide character literal      wchar_t   wide literal encoding          L'\U0001F32A'
L                wide multicharacter literal               wchar_t   wide literal encoding          L'abcd'
u8               UTF-8 character literal                   char8_t   UTF-8                          u8'x'
u                UTF-16 character literal                  char16_t  UTF-16                         u'y'
U                UTF-32 character literal                  char32_t  UTF-32                         U'z'
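The base kinds in Table 9 can be checked at compile time. The following is a minimal sketch, assuming a C++17 or later compiler. The `u8` check is guarded by the `__cpp_char8_t` feature-test macro because `char8_t` is not available in earlier language modes.

```cpp
#include <type_traits>

// The encoding-prefix selects the type of a single-character literal.
static_assert(std::is_same_v<decltype('v'), char>, "ordinary");
static_assert(std::is_same_v<decltype(L'w'), wchar_t>, "wide");
static_assert(std::is_same_v<decltype(u'y'), char16_t>, "UTF-16");
static_assert(std::is_same_v<decltype(U'z'), char32_t>, "UTF-32");

#if defined(__cpp_char8_t)
// Only checked where char8_t exists as a distinct type.
static_assert(std::is_same_v<decltype(u8'x'), char8_t>, "UTF-8");
#endif

// UTF-16 and UTF-32 literals have a fixed associated encoding, so their
// values are the code points themselves for these characters.
static_assert(u'y' == 0x0079, "U+0079 LATIN SMALL LETTER Y");
static_assert(U'z' == 0x007A, "U+007A LATIN SMALL LETTER Z");
```

The values of `'v'` and `L'w'` are not asserted because they depend on the ordinary and wide literal encodings, which vary between implementations.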
In translation phase 4, the value of a character-literal is determined using the range of representable values of the character-literal's type in translation phase 7.
A non-encodable character literal or a multicharacter literal has an implementation-defined value.
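One way to observe the phase 4 rule is to compare the result of a preprocessor condition with the same comparison evaluated as an ordinary constant expression. The sketch below uses hypothetical names and assumes an 8-bit `char`, so that `'\xFF'` is within range of the numeric escape. Whether the literal is negative depends on the signedness of `char`, so only agreement between the two phases is checked.

```cpp
// Evaluated by the preprocessor (translation phase 4).
#if '\xFF' < 0
constexpr bool kNegativeInPhase4 = true;
#else
constexpr bool kNegativeInPhase4 = false;
#endif

// Evaluated as a constant expression (translation phase 7).
constexpr bool kNegativeInPhase7 = ('\xFF' < 0);

// Both phases use the range of representable values of char,
// so they are expected to agree whether char is signed or unsigned.
static_assert(kNegativeInPhase4 == kNegativeInPhase7,
              "phase 4 and phase 7 should agree on the literal's value");
```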
The value of any other kind of character-literal is determined as follows:
The character specified by a simple-escape-sequence is specified in Table 10.
[Note 3:
Using an escape sequence for a question mark is supported for compatibility with ISO C++ 2014 and ISO C.
— end note]
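The mapping given in Table 10 can be sketched as a small lookup function. The names `simple_escape_code_point` and `kNotSimpleEscape` are hypothetical and exist only for this illustration. The checks compare against UTF-32 literals, whose encoding is fixed, so they do not depend on the ordinary literal encoding.

```cpp
// Hypothetical sentinel for characters that are not a
// simple-escape-sequence-char.
constexpr char32_t kNotSimpleEscape = 0xFFFFFFFF;

// Hypothetical helper: returns the code point that Table 10 assigns to the
// simple-escape-sequence formed from a backslash followed by c.
constexpr char32_t simple_escape_code_point(char c) {
  switch (c) {
    case 'n':  return 0x000A;  // LINE FEED (LF)
    case 't':  return 0x0009;  // CHARACTER TABULATION
    case 'v':  return 0x000B;  // LINE TABULATION
    case 'b':  return 0x0008;  // BACKSPACE
    case 'r':  return 0x000D;  // CARRIAGE RETURN (CR)
    case 'f':  return 0x000C;  // FORM FEED (FF)
    case 'a':  return 0x0007;  // BELL
    case '\\': return 0x005C;  // REVERSE SOLIDUS
    case '?':  return 0x003F;  // QUESTION MARK
    case '\'': return 0x0027;  // APOSTROPHE
    case '"':  return 0x0022;  // QUOTATION MARK
    default:   return kNotSimpleEscape;
  }
}

// Cross-check the table against the compiler's own handling of the escapes.
static_assert(simple_escape_code_point('n') == U'\n', "LINE FEED");
static_assert(simple_escape_code_point('\\') == U'\\', "REVERSE SOLIDUS");
static_assert(simple_escape_code_point('?') == U'\?', "QUESTION MARK");
// The escaped and unescaped question mark denote the same character.
static_assert(U'\?' == U'?', "escape for ? is equivalent to ?");
static_assert(simple_escape_code_point('x') == kNotSimpleEscape,
              "x introduces a numeric escape, not a simple one");
```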
Table 10: Simple escape sequences [tab:lex.ccon.esc]

character                     simple-escape-sequence
U+000A  LINE FEED (LF)        \n
U+0009  CHARACTER TABULATION  \t
U+000B  LINE TABULATION       \v
U+0008  BACKSPACE             \b
U+000D  CARRIAGE RETURN (CR)  \r
U+000C  FORM FEED (FF)        \f
U+0007  BELL                  \a
U+005C  REVERSE SOLIDUS       \\
U+003F  QUESTION MARK         \?
U+0027  APOSTROPHE            \'
U+0022  QUOTATION MARK        \"