5 Lexical conventions [lex]

If a multicharacter literal contains a c-char that is not encodable as a single code unit in the ordinary literal encoding, the program is ill-formed.

Multicharacter literals are conditionally-supported.

The kind of a character-literal, its type, and its associated character encoding ([lex.charset]) are determined by its encoding-prefix and its c-char-sequence as defined by Table 9.

Table 9: Character literals [tab:lex.ccon.literal]

Encoding prefix	Kind	Type	Associated character encoding	Example
none	ordinary character literal	char	ordinary literal encoding	'v'
none	multicharacter literal	int	ordinary literal encoding	'abcd'
L	wide character literal	wchar_t	wide literal encoding	L'w'
u8	UTF-8 character literal	char8_t	UTF-8	u8'x'
u	UTF-16 character literal	char16_t	UTF-16	u'y'
U	UTF-32 character literal	char32_t	UTF-32	U'z'
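
The prefix-to-type mapping in the table can be checked at compile time. The following is a minimal sketch, not normative text; the helper `has_type` is a hypothetical name introduced only for illustration. The `u8` row is guarded by the `__cpp_char8_t` feature-test macro because `char8_t` is unavailable before C++20, and the multicharacter row is left out because such literals are conditionally-supported.

```cpp
#include <type_traits>

// Hypothetical helper (not part of the standard): true when the argument's
// deduced type is exactly Expected.
template <class Expected, class Actual>
constexpr bool has_type(Actual) {
    return std::is_same<Expected, Actual>::value;
}

// One check per row of the table; the encoding-prefix selects the type.
static_assert(has_type<char>('v'), "no prefix: ordinary character literal");
static_assert(has_type<wchar_t>(L'w'), "L: wide character literal");
static_assert(has_type<char16_t>(u'y'), "u: UTF-16 character literal");
static_assert(has_type<char32_t>(U'z'), "U: UTF-32 character literal");

#ifdef __cpp_char8_t
// Only checked where the implementation provides char8_t (C++20 onwards).
static_assert(has_type<char8_t>(u8'x'), "u8: UTF-8 character literal");
#endif
```

If any row were violated by an implementation, the corresponding `static_assert` would reject the translation unit.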

In translation phase 4, the value of a character-literal is determined using the range of representable values of the character-literal's type in translation phase 7.

A multicharacter literal has an implementation-defined value.

The value of any other kind of character-literal is determined as follows:

(3.1)
A character-literal with a c-char-sequence consisting of a single basic-c-char, simple-escape-sequence, or universal-character-name has the code unit value of the specified character as encoded in the literal's associated character encoding.

If the specified character lacks representation in the literal's associated character encoding or if it cannot be encoded as a single code unit, then the program is ill-formed.
(3.2)
A character-literal with a c-char-sequence consisting of a single numeric-escape-sequence has a value as follows:
- (3.2.1)
  Let v be the integer value represented by the octal number comprising the sequence of octal-digits in an octal-escape-sequence or by the hexadecimal number comprising the sequence of hexadecimal-digits in a hexadecimal-escape-sequence.
- (3.2.2)
  If v does not exceed the range of representable values of the character-literal's type, then the value is v.
- (3.2.3)
  Otherwise, if the character-literal's encoding-prefix is absent or L, and v does not exceed the range of representable values of the corresponding unsigned type for the underlying type of the character-literal's type, then the value is the unique value of the character-literal's type T that is congruent to v modulo $2^{N}$, where N is the width of T.
- (3.2.4)
  Otherwise, the program is ill-formed.
(3.3)
A character-literal with a c-char-sequence consisting of a single conditional-escape-sequence is conditionally-supported and has an implementation-defined value.
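
The rules in (3.1) and (3.2) can be illustrated with a few compile-time checks. This is a sketch under stated assumptions: `wrapped_char_value` is a hypothetical helper that models (3.2.2) and (3.2.3) for an unprefixed literal only, and it assumes that v does not exceed `UCHAR_MAX`. Whether `'\xff'` takes the (3.2.2) or the (3.2.3) path depends on whether `char` is signed on the implementation.

```cpp
#include <climits>

// (3.1): a universal-character-name that fits in one code unit yields that
// code unit. By contrast, u8'\u00e9' is ill-formed because U+00E9 needs two
// UTF-8 code units.
static_assert(u'\u00e9' == 0x00E9, "one UTF-16 code unit");
static_assert(U'\U0001F600' == 0x1F600, "one UTF-32 code unit");

// (3.2.1), (3.2.2): octal and hexadecimal escapes both give v = 65 here,
// which is within the range of char and wchar_t.
static_assert('\101' == 65, "octal-escape-sequence");
static_assert('\x41' == 65, "hexadecimal-escape-sequence");
static_assert(L'\x41' == 65, "wide literal");
static_assert(u'\xffff' == 0xFFFF, "v fits in char16_t");

// Hypothetical model of (3.2.2)/(3.2.3) for an unprefixed literal.
// Assumes v <= UCHAR_MAX; larger v would make the program ill-formed (3.2.4).
constexpr int wrapped_char_value(unsigned v) {
    return v <= static_cast<unsigned>(CHAR_MAX)
               ? static_cast<int>(v)                     // (3.2.2): value is v
               : static_cast<int>(v) - (UCHAR_MAX + 1);  // (3.2.3): v - 2^N
}

// Holds whether char is signed (value v - 2^N) or unsigned (value v).
static_assert('\xff' == wrapped_char_value(0xFF), "matches the model");
static_assert(static_cast<unsigned char>('\xff') == 0xFF, "congruent to v");
```

The model subtracts $2^{N}$ only when v is above `CHAR_MAX`, which never happens where `char` is unsigned, so the same function covers both cases.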

The character specified by a simple-escape-sequence is specified in Table 10.

[Note 1:

Using an escape sequence for a question mark is supported for compatibility with C++ 2014 and C.

— end note]

Table 10: Simple escape sequences [tab:lex.ccon.esc]

character		*simple-escape-sequence*
U+000a	line feed	\n
U+0009	character tabulation	\t
U+000b	line tabulation	\v
U+0008	backspace	\b
U+000d	carriage return	\r
U+000c	form feed	\f
U+0007	alert	\a
U+005c	reverse solidus	\\
U+003f	question mark	\?
U+0027	apostrophe	\'
U+0022	quotation mark	\"
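
The table can be cross-checked against UTF-32 character literals, where the code unit value equals the code point. This is an illustrative sketch; `simple_escapes_match_table` is a hypothetical name. The `U` prefix is used so that the checks do not depend on the ordinary literal encoding of the implementation.

```cpp
// Hypothetical helper: true when every simple-escape-sequence, written as a
// UTF-32 character literal, has the code point listed in the table.
constexpr bool simple_escapes_match_table() {
    return U'\n' == 0x000a   // line feed
        && U'\t' == 0x0009   // character tabulation
        && U'\v' == 0x000b   // line tabulation
        && U'\b' == 0x0008   // backspace
        && U'\r' == 0x000d   // carriage return
        && U'\f' == 0x000c   // form feed
        && U'\a' == 0x0007   // alert
        && U'\\' == 0x005c   // reverse solidus
        && U'\?' == 0x003f   // question mark
        && U'\'' == 0x0027   // apostrophe
        && U'\"' == 0x0022;  // quotation mark
}

static_assert(simple_escapes_match_table(), "escapes agree with the table");
```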