5 Lexical conventions [lex]

5.13 Literals [lex.literal]

5.13.1 Kinds of literals [lex.literal.kinds]

There are several kinds of literals.14)
[Note 1:
When appearing as an expression, a literal has a type and a value category ([expr.prim.literal]).
— end note]
14) The term “literal” generally designates, in this document, those tokens that are called “constants” in ISO C.

5.13.2 Integer literals [lex.icon]

binary-digit: one of
0 1
octal-digit: one of
0 1 2 3 4 5 6 7
nonzero-digit: one of
1 2 3 4 5 6 7 8 9
hexadecimal-prefix: one of
0x 0X
hexadecimal-digit: one of
0 1 2 3 4 5 6 7 8 9
a b c d e f
A B C D E F
unsigned-suffix: one of
u U
long-suffix: one of
l L
long-long-suffix: one of
ll LL
size-suffix: one of
z Z
In an integer-literal, the sequence of binary-digits, octal-digits, digits, or hexadecimal-digits is interpreted as a base N integer as shown in Table 7; the lexically first digit of the sequence of digits is the most significant.
[Note 1:
The prefix and any optional separating single quotes are ignored when determining the value.
— end note]
Table 7: Base of integer-literals[tab:lex.icon.base]
Kind of integer-literal    base N
binary-literal             2
octal-literal              8
decimal-literal            10
hexadecimal-literal        16
The hexadecimal-digits a through f and A through F have decimal values ten through fifteen.
[Example 1:
The number twelve can be written 12, 014, 0XC, or 0b1100.
The integer-literals 1048576, 1'048'576, 0X100000, 0x10'0000, and 0'004'000'000 all have the same value.
— end example]
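The equivalences in Example 1 can be checked at compile time. The following is a minimal sketch, assuming a compiler that accepts binary literals and digit separators (C++14 or later); the helper name `same_value_spellings` is hypothetical and exists only so the result can also be queried at run time.

```cpp
#include <cassert>

// Twelve written as a decimal, octal, hexadecimal, and binary literal.
static_assert(12 == 014 && 12 == 0XC && 12 == 0b1100,
              "all four spellings denote twelve");

// The prefix and the separating single quotes do not affect the value.
static_assert(1048576 == 1'048'576, "decimal with separators");
static_assert(1048576 == 0X100000 && 1048576 == 0x10'0000, "hexadecimal");
static_assert(1048576 == 0'004'000'000, "octal: 4 * 8^6");

// Hypothetical helper: compares the hexadecimal and octal spellings.
constexpr bool same_value_spellings() {
    return 0x10'0000 == 0'004'000'000;
}
```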
The type of an integer-literal is the first type in the list in Table 8 corresponding to its optional integer-suffix in which its value can be represented.
Table 8: Types of integer-literals[tab:lex.icon.type]
integer-suffix             decimal-literal                    integer-literal other than decimal-literal
none                       int                                int
                           long int                           unsigned int
                           long long int                      long int
                                                              unsigned long int
                                                              long long int
                                                              unsigned long long int
u or U                     unsigned int                       unsigned int
                           unsigned long int                  unsigned long int
                           unsigned long long int             unsigned long long int
l or L                     long int                           long int
                           long long int                      unsigned long int
                                                              long long int
                                                              unsigned long long int
Both u or U and l or L     unsigned long int                  unsigned long int
                           unsigned long long int             unsigned long long int
ll or LL                   long long int                      long long int
                                                              unsigned long long int
Both u or U and ll or LL   unsigned long long int             unsigned long long int
z or Z                     the signed integer type            the signed integer type
                           corresponding to std::size_t       corresponding to std::size_t
                           ([support.types.layout])           std::size_t
Both u or U and z or Z     std::size_t                        std::size_t
Except for integer-literals containing a size-suffix, if the value of an integer-literal cannot be represented by any type in its list and an extended integer type ([basic.fundamental]) can represent its value, it may have that extended integer type.
If all of the types in the list for the integer-literal are signed, the extended integer type is signed.
If all of the types in the list for the integer-literal are unsigned, the extended integer type is unsigned.
If the list contains both signed and unsigned types, the extended integer type may be signed or unsigned.
If an integer-literal cannot be represented by any of the allowed types, the program is ill-formed.
[Note 2:
An integer-literal with a z or Z suffix is ill-formed if it cannot be represented by std::size_t.
— end note]
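The selection rule of Table 8 can be observed with `decltype`. This is a sketch, not normative text: the two value-dependent checks assume a 32-bit `int` (guarded by a `sizeof` test), the size-suffix checks are compiled only when the implementation defines `__cpp_size_t_suffix`, and the variable names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// With a small value, the suffix alone selects the first type in the list.
static_assert(std::is_same<decltype(42), int>::value, "no suffix");
static_assert(std::is_same<decltype(42u), unsigned int>::value, "u");
static_assert(std::is_same<decltype(42L), long>::value, "L");
static_assert(std::is_same<decltype(42uL), unsigned long>::value, "u and L");
static_assert(std::is_same<decltype(42LL), long long>::value, "LL");
static_assert(std::is_same<decltype(42uLL), unsigned long long>::value, "u and LL");

#ifdef __cpp_size_t_suffix
static_assert(std::is_same<decltype(42uz), std::size_t>::value, "u and z");
static_assert(std::is_same<decltype(42z),
                           std::make_signed<std::size_t>::type>::value, "z");
#endif

// The same value, 2^32 - 1, spelled in two bases. The list for a hexadecimal
// literal contains unsigned types; the list for a decimal literal does not.
constexpr bool hex_is_unsigned = std::is_unsigned<decltype(0xFFFFFFFF)>::value;
constexpr bool dec_is_signed = std::is_signed<decltype(4294967295)>::value;

// Assumption: only meaningful where int is 32 bits wide.
static_assert(sizeof(int) != 4 || (hex_is_unsigned && dec_is_signed),
              "hexadecimal picks unsigned int, decimal picks a wider signed type");
```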

5.13.3 Character literals [lex.ccon]

encoding-prefix: one of
u8  u  U  L
basic-c-char:
any member of the translation character set except the U+0027 apostrophe,
U+005c reverse solidus, or new-line character
simple-escape-sequence-char: one of
' " ? \ a b f n r t v
conditional-escape-sequence-char:
any member of the basic character set that is not an octal-digit, a simple-escape-sequence-char, or the characters N, o, u, U, or x
A multicharacter literal is a character-literal whose c-char-sequence consists of more than one c-char.
A multicharacter literal shall not have an encoding-prefix.
If a multicharacter literal contains a c-char that is not encodable as a single code unit in the ordinary literal encoding, the program is ill-formed.
Multicharacter literals are conditionally-supported.
The kind of a character-literal, its type, and its associated character encoding ([lex.charset]) are determined by its encoding-prefix and its c-char-sequence as defined by Table 9.
Table 9: Character literals [tab:lex.ccon.literal]
Encoding prefix   Kind                         Type       Associated character encoding   Example
none              ordinary character literal   char       ordinary literal encoding       'v'
none              multicharacter literal       int        ordinary literal encoding       'abcd'
L                 wide character literal       wchar_t    wide literal encoding           L'w'
u8                UTF-8 character literal      char8_t    UTF-8                           u8'x'
u                 UTF-16 character literal     char16_t   UTF-16                          u'y'
U                 UTF-32 character literal     char32_t   UTF-32                          U'z'
In translation phase 4, the value of a character-literal is determined using the range of representable values of the character-literal's type in translation phase 7.
A multicharacter literal has an implementation-defined value.
The value of any other kind of character-literal is determined as follows:
The character specified by a simple-escape-sequence is specified in Table 10.
[Note 1:
Using an escape sequence for a question mark is supported for compatibility with ISO C++ 2014 and ISO C.
— end note]
Table 10: Simple escape sequences [tab:lex.ccon.esc]
character                     simple-escape-sequence
U+000a line feed              \n
U+0009 character tabulation   \t
U+000b line tabulation        \v
U+0008 backspace              \b
U+000d carriage return        \r
U+000c form feed              \f
U+0007 alert                  \a
U+005c reverse solidus        \\
U+003f question mark          \?
U+0027 apostrophe             \'
U+0022 quotation mark         \"
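The types in Table 9 and the characters in Table 10 can be spot-checked as follows. This is a sketch under stated assumptions: the `char8_t` check is compiled only when `__cpp_char8_t` is defined, and the escape-sequence values are tested through UTF-16 and UTF-32 literals because the values of ordinary character literals depend on the ordinary literal encoding. The helper `line_feed` is hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// Type determined by the encoding-prefix.
static_assert(std::is_same<decltype('v'), char>::value, "ordinary");
static_assert(std::is_same<decltype(L'w'), wchar_t>::value, "wide");
static_assert(std::is_same<decltype(u'y'), char16_t>::value, "UTF-16");
static_assert(std::is_same<decltype(U'z'), char32_t>::value, "UTF-32");
#ifdef __cpp_char8_t
static_assert(std::is_same<decltype(u8'x'), char8_t>::value, "UTF-8");
#endif

// Simple escape sequences denote the characters listed in Table 10.
static_assert(u'\n' == 0x000a && u'\t' == 0x0009, "line feed, tabulation");
static_assert(U'\\' == 0x005c && U'\'' == 0x0027, "reverse solidus, apostrophe");
static_assert(U'\"' == 0x0022 && U'\?' == 0x003f, "quotation mark, question mark");

// Hypothetical helper returning the code point denoted by \n.
constexpr char32_t line_feed() { return U'\n'; }
```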

5.13.4 Floating-point literals [lex.fcon]

sign: one of
+ -
floating-point-suffix: one of
f l f16 f32 f64 f128 bf16 F L F16 F32 F64 F128 BF16
The type of a floating-point-literal ([basic.fundamental], [basic.extended.fp]) is determined by its floating-point-suffix as specified in Table 11.
[Note 1:
The floating-point suffixes f16, f32, f64, f128, bf16, F16, F32, F64, F128, and BF16 are conditionally-supported.
— end note]
Table 11: Types of floating-point-literals[tab:lex.fcon.type]
floating-point-suffix   type
none                    double
f or F                  float
l or L                  long double
f16 or F16              std::float16_t
f32 or F32              std::float32_t
f64 or F64              std::float64_t
f128 or F128            std::float128_t
bf16 or BF16            std::bfloat16_t
In the significand, the sequence of digits or hexadecimal-digits and optional period are interpreted as a base N real number s, where N is 10 for a decimal-floating-point-literal and 16 for a hexadecimal-floating-point-literal.
[Note 2:
Any optional separating single quotes are ignored when determining the value.
— end note]
If an exponent-part or binary-exponent-part is present, the exponent e of the floating-point-literal is the result of interpreting the sequence of an optional sign and the digits as a base 10 integer.
Otherwise, the exponent e is 0.
The scaled value of the literal is $s \times 10^{e}$ for a decimal-floating-point-literal and $s \times 2^{e}$ for a hexadecimal-floating-point-literal.
[Example 1:
The floating-point-literals 49.625 and 0xC.68p+2 have the same value.
The floating-point-literals 1.602'176'565e-19 and 1.602176565e-19 have the same value.
— end example]
If the scaled value is not in the range of representable values for its type, the program is ill-formed.
Otherwise, the value of a floating-point-literal is the scaled value if representable, else the larger or smaller representable value nearest the scaled value, chosen in an implementation-defined manner.
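As a worked instance of the scaled value, 0xC.68p+2 has significand s = 12 + 6/16 + 8/256 = 12.40625 and exponent e = 2, so its scaled value is 12.40625 × 2² = 49.625. The sketch below checks this and the suffix-to-type mapping for the three standard suffixes; it assumes a compiler that accepts hexadecimal floating-point literals (C++17 or later), and the helper `scaled_hex` is hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// Type determined by the floating-point-suffix (first three rows of Table 11).
static_assert(std::is_same<decltype(1.5), double>::value, "no suffix");
static_assert(std::is_same<decltype(1.5f), float>::value, "f");
static_assert(std::is_same<decltype(1.5L), long double>::value, "L");

// Hypothetical helper: s * 2^e computed by hand for 0xC.68p+2.
constexpr double scaled_hex() { return 12.40625 * 4.0; }

// Both values are exactly representable, so the comparison is exact.
static_assert(0xC.68p+2 == 49.625, "hexadecimal and decimal spellings agree");
static_assert(scaled_hex() == 49.625, "manual scaling agrees");

// Separating single quotes are ignored when determining the value.
static_assert(1.602'176'565e-19 == 1.602176565e-19, "digit separators");
```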

5.13.5 String literals [lex.string]

basic-s-char:
any member of the translation character set except the U+0022 quotation mark,
U+005c reverse solidus, or new-line character
r-char:
any member of the translation character set, except a U+0029 right parenthesis followed by
the initial d-char-sequence (which may be empty) followed by a U+0022 quotation mark
d-char:
any member of the basic character set except:
U+0020 space, U+0028 left parenthesis, U+0029 right parenthesis, U+005c reverse solidus,
U+0009 character tabulation, U+000b line tabulation, U+000c form feed, and new-line
The kind of a string-literal, its type, and its associated character encoding ([lex.charset]) are determined by its encoding prefix and sequence of s-chars or r-chars as defined by Table 12 where n is the number of encoded code units as described below.
Table 12: String literals [tab:lex.string.literal]
Encoding prefix   Kind                      Type                          Associated character encoding   Examples
none              ordinary string literal   array of n const char         ordinary literal encoding       "ordinary string"
                                                                                                          R"(ordinary raw string)"
L                 wide string literal       array of n const wchar_t      wide literal encoding           L"wide string"
                                                                                                          LR"w(wide raw string)w"
u8                UTF-8 string literal      array of n const char8_t      UTF-8                           u8"UTF-8 string"
                                                                                                          u8R"x(UTF-8 raw string)x"
u                 UTF-16 string literal     array of n const char16_t     UTF-16                          u"UTF-16 string"
                                                                                                          uR"y(UTF-16 raw string)y"
U                 UTF-32 string literal     array of n const char32_t     UTF-32                          U"UTF-32 string"
                                                                                                          UR"z(UTF-32 raw string)z"
A string-literal that has an R in the prefix is a raw string literal.
The d-char-sequence serves as a delimiter.
The terminating d-char-sequence of a raw-string is the same sequence of characters as the initial d-char-sequence.
A d-char-sequence shall consist of at most 16 characters.
[Note 1:
The characters '(' and ')' can appear in a raw-string.
Thus, R"delimiter((a|b))delimiter" is equivalent to "(a|b)".
— end note]
[Note 2:
A source-file new-line in a raw string literal results in a new-line in the resulting execution string literal.
Assuming no whitespace at the beginning of lines in the following example, the assert will succeed:
const char* p = R"(a\
b
c)";
assert(std::strcmp(p, "a\\\nb\nc") == 0);
— end note]
[Example 1:
The raw string
R"a(
)\
a"
)a"
is equivalent to "\n)\\\na\"\n".
The raw string R"(x = "\"y\"")" is equivalent to "x = \"\\\"y\\\"\"".
— end example]
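The raw-string equivalences stated in the notes and example above can be checked with `std::strcmp`. This is a minimal sketch; the three function names are hypothetical, and the lines inside the multi-line raw string deliberately start in the first column so that no indentation becomes part of the literal.

```cpp
#include <cassert>
#include <cstring>

// The delimiter lets unbalanced-looking parentheses appear in the body.
inline bool delimiter_example() {
    return std::strcmp(R"delimiter((a|b))delimiter", "(a|b)") == 0;
}

// Backslashes and quotation marks are taken literally in a raw string.
inline bool quotes_example() {
    return std::strcmp(R"(x = "\"y\"")", "x = \"\\\"y\\\"\"") == 0;
}

// A source-file new-line becomes a new-line in the string; the backslash
// before it is kept and is not treated as a line splice.
inline bool newline_example() {
    const char* p = R"(a\
b
c)";
    return std::strcmp(p, "a\\\nb\nc") == 0;
}
```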
Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals.
The common encoding-prefix for a sequence of adjacent string-literals is determined pairwise as follows: If two string-literals have the same encoding-prefix, the common encoding-prefix is that encoding-prefix.
If one string-literal has no encoding-prefix, the common encoding-prefix is that of the other string-literal.
Any other combinations are ill-formed.
[Note 3:
A string-literal's rawness has no effect on the determination of the common encoding-prefix.
— end note]
In translation phase 6 ([lex.phases]), adjacent string-literals are concatenated.
The lexical structure and grouping of the contents of the individual string-literals is retained.
[Example 2:
"\xA" "B" represents the code unit '\xA' and the character 'B' after concatenation (and not the single code unit '\xAB').
Similarly, R"(\u00)" "41" represents six characters, starting with a backslash and ending with the digit 1 (and not the single character 'A' specified by a universal-character-name).
Table 13 has some examples of valid concatenations.
— end example]
Table 13: String literal concatenations [tab:lex.string.concat]
Source        Means    Source        Means    Source        Means
u"a" u"b"     u"ab"    U"a" U"b"     U"ab"    L"a" L"b"     L"ab"
u"a" "b"      u"ab"    U"a" "b"      U"ab"    L"a" "b"      L"ab"
"a" u"b"      u"ab"    "a" U"b"      U"ab"    "a" L"b"      L"ab"
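The concatenation rules can be illustrated with a few compile-time checks. This is a sketch rather than an exhaustive test: it covers the "\xA" "B" case from Example 2, some rows of Table 13, and the statement that rawness does not affect the common encoding-prefix. The array name `joined` is hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// Escape sequences are interpreted per literal, before concatenation:
// the result holds the code unit 0x0A, then 'B', then the terminating null.
constexpr char joined[] = "\xA" "B";
static_assert(sizeof(joined) == 3, "two code units plus the terminator");
static_assert(joined[0] == '\xA' && joined[1] == 'B' && joined[2] == '\0',
              "not the single code unit \\xAB");

// An unprefixed literal adopts the encoding-prefix of its neighbour.
static_assert(std::is_same<decltype(u"a" "b"), const char16_t (&)[3]>::value,
              "u\"a\" \"b\" means u\"ab\"");
static_assert(std::is_same<decltype("a" U"b"), const char32_t (&)[3]>::value,
              "\"a\" U\"b\" means U\"ab\"");
static_assert(std::is_same<decltype(L"a" L"b"), const wchar_t (&)[3]>::value,
              "L\"a\" L\"b\" means L\"ab\"");

// Rawness does not influence the common encoding-prefix.
static_assert(std::is_same<decltype(R"(a)" u"b"), const char16_t (&)[3]>::value,
              "raw ordinary literal followed by a UTF-16 literal");
```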
Evaluating a string-literal results in a string literal object with static storage duration ([basic.stc]).
[Note 4:
String literal objects are potentially non-unique ([intro.object]).
Whether successive evaluations of a string-literal yield the same or a different object is unspecified.
— end note]
[Note 5:
The effect of attempting to modify a string literal object is undefined.
— end note]
String literal objects are initialized with the sequence of code unit values corresponding to the string-literal's sequence of s-chars (originally from non-raw string literals) and r-chars (originally from raw string literals), plus a terminating U+0000 null character, in order as follows:
• The sequence of characters denoted by each contiguous sequence of basic-s-chars, r-chars, simple-escape-sequences ([lex.ccon]), and universal-character-names ([lex.charset]) is encoded to a code unit sequence using the string-literal's associated character encoding.
If a character lacks representation in the associated character encoding, then the program is ill-formed.
[Note 6:
No character lacks representation in any Unicode encoding form.
— end note]
When encoding a stateful character encoding, implementations should encode the first such sequence beginning with the initial encoding state and encode subsequent sequences beginning with the final encoding state of the prior sequence.
[Note 7:
The encoded code unit sequence can differ from the sequence of code units that would be obtained by encoding each character independently.
— end note]
• Each numeric-escape-sequence ([lex.ccon]) contributes a single code unit with a value as follows:
When encoding a stateful character encoding, these sequences should have no effect on encoding state.
• Each conditional-escape-sequence ([lex.ccon]) contributes an implementation-defined code unit sequence.
When encoding a stateful character encoding, it is implementation-defined what effect these sequences have on encoding state.
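The array bound n counts encoded code units plus the terminating null character, not source characters. The sketch below shows this with universal-character-names, whose encodings in the Unicode encoding forms are fixed; the constant `utf16_units` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>

// Three basic characters plus the terminating null character.
static_assert(sizeof("abc") == 4, "n = 4");

// U+00E9 needs two UTF-8 code units; with the terminator, n = 3.
static_assert(sizeof(u8"\u00e9") == 3, "two code units plus terminator");

// U+1F600 is a surrogate pair in UTF-16 and one code unit in UTF-32.
static_assert(sizeof(u"\U0001F600") == 3 * sizeof(char16_t), "n = 3");
static_assert(sizeof(U"\U0001F600") == 2 * sizeof(char32_t), "n = 2");

// Each numeric-escape-sequence contributes exactly one code unit.
static_assert(sizeof("\x41\x42") == 3, "two code units plus terminator");

// Hypothetical constant: UTF-16 code units for U+1F600, terminator excluded.
constexpr std::size_t utf16_units =
    sizeof(u"\U0001F600") / sizeof(char16_t) - 1;
```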

5.13.6 Unevaluated strings [lex.string.uneval]

Each universal-character-name and each simple-escape-sequence in an unevaluated-string is replaced by the member of the translation character set it denotes.
An unevaluated-string is never evaluated and its interpretation depends on the context in which it appears.

5.13.7 Boolean literals [lex.bool]

boolean-literal:
false
true
The Boolean literals are the keywords false and true.
Such literals have type bool.

5.13.8 Pointer literals [lex.nullptr]

The pointer literal is the keyword nullptr.
It has type std::nullptr_t.
[Note 1:
std::nullptr_t is a distinct type that is neither a pointer type nor a pointer-to-member type; rather, a prvalue of this type is a null pointer constant and can be converted to a null pointer value or null member pointer value.
— end note]
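The properties described in Note 1 can be verified with standard type traits. This is a small sketch; the struct `Widget` and the two constants are hypothetical and serve only as conversion targets.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// nullptr has type std::nullptr_t, which is neither kind of pointer type.
static_assert(std::is_same<decltype(nullptr), std::nullptr_t>::value, "type");
static_assert(!std::is_pointer<std::nullptr_t>::value, "not a pointer type");
static_assert(!std::is_member_pointer<std::nullptr_t>::value,
              "not a pointer-to-member type");

struct Widget { int value; };  // hypothetical type for illustration

// As a null pointer constant, nullptr converts to both kinds of null value.
constexpr int* null_object_pointer = nullptr;
constexpr int Widget::* null_member_pointer = nullptr;
static_assert(null_object_pointer == nullptr, "null pointer value");
static_assert(null_member_pointer == nullptr, "null member pointer value");
```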

5.13.9 User-defined literals [lex.ext]

If a token matches both user-defined-literal and another literal kind, it is treated as the latter.
[Example 1:
123_km is a user-defined-literal, but 12LL is an integer-literal.
— end example]
The syntactic non-terminal preceding the ud-suffix in a user-defined-literal is taken to be the longest sequence of characters that could match that non-terminal.
A user-defined-literal is treated as a call to a literal operator or literal operator template ([over.literal]).
To determine the form of this call for a given user-defined-literal L with ud-suffix X, first let S be the set of declarations found by unqualified lookup for the literal-operator-id whose literal suffix identifier is X ([basic.lookup.unqual]).
S shall not be empty.
If L is a user-defined-integer-literal, let n be the literal without its ud-suffix.
If S contains a literal operator with parameter type unsigned long long, the literal L is treated as a call of the form operator ""X(nULL)
Otherwise, S shall contain a raw literal operator or a numeric literal operator template ([over.literal]) but not both.
If S contains a raw literal operator, the literal L is treated as a call of the form operator ""X("n")
Otherwise (S contains a numeric literal operator template), L is treated as a call of the form operator ""X<'c1', 'c2', ... 'ck'>() where n is the source character sequence c1c2...ck.
[Note 1:
The sequence can only contain characters from the basic character set.
— end note]
If L is a user-defined-floating-point-literal, let f be the literal without its ud-suffix.
If S contains a literal operator with parameter type long double, the literal L is treated as a call of the form operator ""X(fL)
Otherwise, S shall contain a raw literal operator or a numeric literal operator template ([over.literal]) but not both.
If S contains a raw literal operator, the literal L is treated as a call of the form operator ""X("f")
Otherwise (S contains a numeric literal operator template), L is treated as a call of the form operator ""X<'c1', 'c2', ... 'ck'>() where f is the source character sequence c1c2...ck.
[Note 2:
The sequence can only contain characters from the basic character set.
— end note]
If L is a user-defined-string-literal, let str be the literal without its ud-suffix and let len be the number of code units in str (i.e., its length excluding the terminating null character).
If S contains a literal operator template with a non-type template parameter for which str is a well-formed template-argument, the literal L is treated as a call of the form operator ""X<str>()
Otherwise, the literal L is treated as a call of the form operator ""X(str, len)
If L is a user-defined-character-literal, let ch be the literal without its ud-suffix.
S shall contain a literal operator whose only parameter has the type of ch and the literal L is treated as a call of the form operator ""X(ch)
[Example 2:
long double operator ""_w(long double);
std::string operator ""_w(const char16_t*, std::size_t);
unsigned operator ""_w(const char*);
int main() {
  1.2_w;      // calls operator ""_w(1.2L)
  u"one"_w;   // calls operator ""_w(u"one", 3)
  12_w;       // calls operator ""_w("12")
  "two"_w;    // error: no applicable literal operator
}
— end example]
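Each call form described above can be exercised with a compilable literal operator. The following is a sketch, assuming C++14 `constexpr` rules; all suffixes (`_k`, `_len`, `_raw`, `_chars`, `_code`) are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Cooked forms: 12_k calls operator""_k(12ULL), 1.5_k calls operator""_k(1.5L).
constexpr unsigned long long operator""_k(unsigned long long n) { return n * 1000; }
constexpr long double operator""_k(long double f) { return f * 1000; }

// String form: "two"_len calls operator""_len("two", 3).
constexpr std::size_t operator""_len(const char*, std::size_t len) { return len; }

// Raw literal operator: 12_raw calls operator""_raw("12").
constexpr std::size_t operator""_raw(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') ++n;  // count the characters of the spelling
    return n;
}

// Numeric literal operator template: 0x1F_chars calls
// operator""_chars<'0', 'x', '1', 'F'>().
template <char... Cs>
constexpr std::size_t operator""_chars() { return sizeof...(Cs); }

// Character form: U'A'_code calls operator""_code(U'A').
constexpr int operator""_code(char32_t ch) { return static_cast<int>(ch); }

static_assert(12_k == 12000, "integer literal, cooked");
static_assert(1.5_k == 1500.0L, "floating-point literal, cooked");
static_assert("two"_len == 3, "length excludes the terminating null");
static_assert(12_raw == 2, "raw operator sees the spelling");
static_assert(0x1F_chars == 4, "template sees each source character");
static_assert(U'A'_code == 0x41, "character literal");
```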
In translation phase 6 ([lex.phases]), adjacent string-literals are concatenated and user-defined-string-literals are considered string-literals for that purpose.
During concatenation, ud-suffixes are removed and ignored and the concatenation process occurs as described in [lex.string].
At the end of phase 6, if a string-literal is the result of a concatenation involving at least one user-defined-string-literal, all the participating user-defined-string-literals shall have the same ud-suffix and that suffix is applied to the result of the concatenation.
[Example 3:
int main() {
  L"A" "B" "C"_x;   // OK, same as L"ABC"_x
  "P"_x "Q" "R"_y;  // error: two different ud-suffixes
}
— end example]
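The well-formed half of the concatenation rule can be demonstrated with a length-returning operator. This is a sketch with a hypothetical suffix `_len`, overloaded for ordinary and wide strings; the ill-formed case with two different ud-suffixes is intentionally omitted because it would not compile.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical literal operators that return the number of code units.
constexpr std::size_t operator""_len(const char*, std::size_t len) { return len; }
constexpr std::size_t operator""_len(const wchar_t*, std::size_t len) { return len; }

// The suffix is applied to the result of the concatenation.
static_assert("ab" "cd"_len == 4, "same as \"abcd\"_len");

// Several participating literals may carry the suffix if it is the same one.
static_assert("ab"_len "cd"_len == 4, "same as \"abcd\"_len");

// The common encoding-prefix is determined first, then the suffix applies.
static_assert(L"A" "B" "C"_len == 3, "same as L\"ABC\"_len");
```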