identifier:
identifier-nondigit
identifier identifier-nondigit
identifier digit
identifier-nondigit:
nondigit
universal-character-name
nondigit: one of
a b c d e f g h i j k l m
n o p q r s t u v w x y z
A B C D E F G H I J K L M
N O P Q R S T U V W X Y Z _
digit: one of
0 1 2 3 4 5 6 7 8 9
An identifier is an arbitrarily long sequence of letters and digits. Each
universal-character-name in an identifier shall designate a character whose
encoding in ISO/IEC 10646 falls into one of the ranges specified in Table 2.
The initial element shall not be a universal-character-name designating a
character whose encoding falls into one of the ranges specified in Table 3.
Upper- and lower-case letters are different. All characters are significant.
Table 2: Ranges of characters allowed [tab:lex.name.allowed]
00A8 | 00AA | 00AD | 00AF | 00B2-00B5 |
00B7-00BA | 00BC-00BE | 00C0-00D6 | 00D8-00F6 | 00F8-00FF |
0100-167F | 1681-180D | 180F-1FFF | | |
200B-200D | 202A-202E | 203F-2040 | 2054 | 2060-206F |
2070-218F | 2460-24FF | 2776-2793 | 2C00-2DFF | 2E80-2FFF |
3004-3007 | 3021-302F | 3031-D7FF | | |
F900-FD3D | FD40-FDCF | FDF0-FE44 | FE47-FFFD | |
10000-1FFFD | 20000-2FFFD | 30000-3FFFD | 40000-4FFFD | 50000-5FFFD |
60000-6FFFD | 70000-7FFFD | 80000-8FFFD | 90000-9FFFD | A0000-AFFFD |
B0000-BFFFD | C0000-CFFFD | D0000-DFFFD | E0000-EFFFD | |
Table 3: Ranges of characters disallowed initially (combining characters) [tab:lex.name.disallowed]
0300-036F | 1DC0-1DFF | 20D0-20FF | FE20-FE2F |
The identifiers in Table 4 have a special meaning when appearing in a certain
context. When referred to in the grammar, these identifiers are used explicitly
rather than using the identifier grammar production. Unless otherwise specified,
any ambiguity as to whether a given identifier has a special meaning is resolved
to interpret the token as a regular identifier.
Table 4: Identifiers with special meaning [tab:lex.name.special]
In addition, some identifiers are reserved for use by C++ implementations and
shall not be used otherwise; no diagnostic is required.
Each identifier that contains a double underscore __ or begins with an
underscore followed by an uppercase letter is reserved to the implementation
for any use.
Each identifier that begins with an underscore is reserved to the
implementation for use as a name in the global namespace.
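The identifier rules above can be illustrated with a short sketch. The namespace demo and every name in it are hypothetical, chosen only for illustration, and a C++14 or later compiler is assumed.

```cpp
#include <cassert>

namespace demo {

// Upper- and lower-case letters are different: three distinct names.
constexpr int total = 1;
constexpr int Total = 2;
constexpr int TOTAL = 3;

// All characters are significant, so names that share a long prefix and
// differ only in the last character still name different entities.
constexpr int a_rather_long_identifier_ending_in_1 = 10;
constexpr int a_rather_long_identifier_ending_in_2 = 20;

// Digits may appear anywhere except as the first character.
constexpr int x2y3 = 5;

// Spellings that user code should avoid because they are reserved:
//   my__name   contains a double underscore
//   _Helper    underscore followed by an uppercase letter
//   _counter   leading underscore, when declared in the global namespace
// A trailing underscore is one unreserved alternative.
constexpr int counter_ = 7;

constexpr int sum_of_case_variants() { return total + Total + TOTAL; }

static_assert(sum_of_case_variants() == 6, "three distinct identifiers");

}  // namespace demo
```

Because no diagnostic is required for a reserved name, a compiler may accept such a name silently; the comments list the patterns rather than declaring them.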
integer-literal:
binary-literal integer-suffix opt
octal-literal integer-suffix opt
decimal-literal integer-suffix opt
hexadecimal-literal integer-suffix opt
binary-literal:
0b binary-digit
0B binary-digit
binary-literal 'opt binary-digit
octal-literal:
0
octal-literal 'opt octal-digit
decimal-literal:
nonzero-digit
decimal-literal 'opt digit
hexadecimal-literal:
hexadecimal-prefix hexadecimal-digit-sequence
binary-digit: one of
0 1
octal-digit: one of
0 1 2 3 4 5 6 7
nonzero-digit: one of
1 2 3 4 5 6 7 8 9
hexadecimal-prefix: one of
0x 0X
hexadecimal-digit-sequence:
hexadecimal-digit
hexadecimal-digit-sequence 'opt hexadecimal-digit
hexadecimal-digit: one of
0 1 2 3 4 5 6 7 8 9
a b c d e f
A B C D E F
integer-suffix:
unsigned-suffix long-suffix opt
unsigned-suffix long-long-suffix opt
long-suffix unsigned-suffix opt
long-long-suffix unsigned-suffix opt
unsigned-suffix: one of
u U
long-suffix: one of
l L
long-long-suffix: one of
ll LL
[ Note: The prefix and any optional separating single quotes are ignored when
determining the value. — end note ]
Table 7: Base of integer-literals [tab:lex.icon.base]
Kind of integer-literal | base N |
binary-literal | 2 |
octal-literal | 8 |
decimal-literal | 10 |
hexadecimal-literal | 16 |
The hexadecimal-digits a through f and A through F have decimal values ten
through fifteen. [ Example: The number twelve can be written 12, 014, 0XC, or
0b1100. The integer-literals 1048576, 1'048'576, 0X100000, 0x10'0000, and
0'004'000'000 all have the same value. — end example ]
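The equalities stated in the example can be checked at compile time. This is a minimal sketch assuming a C++14 or later compiler (binary literals and digit separators); the namespace demo and the helper functions are hypothetical.

```cpp
#include <cassert>

namespace demo {

// The number twelve written with each of the four bases.
static_assert(12 == 014, "octal-literal");
static_assert(12 == 0XC, "hexadecimal-literal");
static_assert(12 == 0b1100, "binary-literal");

// Separating single quotes are ignored when the value is determined.
static_assert(1048576 == 1'048'576, "decimal with separators");
static_assert(1048576 == 0X100000, "hexadecimal");
static_assert(1048576 == 0x10'0000, "hexadecimal with separator");
static_assert(1048576 == 0'004'000'000, "octal with separators");

// The hexadecimal-digits a-f and A-F stand for ten through fifteen.
static_assert(0xa == 10 && 0xf == 15 && 0XA == 10 && 0XF == 15, "hex digits");

// Hypothetical helpers so the values can also be inspected at run time.
constexpr long twelve_in_octal() { return 014; }
constexpr long two_to_the_twentieth() { return 0'004'000'000; }

}  // namespace demo
```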
The type of an integer-literal is the first type in the list in Table 8
corresponding to its optional integer-suffix in which its value can be
represented.
Table 8: Types of integer-literals [tab:lex.icon.type]
integer-suffix | decimal-literal | integer-literal other than decimal-literal |
none | int | int |
| long int | unsigned int |
| long long int | long int |
| | unsigned long int |
| | long long int |
| | unsigned long long int |
u or U | unsigned int | unsigned int |
| unsigned long int | unsigned long int |
| unsigned long long int | unsigned long long int |
l or L | long int | long int |
| long long int | unsigned long int |
| | long long int |
| | unsigned long long int |
Both u or U | unsigned long int | unsigned long int |
and l or L | unsigned long long int | unsigned long long int |
ll or LL | long long int | long long int |
| | unsigned long long int |
Both u or U | unsigned long long int | unsigned long long int |
and ll or LL | | |
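The table can be spot-checked with decltype. The sketch below assumes a C++17 compiler; the namespace demo and the helper literal_is_unsigned are hypothetical. The last two checks only take effect where unsigned int has exactly 32 value bits, which is an assumption about the platform and not something the table requires.

```cpp
#include <cassert>
#include <limits>
#include <type_traits>

namespace demo {

// Small values take the first type listed for their suffix.
static_assert(std::is_same_v<decltype(1), int>);
static_assert(std::is_same_v<decltype(0x1), int>);
static_assert(std::is_same_v<decltype(1u), unsigned int>);
static_assert(std::is_same_v<decltype(1L), long int>);
static_assert(std::is_same_v<decltype(1uL), unsigned long int>);
static_assert(std::is_same_v<decltype(1LL), long long int>);
static_assert(std::is_same_v<decltype(1ULL), unsigned long long int>);

// Assumption: checked only when unsigned int has exactly 32 value bits.
constexpr bool uint_has_32_bits = std::numeric_limits<unsigned int>::digits == 32;

// A hexadecimal literal too large for int may become unsigned int...
static_assert(!uint_has_32_bits ||
              std::is_same_v<decltype(0xFFFFFFFF), unsigned int>);
// ...but an unsuffixed decimal literal only moves through the signed types.
static_assert(!uint_has_32_bits || std::is_signed_v<decltype(4294967295)>);

// Hypothetical helper: reports whether a literal's deduced type is unsigned.
template <class T>
constexpr bool literal_is_unsigned(T) { return std::is_unsigned_v<T>; }

}  // namespace demo
```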
If an integer-literal cannot be represented by any type in its list and an
extended integer type ([basic.fundamental]) can represent its value, it may
have that extended integer type. If all of the types in the list for the
integer-literal are signed, the extended integer type shall be signed. If all
of the types in the list for the integer-literal are unsigned, the extended
integer type shall be unsigned. If the list contains both signed and unsigned
types, the extended integer type may be signed or unsigned. A program is
ill-formed if one of its translation units contains an integer-literal that
cannot be represented by any of the allowed types.
character-literal:
encoding-prefix opt ' c-char-sequence '
encoding-prefix: one of
u8 u U L
c-char-sequence:
c-char
c-char-sequence c-char
c-char:
any member of the basic source character set except the single-quote ', backslash \, or new-line character
escape-sequence
universal-character-name
escape-sequence:
simple-escape-sequence
octal-escape-sequence
hexadecimal-escape-sequence
simple-escape-sequence: one of
\' \" \? \\
\a \b \f \n \r \t \v
octal-escape-sequence:
\ octal-digit
\ octal-digit octal-digit
\ octal-digit octal-digit octal-digit
hexadecimal-escape-sequence:
\x hexadecimal-digit
hexadecimal-escape-sequence hexadecimal-digit
An ordinary character literal that contains a single c-char representable in
the execution character set has type char, with value equal to the numerical
value of the encoding of the c-char in the execution character set. A
multicharacter literal, or an ordinary character literal containing a single
c-char not representable in the execution character set, is
conditionally-supported, has type int, and has an implementation-defined value.
The value of a UTF-8 character literal is equal to its ISO/IEC 10646 code point
value, provided that the code point value can be encoded as a single UTF-8 code
unit. [ Note: That is, provided the code point value is in the range [0,7F]
(hexadecimal). — end note ] If the value is not representable with a single
UTF-8 code unit, the program is ill-formed. A UTF-8 character literal
containing multiple c-chars is ill-formed.
The value of a UTF-16 character literal is equal to its ISO/IEC 10646 code
point value, provided that the code point value is representable with a single
16-bit code unit. [ Note: That is, provided the code point value is in the
range [0,FFFF] (hexadecimal). — end note ] If the value is not representable
with a single 16-bit code unit, the program is ill-formed. A UTF-16 character
literal containing multiple c-chars is ill-formed.
The value of a UTF-32 character literal containing a single c-char is equal to
its ISO/IEC 10646 code point value. A UTF-32 character literal containing
multiple c-chars is ill-formed.
A wide-character literal has type wchar_t. A wide-character literal containing
a single c-char has value equal to the numerical value of the encoding of the
c-char in the execution wide-character set, unless the c-char has no
representation in the execution wide-character set, in which case the value is
implementation-defined. [ Note: The type wchar_t is able to represent all
members of the execution wide-character set (see [basic.fundamental]).
— end note ] The value of a wide-character literal containing multiple c-chars
is implementation-defined.
Certain non-graphic characters, the single quote ', the double quote ", the
question mark ?, and the backslash \, can be represented according to Table 9.
The double quote " and the question mark ? can be represented as themselves or
by the escape sequences \" and \? respectively, but the single quote ' and the
backslash \ shall be represented by the escape sequences \' and \\
respectively. Escape sequences in which the character following the backslash
is not listed in Table 9 are conditionally-supported, with
implementation-defined semantics. An escape sequence specifies a single
character.
Table 9: Escape sequences [tab:lex.ccon.esc]
new-line | NL(LF) | \n |
horizontal tab | HT | \t |
vertical tab | VT | \v |
backspace | BS | \b |
carriage return | CR | \r |
form feed | FF | \f |
alert | BEL | \a |
backslash | \ | \\ |
question mark | ? | \? |
single quote | ' | \' |
double quote | " | \" |
octal number | ooo | \ooo |
hex number | hhh | \xhhh |
The escape \ooo consists of the backslash followed by one, two, or three octal
digits that are taken to specify the value of the desired character. The escape
\xhhh consists of the backslash followed by x followed by one or more
hexadecimal digits that are taken to specify the value of the desired
character. There is no limit to the number of digits in a hexadecimal sequence.
A sequence of octal or hexadecimal digits is terminated by the first character
that is not an octal digit or a hexadecimal digit, respectively. [ Note: If the
value of a character-literal prefixed by u, u8, or U is outside the range
defined for its type, the program is ill-formed. — end note ]
A universal-character-name is translated to the encoding, in the appropriate
execution character set, of the character named. [ Note: In translation phase
1, a universal-character-name is introduced whenever an actual extended
character is encountered in the source text. However, the actual compiler
implementation may use its own native character set, so long as the same
results are obtained. — end note ]
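A few of the character-literal rules can be verified at compile time. The sketch below assumes a C++17 compiler and deliberately leaves out u8 character literals, whose type differs between language revisions. The namespace demo and its helper functions are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

namespace demo {

// The encoding-prefix selects the type of the literal.
static_assert(std::is_same_v<decltype('a'), char>);
static_assert(std::is_same_v<decltype(u'a'), char16_t>);
static_assert(std::is_same_v<decltype(U'a'), char32_t>);
static_assert(std::is_same_v<decltype(L'a'), wchar_t>);

// UTF-16 and UTF-32 character literals have their code point as value.
static_assert(u'\u00E9' == 0x00E9);
static_assert(U'\U0001F600' == 0x1F600);

// Octal and hexadecimal escapes both specify the value directly.
static_assert('\101' == '\x41');
static_assert(u'\101' == 65 && u'\x41' == 65);

// A hexadecimal escape consumes every hexadecimal digit that follows it.
static_assert(U'\x00041' == 0x41);

// Simple escapes from Table 9, checked in UTF-16 where the values are fixed.
static_assert(u'\n' == 0x0A && u'\t' == 0x09 && u'\\' == 0x5C);

// Hypothetical helpers for run-time inspection.
constexpr char32_t grinning_face_code_point() { return U'\U0001F600'; }
constexpr bool octal_and_hex_agree() { return '\101' == '\x41'; }

}  // namespace demo
```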
string-literal:
encoding-prefix opt " s-char-sequence opt "
encoding-prefix opt R raw-string
s-char-sequence:
s-char
s-char-sequence s-char
s-char:
any member of the basic source character set except the double-quote ", backslash \, or new-line character
escape-sequence
universal-character-name
raw-string:
" d-char-sequence opt ( r-char-sequence opt ) d-char-sequence opt "
r-char-sequence:
r-char
r-char-sequence r-char
r-char:
any member of the source character set, except a right parenthesis ) followed by
the initial d-char-sequence (which may be empty) followed by a double quote ".
d-char-sequence:
d-char
d-char-sequence d-char
d-char:
any member of the basic source character set except:
space, the left parenthesis (, the right parenthesis ), the backslash \, and the control characters
representing horizontal tab, vertical tab, form feed, and newline.
[ Note: The characters '(' and ')' are permitted in a raw-string. Thus,
R"delimiter((a|b))delimiter" is equivalent to "(a|b)". — end note ]
[ Note: A source-file new-line in a raw string literal results in a new-line in
the resulting execution string literal. Assuming no whitespace at the beginning
of lines in the following example, the assert will succeed:
const char* p = R"(a\
b
c)";
assert(std::strcmp(p, "a\\\nb\nc") == 0);
— end note ]
[ Example: The raw string
R"a(
)\
a"
)a"
is equivalent to "\n)\\\na\"\n". The raw string
R"(x = "\"y\"")"
is equivalent to "x = \"\\\"y\\\"\"". — end example ]
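The raw-string equivalences from the notes and the example above can be confirmed with std::strcmp. In this sketch the namespace demo and the function names are hypothetical; the continuation lines inside the raw strings intentionally start in the first column, since any leading whitespace would become part of the string.

```cpp
#include <cassert>
#include <cstring>

namespace demo {

// Parentheses are allowed in the body when a delimiter is used.
inline const char* with_delimiter() { return R"delimiter((a|b))delimiter"; }

// A backslash is an ordinary character and a source new-line stays a new-line.
inline const char* multi_line() { return R"(a\
b
c)"; }

// With the delimiter "a", only the sequence )a" terminates the literal.
inline const char* short_delimiter() { return R"a(
)\
a"
)a"; }

// Quotes and backslashes need no escaping inside a raw string.
inline const char* quoted() { return R"(x = "\"y\"")"; }

}  // namespace demo
```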
An ordinary string literal has type “array of n const char” where n is the size
of the string as defined below, has static storage duration ([basic.stc]), and
is initialized with the given characters. A UTF-8 string literal has type
“array of n const char8_t”, where n is the size of the string as defined below;
each successive element of the object representation ([basic.types]) has the
value of the corresponding code unit of the UTF-8 encoding of the string.
Ordinary string literals and UTF-8 string literals are also referred to as
narrow string literals. A UTF-16 string literal has type “array of n const
char16_t”, where n is the size of the string as defined below; each successive
element of the array has the value of the corresponding code unit of the UTF-16
encoding of the string. [ Note: A single c-char may produce more than one
char16_t character in the form of surrogate pairs. A surrogate pair is a
representation for a single code point as a sequence of two 16-bit code units.
— end note ]
A UTF-32 string literal has type “array of n const char32_t”, where n is the
size of the string as defined below; each successive element of the array has
the value of the corresponding code unit of the UTF-32 encoding of the string.
A wide string literal has type “array of n const wchar_t”, where n is the size
of the string as defined below; it is initialized with the given characters.
If a UTF-8 string literal token is adjacent to a wide string literal token, the
program is ill-formed. Any other concatenations are conditionally-supported
with implementation-defined behavior. [ Note: This concatenation is an
interpretation, not a conversion. Because the interpretation happens in
translation phase 6 (after each character from a string-literal has been
translated into a value from the appropriate character set), a
string-literal's initial rawness has no effect on the interpretation or
well-formedness of the concatenation. — end note ] Table 11 has some examples
of valid concatenations.
Table 11: String literal concatenations [tab:lex.string.concat]
Source | | Means | Source | | Means | Source | | Means |
u"a" | u"b" | u"ab" | U"a" | U"b" | U"ab" | L"a" | L"b" | L"ab" |
u"a" | "b" | u"ab" | U"a" | "b" | U"ab" | L"a" | "b" | L"ab" |
"a" | u"b" | u"ab" | "a" | U"b" | U"ab" | "a" | L"b" | L"ab" |
Characters in concatenated strings are kept distinct.
[ Example: "\xA" "B" contains the two characters '\xA' and 'B' after
concatenation (and not the single hexadecimal character '\xAB').
— end example ]
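The array types and a few of the Table 11 concatenations can be checked with decltype. This is a sketch assuming a C++17 compiler; the namespace demo, the variable template has_array_type, and the helper array_length are hypothetical. UTF-8 string literals are left out because their element type differs between language revisions.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

namespace demo {

// A string literal is an lvalue, so decltype yields a reference to the array;
// the reference is removed before comparing with the expected array type.
template <class Literal, class Expected>
inline constexpr bool has_array_type =
    std::is_same_v<std::remove_reference_t<Literal>, Expected>;

static_assert(has_array_type<decltype("ab"), const char[3]>);
static_assert(has_array_type<decltype(u"ab"), const char16_t[3]>);
static_assert(has_array_type<decltype(U"ab"), const char32_t[3]>);
static_assert(has_array_type<decltype(L"ab"), const wchar_t[3]>);

// Concatenations from Table 11: the prefixed piece decides the result.
static_assert(has_array_type<decltype(u"a" "b"), const char16_t[3]>);
static_assert(has_array_type<decltype("a" U"b"), const char32_t[3]>);
static_assert(has_array_type<decltype(L"a" L"b"), const wchar_t[3]>);

// "\xA" "B" keeps two distinct characters (plus the terminating null).
static_assert(sizeof("\xA" "B") == 3);
static_assert(("\xA" "B")[0] == '\xA' && ("\xA" "B")[1] == 'B');

// Hypothetical helper: number of array elements, terminator included.
template <class T, std::size_t N>
constexpr std::size_t array_length(const T (&)[N]) { return N; }

}  // namespace demo
```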
After any necessary concatenation, in translation phase 7 ([lex.phases]), '\0'
is appended to every string-literal so that programs that scan a string can
find its end.
The size of a char32_t or wide string literal is the total number of escape
sequences, universal-character-names, and other characters, plus one for the
terminating U'\0' or L'\0'. The size of a UTF-16 string literal is the total
number of escape sequences, universal-character-names, and other characters,
plus one for each character requiring a surrogate pair, plus one for the
terminating u'\0'. [ Note: The size of a char16_t string literal is the number
of code units, not the number of characters. — end note ]
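The size rules for UTF-32, wide, and UTF-16 string literals can be observed by deducing the array bound. In this sketch, which assumes a C++17 compiler, the namespace demo and the helper size_of_literal are hypothetical; U+1F600 is used as one example of a code point that needs a surrogate pair in UTF-16.

```cpp
#include <cassert>
#include <cstddef>

namespace demo {

// Hypothetical helper: the array bound n of a string literal.
template <class T, std::size_t N>
constexpr std::size_t size_of_literal(const T (&)[N]) { return N; }

// Three characters plus the terminating null.
static_assert(size_of_literal(U"abc") == 4);
static_assert(size_of_literal(L"abc") == 4);

// One escape sequence, one universal-character-name, one other character.
static_assert(size_of_literal(U"\x41\u00E9z") == 4);

// In UTF-32 a code point above FFFF still counts once...
static_assert(size_of_literal(U"\U0001F600") == 2);
// ...while in UTF-16 it adds one extra code unit for the surrogate pair.
static_assert(size_of_literal(u"\U0001F600") == 3);
static_assert(u"\U0001F600"[0] == 0xD83D && u"\U0001F600"[1] == 0xDE00);
static_assert(u"\U0001F600"[2] == u'\0');

}  // namespace demo
```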
The size of a narrow string literal is the total number of escape sequences and
other characters, plus at least one for the multibyte encoding of each
universal-character-name, plus one for the terminating '\0'.
Evaluating a string-literal results in a string literal object with static
storage duration, initialized from the given characters as specified above.
Whether all string-literals are distinct (that is, are stored in nonoverlapping
objects) and whether successive evaluations of a string-literal yield the same
or a different object is unspecified.
boolean-literal:
false
true
The Boolean literals are the keywords false and true. Such literals are
prvalues and have type bool.
pointer-literal:
nullptr
The pointer literal is the keyword nullptr. It is a prvalue of type
std::nullptr_t. [ Note: std::nullptr_t is a distinct type that is neither a
pointer type nor a pointer-to-member type; rather, a prvalue of this type is a
null pointer constant and can be converted to a null pointer value or null
member pointer value. — end note ]
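The statements about the Boolean and pointer literals map directly onto type traits. The following sketch assumes a C++17 compiler; the namespace demo, the struct Widget, and the helper functions are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

namespace demo {

// The Boolean literals have type bool.
static_assert(std::is_same_v<decltype(true), bool>);
static_assert(std::is_same_v<decltype(false), bool>);

// nullptr has type std::nullptr_t, which is neither a pointer type
// nor a pointer-to-member type.
static_assert(std::is_same_v<decltype(nullptr), std::nullptr_t>);
static_assert(!std::is_pointer_v<std::nullptr_t>);
static_assert(!std::is_member_pointer_v<std::nullptr_t>);

struct Widget { int field; };  // hypothetical class for the member pointer

// nullptr converts to a null pointer value...
constexpr bool converts_to_null_pointer() {
    int* p = nullptr;
    return p == nullptr;
}

// ...and to a null member pointer value.
constexpr bool converts_to_null_member_pointer() {
    int Widget::* pm = nullptr;
    return pm == nullptr;
}

static_assert(converts_to_null_pointer());
static_assert(converts_to_null_member_pointer());

}  // namespace demo
```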
user-defined-literal:
user-defined-integer-literal
user-defined-floating-point-literal
user-defined-string-literal
user-defined-character-literal
user-defined-integer-literal:
decimal-literal ud-suffix
octal-literal ud-suffix
hexadecimal-literal ud-suffix
binary-literal ud-suffix
user-defined-floating-point-literal:
fractional-constant exponent-part opt ud-suffix
digit-sequence exponent-part ud-suffix
hexadecimal-prefix hexadecimal-fractional-constant binary-exponent-part ud-suffix
hexadecimal-prefix hexadecimal-digit-sequence binary-exponent-part ud-suffix
user-defined-string-literal:
string-literal ud-suffix
user-defined-character-literal:
character-literal ud-suffix
ud-suffix:
identifier
The syntactic non-terminal preceding the ud-suffix in a user-defined-literal is
taken to be the longest sequence of characters that could match that
non-terminal.
A user-defined-literal is treated as a call to a literal operator or literal
operator template ([over.literal]). To determine the form of this call for a
given user-defined-literal L with ud-suffix X, the literal-operator-id whose
literal suffix identifier is X is looked up in the context of L using the rules
for unqualified name lookup. Let S be the set of declarations found by this
lookup. S shall not be empty.
If L is a user-defined-integer-literal, let n be the literal without its
ud-suffix. If S contains a literal operator with parameter type
unsigned long long, the literal L is treated as a call of the form
operator "" X(nULL)
Otherwise, S shall contain a raw literal operator or a numeric literal operator
template ([over.literal]) but not both. If S contains a raw literal operator,
the literal L is treated as a call of the form
operator "" X("n")
Otherwise (S contains a numeric literal operator template), L is treated as a
call of the form
operator "" X<'c1', 'c2', ... 'ck'>()
where n is the source character sequence c1c2...ck. [ Note: The sequence
c1c2...ck can only contain characters from the basic source character set.
— end note ]
If L is a user-defined-floating-point-literal, let f be the literal without its
ud-suffix. If S contains a literal operator with parameter type long double,
the literal L is treated as a call of the form
operator "" X(fL)
Otherwise, S shall contain a raw literal operator or a numeric literal operator
template ([over.literal]) but not both. If S contains a raw literal operator,
the literal L is treated as a call of the form
operator "" X("f")
Otherwise (S contains a numeric literal operator template), L is treated as a
call of the form
operator "" X<'c1', 'c2', ... 'ck'>()
where f is the source character sequence c1c2...ck. [ Note: The sequence
c1c2...ck can only contain characters from the basic source character set.
— end note ]
If L is a user-defined-string-literal, let str be the literal without its
ud-suffix and let len be the number of code units in str (i.e., its length
excluding the terminating null character). If S contains a literal operator
template with a non-type template parameter for which str is a well-formed
template-argument, the literal L is treated as a call of the form
operator "" X<str>()
Otherwise, the literal L is treated as a call of the form
operator "" X(str, len)
If L is a user-defined-character-literal, let ch be the literal without its
ud-suffix. S shall contain a literal operator whose only parameter has the type
of ch and the literal L is treated as a call of the form
operator "" X(ch)
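[ Example:
long double operator "" _w(long double);
std::string operator "" _w(const char16_t*, std::size_t);
unsigned operator "" _w(const char*);
int main() {
  1.2_w;      // calls operator "" _w(1.2L)
  u"one"_w;   // calls operator "" _w(u"one", 3)
  12_w;       // calls operator "" _w("12")
  "two"_w;    // error: no applicable literal operator
}
— end example ]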
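The call forms described above can be exercised with one literal operator per form. This is a minimal sketch assuming a C++17 compiler; the namespace demo and all of the suffixes (_cooked, _raw, _count, _fp, _len, _ch) are hypothetical and are kept distinct so that each lookup set S contains exactly one declaration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

namespace demo {

// Cooked integer form: 42_cooked is treated as operator""_cooked(42ULL).
constexpr unsigned long long operator""_cooked(unsigned long long n) { return n; }

// Raw form: 0x1F_raw is treated as operator""_raw("0x1F").
inline std::size_t operator""_raw(const char* spelling) {
    return std::strlen(spelling);
}

// Numeric literal operator template: 123_count is treated as
// operator""_count<'1', '2', '3'>().
template <char... Cs>
constexpr std::size_t operator""_count() { return sizeof...(Cs); }

// Cooked floating-point form: 1.5_fp is treated as operator""_fp(1.5L).
constexpr long double operator""_fp(long double f) { return f; }

// String form: "abc"_len is treated as operator""_len("abc", 3).
constexpr std::size_t operator""_len(const char*, std::size_t len) { return len; }

// Character form: 'x'_ch is treated as operator""_ch('x').
constexpr char operator""_ch(char ch) { return ch; }

static_assert(42_cooked == 42ULL);
static_assert(123_count == 3);
static_assert(1.5_fp == 1.5L);
static_assert("abc"_len == 3);
static_assert('x'_ch == 'x');

}  // namespace demo
```

The raw operator receives the spelling of the literal, prefix included, which is why 0x1F_raw yields a length of four characters in this sketch.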
During concatenation, ud-suffixes are removed and ignored and the concatenation
process occurs as described in [lex.string]. [ Example:
int main() {
  L"A" "B" "C"_x;    // OK: same as L"ABC"_x
  "P"_x "Q" "R"_y;   // error: two different ud-suffixes _x and _y
}
— end example ]
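The well-formed half of the example can be made concrete by giving the suffix a definition. In this sketch the namespace demo and the suffix _units are hypothetical, and a C++17 compiler is assumed; the operator simply returns the len argument it receives.

```cpp
#include <cassert>
#include <cstddef>

namespace demo {

// Hypothetical suffix _units for wide strings: reports the number of
// code units passed to the literal operator.
constexpr std::size_t operator""_units(const wchar_t*, std::size_t len) {
    return len;
}

// The ud-suffix is set aside, the pieces are concatenated as L"ABC",
// and the suffix then applies to the whole concatenated literal.
static_assert(L"A" "B" "C"_units == 3);
static_assert(L"ABC"_units == 3);

}  // namespace demo
```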