Physical source file characters are mapped, in an implementation-defined manner, to the basic source character set (introducing new-line characters for end-of-line indicators) if necessary.
The set of physical source file characters accepted is implementation-defined.
Any source file character not in the basic source character set is replaced by the universal-character-name that designates that character.
An implementation may use any internal encoding, so long as an actual extended character encountered in the source file, and the same extended character expressed in the source file as a universal-character-name (e.g., using the \uXXXX notation), are handled equivalently except where this replacement is reverted ([lex.pptoken]) in a raw string literal.
Each instance of a backslash character (\) immediately followed by a new-line character is deleted, splicing physical source lines to form logical source lines.
Only the last backslash on any physical source line shall be eligible for being part of such a splice.
Except for splices reverted in a raw string literal, if a splice results in a character sequence that matches the syntax of a universal-character-name, the behavior is undefined.
A source file that is not empty and that does not end in a new-line character, or that ends in a new-line character immediately preceded by a backslash character before any such splicing takes place, shall be processed as if an additional new-line character were appended to the file.
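The splicing rule can be illustrated with a minimal sketch. The names `SPLICED_SUM` and `spliced_text` are hypothetical and chosen only for this example; the behavior shown is what a conforming implementation is expected to produce.

```cpp
#include <cassert>
#include <cstring>

// The backslash at the end of the first physical line is deleted together
// with the new-line, so the macro definition is one logical line.
#define SPLICED_SUM(a, b) \
    ((a) + (b))

// Splicing happens before the source is split into preprocessing tokens,
// so a splice may fall inside a string literal. The continuation starts at
// column one; any leading whitespace would become part of the literal.
inline const char* spliced_text() {
    return "hello \
world";
}
```

Calling `spliced_text()` should yield the single string `hello world`, and `SPLICED_SUM(2, 3)` should expand to an expression with the value 5.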
The source file is decomposed into preprocessing tokens ([lex.pptoken]) and sequences of whitespace characters (including comments).
Each comment is replaced by one space character.
New-line characters are retained.
Whether each nonempty sequence of whitespace characters other than new-line is retained or replaced by one space character is unspecified.
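One way to observe that a comment is replaced by a space character is to stringize a macro argument that contains a comment. The sketch below uses the hypothetical names `STRINGIZE`, `with_comment`, and `comment_separated`.

```cpp
#include <cassert>
#include <cstring>

#define STRINGIZE(x) #x

// The comment between a and b is replaced by one space character, so the
// argument consists of two tokens separated by whitespace. Stringizing
// turns that whitespace into a single space in the resulting literal.
inline const char* with_comment() {
    return STRINGIZE(a/* removed */b);
}

// For the same reason a comment can separate two tokens that would
// otherwise merge into the single identifier "intvalue".
inline int comment_separated() {
    int/* separator */value = 7;
    return value;
}
```

Here `with_comment()` should return `a b`, not `ab`.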
The process of dividing a source file's characters into preprocessing tokens is context-dependent.
Preprocessing directives are executed, macro invocations are expanded, and _Pragma unary operator expressions are executed.
If a character sequence that matches the syntax of a universal-character-name is produced by token concatenation, the behavior is undefined.
A #include preprocessing directive causes the named header or source file to be processed from phase 1 through phase 4, recursively.
All preprocessing directives are then deleted.
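A small sketch of what this step does to macro invocations and conditional directives follows. All names (`CONCAT`, `SQUARE`, `USE_LARGE_STEP`, `preprocessing_demo`) are hypothetical; `_Pragma` is left out because its effect is implementation-defined.

```cpp
#include <cassert>

#define CONCAT(a, b) a##b          // token concatenation
#define SQUARE(x) ((x) * (x))      // function-like macro

#define USE_LARGE_STEP 1

inline int preprocessing_demo() {
    // CONCAT(coun, ter) is expanded to the single identifier counter.
    int CONCAT(coun, ter) = 3;

    // The conditional directive is executed during preprocessing; only one
    // of the two branches survives, and the directives themselves are
    // deleted afterwards.
#if USE_LARGE_STEP
    counter += 1;
#else
    counter += 0;
#endif

    // Expands to ((counter) * (counter)).
    return SQUARE(counter);
}
```

With `USE_LARGE_STEP` defined as 1, `preprocessing_demo()` should return 16.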
Each basic source character set member in a character-literal or a string-literal, as well as each escape sequence and universal-character-name in a character-literal or a non-raw string literal, is converted to the corresponding member of the execution character set ([lex.ccon], [lex.string]); if there is no corresponding member, it is converted to an implementation-defined member other than the null (wide) character.
Adjacent string literal tokens are concatenated.
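Because escape sequences are converted before adjacent string literals are concatenated, an escape sequence cannot extend across the boundary between two literals. The following sketch shows this; the names `greeting` and `joined` are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Two adjacent string literal tokens become one literal.
constexpr char greeting[] = "Hello, " "world";

// "\x12" is converted to a single character first; only then is "3"
// appended. The result holds '\x12', '3' and the terminating null, not the
// single hexadecimal escape \x123.
constexpr char joined[] = "\x12" "3";

static_assert(sizeof(greeting) == 13, "12 characters plus terminating null");
static_assert(sizeof(joined) == 3, "two characters plus terminating null");
static_assert(joined[0] == '\x12' && joined[1] == '3', "escape converted first");
```

Splitting a literal this way is a common technique for ending a hexadecimal escape sequence before a character that would otherwise be read as another hexadecimal digit.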
White-space characters separating tokens are no longer significant.
The resulting tokens are syntactically and semantically analyzed and translated as a translation unit.
It is implementation-defined whether the sources for module units and header units on which the current translation unit has an interface dependency ([module.unit], [module.import]) are required to be available.
[Note: Source files, translation units and translated translation units need not necessarily be stored as files, nor need there be any one-to-one correspondence between these entities and any external representation.
The description is conceptual only, and does not specify any particular implementation. — end note]
Translated translation units and instantiation units are combined as follows:
The definitions of the required templates are located.
It is implementation-defined whether the source of the translation units containing these definitions is required to be available.
The program is ill-formed if any instantiation fails.
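As a rough illustration of instantiation, the sketch below defines a template and requires two instantiations of it. The names `twice`, `NoPlus`, `use_int`, and `use_string` are hypothetical, and how an implementation locates and performs the instantiations is not specified by the text above.

```cpp
#include <cassert>
#include <string>

// The definition must be locatable when an instantiation is required.
template <typename T>
T twice(T value) {
    return value + value;
}

struct NoPlus {};  // deliberately has no operator+

// Each call requires an instantiation: twice<int> and twice<std::string>.
inline int use_int() { return twice(21); }
inline std::string use_string() { return twice(std::string("ab")); }

// Requiring twice<NoPlus> would make the instantiation fail, because
// value + value is not valid for NoPlus, and the program would be
// ill-formed:
//     NoPlus n = twice(NoPlus{});
```

`use_int()` should return 42 and `use_string()` should return `abab`.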