Section: 17.9.3 [exception]
Status: SG16
Submitter: Victor Zverovich
Opened: 2024-04-28
Last modified: 2024-05-08
Priority: 3
Discussion:
The null-terminated multibyte string returned by the what method of std::exception and its subclasses in the standard has an unspecified encoding. The closest thing in the specification is the "suitable for conversion and display as a wstring" part in Remarks (17.9.3 [exception] p6), but it is too vague to be useful because anything can be converted to wstring in one way or another:
virtual const char* what() const noexcept;
Returns: An implementation-defined ntbs.
Remarks: The message may be a null-terminated multibyte string (16.3.3.3.4.3 [multibyte.strings]), suitable for conversion and display as a wstring (27.4 [string.classes], 28.3.4.2.5 [locale.codecvt]). The return value remains valid until the exception object from which it is obtained is destroyed or a non-const member function of the exception object is called.
As a result, it is impossible to portably use the exception message, e.g. print it. Since exception messages are commonly combined with string literals and are often constructed from string literals, at the very least the standard should say that the message is compatible with them, i.e. that it is in the ordinary literal encoding or its subset.
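If the message were guaranteed to be in the ordinary literal encoding, a program could decide at compile time whether it may hand what() directly to a UTF-8 consumer. The sketch below assumes a common practical check rather than a standard facility: if U+00A7 SECTION SIGN is encoded as the two bytes 0xC2 0xA7 in an ordinary string literal, the literal encoding is taken to be UTF-8. The names literal_encoding_is_utf8 and what_as_utf8 are hypothetical, and what_as_utf8 is only sound under the guarantee proposed here, not under the current wording. (C++26 adds std::text_encoding::literal() as a standard way to query the literal encoding.)

```cpp
#include <exception>
#include <optional>
#include <stdexcept>
#include <string_view>

// Practical check (not a guarantee) that the ordinary literal encoding is
// UTF-8: U+00A7 SECTION SIGN must be encoded as the two bytes 0xC2 0xA7.
constexpr bool literal_encoding_is_utf8() {
  constexpr char s[] = "\u00A7";
  return sizeof(s) == 3 &&
         static_cast<unsigned char>(s[0]) == 0xC2 &&
         static_cast<unsigned char>(s[1]) == 0xA7;
}

// Hypothetical helper. Returning what() as UTF-8 is only justified if what()
// is specified to be in the ordinary literal encoding, which the current
// wording does not say.
std::optional<std::string_view> what_as_utf8(const std::exception& e) {
  if constexpr (literal_encoding_is_utf8()) {
    return std::string_view(e.what());
  } else {
    return std::nullopt;  // the message would have to be transcoded first
  }
}
```

For a std::runtime_error object constructed from a string literal, what_as_utf8 yields a view of that message when the literal encoding is UTF-8 and std::nullopt otherwise. The returned view is only valid while the exception object is alive, mirroring the lifetime rule in the Remarks quoted above.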
To give a specific example of this problem, consider the following code compiled on Windows with Microsoft Visual C++, with the ordinary literal encoding set to UTF-8 and the system locale set to Belarusian (the language of the text in this example):

    #include <cstdint>
    #include <exception>
    #include <filesystem>
    #include <print>

    int main() {
      std::uintmax_t size = 0;
      try {
        size = std::filesystem::file_size(L"Шчучыншчына");
      } catch (const std::exception& e) {
        // "Памылка" is Belarusian for "Error".
        std::print("Памылка: {}", e.what());
      }
    }
Since both std::filesystem::path and std::print support Unicode, one would expect this to work and, when run, to print a readable error message if the file "Шчучыншчына" doesn't exist. However, the output will be corrupted instead. The reason for the corruption is that filesystem_error requires including the path in the message but doesn't say that the path should be transcoded (31.12.7.2 [fs.filesystem.error.members] p7):
virtual const char* what() const noexcept;
Returns: An ntbs that incorporates the what_arg argument supplied to the constructor. The exact format is unspecified. Implementations should include the system_error::what() string and the pathnames of path1 and path2 in the native format in the returned string.
Therefore, the message will contain literal text in the ordinary literal encoding (UTF-8) combined with a path, most likely in the operating-system-dependent current encoding for pathnames, which in this case is CP1251. Different parts of the output will thus be in two incompatible encodings, making the message unusable with std::print or any other facility.
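The incompatibility can be checked mechanically. The following sketch assumes, as in the example, a UTF-8 ordinary literal encoding and a CP1251 path encoding. The names is_valid_utf8 and naive_message are hypothetical, and naive_message merely imitates concatenating literal text with a native path; it is not any implementation's actual message format. All bytes are written as escapes so the result does not depend on the source file's encoding.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Returns true if s is well-formed UTF-8: every multi-byte sequence has the
// right number of continuation bytes, and overlong forms, UTF-16 surrogates
// and values above U+10FFFF are rejected.
bool is_valid_utf8(std::string_view s) {
  std::size_t i = 0;
  while (i < s.size()) {
    const auto lead = static_cast<unsigned char>(s[i]);
    std::size_t len = 0;
    std::uint32_t cp = 0;
    if (lead < 0x80) {
      ++i;  // ASCII byte
      continue;
    } else if ((lead & 0xE0) == 0xC0) {
      len = 2;
      cp = lead & 0x1F;
    } else if ((lead & 0xF0) == 0xE0) {
      len = 3;
      cp = lead & 0x0F;
    } else if ((lead & 0xF8) == 0xF0) {
      len = 4;
      cp = lead & 0x07;
    } else {
      return false;  // stray continuation byte or invalid lead byte
    }
    if (s.size() - i < len) return false;  // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
      const auto cont = static_cast<unsigned char>(s[i + k]);
      if ((cont & 0xC0) != 0x80) return false;  // not a continuation byte
      cp = (cp << 6) | (cont & 0x3F);
    }
    const std::uint32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    if (cp < min_cp[len]) return false;              // overlong encoding
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
    if (cp > 0x10FFFF) return false;                 // out of range
    i += len;
  }
  return true;
}

// "Памылка: " as UTF-8 bytes (assumed ordinary literal encoding).
constexpr std::string_view literal_text_utf8 =
    "\xD0\x9F\xD0\xB0\xD0\xBC\xD1\x8B\xD0\xBB\xD0\xBA\xD0\xB0: ";

// "Шчучыншчына" as CP1251 bytes (assumed native narrow path encoding).
constexpr std::string_view path_cp1251 =
    "\xD8\xF7\xF3\xF7\xFB\xED\xF8\xF7\xFB\xED\xE0";

// Hypothetical stand-in for building the message: literal text followed by
// the raw path bytes, with no transcoding.
std::string naive_message(std::string_view text, std::string_view path) {
  std::string msg(text);
  msg += path;
  return msg;
}
```

The literal text on its own is valid UTF-8, but the CP1251 path bytes are not (0xD8 starts a two-byte UTF-8 sequence, and 0xF7 is not a continuation byte), so the combined message is not valid UTF-8 either. A UTF-8 consumer can at best substitute replacement characters for the path.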
[…] print since Microsoft STL doesn't implement std::print yet.
Replacing std::print with another output facility produces a different but equally unusable form of mojibake.
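The quoted [fs.filesystem.error.members] wording does not require the path to be transcoded before it is included in the message. To illustrate what such transcoding could involve, here is a minimal sketch, assuming the native narrow path encoding is CP1251 (as in the example) and the ordinary literal encoding is UTF-8. The helpers cp1251_to_utf8 and append_utf8 are hypothetical, and the mapping covers only ASCII and the Cyrillic letters used by Russian and Belarusian; a real implementation would need a full conversion table or the platform's own conversion facilities.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Appends the UTF-8 encoding of a BMP code point (cp < 0x10000) to out.
void append_utf8(std::string& out, std::uint32_t cp) {
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
}

// Hypothetical helper: transcodes a CP1251 string to UTF-8. Bytes outside
// the partial mapping yield std::nullopt instead of silently producing
// mojibake.
std::optional<std::string> cp1251_to_utf8(std::string_view in) {
  std::string out;
  for (char ch : in) {
    const auto b = static_cast<unsigned char>(ch);
    std::uint32_t cp = 0;
    if (b < 0x80) cp = b;                            // ASCII is shared
    else if (b >= 0xC0) cp = 0x0410 + (b - 0xC0);    // А..я -> U+0410..U+044F
    else if (b == 0xA8) cp = 0x0401;                 // Ё
    else if (b == 0xB8) cp = 0x0451;                 // ё
    else if (b == 0xB2) cp = 0x0406;                 // І
    else if (b == 0xB3) cp = 0x0456;                 // і
    else if (b == 0xA1) cp = 0x040E;                 // Ў
    else if (b == 0xA2) cp = 0x045E;                 // ў
    else return std::nullopt;
    append_utf8(out, cp);
  }
  return out;
}
```

Applied to the CP1251 bytes of "Шчучыншчына", this produces the UTF-8 spelling of the same name, which can then be combined with UTF-8 literal text without mixing encodings.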
[2024-05-04; Daniel comments]
The proposed wording is incomplete. There are about 12 other what specifications in the Standard Library with exactly the same specification as exception::what. Either they would all need to get the same treatment, or we would need general wording somewhere saying that the specification "contract" of exception::what extends to all of its derived classes. A third choice could be to introduce a new definition, such as an lntbs (or maybe "literal ntbs"), that is essentially an ntbs in the ordinary literal encoding.
[2024-05-08; Reflector poll]
Set priority to 3 after reflector poll. Send to SG16.
Proposed resolution:
This wording is relative to N4981.
Modify 17.9.3 [exception] as indicated:
virtual const char* what() const noexcept;
Returns: An implementation-defined ntbs in the ordinary literal encoding.
Remarks: The message may be a null-terminated multibyte string (16.3.3.3.4.3 [multibyte.strings]), suitable for conversion and display as a wstring (27.4 [string.classes], 28.3.4.2.5 [locale.codecvt]). The return value remains valid until the exception object from which it is obtained is destroyed or a non-const member function of the exception object is called.