2854. wstring_convert provides no indication of incomplete input or output

Section: 99 [depr.conversions.string] Status: NAD Submitter: PowerGamer Opened: 2017-01-08 Last modified: 2017-06-05

Priority: 3

Discussion:

Example:

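#include <codecvt> // std::codecvt_utf8_utf16 (deprecated since C++17)
#include <locale>  // std::wstring_convert (deprecated since C++17)

// Input UTF-16 string is incomplete - only first half of
// UTF-16 surrogate pair L"\xD843\xDEF9":
wchar_t in_utf16[] = L"\xD843";

std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> cvt;
auto out_utf8 = cvt.to_bytes(in_utf16); // No error.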

There is no indication that the input was incomplete: the value returned by cvt.state() is not documented, so the user cannot examine it for that purpose. As a result, the user will not know that more input data should be provided in an additional call to cvt.to_bytes().

The output can be incomplete too: the MSVC 2017 implementation (which, as far as I can tell, is standard-conforming) produces "\xF0" in out_utf8. Again, std::wstring_convert provides no indication that the output it produced is incomplete.
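For reference, the sketch below shows what a complete conversion of the full surrogate pair from the example would produce, which makes it easier to see that "\xF0" is only the first of four bytes. The function name surrogate_pair_to_utf8 is hypothetical and not part of the standard library; it assumes its arguments form a valid high/low surrogate pair and does no validation.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not a standard library function): encodes one
// UTF-16 surrogate pair as the four UTF-8 bytes of its code point.
// Assumes `high` is in [0xD800, 0xDBFF] and `low` is in [0xDC00, 0xDFFF];
// no validation is performed.
std::string surrogate_pair_to_utf8(char16_t high, char16_t low) {
    // Combine the two 10-bit halves and add the supplementary-plane offset.
    const std::uint32_t cp = 0x10000u
        + ((static_cast<std::uint32_t>(high) - 0xD800u) << 10)
        + (static_cast<std::uint32_t>(low) - 0xDC00u);

    // Code points above U+FFFF always take four bytes in UTF-8.
    std::string out;
    out += static_cast<char>(0xF0u | (cp >> 18));
    out += static_cast<char>(0x80u | ((cp >> 12) & 0x3Fu));
    out += static_cast<char>(0x80u | ((cp >> 6) & 0x3Fu));
    out += static_cast<char>(0x80u | (cp & 0x3Fu));
    return out;
}
```

For the pair 0xD843, 0xDEF9 (code point U+20EF9) this yields the bytes F0 A0 BB B9, so an out_utf8 containing only "\xF0" is a truncated sequence.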

IMO this makes std::wstring_convert in its current state completely useless: it cannot be relied upon to either produce a complete and valid UTF sequence or throw an error in all situations.

Imagine a file contains UTF-16 encoded text. You want to read all the data from the file at once and convert it to UTF-8 using std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>>.

Now, if the file contains completely invalid UTF-16 (for example, forbidden or incorrectly encoded Unicode code points), you will get an exception from std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>>.

But if the file contains incomplete (but otherwise valid) UTF-16 (for example, the file ends with only the first half of a valid surrogate pair), you will get neither an exception from std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> nor any indication that the input provided to it was incomplete.
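Since std::wstring_convert does not report this case, one possible workaround is to inspect the input before converting it. The following is a minimal sketch, not a fix for the facility itself: the helper name ends_with_unpaired_high_surrogate is hypothetical, and it uses std::u16string rather than wchar_t so that the code unit width does not depend on the platform. It only detects a truncated pair at the very end of the input, not unpaired surrogates elsewhere.

```cpp
#include <string>

// Hypothetical helper (not a standard library function): returns true if
// the UTF-16 text ends with a high (leading) surrogate, i.e. the first
// half of a surrogate pair with no second half after it.
bool ends_with_unpaired_high_surrogate(const std::u16string& s) {
    if (s.empty()) {
        return false;
    }
    const char16_t last = s.back();
    // High surrogates occupy the range [0xD800, 0xDBFF].
    return last >= 0xD800 && last <= 0xDBFF;
}
```

A caller reading a whole file could run this check on the decoded code units and treat a true result as "input incomplete" before passing the data to the converter.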

[2017-01-27 Telecon]

Priority 3; send to LEWG

[2017-02 in Kona, LEWG recommends NAD]

[2017-06-02 Issues Telecon]

This facility has a number of known problems, including poor error handling. The feature has been deprecated, and the plan is to replace it with better facilities with a better API.

Resolve as NAD

Proposed resolution: