302. Need error indication from codecvt<>::do_length

Section: 28.3.4.2.6 [locale.codecvt.byname] Status: NAD Submitter: Gregory Bumgardner Opened: 2001-01-25 Last modified: 2016-01-28

Priority: Not Prioritized

View all other issues in [locale.codecvt.byname].

View all issues with NAD status.

Discussion:

The effects of codecvt<>::do_length() are described in 22.2.1.5.2, paragraph 10. As implied by that paragraph, and clarified in issue 75, codecvt<>::do_length() must process the source data and update the stateT argument just as if the data had been processed by codecvt<>::in(). However, the standard does not specify how do_length() would report a translation failure, should the source sequence contain untranslatable or illegal character sequences.

The other conversion methods return an "error" result value to indicate that an untranslatable character has been encountered, but do_length() already has a return value (the number of source characters that have been processed by the method).

Proposed resolution:

This issue cannot be resolved without modifying the interface. An exception cannot be used, as there would be no way to determine how many characters have been processed and the state object would be left in an indeterminate state.

A source compatible solution involves adding a fifth argument to length() and do_length() that could be used to return position of the offending character sequence. This argument would have a default value that would allow it to be ignored:

  int length(stateT& state, 
             const externT* from, 
             const externT* from_end, 
             size_t max,
             const externT** from_next = 0);

  virtual
  int do_length(stateT& state, 
                const externT* from, 
                const externT* from_end, 
                size_t max,
                const externT** from_next);

Then an exception could be used to report any translation errors and the from_next argument, if used, could then be used to retrieve the location of the offending character sequence.

Rationale:

The standard is already clear: the return value is the number of "valid complete characters". If it encounters an invalid sequence of external characters, it stops.