459. Requirement for widening in stage 2 is overspecification

Section: 28.3.4.3.2.3 [facet.num.get.virtuals] Status: NAD Submitter: Martin Sebor Opened: 2004-03-16 Last modified: 2016-01-28

Priority: Not Prioritized

View other active issues in [facet.num.get.virtuals].

View all other issues in [facet.num.get.virtuals].

View all issues with NAD status.

Discussion:

When parsing strings of wide-character digits, the standard requires the library to widen narrow-character "atoms" and compare the widened atoms against the characters that are being parsed. Simply narrowing the wide characters would be far simpler, and probably more efficient. The two choices are equivalent except in convoluted test cases, and many implementations already ignore the standard and use narrow instead of widen.

First, I disagree that using narrow() instead of widen() would necessarily have unfortunate performance implications. A possible implementation of narrow() that allows num_get to be implemented in a much simpler and arguably comparably efficient way as calling widen() allows, i.e. without making a virtual call to do_narrow every time, is as follows:

  inline char ctype<wchar_t>::narrow (wchar_t wc, char dflt) const
  {
      const unsigned wi = unsigned (wc);

      if (wi > UCHAR_MAX)
          return typeid (*this) == typeid (ctype<wchar_t>) ?
                 dflt : do_narrow (wc, dflt);

      if (narrow_ [wi] < 0) {
         const char nc = do_narrow (wc, dflt);
         if (nc == dflt)
             return dflt;
         narrow_ [wi] = nc;
      }

      return char (narrow_ [wi]);
  }

Second, I don't think the change proposed in the issue (i.e., to use narrow() instead of widen() during Stage 2) would be at all drastic. Existing implementations with the exception of libstdc++ currently already use narrow() so the impact of the change on programs would presumably be isolated to just a single implementation. Further, since narrow() is not required to translate alternate wide digit representations such as those mentioned in issue 303 to their narrow equivalents (i.e., the portable source characters '0' through '9'), the change does not necessarily imply that these alternate digits would be treated as ordinary digits and accepted as part of numbers during parsing. In fact, the requirement in 28.3.4.2.2.3 [locale.ctype.virtuals], p13 forbids narrow() to translate an alternate digit character, wc, to an ordinary digit in the basic source character set unless the expression (ctype<charT>::is(ctype_base::digit, wc) == true) holds. This in turn is prohibited by the C standard (7.25.2.1.5, 7.25.2.1.5, and 5.2.1, respectively) for charT of either char or wchar_t.

[Sydney: To a large extent this is a nonproblem. As long as you're only trafficking in char and wchar_t we're only dealing with a stable character set, so you don't really need either 'widen' or 'narrow': can just use literals. Finally, it's not even clear whether widen-vs-narrow is the right question; arguably we should be using codecvt instead.]

[ 2009-07 Frankfurt ]

NAD. The standard is clear enough as written.

Proposed resolution:

Change stage 2 so that implementations are permitted to use either technique to perform the comparison:

  1. call widen on the atoms and compare (either by using operator== or char_traits<charT>::eq) the input with the widened atoms, or
  2. call narrow on the input and compare the narrow input with the atoms
  3. do (1) or (2) only if charT is not char or wchar_t, respectively; i.e., avoid calling widen or narrow if it the source and destination types are the same