2217. operator==(sub_match, string) slices on embedded '\0's

Section: 28.6.8.3 [re.submatch.op] Status: C++17 Submitter: Jeffrey Yasskin Opened: 2012-11-26 Last modified: 2017-07-30

Priority: 2

Discussion:

template <class BiIter, class ST, class SA>
  bool operator==(
    const basic_string<
      typename iterator_traits<BiIter>::value_type, ST, SA>& lhs,
    const sub_match<BiIter>& rhs);

is specified as:

Returns: rhs.compare(lhs.c_str()) == 0.

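This is odd because sub_match::compare(basic_string) is defined to honor embedded '\0' characters, whereas passing lhs.c_str() selects the compare(const value_type*) overload, which treats its argument as a null-terminated string and therefore stops at the first '\0' in lhs. This could allow a sub_match to compare == or != to a std::string unexpectedly.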

[Daniel:]

This wording change was made intentionally in LWG 1181, but the slicing effect described here was not considered at that time. It seems best to use another overload of compare to fix this problem:

Returns: rhs.str().compare(0, rhs.length(), lhs.data(), lhs.size()) == 0.

or

Returns: rhs.compare(sub_match<BiIter>::string_type(lhs.data(), lhs.size())) == 0.

[2013-10-17: Daniel provides concrete wording]

The original wording was proposed in order to reduce the need to allocate memory during comparisons. The specification would be much simpler if sub_match provided an additional compare overload of the form:

int compare(const value_type* s, size_t n) const;

However, since all of basic_string's compare overloads are currently specified in terms of constructing temporary strings, the proposed wording below follows the same string-construction route as basic_string (only where needed to fix the embedded-zeros issue), in the hope that existing implementations do not interpret these semantics literally.

I decided to use the second replacement form

Returns: rhs.compare(sub_match<BiIter>::string_type(lhs.data(), lhs.size())) == 0.

because it already reflects the existing style used in 28.6.8.3 [re.submatch.op] p31.

[2014-02-15 post-Issaquah session : move to Tentatively Ready]

Proposed resolution:

This wording is relative to N3691.

  1. Change 28.6.8.3 [re.submatch.op] as indicated:

    template <class BiIter, class ST, class SA>
      bool operator==(
        const basic_string<
          typename iterator_traits<BiIter>::value_type, ST, SA>& lhs,
        const sub_match<BiIter>& rhs);
    

    -7- Returns: rhs.compare(typename sub_match<BiIter>::string_type(lhs.data(), lhs.size())) == 0.
        (replaces: rhs.compare(lhs.c_str()) == 0.)

    […]

    template <class BiIter, class ST, class SA>
      bool operator<(
        const basic_string<
          typename iterator_traits<BiIter>::value_type, ST, SA>& lhs,
        const sub_match<BiIter>& rhs);
    

    -9- Returns: rhs.compare(typename sub_match<BiIter>::string_type(lhs.data(), lhs.size())) > 0.
        (replaces: rhs.compare(lhs.c_str()) > 0.)

    […]

    template <class BiIter, class ST, class SA>
      bool operator==(const sub_match<BiIter>& lhs,
                      const basic_string<
                        typename iterator_traits<BiIter>::value_type, ST, SA>& rhs);
    

    -13- Returns: lhs.compare(typename sub_match<BiIter>::string_type(rhs.data(), rhs.size())) == 0.
        (replaces: lhs.compare(rhs.c_str()) == 0.)

    […]

    template <class BiIter, class ST, class SA>
      bool operator<(const sub_match<BiIter>& lhs,
                     const basic_string<
                       typename iterator_traits<BiIter>::value_type, ST, SA>& rhs);
    

    -15- Returns: lhs.compare(typename sub_match<BiIter>::string_type(rhs.data(), rhs.size())) < 0.
        (replaces: lhs.compare(rhs.c_str()) < 0.)