3603. Matching of null characters by regular expressions is underspecified

Section: 32.7.2 [re.regex.construct], 32.10 [re.alg] Status: New Submitter: Jonathan Wakely Opened: 2021-09-27 Last modified: 2021-10-14 11:34:59 UTC

Priority: 3

View other active issues in [re.regex.construct].

View all other issues in [re.regex.construct].

View all issues with New status.

Discussion:

ECMAScript says that \0 is an ordinary character and can be matched. POSIX says the opposite:

"The interfaces specified in POSIX.1-2017 do not permit the inclusion of a NUL character in an RE or in the string to be matched. If during the operation of a standard utility a NUL is included in the text designated to be matched, that NUL may designate the end of the text string for the purposes of matching."

So does that mean std::regex{"", 1, regex::basic} should throw an exception?

And std::regex_match(string{"a\0b", 3}, regex{"a.b", regex::basic}) should fail?

The POSIX rule is because those interfaces are specified with NTBS arguments, so there's no way to distinguish "a\0b" and "a". The C++ interfaces could allow it, but we never specify any divergence from POSIX, so presumably the rule still applies. Is that what was intended and is it what we want?

[2021-10-14; Reflector poll]

Set priority to 3 after reflector poll.

Proposed resolution: