<regex> ECMAScript IdentityEscape is ambiguous
Section: 28.6.12 [re.grammar] Status: C++17 Submitter: Billy O'Neal III Opened: 2016-01-13 Last modified: 2017-07-30
Priority: 2
Discussion:
Stephan and I are seeing differences between implementations in how non-special characters are handled in the
IdentityEscape part of the ECMAScript grammar. For example:
    #include <stdio.h>
    #include <iostream>
    #ifdef USE_BOOST
    #include <boost/regex.hpp>
    using namespace boost;
    #else
    #include <regex>
    #endif
    using namespace std;

    int main() {
      try {
        const regex r("\\z");
        cout << "Constructed \\z." << endl;
        if (regex_match("z", r))
          cout << "Matches z" << endl;
      } catch (const regex_error& e) {
        cout << e.what() << endl;
      }
    }
libstdc++, boost, and the browsers I tested with (Microsoft Edge, Google Chrome) all happily interpret \z, which
otherwise has no meaning, as an identity character escape for the letter z.
libc++ and msvc++ say that this is invalid, and throw regex_error with error_escape.
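To compare implementations without reading console output, the test program above can be reduced to a small probe that separates the three possible outcomes. This is only a sketch: the names Outcome and probe are hypothetical and do not come from the issue, and the result for "\\z" depends on the standard library in use.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, not part of the issue text.
// Rejected: the regex constructor threw regex_error (e.g. error_escape).
// NoMatch / Match: the pattern was accepted and regex_match returned false / true.
enum class Outcome { Rejected, NoMatch, Match };

Outcome probe(const std::string& pattern, const std::string& subject) {
    try {
        const std::regex r(pattern);  // default grammar is std::regex::ECMAScript
        return std::regex_match(subject, r) ? Outcome::Match : Outcome::NoMatch;
    } catch (const std::regex_error&) {
        return Outcome::Rejected;
    }
}
```

Under the behavior described above, probe("\\z", "z") would be expected to yield Outcome::Match on libstdc++ and Outcome::Rejected on libc++ and msvc++, while probe("\\$", "$") should yield Outcome::Match everywhere.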
ECMAScript 3, which is the edition the C++ standard currently references, appears to agree with libc++ and msvc++:

    IdentityEscape ::
        SourceCharacter but not IdentifierPart

    IdentifierPart ::
        IdentifierStart
        UnicodeCombiningMark
        UnicodeDigit
        UnicodeConnectorPunctuation
        \ UnicodeEscapeSequence

    IdentifierStart ::
        UnicodeLetter
        $
        _
        \ UnicodeEscapeSequence
But this doesn't make any sense: it prohibits things like \$, which users absolutely need to be able to escape.
So let's look at ECMAScript 6. I believe this says much the same thing, but updates the spec to better handle Unicode by
referencing what the Unicode standard says is an identifier character:
    IdentityEscape ::
        SyntaxCharacter
        /
        SourceCharacter but not UnicodeIDContinue

    UnicodeIDContinue ::
        any Unicode code point with the Unicode property "ID_Continue", "Other_ID_Continue", or "Other_ID_Start"
However, ECMAScript 6 has an appendix B defining "additional features for web browsers" which says:
    IdentityEscape ::
        SourceCharacter but not c
which appears to agree with what libstdc++, boost, and browsers are doing.
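The three candidate rules can be compared side by side with a rough sketch. The predicates below are hypothetical and ASCII-only: they assume that, within ASCII, IdentifierPart is letters, digits, '$' and '_', and that UnicodeIDContinue is letters, digits and '_'. The Unicode categories in the real grammars are ignored.

```cpp
// Hypothetical ASCII-only approximations of the quoted productions.

bool is_ascii_alnum(char c) {
    return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

// ECMAScript 3: SourceCharacter but not IdentifierPart.
// Assumption: in ASCII, IdentifierPart is alphanumerics plus '$' and '_'.
bool identity_escape_es3(char c) {
    return !(is_ascii_alnum(c) || c == '$' || c == '_');
}

// ECMAScript 6 as quoted: SyntaxCharacter, '/', or SourceCharacter but not
// UnicodeIDContinue. Assumption: in ASCII, UnicodeIDContinue is alphanumerics
// plus '_'; every SyntaxCharacter and '/' already fall outside that set.
bool identity_escape_es6(char c) {
    return !(is_ascii_alnum(c) || c == '_');
}

// ECMAScript 6 Annex B: SourceCharacter but not c.
bool identity_escape_annex_b(char c) {
    return c != 'c';
}
```

With these approximations, 'z' is accepted only by identity_escape_annex_b, and '$' is rejected only by identity_escape_es3, which matches the observations in the discussion.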
What should be the correct behavior here?
[2016-08, Chicago]
Monday PM: Move to tentatively ready
Proposed resolution:
This wording is relative to N4567.
Change 28.6.12 [re.grammar]/3 as indicated:
-3- The following productions within the ECMAScript grammar are modified as follows:
    ClassAtom ::
        -
        ClassAtomNoDash
        ClassAtomExClass
        ClassAtomCollatingElement
        ClassAtomEquivalence

    IdentityEscape ::
        SourceCharacter but not c