31 Regular expressions library [re]

31.13 Modified ECMAScript regular expression grammar [re.grammar]

The regular expression grammar recognized by basic_regex objects constructed with the ECMAScript flag is that specified by ECMA-262, except as specified below.

Objects of type specialization of basic_regex store within themselves a default-constructed instance of their traits template parameter, henceforth referred to as traits_inst. This traits_inst object is used to support localization of the regular expression; basic_regex member functions shall not call any locale dependent C or C++ API, including the formatted string input functions. Instead they shall call the appropriate traits member function to achieve the required effect.

The following productions within the ECMAScript grammar are modified as follows:

ClassAtom ::
  -
  ClassAtomNoDash
  ClassAtomExClass
  ClassAtomCollatingElement
  ClassAtomEquivalence

IdentityEscape ::
  SourceCharacter but not c

The following new productions are then added:

ClassAtomExClass ::
  [: ClassName :]

ClassAtomCollatingElement ::
  [. ClassName .]

ClassAtomEquivalence ::
  [= ClassName =]

ClassName ::
  ClassNameCharacter
  ClassNameCharacter ClassName

ClassNameCharacter ::
  SourceCharacter but not one of "." "=" ":"

The productions ClassAtomExClass, ClassAtomCollatingElement and ClassAtomEquivalence provide functionality equivalent to that of the same features in regular expressions in POSIX.
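As an illustration of ClassAtomExClass, the following minimal sketch uses a POSIX-style character class inside an ECMAScript bracket expression. The helper name is_alpha_word is hypothetical, and the result assumes the default "C" locale is in effect.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true when `s` is non-empty and every character is
// matched by the class expression [[:alpha:]] inside a bracket expression.
bool is_alpha_word(const std::string& s) {
    // ECMAScript is the default grammar; it is spelled out here for clarity.
    static const std::regex re("[[:alpha:]]+", std::regex::ECMAScript);
    return std::regex_match(s, re);
}
```

Calling is_alpha_word("hello") is expected to succeed, while a string containing a digit is expected to fail the match.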

The regular expression grammar may be modified by any regex_constants::syntax_option_type flags specified when constructing an object of type specialization of basic_regex according to the rules in Table 130.
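A small sketch of how a syntax_option_type flag changes matching, here using icase together with ECMAScript. The helper name full_match and its default argument are assumptions made for this example only.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: compiles `pattern` with the given syntax flags and
// reports whether it matches the whole of `text`.
bool full_match(const std::string& text, const std::string& pattern,
                std::regex::flag_type flags = std::regex::ECMAScript) {
    const std::regex re(pattern, flags);
    return std::regex_match(text, re);
}
```

With std::regex::ECMAScript | std::regex::icase the pattern "hello" is expected to match "HELLO"; with the default flags it is not.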

A ClassName production, when used in ClassAtomExClass, is not valid if traits_inst.lookup_classname returns zero for that name. The names recognized as valid ClassNames are determined by the type of the traits class, but at least the following names shall be recognized: alnum, alpha, blank, cntrl, digit, graph, lower, print, punct, space, upper, xdigit, d, s, w. In addition the following expressions shall be equivalent:

\d and [[:digit:]]

\D and [^[:digit:]]

\s and [[:space:]]

\S and [^[:space:]]

\w and [_[:alnum:]]

\W and [^_[:alnum:]]
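The equivalences above can be checked empirically for individual characters. This is only a sketch: the helper names same_result and equivalences_hold are hypothetical, and a spot check on a few characters does not prove conformance.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true when the shorthand escape and the bracket form
// agree on whether the single character `c` matches.
bool same_result(char c, const char* shorthand, const char* bracket) {
    const std::string s(1, c);
    return std::regex_match(s, std::regex(shorthand)) ==
           std::regex_match(s, std::regex(bracket));
}

// Checks all six listed equivalences for one character.
bool equivalences_hold(char c) {
    return same_result(c, "\\d", "[[:digit:]]") &&
           same_result(c, "\\D", "[^[:digit:]]") &&
           same_result(c, "\\s", "[[:space:]]") &&
           same_result(c, "\\S", "[^[:space:]]") &&
           same_result(c, "\\w", "[_[:alnum:]]") &&
           same_result(c, "\\W", "[^_[:alnum:]]");
}
```

The underscore is the interesting case: it is matched by \w and by [_[:alnum:]] even though it is not an alnum character.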

A ClassName production when used in a ClassAtomCollatingElement production is not valid if the value returned by traits_inst.lookup_collatename for that name is an empty string.

The results from multiple calls to traits_inst.lookup_classname can be bitwise OR'ed together and subsequently passed to traits_inst.isctype.
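A minimal sketch of combining class masks, using std::regex_traits<char> as a stand-in for traits_inst. The helper names lookup and is_digit_or_space are hypothetical, and the default "C" locale is assumed.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: looks up one class name through the traits object.
std::regex_traits<char>::char_class_type
lookup(const std::regex_traits<char>& traits, const std::string& name) {
    return traits.lookup_classname(name.begin(), name.end());
}

// True when `c` belongs to the "digit" class or to the "space" class.
bool is_digit_or_space(char c) {
    const std::regex_traits<char> traits;
    // char_class_type is a bitmask type, so two lookups can be OR'ed.
    const auto mask = lookup(traits, "digit") | lookup(traits, "space");
    return traits.isctype(c, mask);
}
```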

A ClassName production when used in a ClassAtomEquivalence production is not valid if the value returned by traits_inst.lookup_collatename for that name is an empty string or if the value returned by traits_inst.transform_primary for the result of the call to traits_inst.lookup_collatename is an empty string.

When the sequence of characters being transformed to a finite state machine contains an invalid class name the translator shall throw an exception object of type regex_error.
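The failure mode can be observed by constructing a regex from a pattern with an unknown class name. In this sketch the helper name rejects and the class name "nosuchclass" are made up for illustration; the code deliberately does not inspect the error code carried by the exception.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true when compiling `pattern` throws std::regex_error.
bool rejects(const std::string& pattern) {
    try {
        const std::regex re(pattern);
        (void)re;  // only construction matters here
    } catch (const std::regex_error&) {
        return true;
    }
    return false;
}
```

Here rejects("[[:nosuchclass:]]") is expected to return true, whereas a pattern using a required name such as digit compiles normally.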

If the CV of a UnicodeEscapeSequence is greater than the largest value that can be held in an object of type charT the translator shall throw an exception object of type regex_error. [Note: This means that values of the form "uxxxx" that do not fit in a character are invalid. end note]

Where the regular expression grammar requires the conversion of a sequence of characters to an integral value, this is accomplished by calling traits_inst.value.
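For a sense of what the traits value function does for a single character, the sketch below calls std::regex_traits<char>::value directly. The wrapper name digit_value is hypothetical; the radix is assumed to be 8, 10, or 16, as the traits requirements specify.

```cpp
#include <cassert>
#include <regex>

// Hypothetical wrapper: the numeric value of `ch` as a digit in `radix`,
// or -1 when `ch` is not a valid digit in that radix.
int digit_value(char ch, int radix) {
    const std::regex_traits<char> traits;
    return traits.value(ch, radix);
}
```

A translator would typically accumulate such per-character values when reading, for example, the bounds of a quantifier or the digits of a hexadecimal escape.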

The behavior of the internal finite state machine representation when used to match a sequence of characters is as described in ECMA-262. The behavior is modified according to any match_flag_type flags specified when using the regular expression object in one of the regular expression algorithms. The behavior is also localized by interaction with the traits class template parameter as follows: