regex_match ambiguity
Section: 28.6.10.2 [re.alg.match] Status: C++17 Submitter: Howard Hinnant Opened: 2013-07-14 Last modified: 2017-07-30
Priority: 2
Discussion:
28.6.10.2 [re.alg.match] p2 in describing regex_match says:
-2- Effects: Determines whether there is a match between the regular expression e, and all of the character sequence [first,last). The parameter flags is used to control how the expression is matched against the character sequence. Returns true if such a match exists, false otherwise.
It has come to my attention that different people are interpreting the first sentence of p2 in different ways:
1. If a search of the input string using the regular expression e matches the entire input string, regex_match should return true.
2. Search the input string using the regular expression e. Reject all matches that do not match the entire input string. If such a match is found, return true.
The difference between these two subtly different interpretations is found using the following ECMAScript example:
std::regex re("Get|GetValue");
Using regex_search, this re can never match the input string "GetValue", because ECMA specifies that alternations are ordered, not greedy. As soon as "Get" is matched in the left alternation, the matching algorithm stops. Under definition 1, regex_match would return false for an input string of "GetValue".
However, definition 2 alters the grammar and appears equivalent to augmenting the regex with a trailing '$', which is an anchor that rejects any match that does not end at the end of the input sequence. So, using definition 2, regex_match would return true for an input string of "GetValue".
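The equivalence suggested above can be checked with a small sketch. The helper name whole_match_by_anchoring is hypothetical and not part of the library, and the leading '^' is an added assumption (the text above mentions only the trailing '$'); it is there because regex_match also requires the match to start at the beginning of the input.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper (not part of the library): emulate definition 2 by
// anchoring the pattern at both ends and calling regex_search.
bool whole_match_by_anchoring(const std::string& input, const std::string& pattern) {
    const std::regex anchored("^(?:" + pattern + ")$");  // default ECMAScript grammar
    return std::regex_search(input, anchored);
}

void check_anchoring() {
    const std::regex re("Get|GetValue");
    const std::string input = "GetValue";
    std::smatch m;
    // Unanchored search: the ordered alternation stops at the left branch.
    assert(std::regex_search(input, m, re));
    assert(m[0] == "Get");
    // Anchored search: "Get" fails at '$', so the right branch is tried.
    assert(whole_match_by_anchoring(input, "Get|GetValue"));
    assert(!whole_match_by_anchoring("GetValues", "Get|GetValue"));
}
```

With the anchors in place the matcher backtracks into the right alternative instead of stopping at "Get", which is the behaviour definition 2 asks of regex_match.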
My opinion is that it would be strange to have regex_match return true for a string/regex pair that regex_search could never find. I.e. I favor definition 1.
John Maddock writes:
The intention was always that regex_match would reject any match candidate which didn't match the entire input string. So it would find GetValue in this case because the "Get" alternative had already been rejected as not matching. Note that the comparison with ECMAScript is somewhat moot, as ECMAScript defines the regex grammar (the bit we've imported), it does not define anything like regex_match, nor do we import from ECMAScript the behaviour of that function. So IMO the function should behave consistently regardless of the regex dialect chosen. Saying "use awk regexes" doesn't cut it, because that changes the grammar in other ways.
(John favors definition 2).
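The two positions can be contrasted in a short sketch. The helper search_covers_input is a hypothetical rendering of definition 1 and is not a library function; the regex_match assertion assumes a standard library that implements the reading John describes.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper for definition 1: accept only if the first match that
// regex_search reports already covers the whole input.
bool search_covers_input(const std::string& input, const std::regex& re) {
    std::smatch m;
    return std::regex_search(input, m, re)
        && m.position(0) == 0
        && m.length(0) == static_cast<std::smatch::difference_type>(input.size());
}

void check_definitions() {
    const std::regex re("Get|GetValue");
    const std::string input = "GetValue";
    // Definition 1: the search settles on "Get", which is not the whole input.
    assert(!search_covers_input(input, re));
    // Definition 2 (assumed library behaviour): candidates shorter than the
    // input are rejected, so the "GetValue" alternative is found.
    assert(std::regex_match(input, re));
    // Both readings agree when the first alternative is the whole input.
    assert(search_covers_input("Get", re));
    assert(std::regex_match("Get", re));
}
```

The only inputs on which the two readings differ are those where an earlier alternative matches a proper prefix of what a later alternative could match in full.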
We need to clarify 28.6.10.2 [re.alg.match]/p2 in one of these two directions.
[2014-06-21, Rapperswil]
AM: I think there's a clear direction and consensus we agree with John Maddock's position, and if no one else thinks we need the other function I won't ask for it.
Marshall Clow and STL to draft.
[2015-06-10, Marshall suggests concrete wording]
[2015-01-11, Telecon]
Move to Tentatively Ready
Proposed resolution:
This wording is relative to N4527.
Change 28.6.10.2 [re.alg.match]/2, as follows:
template <class BidirectionalIterator, class Allocator, class charT, class traits>
  bool regex_match(BidirectionalIterator first, BidirectionalIterator last,
                   match_results<BidirectionalIterator, Allocator>& m,
                   const basic_regex<charT, traits>& e,
                   regex_constants::match_flag_type flags = regex_constants::match_default);
-1- Requires: The type BidirectionalIterator shall satisfy the requirements of a Bidirectional Iterator (24.2.6).
-2- Effects: Determines whether there is a match between the regular expression e, and all of the character sequence [first,last). The parameter flags is used to control how the expression is matched against the character sequence. When determining if there is a match, only potential matches that match the entire character sequence are considered. Returns true if such a match exists, false otherwise. [Example:
  std::regex re("Get|GetValue");
  std::cmatch m;
  regex_search("GetValue", m, re);  // returns true, and m[0] contains "Get"
  regex_match ("GetValue", m, re);  // returns true, and m[0] contains "GetValue"
  regex_search("GetValues", m, re); // returns true, and m[0] contains "Get"
  regex_match ("GetValues", m, re); // returns false
— end example]
[…]