regex_iterator and join_view don't work together very well
Section: 28.6.11 [re.iter], 25.7.14 [range.join] Status: Resolved Submitter: Barry Revzin Opened: 2022-05-12 Last modified: 2023-03-23
Priority: 2
Discussion:
Consider this example (from StackOverflow):
    #include <ranges>
    #include <regex>
    #include <iostream>

    int main() {
      char const text[] = "Hello";
      std::regex regex{"[a-z]"};

      auto lower = std::ranges::subrange(
        std::cregex_iterator(
          std::ranges::begin(text),
          std::ranges::end(text),
          regex),
        std::cregex_iterator{}
      )
      | std::views::join
      | std::views::transform([](auto const& sm) {
        return std::string_view(sm.first, sm.second);
      });

      for (auto const& sv : lower) {
        std::cout << sv << '\n';
      }
    }
This example seems sound, having lower be a range of string_view that should refer back into text, which is in scope for all this time. The std::regex object is also in scope for all this time.
However, the example fails in transform_view's iterator with heap-use-after-free.
The problem here is ultimately that regex_iterator is a stashing iterator (it has a member match_results) yet advertises itself as a forward_iterator (despite violating 24.3.5.5 [forward.iterators] p6 and 24.3.4.11 [iterator.concept.forward] p3).
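The stashing behavior can be observed without invoking undefined behavior. The following is a minimal sketch; the function name copies_refer_to_distinct_objects is invented for illustration. It relies only on regex_iterator::operator* returning a reference to the iterator's own match_results member, so two iterators that compare equal still dereference to different objects, which is what the forward iterator requirements rule out.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns true when two equal regex_iterators
// dereference to distinct match_results objects.
bool copies_refer_to_distinct_objects() {
    const std::string text = "Hello";
    const std::regex re{"[a-z]"};

    std::sregex_iterator a(text.begin(), text.end(), re);
    std::sregex_iterator b = a;  // copy; b stashes its own match_results

    const std::smatch& ma = *a;  // refers to a's member
    const std::smatch& mb = *b;  // refers to b's member

    // A forward iterator requires that a == b implies *a and *b are
    // bound to the same object. Here the addresses differ.
    return a == b && &ma != &mb;
}
```

Calling copies_refer_to_distinct_objects() is expected to return true: the iterators are equal, but each one refers to its own stashed match.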
Then, join_view's iterator stores an outer iterator (the regex_iterator) and an inner_iterator (an iterator into the container that the regex_iterator stashes). Copying that iterator effectively invalidates it, since the new iterator's inner iterator will refer to the old iterator's outer iterator's container. These aren't (and can't be) independent copies.
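The copy problem can be modeled with a simplified stand-in. The types below (StashingOuter, JoinLikeIterator) are hypothetical and are not the library's join_view implementation; they only mirror the shape described above, with an outer object that owns a container and an inner pointer into that container. The sketch compares addresses while the original is still alive, so it does not itself touch freed memory.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for a stashing outer iterator: it owns the
// container that inner iterators point into.
struct StashingOuter {
    std::vector<int> stash;
};

// Hypothetical stand-in for join_view's iterator: an outer iterator
// plus an inner pointer into the outer iterator's own container.
struct JoinLikeIterator {
    StashingOuter outer;
    const int* inner;

    explicit JoinLikeIterator(StashingOuter o)
        : outer(std::move(o)), inner(outer.stash.data()) {}
    // The implicit copy constructor copies `inner` verbatim.
};

// Returns true when a copy's inner pointer still targets the
// original's container rather than the copy's own.
bool copy_points_into_original() {
    JoinLikeIterator original(StashingOuter{{1, 2, 3}});
    JoinLikeIterator copy = original;

    return copy.inner == original.outer.stash.data() &&
           copy.inner != copy.outer.stash.data();
}
```

In this model, destroying original after the copy would leave copy.inner dangling, which corresponds to the failure mode the issue describes.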
In this particular example, join_view's begin iterator is copied into the transform_view's iterator, and then the original is destroyed (which owns the container that the new inner iterator still points to), which causes us to have a dangling iterator.
Note that the example is well-formed in libc++ because libc++ moves instead of copying an iterator,
which happens to work. But I can produce other non-transform-view related examples that fail.
This is actually two different problems:
1. regex_iterator is really an input iterator, not a forward iterator. It does not meet either the C++17 or the C++20 forward iterator requirements.
2. join_view can't handle stashing iterators, and would need to additionally store the outer iterator in a non-propagating-cache for input ranges (similar to how it already potentially stores the inner iterator in a non-propagating-cache).
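For readers unfamiliar with the exposition-only non-propagating-cache, the following is a rough sketch of its copy and move behavior. The class name NonPropagatingCache and its member names are assumptions made for illustration; this is not the library's implementation and omits parts of the specified interface. The point is that copying or moving never carries the cached value along, so a copied view cannot inherit an iterator that points into another object's storage.

```cpp
#include <cassert>
#include <optional>
#include <utility>

// Illustrative sketch only: an optional-like wrapper whose copies
// and moves never propagate the cached value.
template <class T>
class NonPropagatingCache {
    std::optional<T> value_;

public:
    NonPropagatingCache() = default;

    // Copy construction yields an empty cache.
    NonPropagatingCache(const NonPropagatingCache&) noexcept {}

    // Move construction yields an empty cache and empties the source.
    NonPropagatingCache(NonPropagatingCache&& other) noexcept {
        other.value_.reset();
    }

    // Copy assignment from a different object clears this cache.
    NonPropagatingCache& operator=(const NonPropagatingCache& other) noexcept {
        if (this != &other) value_.reset();
        return *this;
    }

    // Move assignment clears both caches.
    NonPropagatingCache& operator=(NonPropagatingCache&& other) noexcept {
        value_.reset();
        other.value_.reset();
        return *this;
    }

    template <class... Args>
    T& emplace(Args&&... args) {
        return value_.emplace(std::forward<Args>(args)...);
    }

    bool has_value() const noexcept { return value_.has_value(); }
    T& operator*() { return *value_; }
};
```

With this sketch, a cache that holds a value produces an empty cache when copied, while the source keeps its value; moving from it leaves both empty.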
(So potentially this could be two different LWG issues, but it seems nicer to think of them together.)
[2022-05-17; Reflector poll]
Set priority to 2 after reflector poll.
[Kona 2022-11-08; Move to Open]
Tim to write a paper
[2023-01-16; Tim comments]
The paper P2770R0 is provided with proposed wording.
[2023-03-22 Resolved by the adoption of P2770R0 in Issaquah. Status changed: Open → Resolved.]
Proposed resolution: