std::reverse
should be permitted to be vectorizedSection: 26.7.10 [alg.reverse] Status: LEWG Submitter: Billy O'Neal III Opened: 2017-06-24 Last modified: 2018-04-03
Priority: Not Prioritized
View all other issues in [alg.reverse].
View all issues with LEWG status.
Discussion:
The fine folks on our backend team suggested that we special case std::reverse
of 1/2/4/8 to take
advantage of vector units. Unfortunately, at present std::reverse
says it does N/2 iter_swap
s,
which doesn't permit our vector implementation even if the iterator inputs are pointers to trivially copyable T
s.
short
s is
~8x faster on Skylake than the serial version,
and about 7x faster for unsigned long long
s; and users don't actually care whether or not we call swap
here.
[2017-07 Toronto Monday issue prioritization]
Status to LEWG; this is similar to 2973
[2018-04-02, Billy comments]
This issue should be resolved by P0551, because it prohibits user specialization of
std::swap
and std::iter_swap
, which means the proposed vectorization optimization for
pointers-to-trivially-copyable is now implementable without changes to reverse
's specification (We can detect
if the user has provided an alternate swap
in their own namespace, but not if they explicitly specialized
swap
or iter_swap
).
Proposed resolution:
This wording is relative to N4659.
Edit 26.7.10 [alg.reverse] as indicated:
template<class BidirectionalIterator> void reverse(BidirectionalIterator first, BidirectionalIterator last); template<class ExecutionPolicy, class BidirectionalIterator> void reverse(ExecutionPolicy&& exec, BidirectionalIterator first, BidirectionalIterator last);-1- Requires:
-2- Effects: For each non-negative integer*first
shall be swappable (16.4.4.3 [swappable.requirements]).i < (last - first) / 2
, appliesiter_swap
to all pairs of iteratorsfirst + i, (last - i) - 1
. Ifis_trivially_copyable_v<typename iterator_traits<BidirectionalIterator>::value_type>
istrue
, an implementation may permute the elements by making temporary copies, rather than by callingiter_swap
. [Note: this allows the implementation to be vectorized. — end note]