2985. std::reverse should be permitted to be vectorized

Section: 26.7.10 [alg.reverse] Status: LEWG Submitter: Billy O'Neal III Opened: 2017-06-24 Last modified: 2018-04-03

Priority: Not Prioritized

View all other issues in [alg.reverse].

View all issues with LEWG status.

Discussion:

The fine folks on our backend team suggested that we special case std::reverse of 1/2/4/8 to take advantage of vector units. Unfortunately, at present std::reverse says it does N/2 iter_swaps, which doesn't permit our vector implementation even if the iterator inputs are pointers to trivially copyable Ts.

The vectorized version for pointers to shorts is ~8x faster on Skylake than the serial version, and about 7x faster for unsigned long longs; and users don't actually care whether or not we call swap here.

[2017-07 Toronto Monday issue prioritization]

Status to LEWG; this is similar to 2973

[2018-04-02, Billy comments]

This issue should be resolved by P0551, because it prohibits user specialization of std::swap and std::iter_swap, which means the proposed vectorization optimization for pointers-to-trivially-copyable is now implementable without changes to reverse's specification (We can detect if the user has provided an alternate swap in their own namespace, but not if they explicitly specialized swap or iter_swap).

Proposed resolution:

This wording is relative to N4659.

  1. Edit 26.7.10 [alg.reverse] as indicated:

    template<class BidirectionalIterator>
      void reverse(BidirectionalIterator first, BidirectionalIterator last);
    template<class ExecutionPolicy, class BidirectionalIterator>
      void reverse(ExecutionPolicy&& exec,
                   BidirectionalIterator first, BidirectionalIterator last);
    

    -1- Requires: *first shall be swappable (16.4.4.3 [swappable.requirements]).

    -2- Effects: For each non-negative integer i < (last - first) / 2, applies iter_swap to all pairs of iterators first + i, (last - i) - 1. If is_trivially_copyable_v<typename iterator_traits<BidirectionalIterator>::value_type> is true, an implementation may permute the elements by making temporary copies, rather than by calling iter_swap. [Note: this allows the implementation to be vectorized. — end note]