2932. Constraints on parallel algorithm implementations are underspecified

Section: 26.3.3 [algorithms.parallel.exec] Status: C++20 Submitter: Dietmar Kühl Opened: 2017-02-05 Last modified: 2021-02-25

Priority: Not Prioritized

View all other issues in [algorithms.parallel.exec].

View all issues with C++20 status.

Discussion:

Section 26.3.3 [algorithms.parallel.exec] specifies constraints a user of the parallel algorithms has to obey. Notably, it specifies in paragraph 3 that executions of element access functions are indeterminately sequenced with respect to each other. Correspondingly, it is the user's obligation to ensure that these calls do not introduce data races (this is also clarified in a note on this section).

Unfortunately, there is no constraint requiring that mutating element access functions, such as operator++() on an iterator, are only called on different objects. An odd implementation of a parallel algorithm could increment a shared iterator from two threads without any synchronisation of its own, and the user would be obliged to make sure there is no data race!

For example:

#include <algorithm>
#include <execution>
#include <future>
#include <iterator>

template <typename FwdIt>
FwdIt adjacent_find(std::execution::parallel_policy, FwdIt it, FwdIt end)
{
  if (2 <= std::distance(it, end)) {
    FwdIt tmp(it);
    // Both tasks increment the same iterator object without synchronisation.
    auto f1 = std::async([&tmp](){ ++tmp; });
    auto f2 = std::async([&tmp](){ ++tmp; });
    f1.get();
    f2.get();
  }
  return std::adjacent_find(it, end);
}

This code is obviously a contrived example, but under the current specification it is a legal implementation of adjacent_find(). The problem is that, e.g., for pointers there is a data race when incrementing tmp, i.e., the function can't be used with pointers. I don't think any of the containers guarantees that its iterators can be incremented without synchronisation, i.e., the standard library doesn't provide any iterators which could be used with this algorithm!
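
To make the consequence concrete, here is a minimal self-contained sketch; it is not taken from the issue, it merely reproduces what the contrived implementation above does when instantiated with plain pointers:

#include <future>

// Two unsynchronised increments of the same pointer object: a data race and
// therefore undefined behaviour. The issue argues that standard container
// iterators give no stronger guarantee than pointers do here.
int main()
{
  int values[] = {1, 2, 2, 3};
  int* tmp = values;
  auto f1 = std::async([&tmp]{ ++tmp; }); // unsynchronised write to tmp
  auto f2 = std::async([&tmp]{ ++tmp; }); // racing write to the same object
  f1.get();
  f2.get();
}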

Of course, no sane implementation would do anything like that. However, implementations are allowed to look like the one above, which makes it unnecessarily hard and probably inefficient to use the algorithms: in portable code, any user of the parallel algorithms needs to deal with the possibility that mutating operations on the same object are called from different threads. What is missing is a constraint on parallel algorithm implementations which requires that at least some element access functions are applied only to different objects, unless the algorithm itself synchronises the accesses. At the very least, obviously mutating operations like the iterator increment/decrement operators need to be constrained accordingly.

On the other hand, it seems reasonable for a shared data structure to store, e.g., a predicate used concurrently by all threads; there is nothing wrong with that use. That is, demanding that all element access functions be called on different objects from different threads would possibly over-constrain the algorithms. Only the element access functions that deliberately change the object need to be constrained.
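
As an illustration of this distinction, here is a minimal sketch (hypothetical, not part of the issue) of the usage pattern that should remain legal: the predicate object is shared between the two threads but only invoked through its const call operator, while every iterator that gets incremented is a thread-local copy.

#include <future>
#include <iterator>

template <typename FwdIt, typename Predicate>
FwdIt find_if_split(FwdIt first, FwdIt last, const Predicate& pred)
{
  FwdIt mid = std::next(first, std::distance(first, last) / 2);
  auto search = [&pred](FwdIt from, FwdIt to) {
    for (; from != to; ++from)   // mutates only the thread-local copy 'from'
      if (pred(*from))           // concurrent const calls on the shared predicate
        return from;
    return to;
  };
  auto f1 = std::async(search, first, mid);
  auto f2 = std::async(search, mid, last);
  FwdIt r1 = f1.get();
  FwdIt r2 = f2.get();
  return r1 != mid ? r1 : r2;
}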

Based on a review of all algorithms in the standard library taking an ExecutionPolicy template parameter, their template parameters fall broadly into four groups:

  1. Parameters with a known set of possible arguments: ExecutionPolicy (execution policies listed in 26.3.6 [execpol]).

  2. Parameters specifying types of objects which are expected not to change: BinaryOperation, BinaryPredicate, Compare, Function, Predicate, UnaryFunction, UnaryOperation, and T (all but the last one are function objects although I couldn't locate concepts for some of them — that may be a separate issue).

  3. Parameters of mutable types which are also meant to be mutated: InputIterator, OutputIterator, ForwardIterator, BidirectionalIterator, RandomAccessIterator, and Size (the concept for Size also seems to be unspecified).

  4. Some algorithms use Generator, which seems to be a mutable function object. However, I couldn't locate a concept for this parameter (see the sketch after this list).
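
A minimal illustration of group 4 (hypothetical type, not from the issue): a Generator is typically a function object whose call operator is non-const and mutates its own state, so unsynchronised concurrent calls to operator() on a single such object would race on that state.

// A stateful generator in the spirit of group 4: operator() is non-const
// and modifies the member next_ on every call.
class counting_generator {
  int next_ = 0;
public:
  int operator()() { return next_++; }
};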

The problematic groups are 3 and possibly 4: mutations of the objects are expected. It seems the general approach of disallowing unsynchronised calls to non-const functions applies. Note, however, that prohibiting the algorithms from calling any non-const function would put undue burden on their implementations: any of the accessor functions may be non-const, even though the concept assumes that the function is const. The constraint should thus only apply to functions which may mutate the object according to their respective concept.
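
The following sketch (hypothetical type, not from the issue) shows why a blanket ban on unsynchronised non-const calls would be too strong: the dereference operator below is not const-qualified even though it does not modify the iterator, so only the operations the concept actually specifies as mutating (here, ++) should be constrained.

template <typename T>
class loose_iterator {
  T* ptr_ = nullptr;
public:
  explicit loose_iterator(T* p) : ptr_(p) {}
  T& operator*() { return *ptr_; }                        // non-const, but does not modify *this
  loose_iterator& operator++() { ++ptr_; return *this; }  // genuinely mutating
};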

Suggested Resolution:

Add a statement prohibiting unsequenced calls to element access functions on the same object which are not applicable to const objects according to the corresponding concept. I'm not sure how to best specify the constraint in general, though.

Since the current algorithms use relatively few concepts, there are fairly few operations actually affected. It may be reasonable, at least for an initial version (and until we can refer to constructs in concepts in the language), to explicitly list the affected operations. I haven't done a full audit, but iterator ++, --, @= (for @ being any of the operators which can be combined with an assignment), and assignments on all objects may be the set of affected element access functions whose use needs to be constrained.

Here is a concrete proposal for the change: In 26.3.2 [algorithms.parallel.user] add a paragraph:

Parallel algorithms are constrained when calling mutating element access functions without synchronisation: if any mutating element access function is called on an object, there shall be no other unsynchronised accesses to this object. The mutating element access functions are those which are specified as mutating the object in the concept, notably assignment on any object, operators ++, --, +=, and -= on any of the iterator or Size parameters, and any @= operators on the Size parameters.

Previous resolution [SUPERSEDED]:

This wording is relative to N4640.

  1. Modify 26.3.2 [algorithms.parallel.user] as indicated:

    -1- Function objects passed into parallel algorithms as objects of type Predicate, BinaryPredicate, Compare, and BinaryOperation shall not directly or indirectly modify objects via their arguments.

    -?- Parallel algorithms are constrained when calling mutating element access functions without synchronisation: If any mutating element access function is called on an object there shall be no other unsynchronised accesses to this object. The mutating element access functions are those which are specified as mutating the object in the concept, notably assignment on any object, operators ++, --, +=, and -= on any of the iterator or Size parameters, and any @= operators on the Size parameters.

[2017-03-03, Kona]

Dietmar provides improved wording. Issues with the PR before the change:

Previous resolution [SUPERSEDED]:

This wording is relative to N4640.

  1. Modify 26.3.2 [algorithms.parallel.user] as indicated:

    -1- Function objects passed into parallel algorithms as objects of type Predicate, BinaryPredicate, Compare, and BinaryOperation shall not directly or indirectly modify objects via their arguments.

    -?- If an object is mutated by an element access function the algorithm will perform no other unsynchronized accesses to that object. The mutating element access functions are those which are specified as mutating the object in the relevant concept, such as swap(), ++, --, @=, and assignments. For the assignment and @= operators only the left argument is mutated.

[2017-03-03, Kona]

Dietmar fine-tunes the wording after review by SG1.

[2017-03-03, Kona]

Move to Ready

Proposed resolution:

This wording is relative to N4640.

  1. Add a new paragraph following 26.3.3 [algorithms.parallel.exec] p1 as indicated:

    -1- Parallel algorithms have template parameters named ExecutionPolicy (20.19) which describe the manner in which the execution of these algorithms may be parallelized and the manner in which they apply the element access functions.

    -?- If an object is modified by an element access function the algorithm will perform no other unsynchronized accesses to that object. The modifying element access functions are those which are specified as modifying the object in the relevant concept [Note: For example, swap(), ++, --, @=, and assignments modify the object. For the assignment and @= operators only the left argument is modified. — end note].

    -2- […]
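
For illustration only (this is not part of the resolution), a sketch of what the new paragraph permits and forbids in an implementation: each task may increment its own iterator copy, but two tasks may no longer apply unsynchronised ++ to one shared iterator object, since ++ modifies the object it is applied to.

#include <future>

template <typename FwdIt>
void still_allowed(FwdIt a, FwdIt b)
{
  auto f1 = std::async([a]() mutable { ++a; });  // each task modifies its own copy
  auto f2 = std::async([b]() mutable { ++b; });
  f1.get();
  f2.get();
}

template <typename FwdIt>
void now_non_conforming(FwdIt shared)
{
  auto f1 = std::async([&shared] { ++shared; }); // unsynchronised modification of 'shared'
  auto f2 = std::async([&shared] { ++shared; }); // racing access to the same object
  f1.get();
  f2.get();
}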