Section: 26.8.7 [alg.set.operations] Status: CD1 Submitter: Matt Austern Opened: 2001-01-03 Last modified: 2016-01-28
Priority: Not Prioritized
View all other issues in [alg.set.operations].
View all issues with CD1 status.
Discussion:
The standard library contains four algorithms that compute set
operations on sorted ranges: set_union
, set_intersection
,
set_difference
, and set_symmetric_difference
. Each
of these algorithms takes two sorted ranges as inputs, and writes the
output of the appropriate set operation to an output range. The elements
in the output range are sorted.
The ordinary mathematical definitions are generalized so that they
apply to ranges containing multiple copies of a given element. Two
elements are considered to be "the same" if, according to an
ordering relation provided by the user, neither one is less than the
other. So, for example, if one input range contains five copies of an
element and another contains three, the output range of set_union
will contain five copies, the output range of
set_intersection
will contain three, the output range of
set_difference
will contain two, and the output range of
set_symmetric_difference
will contain two.
Because two elements can be "the same" for the purposes of these set algorithms, without being identical in other respects (consider, for example, strings under case-insensitive comparison), this raises a number of unanswered questions:
The standard should either answer these questions, or explicitly say that the answers are unspecified. I prefer the former option, since, as far as I know, all existing implementations behave the same way.
Proposed resolution:
Add the following to the end of 26.8.7.3 [set.union] paragraph 5:
If [first1, last1) contains m elements that are equivalent to each other and [first2, last2) contains n elements that are equivalent to them, then max(m, n) of these elements will be copied to the output range: all m of these elements from [first1, last1), and the last max(n-m, 0) of them from [first2, last2), in that order.
Add the following to the end of 26.8.7.4 [set.intersection] paragraph 5:
If [first1, last1) contains m elements that are equivalent to each other and [first2, last2) contains n elements that are equivalent to them, the first min(m, n) of those elements from [first1, last1) are copied to the output range.
Add a new paragraph, Notes, after 26.8.7.5 [set.difference] paragraph 4:
If [first1, last1) contains m elements that are equivalent to each other and [first2, last2) contains n elements that are equivalent to them, the last max(m-n, 0) elements from [first1, last1) are copied to the output range.
Add a new paragraph, Notes, after 26.8.7.6 [set.symmetric.difference] paragraph 4:
If [first1, last1) contains m elements that are equivalent to each other and [first2, last2) contains n elements that are equivalent to them, then |m - n| of those elements will be copied to the output range: the last m - n of these elements from [first1, last1) if m > n, and the last n - m of these elements from [first2, last2) if m < n.
[Santa Cruz: it's believed that this language is clearer than what's in the Standard. However, it's also believed that the Standard may already make these guarantees (although not quite in these words). Bill and Howard will check and see whether they think that some or all of these changes may be redundant. If so, we may close this issue as NAD.]
Rationale:
For simple cases, these descriptions are equivalent to what's already in the Standard. For more complicated cases, they describe the behavior of existing implementations.