490. std::unique wrongly specified

Section: 26.7.9 [alg.unique] Status: NAD Submitter: Thomas Mang Opened: 2004-12-12 Last modified: 2016-01-28

Priority: Not Prioritized

View all other issues in [alg.unique].

View all issues with NAD status.

Discussion:

In Section 25.2.8 [lib.alg.unique], paragraphs 1 to 3 describe the behavior of the mutating sequence operation std::unique. However, the wording does not reflect the intended behavior [Note: See definition of intended behavior below] of these algorithms, as it is known to the C++ community [1].

1) Analysis of current wording:

25.2.8 [lib.alg.unique], paragraph 1:

Current wording says: "Effects: Eliminates all but the first element from every consecutive group of equal elements referred to by the iterator i in the range [first, last) for which the following corresponding conditions hold: *i == *(i - 1) or pred(*i, *(i -1)) != false"

This sentences expresses specifically that all elements denoted by the (original) range [first, last) which are not but the first element from a consecutive group of equal elements (where equality is defined as *i == *(i - 1) or pred(*i, *(i - 1)) ! = false) [Note: See DR 202], call them r-elements [Note: r...remove], will be eliminated. Since there is no formal definition of the term "eliminate" provided, it is undefined how this "elimination" takes place. But the meaning of "eliminate" in everyday language seems to disallow explicitly that after application of the algorithm, any r-element will remain at any position of the range [first, last) [2].

Another defect in the current wording concerns the iterators used to compare two elements for equality: The current wording contains the expression "(i - 1)", which is not covered by 25/9 [Note: See DR submitted by Thomas Mang regarding invalid iterator arithmetic expressions].

25.2.8 [lib.alg.unique], paragraph 2:

Current wording says: "Returns: The end of the resulting range."

The resulting range is not specified. In combination with 25.2.8 [lib.alg.unique], paragraph 1, one reasonable interpretation (in the author's opinion even the only possible interpretation) of this so-called resulting range is the range [first, last) - thus returning always the ForwardIterator 'last' parameter.

2) Description of intended behavior:

For the rest of this Defect Report, it is assumed that the intended behavior was that all elements denoted by the original range [first, last) which are the first element from a consecutive group of elements for which the corresponding conditions: *(i-1) == *i (for the version of unique without a predicate argument) or pred(*(i-1), *i) ! = false (for the version of unique with a predicate argument) [Note: If such a group of elements consists of only a single element, this is also considered the first element] [Note: See resolutions of DR 202], call them s-elements [Note: s...stay], will be placed into a contiguous subrange of [first, last), denoted by the iterators [first, return value). The number of elements in the resulting range [first, return value) shall be equal to the number of s-elements in the original range [first, last). Invalid iterator arithmetic expressions are expected to be resolved as proposed in DR submitted by Thomas Mang regarding invalid iterator arithmetic expressions. It is also assumed by the author that the relative order of the elements in the resulting subrange [first, return value) shall be the same as the relative order of the corresponding elements (the s-elements) in the original range [Note: If this was not intended behavior, the additional proposed paragraph about stable order will certainly become obsolete]. Furthermore, the resolutions of DR 202 are partially considered.

All implementations known to the author of this Defect Report comply with this intent [Note: Except possible effects of DR 202]. Since this intent of the behavior (contrary to the current wording) is also described in various utility references serving the C++ community [1], it is not expected that fixing the paragraphs will influence current code [Note: Except possible effects of DR 202] - unless the code relies on the behavior as it is described by current wording and the implementation indeed reflects the current wording, and not the intent.

3) Proposed fixes:

Change 25.2.8 [lib.alg.unique], paragraph 1 to:

"Effect: Places the first element from every consecutive group of elements, referred to by the iterator i in the range [first, last), for which the following conditions hold: *(i-1) == *i (for the version of unique without a predicate argument) or pred(*(i -1), *i) != false (for the version of unique with a predicate argument), into the subrange [first, k) of the original range, where k shall denote a value of type ForwardIterator."

Comments to the new wording:

a) The new wording was influenced by the resolutions of DR 202. If DR 202 is resolved in another way, the proposed wording need also additional review. b) "Places" has no special meaning, and the everyday language meaning should fit. c) The expression "(i - 1)" was left, but is expected that DR submitted by Thomas Mang regarding invalid iterator arithmetic expressions will take this into account. d) The wording "(for the version of unique without a predicate argument)" and "(for the version of unique with a predicate argument)" was added consciously for clarity and is in resemblence with current 23.2.2.4 [lib.list.ops], paragraph 19. It might be considered redundant. e) The wording "of the original range" might be redundant, since any subrange starting at first and containing no more elements than the original range is implicitly a subrange of the original range [first, last). f) The iterator k was introduced instead of "return value" in order to avoid a cyclic dependency on 25.2.8 [lib.alg.unique], paragraph 2. The wording ", where k shall denote a value of type ForwardIterator" might be redundant, because it follows implicitly by 25.2.8 [lib.alg.unique], paragraph 2. g) "Places" does, in the author's opinion, explicitly forbid duplicating any s-element in the original range [first, last) within the resulting range [first, k). If there is doubt this term might be not unambiguous regarding this, it is suggested that k is specified more closely by the following wording: "k shall denote a value of type ForwardIterator [Note: See f)] so that k - first is equal to the number of elements in the original range [first, last) being the first element from every consecutive group of elements for which the corresponding condition did hold". This could also be expressed as a separate paragraph "Postcondition:". h) If it is considered that the wording is unclear whether it declares the element of a group which consists of only a single element implicitly to be the first element of this group [Note: Such an interpretation could eventually arise especially in case last - first == 1] , the following additional sentence is proposed: "If such a group of elements consists of only a single element, this element is also considered the first element."

Change 25.2.8 [lib.alg.unique], paragraph 2 to: "Returns: The iterator k."

Add a separate paragraph "Notes:" as 25.2.8 [lib.alg.unique], paragraph 2a or 3a, or a separate paragraph "Postcondition:" before 25.2.8 [lib.alg.unique], paragraph 2 (wording inside {} shall be eliminated if the preceding expressions are used, or the preceding expressions shall be eliminated if wording inside {} is used):

"Notes:{Postcondition:} Stable: the relative order of the elements that are placed into the subrange [first, return value {k}) shall be the same as their relative order was in the original range [first, last) prior to application of the algorithm."

Comments to the new wording:

a) It is assumed by the author that the algorithm was intended to be stable. In case this was not the intent, this paragraph becomes certainly obsolete. b) The wording "was ... prior to application of the algorithm" is used to explicitly distinguish the original range not only by means of iterators, but also by a 'chronological' factor from the resulting range [first, return value). It might be redundant.

25.2.8 [lib.alg.unique], paragraph 3:

See DR 239.

4) References to other DRs:

See DR 202, but which does not address any of the problems described in this Defect Report [Note: This DR is supposed to complement DR 202]. See DR 239. See DR submitted by Thomas Mang regarding invalid iterator arithmetic expressions.

[1]: The wording of these references is not always unambiguous, and provided examples partially contradict verbal description of the algorithms, because the verbal description resembles the problematic wording of ISO/IEC 14882:2003.

[2]: Illustration of conforming implementations according to current wording:

One way the author of this DR considers how this "elimination" could be achieved by a conforming implementation according to current wording is by substituting each r-element by _any_ s-element [Note: s...stay; any non-r-element], since all r-elements are "eliminated".

In case of a sequence consisting of elements being all 'equal' [Note: See DR 202], substituting each r-element by the single s-element is the only possible solution according to current wording.

Proposed resolution:

Rationale:

The LWG believes the standard is sufficiently clear. No implementers get it wrong, and changing it wouldn't cause any code to change, so there is no real-world harm here.