2987. Relationship between traits_inst.lookup_collatename and the regex FSM is underspecified with regards to ClassAtomCollatingElement

Section: 32.12 [re.grammar] Status: New Submitter: Hubert Tong Opened: 2017-06-25 Last modified: 2017-07-12 01:35:02 UTC

Priority: 3

View other active issues in [re.grammar].

View all other issues in [re.grammar].

View all issues with New status.


For a user to implement a regular expression traits class meaningfully, the relationship between the return value of traits_inst.lookup_collatename to the behaviour of the finite state machine corresponding to a regular expression needs to be better specified.

From N4660 subclause 31.13 [re.grammar], traits_inst.lookup_collatename only feeds clearly into two operations:

  1. a test if the returned string is empty ([re.grammar]/8), and

  2. a test if the result of traits_inst.transform_primary, with the returned string, is empty ([re.grammar]/10).

Note: It is unclear if bullet 14.3 in [re.grammar]/14 refers to the result of traits_inst.lookup_collatename when it refers to a "collating element"; and if it does, it is unclear what input is to be used.

It is therefore unclear what the effect is if traits_inst.lookup_collatename substitutes another member of the equivalence class as its output.

For example, when processing "[[.AA.]]" as a pattern under a locale da_DK.utf8, what is the expected behaviour difference (if any) should traits_inst.lookup_collatename return, for "AA", "\u00C5" (where U+00C5 is A with ring, which sorts the same as "AA")?

[2017-07 Toronto Monday issue prioritization]

Priority 3

Proposed resolution: