3968. std::endian::native value should be more specific about object representations

Section: 22.11.8 [bit.endian] Status: New Submitter: Brian Bi Opened: 2023-08-06 Last modified: 2024-02-22

Priority: 4


Discussion:

22.11.8 [bit.endian] says that "big-endian" and "little-endian" refer to whether bytes are stored in descending or ascending order of significance. In other words, when std::endian::native is either std::endian::big or std::endian::little, we are told something about the object representations of multi-byte scalar types. However, the guarantee provided in this case is not strong enough to fully specify the object representation, even in the common situation where padding bits are not present. It would be more useful to provide a stronger guarantee.

Consider, for example, a platform where char is 8 bits and a uint32_t type exists. If std::endian::native is std::endian::little, then the program should be able to rely on the fact that if a uint32_t object is copied into an array of 4 unsigned char, then the value of the first element of that array equals the original value modulo 256. However, because P1236R1 removed the core language specification of the value representation of unsigned integer types, the program cannot actually rely on this. It is conceivable, though unlikely, that std::endian::native could be std::endian::little while the first byte of a uint32_t object holds the least significant 8 bits flipped, or permuted, or otherwise transformed.

[2024-02-22; Reflector poll]

Set priority to 4 after reflector poll in August 2023.

[Jonathan expressed shock that P1236R1 removed portability guarantees that were previously present.]

[Jens explained that no observable guarantees were ever present anyway, which is why Core removed the wording.]

I agree with the thrust of the issue (i.e. the special values for std::endian should permit reliance on a particular object representation), but I disagree with the wording chosen. The "pure binary" phrasing that is sort-of defined in a footnote is bad. I think we want to say that all scalar types have no padding bits and that the base-2 representation of an unsigned integer type is formed by the bit concatenation of the base-2 representations of the "unsigned char" values that comprise the object representation of that unsigned integer type. "Bit concatenation" should best be phrased in math, e.g. given a value x of some unsigned integer type and the sequence of unsigned char values $c_j$ (each having width M) comprising the object representation of x, the coefficients of the base-2 representation of x are $x_i = c_{\lfloor i/M \rfloor,\, i \bmod M}$ (where $c_{j,k}$ is coefficient $k$ of the base-2 representation of $c_j$) or somesuch. See 7.6.11 [expr.bit.and] for some phrasing in this area.

Proposed resolution:

This wording is relative to N4950.

  1. Modify 22.11.8 [bit.endian] as indicated, reusing wording from C++17 that has since been removed:

    -2- [Remove:] If all scalar types have size 1 byte, then all of endian::little, endian::big, and endian::native have the same value. Otherwise, endian::little is not equal to endian::big. If all scalar types are big-endian, endian::native is equal to endian::big. If all scalar types are little-endian, endian::native is equal to endian::little. Otherwise, endian::native is not equal to either endian::big or endian::little.

    [Add:] endian::little is equal to endian::big if and only if all scalar types have size 1 byte. If the value representation (6.8 [basic.types]) of every unsigned integer type uses a pure binary numeration system[footnote ?], then:

    • If all scalar types have size 1 byte, then endian::native is equal to the common value of endian::little and endian::big.

    • Otherwise, if all scalar types are big-endian, endian::native is equal to endian::big.

    • Otherwise, if all scalar types are little-endian, endian::native is equal to endian::little.

    • Otherwise, endian::native is not equal to either endian::big or endian::little.

    Otherwise, endian::native is not equal to either endian::big or endian::little.

    footnote ?) A positional representation for integers that uses the binary digits 0 and 1, in which the values represented by successive bits are additive, begin with 1, and are multiplied by successive integral powers of 2, except perhaps for the bit with the highest position. (Adapted from the American National Dictionary for Information Processing Systems.)