std::endian::native value should be more specific about object representations
Section: 22.11.8 [bit.endian] Status: New Submitter: Brian Bi Opened: 2023-08-06 Last modified: 2024-02-22
Priority: 4
Discussion:
22.11.8 [bit.endian] says that "big-endian" and "little-endian" refer to whether bytes are stored in descending or ascending order of significance. In other words, when std::endian::native is either std::endian::big or std::endian::little, we are told something about the object representations of multi-byte scalar types. However, the guarantee provided in this case is not strong enough to fully specify the object representation, even in the common situation where padding bits are not present. It would be more useful to provide a stronger guarantee.
Suppose that char is 8 bits and there is a uint32_t type on the current platform. If std::endian::native is std::endian::little, then the program should be able to rely on the fact that if a uint32_t object is copied into an array of 4 unsigned char, then the value of the first element of that array actually equals the original value modulo 256. However, because P1236R1 removed the core language specification of the value representation of unsigned integer types, the program cannot actually rely on this. It is conceivable (though unlikely), for example, that std::endian::native could be std::endian::little but the first byte in a uint32_t object is actually the least significant 8 bits flipped, or the least significant 8 bits permuted, or something like that.
[2024-02-22; Reflector poll]
Set priority to 4 after reflector poll in August 2023.
[Jonathan expressed shock that P1236R1 removed portability guarantees that were previously present.]
[Jens explained that no observable guarantees were ever present anyway, which is why Core removed the wording.]
I agree with the thrust of the issue (i.e. the special values for std::endian should permit reliance on a particular object representation), but I disagree with the wording chosen. The "pure binary" phrasing that is sort-of defined in a footnote is bad. I think we want to say that all scalar types have no padding bits and that the base-2 representation of an unsigned integer type is formed by the bit concatenation of the base-2 representations of the "unsigned char" values that comprise the object representation of that unsigned integer type.
"Bit concatenation" should best be phrased in math, e.g. given a value $x$ of some unsigned integer type and the sequence of unsigned char values $c_j$ (each having width $M$) comprising the object representation of $x$, the coefficients of the base-2 representation of $x$ are
$$x_i = \left(c_{\lfloor i/M \rfloor}\right)_{i \bmod M}$$
where $(c_j)_k$ denotes coefficient $k$ of the base-2 representation of $c_j$, or somesuch. See 7.6.11 [expr.bit.and] for some phrasing in this area.
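The bit-concatenation formula above can be sketched in code. This is an illustration under stated assumptions, not proposed wording: the names M, Bytes, coefficient, and concatenate are hypothetical, char is assumed to be 8 bits wide, and the indexing places the first byte at the least significant position, which is the little-endian reading of the formula (a big-endian variant would index the bytes in reverse).

```cpp
#include <array>
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");

// M in the formula: the width of unsigned char.
constexpr std::size_t M = CHAR_BIT;

// The sequence c_j: the bytes comprising a 32-bit object representation.
using Bytes = std::array<unsigned char, sizeof(std::uint32_t)>;

// Coefficient i of x: bit (i mod M) of byte c[floor(i / M)].
constexpr std::uint32_t coefficient(const Bytes& c, std::size_t i) {
    return (static_cast<std::uint32_t>(c[i / M]) >> (i % M)) & 1u;
}

// Rebuild x from its base-2 coefficients.
constexpr std::uint32_t concatenate(const Bytes& c) {
    std::uint32_t x = 0;
    for (std::size_t i = 0; i < c.size() * M; ++i) {
        x |= coefficient(c, i) << i;
    }
    return x;
}

// Bytes listed in ascending order of significance.
void self_check() {
    assert(concatenate({0x78, 0x56, 0x34, 0x12}) == 0x12345678u);
}
```

With this definition, the first byte is exactly the value modulo 256, which is the property the issue asks for in the little-endian case.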
Proposed resolution:
This wording is relative to N4950.
Modify 22.11.8 [bit.endian] as indicated, using removed wording from C++17:
-2-
[Removed:] If all scalar types have size 1 byte, then all of endian::little, endian::big, and endian::native have the same value. Otherwise, endian::little is not equal to endian::big. If all scalar types are big-endian, endian::native is equal to endian::big. If all scalar types are little-endian, endian::native is equal to endian::little. Otherwise, endian::native is not equal to either endian::big or endian::little.
[Added:] endian::little is equal to endian::big if and only if all scalar types have size 1 byte. If the value representation (6.8 [basic.types]) of every unsigned integer type uses a pure binary numeration system (footnote ?), then:
- If all scalar types have size 1 byte, then endian::native is equal to the common value of endian::little and endian::big.
- Otherwise, if all scalar types are big-endian, endian::native is equal to endian::big.
- Otherwise, if all scalar types are little-endian, endian::native is equal to endian::little.
- Otherwise, endian::native is not equal to either endian::big or endian::little.
Otherwise, endian::native is not equal to either endian::big or endian::little.
footnote ?) A positional representation for integers that uses the binary digits 0 and 1, in which the values represented by successive bits are additive, begin with 1, and are multiplied by successive integral powers of 2, except perhaps for the bit with the highest position. (Adapted from the American National Dictionary for Information Processing Systems.)
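The case analysis in the proposed wording can be modeled as a small decision function. This is a reading aid only: PlatformTraits, NativeIs, and classify are hypothetical names, and the four boolean inputs stand in for implementation properties that a real program cannot query directly.

```cpp
#include <cassert>

// Hypothetical model of the implementation properties that the
// proposed wording tests.
struct PlatformTraits {
    bool pure_binary_unsigned;       // every unsigned integer type uses pure binary
    bool all_scalars_one_byte;       // all scalar types have size 1 byte
    bool all_scalars_big_endian;
    bool all_scalars_little_endian;
};

// Hypothetical labels for what endian::native compares equal to.
enum class NativeIs { little_and_big, big, little, neither };

constexpr NativeIs classify(const PlatformTraits& p) {
    // Outer "Otherwise": without pure binary, native matches neither value.
    if (!p.pure_binary_unsigned) return NativeIs::neither;
    if (p.all_scalars_one_byte) return NativeIs::little_and_big;
    if (p.all_scalars_big_endian) return NativeIs::big;
    if (p.all_scalars_little_endian) return NativeIs::little;
    return NativeIs::neither;  // mixed-endian implementation
}

void self_check() {
    // A typical little-endian platform with pure binary unsigned types.
    assert(classify({true, false, false, true}) == NativeIs::little);
}
```

Note the consequence of the outer condition in this model: an implementation whose bytes are in little-endian order but whose unsigned types are not pure binary falls into the final case.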