4134. Issue with Philox algorithm specification

Section: 29.5.4.5 [rand.eng.philox] Status: Ready Submitter: Ilya A. Burylov Opened: 2024-08-06 Last modified: 2024-08-21

Priority: 1

View other active issues in [rand.eng.philox].

View all other issues in [rand.eng.philox].

View all issues with Ready status.

Discussion:

The P2075R6 proposal introduced the Philox engine and described the algorithm closely following the original paper (further Random123sc11).

Matt Stephanson implemented P2075R6 and the 10000'th number did not match. Further investigation revealed several places in Random123sc11 algorithm description, which either deviate from the reference implementation written by Random123sc11 authors or loosely defined, which opens the way for different interpretations.

All major implementations of the Philox algorithm (NumPy, Intel MKL, Nvidia cuRAND, etc.) match the reference implementation written by Random123sc11 authors and we propose to align wording with that.

The rationale of proposed changes:

  1. Random123sc11 refers to the permutation step as "the inputs are permuted using the Threefish N-word P-box", thus the P2075R6 permutation table ([tab:rand.eng.philox.f]) is taken from Threefish algorithm. But the permutation for N=4 in this table does not match the reference implementations. It's worth noting that while Random123sc11 described the Philox algorithm for N=8 and N=16, there are no known reference implementations of it either provided by authors or implemented by other libraries. We proposed to drop N=8 and N=16 for now and update the permutation indices for N=4 to match the reference implementation. Note: the proposed resolution allows extending N > 4 cases in the future.

  2. The original paper describes the S-box update for X values in terms of L' and R' but does not clarify their ordering as part of the counter. In order to match Random123sc11 reference implementation the ordering should be swapped.

  3. Philox alias templates should be updated, because the current ordering of constants matches the specific optimization in the reference Random123sc11 implementation and not the generic algorithm description.

All proposed modifications below are confirmed by:

[2024-08-21; Reflector poll]

Set priority to 1 after reflector poll.

[2024-08-21; Move to Ready at LWG telecon]

Proposed resolution:

This wording is relative to N4988.

  1. Modify 29.5.4.5 [rand.eng.philox], [tab:rand.eng.philox.f] as indicated (This effectively reduces 16 data columns to 4 data columns and 4 data rows to 2 data rows):

    Table 101 — Values for the word permutation fn(j) [tab:rand.eng.philox.f]
    fn(j) j
    0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
    n
    2 0 1
    4 20 13 02 31
    8 2 1 4 7 6 5 0 3
    16 0 9 2 13 6 11 4 15 10 7 12 3 14 5 8 1
  2. Modify 29.5.4.5 [rand.eng.philox] as indicated:

    -4- […]

    1. (4.1) — […]

    2. (4.2) — The following computations are applied to the elements of the V sequence:

      X2k+0 = mulhimullo(V2k+1,Mk,w) xor keykq xor V2k+1

      X2k+1 = mullomulhi(V2k+1,Mk,w) xor keykq xor V2k

    -5- […]

    -6- Mandates:

    1. (6.1) — […]

    2. (6.2) — n == 2 || n == 4 || n == 8 || n == 16 is true, and

    3. (6.3) — […]

    4. (6.4) — […]

  3. Modify 29.5.6 [rand.predef] as indicated:

    using philox4x32 =
          philox_engine<uint_fast32_t, 32, 4, 10,
           0xCD9E8D570xD2511F53, 0x9E3779B9, 0xD2511F530xCD9E8D57, 0xBB67AE85>;
    

    -11- Required behavior: The 10000th consecutive invocation a default-constructed object of type philox4x32 produces the value 1955073260.

    using philox4x64 =
          philox_engine<uint_fast64_t, 64, 4, 10,
           0xCA5A8263951211570xD2E7470EE14C6C93, 0x9E3779B97F4A7C15, 0xD2E7470EE14C6C930xCA5A826395121157, 0xBB67AE8584CAA73B>;
    

    -12- Required behavior: The 10000th consecutive invocation a default-constructed object of type philox4x64 produces the value 3409172418970261260.