Issue 2524: generate_canonical can occasionally return 1.0

2524. `generate_canonical` can occasionally return 1.0

Section: 29.5.9.4.2 [rand.dist.pois.exp] Status: Resolved Submitter: Michael Prähofer Opened: 2015-08-20 Last modified: 2024-04-02

Priority: 2

View all issues with Resolved status.

Discussion:

Original title was: exponential_distribution<float> sometimes returns inf.

The random number distribution class template exponential_distribution<float> may return "inf" as can be seen from the following example program:

// compiled with
// g++ -std=c++11 Error_exp_distr.cpp

#include <iostream>
#include <random>
#include <bitset>

int main(){
  unsigned long long h;
  std::mt19937_64 mt1(1);
  std::mt19937_64 mt2(1);
  mt1.discard(517517);
  mt2.discard(517517);
  std::exponential_distribution<float> dis(1.0);
  h = mt2();
  std::cout << std::bitset<64>(h) << " " << (float) -log(1 - h/pow(2, 64)) << " " 
            << -log(1 - (float) h/pow(2, 64)) << " " << dis(mt1) << std::endl;
  h = mt2();
  std::cout << std::bitset<64>(h) << " " << (float) -log(1 - h/pow(2, 64)) << " " 
            << -log(1 - (float) h/pow(2, 64)) << " " << dis(mt1) << std::endl;
}

output:

0110010110001001010011000111000101001100111110100001110011100001 0.505218 0.505218 0.505218
1111111111111111111111111101010011000110011110011000110101100110 18.4143 inf inf

The reason seems to be that converting a double x in the range [0, 1) to float may result in 1.0f if x is close enough to 1. I see two possibilities to fix that:

use internally double (or long double?) and then convert the result at the very end to float.
take only 24 random bits and convert them to a float x in the range [0, 1) and then return -log(1 - x).

I have not checked if std::exponential_distribution<double> has the same problem: For float on the average 1 out of 2²⁴ (~10⁷) draws returns "inf", which is easily confirmed. For double on the average 1 out of 2⁵³ (~10¹⁶) draws might return "inf", which I have not tested.

Marshall:
I don't think the problem is in std::exponential_distribution; but rather in generate_canonical.

Consider:

std::mt19937_64 mt2(1);
mt2.discard(517517);
std::cout << std::hexfloat << std::generate_canonical<float, std::numeric_limits<float>::digits>(mt2) << std::endl;
std::cout << std::hexfloat << std::generate_canonical<float, std::numeric_limits<float>::digits>(mt2) << std::endl;
std::cout << std::hexfloat << std::generate_canonical<float, std::numeric_limits<float>::digits>(mt2) << std::endl;

which outputs:

0x1.962532p-2
0x1p+0
0x1.20d0cap-3

but generate_canonical is defined to return a result in the range [0, 1).

[2015-10, Kona Saturday afternoon]

Options:

1) NAD, make it work. E.g. nudge slightly, making the hightest result return the highest allowable value (next_after). This might change the distribution slightly.
2) Re-run the algorithm when you get a 1. This changes the specification in that it makes extra calls to the random engine.
3) Something else

WEB: The one thing we cannot tolerate is any output range other than [0, 1).

WEB: I believe there may be a documented algorithm for the generator, and perhaps it's possible to discover en-route that the algorithm produces the wrong result and fix it.

MC: No. I analyzed this once, and here it is: the algorithm is in [rand.util.canonical], and it's all fine until p5. The expression S/R^k is mathematically less than one, but it may round to one.

GR: Could we change the rounding mode for the computation?

HH: No, because the rounding mode is global, not thread-local.

AM: SG1 wants to get rid of the floating point environment.

STL: The problem is that the standard specifies the implementation, and the implementation doesn't work.

MC: I'm not sure if nudging it down will introduce a subtle bias.

EF: I worry about how the user's choice of floating point environment affects the behaviour.

MS offers to run the topic past colleagues.

MC: Will set the status to open. STL wants to rename the issue. WEB wants to be able to find the issue by its original name still.

Mike Spertus to run the options past his mathematical colleagues, and report back.

[2017-11 Albuquerque Wednesday issue processing]

Choice: Rerun the algorithm if it gets 1.0.

Thomas K to provide wording; Marshall and STL to review.

[2018-08 Batavia Monday issue discussion]

Davis has a paper P0952 which resolves this.

[2024-04-02 Kona 2023; Resolved by P0952R2. Status changed: Open → Resolved.]

Proposed resolution:

2524. generate_canonical can occasionally return 1.0

2524. `generate_canonical` can occasionally return 1.0