Issue 736: Comment on [rand.dist.samp.discrete]

Section: 29.5.9.6.1 [rand.dist.samp.discrete] Status: NAD Submitter: Stephan Tolksdorf Opened: 2007-09-21 Last modified: 2016-01-28

Priority: Not Prioritized

Discussion:

a) The specification for discrete_distribution requires the member probabilities() to return a vector of standardized probabilities, which forces the implementation every time to divide each probability by the sum of all probabilities, as the sum will in practice almost never be exactly 1.0. This is unnecessarily inefficient, as the implementation would otherwise not need to compute the standardized probabilities at all and could instead work with the non-standardized probabilities and the sum. If there were no standardization, the user would just get back the probabilities that were previously supplied to the distribution object, which to me seems to be the more obvious solution.
b) The behaviour of discrete_distribution is not specified in case the number of given probabilities is larger than the maximum number representable by the IntType.
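
As an illustration of the first point, the following minimal sketch shows the standardization that the specification requires: the weights passed in are not what probabilities() hands back. The helper name reported_probabilities is made up for this example and is not part of any library.

```cpp
#include <initializer_list>
#include <random>
#include <vector>

// Hypothetical helper: builds a distribution from the given weights and
// returns whatever probabilities() reports for it.
std::vector<double> reported_probabilities(std::initializer_list<double> weights) {
    std::discrete_distribution<int> dist(weights);
    return dist.probabilities();
}
```

Calling reported_probabilities({2.0, 2.0, 4.0}) yields three values that sum to 1 (0.25, 0.25 and 0.5, up to rounding); the supplied weights 2, 2, 4 cannot be recovered from the distribution object.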

Possible resolution: I propose to change the specification such that the non-standardized probabilities need to be returned and that an additional requirement is included for the number of probabilities to be smaller than the maximum of IntType.

[ Stephan Tolksdorf adds pre-Bellevue: ]

In reply to the discussion in N2424 of this issue:

Rescaled floating-point parameter vectors can not be expected to compare equal because of the limited precision of floating-point numbers. My proposal would at least guarantee that a parameter vector (of type double) passed into the distribution would compare equal with the one returned by the probabilities() method. Furthermore, I do not understand why "the changed requirement would lead to a significant increase in the amount of state in the distribution object". A typical implementation's state would increase by exactly one number: the sum of all probabilities. The textual representation for serialization would not need to grow at all. Finally, the proposed replacement "0 < n <= numeric_limits<IntType>::max() + 1" makes the implementation unnecessarily complicated, "0 < n <= numeric_limits<IntType>::max()" would be better.
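
The kind of implementation argued for here can be sketched roughly as follows. This is only an illustration under stated assumptions, not the wording or design of any real library: the class name raw_weight_distribution and its members are hypothetical, the weights are assumed to be non-empty, non-negative and to have a positive sum, and sampling uses a simple linear walk. The only state beyond the weights themselves is their sum.

```cpp
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical type, not part of the standard library.
// Keeps the weights exactly as supplied, plus one extra number: their sum.
class raw_weight_distribution {
public:
    // Assumes weights is non-empty, all values >= 0, and the sum is > 0.
    explicit raw_weight_distribution(std::vector<double> weights)
        : weights_(std::move(weights)),
          sum_(std::accumulate(weights_.begin(), weights_.end(), 0.0)) {}

    // Returns the weights unchanged, so they compare equal to the input.
    const std::vector<double>& weights() const { return weights_; }
    double sum() const { return sum_; }

    // Draws index k with probability weights_[k] / sum_ without ever
    // computing the standardized probabilities.
    template <class URBG>
    std::size_t operator()(URBG& g) const {
        std::uniform_real_distribution<double> u(0.0, sum_);
        double x = u(g);
        for (std::size_t k = 0; k + 1 < weights_.size(); ++k) {
            if (x < weights_[k]) return k;
            x -= weights_[k];
        }
        return weights_.size() - 1;  // last index, also absorbs rounding
    }

private:
    std::vector<double> weights_;  // declared first: sum_ is computed from it
    double sum_;
};
```

With this layout, weights() returns a vector that compares equal to the one passed to the constructor, which is the guarantee described in the reply above.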

[ Bellevue: ]

In N2424. We agree with the observation and the proposed resolution to part b). We recommend the wording n > 0 be replaced with 0 < n <= numeric_limits<IntType>::max() + 1. However, we disagree with part a), as it would interfere with the definition of parameters' equality. Further, the changed requirement would lead to a significant increase in the amount of state of the distribution object.

As it stands now, it is convenient, and the changes proposed make it much less so.

NAD. Part a: the current behavior is desirable. Part b: any constructor can fail, but the rules under which it can fail do not need to be listed here.

Proposed resolution:

See N2424 for the proposed resolution.

[ Stephan Tolksdorf adds pre-Bellevue: ]

In 29.5.9.6.1 [rand.dist.samp.discrete]:

Proposed wording a):

Change in para. 2

Constructs a discrete_distribution object with n=1 and p₀ = w₀ = 1

and change in para. 5

Returns: A vector<double> whose size member returns n and whose operator[] member returns ~~p_k~~ the weight w_k as a double value when invoked with argument k for k = 0, ..., n-1

Proposed wording b):

Change in para. 3:

If firstW == lastW, let the sequence w have length n = 1 and consist of the single value w₀ = 1. Otherwise, [firstW,lastW) shall form a sequence w of length n > 0 such that 0 < n <= numeric_limits<IntType>::max(), and *firstW shall yield a value w₀ convertible to double. [Note: The values w_k are commonly known as the weights. -- end note]
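
The two conditions in this wording can be illustrated with a short sketch. The first function relies only on the standard iterator-range constructor and shows the empty-range case (n = 1, w₀ = 1). The second, weight_count_in_range, is a hypothetical helper written for this example to express the proposed bound 0 < n <= numeric_limits<IntType>::max(); no such check is required of, or exposed by, the library.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <random>
#include <vector>

// Empty range: the distribution behaves as if given the single weight 1.
std::vector<double> probabilities_of_empty_range() {
    std::vector<double> none;
    std::discrete_distribution<int> dist(none.begin(), none.end());
    return dist.probabilities();
}

// Hypothetical helper: true if n weights satisfy the proposed requirement
// 0 < n <= numeric_limits<IntType>::max().
template <class IntType>
bool weight_count_in_range(std::size_t n) {
    // Compare as unsigned values to avoid signed/unsigned conversions.
    const auto max_count =
        static_cast<std::uintmax_t>(std::numeric_limits<IntType>::max());
    return n > 0 && n <= max_count;
}
```

Using max() rather than max() + 1 as the bound keeps the comparison free of any overflow concern in IntType, which is the simplification argued for in the pre-Bellevue reply.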