887. issue with condition::wait_...

Section: 32.7.4 [thread.condition.condvar] Status: NAD Submitter: Lawrence Crowl Opened: 2008-09-15 Last modified: 2016-01-28

Priority: Not Prioritized

View all other issues in [thread.condition.condvar].

View all issues with NAD status.

Discussion:

The Posix/C++ working group has identified an inconsistency between Posix and the C++ working draft in that Posix requires the clock to be identified at creation, whereas C++ permits identifying the clock at the call to wait. The latter cannot be implemented with the former.

[ San Francisco: ]

Howard recommends NAD with the following explanation:

The intent of the current wording is for the condtion_variable::wait_until be able to handle user-defined clocks as well as clocks the system knows about. This can be done by providing overloads for the known clocks, and another overload for unknown clocks which synchs to a known clock before waiting. For example:

template <class Duration>
bool
condition_variable::wait_until(unique_lock<mutex>& lock,
                               const chrono::time_point<chrono::system_clock, Duration>& abs_time)
{
    using namespace chrono;
    nanoseconds d = __round_up<nanoseconds>(abs_time.time_since_epoch());
    __do_timed_wait(lock.mutex()->native_handle(), time_point<system_clock, nanoseconds>(d));
    return system_clock::now() < abs_time;
}

template <class Clock, class Duration>
bool
condition_variable::wait_until(unique_lock<mutex>& lock,
                               const chrono::time_point<Clock, Duration>& abs_time)
{
    using namespace chrono;
    system_clock::time_point    s_entry = system_clock::now();
    typename Clock::time_point  c_entry = Clock::now();
    nanoseconds dn = __round_up<nanoseconds>(abs_time.time_since_epoch() -
                                              c_entry.time_since_epoch());
    __do_timed_wait(lock.mutex()->native_handle(), s_entry + dn);
    return Clock::now() < abs_time;
}

In the above example, system_clock is the only clock which the underlying condition variable knows how to deal with. One overload just passes that clock through. The second overload (approximately) converts the unknown clock into a system_clock time_point prior to passing it down to the native condition variable.

On Posix systems vendors are free to add implementation defined constructors which take a clock. That clock can be stored in the condition_variable, and converted to (or not as necessary) as shown above.

If an implementation defined constructor takes a clock (for example), then part of the semantics for that implementation defined ctor might include that a wait_until using a clock other than the one constructed with results in an error (exceptional condition) instead of a conversion to the stored clock. Such a design is up to the vendor as once an implementation defined ctor is used, the vendor is free to specifiy the behavior of waits and/or notifies however he pleases (when the cv is constructed in an implementation defined manner).

[ Post Summit: ]

"POSIX people will review the proposed NAD resolution at their upcoming NY meeting.

See the minutes at: http://wiki.dinkumware.com/twiki/bin/view/Posix/POSIX-CppBindingWorkingGroupNewYork2009.

[ 2009-07 Frankfurt ]

Move to NAD.

[ 2009-07-18 Detlef reopens the issue: ]

On Friday afternoon in Frankfurt is was decided that 887 is NAD. This decision was mainly based on a sample implementation presented by Howard that implemented one clock on top of another. Unfortunately this implementation doesn't work for the probably most important case where a system has a monotonic clock and a real-time clock (or "wall time" clock):

If the underlying "system_clock" is a monotonic clock, and the program waits on the real-time clock, and the real-time clock is set forward, the wait will unblock too late.

If the underlying "system_clock" is a real-time clock, and the program waits on the monotonic clock, and the real-time clock is set back, the wait again will unblock too late.

Sorry that I didn't remember this on Friday, but it was Friday afternoon after a busy week...

So as the decision was made on a wrong asumption, I propose to re-open the issue.

[ 2009-07-26 Howard adds: ]

Detlef correctly argues that condition_variable::wait_until could return "too late" in the context of clocks being adjusted during the wait. I agree with his logic. But I disagree that this makes this interface unimplementable on POSIX.

The POSIX spec also does not guarantee that pthread_cond_timedwait does not return "too late" when clocks are readjusted during the wait. Indeed, the POSIX specification lacks any requirements at all concerning how soon pthread_cond_timedwait returns after a time out. This is evidently a QOI issue by the POSIX standard. Here is a quote of the most relevant normative text concerning pthread_cond_timedwait found here.

The pthread_cond_timedwait() function shall be equivalent to pthread_cond_wait(), except that an error is returned if the absolute time specified by abstime passes (that is, system time equals or exceeds abstime) before the condition cond is signaled or broadcasted, or if the absolute time specified by abstime has already been passed at the time of the call.

I.e. the POSIX specification speaks of the error code returned in case of a time out, but not on the timeliness of that return.

Might this simply be an oversight, or minor defect in the POSIX specification?

I do not believe so. This same section goes on to say in non-normative text:

For cases when the system clock is advanced discontinuously by an operator, it is expected that implementations process any timed wait expiring at an intervening time as if that time had actually occurred.

Here is non-normative wording encouraging the implementation to ignore an advancing underlying clock and subsequently causing an early (spurious) return. There is no wording at all which addresses Detlef's example of a "late return". With pthread_cond_timedwait this would be caused by setting the system clock backwards. It seems reasonable to assume, based on the wording that is already in the POSIX spec, that again, the discontinuously changed clock would be ignored by pthread_cond_timedwait.

A noteworthy difference between pthread_cond_timedwait and condition_variable::wait_until is that the POSIX spec appears to say that ETIMEDOUT should be returned if pthread_cond_timedwait returns because of timeout signal, whether or not the system clock was discontinuously advanced during the wait. In contrast condition_variable::wait_until always returns:

Clock::now() < abs_time

That is, the C++ spec requires that the clock be rechecked (detecting discontinuous adjustments during the wait) at the time of return. condition_variable::wait_until may indeed return early or late. But regardless it will return a value reflecting timeout status at the time of return (even if clocks have been adjusted). Of course the clock may be adjusted after the return value is computed but before the client has a chance to read the result of the return. Thus there are no iron-clad guarantees here.

condition_variable::wait_until (and pthread_cond_timedwait) is little more than a convenience function for making sure condition_variable::wait doesn't hang for an unreasonable amount of time (where the client gets to define "unreasonable"). I do not think it is in anyone's interest to try to make it into anything more than that.

I maintain that this is a useful and flexible specification in the spirit of C++, and is implementable on POSIX. The implementation technique described above is a reasonable approach. There may also be higher quality approaches. This specification, like the POSIX specification, gives a wide latitude for QOI.

I continue to recommend NAD, but would not object to a clarifying note regarding the behavior of condition_variable::wait_until. At the moment, I do not have good wording for such a note, but welcome suggestions.

[ 2009-09-30: See N2969. ]

[ 2009-10 Santa Cruz: ]

The LWG is in favor of Detlef to supply revision which adopts Option 2 from N2969 but is modified by saying that system_clock must be available for wait_until.

[ 2010-02-11 Anthony provided wording. ]

[ 2010-02-22 Anthony adds: ]

I am strongly against N2999.

Firstly, I think that the most appropriate use of a timed wait on a condition variable is with a monotonic clock, so it ought to be guaranteed to be available on systems that support such a clock. Also, making the set of supported clocks implementation defined essentially kills portability around the use of user-defined clocks.

I also think that wait_for is potentially useful, and trivially implementable given a working templated wait_until and a monotonic clock.

I also disagree with many of Detlef's points in the rationale. In a system with hard latency limits there is likely to be a monotonic clock, otherwise you have no way of measuring against these latency limits since the system_clock may change arbitrarily. In such systems, you want to be able to use wait_for, or wait_until with a monotonic clock.

I disagree that the wait_* functions cannot be implemented correctly on top of POSIX: I have done so. The only guarantee in the working draft is that when the function returns certain properties are true; there is no guarantee that the function will return immediately that the properties are true. My resolution to issue 887 makes this clear. How small the latency is is QoI.

On systems without a monotonic clock, you cannot measure the problem since the system clock can change arbitrarily so any timing calculations you make may be wrong due to clock changes.

On systems with a monotonic clock, you can choose to use it for your condition variables. If you are waiting against a system_clock::time_point then you can check the clock when waking, and either return as a timeout or spurious wake depending on whether system_clock::now() is before or after the specified time_point.

Windows does provide condition variables from Vista onwards. I choose not to use them, but they are there. If people are concerned about implementation difficulty, the Boost implementation can be used for most purposes; the Boost license is pretty liberal in that regard.

My preferred resolution to issue 887 is currently the PR in the issues list.

[ 2010 Pittsburgh: ]

There is no consensus for moving the related paper N2999 into the WP.

There was support for moving this issue as proposed to Ready, but the support was insufficient to call a consensus.

There was consensus for moving this issue to NAD as opposed to leaving it open. Rationale added.

Rationale:

The standard as written is sufficiently implementable and self consistent.

Proposed resolution:

Add a new paragraph after 32.2.4 [thread.req.timing]p3:

3 The resolution of timing provided by an implementation depends on both operating system and hardware. The finest resolution provided by an implementation is called the native resolution.

If a function in this clause takes a timeout argument, and the time point or elapsed time specified passes before the function returns, the latency between the timeout occurring and the function returning is unspecified [Note: Implementations should strive to keep such latency as small as possible, but portable code should not rely on any specific upper limits — end note]