2036. istream >> char and eofbit

Section: 31.7.5.2 [istream] Status: NAD Submitter: Howard Hinnant Opened: 2011-02-27 Last modified: 2016-01-28

Priority: Not Prioritized

Discussion:

The question is: When a single character is extracted from an istream using operator>>, does eofbit get set if this is the last character extracted from the stream? The current standard is at best ambiguous on the subject. 31.7.5.2 [istream]/p3 describes all extraction operations with:

3 If rdbuf()->sbumpc() or rdbuf()->sgetc() returns traits::eof(), then the input function, except as explicitly noted otherwise, completes its actions and does setstate(eofbit), which may throw ios_base::failure (31.5.4.4 [iostate.flags]), before returning.

And [istream::extractors]/p12 in describing operator>>(basic_istream<charT,traits>& in, charT& c); offers no further clarification:

12 Effects: Behaves like a formatted input member (as described in [istream.formatted.reqmts]) of in. After a sentry object is constructed a character is extracted from in, if one is available, and stored in c. Otherwise, the function calls in.setstate(failbit).

I coded it one way in libc++, and g++ coded it another way. Chris Jefferson noted that some boost code was sensitive to the difference and fails for libc++. Therefore I believe that it is very important that we specify this extraction operator in enough detail that both vendors and clients know what behavior is required and expected.

Here is a brief code example demonstrating the issue:

#include <sstream>
#include <cassert>

int main()
{
  std::istringstream ss("1");
  char t;
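  ss >> t;            // extracts '1', the only character in the stream
  assert(!ss.eof());  // holds only if that extraction did not set eofbit
}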

For every type other than char that can read this istringstream (bool, int, double, etc.), ss.eof() will be true after the extraction. So, for consistency's sake, we might want char to behave the same way as the other built-in types.
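The comparison across types can be checked with a small helper. This is only a sketch: eof_after_extracting is a hypothetical name introduced here, not something from the standard or from either library.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the issue): extracts one value of type T
// from a fresh istringstream and reports whether eofbit is set afterwards.
template <class T>
bool eof_after_extracting(const std::string& text)
{
    std::istringstream ss(text);
    T value{};
    ss >> value;      // formatted extraction of a single value
    return ss.eof();  // true if the extraction set eofbit
}
```

For the numeric types, eof_after_extracting<int>("1") is true because the parser has to look past the '1' to find the end of the number and runs into the end of the stream, while eof_after_extracting<int>("1 ") is false because parsing stops at the space. The result of eof_after_extracting<char>("1") is exactly the case this issue asks about.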

However, Jean-Marc Bourguet offers this counterexample using an interactive stream. He argues that setting eof inhibits reading the next line:

#include <iostream>

int main()
{
  char c;
  std::cin >> std::noskipws;
  std::cout << "First line: ";
  while (std::cin >> c) {
    if (c == '\n') {
      std::cout << "Next line: ";
    }
  }
}
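To experiment with this loop without a terminal, it can be parameterized on its streams. The sketch below assumes a hypothetical function name, prompt_per_line, and drives the loop from an in-memory stream, so it shows which prompts are emitted but cannot reproduce the timing behavior of an interactive stream.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: the loop from the example above, reading from any
// istream and writing its prompts to any ostream.
void prompt_per_line(std::istream& in, std::ostream& out)
{
    char c;
    in >> std::noskipws;  // treat '\n' as an ordinary character
    out << "First line: ";
    while (in >> c) {     // the loop ends once an extraction fails
        if (c == '\n') {
            out << "Next line: ";
        }
    }
}
```

Fed the two-line input "ab\ncd\n", the helper writes the first prompt followed by one "Next line: " prompt per newline, and leaves the input stream with both eofbit and failbit set once the extraction past the last character fails.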

As these two code examples demonstrate, whether or not eofbit gets set is an observable difference and it is impacting real-world code. I feel it is critical that we clearly and unambiguously choose one behavior or the other. I am proposing wording for both behaviors and ask the LWG to choose one (and only one!).

Wording for setting eof bit:

Modify [istream::extractors]/p12 as follows:

12 Effects: Behaves like a formatted input member (as described in [istream.formatted.reqmts]) of in. After a sentry object is constructed a character is extracted from in, if one is available, and stored in c. If a character is extracted and it is the last character in the pending sequence, the function calls in.setstate(eofbit). If a character is not extracted the function calls in.setstate(failbit | eofbit).
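As a rough illustration of the eof-setting behavior, the sketch below writes it as a free function. The name extract_char_setting_eof is hypothetical, and using sgetc() to decide whether the extracted character was the last one is an assumption about how an implementation might do it; the wording itself does not prescribe a mechanism.

```cpp
#include <cassert>
#include <ios>
#include <istream>
#include <sstream>

// Hypothetical sketch of the "setting eof bit" behavior; this is not the
// library's actual operator>>.
template <class CharT, class Traits>
std::basic_istream<CharT, Traits>&
extract_char_setting_eof(std::basic_istream<CharT, Traits>& in, CharT& c)
{
    // Formatted input: the sentry skips leading whitespace if skipws is set
    // and sets failbit itself when the stream is not usable.
    typename std::basic_istream<CharT, Traits>::sentry ok(in);
    if (ok) {
        const typename Traits::int_type r = in.rdbuf()->sbumpc();
        if (Traits::eq_int_type(r, Traits::eof())) {
            // No character was extracted.
            in.setstate(std::ios_base::failbit | std::ios_base::eofbit);
        } else {
            c = Traits::to_char_type(r);
            // Assumption: peek at the next character to detect that the one
            // just extracted was the last in the pending sequence.
            if (Traits::eq_int_type(in.rdbuf()->sgetc(), Traits::eof())) {
                in.setstate(std::ios_base::eofbit);
            }
        }
    }
    return in;
}
```

With this sketch, extracting from a stream holding "1" stores '1' and leaves eof() true while fail() stays false. The peek is also what makes this variant awkward for interactive input, since sgetc() may have to wait for more data before the function can return.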

Wording for not setting eof bit:

12 Effects: Behaves like a formatted input member (as described in [istream.formatted.reqmts]) of in. After a sentry object is constructed a character is extracted from in with in.rdbuf()->sbumpc(). If traits::eof() is returned, the function calls in.setstate(failbit | eofbit). Otherwise the return value is converted to type charT and stored in c.
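The behavior that does not set eofbit maps almost directly onto code. The following is a minimal sketch under that reading; extract_char_without_eof is a hypothetical name, and the function only mirrors the wording rather than reproducing any vendor's implementation.

```cpp
#include <cassert>
#include <ios>
#include <istream>
#include <sstream>

// Hypothetical sketch of the "not setting eof bit" behavior; this is not the
// library's actual operator>>.
template <class CharT, class Traits>
std::basic_istream<CharT, Traits>&
extract_char_without_eof(std::basic_istream<CharT, Traits>& in, CharT& c)
{
    // Formatted input: the sentry skips leading whitespace if skipws is set.
    typename std::basic_istream<CharT, Traits>::sentry ok(in);
    if (ok) {
        // A single sbumpc() call; no look-ahead past the extracted character.
        const typename Traits::int_type r = in.rdbuf()->sbumpc();
        if (Traits::eq_int_type(r, Traits::eof())) {
            in.setstate(std::ios_base::failbit | std::ios_base::eofbit);
        } else {
            c = Traits::to_char_type(r);
        }
    }
    return in;
}
```

Here, extracting from a stream holding "1" stores '1' and leaves the stream good(). Only a second extraction attempt, which finds nothing to read, sets eofbit and failbit and leaves c unchanged.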

[2011-02-27: Jean-Marc Bourguet comments]

Just for completeness: it [the counterexample] doesn't prevent reading the next line; it prevents the prompt from being displayed at the appropriate time.

More information to take into account when deciding:

[2011-02-28: Martin Sebor comments]

[Responds to bullet 1 of Jean-Marc's list]

Yes, this matches the stdcxx test suite for num_get and time_get but not money_get when the currency symbol is last. I don't see where in the locale.money.get.virtuals section we specify whether eofbit is or isn't set and when.

IMO, if we try to fix the char extractor to be consistent, we should also fix all the other extractors and manipulators that aren't consistent (including std::get_money and std::get_time).

[2011-03-24 Madrid meeting]

Dietmar convinced Howard that the standard already says the right words.

Rationale:

Reading the last character does not set eofbit, and the standard already says so.

Proposed resolution: