istream >> char and eofbit
Section: 31.7.5.2 [istream] Status: NAD Submitter: Howard Hinnant Opened: 2011-02-27 Last modified: 2016-01-28
Priority: Not Prioritized
Discussion:
The question is: When a single character is extracted from an istream using operator>>, does eofbit get set if this is the last character extracted from the stream? The current standard is at best ambiguous on the subject. 31.7.5.2 [istream]/p3 describes all extraction operations with:
3 If rdbuf()->sbumpc() or rdbuf()->sgetc() returns traits::eof(), then the input function, except as explicitly noted otherwise, completes its actions and does setstate(eofbit), which may throw ios_base::failure (31.5.4.4 [iostate.flags]), before returning.
And [istream::extractors]/p12 in describing operator>>(basic_istream<charT,traits>& in, charT& c); offers no further clarification:
12 Effects: Behaves like a formatted input member (as described in [istream.formatted.reqmts]) of in. After a sentry object is constructed a character is extracted from in, if one is available, and stored in c. Otherwise, the function calls in.setstate(failbit).
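The two quoted paragraphs hinge on when sbumpc() or sgetc() actually returns traits::eof(). The following is a minimal sketch of that distinction at the stream buffer level, using std::stringbuf as a stand-in for any stream buffer. The helper names are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

using traits = std::char_traits<char>;

// Hypothetical helper: consumes the first character of `text` and reports
// whether that sbumpc() call already returned traits::eof().
bool bump_of_first_char_is_eof(const std::string& text) {
    std::stringbuf buf(text, std::ios_base::in);
    return traits::eq_int_type(buf.sbumpc(), traits::eof());
}

// Hypothetical helper: consumes every character of `text`, then reports
// whether the next sgetc() call returns traits::eof().
bool peek_after_last_char_is_eof(const std::string& text) {
    std::stringbuf buf(text, std::ios_base::in);
    for (std::size_t i = 0; i < text.size(); ++i) {
        buf.sbumpc();  // discard one character
    }
    return traits::eq_int_type(buf.sgetc(), traits::eof());
}
```

With the one-character sequence "1", the first helper returns false and the second returns true: consuming the last character does not itself produce traits::eof(), only a further read attempt does.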
I coded it one way in libc++, and g++ coded it another way. Chris Jefferson noted that some boost code was sensitive to the difference and fails for libc++. Therefore I believe that it is very important that we specify this extraction operator in enough detail that both vendors and clients know what behavior is required and expected.
Here is a brief code example demonstrating the issue:
#include <sstream>
#include <cassert>

int main()
{
    std::istringstream ss("1");
    char t;
    ss >> t;
    assert(!ss.eof());
}
For every type capable of reading this istringstream but char, ss.eof() will be true after the extraction (bool, int, double, etc.). So for consistency's sake we might want to have char behave the same way as other built-in types.
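The comparison between char and the other built-in types can be checked with a small helper. This is only a sketch: eof_after_extracting is a hypothetical name, and the result for char is exactly the behavior in question here, so it may differ between library implementations.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: extracts one value of type T from `text` and reports
// whether eofbit is set afterwards.
template <class T>
bool eof_after_extracting(const std::string& text) {
    std::istringstream ss(text);
    T value{};
    ss >> value;
    assert(!ss.fail());  // the extraction itself is expected to succeed
    return ss.eof();
}
```

For the input "1", eof_after_extracting<int>, eof_after_extracting<double> and eof_after_extracting<bool> return true, because numeric parsing has to look past the digit to know the number is complete. eof_after_extracting<char> returns false on an implementation that does not set eofbit when reading the last character.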
However, Jean-Marc Bourguet offers this counter example code using an interactive stream. He argues that setting eof inhibits reading the next line:
#include <iostream>

int main()
{
    char c;
    std::cin >> std::noskipws;
    std::cout << "First line: ";
    while (std::cin >> c)
    {
        if (c == '\n')
        {
            std::cout << "Next line: ";
        }
    }
}
As these two code examples demonstrate, whether or not eofbit gets set is an observable difference and it is impacting real-world code. I feel it is critical that we clearly and unambiguously choose one behavior or the other.
I am proposing wording for both behaviors and ask the LWG to choose one (and only one!).
Wording for setting eofbit:
Modify [istream::extractors]/p12 as follows:
12 Effects: Behaves like a formatted input member (as described in [istream.formatted.reqmts]) of in. After a sentry object is constructed a character is extracted from in, if one is available, and stored in c. [Deleted: Otherwise, the function calls in.setstate(failbit).] [Added: If a character is extracted and it is the last character in the pending sequence, the function calls in.setstate(eofbit). If a character is not extracted the function calls in.setstate(failbit | eofbit).]
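One way to read the "setting eofbit" wording is sketched below as a free function. This is an illustration under assumptions, not any vendor's implementation: extract_char_setting_eof is a hypothetical name, and the sgetc() look-ahead is just one possible way to decide whether the extracted character was the last one in the pending sequence.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical free function (not a library API) following the
// "setting eofbit" wording for char extraction.
std::istream& extract_char_setting_eof(std::istream& in, char& c) {
    using traits = std::char_traits<char>;
    std::istream::sentry ok(in);  // skips leading whitespace if skipws is set
    if (!ok) {
        return in;                // the sentry has already updated the state
    }
    traits::int_type got = in.rdbuf()->sbumpc();
    if (traits::eq_int_type(got, traits::eof())) {
        // No character was extracted.
        in.setstate(std::ios_base::failbit | std::ios_base::eofbit);
        return in;
    }
    c = traits::to_char_type(got);
    // Look ahead to see whether that was the last pending character.
    if (traits::eq_int_type(in.rdbuf()->sgetc(), traits::eof())) {
        in.setstate(std::ios_base::eofbit);
    }
    return in;
}
```

Note that the look-ahead has to ask the stream buffer for a character the caller did not request. On an interactive stream that request may block until more input arrives, which appears to be the kind of effect the counter example is concerned with.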
Wording for not setting eofbit:
12 Effects: Behaves like a formatted input member (as described in [istream.formatted.reqmts]) of in. After a sentry object is constructed a character is extracted from in [Deleted: , if one is available, and stored in c. Otherwise, the function calls in.setstate(failbit).] [Added: with in.rdbuf()->sbumpc(). If traits::eof() is returned, the function calls in.setstate(failbit | eofbit). Otherwise the return value is converted to type charT and stored in c.]
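The "not setting eofbit" wording maps almost directly onto code. The sketch below uses a hypothetical free function, extract_char_not_setting_eof, and is meant only to show the sequence of steps the wording describes, not how any particular library implements operator>>.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical free function (not a library API) following the
// "not setting eofbit" wording for char extraction.
std::istream& extract_char_not_setting_eof(std::istream& in, char& c) {
    using traits = std::char_traits<char>;
    std::istream::sentry ok(in);  // skips leading whitespace if skipws is set
    if (!ok) {
        return in;                // the sentry has already updated the state
    }
    traits::int_type got = in.rdbuf()->sbumpc();
    if (traits::eq_int_type(got, traits::eof())) {
        // Nothing could be extracted: report both failure and end of input.
        in.setstate(std::ios_base::failbit | std::ios_base::eofbit);
    } else {
        // A character was extracted; no look-ahead, so eofbit stays clear.
        c = traits::to_char_type(got);
    }
    return in;
}
```

Reading "1" with this function stores '1' and leaves the stream in a good state. Only a second extraction attempt sets failbit and eofbit.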
[2011-02-27: Jean-Marc Bourguet comments]
Just for completeness: it [the counter example] doesn't prevent reading the next line; it prevents the prompt from being displayed at the appropriate time.
More information to take into account when deciding:
- if I'm reading correctly the section to get boolean values when boolalpha is set, there we mandate that eof isn't set if trying to read past the end of the pending sequence wasn't needed to determine the result.
- see also the behaviour of getline (which isn't a formatted input function but won't set eof if it occurs just after the delimiter)
- if I'm reading the C standard correctly scanf("%c") wouldn't set feof either in that situation.
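The getline point in the list above can be observed directly. The following sketch uses a hypothetical helper, eof_after_getline, to compare an input that ends right after the delimiter with one that has no delimiter at all.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: reads one line from `text` with std::getline and
// reports whether eofbit is set afterwards.
bool eof_after_getline(const std::string& text) {
    std::istringstream ss(text);
    std::string line;
    std::getline(ss, line);
    return ss.eof();
}
```

eof_after_getline("line\n") returns false, since the delimiter ends the read before the end of the sequence is touched. eof_after_getline("line") returns true, since getline has to reach the end of the sequence to finish.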
[2011-02-28: Martin Sebor comments]
[Responds to bullet 1 of Jean-Marc's list]
Yes, this matches the stdcxx test suite for num_get and time_get but not money_get when the currency symbol is last. I don't see where in the locale.money.get.virtuals section we specify whether eofbit is or isn't set and when.
[If we fix the] char extractor to be consistent, we should also fix all the other extractors and manipulators that aren't consistent (including std::get_money and std::get_time).
[2011-03-24 Madrid meeting]
Dietmar convinced Howard that the standard already says the right words.
Rationale:
Reading the last character does not set eofbit, and the standard already says so.
Proposed resolution: