std::print on POSIX platforms
Section: 31.7.10 [print.fun] Status: New Submitter: Jonathan Wakely Opened: 2024-01-24 Last modified: 2024-01-24 20:20:08 UTC
Priority: Not Prioritized
Discussion:
The effects for vprint_unicode say:
If stream refers to a terminal capable of displaying Unicode, writes out to the terminal using the native Unicode API; if out contains invalid code units, the behavior is undefined and implementations are encouraged to diagnose it. Otherwise writes out to stream unchanged. If the native Unicode API is used, the function flushes stream before writing out.
[Note 1: On POSIX and Windows, stream referring to a terminal means that, respectively, isatty(fileno(stream)) and GetConsoleMode(_get_osfhandle(_fileno(stream)), ...) return nonzero. — end note]
[Note 2: On Windows, the native Unicode API is WriteConsoleW. — end note]
-8- Throws: [...]
-9- Recommended practice: If invoking the native Unicode API requires transcoding, implementations should substitute invalid code units with U+FFFD REPLACEMENT CHARACTER per the Unicode Standard, Chapter 3.9 U+FFFD Substitution in Conversion.
The very explicit mention of isatty for POSIX platforms has confused at least two implementers into thinking that we're supposed to use isatty, and supposed to do something differently based on what it returns. That seems consistent with the nearly identical wording in 22.14.2.2 [format.string.std] paragraph 12, which says "Implementations should use either UTF-8, UTF-16, or UTF-32, on platforms capable of displaying Unicode text in a terminal" and then has a note explicitly saying this is the case for Windows-based and many POSIX-based operating systems. So it seems clear that POSIX platforms are supposed to be considered to have "a terminal capable of displaying Unicode text", and so std::print should use isatty and then use a native Unicode API, and diagnose invalid code units.
This is a problem, however, because isatty needs to make a system call on Linux, adding 500ns to every std::print call. This results in a 10x slowdown on Linux, where std::print can take just 60ns without the isatty check.
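For illustration, the check that Note 1 appears to call for on POSIX can be sketched as below. The helper name refers_to_terminal is hypothetical (it does not appear in the wording), and the sketch assumes a POSIX system where <unistd.h> provides isatty. Performing this check on every call is what adds the system call described above.

```cpp
#include <cassert>
#include <cstdio>
#include <unistd.h>  // POSIX: isatty (fileno is declared by <stdio.h>)

// Hypothetical helper, not part of the standard wording.
// Returns true if `stream` is attached to a terminal device.
bool refers_to_terminal(std::FILE* stream) {
    assert(stream != nullptr);  // mirrors the "valid pointer" precondition
    // isatty returns 1 for a terminal and 0 otherwise (setting errno).
    return isatty(fileno(stream)) != 0;
}
```

A temporary file created with std::tmpfile is not a terminal, so refers_to_terminal returns false for it; for stdout the answer depends on whether output has been redirected.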
From discussions with Tom Honermann I learned that the "native Unicode API" wording is only relevant on Windows. This makes sense, because for POSIX platforms, writing to a terminal is done using the usual stdio functions, so there's no need to treat a terminal differently to any other file stream. And substitution of invalid code units with U+FFFD is recommended for Windows because that's what typical modern terminals do on POSIX platforms, so requiring the implementation to do that on Windows gives consistent behaviour. But the implementation doesn't need to do anything to make that happen with a POSIX terminal, it happens anyway.
So the isatty check is unnecessary for POSIX platforms, and the note mentioning it just causes confusion and has no benefit.
Secondly, there initially seems to be a contradiction between the "implementations are encouraged to diagnose it" wording and the later Recommended practice. In fact, there's no contradiction because the native Unicode API might accept UTF-8 and therefore require no transcoding, and so the Recommended practice wouldn't apply. The intention is that diagnosing invalid UTF-8 is still desirable in this case, but how should it be diagnosed? By writing an error to the terminal alongside the formatted string? Or by substituting U+FFFD maybe? If the latter is the intention, why is one suggestion in the middle of the Effects, and one given as Recommended practice?
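To make the "substitute U+FFFD" option concrete, here is a minimal sketch of one way to do it for UTF-8 input, replacing each maximal ill-formed subsequence with a single U+FFFD as described in the Unicode Standard. The function name replace_invalid_utf8 is hypothetical, and nothing in the wording requires an implementation to work this way.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper, not part of the standard wording.
// Copies well-formed UTF-8 sequences unchanged and replaces each maximal
// ill-formed subsequence with U+FFFD (encoded in UTF-8 as EF BF BD).
std::string replace_invalid_utf8(std::string_view in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;                 // expected length; 0 = invalid lead byte
        unsigned char lo = 0x80, hi = 0xBF;  // allowed range for the second byte
        if (b0 <= 0x7F) {
            len = 1;
        } else if (b0 >= 0xC2 && b0 <= 0xDF) {
            len = 2;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            len = 3;
            if (b0 == 0xE0) lo = 0xA0;       // reject overlong encodings
            if (b0 == 0xED) hi = 0x9F;       // reject surrogates
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            len = 4;
            if (b0 == 0xF0) lo = 0x90;       // reject overlong encodings
            if (b0 == 0xF4) hi = 0x8F;       // reject values above U+10FFFF
        }
        std::size_t n = 1;                   // bytes accepted so far
        while (len != 0 && n < len && i + n < in.size()) {
            const unsigned char b = static_cast<unsigned char>(in[i + n]);
            const unsigned char min = (n == 1) ? lo : 0x80;
            const unsigned char max = (n == 1) ? hi : 0xBF;
            if (b < min || b > max) break;
            ++n;
        }
        if (len != 0 && n == len) {
            out.append(in.data() + i, n);    // well-formed: copy as-is
        } else {
            out += "\xEF\xBF\xBD";           // ill-formed: one U+FFFD
        }
        i += n;
    }
    return out;
}
```

With this sketch a stray 0xFF byte becomes one U+FFFD, a truncated three-byte sequence such as E2 82 also becomes one U+FFFD, and the overlong pair C0 80 becomes two.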
The proposed resolution attempts to clarify that a "native Unicode API" is only needed if that's how you display Unicode on the terminal. It also moves the flushing requirement to be adjacent to the other requirements for systems using a native Unicode API instead of on its own later in the paragraph. And the suggestion to diagnose invalid code units is moved into the Recommended practice and clarified that it's only relevant if using a native Unicode API. I'm still not entirely happy with encouragement to diagnose invalid code units without giving any clue as to how that should be done. What does it mean to diagnose something at runtime? That's novel for the C++ standard. The way it's currently phrased seems to imply something other than U+FFFD substitution should be done, although that seems the most obvious implementation to me.
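The control flow this implies might look roughly like the sketch below. The function name write_formatted is hypothetical, error handling is reduced to a bool, and the Windows branch is untested here: it assumes that MultiByteToWideChar with CP_UTF8 and no flags replaces invalid sequences with U+FFFD during transcoding. On other platforms there is no terminal check at all and the bytes go to the stream unchanged.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <string_view>

#ifdef _WIN32
#include <io.h>       // _get_osfhandle
#include <windows.h>  // GetConsoleMode, MultiByteToWideChar, WriteConsoleW
#endif

// Hypothetical sketch, not the required implementation.
// `out` is the already-formatted string; returns false on failure.
bool write_formatted(std::FILE* stream, std::string_view out) {
#ifdef _WIN32
    // Only Windows needs a native Unicode API to display Unicode on a console.
    HANDLE h = reinterpret_cast<HANDLE>(_get_osfhandle(_fileno(stream)));
    DWORD mode = 0;
    if (h != INVALID_HANDLE_VALUE && GetConsoleMode(h, &mode)) {
        // Flush first so buffered stdio output keeps its order.
        if (std::fflush(stream) != 0) return false;
        if (out.empty()) return true;
        const int in_len = static_cast<int>(out.size());
        // First call computes the UTF-16 length, second call transcodes.
        const int wlen =
            MultiByteToWideChar(CP_UTF8, 0, out.data(), in_len, nullptr, 0);
        if (wlen <= 0) return false;
        std::wstring wide(static_cast<std::size_t>(wlen), L'\0');
        MultiByteToWideChar(CP_UTF8, 0, out.data(), in_len, wide.data(), wlen);
        DWORD written = 0;
        return WriteConsoleW(h, wide.data(), static_cast<DWORD>(wlen),
                             &written, nullptr) != 0;
    }
#endif
    // Everything else, including POSIX terminals: write the bytes unchanged.
    return std::fwrite(out.data(), 1, out.size(), stream) == out.size();
}
```

Writing to a regular file takes the fallback path on every platform, so the bytes read back from the file are identical to the formatted string.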
Proposed resolution:
This wording is relative to N4971.
Modify 31.7.6.3.5 [ostream.formatted.print] as indicated (deletions are marked [-like this-], insertions {+like this+}):
void vprint_unicode(ostream& os, string_view fmt, format_args args);
void vprint_nonunicode(ostream& os, string_view fmt, format_args args);
-3- Effects: Behaves as a formatted output function (31.7.6.3.1 [ostream.formatted.reqmts]) of os, except that:
- (3.1) – failure to generate output is reported as specified below, and
- (3.2) – any exception thrown by the call to vformat is propagated without regard to the value of os.exceptions() and without turning on ios_base::badbit in the error state of os.
After constructing a sentry object, the function initializes an automatic variable via
string out = vformat(os.getloc(), fmt, args);
If the function is vprint_unicode and os is a stream that refers to a terminal capable of displaying Unicode {+via a native Unicode API,+} which is determined in an implementation-defined manner, {+flushes os and then+} writes out to the terminal using the native Unicode API; if out contains invalid code units, the behavior is undefined[- and implementations are encouraged to diagnose it-]. [-If the native Unicode API is used, the function flushes os before writing out.-] Otherwise, (if os is not such a stream or the function is vprint_nonunicode), inserts the character sequence [out.begin(), out.end()) into os. If writing to the terminal or inserting into os fails, calls os.setstate(ios_base::badbit) (which may throw ios_base::failure).
-4- Recommended practice: For vprint_unicode, if invoking the native Unicode API requires transcoding, implementations should substitute invalid code units with U+FFFD REPLACEMENT CHARACTER per the Unicode Standard, Chapter 3.9 U+FFFD Substitution in Conversion. {+If invoking the native Unicode API does not require transcoding, implementations are encouraged to diagnose invalid code units.+}
Modify 31.7.10 [print.fun] as indicated (deletions are marked [-like this-], insertions {+like this+}):
void vprint_unicode(FILE* stream, string_view fmt, format_args args);
-6- Preconditions: stream is a valid pointer to an output C stream.
-7- Effects: The function initializes an automatic variable via
string out = vformat(fmt, args);
If stream refers to a terminal capable of displaying Unicode {+via a native Unicode API+}, {+flushes stream and then+} writes out to the terminal using the native Unicode API; if out contains invalid code units, the behavior is undefined[- and implementations are encouraged to diagnose it-]. Otherwise writes out to stream unchanged. [-If the native Unicode API is used, the function flushes stream before writing out.-]
[Note 1: On [-POSIX and-] Windows, {+the native Unicode API is WriteConsoleW and+} stream referring to a terminal means that[-, respectively, isatty(fileno(stream)) and-] GetConsoleMode(_get_osfhandle(_fileno(stream)), ...) return nonzero. — end note]
[-[Note 2: On Windows, the native Unicode API is WriteConsoleW. — end note]-]
-8- Throws: [...]
-9- Recommended practice: If invoking the native Unicode API requires transcoding, implementations should substitute invalid code units with U+FFFD REPLACEMENT CHARACTER per the Unicode Standard, Chapter 3.9 U+FFFD Substitution in Conversion. {+If invoking the native Unicode API does not require transcoding, implementations are encouraged to diagnose invalid code units.+}