3328. Clarify that std::string is not good for UTF-8

Section: D.21 [depr.fs.path.factory] Status: C++20 Submitter: The Netherlands Opened: 2019-11-07 Last modified: 2021-02-25

Priority: 0

Discussion:

Addresses NL 375

The example in the deprecated section implies that std::string is the type to use for UTF-8 strings.

[Example: A string is to be read from a database that is encoded in UTF-8, and used to create a directory using the native encoding for filenames:

namespace fs = std::filesystem;
std::string utf8_string = read_utf8_data();
fs::create_directory(fs::u8path(utf8_string));

Proposed change:

Add a clarification that std::string is the wrong type for UTF-8 strings.

Jeff Garland:

SG16 in Belfast: Recommend to accept with a modification to update the example in D.21 [depr.fs.path.factory] p4 to state that std::u8string should be preferred for UTF-8 data.

Rationale: The example code is representative of historic use of std::filesystem::u8path and should not be changed to use std::u8string. The recommended change is to a non-normative example and may therefore be considered editorial.

Previous resolution [SUPERSEDED]:

This wording is relative to N4835.

  1. Modify D.21 [depr.fs.path.factory] as indicated:

    -4- [Example: A string is to be read from a database that is encoded in UTF-8, and used to create a directory using the native encoding for filenames:

    namespace fs = std::filesystem;
    std::string utf8_string = read_utf8_data();
    fs::create_directory(fs::u8path(utf8_string));
    
    For POSIX-based operating systems with the native narrow encoding set to UTF-8, no encoding or type conversion occurs.

    For POSIX-based operating systems with the native narrow encoding not set to UTF-8, a conversion to UTF-32 occurs, followed by a conversion to the current native narrow encoding. Some Unicode characters may have no native character set representation.

    For Windows-based operating systems a conversion from UTF-8 to UTF-16 occurs. — end example]

    [Note: The example above is representative of historic use of filesystem u8path. New code should use std::u8string in place of std::string. — end note]

LWG Belfast Friday Morning

Requested changes:

Billy O'Neal provides updated wording.

[2020-02 Moved to Immediate on Tuesday in Prague.]

Proposed resolution:

This wording is relative to N4835.

  1. Modify D.21 [depr.fs.path.factory] as indicated:

    -4- [Example: A string is to be read from a database that is encoded in UTF-8, and used to create a directory using the native encoding for filenames:

    namespace fs = std::filesystem;
    std::string utf8_string = read_utf8_data();
    fs::create_directory(fs::u8path(utf8_string));
    
    For POSIX-based operating systems with the native narrow encoding set to UTF-8, no encoding or type conversion occurs.

    For POSIX-based operating systems with the native narrow encoding not set to UTF-8, a conversion to UTF-32 occurs, followed by a conversion to the current native narrow encoding. Some Unicode characters may have no native character set representation.

    For Windows-based operating systems a conversion from UTF-8 to UTF-16 occurs. — end example]

    [Note: The example above is representative of a historical use of filesystem::u8path. Passing a std::u8string to path's constructor is preferred for an indication of UTF-8 encoding more consistent with path's handling of other encodings. — end note]