22 Localization library [localization]

22.4 Standard locale categories [locale.categories]

22.4.1 The ctype category [category.ctype]

22.4.1.4 Class template codecvt [locale.codecvt]

namespace std {
  class codecvt_base {
  public:
    enum result { ok, partial, error, noconv };
  };

  template <class internT, class externT, class stateT>
  class codecvt : public locale::facet, public codecvt_base {
  public:
    typedef internT  intern_type;
    typedef externT  extern_type;
    typedef stateT state_type;

    explicit codecvt(size_t refs = 0);

    result out(stateT& state,
               const internT* from, const internT* from_end, const internT*& from_next,
               externT*   to,       externT* to_end, externT*& to_next) const;
    result unshift(stateT& state,
                   externT*   to,        externT* to_end, externT*& to_next) const;
    result in(stateT& state,
              const externT* from, const externT* from_end, const externT*& from_next,
              internT*   to,       internT* to_end, internT*& to_next) const;
    int encoding() const noexcept;
    bool always_noconv() const noexcept;
    int length(stateT&, const externT* from, const externT* end,
               size_t max) const;
    int max_length() const noexcept;

    static locale::id id;

  protected:
    ~codecvt();
    virtual result do_out(stateT& state,
                          const internT* from, const internT* from_end, const internT*& from_next,
                          externT* to,         externT* to_end, externT*& to_next) const;
    virtual result do_in(stateT& state,
                         const externT* from, const externT* from_end, const externT*& from_next,
                         internT* to,         internT* to_end, internT*& to_next) const;
    virtual result do_unshift(stateT& state,
                              externT* to,         externT* to_end, externT*& to_next) const;
    virtual int do_encoding() const noexcept;
    virtual bool do_always_noconv() const noexcept;
    virtual int do_length(stateT&, const externT* from,
                          const externT* end, size_t max) const;
    virtual int do_max_length() const noexcept;
  };
}

The class codecvt<internT,externT,stateT> is for use when converting from one character encoding to another, such as from wide characters to multibyte characters or between wide character encodings such as Unicode and EUC.

The stateT argument selects the pair of character encodings being mapped between.

The specializations required in Table [tab:localization.category.facets] ([locale.category]) convert the implementation-defined native character set. codecvt<char, char, mbstate_t> implements a degenerate conversion; it does not convert at all. The specialization codecvt<char16_t, char, mbstate_t> converts between the UTF-16 and UTF-8 encoding forms, and the specialization codecvt<char32_t, char, mbstate_t> converts between the UTF-32 and UTF-8 encoding forms. codecvt<wchar_t,char,mbstate_t> converts between the native character sets for narrow and wide characters. Specializations on mbstate_t perform conversion between encodings known to the library implementer. Other encodings can be converted by specializing on a user-defined stateT type. Objects of type stateT can contain any state that is useful to communicate to or from the specialized do_in or do_out members.
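As an illustration of the required codecvt<wchar_t,char,mbstate_t> specialization, the following sketch widens a narrow string through the public in member. The helper name widen_with_codecvt is hypothetical, and the sketch assumes that every wide character consumes at least one narrow element, so a destination of narrow.size() elements suffices.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>   // std::mbstate_t
#include <locale>
#include <string>

// Hypothetical helper: widens `narrow` with the codecvt<wchar_t, char, mbstate_t>
// facet of `loc`. Returns an empty string if the facet reports an error.
inline std::wstring widen_with_codecvt(const std::string& narrow,
                                       const std::locale& loc = std::locale::classic())
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_type;
    const facet_type& cvt = std::use_facet<facet_type>(loc);

    std::mbstate_t state = std::mbstate_t();   // initial conversion state
    // Assumption: one wchar_t per source char is an upper bound on the output.
    std::wstring wide(narrow.size(), L'\0');
    if (narrow.empty())
        return wide;

    const char* const from = narrow.data();
    const char* const from_end = from + narrow.size();
    const char* from_next = from;
    wchar_t* const to = &wide[0];
    wchar_t* to_next = to;

    const facet_type::result r =
        cvt.in(state, from, from_end, from_next, to, to + wide.size(), to_next);
    if (r == facet_type::error)
        return std::wstring();

    // On ok or partial, keep exactly the elements that were stored.
    assert(to_next >= to && from_next <= from_end);
    wide.resize(static_cast<std::size_t>(to_next - to));
    return wide;
}
```

With the classic locale, a call such as widen_with_codecvt("abc") is expected to yield the wide string L"abc"; results for non-ASCII input depend on the locale's native encodings.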

22.4.1.4.1 codecvt members [locale.codecvt.members]

result out(stateT& state, const internT* from, const internT* from_end, const internT*& from_next, externT* to, externT* to_end, externT*& to_next) const;

Returns: do_out(state, from, from_end, from_next, to, to_end, to_next)

result unshift(stateT& state, externT* to, externT* to_end, externT*& to_next) const;

Returns: do_unshift(state, to, to_end, to_next)

result in(stateT& state, const externT* from, const externT* from_end, const externT*& from_next, internT* to, internT* to_end, internT*& to_next) const;

Returns: do_in(state, from, from_end, from_next, to, to_end, to_next)

int encoding() const noexcept;

Returns: do_encoding()

bool always_noconv() const noexcept;

Returns: do_always_noconv()

int length(stateT& state, const externT* from, const externT* from_end, size_t max) const;

Returns: do_length(state, from, from_end, max)

int max_length() const noexcept;

Returns: do_max_length()
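Because each public member simply forwards to the corresponding protected virtual, a derived facet customizes conversion by overriding the do_ functions only. The sketch below is one possible illustration: the class upper_out_codecvt and the helper upper_via_out are hypothetical names, and the "conversion" (upper-casing ASCII letters on output) is chosen only because it is easy to observe.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>   // std::mbstate_t
#include <locale>
#include <string>

// Hypothetical facet for illustration: upper-cases ASCII letters on output.
class upper_out_codecvt : public std::codecvt<char, char, std::mbstate_t> {
public:
    explicit upper_out_codecvt(std::size_t refs = 0)
        : std::codecvt<char, char, std::mbstate_t>(refs) {}

protected:
    result do_out(std::mbstate_t&,
                  const char* from, const char* from_end, const char*& from_next,
                  char* to, char* to_end, char*& to_next) const override
    {
        from_next = from;
        to_next = to;
        // Convert until either the source or the destination is exhausted.
        while (from_next != from_end && to_next != to_end) {
            const char c = *from_next++;
            *to_next++ = (c >= 'a' && c <= 'z') ? static_cast<char>(c - 'a' + 'A') : c;
        }
        return from_next == from_end ? ok : partial;
    }

    // This facet does convert, so it must not claim to be a no-op.
    bool do_always_noconv() const noexcept override { return false; }
};

// Calls the public out(), which dispatches to the do_out override above.
inline std::string upper_via_out(const std::string& text)
{
    upper_out_codecvt cvt;
    std::mbstate_t state = std::mbstate_t();
    std::string upper(text.size(), '\0');
    if (text.empty())
        return upper;

    const char* from_next = 0;
    char* to_next = 0;
    const std::codecvt_base::result r =
        cvt.out(state, text.data(), text.data() + text.size(), from_next,
                &upper[0], &upper[0] + upper.size(), to_next);
    assert(r == std::codecvt_base::ok);
    (void)r;
    upper.resize(static_cast<std::size_t>(to_next - &upper[0]));
    return upper;
}
```

Callers never name do_out directly; they reach it through out, which is why the override can remain protected.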

22.4.1.4.2 codecvt virtual functions [locale.codecvt.virtuals]

result do_out(stateT& state, const internT* from, const internT* from_end, const internT*& from_next, externT* to, externT* to_end, externT*& to_next) const;
result do_in(stateT& state, const externT* from, const externT* from_end, const externT*& from_next, internT* to, internT* to_end, internT*& to_next) const;

Requires: (from<=from_end && to<=to_end) well-defined and true; state initialized, if at the beginning of a sequence, or else equal to the result of converting the preceding characters in the sequence.

Effects: Translates characters in the source range [from,from_end), placing the results in sequential positions starting at destination to. Converts no more than (from_end-from) source elements, and stores no more than (to_end-to) destination elements.

Stops if it encounters a character it cannot convert. It always leaves the from_next and to_next pointers pointing one beyond the last element successfully converted. If it returns noconv, internT and externT are the same type and the converted sequence is identical to the input sequence [from, from_next). to_next is set equal to to, the value of state is unchanged, and there are no changes to the values in [to, to_end).

A codecvt facet that is used by basic_filebuf ([file.streams]) shall have the property that if

do_out(state, from, from_end, from_next, to, to_end, to_next)

would return ok, where from != from_end, then

do_out(state, from, from + 1, from_next, to, to_end, to_next)

shall also return ok, and that if

do_in(state, from, from_end, from_next, to, to_end, to_next)

would return ok, where to != to_end, then

do_in(state, from, from_end, from_next, to, to + 1, to_next)

shall also return ok.243 [ Note: As a result of operations on state, it can return ok or partial and set from_next == from and to_next != to. — end note ]
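The do_out half of this requirement can be probed for a particular input, as in the sketch below. It is only a spot check under stated assumptions, not a proof that a facet is usable with basic_filebuf: the helper name out_accepts_single_element is hypothetical, both calls start from an initial state, and the destination is sized from max_length().

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>   // std::mbstate_t
#include <locale>
#include <string>
#include <vector>

// Hypothetical spot check: if out() over all of `text` returns ok, does out()
// over just the first element also return ok?
inline bool out_accepts_single_element(const std::wstring& text,
                                       const std::locale& loc = std::locale::classic())
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_type;
    const facet_type& cvt = std::use_facet<facet_type>(loc);
    if (text.empty())
        return true;   // from == from_end: the requirement does not apply

    // Assumption: max_length() external elements per internal character suffice.
    const int per_char = cvt.max_length();
    assert(per_char >= 0);
    std::vector<char> buf(text.size() * static_cast<std::size_t>(per_char > 0 ? per_char : 1));

    const wchar_t* const from = text.data();
    const wchar_t* const from_end = from + text.size();
    const wchar_t* from_next = from;
    char* to_next = buf.data();

    std::mbstate_t whole_state = std::mbstate_t();
    if (cvt.out(whole_state, from, from_end, from_next,
                buf.data(), buf.data() + buf.size(), to_next) != facet_type::ok)
        return true;   // premise not met, nothing to check

    std::mbstate_t single_state = std::mbstate_t();
    return cvt.out(single_state, from, from + 1, from_next,
                   buf.data(), buf.data() + buf.size(), to_next) == facet_type::ok;
}
```

A false result for any input would indicate that the facet cannot translate one internal character at a time for that input.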

Remarks: Its operations on state are unspecified. [ Note: This argument can be used, for example, to maintain shift state, to specify conversion options (such as count only), or to identify a cache of seek offsets.  — end note ]

Returns: An enumeration value, as summarized in Table [tab:localization.convert.result.values.out.in].

Table 83: do_in/do_out result values

Value    Meaning
ok       completed the conversion
partial  not all source characters converted
error    encountered a character in [from,from_end) that it could not convert
noconv   internT and externT are the same type, and input sequence is identical to converted sequence

A return value of partial, if (from_next==from_end), indicates that either the destination sequence has not absorbed all the available destination elements, or that additional source elements are needed before another destination element can be produced.
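A caller of out therefore has to be prepared to resume after partial. The following sketch shows one way to do that with a deliberately small destination buffer; the helper name narrow_in_chunks and the buffer size of 4 are hypothetical choices, and the loop assumes the buffer can hold at least one converted character for the locale in use.

```cpp
#include <cassert>
#include <cwchar>   // std::mbstate_t
#include <locale>
#include <string>

// Hypothetical helper: appends the narrow form of `wide` to `narrow`, calling
// out() repeatedly. Returns false on error or if no progress can be made.
inline bool narrow_in_chunks(const std::wstring& wide, std::string& narrow,
                             const std::locale& loc = std::locale::classic())
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_type;
    const facet_type& cvt = std::use_facet<facet_type>(loc);

    std::mbstate_t state = std::mbstate_t();   // carried across calls
    char chunk[4];                             // small on purpose, to provoke partial
    const wchar_t* from = wide.data();
    const wchar_t* const from_end = from + wide.size();

    while (from != from_end) {
        const wchar_t* from_next = from;
        char* to_next = chunk;
        const facet_type::result r =
            cvt.out(state, from, from_end, from_next, chunk, chunk + sizeof chunk, to_next);
        if (r == facet_type::error)
            return false;

        assert(to_next >= chunk && to_next <= chunk + sizeof chunk);
        narrow.append(chunk, to_next);         // keep whatever was stored

        // Neither pointer moved: the buffer is too small or more input is needed.
        if (from_next == from && to_next == chunk)
            return false;
        from = from_next;                      // resume after the last converted element
    }
    return true;
}
```

The same state object is passed to every call, so any shift state left by one chunk is visible to the next.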

result do_unshift(stateT& state, externT* to, externT* to_end, externT*& to_next) const;

Requires: (to <= to_end) well-defined and true; state initialized, if at the beginning of a sequence, or else equal to the result of converting the preceding characters in the sequence.

Effects: Places characters starting at to that should be appended to terminate a sequence when the current stateT is given by state.244 Stores no more than (to_end-to) destination elements, and leaves the to_next pointer pointing one beyond the last element successfully stored.

Returns: An enumeration value, as summarized in Table [tab:localization.convert.result.values.unshift].

Table 84: do_unshift result values

Value    Meaning
ok       completed the sequence
partial  space for more than to_end-to destination elements was needed to terminate a sequence given the value of state
error    an unspecified error has occurred
noconv   no termination is needed for this state_type
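The four result values above suggest how a caller might finish an output sequence. The sketch below is one possible handling; the helper name finish_sequence and the 16-element buffer are hypothetical, and a real caller would likely retry with a larger buffer on partial rather than give up.

```cpp
#include <cassert>
#include <cwchar>   // std::mbstate_t
#include <locale>
#include <string>

// Hypothetical helper: appends to `narrow` whatever unshift() says is needed to
// terminate the sequence described by `state`. Returns false on partial or error.
inline bool finish_sequence(std::mbstate_t& state, std::string& narrow,
                            const std::locale& loc = std::locale::classic())
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_type;
    const facet_type& cvt = std::use_facet<facet_type>(loc);

    char tail[16];   // assumed large enough for a terminating sequence
    char* to_next = tail;
    const facet_type::result r = cvt.unshift(state, tail, tail + sizeof tail, to_next);

    switch (r) {
    case facet_type::ok:
        assert(to_next >= tail && to_next <= tail + sizeof tail);
        narrow.append(tail, to_next);   // may legitimately append nothing
        return true;
    case facet_type::noconv:
        return true;                    // no termination needed for this state_type
    case facet_type::partial:           // tail was too small
    case facet_type::error:
        return false;
    }
    return false;
}
```

For a stateless encoding in its initial state, the call is expected to succeed without appending anything.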

int do_encoding() const noexcept;

Returns: -1 if the encoding of the externT sequence is state-dependent; else the constant number of externT characters needed to produce an internal character; or 0 if this number is not a constant.245

bool do_always_noconv() const noexcept;

Returns: true if do_in() and do_out() return noconv for all valid argument values. codecvt<char, char, mbstate_t> returns true.

int do_length(stateT& state, const externT* from, const externT* from_end, size_t max) const;

Requires: (from<=from_end) well-defined and true; state initialized, if at the beginning of a sequence, or else equal to the result of converting the preceding characters in the sequence.

Effects: The effect on the state argument is “as if” it called do_in(state, from, from_end, from, to, to+max, to) for to pointing to a buffer of at least max elements.

Returns: (from_next-from) where from_next is the largest value in the range [from,from_end] such that the sequence of values in the range [from,from_next) represents max or fewer valid complete characters of type internT. The specialization codecvt<char, char, mbstate_t> returns the lesser of max and (from_end-from).

int do_max_length() const noexcept;

Returns: The maximum value that do_length(state, from, from_end, 1) can return for any valid range [from, from_end) and stateT value state. The specialization codecvt<char, char, mbstate_t>::do_max_length() returns 1.
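The query members can be exercised on the degenerate specialization codecvt<char, char, mbstate_t>, for which the text above states the results of always_noconv, length, and max_length. The following sketch collects them; the struct codecvt_properties and the function describe_char_codecvt are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>   // std::mbstate_t
#include <locale>
#include <string>

// Hypothetical aggregate holding the values reported by the query members.
struct codecvt_properties {
    int  encoding;          // value of encoding()
    bool always_noconv;     // value of always_noconv()
    int  max_length;        // value of max_length()
    int  length_of_prefix;  // value of length(state, from, from_end, max)
};

// Queries the codecvt<char, char, mbstate_t> facet of the classic locale.
inline codecvt_properties describe_char_codecvt(const std::string& sample, std::size_t max)
{
    typedef std::codecvt<char, char, std::mbstate_t> facet_type;
    const facet_type& cvt = std::use_facet<facet_type>(std::locale::classic());

    std::mbstate_t state = std::mbstate_t();
    codecvt_properties p;
    p.encoding = cvt.encoding();
    p.always_noconv = cvt.always_noconv();
    p.max_length = cvt.max_length();
    p.length_of_prefix =
        cvt.length(state, sample.data(), sample.data() + sample.size(), max);
    assert(p.length_of_prefix >= 0);
    return p;
}
```

For a six-element sample and max equal to 4, length_of_prefix is the lesser of the two, namely 4; for a two-element sample it is 2.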

243) Informally, this means that basic_filebuf assumes that the mapping from internal to external characters is 1 to N: a codecvt facet that is used by basic_filebuf must be able to translate characters one internal character at a time.

244) Typically these will be characters to return the state to stateT().

245) If encoding() yields -1, then more than max_length() externT elements may be consumed when producing a single internT character, and additional externT elements may appear at the end of a sequence after those that yield the final internT character.