1282. A proposal to add std::split algorithm

Section: 26 [algorithms] Status: NAD Submitter: Igor Semenov Opened: 2009-12-07 Last modified: 2019-02-26

Priority: Not Prioritized

Discussion:

  1. Motivation and Scope

    Splitting a string into parts by a set of delimiters is a common task, but the C++ Standard offers no simple, general solution. C++ developers usually use std::basic_stringstream<> to split a string into parts, but that approach has several inconvenient restrictions.

  2. Impact on the Standard

    This algorithm does not interfere with any of the current standard algorithms.

  3. Design Decisions

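    This algorithm is implemented in terms of iterators: forward iterators for the input and delimiter ranges, and an output iterator for the results. There is also one additional overload that takes the delimiters as a zero-terminated const CharType * string.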

  4. Example implementation

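    #include <algorithm>
    #include <string>
    #include <utility>

    template< class It, class DelimIt, class OutIt >
    void split( It begin, It end, DelimIt d_begin, DelimIt d_end, OutIt out )
    {
       while ( begin != end )
       {
           // [begin, it) is a maximal run that contains no delimiter.
           It it = std::find_first_of( begin, end, d_begin, d_end );
           *out++ = std::make_pair( begin, it );
           // Skip the run of delimiters that follows: stop at the first
           // element that matches none of [d_begin, d_end).
           begin = it;
           while ( begin != end && std::find( d_begin, d_end, *begin ) != d_end )
               ++begin;
       }
    }

    template< class It, class CharType, class OutIt >
    void split( It begin, It end, const CharType * delim, OutIt out )
    {
       // char_traits::length works for any character type, unlike std::strlen.
       split( begin, end, delim,
              delim + std::char_traits< CharType >::length( delim ), out );
    }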
    
  5. Usage

    std::string ss( "word1 word2 word3" );
    std::vector< std::pair< std::string::const_iterator, std::string::const_iterator > > v;
    split( ss.begin(), ss.end(), " ", std::back_inserter( v ) );
    
    for ( std::size_t i = 0; i < v.size(); ++i )
    {
       std::cout << std::string( v[ i ].first, v[ i ].second ) << std::endl;
    }
    // word1
    // word2
    // word3
    

[ 2010-01-22 Moved to Tentatively NAD Future after 5 positive votes on c++std-lib. Rationale added below. ]

[LEWG Kona 2017]

Recommend NAD: Paper encouraged. Have papers for this; LEWG259.

Rationale:

The LWG is not considering completely new features for standardization at this time. We would like to revisit this good suggestion for a future TR and/or standard.

Proposed resolution:

Add to the synopsis in 26.1 [algorithms.general]:

template< class ForwardIterator1, class ForwardIterator2, class OutputIterator >
  void split( ForwardIterator1 first, ForwardIterator1 last,
              ForwardIterator2 delimiter_first, ForwardIterator2 delimiter_last,
              OutputIterator result );

template< class ForwardIterator1, class CharType, class OutputIterator >
  void split( ForwardIterator1 first, ForwardIterator1 last,
              const CharType * delimiters, OutputIterator result );

Add a new section [alg.split]:

template< class ForwardIterator1, class ForwardIterator2, class OutputIterator >
  void split( ForwardIterator1 first, ForwardIterator1 last,
              ForwardIterator2 delimiter_first, ForwardIterator2 delimiter_last,
              OutputIterator result );

1. Effects: Splits the range [first, last) into parts, using any element of [delimiter_first, delimiter_last) as a delimiter. Each part is written to result as a std::pair<ForwardIterator1, ForwardIterator1> that denotes a maximal subrange of [first, last) containing no delimiter.

2. Returns: nothing.

3. Complexity: Exactly last - first assignments.
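
The Effects wording above is not specific to strings. The following is a minimal sketch, not the proposed wording itself, that applies the same idea to a sequence of integers. The names split_sketch and split_ints are hypothetical, and the generic lambda assumes C++14 or later.

```cpp
#include <algorithm>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical name; mirrors the proposed iterator-range overload.
template <class FwdIt1, class FwdIt2, class OutIt>
void split_sketch(FwdIt1 first, FwdIt1 last,
                  FwdIt2 d_first, FwdIt2 d_last, OutIt result)
{
    while (first != last) {
        // [first, stop) is a maximal subrange that contains no delimiter.
        FwdIt1 stop = std::find_first_of(first, last, d_first, d_last);
        *result++ = std::make_pair(first, stop);
        // Skip the run of delimiters that follows the subrange.
        first = std::find_if(stop, last, [&](const auto& x) {
            return std::find(d_first, d_last, x) == d_last;
        });
    }
}

// Illustration only: splits a sequence of ints on the delimiters 0 and -1
// and copies each resulting subrange into its own vector.
std::vector<std::vector<int>> split_ints(const std::vector<int>& data)
{
    const int delims[] = {0, -1};
    using It = std::vector<int>::const_iterator;
    std::vector<std::pair<It, It>> parts;
    split_sketch(data.begin(), data.end(),
                 std::begin(delims), std::end(delims),
                 std::back_inserter(parts));

    std::vector<std::vector<int>> out;
    for (const auto& p : parts)
        out.emplace_back(p.first, p.second);
    return out;
}
```

In this sketch, an input that starts with a delimiter yields an empty first pair, because a pair is emitted before any delimiters are skipped. Whether that is the intended behavior is not stated in the proposal.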

template< class ForwardIterator1, class CharType, class OutputIterator >
  void split( ForwardIterator1 first, ForwardIterator1 last,
              const CharType * delimiters, OutputIterator result );

1. Effects: Splits the range [first, last) into parts, using any element of delimiters (interpreted as a zero-terminated string) as a delimiter. Each part is written to result as a std::pair<ForwardIterator1, ForwardIterator1> that denotes a maximal subrange of [first, last) containing no delimiter.

2. Returns: nothing.

3. Complexity: Exactly last - first assignments.
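
As an illustration of the zero-terminated overload, the sketch below splits a wide string on several delimiter characters at once. It is an assumption-laden example rather than the proposed implementation: split_cstr_sketch and split_words are hypothetical names, the delimiter length is computed with std::char_traits so that character types other than char work, and the generic lambda assumes C++14 or later.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical name; mirrors the proposed zero-terminated-delimiter overload.
template <class FwdIt, class CharType, class OutIt>
void split_cstr_sketch(FwdIt first, FwdIt last,
                       const CharType* delimiters, OutIt result)
{
    // One past the last delimiter character (the terminator is excluded).
    const CharType* d_last =
        delimiters + std::char_traits<CharType>::length(delimiters);
    while (first != last) {
        // [first, stop) is a maximal subrange that contains no delimiter.
        FwdIt stop = std::find_first_of(first, last, delimiters, d_last);
        *result++ = std::make_pair(first, stop);
        // Skip the run of delimiters that follows the subrange.
        first = std::find_if(stop, last, [&](const auto& c) {
            return std::find(delimiters, d_last, c) == d_last;
        });
    }
}

// Illustration only: returns the parts of text as separate strings.
std::vector<std::wstring> split_words(const std::wstring& text,
                                      const wchar_t* delimiters)
{
    using It = std::wstring::const_iterator;
    std::vector<std::pair<It, It>> parts;
    split_cstr_sketch(text.begin(), text.end(), delimiters,
                      std::back_inserter(parts));

    std::vector<std::wstring> words;
    for (const auto& p : parts)
        words.emplace_back(p.first, p.second);
    return words;
}
```

Calling split_words(L"alpha, beta;gamma", L", ;") under these assumptions produces the three parts alpha, beta and gamma, since consecutive delimiters are skipped as a single run.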