Safe to use substr?

Discussion in 'C++' started by Immortal Nephi, Jan 31, 2010.

  1. I want to know that size_type returns –1 (minus one) is safe before I
    extract one string into two substrings. First example is safe and
    second example is not sure.

    const basic_string <char>::size_type npos = -1;
    basic_string< char >::size_type begin_index, end_index, length_index;

    end_index = 0;

    string data = "Hello World!!", token1, token2;


    First example:

    begin_index = data.find_first_not_of( " ", end_index );
    end_index = data.find_first_of( " ", begin_index );
    token1 = data.substr( begin_index, end_index - begin_index );
    length_index = token1.length();


    begin_index returns 0 and end_index returns 5. substr is safe.


    Second example:

    begin_index = data.find_first_not_of( " ", end_index );
    end_index = data.find_first_of( " ", begin_index );
    token2 = data.substr( begin_index, end_index - begin_index );
    length_index = token2.length();

    begin_index returns 6 and end_index returns –1. Is substr safe for
    token2 because end_index returns –1 indicates space character is not
    found.

    Another question—is size_type the same as size_t? They are always
    unsigned maximum integer. Can I always copy variable from size_type
    to signed integer or unsigned integer?

    const basic_string <char>::size_type npos = -1;

    signed int sNpos = npos;
    unsigned int uNpos = npos;
     
    Immortal Nephi, Jan 31, 2010
    #1

  2. Immortal Nephi

    Robert Fendt Guest

    And thus spake Immortal Nephi <>
    Sat, 30 Jan 2010 18:49:34 -0800 (PST):

    > I want to know that size_type returns –1 (minus one) is safe before I
    > extract one string into two substrings. First example is safe and
    > second example is not sure.
    >
    > const basic_string <char>::size_type npos = -1;


    Why don't you just use std::string (which is a typedef of std::basic_string<char>)? It is more readable. Secondly, consider using string::npos instead of redefining it yourself. IIRC, the exact definition of npos is implementation-defined, thus it is dangerous to assume too much about it. It _is_ in fact defined as (size_t)-1 on almost all systems, but strictly speaking that depends on implementation and processor architecture.

    > begin_index returns 0 and end_index returns 5. substr is safe.


    Let's just say, it does what you expected it to do.

    > Second example:
    >
    > begin_index = data.find_first_not_of( " ", end_index );
    > end_index = data.find_first_of( " ", begin_index );
    > token2 = data.substr( begin_index, end_index - begin_index );
    > length_index = token2.length();
    >
    > begin_index returns 6 and end_index returns –1. Is substr safe for
    > token2 because end_index returns –1 indicates space character is not
    > found.


    Yes. The standard specifies that its parameters are of type string::size_type, thus (at least in case of basic_string<char> and basic_string<wchar_t>) they are definitely unsigned. So in fact you are passing a _very_ large number as second parameter. My standard library docs state that if the second parameter points beyond the string, the end of the string is assumed instead (in fact, the default value for the second parameter is string::npos).
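
    A minimal sketch of the question's two-step search wrapped in a loop. The name split_on_spaces is hypothetical and does not appear in the thread. It relies on the clamping of the second substr argument described above, but checks the start position explicitly, because a start position of npos would make substr throw std::out_of_range.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits `data` on spaces using the same
// find_first_not_of / find_first_of pair as the question.
std::vector<std::string> split_on_spaces(const std::string& data)
{
    std::vector<std::string> tokens;
    std::string::size_type begin = data.find_first_not_of(' ');
    while (begin != std::string::npos) {
        // end is npos when no further space exists.
        const std::string::size_type end = data.find_first_of(' ', begin);
        // If end is npos, end - begin is still a very large unsigned
        // value, and substr clamps it to the rest of the string.
        tokens.push_back(data.substr(begin, end - begin));
        if (end == std::string::npos) {
            break;
        }
        begin = data.find_first_not_of(' ', end);
    }
    return tokens;
}
```

    Called with "Hello World!!", this should give the two tokens "Hello" and "World!!".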

    > Another question—is size_type the same as size_t? They are always
    > unsigned maximum integer. Can I always copy variable from size_type
    > to signed integer or unsigned integer?


    First question: yes, and no. size_type gets into string via traits class templates. If you are not familiar with that technique, I suggest you read up on it, since it is used extensively throughout the STL. The point being that you can adapt basic_string to just about any type of underlying data, thus it does not assume that every length is of type size_t but rather gets its definition via a traits template.

    That said, it _is_ true that basic_string<char> and basic_string<wchar_t> (i.e., string and wstring) do use a definition of size_type that is identical to size_t.

    Second question: no. You cannot assume that size_t is the same size as "int". It might, or it might not (in fact, e.g. on newer MSVC++ in 64bit mode, it is not!). Secondly, while it is safe to cast an unsigned value to signed and back (the resulting value being IIRC guaranteed to be identical to the original), the semantics of interpreting an unsigned value as signed if it is 'too large' are unspecified.

    On most systems, casting a large unsigned number to signed yields a negative number, since that is how signed values are usually implemented. However, I don't think the standard actually specifies that casting numeric_limits<unsigned>::max() to int actually yields "-1".

    Regards,
    Robert
     
    Robert Fendt, Jan 31, 2010
    #2

  3. Immortal Nephi

    James Kanze Guest

    On Jan 31, 2:49 am, Immortal Nephi <> wrote:
    > I want to know that size_type returns -1 (minus one)


    size_type never contains -1. It can't, since it is an unsigned
    type. (Also, variables and types don't "return" anything. Only
    functions return things.)

    > is safe before I extract one string into two substrings.
    > First example is safe and second example is not sure.


    > const basic_string <char>::size_type npos = -1;


    Which results in an implicit conversion, according to the rules
    of conversion of signed to unsigned. Basically, npos will be
    the largest possible value of size_type.

    But why are you defining this? (And why are you using
    basic_string< char > instead of the typedef std::string?) If,
    for convenience, you want a local constant variable (to be able
    to write npos, rather than std::string::npos), then:

    std::string::size_type const npos = std::string::npos;

    is the simplest solution.

    > basic_string< char >::size_type begin_index, end_index, length_index;


    Just a general rule (good practice, not a language requirement):
    don't define variables until you can initialize them.

    > end_index = 0;


    > string data = "Hello World!!", token1, token2;


    > First example:


    > begin_index = data.find_first_not_of( " ", end_index );
    > end_index = data.find_first_of( " ", begin_index );
    > token1 = data.substr( begin_index, end_index - begin_index );
    > length_index = token1.length();


    > begin_index returns 0 and end_index returns 5. substr is safe.


    > Second example:


    > begin_index = data.find_first_not_of( " ", end_index );
    > end_index = data.find_first_of( " ", begin_index );
    > token2 = data.substr( begin_index, end_index - begin_index );
    > length_index = token2.length();


    > begin_index returns 6 and end_index returns -1.


    Again, end_index doesn't return anything; data.find_first_of returns
    std::string::npos. Which is the largest possible value which
    can be held in an std::string::size_type.

    > Is substr safe for token2 because end_index returns -1
    > indicates space character is not found.


    What does the documentation for substr say? What is the meaning
    of the second argument? (I don't have my copy of the standard
    handy to quote exactly, but what it says is something along the
    lines of "the second argument specifies the maximum length of
    the returned string", and that the return value is something
    like "std::string( s.begin() + position, s.begin() + position +
    std::min(length, s.size() - position))".)
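
    The paraphrase above can be written out as a function to compare against substr. This is only a sketch: substr_by_hand is a made-up name, and it assumes position <= s.size(), which substr itself enforces by throwing.

```cpp
#include <algorithm>
#include <string>

// Hypothetical re-statement of what substr(position, length) returns.
// Assumes position <= s.size(); the real substr throws otherwise.
std::string substr_by_hand(const std::string& s,
                           std::string::size_type position,
                           std::string::size_type length)
{
    // The requested length is clamped to what is left in the string.
    const std::string::size_type count =
        std::min(length, s.size() - position);
    return std::string(s.begin() + position, s.begin() + position + count);
}
```

    With data = "Hello World!!", both substr_by_hand(data, 6, npos - 6) and data.substr(6, npos - 6) should give "World!!".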

    > Another question---is size_type the same as size_t?


    For std::string and std::wstring, yes. If you instantiate
    std::basic_string with a non-standard allocator, not
    necessarily.

    > They are always unsigned maximum integer.


    No. size_t is an unsigned integer large enough that the size of
    the largest possible object can be represented in it. I've used
    machines where size_t was 16 bits, for example.

    > Can I always copy variable from size_type to signed integer or
    > unsigned integer?


    There are several possible answers to that question. If you
    mean copy without loss of value, the answer is no; a lot of
    modern machines have a 64 bit size_type, but a 32 bit integer
    type, and there's no way you can convert a 64 bit type into a 32
    bit type without loss of value.

    Formally, of course, you can convert to the unsigned
    integer---the results of converting to the signed integer are
    implementation defined, but on most implementations, the
    conversion is well defined as well. But if the value doesn't
    fit, you'll get some other value.

    Finally, in practice, it's likely that practical constraints
    mean that you won't have strings larger than what can be
    represented in an int. In which case, there's no problem.

    > const basic_string <char>::size_type npos = -1;


    > signed int sNpos = npos;


    The results here are implementation defined. It's very likely
    that sNpos will end up -1, but it's not guaranteed by the
    standard. (And if sNpos does end up -1, then the conversion
    back to size_t is guaranteed, so comparison with a size_t will
    work.)

    > unsigned int uNpos = npos;


    Perfectly legal, but uNpos will not compare equal to npos on
    most 64 bit machines.
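
    A small sketch of the uNpos case, using only the standard library. The function names are made up for illustration. The first function reports whether npos survives being stored in an unsigned int; the second shows the alternative of keeping the result in size_type.

```cpp
#include <string>

// Returns true if npos still compares equal to itself after being
// stored in an unsigned int. Where size_type is wider than
// unsigned int, the stored value is truncated and this is false.
bool npos_survives_unsigned_int()
{
    const unsigned int uNpos =
        static_cast<unsigned int>(std::string::npos);  // may truncate
    return uNpos == std::string::npos;
}

// Keeping the search result in size_type avoids the conversion.
bool contains_space(const std::string& s)
{
    const std::string::size_type pos = s.find(' ');
    return pos != std::string::npos;
}
```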

    I'm not too clear as to what your goal is. First, for better or
    for worse, std::string uses an unsigned size_t for all of its
    indexing and positioning. Mixing signed and unsigned in C++
    often gives surprising results, and should be avoided. (Using
    unsigned for numeric values should generally be avoided as well,
    but the rule about not mixing is more critical, and trumps this
    rule---if an external library uses unsigned, you should stick
    with whatever type it uses.)

    Also, and this is really just a question of personal preference,
    but I prefer by far using the algorithms in <algorithm> to the
    special member functions in std::string. Once you're used to
    the standard library, it just seems more comfortable working
    with iterators than with indexes. And it avoids all of the
    issues related to unsigned types in C++. Given that any time
    you're going to be processing text, you're going to be using
    functions like isalpha, isspace, etc. a lot, the first thing to
    do is to define predicate object types for each of the
    functions and its complement. (Macros make this fairly easy.)
    Then you use them with std::find_if. So your initial example
    becomes:

    typedef std::string::const_iterator text_iterator;
    std::string const data( "Hello, world!" );
    text_iterator begin_token = std::find_if(data.begin(), data.end(),
    is_not_space());
    text_iterator end_token = std::find_if(begin_token, data.end(),
    is_space());
    // or is_not_alnum(), or whatever...
    std::string const first_token( begin_token, end_token );

    (As I say, this is a personal preference, not any established
    rule. But IMHO, it fits in better with the philosophy of the
    standard library.)
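
    The snippet above uses is_space and is_not_space without showing them. One possible way to write them is sketched below; these definitions are an assumption, not the poster's own code, and first_token is a hypothetical wrapper around the same two find_if calls.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical predicate types; the post does not show its definitions.
struct is_space {
    bool operator()(char ch) const
    {
        // Cast first: passing a negative char to std::isspace is undefined.
        return std::isspace(static_cast<unsigned char>(ch)) != 0;
    }
};

struct is_not_space {
    bool operator()(char ch) const { return !is_space()(ch); }
};

// Returns the first whitespace-delimited token of `data`, or an empty
// string when there is none.
std::string first_token(const std::string& data)
{
    typedef std::string::const_iterator text_iterator;
    text_iterator begin_token =
        std::find_if(data.begin(), data.end(), is_not_space());
    text_iterator end_token =
        std::find_if(begin_token, data.end(), is_space());
    return std::string(begin_token, end_token);
}
```

    No index arithmetic is involved, so a missing separator simply leaves end_token equal to data.end().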

    --
    James Kanze
     
    James Kanze, Jan 31, 2010
    #3
  4. Immortal Nephi

    James Kanze Guest

    On Jan 31, 8:47 am, Robert Fendt <> wrote:
    > And thus spake Immortal Nephi <>
    > Sat, 30 Jan 2010 18:49:34 -0800 (PST):


    > > I want to know that size_type returns -1 (minus one) is safe
    > > before I extract one string into two substrings. First
    > > example is safe and second example is not sure.


    > > const basic_string <char>::size_type npos = -1;


    > Why don't you just use std::string (which is a typedef of
    > std::basic_string<char>)? It is more readable. Secondly,
    > consider using string::npos instead of redefining it yourself.
    > IIRC, the exact definition of npos is implementation-defined,
    > thus it is dangerous to assume too much about it. It _is_ in
    > fact defined as (size_t)-1 on almost all systems, but strictly
    > speaking that depends on implementation and processor
    > architecture.


    The standard requires it to be defined as
    static_cast< size_type >( -1 )
    The implementation and processor architecture dependencies are in
    the definition of size_type (which must be size_t in the default
    allocator). The actual numeric value will vary, but it is well
    defined, and used correctly as a sentinel value, there should be
    no portability problems.

    --
    James Kanze
     
    James Kanze, Jan 31, 2010
    #4
  5. On Jan 31, 2:47 am, Robert Fendt <> wrote:
    > And thus spake Immortal Nephi <>
    > Sat, 30 Jan 2010 18:49:34 -0800 (PST):
    >
    > >    I want to know that size_type returns –1 (minus one) is safe before I
    > > extract one string into two substrings.  First example is safe and
    > > second example is not sure.

    >
    > >    const basic_string <char>::size_type npos = -1;

    >
    > Why don't you just use std::string (which is a typedef of std::basic_string<char>)? It is more readable. Secondly, consider using string::npos instead of redefining it yourself. IIRC, the exact definition of npos is implementation-defined, thus it is dangerous to assume too much about it. It _is_ in fact defined as (size_t)-1 on almost all systems, but strictly speaking that depends on implementation and processor architecture.
    >
    > > begin_index returns 0 and end_index returns 5.  substr is safe.

    >
    > Let's just say, it does what you expected it to do.
    >
    > > Second example:

    >
    > >    begin_index = data.find_first_not_of( " ", end_index );
    > >    end_index = data.find_first_of( " ", begin_index );
    > >    token2 = data.substr( begin_index, end_index - begin_index );
    > >    length_index = token2.length();

    >
    > > begin_index returns 6 and end_index returns –1.  Is substr safe for
    > > token2 because end_index returns –1 indicates space character is not
    > > found.

    >
    > Yes. The standard specifies that its parameters are of type string::size_type, thus (at least in case of basic_string<char> and basic_string<wchar_t>) they are definitely unsigned. So in fact you are passing a _very_ large number as second parameter. My standard library docs state that if the second parameter points beyond the string, the end of the string is assumed instead (in fact, the default value for the second parameter is string::npos).


    find_first_not_of() function and find_first_of() function always
    return unsigned integer like size_type. The size_type gives you the
    information if unsigned integer is valid or not valid.
    The minimum size_type is 0 and maximum size_type is 0xFFFFFFFE (on 32
    bit machine). Both integer values provide you the information how
    many elements do string have. The 0xFFFFFFFF or –1 indicates that
    data in the string is not found or is not valid.
    Let’s discuss substr() function. The substr() function’s first
    parameter must always have minimum size_type and maximum size_type.
    If 0xFFFFFFFF or –1 is detected, then exception will be thrown.
    The second parameter always has default 0xFFFFFFFF or –1 if you do
    not assign second parameter.

    For example

    string data( "Hello World!" );
    string token = data.substr( 0 );

    The data has 11 elements in length. Notice that second parameter in
    substr() function is not assigned. The default is –1. How do substr
    () function know to count 11 elements correctly? It should always
    count all 256 values of character set including ‘\0’.
    If you insert ‘\0’ between Hello and World ( “Hello \0World!” ), then
    it will count 12 elements including ‘\0’. The string object is not
    like C string. It does not check null terminator and it always check
    number of elements in size with size() function or length() function.

    end_token = 5;
    begin_token = data.find_first_not_of( " ", end_token + 1 );
    end_token = data.find_first_of( " ", begin_token );

    string token = data.substr( begin_token, end_token - begin_token );
    length_token = token.length();

    find_first_of() function returns –1 indicates space is not found.
    substr() function cannot guarantee to assume to be 11. Possibly, it
    will go beyond 11 elements boundary until it detects ‘\0’ and returns
    the wrong end_token value.

    I think that my example code above is not a good solution. I will
    use iterator loop to test each element instead.
     
    Immortal Nephi, Jan 31, 2010
    #5
  6. Immortal Nephi

    LR Guest

    Immortal Nephi wrote:

    > find_first_not_of() function and find_first_of() function always
    > return unsigned integer like size_type. The size_type gives you the
    > information if unsigned integer is valid or not valid.
    > The minimum size_type is 0 and maximum size_type is 0xFFFFFFFE (on 32
    > bit machine). Both integer values provide you the information how
    > many elements do string have. The 0xFFFFFFFF or –1 indicates that
    > data in the string is not found or is not valid.



    > Let’s discuss substr() function. The substr() function’s first
    > parameter must always have minimum size_type and maximum size_type.


    I think you mean the argument pos must be between 0 and size().
    const std::string s ("Hello World");
    const std::string t = s.substr(); // pos == 0
    const std::string u = s.substr(0);
    const std::string v = s.substr(s.size());

    > If 0xFFFFFFFF or –1 is detected, then exception will be thrown.
    > The second parameter always has default 0xFFFFFFFF or –1 if you do
    > not assign second parameter.
    >
    > For example
    >
    > string data( "Hello World!" );
    > string token = data.substr( 0 );
    >
    > The data has 11 elements in length. Notice that second parameter in
    > substr() function is not assigned. The default is –1. How do substr
    > () function know to count 11 elements correctly?


    std::string keeps track of the length or size of the string. It doesn't
    use zero termination the way C strings do.

    Also, note that a std::string cannot grow to be larger than
    std::string::max_size(). In the implementation I use this is
    std::numeric_limits<std::string::size_type>::max()-1.



    >It should always
    > count all 256 values of character set including ‘\0’.


    It will. Try this:

    const std::string s =
    std::string("Hello") + '\0' + std::string("World");
    std::cout << s << std::endl;
    std::cout << s.size() << std::endl;

    > If you insert ‘\0’ between Hello and World ( “Hello \0World!” ), then
    > it will count 12 elements including ‘\0’. The string object is not
    > like C string. It does not check null terminator and it always check
    > number of elements in size with size() function or length() function.
    >
    > end_token = 5;
    > begin_token = data.find_first_not_of( " ", end_token + 1 );


    You're not looking for the '\0'.

    > end_token = data.find_first_of( " ", begin_token );


    Same.
    >
    > string token = data.substr( begin_token, end_token - begin_token );
    > length_token = token.length();


    I think this will work:

    const std::string
    data = std::string("Hello ") + '\0' + std::string("World");

    const std::string look_for = std::string(" ")+'\0';
    const std::string::size_type first = data.find_first_of(look_for);
    const std::string::size_type
    begin_token = data.find_first_not_of(look_for, first+1);
    const std::string::size_type
    end_token = data.find_first_of(look_for, begin_token);

    const std::string
    token = data.substr(begin_token, end_token-begin_token);
    const std::string::size_type length_token = token.length();


    LR
     
    LR, Jan 31, 2010
    #6
  7. Immortal Nephi

    James Kanze Guest

    On 31 Jan, 16:58, Immortal Nephi <> wrote:
    > On Jan 31, 2:47 am, Robert Fendt <> wrote:
    > > And thus spake Immortal Nephi <>
    > > Sat, 30 Jan 2010 18:49:34 -0800 (PST):


    [...]
    > > > Second example:


    > > > begin_index = data.find_first_not_of( " ", end_index );
    > > > end_index = data.find_first_of( " ", begin_index );
    > > > token2 = data.substr( begin_index, end_index - begin_index );
    > > > length_index = token2.length();

    >
    > > > begin_index returns 6 and end_index returns -1. Is substr
    > > > safe for token2 because end_index returns -1 indicates
    > > > space character is not found.


    > > Yes. The standard specifies that its parameters are of type
    > > string::size_type, thus (at least in case of
    > > basic_string<char> and basic_string<wchar_t>) they are
    > > definitely unsigned. So in fact you are passing a _very_
    > > large number as second parameter. My standard library docs
    > > state that if the second parameter points beyond the string,
    > > the end of the string is assumed instead (in fact, the
    > > default value for the second parameter is string::npos).


    > find_first_not_of() function and find_first_of() function
    > always return unsigned integer like size_type. The size_type
    > gives you the information if unsigned integer is valid or not
    > valid.


    I'm afraid I don't understand that last sentence. A type can't
    give you any information.

    > The minimum size_type is 0 and maximum size_type is 0xFFFFFFFE
    > (on 32 bit machine). Both integer values provide you the
    > information how many elements do string have.


    What do you mean by "both" here? A zero value designates the
    first character of the string, or indicates that the length of
    the string is 0. The maximum value is used as a sentinel:
    std::string::size will never return it. The only functions
    which do return it are those which look for something, and they
    use it as a special value, to indicate that they didn't find
    what they were looking for.

    > The 0xFFFFFFFF or -1 indicates that data in the string is not
    > found or is not valid.


    (Just a nit, but 0xFFFFFFFF is *not* -1. They're two different
    values.)

    > Let’s discuss substr() function. The substr() function’s
    > first parameter must always have minimum size_type and maximum
    > size_type.


    The first argument must be in the range [0...s.size()], where s
    is the string you're concerned with. It specifies the index of
    the first character in the substring you want.
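
    A short sketch of that range rule. The helper substr_throws is hypothetical; it only reports whether std::string::substr throws std::out_of_range for a given start position.

```cpp
#include <stdexcept>
#include <string>

// Returns true if s.substr(pos) throws std::out_of_range.
bool substr_throws(const std::string& s, std::string::size_type pos)
{
    try {
        (void)s.substr(pos);  // result deliberately discarded
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

    For s = "Hello World", positions 0 and s.size() are accepted (the latter giving an empty string), while s.size() + 1 and std::string::npos are rejected with an exception.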

    > If 0xFFFFFFFF or -1 is detected, then exception will be
    > thrown.


    (Again, -1 cannot be detected, because it cannot be represented
    on the type of the argument.)

    > The second parameter always has default 0xFFFFFFFF or -1 if
    > you do not assign second parameter.


    > For example


    > string data( "Hello World!" );
    > string token = data.substr( 0 );


    > The data has 11 elements in length. Notice that second
    > parameter in substr() function is not assigned. The default
    > is -1.


    The default is std::string::npos, not -1.

    > How do substr () function know to count 11 elements
    > correctly?


    It's a member function. It knows the length of the string.
    (How do you think std::string::size works?)

    > It should always count all 256 values of character set
    > including ‘\0’.


    It doesn't count anything.

    > If you insert ‘\0’ between Hello and World ( "Hello \0World!"
    > ), then it will count 12 elements including ‘\0’. The string
    > object is not like C string. It does not check null
    > terminator and it always check number of elements in size with
    > size() function or length() function.


    > end_token = 5;
    > begin_token = data.find_first_not_of( " ", end_token + 1 );
    > end_token = data.find_first_of( " ", begin_token );


    > string token = data.substr( begin_token, end_token - begin_token );
    > length_token = token.length();


    > find_first_of() function returns -1 indicates space is not found.


    It returns std::string::npos (which is *not* -1) to indicate
    that it didn't find any character in the list given.

    > substr() function cannot guarantee to assume to be 11.
    > Possibly, it will go beyond 11 elements boundary until it
    > detects ‘\0’ and returns the wrong end_token value.


    Why on earth would it do a thing like that? An std::string
    knows its length, and unless the standard specifically states
    otherwise, it uses this length. No member function ever looks
    for '\0'.
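
    A sketch that puts this to the test with an embedded '\0'. The function names are made up, and second_token assumes the input contains at least one space followed by a non-space character.

```cpp
#include <string>

// Builds "Hello \0World" (12 characters) with an explicit length, so
// the embedded '\0' is part of the string.
std::string make_data()
{
    return std::string("Hello \0World", 12);
}

// Second token when only ' ' is a separator. The '\0' is an ordinary
// character here and stays inside the token.
// Assumes `data` has a space followed by at least one non-space.
std::string second_token(const std::string& data)
{
    const std::string::size_type first_space = data.find_first_of(' ');
    const std::string::size_type begin =
        data.find_first_not_of(' ', first_space);
    // No further space exists in make_data(), so end is npos there.
    const std::string::size_type end = data.find_first_of(' ', begin);
    return data.substr(begin, end - begin);
}
```

    For make_data(), the token should be six characters long: the '\0' followed by "World".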

    > I think that my example code above is not a good solution. I
    > will use iterator loop to test each element instead.


    I think you still have a lot to learn about the standard
    library. (And also expressing yourself clearly---which is a
    prerequisite to good programming. I don't know how much of this
    is due to English not being your native language, however.)

    --
    James Kanze
     
    James Kanze, Jan 31, 2010
    #7
  8. Immortal Nephi

    James Kanze Guest

    On 31 Jan, 21:19, LR <> wrote:
    > Immortal Nephi wrote:


    [...]
    > I think this will work:


    > const std::string
    > data = std::string("Hello ") + '\0' + std::string("World");


    An even simpler solution might be:
    std::string const data( "Hello \0World", 12 );

    --
    James Kanze
     
    James Kanze, Jan 31, 2010
    #8
  9. Immortal Nephi

    Öö Tiib Guest

    On Feb 1, 12:14 am, "Leigh Johnston" <> wrote:
    > >> If 0xFFFFFFFF or -1 is detected, then exception will be
    > >> thrown.

    >
    > > (Again, -1 cannot be detected, because it cannot be represented
    > > on the type of the argument.)

    >
    > assert(static_cast<unsigned int>(-1) == -1);
    >
    > :)
    >
    > /Leigh


    Anyway you get diagnostic warnings for it from most compilers. If
    'static_cast<unsigned int>(-1)' is needed then '~0U' is perhaps
    shortest form that makes all compilers happy with it.
     
    Öö Tiib, Feb 1, 2010
    #9
  10. Immortal Nephi

    LR Guest

    James Kanze wrote:
    > On 31 Jan, 16:58, Immortal Nephi <> wrote:
    >
    >> The minimum size_type is 0 and maximum size_type is 0xFFFFFFFE
    >> (on 32 bit machine). Both integer values provide you the
    >> information how many elements do string have.

    >
    > What do you mean by "both" here? A zero value designates the
    > first character of the string, or indicates that the length of
    > the string is 0. The maximum value is used as a sentinel:
    > std::string::size will never return it. The only functions
    > which do return it are those which look for something, and they
    > use it as a special value, to indicate that they didn't find
    > what they were looking for.
    >
    >> The 0xFFFFFFFF or -1 indicates that data in the string is not
    >> found or is not valid.

    >
    > (Just a nit, but 0xFFFFFFFF is *not* -1. They're two different
    > values.)
    >
    >> Let’s discuss substr() function. The substr() function’s
    >> first parameter must always have minimum size_type and maximum
    >> size_type.

    >
    > The first argument must be in the range [0...s.size()], where s
    > is the string you're concerned with. It specifies the index of
    > the first character in the substring you want.
    >
    >> If 0xFFFFFFFF or -1 is detected, then exception will be
    >> thrown.

    >
    > (Again, -1 cannot be detected, because it cannot be represented
    > on the type of the argument.)


    My copy of the standard, or my most recent copy of a working draft
    explicitly initializes static const size_type npos = -1;

    LR
     
    LR, Feb 1, 2010
    #10
  11. Immortal Nephi

    LR Guest

    James Kanze wrote:
    > On 31 Jan, 21:19, LR <> wrote:
    >> Immortal Nephi wrote:

    >
    > [...]
    >> I think this will work:

    >
    >> const std::string
    >> data = std::string("Hello ") + '\0' + std::string("World");

    >
    > An even simpler solution might be:
    > std::string const data( "Hello \0World", 12 );


    I didn't think of that, but I hate to count things since I think it
    makes maintenance more difficult.

    LR
     
    LR, Feb 1, 2010
    #11
  12. LR wrote:
    > James Kanze wrote:
    >> On 31 Jan, 21:19, LR <> wrote:
    >>> Immortal Nephi wrote:

    >> [...]
    >>> I think this will work:
    >>> const std::string
    >>> data = std::string("Hello ") + '\0' + std::string("World");

    >> An even simpler solution might be:
    >> std::string const data( "Hello \0World", 12 );

    >
    > I didn't think of that, but I hate to count things since I think it
    > makes maintenance more difficult.
    >
    > LR


    Presumably then you could do something like:
    const char blah[] = "Hello \0World";
    const std::string data(blah, sizeof(blah));

    James
     
    James Lothian, Feb 1, 2010
    #12
  13. Immortal Nephi

    James Kanze Guest

    On Feb 1, 4:28 am, LR <> wrote:
    > James Kanze wrote:
    > > On 31 Jan, 21:19, LR <> wrote:
    > >> Immortal Nephi wrote:


    > > [...]
    > >> I think this will work:


    > >> const std::string
    > >> data = std::string("Hello ") + '\0' + std::string("World");


    > > An even simpler solution might be:
    > > std::string const data( "Hello \0World", 12 );


    > I didn't think of that, but I hate to count things since I
    > think it makes maintenance more difficult.


    Yes. I was afraid my usual solution would confuse the original
    poster:

    static char const init[] = "Hello \0World";
    std::string const data(begin(init), end(init)-1);

    (In this case, of course, begin and end are the usual template
    functions.)

    As soon as you accept to give a name to the initialization, you
    can get the compiler to do the counting. You need the name,
    however, since you need to refer to the initialization object
    twice. (
    static char const init[] = "Hello \0World";
    std::string const data(init, init + sizeof(init) - 1);
    will also work, but the begin and end solution is more general.)
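
    The post does not show its begin and end templates. A sketch of what such helpers might look like follows, under the hypothetical names array_begin and array_end (chosen to avoid clashing with std::begin and std::end, which C++11 later added with the same idea).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers: pointer to the first element of a built-in
// array, and pointer one past its last element.
template <typename T, std::size_t N>
T* array_begin(T (&array)[N]) { return array; }

template <typename T, std::size_t N>
T* array_end(T (&array)[N]) { return array + N; }

std::string make_data_with_embedded_nul()
{
    static char const init[] = "Hello \0World";
    // array_end(init) points past the literal's trailing '\0';
    // subtracting 1 leaves that terminator out of the string.
    return std::string(array_begin(init), array_end(init) - 1);
}
```

    The compiler deduces N from the array type (13 here, counting the terminator), so the resulting string has 12 characters and nothing needs to be counted by hand.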

    --
    James Kanze

     
    James Kanze, Feb 1, 2010
    #13
  14. Immortal Nephi

    James Kanze Guest

    On Feb 1, 4:54 pm, James Lothian
    <> wrote:
    > LR wrote:
    > > James Kanze wrote:
    > >> On 31 Jan, 21:19, LR <> wrote:
    > >>> Immortal Nephi wrote:
    > >> [...]
    > >>> I think this will work:
    > >>> const std::string
    > >>> data = std::string("Hello ") + '\0' + std::string("World");
    > >> An even simpler solution might be:
    > >> std::string const data( "Hello \0World", 12 );


    > > I didn't think of that, but I hate to count things since I think it
    > > makes maintenance more difficult.


    > Presumably then you could do something like:
    > const char blah[] = "Hello \0World";
    > const std::string data(blah, sizeof(blah));


    sizeof(blah) - 1, if you don't want the final '\0'.

    --
    James Kanze
     
    James Kanze, Feb 1, 2010
    #14
  15. Immortal Nephi

    James Kanze Guest

    On Feb 1, 12:52 am, Öö Tiib <> wrote:
    > On Feb 1, 12:14 am, "Leigh Johnston" <> wrote:


    > > >> If 0xFFFFFFFF or -1 is detected, then exception will be
    > > >> thrown.


    > > > (Again, -1 cannot be detected, because it cannot be
    > > > represented on the type of the argument.)


    > > assert(static_cast<unsigned int>(-1) == -1);


    > > :)


    > Anyway you get diagnostic warnings for it from most compilers.
    > If 'static_cast<unsigned int>(-1)' is needed then '~0U' is
    > perhaps shortest form that makes all compilers happy with it.


    Except that it doesn't work. There are only two portable
    solutions to get the value yourself:
    static_cast< size_t >( -1 )
    or
    std::numeric_limits< size_t >::max();
    (Both are guaranteed to be equal.)

    Of course, the best solution is just to use std::string::npos.
    There's no reason for you to worry about anything else. (And
    you don't care what the value really is.)
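
    A sketch that checks these claims on whatever platform compiles it. The function names are hypothetical. For std::string, size_type is std::size_t, which is what the first check relies on.

```cpp
#include <cstddef>
#include <limits>
#include <string>

// True when npos equals both portable spellings of the largest
// size_t value mentioned above.
bool npos_is_max_size_t()
{
    return std::string::npos == static_cast<std::size_t>(-1)
        && std::string::npos == std::numeric_limits<std::size_t>::max();
}

// ~0U has type unsigned int, so it only matches npos when
// unsigned int has the same width as size_type.
bool tilde_zero_matches_npos()
{
    return ~0U == std::string::npos;
}
```

    On a typical 64-bit platform with a 32-bit unsigned int, the first function should return true and the second false.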

    --
    James Kanze
     
    James Kanze, Feb 1, 2010
    #15
