string which embed '\0' char

Discussion in 'C++' started by jeremie fouche, Jul 26, 2010.

  1. Hi

    I would like to know if it is valid to embed binary data (which can
    contain '\0') in a std::string. During a code review, I suggested
    changing this to std::vector<char>, but the dev guys told me they
    tested it and it was OK.
    I tested with MSVC 2005 and Mingw-gcc4.4.0 and it was fine. But I'm
    still not sure about this. I'm afraid that the '\0' char could
    terminate the string in some (?) methods.

    Here is the code I tested :

    #include <string>
    #include <iostream>

    int main(void)
    {
    std::string s;
    s.push_back(0);
    s += "sdfsdf\0sdqsd";

    std::cout << s.length(); // 12

    std::string s2 = s;

    std::cout << s2.length(); // 12

    return 0;
    }

    What is your point of view ?
    Thanks
    --
    Jérémie
     
    jeremie fouche, Jul 26, 2010
    #1

  2. Leigh Johnston wrote:
    >
    >
    > "jeremie fouche" <> wrote in message
    > news:...
    >> Hi
    >>
    >> I would like to know if this is valid to embed binary data (which can
    >> contain '\0') in a std::string. As I'm in a code review, I suggested
    >> to change this to std::vector<char>, but the dev guys told me they
    >> tested and it was OK.
    >> I tested with MSVC 2005 and Mingw-gcc4.4.0 and it was fine. But I'm
    >> still not sure about this. I'm afraid that the '\0' char could
    >> terminate the string in some (?) methods.
    >>
    >> Here is the code I tested :
    >>
    >> #include <string>
    >> #include <iostream>
    >>
    >> int main(void)
    >> {
    >> std::string s;
    >> s.push_back(0);
    >> s += "sdfsdf\0sdqsd";
    >>
    >> std::cout << s.length(); // 12
    >>
    >> std::string s2 = s;
    >>
    >> std::cout << s2.length(); // 12
    >>
    >> return 0;
    >> }
    >>
    >> What is your point of view ?
    >> Thanks
    >> --

    >
    > There is nothing wrong with storing \0 in a std::string object, the only
    > issue to be aware of is that strlen(s.c_str()) != s.size().
    >
    > /Leigh

    I think I'm right in saying that the example provided by the OP won't work,
    though, because of the embedded null in the string literal.
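
    To make that concrete, here is a minimal sketch of what happens to the
    OP's literal. The function names are mine, purely for illustration. The
    literal's array does hold both halves, but operator+=(const char*) has no
    length to go on and stops at the first '\0':

```cpp
#include <cstddef>
#include <string>

// Rebuilds the string from the original post and returns its length.
std::size_t length_after_literal_append()
{
    std::string s;
    s.push_back('\0');       // one real NUL character, length is now 1
    s += "sdfsdf\0sdqsd";    // const char* overload: copies up to the first '\0'
    return s.length();       // 1 + 6
}

// Size of the literal's array: 6 + 1 (embedded NUL) + 5 + 1 (terminator).
std::size_t literal_array_size()
{
    return sizeof("sdfsdf\0sdqsd");
}
```

    So the first function returns 7 while the array itself occupies 13 bytes;
    the tail is in the literal but never reaches the string.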

    James
     
    James Lothian, Jul 26, 2010
    #2

  3. Leigh Johnston <>, on 26/07/2010 15:58:32, wrote:

    >
    >
    > "jeremie fouche" <> wrote in message
    > news:...
    >> Hi
    >>
    >> I would like to know if this is valid to embed binary data (which can
    >> contain '\0') in a std::string. As I'm in a code review, I suggested
    >> to change this to std::vector<char>, but the dev guys told me they
    >> tested and it was OK.
    >> I tested with MSVC 2005 and Mingw-gcc4.4.0 and it was fine. But I'm
    >> still not sure about this. I'm afraid that the '\0' char could
    >> terminate the string in some (?) methods.
    >>
    >> Here is the code I tested :
    >>
    >> #include <string>
    >> #include <iostream>
    >>
    >> int main(void)
    >> {
    >> std::string s;
    >> s.push_back(0);
    >> s += "sdfsdf\0sdqsd";
    >>
    >> std::cout << s.length(); // 12
    >>
    >> std::string s2 = s;
    >>
    >> std::cout << s2.length(); // 12
    >>
    >> return 0;
    >> }
    >>
    >> What is your point of view ?
    >> Thanks
    >> --

    >
    > There is nothing wrong with storing \0 in a std::string object, the only
    > issue to be aware of is that strlen(s.c_str()) != s.size().


    I'm sure you're well aware of this, but for the sake of further
    clarification for the OP's benefit I want to point out that if the \0
    appears in the middle of the string, such as in "before\0after" and in
    the OP's example, a function taking a C-style string will not "see" the
    "after" part - unless the function is designed to expect a specific
    number of interleaved null characters, but that's another story.
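
    A small sketch of that mismatch, with helper names made up for the
    occasion. The string is built with an explicit length so that the NUL
    really is stored:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Builds "before\0after" with an explicit count so all 12 characters are kept.
std::string make_sample()
{
    return std::string("before\0after", 12);
}

// What a C-style function sees when handed c_str(): it stops at the first NUL.
std::size_t c_view_length(const std::string& s)
{
    return std::strlen(s.c_str());
}
```

    Here make_sample().size() is 12, whereas c_view_length(make_sample()) is
    only 6: the "after" part is stored but invisible through the C interface.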

    --
    FSC - http://userscripts.org/scripts/show/59948
    http://fscode.altervista.org - http://sardinias.com
     
    Francesco S. Carta, Jul 26, 2010
    #3
  4. * jeremie fouche, on 26.07.2010 16:42:
    >
    > I would like to know if this is valid to embed binary data (which can
    > contain '\0') in a std::string.


    Yes.

    It can cause problems if you hand the string to code that expects a
    zero-terminated string, because functions that deal with zero-terminated
    strings will naturally regard the first \0 as the end of the string.

    But for std::string itself there is no problem.
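
    As a rough illustration (the helper names are hypothetical), the member
    functions work on all size() characters and treat '\0' like any other
    value:

```cpp
#include <cstddef>
#include <string>

// A 5-character string: 'a', 'b', NUL, 'c', 'd'.
std::string sample()
{
    return std::string("ab\0cd", 5);
}

// find() searches the whole string, so it can locate the NUL itself
// as well as characters stored after it.
std::size_t position_of_nul(const std::string& s) { return s.find('\0'); }
std::size_t position_of_c(const std::string& s)   { return s.find('c'); }

// Comparison also looks past the NUL: "ab\0cd" and "ab\0ce" are different.
bool equal_past_nul()
{
    return sample() == std::string("ab\0ce", 5);
}
```

    With these, position_of_nul(sample()) gives 2, position_of_c(sample())
    gives 3, and equal_past_nul() is false.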

    However, for non-textual data it's better to use a std::vector (as you
    suggested, snipped).

    That's because the data type should reflect what it's used for. And while
    std::string is low level it's not that level: it's not a type meant to be used
    for anything but textual data. Thus, the type indicates The Wrong Thing, unless
    by "binary data" you just mean text with embedded null-characters.


    Cheers & hth.,

    - Alf

    --
    blog at <url: http://alfps.wordpress.com>
     
    Alf P. Steinbach /Usenet, Jul 26, 2010
    #4
  5. James Lothian <>, on 26/07/2010 16:13:46, wrote:

    > Leigh Johnston wrote:
    >>
    >>
    >> "jeremie fouche" <> wrote in message
    >> news:...
    >>> Hi
    >>>
    >>> I would like to know if this is valid to embed binary data (which can
    >>> contain '\0') in a std::string. As I'm in a code review, I suggested
    >>> to change this to std::vector<char>, but the dev guys told me they
    >>> tested and it was OK.
    >>> I tested with MSVC 2005 and Mingw-gcc4.4.0 and it was fine. But I'm
    >>> still not sure about this. I'm afraid that the '\0' char could
    >>> terminate the string in some (?) methods.
    >>>
    >>> Here is the code I tested :
    >>>
    >>> #include <string>
    >>> #include <iostream>
    >>>
    >>> int main(void)
    >>> {
    >>> std::string s;
    >>> s.push_back(0);
    >>> s += "sdfsdf\0sdqsd";
    >>>
    >>> std::cout << s.length(); // 12
    >>>
    >>> std::string s2 = s;
    >>>
    >>> std::cout << s2.length(); // 12
    >>>
    >>> return 0;
    >>> }
    >>>
    >>> What is your point of view ?
    >>> Thanks
    >>> --

    >>
    >> There is nothing wrong with storing \0 in a std::string object, the only
    >> issue to be aware of is that strlen(s.c_str()) != s.size().
    >>
    >> /Leigh

    > I think I'm right in saying that the example provided by the OP won't work,
    > though, because of the embedded null in the string literal.


    You're right, the code outputs "77" with my compiler (that would be, the
    length of the strings is "7", printed twice).

    In order to successfully add a null character one would need to push it
    back as the OP did or use += '\0' - or an equivalent statement.
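
    For instance, either of the following sketches (function names invented
    for illustration) ends up with the full 12 characters: the first appends
    the NUL as a single char, the second passes the literal together with an
    explicit count.

```cpp
#include <string>

// Appends the NUL as a char, so exactly one character is added.
std::string build_with_nul()
{
    std::string s;
    s += "sdfsdf";      // 6 characters
    s += '\0';          // char overload: one character, even though it is NUL
    s += "sdqsd";       // 5 more characters
    return s;           // 12 in total
}

// Pointer + count: append() copies all 12 characters, NUL included.
std::string build_from_counted_literal()
{
    std::string s;
    s.append("sdfsdf\0sdqsd", 12);
    return s;
}
```

    Both functions return equal strings of length 12 with the NUL at index 6.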

    --
    FSC - http://userscripts.org/scripts/show/59948
    http://fscode.altervista.org - http://sardinias.com
     
    Francesco S. Carta, Jul 26, 2010
    #5
  6. On Jul 26, 10:42 am, jeremie fouche <> wrote:
    > Hi
    >
    > I would like to know if this is valid to embed binary data (which can
    > contain '\0') in a std::string. As I'm in a code review, I suggested
    > to change this to std::vector<char>, but the dev guys told me they
    > tested and it was OK.
    > I tested with MSVC 2005 and Mingw-gcc4.4.0 and it was fine. But I'm
    > still not sure about this. I'm afraid that the '\0' char could
    > terminate the string in some (?) methods.
    >
    > Here is the code I tested :
    >
    > What is your point of view ?
    > Thanks
    > --
    > Jérémie


    You got 12s? 'Cause I got a couple of 7s, which is what it ought
    to be. As Francesco explained.

    G++ 4.1.2

    --Jonathan
     
    Jonathan Lee, Jul 26, 2010
    #6
  7. On 26 Jul, 17:13, James Lothian <> wrote:
    > Leigh Johnston wrote:
    >
    > I think I'm right in saying that the example provided by the OP won't work,
    > though, because of the embedded null in the string literal.
    >
    > James


    Oops, you're right, sorry.
    My first try gave 12... but it was another sample.
    The string is created from a char[] buffer, so I won't run into this (OT)
    problem.
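
    For the record, a minimal sketch of the buffer case, assuming the length
    is passed along with the pointer. The 4-byte record and the function
    names are made up for the example:

```cpp
#include <string>

// Hypothetical 4-byte binary record with a NUL in the second position.
const char record[4] = { 'A', '\0', 'B', 'C' };

// Pointer + count: all 4 bytes are copied into the string.
std::string from_buffer_counted()
{
    return std::string(record, sizeof record);
}

// Pointer only: the constructor searches for a terminator and stops after 'A'.
// (If the buffer held no NUL at all, this would read past its end.)
std::string from_buffer_as_c_string()
{
    return std::string(record);
}
```

    The first version yields a string of size 4, the second one of size 1, so
    the problem only stays away as long as the count is always supplied.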
    Thanks a lot, everybody, for the explanation.
    --
    Jérémie
     
    jeremie fouche, Jul 26, 2010
    #7
  8. Geoff <>, on 26/07/2010 11:18:31, wrote:

    > On Mon, 26 Jul 2010 07:42:47 -0700 (PDT), jeremie fouche
    > <> wrote:
    >
    >> Hi
    >>
    >> I would like to know if this is valid to embed binary data (which can
    >> contain '\0') in a std::string. As I'm in a code review, I suggested
    >> to change this to std::vector<char>, but the dev guys told me they
    >> tested and it was OK.
    >> I tested with MSVC 2005 and Mingw-gcc4.4.0 and it was fine. But I'm
    >> still not sure about this. I'm afraid that the '\0' char could
    >> terminate the string in some (?) methods.
    >>
    >> Here is the code I tested :
    >>
    >> #include<string>
    >> #include<iostream>
    >>
    >> int main(void)
    >> {
    >> std::string s;
    >> s.push_back(0);
    >> s += "sdfsdf\0sdqsd";
    >>
    >> std::cout<< s.length(); // 12
    >>
    >> std::string s2 = s;
    >>
    >> std::cout<< s2.length(); // 12
    >>
    >> return 0;
    >> }
    >>
    >> What is your point of view ?
    >> Thanks

    >
    > It fails on every compiler I tried.
    >
    > Did you try to see what this does?
    >
    > std::cout<< s.length()<< std::endl; // should be 12
    > std::cout<< s<< std::endl;
    >
    > std::string s2 = s;
    >
    > std::cout<< s2.length()<< std::endl; // should be 12
    > std::cout<< s2<< std::endl;
    >
    > Fire your dev group.


    I don't understand why he should. Notice that we have seen only the
    example code posted by the OP, we have no idea of what the dev team is
    doing with those strings that contain binary data (and hence,
    potentially, also the null character).

    The thing to ascertain is /what exactly/ they are doing with those strings.

    If I parse a whole binary file to a std::string and then I pass c_str()
    to a function that expects a const char* pointer _along with_ the
    correct size of the pointed-to data, I am doing something perfectly legit.

    If they're passing those c_str() to a function that expects a simple
    C-style null terminated string, _then_ they ought to be corrected, and
    fired only as a last resort if they happen to reiterate writing such
    (hypothetical) crap code.

    Just my point of view, of course :)

    --
    FSC - http://userscripts.org/scripts/show/59948
    http://fscode.altervista.org - http://sardinias.com
     
    Francesco S. Carta, Jul 26, 2010
    #8
  9. Geoff <>, on 26/07/2010 12:27:00, wrote:

    > On Mon, 26 Jul 2010 20:28:33 +0200, "Francesco S. Carta"
    > <> wrote:
    >
    >> I don't understand why he should. Notice that we have seen only the
    >> example code posted by the OP, we have no idea of what the dev team is
    >> doing with those strings that contain binary data (and hence,
    >> potentially, also the null character).

    >
    > I agree, and I was being facetious about firing the dev group. It
    > appears they are keeping their binary data in a string but it seems to
    > me a misuse/abuse of the class. A maintainer might be justified in
    > calling a string function against the data and getting erroneous
    > results unless it is thoroughly documented. std::vector would be more
    > proper, IMHO.


    Full ack. Now that you make me think about it, I too would be better off
    reading files into a std::vector. I habitually read files into
    std::string, and since I never had any problem I never really
    questioned it.
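
    Something along these lines would do, as a sketch rather than a polished
    routine. read_file is a name I'm making up here, and error handling is
    reduced to returning an empty vector:

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Reads a whole file in binary mode into a byte container.
// Returns an empty vector if the file cannot be opened.
std::vector<char> read_file(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    // istreambuf_iterator reads raw characters without skipping whitespace,
    // so NUL bytes and newlines come through unchanged.
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}
```

    The same iterator pair would fill a std::string just as well; the choice
    of std::vector<char> is about telling the reader that this is not text.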

    --
    FSC - http://userscripts.org/scripts/show/59948
    http://fscode.altervista.org - http://sardinias.com
     
    Francesco S. Carta, Jul 26, 2010
    #9
  10. jeremie fouche wrote:

    > I would like to know if this is valid to embed binary data (which can
    > contain '\0') in a std::string. As I'm in a code review, I suggested
    > to change this to std::vector<char>, but the dev guys told me they
    > tested and it was OK.


    What happens when your app goes unicode and all std::string variables
    are changed into std::wstring ?
    http://stackoverflow.com/questions/402283/stdwstring-vs-stdstring

    Lynn
     
    Lynn McGuire, Jul 26, 2010
    #10
  11. On 26 Jul, 22:32, Lynn McGuire <> wrote:
    > > I would like to know if this is valid to embed binary data (which can
    > > contain '\0') in a std::string. As I'm in a code review, I suggested
    > > to change this to std::vector<char>, but the dev guys told me they
    > > tested and it was OK.

    >
    > What happens when your app goes unicode and all std::string variables
    > are changed into std::wstring ?
    >    http://stackoverflow.com/questions/402283/stdwstring-vs-stdstring


    The App is UNICODE, but the interface defined for the module is based
    on char (as it's for a binary file). It's not a problem.
     
    jeremie fouche, Jul 27, 2010
    #11
  12. On 26 Jul, 20:28, "Francesco S. Carta" <> wrote:
    >
    > I don't understand why he should. Notice that we have seen only the
    > example code posted by the OP, we have no idea of what the dev team is
    > doing with those strings that contain binary data (and hence,
    > potentially, also the null character).
    >
    > The thing to ascertain is /what exactly/ they are doing with those strings.


    It /should/ be a binary data container, that's all.

    > If I parse a whole binary file to a std::string and then I pass c_str()
    > to a function that expects a const char* pointer _along with_ the
    > correct size of the pointed-to data, I am doing something perfectly legit.


    That's what I understood reading the whole thread.

    >
    > If they're passing those c_str() to a function that expects a simple
    > C-style null terminated string, _then_ they ought to be corrected, and
    > fired only as a last resort if they happen to reiterate writing such
    > (hypothetical) crap code.


    This is the real problem. I hope that everybody will understand what
    this string is for (it must be well commented). That's why a
    vector<char> seems to be the better solution: it makes the code
    simpler to maintain.

    > Just my point of view, of course :)


    Like mine :)
    --
    Jérémie
     
    jeremie fouche, Jul 27, 2010
    #12
  13. On Jul 26, 7:28 pm, "Francesco S. Carta" <> wrote:
    > Geoff <>, on 26/07/2010 11:18:31, wrote:
    > > On Mon, 26 Jul 2010 07:42:47 -0700 (PDT), jeremie fouche
    > > <> wrote:


    > >> I would like to know if this is valid to embed binary data
    > >> (which can contain '\0') in a std::string. As I'm in a code
    > >> review, I suggested to change this to std::vector<char>,
    > >> but the dev guys told me they tested and it was OK.
    > >> I tested with MSVC 2005 and Mingw-gcc4.4.0 and it was fine.
    > >> But I'ms still not sure about this. I'm afraid that the
    > >> '\0' char could terminate the string in some (?) methods.


    [...]
    > If I parse a whole binary file to a std::string and then
    > I pass c_str() to a function that expects a const char*
    > pointer _along with_ the correct size of the pointed-to data,
    > I am doing something perfectly legit.


    Formally. In such cases, however, I'd use std::string::data()
    to get the pointer.

    More generally, std::string means text to most programmers, most
    of the time, and std::vector<char> (or even std::vector<unsigned
    char>) should be preferred for binary data. Similarly, if I'm
    interfacing to C, I'll use std::string::c_str() if the
    C function expects pointer to a '\0' terminated string, and
    std::string::data(), std::string::size() if it expects a pointer
    and a length.
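
    A sketch of the two cases, with two stand-in functions playing the role
    of the C interfaces (both are invented for the example, not real library
    calls):

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Stand-in for a C function taking a pointer and a length: sums every byte.
unsigned long sum_bytes(const char* data, std::size_t size)
{
    unsigned long sum = 0;
    for (std::size_t i = 0; i < size; ++i)
        sum += static_cast<unsigned char>(data[i]);
    return sum;
}

// Stand-in for a C function expecting a '\0'-terminated string.
std::size_t text_length(const char* text)
{
    return std::strlen(text);
}
```

    Given std::string blob("\x01\0\x02", 3), the call
    sum_bytes(blob.data(), blob.size()) visits all three bytes and returns 3,
    while text_length(blob.c_str()) stops at the embedded NUL and returns 1.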

    --
    James Kanze
     
    James Kanze, Jul 27, 2010
    #13
