<string> class with support of Null-Bytes?

Discussion in 'C++' started by Karl Ebener, Dec 14, 2004.

  1. Karl Ebener

    Karl Ebener Guest

    Hi!

    I asked a similar question before but then changed everything to using
    char-Arrays instead of the string class, but I would rather not do this
    again.

    So, does anyone know of a string-Class similar to the STL-<string> that
    supports null-bytes?

    I tried with standard <string> but this definitely does not support
    them... :(

    Tnx
    Karl
     
    Karl Ebener, Dec 14, 2004
    #1

  2. * Karl Ebener:
    >
    > I asked a similar question before but then changed everything to using
    > char-Arrays instead of the string class, but I would rather not do this
    > again.
    >
    > So, does anyone know of a string-Class similar to the STL-<string> that
    > supports null-bytes?
    >
    > I tried with standard <string> but this definitely does not support
    > them... :(


    Depends what you mean by "support", but with usual definitions that's
    not correct.

    Perhaps post a simple program that shows what you mean by "not support"?

    Then we can see whether the problem is in the code or with std::string,
    and give better suggestions on how to proceed.

    --
    A: Because it messes up the order in which people normally read text.
    Q: Why is it such a bad thing?
    A: Top-posting.
    Q: What is the most annoying thing on usenet and in e-mail?
     
    Alf P. Steinbach, Dec 14, 2004
    #2

  3. Karl Ebener

    Karl Ebener Guest

    Little change:

    > I tried with standard <string> but this definitely does not support
    > them... :(


    -> I tried using the length() method, which stops at null bytes, and
    c_str() of course extracts only the part up to the first null byte.
    Or have I just overlooked a way to extract the content as a char*?

    Tnx
    Karl
     
    Karl Ebener, Dec 14, 2004
    #3
  4. * Karl Ebener:
    > Little change:
    >
    > > I tried with standard <string> but this definitely does not support
    > > them... :(

    >
    > -> I tried using length()-method which stops at null-bytes


    It doesn't.


    > and c_str() of course extracts only part till null-byte.


    It doesn't, see §21.3.6/1.


    > Have I only not seen any possibility to extract the content as char* ?


    Post some code.

     
    Alf P. Steinbach, Dec 14, 2004
    #4
  5. Karl Ebener

    Karl Ebener Guest

    Alf P. Steinbach schrieb:
    > Depends what you mean by "support", but with usual definitions that's
    > not correct.
    >
    > Perhaps post a simple program that shows what you mean by "not support"?
    >
    > Then we can see whether the problem is in the code or with std::string,
    > and give better suggestions on how to proceeed.
    >

    Okay, this is my test program.
    What I ultimately want to do is read a complete (binary) file into a
    string and then send it over a socket to/from a server.
    I am using socket routines that take strings because it is much easier
    this way, and I would love to leave it at that and not recode everything...

    Tnx
    Karl

    #include <cstdio>   // FILE, fopen, fwrite, fclose
    #include <string>
    #include <iostream>

    using namespace std;

    int main()
    {
    string abc = "abc\0abc\0"; // string contains Null-bytes
    cout << abc << ":" << abc.length() << endl; // output is: 3
    FILE* fp;

    fp = fopen("ABC", "w");
    fwrite(abc.c_str(), 8, 1, fp); // file will contain: "abc" and Garbage
    fclose(fp);
    }
     
    Karl Ebener, Dec 14, 2004
    #5
  6. Karl Ebener

    Rolf Magnus Guest

    Karl Ebener wrote:

    > Alf P. Steinbach schrieb:
    >> Depends what you mean by "support", but with usual definitions that's
    >> not correct.
    >>
    >> Perhaps post a simple program that shows what you mean by "not support"?
    >>
    >> Then we can see whether the problem is in the code or with std::string,
    >> and give better suggestions on how to proceeed.
    >>

    > Okay, this is my test program.
    > What I want to do finally, is read a complete (binary) file into a
    > string and then send this via using socket to/from server.
    > I am using socket-routines that use strings because it is much easier
    > this way and I would love to leave it at that and not recode everything...
    >
    > Tnx
    > Karl
    >
    > #include <string>
    > #include <iostream>
    >
    > using namespace std;
    >
    > int main()
    > {
    > string abc = "abc\0abc\0"; // string contains Null-bytes


    No. Your literal contains 0-bytes. The conversion constructor from C style
    strings to std::string of course has to stop at \0, since that's the value
    that marks the end of a C style string. Try:

    const char c[] = "abc\0abc\0";

    string abc(c, sizeof(c));

    This tells the constructor to not stop at \0, but read the specified number
    of characters.

    > cout << abc << ":" << abc.length() << endl; // output is: 3


    That's because only the first 3 characters were actually copied into the
    string.

    > FILE* fp;
    >
    > fp = fopen("ABC", "w");
    > fwrite(abc.c_str(), 8, 1, fp); // file will contain: "abc" and Garbage


    Again, that's because the string only contains the first 3 characters.

    > fclose(fp);
    > }
     
    Rolf Magnus, Dec 14, 2004
    #6
  7. * Karl Ebener:
    >
    > #include <string>
    > #include <iostream>
    >
    > using namespace std;
    >
    > int main()
    > {
    > string abc = "abc\0abc\0"; // string contains Null-bytes
    > cout << abc << ":" << abc.length() << endl; // output is: 3
    > FILE* fp;
    >
    > fp = fopen("ABC", "w");
    > fwrite(abc.c_str(), 8, 1, fp); // file will contain: "abc" and Garbage
    > fclose(fp);
    > }


    The problem in the abc declaration is that you invoke the constructor
    that takes a C string as argument, and by definition that C string ends
    at the first nullbyte.

    Try


    #include <string>
    #include <iostream>

    #define ELEMCOUNT( array ) (sizeof(array)/sizeof(*array))

    int main()
    {
    static char const abc_data[] = "abc\0abc\0";
    std::string abc( abc_data, ELEMCOUNT( abc_data ) );

    std::cout << abc << ":" << abc.length() << std::endl;
    }

    But you might instead (for efficiency) want to use std::vector<char>.

    Also, the file should be opened in binary mode.
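
    A minimal sketch of the binary-mode advice, not a drop-in for the original
    program. It assumes the string already holds the embedded null bytes, and
    the helper names write_bytes and read_bytes are made up for illustration.
    The point it shows is that the byte count comes from the string itself,
    never from a hard-coded number or strlen().

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Writes every byte of `bytes` to `path`, embedded '\0' included.
// Returns true only if the whole buffer was written and the file closed cleanly.
bool write_bytes(const char* path, const std::string& bytes)
{
    assert(path != nullptr);
    std::FILE* fp = std::fopen(path, "wb");   // "wb": binary mode, no newline translation
    if (!fp) return false;
    // The count is the string's own size, so nothing past the end is read.
    const std::size_t written = std::fwrite(bytes.data(), 1, bytes.size(), fp);
    const bool closed = (std::fclose(fp) == 0);
    return written == bytes.size() && closed;
}

// Reads the whole file back into a string, again in binary mode.
// Returns an empty string if the file cannot be opened.
std::string read_bytes(const char* path)
{
    assert(path != nullptr);
    std::string result;
    std::FILE* fp = std::fopen(path, "rb");
    if (!fp) return result;
    char buffer[256];
    std::size_t n;
    while ((n = std::fread(buffer, 1, sizeof buffer, fp)) > 0)
        result.append(buffer, n);   // append(ptr, count) keeps '\0' bytes
    std::fclose(fp);
    return result;
}
```

    Called as write_bytes("ABC", abc) with the abc string built above, this
    should put all of its bytes into the file rather than "abc" plus garbage.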

     
    Alf P. Steinbach, Dec 14, 2004
    #7
  8. Karl Ebener wrote:

    > Okay, this is my test program.


    My guess is that std::string's functions (including constructors) that take
    a C-Style string as an argument, *do* treat it as a C-style (i.e.
    null-terminated) string.

    Makes sense, doesn't it? You don't want

    char s[15] = "sth";
    string s1(s);

    to allocate 11 extra null characters in s1 for no reason :)

    If, OTOH, you put a '\0' in an std::string, it will not be treated as a
    terminating character.

    Check out this example to see what I mean:

    #include <iostream>
    #include <string>

    int main(){
    std::string s("abc\0abc\0");
    std::cout<<s.length()<<std::endl; //prints 3, not 9
    std::string s2;
    s2.push_back('a');
    s2.push_back('\0');
    s2.push_back('b');
    std::cout<<s2.length()<<std::endl; //prints 3, not 1
    }


    Note: c_str() will return a const char *, which means that the string
    returned will always stop at the first null byte, for any code that cares
    about it (e.g. strlen or strcpy). Better use a vector<char> if you want
    byte semantics.
     
    Dimitris Kamenopoulos, Dec 14, 2004
    #8
  9. Karl Ebener

    Dave O'Hearn Guest

    Karl Ebener wrote:
    > fwrite(abc.c_str(), 8, 1, fp); // file will contain: "abc"
    > // and Garbage


    As a separate issue, data() would be better than c_str() here. c_str()
    may expand the string's internal buffer, to make room for an extra null
    character past the end. You don't need a null-terminated C-string to
    call fwrite, so you can just use data().

    --
    Dave O'Hearn
     
    Dave O'Hearn, Dec 14, 2004
    #9
  10. Karl Ebener

    Rolf Magnus Guest

    Dimitris Kamenopoulos wrote:

    > Karl Ebener wrote:
    >
    >> Okay, this is my test program.

    >
    > My guess is that std::string's functions (including constructors) that
    > take a C-Style string as an argument, *do* treat it as a C-style (i.e.
    > null-terminated) string.
    >
    > Makes sense, doesn't it? You don't want
    >
    > char s[15] = "sth";
    > string s1(s);
    >
    > to allocate 11 extra null characters in s1 for no reason :)


    That's not the main point. The constructor takes a pointer, which doesn't
    contain any information about the size of the array pointed to. So the \0
    is the _only_ way at all to know where a C style string ends.
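
    To make the pointer-versus-array point concrete, here is a small sketch.
    Both function names are invented for this illustration. The first takes a
    plain pointer, so strlen() is all it has; the second takes a reference to
    an array, so the compiler still knows the size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// A plain pointer parameter: the array size is gone, so scanning for the
// first '\0' is the only way left to find the end.
std::size_t length_seen_through_pointer(const char* p)
{
    assert(p != nullptr);
    return std::strlen(p);
}

// A reference-to-array parameter keeps the size N as part of the type.
// For a string literal, N counts the implicit terminating '\0', so that
// one byte is dropped here.
template <std::size_t N>
std::string from_literal(const char (&arr)[N])
{
    return std::string(arr, N - 1);
}
```

    With the literal from the original program, the pointer version sees 3
    characters, while from_literal keeps all 8 that were written in the
    source. The template only helps when a real array is passed; once the
    data has decayed to a char*, the length has to travel separately.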
     
    Rolf Magnus, Dec 14, 2004
    #10
  11. Karl Ebener

    Paul Guest

    "Karl Ebener" <> wrote in message
    news:41bec2d2$0$29843$-online.net...
    > Little change:
    >
    > > I tried with standard <string> but this definitely does not support
    > > them... :(

    >
    > -> I tried using length()-method which stops at null-bytes and c_str()
    > of course extracts only part till null-byte.


    What you are saying is totally false. std::string fully supports strings
    with embedded NULLs. You just need to know the functions to use.

    First, use the right constructor. The std::string has a few constructors --
    a good C++ book that goes into the standard library will show you the
    various constructors. The proper constructor is the one that takes a const
    char * and an integer denoting the number of characters.

    #include <string>
    std::string s("abc\0123", 7);

    Second, use the std::string::data( ) member function instead of
    std::string::c_str(). This respects the length of the string and does not
    terminate on the first NULL.

    Third, if you need to add binary data to a std::string, use the append( )
    function. If you need to reassign binary data, use the
    std::string::append() on an empty string, or the std::string::assign( )
    member function.
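
    A short sketch of the assign()/append() advice, assuming the caller always
    knows the byte count. The wrapper names set_bytes and add_bytes are
    hypothetical; the calls inside them are the two-argument (pointer, count)
    overloads of std::string::assign and std::string::append.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Replaces the contents of `s` with exactly `n` bytes starting at `data`.
void set_bytes(std::string& s, const char* data, std::size_t n)
{
    assert(data != nullptr);
    s.assign(data, n);   // copies n bytes, '\0' included
}

// Adds exactly `n` more bytes to the end of `s`.
void add_bytes(std::string& s, const char* data, std::size_t n)
{
    assert(data != nullptr);
    s.append(data, n);   // again counted, not null-terminated
}
```

    The single-argument forms s.assign(data) and s.append(data) would fall
    back to C-string rules and stop at the first null byte, which is exactly
    the behaviour to avoid with binary data.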

    Paul
     
    Paul, Dec 14, 2004
    #11
  12. Karl Ebener

    Ron Natalie Guest

    Karl Ebener wrote:
    > Little change:
    >
    >> I tried with standard <string> but this definitely does not support
    >> them... :(

    >
    >
    > -> I tried using length()-method which stops at null-bytes and c_str()
    > of course extracts only part till null-byte.
    > Have I only not seen any possibility to extract the content as char* ?


    Multibyte does not contain nulls. I'm confused as what you are asking.
    Neither c_str() nor length() cares anything about embedded nulls.

    Now that being said, there is NO real multibyte handling in std::string
    either.
     
    Ron Natalie, Dec 14, 2004
    #12
  13. Karl Ebener

    Ron Natalie Guest

    Karl Ebener wrote:

    > So, does anyone know of a string-Class similar to the STL-<string> that
    > supports null-bytes?


    std::string handles null bytes just fine. The only thing that you have to
    be careful with is that if you use the conversions to/from char*, you need
    to pass/retrieve the actual length because the default strlen() calculations
    won't work.

    std::string s;
    s.push_back('a');
    s.push_back('\0');
    s.push_back('b');

    cout << s.size(); // prints 3
    const char* cp = s.c_str();

    cout << cp[0] << cp[2]; // prints ab
     
    Ron Natalie, Dec 14, 2004
    #13
  14. Karl Ebener

    Old Wolf Guest

    Paul wrote:
    > "Karl Ebener" <> wrote:
    >
    > #include <string>
    > std::string s("abc\0123", 7);


    Undefined behaviour. "abc\0123" is an array of 6 chars:
    {'a', 'b', 'c', '\012', '3', '\0'}

    > Second, use the std::string::data( ) member function instead of
    > std::string::c_str(). This respects the length of the string
    > and does not terminate on the first NULL.


    std::string::c_str() does not terminate on the first null
    character. The only difference between c_str() and data()
    is that c_str() appends a null character.

    std::string s("abc\0def", 7);
    std::cout << (s.c_str() + 4) << std::endl;

    will output "def".
    BTW, the macro NULL is not really relevant to null characters.
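
    The octal-escape trap can be checked with sizeof. This is only a sketch;
    the names octal_trap_size, split_size and intended are made up for the
    illustration. Splitting the literal in two is one common way to end the
    escape after "\0"; writing the escape with three digits ("\000") is
    another.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// sizeof on a literal counts every char, including the implicit final '\0'.
// "\012" is ONE char (octal 12), so this literal is
// 'a','b','c','\012','3','\0'.
const std::size_t octal_trap_size = sizeof("abc\0123");

// Adjacent literals are concatenated after escapes are processed, so this
// one is 'a','b','c','\0','1','2','3','\0'.
const std::size_t split_size = sizeof("abc\0" "123");

// Builds the 7-byte string that was probably intended.
std::string intended()
{
    return std::string("abc\0" "123", 7);
}
```

    Asking for 7 characters from the 6-char array reads past its end, which is
    why the original line is undefined behaviour; with the split literal the
    same count stays inside the array.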
     
    Old Wolf, Dec 14, 2004
    #14
  16. Karl Ebener

    Ron House Guest

    Karl Ebener wrote:
    > Alf P. Steinbach schrieb:


    >> Perhaps post a simple program that shows what you mean by "not support"?


    > #include <string>
    > #include <iostream>
    >
    > using namespace std;
    >
    > int main()
    > {
    > string abc = "abc\0abc\0"; // string contains Null-bytes
    > cout << abc << ":" << abc.length() << endl; // output is: 3
    >...
    > }


    Nothing wrong with string. You lost your trailing data because C-style
    string literals end at the first '\0'. This one works:

    #include <string>
    #include <iostream>

    using namespace std;

    int main()
    {
    string abc = "abcdabcd";
    abc[3] = abc[7] = '\0';
    cout << abc << ":" << abc.length() << endl;
    return 0;
    }

    Prints:

    abcabc:8

    --
    Ron House
    http://www.sci.usq.edu.au/staff/house
     
    Ron House, Dec 15, 2004
    #16
  17. Karl Ebener

    Paul Guest

    "Old Wolf" <> wrote in message
    news:...
    > Paul wrote:
    > > "Karl Ebener" <> wrote:
    > >
    > > #include <string>
    > > std::string s("abc\0123", 7);

    >
    > Undefined behaviour. "abc\0123" is an array of 6 chars:
    > {'a', 'b', 'c', '\012', '3', '\0'}
    >

    Sorry, that was my attempt to put together a string in haste. The following
    is what I meant:

    #include <string>
    int main( )
    {
    char s1[] = {'0','1','2',0,'4','5','6'};
    std::string s(s1, 7);
    }

    Paul
     
    Paul, Dec 15, 2004
    #17
  18. Karl Ebener

    Paavo Helde Guest

    Karl Ebener <> wrote in
    news:41bec4ab$0$29853$-online.net:

    > What I want to do finally, is read a complete (binary) file into a
    > string and then send this via using socket to/from server.
    > I am using socket-routines that use strings because it is much easier
    > this way and I would love to leave it at that and not recode
    > everything...


    OK, in case of large and/or binary strings assign(), append() and swap()
    member functions are your friends. E.g.

    void read_from_file(std::string& content) {
    char buffer[N];
    std::string collector;
    while(!eof(the_file)) {
    // ... read chunk of file into the buffer, say of length n.
    collector.append(buffer, n);
    }
    content.swap(collector);
    }

    void send_to_socket() {
    std::string packet;
    read_from_file(packet);
    // assume there is a nice C++ object around called socket:
    socket.write(packet.data(), packet.length());
    }

    Note that using c_str() instead of data() might imply a performance
    penalty here as the c_str() function might have to add a NUL terminator
    at the end of the buffer, which can cause a reallocation and extra
    unneeded copy of the whole string. As you must be managing the lengths
    anyway explicitly the terminating NUL is not needed.

    OK, swap() is not really necessary in this example, but it might be
    useful in other similar situations where you have a large string to be
    passed around.

    In case of binary data the first rule is to avoid all std::string member
    functions which take a single char* pointer - there is no way to specify
    the actual length of data for such parameter.
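
    For reference, a runnable take on the read_from_file outline above,
    assuming a std::ifstream in place of the unspecified file object. The
    path parameter, the bool return value and the 4096-byte buffer size are
    choices made for this sketch, not part of the outline. The socket side is
    left out because its API is not specified here.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Reads the whole file at `path` into `content`, byte for byte.
// Returns false if the file cannot be opened; `content` is then unchanged.
bool read_from_file(const char* path, std::string& content)
{
    assert(path != nullptr);
    std::ifstream in(path, std::ios::in | std::ios::binary);
    if (!in) return false;

    char buffer[4096];
    std::string collector;
    while (in) {
        in.read(buffer, sizeof buffer);
        // The last chunk is usually short; gcount() reports how many bytes
        // this read() actually delivered (possibly 0).
        collector.append(buffer, static_cast<std::string::size_type>(in.gcount()));
    }

    content.swap(collector);   // hand over the data without copying it
    return true;
}
```

    The result can then be passed on as packet.data() and packet.length(), as
    in send_to_socket above.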

    HTH
    Paavo
     
    Paavo Helde, Dec 23, 2004
    #18
