std::string with embedded '\0' characters

jeremie fouche

Hi

I would like to know whether it is valid to embed binary data (which can
contain '\0') in a std::string. During a code review, I suggested
changing this to std::vector<char>, but the developers told me they had
tested it and it was OK.
I tested with MSVC 2005 and MinGW GCC 4.4.0 and it was fine, but I'm
still not sure about this. I'm afraid that the '\0' char could
terminate the string in some (?) methods.

Here is the code I tested:

#include <string>
#include <iostream>

int main(void)
{
    std::string s;
    s.push_back(0);
    s += "sdfsdf\0sdqsd";

    std::cout << s.length(); // 12

    std::string s2 = s;

    std::cout << s2.length(); // 12

    return 0;
}

What is your point of view?
Thanks
 
James Lothian

Leigh said:
There is nothing wrong with storing \0 in a std::string object, the only
issue to be aware of is that strlen(s.c_str()) != s.size().

/Leigh
I think I'm right in saying that the example provided by the OP won't work,
though, because of the embedded null in the string literal.

James
 
Francesco S. Carta

There is nothing wrong with storing \0 in a std::string object, the only
issue to be aware of is that strlen(s.c_str()) != s.size().

I'm sure you're well aware of this, but for the sake of further
clarification for the OP's benefit I want to point out that if the \0
appears in the middle of the string, such as in "before\0after" and in
the OP's example, a function taking a C-style string will not "see" the
"after" part, unless the function is designed to expect a specific
number of interleaved null characters, but that's another story.
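
To make that concrete, here is a minimal sketch of the mismatch. The helper names make_sample and c_length are made up for illustration; the string is built with an explicit length so that the embedded '\0' is actually stored.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Builds "before\0after" with an explicit length (12), so the
// embedded '\0' is kept inside the std::string.
std::string make_sample() {
    return std::string("before\0after", 12);
}

// Length as a C function sees it: the scan stops at the first '\0'.
std::size_t c_length(const std::string& s) {
    return std::strlen(s.c_str());
}
```

With these, make_sample().size() is 12 while c_length(make_sample()) is 6: the C view ends right after "before".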
 
Alf P. Steinbach /Usenet

* jeremie fouche, on 26.07.2010 16:42:
I would like to know whether it is valid to embed binary data (which can
contain '\0') in a std::string.

Yes.

It can cause problems if you convert the string to a zero-terminated string,
because functions that deal with zero-terminated strings will naturally regard
the \0 as the string terminator.

But for std::string itself there is no problem.

However, for non-textual data it's better to use a std::vector (as you
suggested, snipped).

That's because the data type should reflect what it's used for. And while
std::string is low-level, it's not that low-level: it's not a type meant to be
used for anything but textual data. Thus, the type indicates The Wrong Thing,
unless by "binary data" you just mean text with embedded null characters.


Cheers & hth.,

- Alf
 
Francesco S. Carta

on 26/07/2010 said:
I think I'm right in saying that the example provided by the OP won't work,
though, because of the embedded null in the string literal.

You're right, the code outputs "77" with my compiler (that is, the
length of each string is 7, printed twice).

In order to successfully add a null character one would need to push it
back as the OP did or use += '\0' - or an equivalent statement.
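
A small sketch of the difference, reusing the OP's literal. The function names are invented for illustration, and the explicit-length append is one possible "equivalent statement", not something taken from the OP's code base.

```cpp
#include <cstddef>
#include <string>

// operator+=(const char*) finds the length with a strlen-style scan,
// so everything after the first '\0' in the literal is dropped.
std::string append_as_c_string() {
    std::string s;
    s.push_back('\0');
    s += "sdfsdf\0sdqsd";
    return s;  // 1 + 6 = 7 characters
}

// Passing an explicit length keeps the embedded '\0'.
std::string append_with_length() {
    static const char data[] = "sdfsdf\0sdqsd";
    std::string s;
    s.push_back('\0');
    s.append(data, sizeof data - 1);  // sizeof also counts the final terminator
    return s;  // 1 + 12 = 13 characters
}
```

The first version reproduces the 7 reported above; the second yields 13, with a '\0' at index 0 and another at index 7.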
 
Jonathan Lee

Hi

I would like to know whether it is valid to embed binary data (which can
contain '\0') in a std::string. During a code review, I suggested
changing this to std::vector<char>, but the developers told me they had
tested it and it was OK.
I tested with MSVC 2005 and MinGW GCC 4.4.0 and it was fine, but I'm
still not sure about this. I'm afraid that the '\0' char could
terminate the string in some (?) methods.

Here is the code I tested:

What is your point of view?
Thanks

You got 12s? 'Cause I got a couple of 7s, which is what it ought
to be. As Francesco explained.

G++ 4.1.2

--Jonathan
 
jeremie fouche

Leigh Johnston wrote:

I think I'm right in saying that the example provided by the OP won't work,
though, because of the embedded null in the string literal.

James

Oops, you're right, sorry.
My first try gave 12... but that was another sample.
The string is created from a char[] buffer, so I won't see this (OT)
problem.
Thanks a lot, everybody, for the explanation.
 
Francesco S. Carta

It fails on every compiler I tried.

Did you try to see what this does?

std::cout << s.length() << std::endl; // should be 12
std::cout << s << std::endl;

std::string s2 = s;

std::cout << s2.length() << std::endl; // should be 12
std::cout << s2 << std::endl;

Fire your dev group.

I don't understand why he should. Notice that we have seen only the
example code posted by the OP, we have no idea of what the dev team is
doing with those strings that contain binary data (and hence,
potentially, also the null character).

The thing to ascertain is /what exactly/ they are doing with those strings.

If I parse a whole binary file to a std::string and then I pass c_str()
to a function that expects a const char* pointer _along with_ the
correct size of the pointed-to data, I am doing something perfectly legit.

If they're passing those c_str() to a function that expects a simple
C-style null terminated string, _then_ they ought to be corrected, and
fired only as a last resort if they happen to reiterate writing such
(hypothetical) crap code.

Just my point of view, of course :)
 
Francesco S. Carta

I agree, and I was being facetious about firing the dev group. It
appears they are keeping their binary data in a string but it seems to
me a misuse/abuse of the class. A maintainer might be justified in
calling a string function against the data and getting erroneous
results unless it is thoroughly documented. std::vector would be more
proper, IMHO.

Full ack. Now that you make me think about it, I too would be better off
reading files into a std::vector. I habitually read files
into std::string, and since I never had any problem I never really
questioned it.
 
jeremie fouche

I don't understand why he should. Notice that we have seen only the
example code posted by the OP, we have no idea of what the dev team is
doing with those strings that contain binary data (and hence,
potentially, also the null character).

The thing to ascertain is /what exactly/ they are doing with those strings.

It /should/ be a binary data container, that's all.
If I parse a whole binary file to a std::string and then I pass c_str()
to a function that expects a const char* pointer _along with_ the
correct size of the pointed-to data, I am doing something perfectly legit.

That's what I understood reading the whole thread.
If they're passing those c_str() to a function that expects a simple
C-style null terminated string, _then_ they ought to be corrected, and
fired only as a last resort if they happen to reiterate writing such
(hypothetical) crap code.

This is the real problem. I hope that everybody will understand what
this string is for (it must be well commented). That's why a
Just my point of view, of course :)

Like mine :)
 
James Kanze

[...]
If I parse a whole binary file to a std::string and then
I pass c_str() to a function that expects a const char*
pointer _along with_ the correct size of the pointed-to data,
I am doing something perfectly legit.

Formally. In such cases, however, I'd use std::string::data()
to get the pointer.

More generally, std::string means text to most programmers, most
of the time, and std::vector<char> (or even std::vector<unsigned
char>) should be preferred for binary data. Similarly, if I'm
interfacing with C, I'll use std::string::c_str() if the
C function expects a pointer to a '\0'-terminated string, and
std::string::data() with std::string::size() if it expects a pointer
and a length.
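
As a rough sketch of that convention: count_zero_bytes below is a hypothetical stand-in for a C function taking a pointer and a length, and the wrapper names are invented too. The std::vector variant assumes C++11 for vector::data().

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical C-style function taking a pointer and a length.
// It counts zero bytes, so it has to see the whole buffer.
std::size_t count_zero_bytes(const char* p, std::size_t n) {
    std::size_t zeros = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (p[i] == '\0') ++zeros;
    }
    return zeros;
}

// Pointer + length interface: pair data() with size().
std::size_t zeros_in_string(const std::string& s) {
    return count_zero_bytes(s.data(), s.size());
}

// '\0'-terminated interface: use c_str(); the callee only sees
// the text before the first '\0'.
std::size_t visible_text_length(const std::string& s) {
    return std::strlen(s.c_str());
}

// The same pointer + length call with std::vector<char>
// (assumes C++11 for vector::data()).
std::size_t zeros_in_vector(const std::vector<char>& v) {
    return count_zero_bytes(v.data(), v.size());
}
```

For a six-byte buffer holding "ab", '\0', "cd", '\0', the two pointer-and-length wrappers report 2 zero bytes, while the c_str() path sees only the 2 characters of "ab".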
 
