<string> class with support for null bytes?

Karl Ebener

Hi!

I asked a similar question before, but then changed everything to use
char arrays instead of the string class, and I would rather not do that
again.

So, does anyone know of a string class similar to the STL <string> that
supports null bytes?

I tried with standard <string> but this definitely does not support
them... :(

Tnx
Karl
 
Alf P. Steinbach

* Karl Ebener:
I asked a similar question before, but then changed everything to use
char arrays instead of the string class, and I would rather not do that
again.

So, does anyone know of a string class similar to the STL <string> that
supports null bytes?

I tried with standard <string> but this definitely does not support
them... :(

Depends what you mean by "support", but with usual definitions that's
not correct.

Perhaps post a simple program that shows what you mean by "not support"?

Then we can see whether the problem is in the code or with std::string,
and give better suggestions on how to proceed.
 
Karl Ebener

A small correction:
I tried with standard <string> but this definitely does not support
them... :(

-> I tried using the length() method, which stops at null bytes, and c_str()
of course extracts only the part up to the first null byte.
Have I simply missed some way to extract the whole content as a char*?

Tnx
Karl
 
Karl Ebener

Alf said:
Depends what you mean by "support", but with usual definitions that's
not correct.

Perhaps post a simple program that shows what you mean by "not support"?

Then we can see whether the problem is in the code or with std::string,
and give better suggestions on how to proceed.
Okay, this is my test program.
What I ultimately want to do is read a complete (binary) file into a
string and then send it over a socket to/from a server.
I am using socket routines that take strings because that is much easier,
and I would love to leave it at that and not recode everything...

Tnx
Karl

#include <cstdio>
#include <string>
#include <iostream>

using namespace std;

int main()
{
string abc = "abc\0abc\0"; // string contains Null-bytes
cout << abc << ":" << abc.length() << endl; // output is: 3
FILE* fp;

fp = fopen("ABC", "w");
fwrite(abc.c_str(), 8, 1, fp); // file will contain: "abc" and Garbage
fclose(fp);
}
 
Rolf Magnus

Karl said:
Okay, this is my test program.
What I ultimately want to do is read a complete (binary) file into a
string and then send it over a socket to/from a server.
I am using socket routines that take strings because that is much easier,
and I would love to leave it at that and not recode everything...

Tnx
Karl

#include <string>
#include <iostream>

using namespace std;

int main()
{
string abc = "abc\0abc\0"; // string contains Null-bytes

No. Your literal contains 0-bytes. The conversion constructor from C style
strings to std::string of course has to stop at \0, since that's the value
that marks the end of a C style string. Try:

const char c[] = "abc\0abc\0";

string abc(c, sizeof(c));

This tells the constructor to not stop at \0, but read the specified number
of characters.
cout << abc << ":" << abc.length() << endl; // output is: 3

That's because only the first 3 characters were actually copied into the
string.
FILE* fp;

fp = fopen("ABC", "w");
fwrite(abc.c_str(), 8, 1, fp); // file will contain: "abc" and Garbage

Again, that's because the string only contains the first 3 characters.
 
Alf P. Steinbach

* Karl Ebener:
#include <cstdio>
#include <string>
#include <iostream>

using namespace std;

int main()
{
string abc = "abc\0abc\0"; // string contains Null-bytes
cout << abc << ":" << abc.length() << endl; // output is: 3
FILE* fp;

fp = fopen("ABC", "w");
fwrite(abc.c_str(), 8, 1, fp); // file will contain: "abc" and Garbage
fclose(fp);
}

The problem in the abc declaration is that you invoke the constructor
that takes a C string as argument, and by definition that C string ends
at the first null byte.

Try


#include <string>
#include <iostream>

#define ELEMCOUNT( array ) (sizeof(array)/sizeof(*array))

int main()
{
static char const abc_data[] = "abc\0abc\0";
std::string abc( abc_data, ELEMCOUNT( abc_data ) );

std::cout << abc << ":" << abc.length() << std::endl;
}

But you might instead (for efficiency) want to use std::vector<char>.

Also, the file should be opened in binary mode ("wb" rather than "w").
 
Dimitris Kamenopoulos

Karl said:
Okay, this is my test program.

My guess is that std::string's functions (including constructors) that take
a C-Style string as an argument, *do* treat it as a C-style (i.e.
null-terminated) string.

Makes sense, doesn't it? You don't want

char s[15] = "sth";
string s1(s);

to allocate 11 extra null characters in s1 for no reason :)

If, OTOH, you put a '\0' in an std::string, it will not be treated as a
terminating character.

Check out this example to see what I mean:

#include <iostream>
#include <string>

int main(){
std::string s("abc\0abc\0");
std::cout<<s.length()<<std::endl; //prints 3, not 9
std::string s2;
s2.push_back('a');
s2.push_back('\0');
s2.push_back('b');
std::cout<<s2.length()<<std::endl; //prints 3, not 1
}


Note: c_str() returns a const char *, so any code that treats the result
as a C string (e.g. strlen or strcpy) will stop at the first null byte.
Better to use a vector<char> if you want byte semantics.
 
Dave O'Hearn

Karl said:
fwrite(abc.c_str(), 8, 1, fp); // file will contain: "abc"
// and Garbage

As a separate issue, data() would be better than c_str() here. c_str()
may expand the string's internal buffer, to make room for an extra null
character past the end. You don't need a null-terminated C-string to
call fwrite, so you can just use data().
 
Rolf Magnus

Dimitris said:
Karl said:
Okay, this is my test program.

My guess is that std::string's functions (including constructors) that
take a C-Style string as an argument, *do* treat it as a C-style (i.e.
null-terminated) string.

Makes sense, doesn't it? You don't want

char s[15] = "sth";
string s1(s);

to allocate 11 extra null characters in s1 for no reason :)

That's not the main point. The constructor takes a pointer, which doesn't
contain any information about the size of the array pointed to. So the \0
is the _only_ way at all to know where a C style string ends.
 
Paul

Karl Ebener said:
Little change:


-> I tried using the length() method, which stops at null bytes, and c_str()
of course extracts only the part up to the first null byte.

What you are saying is totally false. std::string fully supports strings
with embedded NULLs. You just need to know the functions to use.

First, use the right constructor. std::string has several constructors;
a good C++ book that covers the standard library will show you the
various ones. The proper constructor here is the one that takes a const
char * and an integer giving the number of characters.

#include <string>
std::string s("abc\0123", 7);

Second, use the std::string::data( ) member function instead of
std::string::c_str(). This respects the length of the string and does not
terminate on the first NULL.

Third, if you need to add binary data to a std::string, use the append( )
function. If you need to reassign binary data, call std::string::append()
on an empty string, or use the std::string::assign( ) member function.

Paul
 
Ron Natalie

Karl said:
Little change:



-> I tried using the length() method, which stops at null bytes, and c_str()
of course extracts only the part up to the first null byte.
Have I simply missed some way to extract the whole content as a char*?

Multibyte does not contain nulls. I'm confused as to what you are asking.
Neither c_str() nor length() cares anything about embedded nulls.

Now that being said, there is NO real multibyte handling in std::string
either.
 
Ron Natalie

Karl said:
So, does anyone know of a string class similar to the STL <string> that
supports null bytes?

std::string handles null bytes just fine. The only thing that you have to
be careful with is that if you use the conversions to/from char*, you need
to pass/retrieve the actual length because the default strlen() calculations
won't work.

std::string s;
s.push_back('a');
s.push_back('\0');
s.push_back('b');

cout << s.size(); // prints 3
const char* cp = s.c_str();

cout << cp[0] << cp[2]; // prints ab
 
Old Wolf

Paul said:
#include <string>
std::string s("abc\0123", 7);

Undefined behaviour. "abc\0123" is an array of 6 chars:
{'a', 'b', 'c', '\012', '3', '\0'}
Second, use the std::string::data( ) member function instead of
std::string::c_str(). This respects the length of the string
and does not terminate on the first NULL.

std::string::c_str() does not terminate on the first null
character. The only difference between c_str() and data()
is that c_str() appends a null character.

std::string s("abc\0def", 7);
std::cout << (s.c_str() + 4) << std::endl;

will output "def".
BTW, the macro NULL is not really relevant to null characters.
 
Ron House

Karl said:
Alf P. Steinbach wrote:
#include <string>
#include <iostream>

using namespace std;

int main()
{
string abc = "abc\0abc\0"; // string contains Null-bytes
cout << abc << ":" << abc.length() << endl; // output is: 3
...
}

Nothing wrong with string. You lost your trailing data because C-style
string literals end at the first '\0'. This one works:

#include <string>
#include <iostream>

using namespace std;

int main()
{
string abc = "abcdabcd";
abc[3] = abc[7] = '\0';
cout << abc << ":" << abc.length() << endl;
return 0;
}

Prints (the two null characters are written too, but most terminals
display nothing for them):

abcabc:8
 
Paul

Old Wolf said:
Undefined behaviour. "abc\0123" is an array of 6 chars:
{'a', 'b', 'c', '\012', '3', '\0'}
Sorry, that was my attempt to put together a string in haste. The following
is what I meant:

#include <string>
int main( )
{
char s1[] = {'0','1','2',0,'4','5','6'};
std::string s(s1, 7);
}

Paul
 
Paavo Helde

What I ultimately want to do is read a complete (binary) file into a
string and then send it over a socket to/from a server.
I am using socket routines that take strings because that is much easier,
and I would love to leave it at that and not recode everything...

OK, in case of large and/or binary strings assign(), append() and swap()
member functions are your friends. E.g.

void read_from_file(std::string& content) {
char buffer[N];
std::string collector;
while(!eof(the_file)) {
// ... read chunk of file into the buffer, say of length n.
collector.append(buffer, n);
}
content.swap(collector);
}

void send_to_socket() {
std::string packet;
read_from_file(packet);
// assume there is a nice C++ object around called socket:
socket.write(packet.data(), packet.length());
}

Note that using c_str() instead of data() might imply a performance
penalty here as the c_str() function might have to add a NUL terminator
at the end of the buffer, which can cause a reallocation and extra
unneeded copy of the whole string. As you must be managing the lengths
anyway explicitly the terminating NUL is not needed.

OK, swap() is not really necessary in this example, but it might be
useful in other similar situations where you have a large string to be
passed around.

In case of binary data the first rule is to avoid all std::string member
functions which take a single char* pointer - there is no way to specify
the actual length of data for such a parameter.

HTH
Paavo
 
