Feeding string into ostringstream only uses up to the first null?

coomberjones

I have a few std::strings that I am using to store raw binary data,
each of which may very well include null bytes at any point or
points. I want to slap them together into a single string, so I tried
a std::ostringstream:

std::ostringstream oss;
oss << x << y << z;
std::string result ( oss.str() );

The result shows that feeding the ostringstream with a string just
takes the string's data up to (and not including) the first null
character.

Stroustrup's book is vague about what should happen (at least in the
sole reference I could find); it merely says "The << operator writes a
string to an ostream".

Obviously I could just concatenate the strings using the + operator.
But I'm wondering - is there some other kind of stream that is
supposed to be used for the purpose I want? Or some stream
manipulator?

And I'm also wondering whether the observed behavior is correct in the
first place. Like I said, the sole reference I could find is vague,
but I would lean towards my original assumption when interpreting it.
And I certainly hope the behavior is not different from compiler to
compiler!

Thanks.
 
Christopher

std::strings are designed for text as is std::stringstream, so your
first problem lies in the fact that you want to stream binary data as
text. I'd fix that design flaw before proceeding, else you would just
be hacking around the problem.

Your second problem, which may or may not be one, is why you want
to _stream_ binary data. Is it necessary?
I can think of some cases where it is, but it might not be in your
case.

If you look at the iostream hierarchy you can spot your problem.

Depending what your binary data represents, what you want to do with
it, and how complicated you want your solution, I'd look into either
using a different existing iostream type or designing my own through
derivation.
 
Eric Pruneau

You may want to consider using bitset if you don't need to dynamically
change the size of your container. It is specifically designed to hold binary
data.

If you need iterators (something that bitset doesn't have), another option is
deque<bool>, which is an STL container. But the underlying memory isn't
contiguous.

Finally there is vector<bool>, but there are 2 problems with that one:
1. it is not an STL container
2. it doesn't contain bool (by the way, deque<bool> really holds bool)
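
The difference between the two containers listed above can be checked at compile time. This is only a minimal sketch, assuming a C++11 or later compiler (so not VC++ 6.0); count_set is a hypothetical helper name, not something from the standard library.

```cpp
#include <deque>
#include <type_traits>
#include <vector>

// deque<bool> stores real bool objects, so its reference type is bool&.
static_assert(std::is_same<std::deque<bool>::reference, bool&>::value,
              "deque<bool> hands out real bool references");

// vector<bool> is a packed specialization; its reference type is a proxy class.
static_assert(!std::is_same<std::vector<bool>::reference, bool&>::value,
              "vector<bool> hands out proxy objects, not bool&");

// Hypothetical helper: counts the set flags in either container.
template <typename Flags>
int count_set(const Flags& flags) {
    int n = 0;
    for (typename Flags::const_iterator it = flags.begin(); it != flags.end(); ++it) {
        if (*it) ++n;
    }
    return n;
}
```

Both containers can still be iterated the same way; the difference only matters when code takes a bool& or bool* to an element.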

Eric Pruneau
 
James Kanze

I have a few std::strings that I am using to store raw binary
data, each of which may very well include null bytes at any
point or points.

As others have pointed out, that's probably a design error.
However...
I want to slap them together into a single string, so I tried
a std::ostringstream:

And concatenation would be a lot more reasonable.
std::ostringstream oss;
oss << x << y << z;
std::string result ( oss.str() );
The result shows that feeding the ostringstream with a string
just takes the string's data up to (and not including) the
first null character.

With what implementation? As far as ostringstream and string
are concerned, '\0' is just a character, like any other. I just
did a quick test on four different implementations, and I can't
find one where this doesn't work correctly. (IIRC, VC++ 6.0 had
some problems with '\0' in strings. But they generally resulted
in a program crash. And of course, no one uses such an old
compiler.)
Stroustrup's book is vague about what should happen (at least
in the sole reference I could find); it merely says "The <<
operator writes a string to an ostream".

What more should it say?

Note that an arbitrary ostream may not be able to handle a '\0';
an ofstream opened in text mode, for example, is only required
to handle printable characters and a small set of control
characters ('\n', '\t', etc.), and some of these (e.g. '\n') may
have special behavior. But an ostringstream can handle anything
a string can handle, and a string can obviously handle '\0'.
Obviously I could just concatenate the strings using the +
operator. But I'm wondering - is there some other kind of
stream that is supposed to be used for the purpose I want? Or
some stream manipulator?

Well, the basic stream abstraction is text formatting, so using
an ostringstream here seems a bit of abuse. But there's no
reason the exact code you post would fail.
And I'm also wondering whether the observed behavior is
correct in the first place. Like I said, the sole reference I
could find is vague, but I would lean towards my original
assumption when interpreting it. And I certainly hope the
behavior is not different from compiler to compiler!

I'm wondering how you actually determined that there was a
problem. Did you, per chance, use some other function which
does treat '\0' specially? Because it's clear that in this
case, '\0' is just a character like another, and I can't find an
implementation where this isn't the case.
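
One way the result can look truncated even when it is not is sketched below. This assumes a conforming standard library; join_via_stream and c_string_length are hypothetical names used only for illustration.

```cpp
#include <cstring>
#include <sstream>
#include <string>

// Joins three strings through an ostringstream, as in the original post.
std::string join_via_stream(const std::string& x,
                            const std::string& y,
                            const std::string& z) {
    std::ostringstream oss;
    oss << x << y << z;
    return oss.str();
}

// Hypothetical helper: the length a C-string function would report.
// strlen stops at the first '\0', so it can under-report the real size.
std::size_t c_string_length(const std::string& s) {
    return std::strlen(s.c_str());
}
```

With x built as std::string("ab\0cd", 5) and two-character y and z, a conforming implementation returns a result whose size() is 9, while c_string_length reports 2 for the same string. Inspecting the result through c_str(), printf("%s"), or a debugger's string view would show the same apparent truncation.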
 
coomberjones

As others have pointed out, that's probably a design error.
However...

I guess I don't understand why. Strings are designed to be able to
handle binary data, including nulls.
And concatenation would be a lot more reasonable.

I guess I don't understand why. I understand that concatenation will
achieve the result I want, but I don't get why it's "a lot more
reasonable". They both (seemingly) should accomplish the same thing:
slap these three strings together into one string.
With what implementation?

Microsoft Visual C++ 6.0.
And of course, no one uses such an old compiler.

Unfortunately for me, you're wrong.
What more should it say?

If the behavior is supposed to be as VC++ 6.0 is doing (and to be
clear, I'm not saying it's supposed to be that way), then it should
explicitly say "writes the contents of a string, up to but not
including the first null, to an ostream". Because it is NOT doing
what it says - i.e. "writing a string". It's writing PART of a
string.
Note that an arbitrary ostream may not be able to handle a '\0';
an ofstream opened in text mode, for example, is only required
to handle printable characters and a small set of control
characters ('\n', '\t', etc.), and some of these (e.g. '\n') may
have special behavior. But an ostringstream can handle anything
a string can handle, and a string can obviously handle '\0'.

Which is what I had in mind when I originally coded it. But that's
not how it has worked out.
I'm wondering how you actually determined that there was a
problem.

I did what I said: I had three strings, one of which contained a null,
and I fed them into an ostringstream, exactly as described. And the
result was that, for the string that contained a null, the output only
included up to the null. I'm having a hard time understanding what
you're not getting about that.
Did you, per chance, use some other function which
does treat '\0' specially?

I did exactly what I said: Slapped three strings together using an
ostringstream.
Because it's clear that in this
case, '\0' is just a character like another, and I can't find an
implementation where this isn't the case.

Well, there is one.
 
Christopher

I guess I don't understand why. Strings are designed to be able to
handle binary data, including nulls.

According to who? Not according to any book that I've read.
Especially, not according to the book written by a fellow who was
involved in writing the standard for the I/O portion of C++. I believe
he had, "The Standard C++ IO Library" in the title, although I can't
remember the exact title off the top of my head. Good book.

std::strings are in no way shape or form _designed_ to handle binary
data. Some _streams_ are, but once you cross over into the string side
of the IO library, you are dealing with objects specifically designed
to format, translate, and transport text.

A stringstream implements formatting of text, translation of text,
and transport of text from an external device to an internal buffer
and vice versa (with memory being the "external device"). Your design
problem is that binary data does not require formatting or translation
from an external format to an internal one, unless you are going
across machines or endianness, whereas text does.

I do not have to convert 0x0FA4 to something else in another locale.
Nor do I have to convert a byte to a tab character or figure out that
a tab character is 3 spaces. I do not have to figure out that a 0x0000
really means the end of a C-style string, etc.
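
The split between formatted and unformatted output described above also shows up in the ostream interface itself: operator<< is the formatted path, while the write() member copies a given number of characters as they are. A minimal sketch, assuming a conforming standard library; put_raw and join_raw are hypothetical names.

```cpp
#include <sstream>
#include <string>

// Appends raw bytes to a stream with the unformatted write() member,
// which copies exactly bytes.size() characters and ignores width and fill.
void put_raw(std::ostream& out, const std::string& bytes) {
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
}

// Hypothetical helper joining two byte blocks through a string stream.
std::string join_raw(const std::string& a, const std::string& b) {
    std::ostringstream oss;
    put_raw(oss, a);
    put_raw(oss, b);
    return oss.str();
}
```

This does not settle whether a string stream is the right tool here; it only shows that the unformatted path exists on any ostream.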

Microsoft Visual C++ 6.0


Unfortunately for me, you're wrong.


Well, if you are using VC6, then that is a problem. Since it causes a
plethora of undefined and unexpected behavior, it would be a waste of
time for anyone to try to decipher what is really happening. The best
solution would be to stop using VC6, especially since MS is offering
Express edition IDEs free of charge that do (for the most part)
conform to standards. If you are required by an employer to use an IDE
that is more than a decade old, it might be time to change
employers.

To summarize, you are using the wrong development tool and you have
chosen the wrong STL data type. Whether or not you choose to believe us
is up to you.
 
James Kanze

I guess I don't understand why. Strings are designed to be
able to handle binary data, including nulls.

It gives the wrong message. For better or for worse, the name
string suggests text data of some sort; there might be reasons
for inserting '\0' characters into text data, but it is still
text data.

Of course, the actual interface of std::string doesn't really do
much to support text (as opposed to just any data), but then, it
doesn't really do much to make it preferable to std::vector---in
fact, I find in practice that I'm often drawn to using
std::vector< char > for my strings, because it corresponds
better to what I'm doing.
I guess I don't understand why.

Because that's the way you concatenate strings, normally.
ostringstream is for formatting: converting non-text into text,
more or less (but also e.g. ensuring field widths, etc.). If
you're not actually formatting, using it passes the wrong
message to the reader.
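
A minimal sketch of plain concatenation, assuming a conforming standard library (going by the replies in this thread, VC++ 6.0 may still mishandle embedded '\0' characters); join_direct is a hypothetical name.

```cpp
#include <string>

// Concatenates three strings directly; append() copies all size() characters,
// so embedded '\0' characters are treated like any other character.
std::string join_direct(const std::string& x,
                        const std::string& y,
                        const std::string& z) {
    std::string result;
    result.reserve(x.size() + y.size() + z.size());  // avoid repeated regrowth
    result.append(x).append(y).append(z);
    return result;
}
```

The expression x + y + z gives the same contents; the reserve() call is only there to size the buffer once.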
I understand that concatenation will achieve the result I
want, but I don't get why it's "a lot more reasonable". They
both (seemingly) should accomplish the same thing: slap these
three strings together into one string.

That's not what ostringstream says. Ostringstream says format
this data to a specific textual format.
Microsoft Visual C++ 6.0.

Don't put '\0' characters in an std::string with VC++ 6.0.
Period. It's not just ostringstream which doesn't work; it's a
lot of the functions. And it often results in program crashes,
not just incorrect results. This is a known bug, which has been
fixed.
Unfortunately for me, you're wrong.

No one should. The compiler is something like 10 years old. It
was a very good compiler when it came out, but the situation has
evolved considerably since then: we have an ISO standard, and we
make far more extensive use of templates than we did back then.

And of course, Microsoft offers the newer versions free, so
there's absolutely no reason for not upgrading.
If the behavior is supposed to be as VC++ 6.0 is doing (and to
be clear, I'm not saying it's supposed to be that way), then
it should explicitly say "writes the contents of a string, up
to but not including the first null, to an ostream". Because
it is NOT doing what it says - i.e. "writing a string". It's
writing PART of a string.

Yes and no. VC++ 6.0 doesn't support null characters in
strings, period. That's an error in the library implementation.
But since they don't support null characters in strings, they're
effectively copying all of the string.
Which is what I had in mind when I originally coded it. But
that's not how it has worked out.

That's because you're using a pre-standard compiler. There are
a lot of things that "won't work out as expected" with VC++ 6.0,
if you expect standard C++. (Remember, the compiler is older
than the standard.)
I did what I said: I had three strings, one of which contained
a null, and I fed them into an ostringstream, exactly as
described. And the result was that, for the string that
contained a null, the output only included up to the null.
I'm having a hard time understanding what you're not getting
about that.

What output? What did you do to determine that the result was
shorter than it should be? (Of course, if you're using VC++
6.0, the only thing that surprises me here is that it didn't
actually crash.)
I did exactly what I said: Slapped three strings together
using an ostringstream.

But how did you determine the results? From the code you
posted, it's impossible to say what you're actually seeing.
Well, there is one.

There was one. There were, in fact, a lot of them, many years
back. I don't know of any in the last ten years, however.
 
James Kanze

* Christopher:
According to who?

You make an interesting point, in a certain sense[1].

Very much. I've yet to really figure out what std::string was
designed for: it doesn't really have much support for text
(despite its name), and as a more general data container, I
can't imagine a case where std::vector wouldn't be superior.
(I've been playing around with UTF-8 a lot lately, and I've
found that although the interface uses std::string, internally,
std::vector< Byte >, where Byte is a typedef for unsigned char,
works a lot better, most of the time.)

Of course, if you're talking more generally, the word "string"
is usually associated with text, and I wouldn't normally expect
a string to be able to handle binary data (although it should be
able to contain any character data, including that which
contains a '\0' character).
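
The std::vector< Byte > approach mentioned above might look roughly like this for the original problem of joining blocks of raw data. Byte is the typedef described in the post; ByteBuffer and append_bytes are hypothetical names introduced for the sketch.

```cpp
#include <vector>

typedef unsigned char Byte;            // as described in the post
typedef std::vector<Byte> ByteBuffer;  // hypothetical name

// Appends the contents of tail to head; no byte value is special here.
// tail must be a different object from head, since inserting a vector's
// own range into itself is not allowed.
void append_bytes(ByteBuffer& head, const ByteBuffer& tail) {
    head.insert(head.end(), tail.begin(), tail.end());
}
```

Because the container has no notion of a terminator, there is no C-string view of the data that could hide bytes after a zero.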
 
Jerry Coffin

[ ... ]
Of course, the actual interface of std::string doesn't really do
much to support text (as opposed to just any data), but then, it
doesn't really do much to make it preferable to std::vector---in
fact, I find in practice that I'm often drawn to using
std::vector< char > for my strings, because it corresponds
better to what I'm doing.

This is one place Ada did things right, IMO. Most languages have arrays
and strings that have special capabilities. For Ada they just designed
enough capabilities into arrays to allow an array of characters to be a
usable string.

[ ... use VC++ 6.0 ]
No one should. The compiler is something like 10 years old. It
was a very good compiler when it came out, but the situation has
evolved considerably since then: we have an ISO standard, and we
make far more extensive use of templates than we did back then.

And of course, Microsoft offers the newer versions free, so
there's absolutely no reason for not upgrading.

Unfortunately, this isn't true. While Microsoft's newer _compilers_ are
substantially improved, their current IDEs are complete garbage compared
to VC++ 6.0. For developing Windows applications the newer IDEs lose a
_great_ deal more productivity than you gain from the newer compilers.

I realize you'd generally advise using emacs instead of either.
Personally, I'd as soon find a rewarding new career as a speed bump or
a test subject for experimental dental procedures.
 
James Kanze

[ ... ]
Of course, the actual interface of std::string doesn't
really do much to support text (as opposed to just any
data), but then, it doesn't really do much to make it
preferable to std::vector---in fact, I find in practice that
I'm often drawn to using std::vector< char > for my strings,
because it corresponds better to what I'm doing.
This is one place Ada did things right, IMO. Most languages
have arrays and strings that have special capabilities. For
Ada they just designed enough capabilities into arrays to
allow an array of characters to be a usable string.

You mean you can do things like case indifferent comparisons
(locale dependent, of course) on an array in Ada?

This doesn't mean that I think that Ada did the wrong thing.
I'm not sure we know enough, even today, to be able to
reasonably specify what a class representing text strings should
look like. And at least the Ada solution is honest, and doesn't
pretend to offer something it doesn't, nor does it commit the
language to something that is likely to turn out wrong in the
long run.
[ ... use VC++ 6.0 ]
No one should. The compiler is something like 10 years old.
It was a very good compiler when it came out, but the
situation has evolved considerably since then: we have an
ISO standard, and we make far more extensive use of
templates than we did back then.
And of course, Microsoft offers the newer versions free, so
there's absolutely no reason for not upgrading.
Unfortunately, this isn't true. While Microsoft's newer
_compilers_ are substantially improved, their current IDEs are
complete garbage compared to VC++ 6.0. For developing Windows
applications the newer IDEs lose a _great_ deal more
productivity than you gain from the newer compilers.

And all of the Microsoft IDEs lose a great deal of productivity
when compared to a real development system (with powerful
scripting languages to automate a lot of the tasks).
I realize you'd generally advise using emacs instead of either.

Actually, I don't use emacs unless I have to. But it is a
powerful editor; a powerful editor is an important part of a
development environment, but I've been told that Microsoft's
code editor is also very powerful. (I've never used it, since I
prefer using the same editor everywhere, and it's not available
on most of the platforms I work on.) But there's more to a
development environment than just the editor.
Personally, I'd as soon find a rewarding new career as a speed
bump or a test subject for experimental dental procedures.

Emacs isn't quite that bad, but it is a good way to get carpal
tunnel syndrome. (At least for me---others don't seem to have
that problem.) You do need something that is as powerful as
emacs for editing tasks, however.
 
Jerry Coffin

[ ... ]
You mean you can do things like case indifferent comparisons
(locale dependent, of course) on an array in Ada?

Yes, I believe so. My experience was (mostly) with Ada 83, which had the
right capabilities for its arrays that you could do this, but you had to
write all the actual locales and such yourself.

If I'm not mistaken, Ada 95 added a fairly reasonable character handling
package to handle things like case conversion on a locale-dependent
basis.

Doing a bit of looking confirms that there is, in fact, an
ada.characters.handling package. Glancing it over, it looks like it's at
least on the same general order of capabilities as those in C++, though
I don't see any immediate indication that it's drastically better.
This doesn't mean that I think that Ada did the wrong thing.
I'm not sure we know enough, even today, to be able to
reasonably specify what a class representing text strings should
look like. And at least the Ada solution is honest, and doesn't
pretend to offer something it doesn't, nor does it commit the
language to something that is likely to turn out wrong in the
long run.

Right -- my point wasn't that Ada has anything beyond the state of the
art elsewhere, only that its arrays provide most of the capabilities to
do string handling that's on a par with most other languages.

[ ... ]
And all of the Microsoft IDE's lose a great deal of productivity
when compared to a real development system (with powerful
scripting languages to automate a lot of the tasks).

Microsoft's IDEs allow you to write scripts in a variant of Visual Basic
that seems to be adequate for most tasks. If you really want to, you can
also write code in C, C++, etc., as a plug in.

If there's another environment that really provides greater
productivity, I'd love to know about it -- but so far, nothing else
anybody's suggested has worked out particularly well for me.
 
James Kanze

[ ... ]
And all of the Microsoft IDE's lose a great deal of productivity
when compared to a real development system (with powerful
scripting languages to automate a lot of the tasks).
Microsoft's IDEs allow you to write scripts in a variant of
Visual Basic that seems to be adequate for most tasks.

But not for running them on a Sparc under Solaris :).
(Seriously, I suspect that that would answer most objections.
Except the portability one, of course.)
If you really want to, you can also write code in C, C++,
etc., as a plug in.
If there's another environment that really provides greater
productivity, I'd love to know about it -- but so far, nothing
else anybody's suggested has worked out particularly well for
me.

Well, the best environment is generally the one you know best.
I certainly wouldn't be as productive with Microsoft's IDE as I
am with my Unix based toolkit---at least until I got to know it
as well.

(Another real problem in my case is age. Learning a new
editor---to the point where it is your fingers which do the
thinking---is about like learning a musical instrument, and at
60, it's a lot harder than at 20.)
 
Jerry Coffin

On Jun 1, 5:27 pm, Jerry Coffin wrote:

[ ... ]
But not for running them on a Sparc under Solaris :).
(Seriously, I suspect that that would answer most objections.
Except the portability one, of course.)

Yup -- no question that it's not portable. At the same time, if somebody
really wanted it on Unix, I can't see where it'd really be substantially
more difficult than a lot of other things to duplicate. OTOH, much of
what it does is more or less Windows-specific in any case -- X doesn't
really have/support a direct analog of a Windows message handler, so if
you supported development for Unix, you'd nearly have to make some
fairly substantial changes in how things work in any case.

[ ... ]
Well, the best environment is generally the one you know best.
I certainly wouldn't be as productive with Microsoft's IDE as I
am with my Unix based toolkit---at least until I got to know it
as well.

(Another real problem in my case is age. Learning a new
editor---to the point where it is your fingers which do the
thinking---is about like learning a musical instrument, and at
60, it's a lot harder than at 20.)

No doubt about that -- I'm only in my 40's, but I already find it harder
to learn some new things than I used to (especially, as you point out,
some things that involve muscle memory).
 
