Interesting string.resize behavior

v4vijayakumar

#include <string>
#include <iostream>
using namespace std;

int main()
{
string str;
str.resize(5);

str[0] = 't';
str[1] = 'e';
str[2] = 's';
str[3] = 't';
str[4] = '\0';

str += "-test2";
str += "-test3";

cout << str << endl;

return 0;
}
 
Victor Bazarov

v4vijayakumar said:
#include <string>
#include <iostream>
using namespace std;

int main()
{
string str;
str.resize(5);

str[0] = 't';
str[1] = 'e';
str[2] = 's';
str[3] = 't';
str[4] = '\0';

str += "-test2";
str += "-test3";

cout << str << endl;

return 0;
}

What do you find "interesting" about it? The string is appended
to its *end* keeping all characters that it already has, not to the
"last character before trailing null characters".

I am guessing that you find this behaviour different from that of
a C string. Yes, it's different. Since a C string cannot have any
other way of knowing its size, it has to keep track of the null chars
(since they are considered *terminating*). The C++ 'std::string' has
no such special meaning for the null character. It keeps track of
its size differently.
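
A minimal sketch of that difference, assuming a conforming standard library. The helper names `build_example`, `stored_length` and `c_length` are made up for illustration: `size()` reports the length the string keeps track of, while `std::strlen` on `c_str()` stops at the first null character.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: rebuilds the string from the original post,
// stopping after the first append.
std::string build_example()
{
    std::string str;
    str.resize(5);      // five characters, all '\0'
    str[0] = 't';
    str[1] = 'e';
    str[2] = 's';
    str[3] = 't';
    str[4] = '\0';      // just another character as far as std::string is concerned
    str += "-test2";    // appended after all five existing characters
    assert(str.size() == 11);   // sanity check: 5 + 6
    return str;
}

// Length as std::string tracks it, independent of the contents.
std::size_t stored_length(const std::string& s) { return s.size(); }

// Length as a C function sees it: counts up to the first '\0'.
std::size_t c_length(const std::string& s) { return std::strlen(s.c_str()); }
```

With these definitions, `stored_length(build_example())` is 11 and `c_length(build_example())` is 4.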

V
 
Clark Cox

You're going to have to state exactly what it is that you find
interesting about this.

#include <string>
#include <iostream>
using namespace std;

int main()
{
string str;
str.resize(5);

str[0] = 't';
str[1] = 'e';
str[2] = 's';
str[3] = 't';
str[4] = '\0';
OK, str now contains the string "test\0".
str += "-test2";
str now contains the string "test\0-test2".
str += "-test3";
str now contains the string "test\0-test2-test3".
 
v4vijayakumar

You're going to have to state exactly what it is that you find
interesting about this.
...
string str;
str.resize(5);

str[0] = 't';
str[1] = 'e';
str[2] = 's';
str[3] = 't';
str[4] = '\0';

str += "-test2";
str += "-test3";
cout << str << endl;
...

Well, surprise!

Output is not "test-test2-test3", but just "test". :)

[ Tried in MS VS 6.0. ]
 
Victor Bazarov

v4vijayakumar said:
You're going to have to state exactly what it is that you find
interesting about this.
...
string str;
str.resize(5);

str[0] = 't';
str[1] = 'e';
str[2] = 's';
str[3] = 't';
str[4] = '\0';

str += "-test2";
str += "-test3";
cout << str << endl;
...

Well, surprise!

Output is not "test-test2-test3", but just "test". :)

[ Tried in MS VS 6.0. ]

That looks and sounds like a buggy compiler or library. Have
you tried it on any more recent (or just different) one?

V
 
Zeppe

v4vijayakumar said:
You're going to have to state exactly what it is that you find
interesting about this.
...
string str;
str.resize(5);

str[0] = 't';
str[1] = 'e';
str[2] = 's';
str[3] = 't';
str[4] = '\0';

str += "-test2";
str += "-test3";
cout << str << endl;
...

Well, surprise!

Output is not "test-test2-test3", but just "test". :)

[ Tried in MS VS 6.0. ]
I'm sorry to tell you that this is not interesting string::resize
behaviour. It's Visual Studio 6 behaviour, and I would call it *buggy*
behaviour, not interesting ;)

Regards,

Zeppe
 
Erik Wikström

You're going to have to state exactly what it is that you find
interesting about this.
...
string str;
str.resize(5);

str[0] = 't';
str[1] = 'e';
str[2] = 's';
str[3] = 't';
str[4] = '\0';

str += "-test2";
str += "-test3";
cout << str << endl;
...

Well, surprise!

Output is not "test-test2-test3", but just "test". :)

[ Tried in MS VS 6.0. ]

Well, surprise

Output is test -test2-test3

[ Tried in MS VS2005 ]

:)

As a rule, don't trust anything VS6 does.
 
shadowman

v4vijayakumar said:
You're going to have to state exactly what it is that you find
interesting about this.
...
string str;
str.resize(5);

str[0] = 't';
str[1] = 'e';
str[2] = 's';
str[3] = 't';
str[4] = '\0';

str += "-test2";
str += "-test3";
cout << str << endl;
...

Well, surprise!

Output is not "test-test2-test3", but just "test". :)

[ Tried in MS VS 6.0. ]
With gcc version 3.3.3 (a few years old itself):
> g++ -o resize resize.cpp
> ./resize
test-test2-test3
>

Update your compiler.
 
Victor Bazarov

Gennaro said:
Erik said:
Well, surprise

Output is test -test2-test3

[ Tried in MS VS2005 ]

:)

With a space... interesting...

That's how that particular cout handles outputting null character.
Nothing to do with the language, I suppose. Implementation- and
platform-specific behaviour.
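
One way to separate what the stream receives from what the terminal chooses to display is to write to an in-memory stream and count the characters. A small sketch, assuming a conforming library; the function name `formatted_size` is invented for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: inserts s into an in-memory stream with operator<<
// and reports how many characters the stream actually received.
std::size_t formatted_size(const std::string& s)
{
    std::ostringstream out;
    out << s;                               // same operator<< that cout uses
    const std::string written = out.str();
    assert(written == s);                   // embedded '\0' characters are kept
    return written.size();
}
```

For the string from the original post, `formatted_size(std::string("test\0-test2-test3", 17))` returns 17, so all characters reach the stream buffer. What a console then does with the null character is a separate matter.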

V
 
James Kanze

v4vijayakumar said:
You're going to have to state exactly what it is that you find
interesting about this.
...
string str;
str.resize(5);
str[0] = 't';
str[1] = 'e';
str[2] = 's';
str[3] = 't';
str[4] = '\0';
str += "-test2";
str += "-test3";
cout << str << endl;
...
Well, surprise!
Output is not "test-test2-test3", but just "test". :)
[ Tried in MS VS 6.0. ]
That looks and sounds like a buggy compiler or library.

The implementation of std::string in VC++ 6.0 didn't handle '\0'
in std::string correctly. In many cases, in fact, a string with
a '\0' would crash the program.
Have you tried it on any more recent (or just different) one?

This problem isn't present in the current version of the
compiler (at least in the cases where I'd seen it---the code
which didn't work then works now).

Note that a '\0' character in a string can have curious effects
on an output device. You're not allowed to output it to a
stream opened in text mode (like cout), so his results don't
actually prove a bug anywhere. To be sure, he should open a
file in binary mode, output to it, and then verify the contents.
It's quite possible that the phenomenon that he is observing has
nothing to do with the bug I mention.
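
The binary-mode check suggested above could look roughly like this. It is only a sketch: the helper name `binary_round_trip` and the file name `resize_check.bin` are arbitrary choices, not anything from the thread.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: writes s to a file in binary mode, reads the file
// back in binary mode, and returns what was read.
std::string binary_round_trip(const std::string& s,
                              const char* path = "resize_check.bin")
{
    {
        std::ofstream out(path, std::ios::binary | std::ios::trunc);
        assert(out && "could not open file for writing");
        out.write(s.data(), static_cast<std::streamsize>(s.size()));
    }   // leaving the scope closes the stream and flushes the data

    std::ifstream in(path, std::ios::binary);
    std::string back((std::istreambuf_iterator<char>(in)),
                     std::istreambuf_iterator<char>());
    in.close();
    std::remove(path);  // clean up the temporary file
    return back;
}
```

If `binary_round_trip(str) == str` holds for the string built in the original program, the string itself contains all 17 characters, and any odd display comes from the output device rather than from `std::string`.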
 
James Kanze

Gennaro said:
Erik Wikström wrote:
Well, surprise
Output is test -test2-test3
[ Tried in MS VS2005 ]
:)
With a space... interesting...
That's how that particular cout handles outputting null character.
Nothing to do with the language, I suppose. Implementation- and
platform-specific behaviour.

Writing a '\0' character to a text stream is implementation
defined, yes. In practice, I suspect that the system is just
copying the bytes directly to the output device, and that it is
the tty device which determines what you see. But an
implementation can map the '\0' character to something else in a
text file, or even use some system API which treats the buffer
as a null terminated string. Text streams are for text, and
'\0' is not text.
 
Victor Bazarov

James said:
[..writing '\0' to a text stream is implementation-defined..]
an
implementation can map the '\0' character to something else in a
text file, or even use some system API which treats the buffer
as a null terminated string. Text streams are for text, and
'\0' is not text.

Can you back this up with anything? Why is '\0' not text? I
cannot find any explicit definition of "text" in the Standard
that would say that '\0' is not text. It's not part of the
basic character set, but that has nothing to do with "not text",
or does it?

V
 
Marcus Kwok

James Kanze said:
Note that a '\0' character in a string can have curious effects
on an output device. You're not allowed to output it to a
stream opened in text mode (like cout), so his results don't
actually prove a bug anywhere. To be sure, he should open a
file in binary mode, output to it, and then verify the contents.

I am working on a project where we have to deal with unprintable
characters. The incoming strings are in 7-bit ASCII, so it was easy to
write a function that replaced '\0' with "<NUL>", '\x0D' with "<CR>",
etc. so that we could easily see what characters were being received,
without having to look at hex output all the time.

For example, the original string "test\0-test2-test3" would then be
output as "test<NUL>-test2-test3". Obviously, there is the ambiguity of
whether the original string actually had the sequence '<', 'N', 'U',
'L', '>' or whether it was '\0', but since this was just for our
internal display purposes it wasn't an issue.
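
A function along those lines might look like the sketch below. The name `make_visible` is hypothetical, and only the two replacements mentioned above are handled; everything else is copied through unchanged.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: replaces '\0' with "<NUL>" and '\x0D' with "<CR>"
// so that these characters show up in ordinary text output.
std::string make_visible(const std::string& in)
{
    std::string out;
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        switch (in[i]) {
        case '\0':   out += "<NUL>"; break;
        case '\x0D': out += "<CR>";  break;
        default:     out += in[i];   break;
        }
    }
    assert(out.size() >= in.size());    // tags only ever lengthen the text
    return out;
}
```

Called as `make_visible(std::string("test\0-test2-test3", 17))`, it returns `"test<NUL>-test2-test3"`.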
 
Default User

Victor said:
James said:
[..writing '\0' to a text stream is implementation-defined..]
an
implementation can map the '\0' character to something else in a
text file, or even use some system API which treats the buffer
as a null terminated string. Text streams are for text, and
'\0' is not text.

Can you back this up with anything? Why is '\0' not text? I
cannot find any explicit definition of "text" in the Standard
that would say that '\0' is not text. It's not part of the
basic character set, but that has nothing to do with "not text",
or does it?

Fairer to say that it's not a printable character.




Brian
 
Victor Bazarov

Default said:
Victor said:
James said:
[..writing '\0' to a text stream is implementation-defined..]
an
implementation can map the '\0' character to something else in a
text file, or even use some system API which treats the buffer
as a null terminated string. Text streams are for text, and
'\0' is not text.

Can you back this up with anything? Why is '\0' not text? I
cannot find any explicit definition of "text" in the Standard
that would say that '\0' is not text. It's not part of the
basic character set, but that has nothing to do with "not text",
or does it?

Fairer to say that it's not a printable character.

Yes, but does it mean that it cannot be output to a stream opened
in "text" mode?

V
 
Default User

Victor said:
Default said:
Victor said:
James Kanze wrote:
[..writing '\0' to a text stream is implementation-defined..]
an
implementation can map the '\0' character to something else in a
text file, or even use some system API which treats the buffer
as a null terminated string. Text streams are for text, and
'\0' is not text.

Can you back this up with anything? Why is '\0' not text? I
cannot find any explicit definition of "text" in the Standard
that would say that '\0' is not text. It's not part of the
basic character set, but that has nothing to do with "not text",
or does it?

Fairer to say that it's not a printable character.

Yes, but does it mean that it cannot be output to a stream opened
in "text" mode?

I think it almost certainly can. I didn't really find much in the C++
standard on the topic, so I fell back to the C standard (C99 draft).

[#2] A text stream is an ordered sequence of characters
composed into lines, each line consisting of zero or more
characters plus a terminating new-line character. Whether
the last line requires a terminating new-line character is
implementation-defined. Characters may have to be added,
altered, or deleted on input and output to conform to
differing conventions for representing text in the host
environment. Thus, there need not be a one-to-one
correspondence between the characters in a stream and those
in the external representation. Data read in from a text
stream will necessarily compare equal to the data that were
earlier written out to that stream only if: the data consist
only of printing characters and the control characters
horizontal tab and new-line; no new-line character is
immediately preceded by space characters; and the last
character is a new-line character. Whether space characters
that are written out immediately before a new-line character
appear when read in is implementation-defined.


So there seems to be no problem writing a character of any type to a
text stream, although conversion of some characters may take place
(CRLF of course). I think the contention of \0 perhaps being altered is
correct, but I don't think it's correct to call it "not text". Control
characters, of which \0 is one, can be written to text streams.

If we're talking about displays:

5.2.2 Character display semantics

[#1] The active position is that location on a display
device where the next character output by the fputc or
fputwc function would appear. The intent of writing a
printing character (as defined by the isprint or iswprint
function) to a display device is to display a graphic
representation of that character at the active position and
then advance the active position to the next position on the
current line. The direction of writing is locale-specific.
If the active position is at the final position of a line
(if there is one), the behavior is unspecified.

This is followed by a discussion of several control characters and
their defined behavior, but \0 is not one of them. I'm hesitant to
declare that writing \0 to a display device is undefined behavior. I'd
have to take it to those more expert in reading the standard than I am.




Brian (standards diving on a Friday afternoon)
 
James Kanze

James said:
[..writing '\0' to a text stream is implementation-defined..]
an
implementation can map the '\0' character to something else in a
text file, or even use some system API which treats the buffer
as a null terminated string. Text streams are for text, and
'\0' is not text.
Can you back this up with anything?

The standard. The only thing which has defined behavior when
output to a file opened in text mode are printable characters,
horizontal tab and new line. In addition, what happens with
trailing space in a line is not specified, and it's
implementation defined whether you're allowed to close a
non-empty file if the last character written was not a '\n'.

(The C++ standard defines file semantics by reference to the C
standard; this is in §7.9.2/2 of C90.)
Why is '\0' not text?

Because the standard says so.
I
cannot find any explicit definition of "text" in the Standard
that would say that '\0' is not text.

The standard doesn't define text. (At least I don't think it
does.) It defines the required semantics of a file opened in
text mode.
It's not part of the
basic character set, but that has nothing to do with "not text",
or does it?

I don't think so. The only possibly vague point is "printable
character"; I would interpret that to mean something for which
isprint() returns true, at least in some locale.

Note that most implementations actually do define a little bit
more :). In both Unix and Windows, most characters will
probably pass transparently, even in text mode (suppose "C"
locale, anyway). But there can be surprises: try writing a file
with 0x1A somewhere in the middle, then rereading it.
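
That experiment can be sketched as follows. The helper name `text_round_trip` and the file name `text_check.txt` are assumptions made for the example, and the outcome is deliberately not stated because it depends on the platform.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: writes s to a file in text mode, rereads the file
// in text mode, and returns whatever came back.
std::string text_round_trip(const std::string& s,
                            const char* path = "text_check.txt")
{
    {
        std::ofstream out(path);    // text mode is the default
        assert(out && "could not open file for writing");
        out << s;
    }   // leaving the scope closes the stream and flushes the data

    std::ifstream in(path);         // text mode again
    std::string back((std::istreambuf_iterator<char>(in)),
                     std::istreambuf_iterator<char>());
    in.close();
    std::remove(path);              // clean up the temporary file
    return back;
}
```

Try it with `std::string("abc\032xyz\n")`, where `'\032'` is 0x1A. Where text mode is transparent the result equals the input; elsewhere it may come back shorter, which is the kind of surprise meant above.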
 
James Kanze

I think it almost certainly can. I didn't really find much in the C++
standard on the topic, so I fell back to the C standard (C99 draft).
[#2] A text stream is an ordered sequence of characters
composed into lines, each line consisting of zero or more
characters plus a terminating new-line character. Whether
the last line requires a terminating new-line character is
implementation-defined. Characters may have to be added,
altered, or deleted on input and output to conform to
differing conventions for representing text in the host
environment. Thus, there need not be a one-to-one
correspondence between the characters in a stream and those
in the external representation. Data read in from a text
stream will necessarily compare equal to the data that were
earlier written out to that stream only if: the data consist
only of printing characters and the control characters
horizontal tab and new-line; no new-line character is
immediately preceded by space characters; and the last
character is a new-line character. Whether space characters
that are written out immediately before a new-line character
appear when read in is implementation-defined.
So there seems to be no problem writing a character of any type to a
text stream, although conversion of some characters may take place
(CRLF of course).

That's not what's written above. The standard explicitly
doesn't define any semantics for writing non-printable
characters other than new line and horizontal tab. And when the
standard doesn't define the semantics of something, or
specifically say that it is unspecified or implementation
defined, the behavior is undefined.
I think the contention of \0 perhaps being altered is
correct, but I don't think it's correct to call it "not text". Control
characters, of which \0 is one, can be written to text streams.

But the behavior becomes undefined when you do so.

In practice, I've never had problems with '\0'. But under
Windows, '\032' traditionally did some funny things. The
wording above was introduced specifically to allow such funny
things.
 
Victor Bazarov

James said:
James said:
[..writing '\0' to a text stream is implementation-defined..]
an
implementation can map the '\0' character to something else in a
text file, or even use some system API which treats the buffer
as a null terminated string. Text streams are for text, and
'\0' is not text.
Can you back this up with anything?

The standard. The only thing which has defined behavior when
output to a file opened in text mode are printable characters,
horizontal tab and new line. In addition, what happens with
trailing space in a line is not specified, and it's
implementation defined whether you're allowed to close a
non-empty file if the last character written was not a '\n'.

(The C++ standard defines file semantics by reference to the C
standard; this is in §7.9.2/2 of C90.)

I have to admit that I don't have C90 handy, could you *please*
quote it? Thanks! Also, *please* quote the part of the C++
Standard that says that 'ostream' for text output is governed by
the same rules as C streams. Thanks a bunch!
Because the standard says so.


The standard doesn't define text. (At least I don't think it
does.) It defines the required semantics of a file opened in
text mode.

Where?

V
 
