Mystery String

Jonathan Shan

Hello everyone,

What is this string?

{name = "172.16.0.240\000\000\000"}

name was declared using this:
char name[16];
Why do I count more than 16 elements?

Jonathan Shan
 
Frederick Gotham

Jonathan Shan posted:
Hello everyone,

What is this string?

{name = "172.16.0.240\000\000\000"}


This is effectively a series of null-terminated strings one after another
in memory. It could be constructed manually as follows:

char *Construct(void)
{
char const str1[] = "172.16.0.240";
char const str2[] = "00";
char const str3[] = "00";
char const str4[] = "00";

char static buf[sizeof str1 + sizeof str2 + sizeof str3 + sizeof str4];

char *p = buf;
char const *source;

for(source = str1; *p++ = *source++; );
for(source = str2; *p++ = *source++; );
for(source = str3; *p++ = *source++; );
for(source = str4; *p++ = *source++; );

return buf;
}

Why do I count more than 16 elements?


"sizeof" will tell you that you have 22 elements.

"strlen" will report a string length of 12.

It's to do with "null terminator".
 
Andrew Poelstra

Hello everyone,

What is this string?

{name = "172.16.0.240\000\000\000"}

Looks to be the same as "172.16.0.240" with a bunch of '0's and 0s
after the end.
name was declared using this:
char name[16];
Why do I count more than 16 elements?

Because the person who wrote the code was dropping acid at the time.
Where did you find this?
 
Jonathan Shan

Andrew said:
Because the person who wrote the code was dropping acid at the time.
Where did you find this?

Given a file called "new.txt". Inside the file is this:
172.16.0.240: alot of text here. more more more more more.....

Here is the code that exhibits the problem:

FILE *input;
input = fopen("new.txt", "r");
if (input == NULL)
{
printf("error in opening file \n");
exit(1);
}
char inputname[16];
fscanf(input, "%[^:]", inputname);

What I am trying to do is get exactly "172.16.0.240" to be inside the
inputname array.

Jonathan Shan
 
Christoph Schweers

Jonathan said:
Hello everyone,

What is this string?

{name = "172.16.0.240\000\000\000"}

name was declared using this:
char name[16];
Why do I count more than 16 elements?

Because you're wrong? ;-)
"\000" is the octal escape sequence for a single byte with the value 0.
 
Eric Sosman

Frederick Gotham wrote On 07/25/06 12:47,:
Jonathan Shan posted:

If this is used in a context like

char name[] = "172.16.0.240\000\000\000";

... it is equivalent to

char name[] = { '1', '7', '2', '.',
'1', '6', '.',
'0', '.',
'2', '4', '0',
0, 0, 0, /* explicit */
0 /* implied */
};
This is effectively a series of null-terminated strings one after another
in memory. It could be constructed manually as follows:

char *Construct(void)
{
char const str1[] = "172.16.0.240";
char const str2[] = "00";
char const str3[] = "00";
char const str4[] = "00";

No; each of these three should be initialized with "".
char static buf[sizeof str1 + sizeof str2 + sizeof str3 + sizeof str4];

char *p = buf;
char const *source;

for(source = str1; *p++ = *source++; );
for(source = str2; *p++ = *source++; );
for(source = str3; *p++ = *source++; );
for(source = str4; *p++ = *source++; );

return buf;
}


Why do I count more than 16 elements?



"sizeof" will tell you that you have 22 elements.

... if you're counting in base seven ;-) The string
literal specifies fifteen characters (nine digits, three
decimal points, and three zero bytes), with an additional
trailing zero byte donated by the compiler (assuming a
suitable context).
"strlen" will report a string length of 12.

It's to do with "null terminator".

... which can be spelled in several ways in a literal.
Here are some examples:

char a[] = "(\0)"; /* the usual form */
char b[] = "(\00)";
char c[] = "(\000)"; /* as in the example code */
char d[] = "(\x0)";
char e[] = "(\x00)";
char f[] = "(\x000000000000000000000000)";

/* all produce the same value as ... */
char g[] = { '(', 0, ')', 0 };
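
The counts above can be checked by letting the compiler do the counting. This is only a minimal sketch: the names `mystery`, `mystery_size`, `mystery_length`, and `spellings_agree` are made up for illustration and do not come from the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The literal from the question, initializing an array of unknown
   size so that the implied terminator is stored as well. */
static const char mystery[] = "172.16.0.240\000\000\000";

/* Element count: 12 visible characters, 3 explicit zero bytes,
   and 1 zero byte appended by the compiler. */
size_t mystery_size(void)
{
    return sizeof mystery;
}

/* strlen() stops at the first zero byte, so it sees only the text. */
size_t mystery_length(void)
{
    return strlen(mystery);
}

/* Returns 1 if several spellings of the zero byte yield the same
   four bytes as the explicit brace initializer. */
int spellings_agree(void)
{
    static const char a[] = "(\0)";
    static const char c[] = "(\000)";
    static const char d[] = "(\x0)";
    static const char g[] = { '(', 0, ')', 0 };

    assert(sizeof a == sizeof g);
    return sizeof c == sizeof g
        && sizeof d == sizeof g
        && memcmp(a, g, sizeof g) == 0
        && memcmp(c, g, sizeof g) == 0
        && memcmp(d, g, sizeof g) == 0;
}
```

Calling `mystery_size()` and `mystery_length()` from a test program gives the two numbers being argued about: the size of the array and the length of the string stored in it.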
 
Eric Sosman

Andrew Poelstra wrote On 07/25/06 12:47,:
Looks to be the same as "172.16.0.240" with a bunch of '0's and 0s
after the end.

Almost: there are no trailing '0' digits, but four
trailing zero bytes (three explicit, one implied in most
contexts). The escape sequence

\ octaldigit octaldigit octaldigit

in a string literal designates only one character.
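
An octal escape sequence consumes at most three octal digits, which is why `\000` is one byte and not three. A small sketch of that rule follows; the function names are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* "\000" is one escape sequence: one zero byte plus the terminator. */
size_t size_three_digits(void)
{
    return sizeof "\000";
}

/* In "\0000" the escape takes the first three digits; the fourth
   '0' is an ordinary digit character, followed by the terminator. */
size_t size_four_digits(void)
{
    return sizeof "\0000";
}

/* Returns the second byte of "\0000": the digit '0', not a zero byte. */
char fourth_digit(void)
{
    static const char lit[] = "\0000";

    assert(sizeof lit == 3);
    return lit[1];
}
```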
 
Lew Pitcher

Jonathan said:
Hello everyone,

What is this string?

{name = "172.16.0.240\000\000\000"}

It is a 12-character text followed by four \0 ("nul") characters.
name was declared using this:
char name[16];
Why do I count more than 16 elements?

Because you don't understand string escape sequences??

I count 15 explicitly specified elements,
'1', '7', '2', '.', '1', '6', '.', '0', '.', '2', '4', '0',
\000, \000, \000
and 1 implicit element (the \0 termination character for the string)

At least, that's the way I read it.

--
Lew Pitcher
 
Eric Sosman

Jonathan Shan wrote On 07/25/06 12:55,:
Andrew Poelstra wrote:

Because the person who wrote the code was dropping acid at the time.
Where did you find this?


Given a file called "new.txt". Inside the file is this:
172.16.0.240: alot of text here. more more more more more.....

Here is the code that exhibits the problem:

FILE *input;
input = fopen("new.txt", "r");
if (input == NULL)
{
printf("error in opening file \n");
exit(1);
}
char inputname[16];
fscanf(input, "%[^:]", inputname);

What I am trying to do is get exactly "172.16.0.240" to be inside the
inputname array.

What do you mean by "exactly?" Assuming no I/O errors,
inputname[] will contain the twelve characters you mentioned,
but will also contain a '\0' in the thirteenth position (at
inputname[12]) and indeterminate junk in the final three
positions (inputname[13], [14], [15]).

If you want your twelve characters and nothing else, you
are doomed: The array has sixteen elements, and there is no
way to make the final four just vanish somehow. If the tenant
moves out of apartment 11B, the room does not cease to exist.

However, if you intend to use string-oriented functions
like strlen() and printf("%s", inputname) and so on, the '\0'
byte is necessary and the three junk bytes are harmless. What
do you want to do with inputname after the code above?
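
One thing worth adding to the code as posted: `%[^:]` has no field width, so a line with more than fifteen characters before the colon would write past the end of the sixteen-element array. A hedged sketch of a bounded version follows. It uses `sscanf` on an in-memory line instead of `fscanf` on "new.txt" purely to stay self-contained, and `read_name` and `NAME_SIZE` are names invented for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_SIZE 16

/* Copies the text before the first ':' in line into out, storing at
   most NAME_SIZE - 1 characters plus the terminating zero byte.
   Returns 1 on success, 0 if nothing was converted. */
int read_name(const char *line, char out[NAME_SIZE])
{
    assert(line != NULL && out != NULL);

    /* The width 15 must be kept equal to NAME_SIZE - 1 by hand. */
    return sscanf(line, "%15[^:]", out) == 1;
}
```

With the line from "new.txt", `out` ends up holding the twelve characters followed by a '\0' at index 12, so `strlen(out)` is 12. The same format string works with `fscanf(input, ...)`; checking the return value is what tells you whether anything was actually stored.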
 
Andrew Poelstra

Andrew said:
Because the person who wrote the code was dropping acid at the time.
Where did you find this?

Given a file called "new.txt". Inside the file is this:
172.16.0.240: alot of text here. more more more more more.....

Here is the code that exhibits the problem:

FILE *input;
input = fopen("new.txt", "r");
if (input == NULL)
{
printf("error in opening file \n");
exit(1);
}
char inputname[16];
fscanf(input, "%[^:]", inputname);

What I am trying to do is get exactly "172.16.0.240" to be inside the
inputname array.

Where did you get "172.16.0.240\000\000" from?
 
Andrew Poelstra

Eric Sosman said:

Almost: there are no trailing '0' digits, but four
trailing zero bytes (three explicit, one implied in most
contexts). The escape sequence

\ octaldigit octaldigit octaldigit

in a string literal designates only one character.

Now I feel stupid. :)

"172.16.0.240\000\000\000" is indeed 16 bytes, and so there aren't any
problems. The OP (and myself and a few others) counted wrong.
 
spibou

Jonathan said:
Given a file called "new.txt". Inside the file is this:
172.16.0.240: alot of text here. more more more more more.....

Here is the code that exhibits the problem:

FILE *input;
input = fopen("new.txt", "r");
if (input == NULL)
{
printf("error in opening file \n");
exit(1);
}
char inputname[16];
fscanf(input, "%[^:]", inputname);

What I am trying to do is get exactly "172.16.0.240" to be inside the
inputname array.

You have probably realized by now that there is really no problem.
As Eric Sosman pointed out, what you do get is a string which,
for all practical purposes, contains "exactly" 172.16.0.240.

The only thing that I find somewhat surprising is that the trailing
bytes of the string are all 0.

Spiros Bousbouras
 
Al Balmer

Jonathan said:
Given a file called "new.txt". Inside the file is this:
172.16.0.240: alot of text here. more more more more more.....

Here is the code that exhibits the problem:

FILE *input;
input = fopen("new.txt", "r");
if (input == NULL)
{
printf("error in opening file \n");
exit(1);
}
char inputname[16];
fscanf(input, "%[^:]", inputname);

What I am trying to do is get exactly "172.16.0.240" to be inside the
inputname array.

You have probably realized by now that there is really no problem.
As Eric Sosman pointed out, what you do get is a string which,
for all practical purposes, contains "exactly" 172.16.0.240.

The only thing that I find somewhat surprising is that the trailing
bytes of the string are all 0.

Not surprising. The program was probably loaded into cleared memory. I
wouldn't count on it happening on the next iteration :)
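
Whether the bytes after the terminator are zero depends entirely on what the array held beforehand. A rough sketch that makes this visible by pre-filling the array with 'X' before scanning is shown here. The function name `byte_after_scan` is hypothetical, `sscanf` stands in for `fscanf`, and the result assumes a typical implementation that stores only the matched characters and the terminator.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fills a 16-byte array with 'X', scans the address into it, and
   returns the byte found at the given index afterwards.
   Returns '?' if the scan itself fails. */
char byte_after_scan(size_t index)
{
    char inputname[16];

    assert(index < sizeof inputname);
    memset(inputname, 'X', sizeof inputname);

    if (sscanf("172.16.0.240: more text", "%15[^:]", inputname) != 1)
        return '?';

    return inputname[index];
}
```

Index 12 holds the terminating zero byte written by the scan; indexes 13 to 15 still hold whatever was there before, which here is 'X' rather than 0.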
 
Rod Pemberton

Eric Sosman said:
Andrew Poelstra wrote On 07/25/06 12:47,:

Almost: there are no trailing '0' digits, but four
trailing zero bytes (three explicit, one implied in most
contexts). The escape sequence

What do you mean by "one implied in most contexts"? I'm interested in the
contexts where there wouldn't be an implicit zero byte. C90? C99? I think
it'd be pretty hard for a compiler to implement string literals without a
terminating nul... The nuls in the string are embedded nuls, so I don't
think the compiler could legally get away with ignoring the implicit nul.


from N1124:
6.4.5 String literals
"5 In translation phase 7, a byte or code of value zero is appended to each
multibyte character sequence that results from a string literal or
literals.66)"
....
"66) A character string literal need not be a string (see 7.1.1), because a
null character may be embedded in it by a \0 escape sequence."



Mystery string indeed...


Rod Pemberton
 
Eric Sosman

Rod said:
What do you mean by "one implied in most contexts"? I'm interested in the
contexts where there wouldn't be an implicit zero byte. C90? C99? I think
it'd be pretty hard for a compiler to implement string literals without a
terminating nul... The nuls in the string are embedded nuls, so I don't
think the compiler could legally get away with ignoring the implicit nul.

char name[15] = "172.16.0.240\000\000\000";

... has only the three explicit trailing zero bytes, not an
implicit fourth. C99 section 6.7.8 paragraph 14; note the
phrase "if there is room."
 
Keith Thompson

Eric Sosman said:
Rod Pemberton wrote: [...]
What do you mean by "one implied in most contexts"? I'm interested in the
contexts where there wouldn't be an implicit zero byte. C90? C99? I think
it'd be pretty hard for a compiler to implement string literals without a
terminating nul... The nuls in the string are embedded nuls, so I don't
think the compiler could legally get away with ignoring the implicit nul.

char name[15] = "172.16.0.240\000\000\000";

... has only the three explicit trailing zero bytes, not an
implicit fourth. C99 section 6.7.8 paragraph 14; note the
phrase "if there is room."

Or, less confusingly:

char s[3] = "abc";
 
Rod Pemberton

Keith Thompson said:
Eric Sosman said:
Rod Pemberton wrote: [...]
What do you mean by "one implied in most contexts"? I'm interested in the
contexts where there wouldn't be an implicit zero byte. C90? C99? I think
it'd be pretty hard for a compiler to implement string literals without a
terminating nul... The nuls in the string are embedded nuls, so I don't
think the compiler could legally get away with ignoring the implicit nul.

char name[15] = "172.16.0.240\000\000\000";

... has only the three explicit trailing zero bytes, not an
implicit fourth. C99 section 6.7.8 paragraph 14; note the
phrase "if there is room."

Or, less confusingly:

char s[3] = "abc";

So, the answer to my question is: none in any context. With the specific
example, there is never "one implied in most contexts"...


Rod Pemberton
 
Keith Thompson

Rod Pemberton said:
Keith Thompson said:
Eric Sosman said:
Rod Pemberton wrote: [...]
What do you mean by "one implied in most contexts"? I'm interested in the
contexts where there wouldn't be an implicit zero byte. C90? C99? I think
it'd be pretty hard for a compiler to implement string literals without a
terminating nul... The nuls in the string are embedded nuls, so I don't
think the compiler could legally get away with ignoring the implicit nul.

char name[15] = "172.16.0.240\000\000\000";

... has only the three explicit trailing zero bytes, not an
implicit fourth. C99 section 6.7.8 paragraph 14; note the
phrase "if there is room."

Or, less confusingly:

char s[3] = "abc";

So, the answer to my question is: none in any context. With the specific
example, there is never "one implied in most contexts"...

Presumably the phrase "most contexts" refers to contexts beyond the
specific example.

See C99 6.7.8p14:

An array of character type may be initialized by a character
string literal, optionally enclosed in braces. Successive
characters of the character string literal (including the
terminating null character if there is room or if the array is of
unknown size) initialize the elements of the array.

So the implicit '\0' is (theoretically) always there in the string
literal itself, but it won't necessarily be copied to an initialized
object. Of course in a case like

char s[3] = "abc";

the compiler is free to omit the '\0' from the generated code.
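
The "if there is room" rule can be observed directly. The sketch below is only an illustration with made-up names (`s3`, `s4`, `name15`, and the helper functions). It is valid C, although C++ rejects an initializer that leaves no room for the terminator, and some compilers can be asked to warn about it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Exactly three elements: no room, so no terminator is stored.
   The array holds 'a', 'b', 'c' and is not a string. */
static const char s3[3] = "abc";

/* One more element: the terminator fits and is stored. */
static const char s4[4] = "abc";

/* The example from the thread: 12 characters plus the three
   explicit zero bytes fill all 15 elements. */
static const char name15[15] = "172.16.0.240\000\000\000";

size_t size_s3(void)     { return sizeof s3; }
size_t size_name15(void) { return sizeof name15; }

/* Returns 1 if s3 contains no zero byte anywhere in its 3 elements. */
int s3_is_unterminated(void)
{
    return memchr(s3, 0, sizeof s3) == NULL;
}

/* Returns 1 if the fourth element of s4 is the stored terminator. */
int s4_is_terminated(void)
{
    assert(sizeof s4 == 4);
    return s4[3] == '\0';
}

/* name15 is still usable as a string, because the first explicit
   \000 at index 12 terminates it. */
size_t name15_length(void)
{
    return strlen(name15);
}
```

Note that `strlen(s3)` must not be called: without a zero byte inside the array, the function would read past its end.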
 
Eric Sosman

Rod said:
Eric Sosman said:
Rod Pemberton wrote:
[...]

What do you mean by "one implied in most contexts"? I'm interested in the
contexts where there wouldn't be an implicit zero byte. C90? C99? I think
it'd be pretty hard for a compiler to implement string literals without a
terminating nul... The nuls in the string are embedded nuls, so I don't
think the compiler could legally get away with ignoring the implicit nul.

char name[15] = "172.16.0.240\000\000\000";

... has only the three explicit trailing zero bytes, not an
implicit fourth. C99 section 6.7.8 paragraph 14; note the
phrase "if there is room."

Or, less confusingly:

char s[3] = "abc";


So, the answer to my question is: none in any context. With the specific
example, there is never "one implied in most contexts"...

Aha! I think I see where we've failed to communicate.
I wrote that the confusing string had "four trailing zero
bytes (three explicit, one implied in most contexts)." You
may have thought this meant there was a one-valued (or maybe
'1'-valued) byte implied in most contexts, but what I meant
was

- There are four zero-valued trailing bytes.

- Three of those four are explicit in the string literal,
each denoted by an escape sequence \000.

- One more zero-valued byte, the fourth, is implicit,
appended by the compiler in most contexts in which a
string literal appears.

("Most" because there is a situation in which the compiler
does not supply a fourth zero, namely, when the literal is used
as an initializer for an array of known size that doesn't have
enough room for the implied zero. There may be a metaphysical
question about whether the literal does or does not possess that
final zero, which is then elided during compilation and has no
visible "image" in the run-time code; I don't do metaphysics.)

So: When I wrote "one" I was not referring to a one-valued
byte; I was referring to "one" of the four zero-valued bytes.
Chalk it up to the ambiguities of natural language.
 