substring

Al Bowers

pete said:

In that case, s1 satisfies the definition of "pointer to a string"

N869

7.1.1 Definitions of terms

[#1] A string is a contiguous sequence of characters
terminated by and including the first null character. The
term multibyte string is sometimes used instead to emphasize
special processing given to multibyte characters contained
in the string or to avoid confusion with a wide string. A
pointer to a string is a pointer to its initial (lowest
addressed) character.

No.
Consider the declaration and initialization.
char s1[3] = "123";

You have declared an array of 3 characters and assigned the characters
'1','2','3' to this array. s1[0] has the value '1'. s1[1] has the
value '2'. s1[2] has the value '3'.

Where in this character array is there a contiguous sequence of
characters terminated by and including the first null character?

Answer: There is no null character in the array, thus the array
does not represent a string.
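
To make the point above concrete, here is a minimal sketch that checks both declarations for a null character without reading past either array. The names has_terminator and demo_terminators are hypothetical helpers invented for this illustration, not anything from the standard.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the first n bytes at buf include a null character, else 0.
   has_terminator is a hypothetical helper name, not a standard function. */
int has_terminator(const char *buf, size_t n)
{
    return memchr(buf, '\0', n) != NULL;
}

/* Checks the two declarations discussed in this thread. */
int demo_terminators(void)
{
    char s1[3] = "123";   /* exactly fills the array: no room for '\0' */
    char s2[4] = "456";   /* the fourth element holds the '\0' */

    assert(sizeof s1 == 3 && s1[2] == '3');
    /* 1 when s1 lacks a terminator and s2 has one */
    return !has_terminator(s1, sizeof s1) && has_terminator(s2, sizeof s2);
}
```

Because memchr is given the exact size of each array, the check stays inside the object it is inspecting.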
 
pete

Al said:
In that case, s1 satisfies the definition of "pointer to a string"

N869

7.1.1 Definitions of terms

[#1] A string is a contiguous sequence of characters
terminated by and including the first null character. The
term multibyte string is sometimes used instead to emphasize
special processing given to multibyte characters contained
in the string or to avoid confusion with a wide string. A
pointer to a string is a pointer to its initial (lowest
addressed) character.

No.
Consider the declaration and initialization.
char s1[3] = "123";

You have declared an array of 3 characters and assigned the characters
'1','2','3' to this array. s1[0] has the value '1'. s1[1] has the
value '2'. s1[2] has the value '3'.

Where in this character array is there a contiguous sequence of
characters terminated by and including the first null character?

Nowhere in the array.
However, since strings are not confined to arrays,
what difference does it make?
Answer: There is no null character in the array, thus the array
does not represent a string.

That's been my point all along.
But you snipped the relevant part of the post,
which describes the specific case under discussion.
Specifically, we're discussing the case where the program
has determined whether or not s1 and s2 are contiguous,
and only the case where they are contiguous.

char s1[3] = "123";
char s2[4] = "456";

if (s2 == s1 + sizeof s1) {
    puts(s1);
}
 
Dan Pop

pete said:
Specifically, we're discussing the case where the program
has determined whether or not s1 and s2 are contiguous,
and only the case where they are contiguous.

char s1[3] = "123";
char s2[4] = "456";

if (s2 == s1 + sizeof s1) {
puts(s1);
}

I entirely agree that, according to the wording of the standard, this
code snippet is correct. There is no consensus in comp.std.c on whether
this is the intent of the standard or not, but the actual wording is
unambiguous. However, if you replace the puts call by a printf call:

printf("%s\n", s1);

you're right into undefined behaviour, because:

s The argument shall be a pointer to an array of character type.
Characters from the array are written up to (but not including) a
terminating null character; if the precision is specified, no more
than that many characters are written. If the precision is not
specified or is greater than the size of the array, the array shall
contain a null character.

Dan
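
The quoted %s wording also covers the case where a precision is given: if the precision does not exceed the size of the array, the array does not need to contain a null character. A minimal sketch of that case follows; format_s1 is a hypothetical name, and snprintf is used here instead of printf only so the result can be inspected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Formats the three characters of an unterminated array into out.
   format_s1 is a hypothetical name used only for this sketch. */
int format_s1(char *out, size_t outsize)
{
    char s1[3] = "123";                 /* no null character in s1 */

    assert(out != NULL && outsize > sizeof s1);
    /* The precision (3) does not exceed the array size, so %s writes at
       most three characters and never needs to find a terminator. */
    return snprintf(out, outsize, "%.*s", (int)sizeof s1, s1);
}
```

The cast to int is needed because the `*` precision argument must have type int.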
 
rihad

Dan Pop said:
Specifically, we're discussing the case where the program
has determined whether or not s1 and s2 are contiguous,
and only the case where they are contiguous.

char s1[3] = "123";
char s2[4] = "456";

if (s2 == s1 + sizeof s1) {
puts(s1);
}

I entirely agree that, according to the wording of the standard, this
code snippet is correct. There is no consensus in comp.std.c on whether
this is the intent of the standard or not, but the actual wording is
unambiguous. However, if you replace the puts call by a printf call:

printf("%s\n", s1);

you're right into undefined behaviour, because:

s The argument shall be a pointer to an array of character type.
Characters from the array are written up to (but not including) a
terminating null character; if the precision is specified, no more
than that many characters are written. If the precision is not
specified or is greater than the size of the array, the array shall
contain a null character.

How is it different from the puts() call above? Surely, "array" means a
consecutive "string" of characters, not a C array, otherwise, char *s1 coming
from a successful calloc() would cause undefined behaviour right away when fed
to printf().
 
Mark McIntyre

Nowhere in the array.
However, since strings are not confined to arrays,
what difference does it make?

It makes the difference that puts requires a string, and s1 is not a
string since it has no null terminator. So when puts is putting chars
to stdout, it will read from memory beyond the region allocated to s1,
and this is disallowed.

The fact that somewhere in memory nearby there is a null doesn't mean
that magically s1 becomes a string. It merely means that puts will by
good luck stop sending data to stdout.
That's been my point all along.

I'm confused. Are you agreeing that this is UB or not?
 
Thomas Stegen

Mark said:
I'm confused. Are you agreeing that this is UB or not?

I think it is quite clear that this is not as clear as one might
wish.

Have a closer look at the examples given and note that s1 and
s2 do indeed constitute a string if they happen to be
contiguous in memory. In particular note that the standard
does not include the mention of the word array when it defines
the term string.
 
CBFalconer

rihad said:
pete said:
Specifically, we're discussing the case where the program
has determined whether or not s1 and s2 are contiguous,
and only the case where they are contiguous.

char s1[3] = "123";
char s2[4] = "456";

if (s2 == s1 + sizeof s1) {
puts(s1);
}

I entirely agree that, according to the wording of the standard,
this code snippet is correct. There is no consensus in
comp.std.c on whether this is the intent of the standard or not,
but the actual wording is unambiguous. However, if you replace
the puts call by a printf call:

printf("%s\n", s1);

you're right into undefined behaviour, because:

s The argument shall be a pointer to an array of character type.
Characters from the array are written up to (but not including)
a terminating null character; if the precision is specified, no
more than that many characters are written. If the precision
is not specified or is greater than the size of the array, the
array shall contain a null character.

How is it different from the puts() call above? Surely, "array" means
a consecutive "string" of characters, not a C array, otherwise, char
*s1 coming from a successful calloc() would cause undefined behaviour
right away when fed to printf().

Very simply. The first checks the contiguity of the two arrays
(not guaranteed) before using the first as a string. The second
simply uses it as a string. The expression "s1 + sizeof s1" is
specifically valid because it points one beyond the actual array.
The expression s2 is valid by definition. You can replace the
puts() call with the printf() call (and vice-versa) without
altering the validity/invalidity of the two fragments.
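
As a sketch of the contiguity test itself: forming the one-past-the-end pointer and comparing it for equality is allowed, but the result depends on how the implementation happens to lay out the two arrays. The helper names starts_right_after and demo_adjacency are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if b starts exactly where a (of a_size bytes) ends, else 0.
   starts_right_after is a hypothetical helper name. */
int starts_right_after(const char *a, size_t a_size, const char *b)
{
    assert(a != NULL && b != NULL);
    /* a + a_size is the one-past-the-end pointer: it may be computed and
       compared, but it is never dereferenced here. */
    return b == a + a_size;
}

/* Applies the test to the two arrays from the thread. */
int demo_adjacency(void)
{
    char s1[3] = "123";
    char s2[4] = "456";

    /* Implementation-specific result: nothing guarantees the layout. */
    return starts_right_after(s1, sizeof s1, s2);
}
```

The function only reports adjacency. Whether anything may then be read through s1 past its third element is exactly what the thread is arguing about.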
 
Robert Stankowic

Dan Pop said:
pete said:
Specifically, we're discussing the case where the program
has determined whether or not s1 and s2 are contiguous,
and only the case where they are contiguous.

char s1[3] = "123";
char s2[4] = "456";

if (s2 == s1 + sizeof s1) {
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Just curious - if I don't misunderstand 6.5.9 paragraph 6 (in N869) the above
statement is correct, and if the expression "s2 == s1 + sizeof s1" yields
true, s1 and s2 form a contiguous sequence of 7 char objects which contains
a '\0' in the last position.
Isn't the resulting object in this case the same (except for scope) as if we
wrote
char *s1 = malloc(7);
if (s1)
{
    strcpy(s1, "123456");
}
Here we also did not explicitly define an array, but we definitely created
a string.
Do you know any possibility for an implementation to produce different
results (provided the expression "s2 == s1 + sizeof s1" yields true)?

Robert
 
Al Bowers

Dan said:
Specifically, we're discussing the case where the program
has determined whether or not s1 and s2 are contiguous,
and only the case where they are contiguous.

char s1[3] = "123";
char s2[4] = "456";

if (s2 == s1 + sizeof s1) {
puts(s1);
}


I entirely agree that, according to the wording of the standard, this
code snippet is correct. There is no consensus in comp.std.c on whether
this is the intent of the standard or not, but the actual wording is
unambiguous. However, if you replace the puts call by a printf call:

printf("%s\n", s1);

you're right into undefined behaviour, because:

s The argument shall be a pointer to an array of character type.
Characters from the array are written up to (but not including) a
terminating null character; if the precision is specified, no more
than that many characters are written. If the precision is not
specified or is greater than the size of the array, the array shall
contain a null character.

What about some of the string handling functions?

char buf[32];
char s1[3] = "123";
char s2[4] = "456";

if (s2 == s1 + sizeof s1) {
    strcpy(buf, s1);
}

In section:
7.21.1 String function conventions
You have:
If an array is accessed beyond the end of an object, the behavior is
undefined.
 
Dan Pop

rihad said:
Dan Pop said:
Specifically, we're discussing the case where the program
has determined whether or not s1 and s2 are contiguous,
and only the case where they are contiguous.

char s1[3] = "123";
char s2[4] = "456";

if (s2 == s1 + sizeof s1) {
puts(s1);
}

I entirely agree that, according to the wording of the standard, this
code snippet is correct. There is no consensus in comp.std.c on whether
this is the intent of the standard or not, but the actual wording is
unambiguous. However, if you replace the puts call by a printf call:

printf("%s\n", s1);

you're right into undefined behaviour, because:

s The argument shall be a pointer to an array of character type.
Characters from the array are written up to (but not including) a
terminating null character; if the precision is specified, no more
than that many characters are written. If the precision is not
specified or is greater than the size of the array, the array shall
contain a null character.

How is it different from the puts() call above?

puts doesn't expect an array, it merely expects a string.

The puts function writes the string pointed to by s to the stream
pointed to by stdout, and appends a new-line character to the output.
The terminating null character is not written.
Surely, "array" means a
consecutive "string" of characters, not a C array, otherwise, char *s1 coming

Array means whatever the standard defines as an array:

* An array type describes a contiguously allocated set of objects
with a particular member object type, called the element type. Array
types are characterized by their element type and by the number of
members of the array.
from a successful calloc() would cause undefined behaviour right away when fed
to printf().

Wrong:

The calloc function allocates space for an array of nmemb objects,
^^^^^
each of whose size is size.

Dan
 
Dan Pop

Dan Pop said:
pete said:
Specifically, we're discussing the case where the program
has determined whether or not s1 and s2 are contiguous,
and only the case where they are contiguous.

char s1[3] = "123";
char s2[4] = "456";

if (s2 == s1 + sizeof s1) {
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Just curious - if I don't misunderstand 6.5.9 paragraph 6 (in N869) the above
statement is correct, and if the expression "s2 == s1 + sizeof s1" yields
true, s1 and s2 form a contiguous sequence of 7 char objects which contains
a '\0' in the last position.
Isn't the resulting object in this case the same (except for scope) as if we
wrote
char *s1 = malloc(7);
if (s1)
{
    strcpy(s1, "123456");
}
Here we also did not explicitly define an array, but we definitely created
a string.

The malloc call has created a *single* object that can be treated as an
array of 7 char, while the definitions of s1 and s2 create two different
objects that cannot be treated as a single object, even if adjacent.
Do you know any possibility for an implementation to produce different
results (provided the expression "s2 == s1 + sizeof s1" yields true)?

Of course. Any implementation doing array bound checking *properly*
should object in the s1/s2 case. The big challenge of such an
implementation is NOT to object to the puts call.

Dan
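
If the goal is simply a string that lives in one object, so that no bounds checker has anything to object to, one option is to copy both arrays into a single buffer. This is only a sketch under that assumption; join_parts is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Copies s1 and s2 into one 7-char array, so the result is a string
   contained in a single object. join_parts is a hypothetical name. */
void join_parts(char out[7])
{
    char s1[3] = "123";
    char s2[4] = "456";

    memcpy(out, s1, sizeof s1);               /* '1' '2' '3' */
    memcpy(out + sizeof s1, s2, sizeof s2);   /* '4' '5' '6' '\0' */
    assert(out[6] == '\0');
}
```

Each memcpy reads exactly sizeof of its source array, so neither access leaves the object it started in, regardless of where s1 and s2 are placed.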
 
Dan Pop

Dan said:
Specifically, we're discussing the case where the program
has determined whether or not s1 and s2 are contiguous,
and only the case where they are contiguous.

char s1[3] = "123";
char s2[4] = "456";

if (s2 == s1 + sizeof s1) {
puts(s1);
}


I entirely agree that, according to the wording of the standard, this
code snippet is correct. There is no consensus in comp.std.c on whether
this is the intent of the standard or not, but the actual wording is
unambiguous. However, if you replace the puts call by a printf call:

printf("%s\n", s1);

you're right into undefined behaviour, because:

s The argument shall be a pointer to an array of character type.
Characters from the array are written up to (but not including) a
terminating null character; if the precision is specified, no more
than that many characters are written. If the precision is not
specified or is greater than the size of the array, the array shall
contain a null character.

What about some of the string handling functions?

char buf[32];
char s1[3] = "123";
char s2[4] = "456";

if (s2 == s1 + sizeof s1) {
strcpy(buf,s1);
}

In section:
7.21.1 String function conventions
You have:
If an array is accessed beyond the end of an object, the behavior is
undefined.

There is an unfortunate conflict between your identifiers and the
parameter names used by the standard in the description of strcpy.
I'm using s1 and s2 with their meanings in the C standard, below.

The strcpy function copies the string pointed to by s2 (including
^^^^^^^^^^
the terminating null character) into the array pointed to by s1.
^^^^^^^^^

Still no problem, since only the s1 argument is supposed to point to an
array. And buf is large enough to hold a 6 character string.

Dan
 
rihad

(By a "C array" I meant char a[] = "hello"; printf("%s\n", a);)
Array means whatever the standard defines as an array:

* An array type describes a contiguously allocated set of objects
with a particular member object type, called the element type. Array
types are characterized by their element type and by the number of
members of the array.

Then given char *p = calloc(1, 1); p points to an array of one char (barring
nomem)? And given char c = 0; &c points to an array of one char? Sorry, but I
read the above as if int i; meant array of 1 int.
Wrong:

The calloc function allocates space for an array of nmemb objects,
^^^^^
each of whose size is size.

Sorry, but I *really* fail to understand why substituting the puts(s1); call
below with printf("%s\n", s1); suddenly invokes undefined behaviour, as you have
pointed out. &s1[0] points to an array of objects. The array is ended by a
((char) 0). Nowhere in the range of [ (s1 + 0) .. (s1 + sizeof s1 + sizeof s2) )
is an uninitialized object being accessed for reading (assuming the if holds
true, which is just a compile time constant IIRC).

char s1[3] = "123";
char s2[4] = "456";

if (s2 == s1 + sizeof s1) {
puts(s1);
}
 
Al Bowers

Dan said:
Dan said:
Specifically, we're discussing the case where the program
has determined whether or not s1 and s2 are contiguous,
and only the case where they are contiguous.

char s1[3] = "123";
char s2[4] = "456";

if (s2 == s1 + sizeof s1) {
puts(s1);
}


I entirely agree that, according to the wording of the standard, this
code snippet is correct. There is no consensus in comp.std.c on whether
this is the intent of the standard or not, but the actual wording is
unambiguous. However, if you replace the puts call by a printf call:

printf("%s\n", s1);

you're right into undefined behaviour, because:

s The argument shall be a pointer to an array of character type.
Characters from the array are written up to (but not including) a
terminating null character; if the precision is specified, no more
than that many characters are written. If the precision is not
specified or is greater than the size of the array, the array shall
contain a null character.

What about some of the string handling functions?

char buf[32];
char s1[3] = "123";
char s2[4] = "456";

if (s2 == s1 + sizeof s1) {
strcpy(buf,s1);
}

In section:
7.21.1 String function conventions
You have:
If an array is accessed beyond the end of an object, the behavior is
undefined.


There is an unfortunate conflict between your identifiers and the
parameter names used by the standard in the description of strcpy.
I'm using s1 and s2 with their meanings in the C standard, below.

The strcpy function copies the string pointed to by s2 (including
^^^^^^^^^^
the terminating null character) into the array pointed to by s1.
^^^^^^^^^

Still no problem, since only the s1 argument is supposed to point to an
array. And buf is large enough to hold a 6 character string.

Dan

I agree.
What about?

char buf[32];
char s1[3] = "123";
char s2[4] = "456";

if (s2 == s1 + sizeof s1) {
    strcpy(s1, "Hello");
}
Assuming equality (the equality expression yields 1).
 
rihad

rihad said:
&s1[0] points to an array of objects.
The array is ended by a ((char) 0).

The array is terminated by a ((char)'3')
char s1[3] = "123";

The array of objects terminated by a ((char) 0), not s1.
char s1[3] = "123";
char s2[4] = "456";


Given this:

char s[] = "123456", (*p3)[3] = &s;

is calling

printf("%s\n", p3[0]);

illegal, but

printf("%s\n", p3[1]);

is legal?

I'm pretty sure they are both legal, because nowhere is unowned/uninitialized
memory being accessed. Then why can't we assume that in the case of

char s1[3] = "123";
char s2[4] = "456";

and assert(s2 == s1 + sizeof s1);

there's some virtual object s that spans the two objects s1 and s2 and that
object s constitutes a valid C string?
 
Dan Pop

rihad said:
(By a "C array" I meant char a[] = "hello"; printf("%s\n", a);)

C array is *everything* the standard defines as such.
Then given char *p = calloc(1, 1); p points to an array of one char (barring
nomem)?

Yes, in common parlance. A pedant would say that p points to the first
character of an array of one char. However, given the type of p, there
is no place for confusion if one simply uses your wording.
And given char c = 0; &c points to an array of one char?
Yes.

Sorry, but I read the above as if int i; meant array of 1 int.

This is correct, too.

7 For the purposes of these operators, a pointer to an object that
is not an element of an array behaves the same as a pointer to
the first element of an array of length one with the type of
the object as its element type.
Wrong:

The calloc function allocates space for an array of nmemb objects,
^^^^^
each of whose size is size.

Sorry, but I *really* fail to understand why substituting the puts(s1); call
below with printf("%s\n", s1); suddenly invokes undefined behaviour, as you have
pointed out. &s1[0] points to an array of objects. The array is ended by a
((char) 0).

Nope. The array is ended by a character that is NOT a null character.
It is *only* the s2 array that ends with a null character.
Nowhere in the range of [ (s1 + 0) .. (s1 + sizeof s1 + sizeof s2) )
is an uninitialized object being accessed for reading (assuming the if holds
true, which is just a compile time constant IIRC).

It doesn't matter. The standard clearly states that %s expects an array
containing a null character. There is no such character in the s1 array,
therefore the printf call invokes undefined behaviour. It's as simple as
that, whether you get it or not.

An implementation doing array bounds checking *can* detect that the end
of the array has been reached without encountering any null character.
At this point, the implementation is free to do anything it wants,
including making demons fly out of your nose.
char s1[3] = "123";
char s2[4] = "456";

if (s2 == s1 + sizeof s1) {
puts(s1);
}

OTOH, this is fine because puts() does NOT expect an array. It expects
a sequence of characters terminated by a null character and it does not
care about how this sequence of characters is allocated. Reread the
definition of "string".

Dan
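
The "array of length one" rule quoted above can be sketched with a lone char object. In this illustration, lone_char_length is a hypothetical name; the single object holds the null character, so it is by itself a string of length zero.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the length of the string that starts at a lone char object. */
size_t lone_char_length(void)
{
    char c = 0;          /* a single object, not declared as an array */
    char *p = &c;        /* behaves like a pointer into a char[1] */
    char *end = p + 1;   /* one past the end: valid to form and compare */

    assert(end - p == 1);
    return strlen(p);    /* reads only c, which is the null character */
}
```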
 
Dan Pop

Al said:
What about?

char buf[32];
char s1[3] = "123";
char s2[4] = "456";

if (s2 == s1 + sizeof s1) {
    strcpy(s1, "Hello");
}
Assuming equality (the equality expression yields 1).

An obvious case of undefined behaviour: you're writing beyond the end of
the s1 array. A bounds checking implementation is not supposed to be
impressed by the fact that s2 == s1 + sizeof s1.

Dan
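
One way to avoid writing beyond the end of the destination is to check the required size before copying. The following is a minimal sketch, not a standard facility; copy_if_fits is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies src into dst only if it fits, terminator included.
   Returns 1 on success, 0 if dst is too small (dst is left unchanged).
   copy_if_fits is a hypothetical helper, not a standard function. */
int copy_if_fits(char *dst, size_t dst_size, const char *src)
{
    size_t needed;

    assert(dst != NULL && src != NULL);
    needed = strlen(src) + 1;        /* characters plus the '\0' */
    if (needed > dst_size)
        return 0;
    memcpy(dst, src, needed);
    return 1;
}
```

With the s1 from the thread, copy_if_fits(s1, sizeof s1, "Hello") refuses the copy, because "Hello" needs 6 bytes and s1 has 3.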
 
Thomas Stegen

rihad said:
rihad wrote:

&s1[0] points to an array of objects.
The array is ended by a ((char) 0).

The array is terminated by a ((char)'3')

char s1[3] = "123";


The array of objects terminated by a ((char) 0), not s1.

This makes no sense...
char s1[3] = "123";
char s2[4] = "456";


Given this:

char s[] = "123456", (*p3)[3] = &s;

Incompatible pointer types. &s is of type char (*)[7].
is calling

printf("%s\n", p3[0]);

Illegal, %s expects a pointer to char not a pointer to
char[3]
illegal, but

printf("%s\n", p3[1]);

Same here.
is legal?

I'm pretty sure they are both legal, because nowhere is unowned/uninitialized
memory being accessed. Then why can't we assume that in the case of

Neither is legal.
char s1[3] = "123";
char s2[4] = "456";

and assert(s2 == s1 + sizeof s1);

In this case s1 and s2 form a valid string. printf expects a null
terminated array. A null terminated array is always a string, but
a string is not necessarily a null terminated array.
there's some virtual object s that spans the two objects s1 and s2 and that
object s constitutes a valid C string?

The definition of a string in C never mentions object nor array. Just
a contiguous sequence of chars of which the last one is 0.

puts and printf are different because puts prints a string, while
printf explicitly takes a null terminated array.
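
On the pointer-type point, here is a sketch of a declaration whose types do match. It assumes the pointer is declared as char (*)[7], since "123456" occupies seven chars including the terminator. The names p7 and length_via_array_pointer are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the string length seen through a correctly typed array pointer. */
size_t length_via_array_pointer(void)
{
    char s[] = "123456";     /* 6 characters plus '\0': type char[7] */
    char (*p7)[7] = &s;      /* matches the type of &s exactly */

    assert(sizeof s == 7 && sizeof *p7 == 7);
    /* *p7 has type char[7] and converts to char * when passed on,
       so it points at the first character of the whole string. */
    return strlen(*p7);
}
```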
 
Jeremy Yallop

Thomas said:
puts and printf are different because puts prints a string, while
printf explicitly takes a null terminated array.

I don't believe this is true. Consider the following text from C99
7.1.4 ("Use of library functions"):

If a function argument is described as being an array, the pointer
actually passed to the function shall have a value such that all
address computations and accesses to objects (that would be valid if
the pointer did point to the first element of such an array) are in
fact valid.

In the library section of the standard the word "array" is just a
convenient shorthand to denote array-like objects (including the
object returned from malloc(), for example). You can't draw any
conclusions from the fact that the description of fprintf() uses the
word "array" to describe the pointer-to-string passed as argument and
the description of puts() doesn't.

Jeremy.
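
As an illustration of an "array-like object" that was never declared as an array, the sketch below passes allocated storage to %s. The name format_from_heap is hypothetical, and snprintf is used instead of printf only so the result can be inspected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Formats a string held in malloc'd storage with %s.
   Returns the number of characters written, or -1 if allocation fails. */
int format_from_heap(char *out, size_t outsize)
{
    char *p = malloc(4);     /* allocated object, never declared as an array */
    int n;

    assert(out != NULL);
    if (p == NULL)
        return -1;
    strcpy(p, "abc");        /* 3 characters plus '\0' fit in 4 bytes */
    n = snprintf(out, outsize, "%s", p);
    free(p);
    return n;
}
```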
 
