ptrdiff_t

Stefan Ram

I have written a function to copy a substring:

char * salsub( char const * const s, char const * const p )
{ ptrdiff_t const l = p - s;
char * const m = malloc(( size_t )( l + 1 ));
if( m ){ strncpy( m, s,( size_t )l ); m[ l ]= 0; }
return m; }

Now, AFAIK, ISO-C does not guarantee that a difference
of pointers can be represented by a value of the type
"ptrdiff_t". I would like my function to return 0
(not allocating anything) if this is the case for the
difference between p and s here. How could I detect
this case?

Then, possibly PTRDIFF_MAX might be larger than
SIZE_MAX, so the cast ( size_t )( l + 1 ) or
( size_t )l might have the "wrong" value. I also would
like the function to return 0 and not to allocate
anything in this case. How could this be done?
Possibly casting to size_t and then back to ptrdiff_t,
to see if it still has the same value?
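
One way to make the second question concrete is a range check that runs before any cast. This is only a sketch: the helper name `diff_to_alloc_size` is hypothetical, and it does nothing about the first question (whether `p - s` itself is representable). It avoids the round trip through `size_t` and back, because converting an out-of-range value back to `ptrdiff_t` is implementation-defined and so is not a reliable test.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the original post): stores d + 1 in *out
   and returns 1 if d is nonnegative and d + 1 fits in a size_t;
   returns 0 otherwise and leaves *out untouched. */
static int diff_to_alloc_size(ptrdiff_t d, size_t *out)
{
    if (d < 0)
        return 0;
    /* d is nonnegative here, so converting it to uintmax_t keeps its value. */
    if ((uintmax_t)d >= SIZE_MAX)
        return 0;               /* d + 1 would not fit in a size_t */
    *out = (size_t)d + 1;
    return 1;
}
```

A caller could return a null pointer whenever the helper returns 0 and pass `*out` to malloc() otherwise.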
 
Eric Sosman

Stefan said:
I have written a function to copy a substring:

char * salsub( char const * const s, char const * const p )
{ ptrdiff_t const l = p - s;
char * const m = malloc(( size_t )( l + 1 ));
if( m ){ strncpy( m, s,( size_t )l ); m[ l ]= 0; }
return m; }

Now, AFAIK, ISO-C does not guarantee that a difference
of pointers can be represented by a value of the type
"ptrdiff_t". I would like my function to return 0
(not allocating anything) if this is the case for the
difference between p and s here. How could I detect
this case?

Then, possibly PTRDIFF_MAX might be larger than
SIZE_MAX, so the cast ( size_t )( l + 1 ) or
( size_t )l might have the "wrong" value. I also would
like the function to return 0 and not to allocate
anything in this case. How could this be done?
Possibly casting to size_t and then back to ptrdiff_t,
to see if it still has the same value?

It appears that the arguments `s' and `p' are
supposed to point to two characters in the same string,
and that `p' points to the later character. It follows
that `p - s' will be nonnegative and no greater than the
length of the original string, and since that length is
less than SIZE_MAX the pointer difference will fit in a
`size_t' with no trouble. I think you should make `l'
a `size_t' and stop worrying.

I'd also suggest that you use memcpy() instead of
strncpy(): you know how many characters you want to copy
and you don't need strncpy()'s extra feature of detecting
a zero byte, so why pay for it?
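
Put together, the two suggestions might look roughly like the following. This is a sketch, not a tested replacement: it assumes, as stated above, that `s' and `p' point into the same string and that `p' is not before `s'.

```c
#include <stdlib.h>
#include <string.h>

/* Copies the characters from s up to (not including) p into freshly
   allocated storage and appends a terminating zero byte.
   Assumes s and p point into the same string and p is not before s.
   Returns a null pointer if the allocation fails. */
char *salsub(char const *const s, char const *const p)
{
    size_t const l = (size_t)(p - s);   /* nonnegative by assumption */
    char *const m = malloc(l + 1);
    if (m) {
        memcpy(m, s, l);                /* the length is known; no scan for a zero byte */
        m[l] = 0;
    }
    return m;
}
```

For example, with `text` pointing at "hello world", `salsub(text + 6, text + 11)` would yield a fresh copy of "world" that the caller must free().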
 
Stefan Ram

Eric Sosman said:
char * salsub( char const * const s, char const * const p )
{ ptrdiff_t const l = p - s;
char * const m = malloc(( size_t )( l + 1 ));
if( m ){ strncpy( m, s,( size_t )l ); m[ l ]= 0; }
return m; }
It appears that the arguments `s' and `p' are supposed to point
to two characters in the same string,

Yes, they are.

and that `p' points to
the later character. It follows that `p - s' will be
nonnegative and no greater than the length of the original
string, and since that length is less than SIZE_MAX the pointer
difference will fit in a `size_t' with no trouble. I think you
should make `l' a `size_t' and stop worrying.

I see, thank you.

I'd also suggest that you use memcpy() instead of strncpy():
you know how many characters you want to copy and you don't
need strncpy()'s extra feature of detecting a zero byte, so why
pay for it?

I think I will start to use "memcpy".
 
pete

Eric said:
Stefan said:
I have written a function to copy a substring:

char * salsub( char const * const s, char const * const p )
{ ptrdiff_t const l = p - s;
char * const m = malloc(( size_t )( l + 1 ));
if( m ){ strncpy( m, s,( size_t )l ); m[ l ]= 0; }
return m; }

Now, AFAIK, ISO-C does not guarantee that a difference
of pointers can be represented by a value of the type
"ptrdiff_t". I would like my function to return 0
(not allocating anything) if this is the case for the
difference between p and s here. How could I detect
this case?

Then, possibly PTRDIFF_MAX might be larger than
SIZE_MAX, so the cast ( size_t )( l + 1 ) or
( size_t )l might have the "wrong" value. I also would
like the function to return 0 and not to allocate
anything in this case. How could this be done?
Possibly casting to size_t and then back to ptrdiff_t,
to see if it still has the same value?

It appears that the arguments `s' and `p' are
supposed to point to two characters in the same string,
and that `p' points to the later character. It follows
that `p - s' will be nonnegative and no greater than the
length of the original string, and since that length is
less than SIZE_MAX the pointer difference will fit in a
`size_t' with no trouble.

However, the type of the expression (p - s), is ptrdiff_t.
If ptrdiff_t can't represent the difference between p and s,
then the problem will be in the (p - s) expression itself,
not just in the assignment to 'l'.
 
CBFalconer

Stefan said:
I have written a function to copy a substring:

char * salsub( char const * const s, char const * const p )
{ ptrdiff_t const l = p - s;
char * const m = malloc(( size_t )( l + 1 ));
if( m ){ strncpy( m, s,( size_t )l ); m[ l ]= 0; }
return m; }

Now, AFAIK, ISO-C does not guarantee that a difference
of pointers can be represented by a value of the type
"ptrdiff_t". I would like my function to return 0
(not allocating anything) if this is the case for the
difference between p and s here. How could I detect
this case?

Then, possibly PTRDIFF_MAX might be larger than
SIZE_MAX, so the cast ( size_t )( l + 1 ) or
( size_t )l might have the "wrong" value. I also would
like the function to return 0 and not to allocate
anything in this case. How could this be done?
Possibly casting to size_t and then back to ptrdiff_t,
to see if it still has the same value?

You are apparently attacking the problem from the wrong direction.
While you cannot legally compute a difference between two pointers
that do not point within the same object, there is no guarantee
that any such difference will be zero, in fact on most systems it
will not. The very act of attempting such a computation leads to
undefined behaviour, and the system is quite justified in
destroying itself and the machinery on which it runs.

Presumably the caller knows the capacities of the strings involved
(capacity is not the dynamic length). You should start from this
knowledge.
 
Eric Sosman

pete said:
However, the type of the expression (p - s), is ptrdiff_t.
If ptrdiff_t can't represent the difference between p and s,
then the problem will be in the (p - s) expression itself,
not just in the assignment to 'l'.

Thanks for the correction, but I'm going to ignore it
and I think the O.P. should, too. To guard against possible
undefined behavior in the pointer subtraction, I think you
would need to change

size_t l = p - s;

to something like

size_t l = 0;
const char *q = s;
while (q++ != p)
    ++l;

.... all on the off-chance that somebody wants to copy a
2GB (typically) substring out of a >2GB string *and* the
platform does something bizarre with the subtraction. The
precaution seems excessive, out of proportion to the risk.
Yet, there are people who line their hats with tin foil ...

The issue could be avoided by redesigning the function
interface along the lines implied by CB Falconer's reply.
IMHO, the two-pointer interface is poor not because of the
remote possibility of subtraction misbehaving, but because
of the greater possibility that the programmer could make
a mistake by passing two unrelated pointers, or passing
them in the wrong order. A signature like

char *substr(const char *string,
             size_t start_offset, size_t limit_offset)

would be harder to get wrong and would allow the function
to do a certain amount of sanity checking, too.
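
A minimal sketch of such a function follows. The particular checks (offsets in order, limit within the string) are assumptions about what "sanity checking" could mean here, and the strlen() call is an extra cost the two-pointer version does not pay.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of the characters at offsets
   start_offset up to (not including) limit_offset of string, or a
   null pointer if the offsets are inconsistent or allocation fails.
   The specific checks are illustrative assumptions. */
char *substr(const char *string, size_t start_offset, size_t limit_offset)
{
    size_t len, n;
    char *copy;

    if (string == NULL || start_offset > limit_offset)
        return NULL;
    len = strlen(string);
    if (limit_offset > len)
        return NULL;                    /* range runs past the end of the string */
    n = limit_offset - start_offset;    /* cannot wrap: start <= limit */
    copy = malloc(n + 1);
    if (copy != NULL) {
        memcpy(copy, string + start_offset, n);
        copy[n] = '\0';
    }
    return copy;
}
```

With this sketch, `substr("hello world", 6, 11)` would return a copy of "world", while `substr("abc", 2, 1)` and `substr("abc", 0, 4)` would both return a null pointer.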
 
Stefan Ram

Eric Sosman said:
IMHO, the two-pointer interface is poor not because of the
remote possibility of subtraction misbehaving, but because
of the greater possibility that the programmer could make
a mistake by passing two unrelated pointers, or passing
them in the wrong order. A signature like
char *substr(const char *string,
             size_t start_offset, size_t limit_offset)
would be harder to get wrong and would allow the function
to do a certain amount of sanity checking, too.

The position of a substring to be extracted might have been
obtained as a result of a call to strstr. Thus, it would be a
char * as returned by strstr. The caller of substr with the
above interface would have to calculate the start_offset from
that char * himself and therefore have the same problem of
whether PTRDIFF_MAX <= SIZE_MAX.

Maybe it would be the best to document that reasonable
assumption:

int main( void )
{ if( PTRDIFF_MAX > SIZE_MAX )
fprintf( stderr, "Sorry. My C code was written for "
"implementations where SIZE_MAX >= PTRDIFF_MAX.\n" );
else /* ... */; }

Which leaves the question whether the comparison ( PTRDIFF_MAX
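
The same assumption could also be stated at compile time, so that a build on an unusual implementation fails instead of printing a message at run time. This is a sketch; it relies on the C99 rule that PTRDIFF_MAX and SIZE_MAX from <stdint.h> are usable in #if directives, where operands are evaluated as if they had the types intmax_t and uintmax_t. PTRDIFF_MAX is positive, so converting it to the unsigned type for the comparison keeps its value.

```c
#include <stddef.h>
#include <stdint.h>

/* Refuse to compile where a nonnegative ptrdiff_t value might not
   fit in a size_t. */
#if PTRDIFF_MAX > SIZE_MAX
#error "This code assumes SIZE_MAX >= PTRDIFF_MAX."
#endif
```

On implementations where the assumption holds, the block compiles to nothing.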
 
Charlie Gordon

Thanks for the correction, but I'm going to ignore it
and I think the O.P. should, too. To guard against possible
undefined behavior in the pointer subtraction, I think you
would need to change

size_t l = p - s;

to something like

size_t l = 0;
const char *q = s;
while (q++ != p)
    ++l;

That's ridiculous !
Unnecessarily slow, not to mention the possible case of q > p !
p and s point to the same array, you assume p > s, what can be wrong with
"size_t size = p - s" ?
Is this the kind of defensive programming that makes Java so slow ?
 
Eric Sosman

Charlie said:
That's ridiculous !
Unnecessarily slow, not to mention the possible case of q > p !
p and s point to the same array, you assume p > s, what can be wrong with
"size_t size = p - s" ?

I think you've overlooked some of the context in this
thread. The person who corrected me pointed out that the
very act of evaluating `p - s' can produce undefined behavior
if `p' and `s' point to array elements that are sufficiently
far apart. Also, if you'll re-read my recommendation you'll
discover that I'm advising the programmer to ignore this
possibility, to walk on the wild side, to live in hope that
the arrays will never be that big or that if they are the
undefined behavior will be "the right thing."

For absolute 100% safety (not a mere 99.999999999999%),
I can think of no alternative to the loop.

Is this the kind of defensive programming that makes Java so slow ?

<off-topic>

Java deals with this particular issue by legislating it
out of existence. There is no pointer arithmetic, hence no
need for `ptrdiff_t'. The representations of all the atomic
types are specified; no padding bits or exotic number systems
are allowed for. Java even specifies what happens in cases
like arithmetic overflow, whereas C cannot.

In particular, Java arrays are limited to 2**31 elements.
Whether this was a good idea only time will tell.

</off-topic>
 
Charlie Gordon

Eric Sosman said:
I think you've overlooked some of the context in this
thread. The person who corrected me pointed out that the
very act of evaluating `p - s' can produce undefined behavior
if `p' and `s' point to array elements that are sufficiently
far apart.

That is also ridiculous !
For ptrdiff_t to accommodate all possible pointer differences (inside the same
array), it may need to be larger than size_t, yet the conversion of a positive
value back to a size_t should not produce undefined behaviour.
This unsigned stuff is so bogus !

Also, if you'll re-read my recommendation you'll
discover that I'm advising the programmer to ignore this
possibility, to walk on the wild side, to live in hope that
the arrays will never be that big or that if they are the
undefined behavior will be "the right thing."

For absolute 100% safety (not a mere 99.999999999999%),
I can think of no alternative to the loop.

Sure, but writing such extreme examples will lead cautious programmers to
believe that it might be "safer" or "needed" for reasons they cannot fathom...
There are too many newbies on this forum for such hair splitting discussions
IMHO.

<off-topic>

Actually, this is On Topic (see below ;-)

Java deals with this particular issue by legislating it
out of existence. There is no pointer arithmetic, hence no
need for `ptrdiff_t'. The representations of all the atomic
types are specified; no padding bits or exotic number systems
are allowed for. Java even specifies what happens in cases
like arithmetic overflow, whereas C cannot.

In particular, Java arrays are limited to 2**31 elements.
Whether this was a good idea only time will tell.

I was not making a reference to the java language itself, but to Sun's virtual
machine... implemented in C if I remember correctly ?
 
Eric Sosman

Charlie said:
Eric Sosman said:
[...]
I think you've overlooked some of the context in this
thread. The person who corrected me pointed out that the
very act of evaluating `p - s' can produce undefined behavior
if `p' and `s' point to array elements that are sufficiently
far apart.

That is also ridiculous !
For ptrdiff_t to accommodate all possible pointer differences (inside the same
array), it may need to be larger than size_t, yet the conversion of a positive
value back to a size_t should not produce undefined behaviour.

ptrdiff_t is not required to be able to accommodate all
possible pointer differences. See 6.5.6 paragraph 9; note the
conditionals in the third and fourth sentences. Also see
7.18.3 paragraph 2.

This unsigned stuff is so bogus !

It's the signedness of ptrdiff_t that makes the trouble,
not the unsignedness of size_t. Conversion of a valid value
to size_t is no problem; it's the original generation of an
invalid ptrdiff_t value that yields undefined behavior.
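
As a rough illustration of where the line falls: a caller who knows the size of the array can test, before subtracting anything, whether every difference within it is representable. The helper name `diffs_representable` is hypothetical, and the sketch covers only char arrays, where the element count equals the size in bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if every difference between two
   pointers into (or one past the end of) a char array of n elements
   is representable as a ptrdiff_t, and 0 otherwise. */
static int diffs_representable(size_t n)
{
    /* Both operands are nonnegative, so widening to uintmax_t keeps
       their values and avoids a signed/unsigned surprise. */
    return (uintmax_t)n <= (uintmax_t)PTRDIFF_MAX;
}
```

If the check passes, the subtraction is well defined, and converting a nonnegative result to size_t is then harmless.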
Actually, this is On Topic (see below ;-)

No, it's not.

I was not making a reference to the java language itself, but to Sun's virtual
machine... implemented in C if I remember correctly ?

No, you don't.
 
Lawrence Kirby

....

For absolute 100% safety (not a mere 99.999999999999%),
I can think of no alternative to the loop.

The solution would be to take a step backwards and design the code to
avoid the need for pointer differencing in the first place. E.g. represent
p as an offset from s, or at least maintain an offset in addition to p and
s. No doubt we could invent situations where this is difficult e.g. where
the pointers are provided by code we don't control but there are often
ways around the problem even then.

Lawrence
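
A sketch of the offset-based approach, assuming the substring to copy runs from the start of `s` up to a delimiter character. The function name `copy_until` and the delimiter parameter are hypothetical; the point is only that the end position is kept as a size_t index, so no pointer subtraction occurs.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical example: copies the leading part of s up to (not
   including) the first occurrence of delim, or all of s if delim
   does not occur. The end position is tracked as a size_t offset,
   so no pointer difference is ever computed. */
char *copy_until(const char *s, char delim)
{
    size_t n = 0;
    char *copy;

    while (s[n] != '\0' && s[n] != delim)
        ++n;
    copy = malloc(n + 1);
    if (copy != NULL) {
        memcpy(copy, s, n);
        copy[n] = '\0';
    }
    return copy;
}
```

For instance, `copy_until("key=value", '=')` would return a fresh copy of "key", which the caller must free().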
 
Albert van der Horst

You are apparently attacking the problem from the wrong direction.
While you cannot legally compute a difference between two pointers
that do not point within the same object, there is no guarantee
that any such difference will be zero, in fact on most systems it
will not. The very act of attempting such a computation leads to
undefined behaviour, and the system is quite justified in
destroying itself and the machinery on which it runs.

More realistically, a C compiler from Paranoia Inc. would abort
a program with a run-time error:

Module monkey.c line 211:
{ ptrdiff_t const l = p - s;
^
Fatal error 1793,
attempt to subtract pointers that do not point in the same
array.

Presumably the caller knows the capacities of the strings involved
(capacity is not the dynamic length). You should start from this
knowledge.

Agreed.

Groetjes Albert

Stefan Ram

Albert van der Horst said:
Module monkey.c line 211:
{ ptrdiff_t const l = p - s;
^
Fatal error 1793,
attempt to subtract pointers that do not point in the same
array.

I would be glad to know that the positions of two characters
within the same string always have a difference representable
as size_t.

But I can only confirm that size_t is defined to be the type
of the result of a sizeof-operation.

A null-terminated sequence of characters in memory, however,
does not have to be a char array. Its address might have
been obtained by a call to a system function.

One might be able to assume that the string is sufficiently
small, if it was created using an automatic variable using
"char string[ size ]" or using malloc "malloc( size )".

So, I possibly should add this restriction to the
documentation of my function: "Use this substring function
only for strings, which are substrings of either char arrays
or regions of memory obtained by malloc."
 
