Decrement a given pointer.

Michael Press · Jul 9, 2013

Given a pointer, p, can I set pm1 = p-1 and use pm1
without worrying that an implementation will object
or do other than what one expects? The idea is to
get offset one arrays, e.g.,

void
f(int *p, int l)
{
int i;
int t;
int *pm1 = p - 1;

for(i = 1; i <= l; i++)
t = pm1[i];
}

int a[] = {1, 2, 3};
int na = sizeof a / sizeof *a;

void
doit(void)
{
int *b = a;

f(a, na);
f(b, na);
}

Keith Thompson · Jul 9, 2013

Michael Press said:
Given a pointer, p, can I set pm1 = p-1 and use pm1
without worrying that an implementation will object
or do other than what one expects? The idea is to
get offset one arrays, e.g.,

void
f(int *p, int l)
{
int i;
int t;
int *pm1 = p - 1;

for(i = 1; i <= l; i++)
t = pm1[i];
}

int a[] = {1, 2, 3};
int na = sizeof a / sizeof *a;

void
doit(void)
{
int *b = a;

f(a, na);
f(b, na);
}

No. Given a pointer to an array element, you can safely construct a
pointer to any element of the array. You can also safely construct a
pointer just past the end of the array, but you can't dereference it.
Using pointer arithmetic to construct a pointer outside the bounds of
the array has undefined behavior. (A single object is treated as a
one-element array for purposes of pointer arithmetic.)

It's fairly likely to work on most systems, but it's not guaranteed.

I seem to recall that the book "Numerical Recipes in C" used this
technique to translate Fortran code into C.
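
The rule above can be sketched roughly as follows. This is only an illustration; the names `sum_one_based` and `valid_pointers` are made up here and are not from the thread. The subtraction is kept in the index, so `p - 1` is never formed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums p[0..n-1] while the loop counts 1..n.
   The pointer p is never decremented; only the index is adjusted. */
int sum_one_based(const int *p, size_t n)
{
    int total = 0;
    for (size_t i = 1; i <= n; i++)
        total += p[i - 1];      /* p + (i - 1) stays inside the array */
    return total;
}

/* Pointers that may be formed for an array v of n elements. */
void valid_pointers(int *v, size_t n)
{
    int *first = v;             /* fine: points at v[0] */
    int *end   = v + n;         /* fine: one past the end, must not be dereferenced */
    assert(end - first == (ptrdiff_t)n);
    /* int *before = v - 1;        undefined behavior, even if never dereferenced */
}
```

Called as `sum_one_based(a, na)` with the array from the original post, the first function visits the same elements as the `pm1` version would.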

Joe Pfeiffer · Jul 9, 2013

Michael Press said:
Given a pointer, p, can I set pm1 = p-1 and use pm1
without worrying that an implementation will object
or do other than what one expects? The idea is to
get offset one arrays, e.g.,

<snip>

Keith has already given what I expect is the right answer to your
question, but I'd go on to ask "why?". Unless there's a *really* good
reason, you should simply use the language as designed.

Having said that, I'll mention that I have occasion to use what amount
to offset 1 arrays on a current project: I'm obtaining altimeter data
from an altimeter that has a parameter that goes from 1 to 9; it seems
less error-prone to me to use a 10 element array and just waste element
0 than to mess with macros or other code to add and subtract 1 from an
index in multiple places. But you'll notice that this approach to it
doesn't depend on tricky code having undefined behavior do the right
thing.
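
A minimal sketch of that approach, assuming a parameter range of 1 to 9 as described; the names `reading`, `store_reading` and `load_reading` are invented for the example.

```c
#include <assert.h>

enum { PARAM_MIN = 1, PARAM_MAX = 9 };   /* range taken from the post */

/* Ten slots so that indices 1..9 can be used directly; slot 0 is unused. */
static int reading[PARAM_MAX + 1];

void store_reading(int param, int value)
{
    assert(param >= PARAM_MIN && param <= PARAM_MAX);
    reading[param] = value;
}

int load_reading(int param)
{
    assert(param >= PARAM_MIN && param <= PARAM_MAX);
    return reading[param];
}
```

The cost is one wasted element; in exchange no index arithmetic appears at the call sites and every pointer involved stays inside the array.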

Siri Cruise · Jul 9, 2013

Keith Thompson <[email protected]> said:
It's fairly likely to work on most systems, but it's not guaranteed.

Some CPUs use address registers and check address validity when the address is
computed rather than waiting for the address load. If the array is at the beginning
of some memory partition, these kinds of CPUs can get an address fault.

Eric Sosman · Jul 9, 2013

Given a pointer, p, can I set pm1 = p-1 and use pm1
without worrying that an implementation will object
or do other than what one expects? The idea is to
get offset one arrays, e.g.,
[...]

This is Question 6.17 on the comp.lang.c Frequently
Asked Questions (FAQ) page at <http://www.c-faq.com/>.

Ian Collins · Jul 9, 2013

Siri said:
Some CPUs use address registers and check address validity when the address is
computed rather than waiting for the address load. If the array is at the beginning
of some memory partition, these kinds of CPUs can get an address fault.

Not in the context of the OP, the address p-1 was never dereferenced.

Eric Sosman · Jul 9, 2013

Not in the context of the OP, the address p-1 was never dereferenced.

Even computing it (trying to compute it) yields undefined
behavior. FAQ 6.17.

Siri Cruise · Jul 9, 2013

Ian Collins said:
Not in the context of the OP, the address p-1 was never dereferenced.

It's not guaranteed to work because some CPUs validate addresses on computation
before dereference.

Keith Thompson · Jul 9, 2013

Ian Collins said:
Not in the context of the OP, the address p-1 was never dereferenced.

Siri Cruise said that the validity of the address is checked when it's
computed, not when it's dereferenced, so yes, that kind of CPU would get
an address fault.

(Which is consistent with my statement, since most CPUs don't do that.
Still, I certainly don't recommend counting on that.)

glen herrmannsfeldt · Jul 9, 2013

Siri Cruise said:
(snip on generating p-1 where p is a pointer to something)

It's not guaranteed to work because some CPUs validate addresses
on computation before dereference.

Do you know of any actual such CPUs in current use?

The most popular CPU that uses anything similar wraps the
address on computation, such that it works.

I used protected mode on the 80286 in OS/2, and then on the 486 for
a while before OS/2 2.0 came out. On the 80286 in protected mode,
addresses consist of a segment selector, selecting a segment descriptor,
and a 16 bit offset into the segment. If you subtract, the offset will
wrap, and when you add one again, will wrap back again.

The CPU will validate a segment selector when loaded into a segment
register, except that segment 0 is the null segment selector.
(A special case in hardware.)

If a system does bounds checking, it is possible that the
bounds check will notice, but even then it is likely done
only at dereference.

So yes, it violates the standard, but it will most likely work.

-- glen

Siri Cruise · Jul 9, 2013

Keith Thompson said:
Not in the context of the OP, the address p-1 was never dereferenced.

Siri Cruise said that the validity of the address is checked when it's
computed, not when it's dereferenced, so yes, that kind of CPU would get
an address fault.

(Which is consistent with my statement, since most CPUs don't do that.
Still, I certainly don't recommend counting on that.)

An alternative is something like
int A_[m,n];
#define A(j,k) A_[(j)-1,(k)-1]

James Kuyper · Jul 9, 2013

On 07/08/2013 10:44 PM, Siri Cruise wrote:
....

An alternative is something like
int A_[m,n];
#define A(j,k) A_[(j)-1,(k)-1]

That's equivalent to

int A_[n];
#define A(j,k) A_[(k)-1]

Were you thinking of Fortran?

Stephen Sprunk · Jul 9, 2013

Do you know of any actual such CPUs in current use?

AS/400 is commonly cited here as an example of such a system.

The most popular CPU that uses anything similar wraps the address on
computation, such that it works.

I used protected mode on the 80286 in OS/2, and then on the 486 for a
while before OS/2 2.0 came out. On the 80286 in protected mode,
addresses consist of a segment selector, selecting a segment
descriptor, and a 16 bit offset into the segment. If you subtract,
the offset will wrap, and when you add one again, will wrap back
again.

You seem to be assuming that the wrapped pointer will not exceed the
segment limit and generate an exception. That is probably true on x86
systems, where the segment limit is almost always (unsigned)(-1), but
probably not on other segmented systems.

The CPU will validate a segment selector when loaded into a segment
register, except that segment 0 is the null segment selector. (A
special case in hardware.)

True, but the selector would remain valid if the original pointer were
valid. i286 doesn't validate the offset part, which is likely to be
invalid in this case, before a load or store is performed; doing so
would be impossible since it doesn't have dedicated address registers.

S

glen herrmannsfeldt · Jul 9, 2013

(snip, someone wrote)

AS/400 is commonly cited here as an example of such a system.

Yes, they might do it. Do they have a C compiler?

You seem to be assuming that the wrapped pointer will not exceed the
segment limit and generate an exception. That is probably true on x86
systems, where the segment limit is almost always (unsigned)(-1), but
probably not on other segmented systems.

I am not sure at all what it does in huge mode, I never used that.
In large mode, the offset is in an ordinary register, and will
wrap back again before the dereference.

The segment selector has to exist when the value is loaded into
a segment register, but the offset isn't checked until an actual
dereference (load or store).

In 32 bit protected mode, I believe it is usual to set the limit to,
as you note, (unsigned)(-1), but in 16 bit mode, no. In 32 bit
mode, you have the PMMU to validate addresses, in 16 bit the only
validation is the segment selector limit.

I believe it is usual to do pointer assignment without loading
the pointer into a segment register. As above, loading the
segment register would require the segment be valid.

True, but the selector would remain valid if the original pointer were
valid. i286 doesn't validate the offset part, which is likely to be
invalid in this case, before a load or store is performed; doing so
would be impossible since it doesn't have dedicated address registers.

Well, there are instructions that load both a segment register and
another register with a segment/offset pair. In that case it could
be done, but I don't believe it is done. It would be extra work
that isn't necessary.

I don't know AS/400 addressing enough to know if it is necessary
to validate early.

-- glen

glen herrmannsfeldt · Jul 9, 2013

There are other problems that can happen besides faults
happening when you form the address.

Consider this loop to traverse an array of structures backwards:

struct huge *p;
struct huge bigarray[MAX];

/* WRONG! */
for (p = &bigarray[MAX-1]; p >= &bigarray[0]; p--) {
... do something with struct huge
pointed at by p ...;
}

In huge mode, it should work, but not in large mode.
Huge mode decrements the segment selector when the offset wraps.
(Much extra code to do that, so I try not to use it.)

The segment selector will be invalid, but I believe the arithmetic
is not done in segment registers. (There is no decrement operation
on segment registers.)

In order for this loop to stop p has to equal &bigarray[-1], and
this value has to be less than &bigarray[0]. In a situation where
(a) pointers are compared as unsigned numbers, (b) global data is
allocated starting around virtual address 0, and (c) there isn't
much other global data compared to the size of a struct huge,
&bigarray[-1] overflows to a large positive number. The loop never
terminates. (Well, when p is set to a large positive number there's
a good chance no memory is allocated there, so the body of the
loop will segfault.)

Again, large but not huge. Huge mode has to compare both the segment
and offset of the pointer. Large mode only the offset.

This problem was actually observed on a Motorola 68000 processor,
one that generally behaves like you "expect" rather than quirky
behavior the standard allows.

The 68000 is a 16 bit processor, but able to address more than 64K.
I don't remember quite how they did it.

Some time before the 68020, I used a 68010 system with a custom MMU.

-- glen

Siri Cruise · Jul 10, 2013

AS/400 is commonly cited here as an example of such a system.

Yes, they might do it. Do they have a C compiler?

It doesn't matter whether you think this is the result of stupid design. What
matters is some vendor with enough influence with ANSI got this caveat written
into the C standard. Code that violates it may run on 99% of all machines; but
it is still code not guaranteed for 100%. If you're happy with that and so are
your customers, go for it. I write code that only runs on Unix or even just
MacOSX. My customers pay for that, so I'm fine with being nonstandard.

I added my comment simply to explain why such an odd rule exists. I once worked
on CDC computers with address registers so I happen to be aware of these issues.
I have no comment on whether this is a good idea.

I avoid the issue by letting array indices go out of bounds instead of pointers,
such as
#define A(j,k) A_[(j)-1][(k)-1]
or
for (int j=n-1; j>=0; j--) f(B[j]);
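
The macro above could be fleshed out along these lines. The dimensions `M` and `N` and the functions `fill` and `get` are assumptions made for the example, not part of the post.

```c
#include <assert.h>

enum { M = 3, N = 4 };            /* illustrative dimensions */

static int A_[M][N];              /* ordinary zero-based storage */

/* One-based view: A(1,1) is A_[0][0], A(M,N) is A_[M-1][N-1]. */
#define A(j, k) A_[(j) - 1][(k) - 1]

/* Store 10*j + k at one-based position (j, k). */
void fill(void)
{
    for (int j = 1; j <= M; j++)
        for (int k = 1; k <= N; k++)
            A(j, k) = 10 * j + k;
}

/* Checked one-based read. */
int get(int j, int k)
{
    assert(j >= 1 && j <= M && k >= 1 && k <= N);
    return A(j, k);
}
```

Only the indices are shifted, so every subscript lands inside `A_` and no out-of-range pointer is ever computed.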

Stephen Sprunk · Jul 10, 2013

The 68000 is a 16 bit processor, but able to address more than 64K. I
don't remember quite how they did it.

The m68k is a 32-bit processor, at least in the sense it presented
32-bit registers and a 32-bit address space to the programmer. The
first implementation used pairs of 16-bit registers with carry, but that
was invisible to the programmer; code continued to work as-is when
ported to later implementations that had true 32-bit registers.

The m68k had separate address and data registers (to save encoding
bits), so it was possible for the CPU to fault when loading or
manipulating an invalid pointer even without dereferencing it.

S

Michael Press · Jul 10, 2013

Joe Pfeiffer said:
<snip>

Keith has already given what I expect is the right answer to your
question, but I'd go on to ask "why?". Unless there's a *really* good
reason, you should simply use the language as designed.

Indexing into a heap. The top of the heap is heap[1].
The two subsidiary nodes to heap[k] are
heap[2 * k] and heap[2 * k + 1].

Having said that, I'll mention that I have occasion to use what amount
to offset 1 arrays on a current project: I'm obtaining altimeter data
from an altimeter that has a parameter that goes from 1 to 9; it seems
less error-prone to me to use a 10 element array and just waste element
0 than to mess with macros or other code to add and subtract 1 from an
index in multiple places.

I often have arrays that naturally start at 1 and do the same
as you: waste the array entry at index 0. Sometimes I do not
have the choice.

But you'll notice that this approach to it
doesn't depend on tricky code having undefined behavior do the right
thing.

That is why I asked.

Michael Press · Jul 10, 2013

Do you know of any actual such CPUs in current use?

The most popular CPU that uses anything similar wraps the
address on computation, such that it works.

There are other problems that can happen besides faults
happening when you form the address.

Consider this loop to traverse an array of structures backwards:

struct huge *p;
struct huge bigarray[MAX];

/* WRONG! */
for (p = &bigarray[MAX-1]; p >= &bigarray[0]; p--) {
... do something with struct huge
pointed at by p ...;
}

So write

for (int k = MAX; k-- > 0; ) {
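
Filled out into a complete function, that loop might look like the sketch below. `reverse_copy` is a made-up example, and it uses `size_t` for the index where the line above uses `int`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: copies src[0..n-1] into dst in reverse order,
   walking src from the last element down to the first. */
void reverse_copy(int *dst, const int *src, size_t n)
{
    size_t out = 0;
    /* k is tested before it is decremented, so the body sees n-1 ... 0
       and the loop ends without ever forming &src[-1]. */
    for (size_t k = n; k-- > 0; )
        dst[out++] = src[k];
    assert(out == n);
}
```

Because the loop counts with an index rather than a pointer, the termination test never compares against an address below the start of the array.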

Joe Pfeiffer · Jul 10, 2013

Michael Press said:
Joe Pfeiffer said:

<snip>

Keith has already given what I expect is the right answer to your
question, but I'd go on to ask "why?". Unless there's a *really* good
reason, you should simply use the language as designed.

Indexing into a heap. The top of the heap is heap[1].
The two subsidiary nodes to heap[k] are
heap[2 * k] and heap[2 * k + 1].

Ah, should have thought of that one. While it's not quite as elegant as
the standard scheme, using heap[2*k + 1] and heap[2*k + 2] works just
fine with 0-offset arrays and doesn't involve weird messing with
pointers.

I often have arrays that naturally start at 1 and do the same
as you: waste the array entry at index 0. Sometimes I do not
have the choice.

And, of course, simply rooting your heap at heap[1] and wasting heap[0]
works just fine here as well.
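
For the 0-offset variant, a minimal sketch of a max-heap using children at 2*k + 1 and 2*k + 2 could look like this. The function names `sift_down` and `build_heap` are chosen for the example and do not come from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Zero-based binary max-heap: children of node k are 2k+1 and 2k+2. */
static void sift_down(int *heap, size_t n, size_t k)
{
    for (;;) {
        size_t left = 2 * k + 1, right = left + 1, largest = k;
        if (left < n && heap[left] > heap[largest])
            largest = left;
        if (right < n && heap[right] > heap[largest])
            largest = right;
        if (largest == k)
            return;                 /* heap property holds at k */
        int tmp = heap[k];
        heap[k] = heap[largest];
        heap[largest] = tmp;
        k = largest;                /* continue from the swapped child */
    }
}

/* Turn an arbitrary array into a max-heap in place. */
void build_heap(int *heap, size_t n)
{
    assert(heap != NULL || n == 0);
    for (size_t k = n / 2; k-- > 0; )
        sift_down(heap, n, k);
}
```

The parent of node k is then (k - 1) / 2 for k > 0. The index arithmetic is slightly less tidy than with a root at heap[1], but no element is wasted and no pointer adjustment is needed.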
