Checking for an array overflow

A

André Hänsel

Hello list,

while programming for an embedded platform in C, I have an array and a
pointer to a single element of the array:

uint8_t buffer[100], *bp = buffer;

*bp = U0RXBUF; // Read a character
bp++; // Move pointer forward
if (bp - buffer == sizeof(buffer)/sizeof(uint8_t)) // Buffer overflow?
bp = buffer; // Reset pointer to beginning

When I look at the disassembly there is no code generated for "bp =
buffer;".

Is the compiler allowed to assume that bp - buffer will never be 100
because the last element of buffer is 99 and optimize it away? Or did
I make some other mistake?

Regards,
André
 
I

Ian Collins

Hello list,

while programming for an embedded platform in C, I have an array and a
pointer to a single element of the array:

uint8_t buffer[100], *bp = buffer;

*bp = U0RXBUF; // Read a character
bp++; // Move pointer forward
if (bp - buffer == sizeof(buffer)/sizeof(uint8_t)) // Buffer overflow?
bp = buffer; // Reset pointer to beginning

When I look at the disassembly there is no code generated for "bp =
buffer;".

Is the compiler allowed to assume that bp - buffer will never be 100
because the last element of buffer is 99 and optimize it away?

It can't because bp - buffer can be 100. There isn't any array bounds
checking in C.
Or did I make some other mistake?

Is that the exact code?
 
T

Thad Smith

Ian said:
On 07/12/10 10:40 AM, André Hänsel wrote:
while programming for an embedded platform in C, I have an array and a
pointer to a single element of the array:

uint8_t buffer[100], *bp = buffer;

*bp = U0RXBUF; // Read a character
bp++; // Move pointer forward
if (bp - buffer == sizeof(buffer)/sizeof(uint8_t)) // Buffer overflow?
bp = buffer; // Reset pointer to beginning

When I look at the disassembly there is no code generated for "bp =
buffer;".

Is the compiler allowed to assume that bp - buffer will never be 100
because the last element of buffer is 99 and optimize it away?

It can't because bp - buffer can be 100.

More precisely, behavior for bp = buffer+100 is defined and (bp - buffer) is
well defined for that value.
Is that the exact code?

If it is, bp == buffer+1 when tested, so the compiler can safely omit the
assignment (I assume that the real code has a loop somewhere).
 
K

Keith Thompson

Ian Collins said:
while programming for an embedded platform in C, I have an array and a
pointer to a single element of the array:

uint8_t buffer[100], *bp = buffer;

*bp = U0RXBUF; // Read a character
bp++; // Move pointer forward
if (bp - buffer == sizeof(buffer)/sizeof(uint8_t)) // Buffer overflow?
bp = buffer; // Reset pointer to beginning

When I look at the disassembly there is no code generated for "bp =
buffer;".

Is the compiler allowed to assume that bp - buffer will never be 100
because the last element of buffer is 99 and optimize it away?

It can't because bp - buffer can be 100. There isn't any array bounds
checking in C.

More precisely, C doesn't *require* array bounds checking.
It indirectly *permits* it, by making the behavior of any program
that violates array bounds undefined; an implementation can take
advantage of that by defining the behavior any way it likes.

For code similar to the above, I can imagine the compiler assuming that
nothing whose behavior is undefined can have happened, and therefore
chooses to omit code that can only be executed if the behavior is
already undefined.

But for this particular code, I don't think that can happen.
buffer+100 is a perfectly legal value for bp (it points one past
the end of an array, so you can't dereference it, but you don't
try so that's ok).
Is that the exact code?

That's the question. It might also be better to examine the visible
behavior of the code rather than (or in addition to) the generated code.
 
N

nothing

When I look at the disassembly there is no code generated for "bp =
buffer;".

It's probably a conditional branch to the first assignment.

Also, I would have written the code as

#define BUFSIZE 100
uint8_t buffer[BUFSIZE];

void foo () {
static uint8_t *bp = buffer;

bp++ = U0RXBUF;
if(bp >= buffer + BUFSIZE)
bp = buffer;
}

But that's just personal choice.
 
E

Eric Sosman

When I look at the disassembly there is no code generated for "bp =
buffer;".

It's probably a conditional branch to the first assignment.

Also, I would have written the code as

#define BUFSIZE 100
uint8_t buffer[BUFSIZE];

void foo () {
static uint8_t *bp = buffer;

bp++ = U0RXBUF;
if(bp>= buffer + BUFSIZE)
bp = buffer;
}

But that's just personal choice.

Two notable points here: First, the original's division by
sizeof(uint8_t) was a bad idea, and would not have worked at all
had the buffer elements been of some type with a sizeof>1. Second,
the offered correction is missing a rather important asterisk, and
will not compile ...
 
B

Ben Bacarisse

[restoring the original code:]

| uint8_t buffer[100], *bp = buffer;
|
| *bp = U0RXBUF; // Read a character
| bp++; // Move pointer forward
| if (bp - buffer == sizeof(buffer)/sizeof(uint8_t)) // Buffer overflow?
| bp = buffer; // Reset pointer to beginning
[...] the original's division by
sizeof(uint8_t) was a bad idea, and would not have worked at all
had the buffer elements been of some type with a sizeof>1.

Just to clarify (because I was momentarily taken aback) you mean using a
fixed type name (uint8_t) is a bad idea because it would need to be
changed if the buffer element type were changed. I think your remark
could be misconstrued to mean that dividing by the element size is, of
itself, a bad idea in this situation.

<snip>
 
E

Eric Sosman

[restoring the original code:]

| uint8_t buffer[100], *bp = buffer;
|
| *bp = U0RXBUF; // Read a character
| bp++; // Move pointer forward
| if (bp - buffer == sizeof(buffer)/sizeof(uint8_t)) // Buffer overflow?
| bp = buffer; // Reset pointer to beginning
[...] the original's division by
sizeof(uint8_t) was a bad idea, and would not have worked at all
had the buffer elements been of some type with a sizeof>1.

Just to clarify (because I was momentarily taken aback) you mean using a
fixed type name (uint8_t) is a bad idea because it would need to be
changed if the buffer element type were changed. I think your remark
could be misconstrued to mean that dividing by the element size is, of
itself, a bad idea in this situation.

That is exactly the construction I intended -- but alas! I was
guilty of over-hasty reading, and was mistaken in my intent. The
original code was

uint8_t buffer[100], *bp = buffer;
...
if (bp - buffer == sizeof(buffer)/sizeof(uint8_t)) // Buffer overflow?

.... and I said to myself "The writer has forgotten that pointer
subtraction already `divides' by the element size!" Of course, he
had not: His `sizeof(buffer)/sizeof(uint8_t)' is a perfectly valid
calculation of the number of elements in `buffer', and I think I was
expecting to see the more usual `sizeof buffer / sizeof buffer[0]',
and it was evil companions, Your Honor, evil companions who led me
astray, and I ask for clemency because of my clean(ish) arrest
record, and I'll promise faithfully not to do it again -- until
next time.

I will now undertake to disengage my toenails from my tonsils.
 
N

nothing

*bp++, oops

Dean

When I look at the disassembly there is no code generated for "bp =
buffer;".

It's probably a conditional branch to the first assignment.

Also, I would have written the code as

#define BUFSIZE 100
uint8_t buffer[BUFSIZE];

void foo () {
static uint8_t *bp = buffer;

bp++ = U0RXBUF;
if(bp>= buffer + BUFSIZE)
bp = buffer;
}

But that's just personal choice.

Two notable points here: First, the original's division by
sizeof(uint8_t) was a bad idea, and would not have worked at all
had the buffer elements been of some type with a sizeof>1. Second,
the offered correction is missing a rather important asterisk, and
will not compile ...
 
A

André Hänsel

Sorry, sorry, I did not look properly.

It does actually generate code for the assignment, but it does it
entirely inside the registers and then my IDE does not show the
corresponding line of C code:
00E198 4F0D mov.w R15,R13
00E19A 8E0D sub.w R14,R13
00E19C 903D 0064 cmp.w #0x64,R13
00E1A0 2001 jne 0xE1A4
00E1A2 4E0F mov.w R14,R15 <--- Here it is!
00E1A4 4130 ret

Anyway, I thank you for the enjoyable discussion and in addition
learnt the more common "sizeof buffer / sizeof buffer[0]" construction.
 
K

Keith Thompson

André Hänsel said:
Sorry, sorry, I did not look properly.

It does actually generate code for the assignment, but it does it
entirely inside the registers and then my IDE does not show the
corresponding line of C code:
00E198 4F0D mov.w R15,R13
00E19A 8E0D sub.w R14,R13
00E19C 903D 0064 cmp.w #0x64,R13
00E1A0 2001 jne 0xE1A4
00E1A2 4E0F mov.w R14,R15 <--- Here it is!
00E1A4 4130 ret

Anyway, I thank you for the enjoyable discussion and in addition
learnt the more common "sizeof buffer / sizeof buffer[0]" construction.

The lesson here (sorry if I'm stating the obvious) is that the
visible behavior of a program is the only real criterion for
determining whether the compiler is generating correct code.
Examining the generated code can be useful, but as you've seen
optimizing compilers can be sufficiently clever as to obscure what's
going on unless you read the code *very* carefully. The CPU is
better at interpreting machine code than you are (certainly better
than I am).
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,774
Messages
2,569,598
Members
45,158
Latest member
Vinay_Kumar Nevatia
Top