Out-of-bounds access and UB

Nudge

Hello,

Someone posted the following code to a different group:

void function2(int a, int b, int c) {
    int exchange;
    char buffer[3];
    char *bufPt;
    buffer[0] = 'A';
    buffer[3] = 'B';
    buffer[40] = 'B';
    bufPt = buffer;
    bufPt = bufPt + 1;
}

Note: buffer[3] and buffer[40] are out-of-bounds.
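For comparison, here is a minimal in-bounds sketch of the same function. `buffer` has three elements, so the only valid subscripts are 0, 1 and 2. The pointer `buffer + 3` (one past the end) may be computed and compared, but never dereferenced. The name `function2_fixed`, the `out` parameter and the returned count are illustrative additions, and the unused parameters of the original are dropped.

```c
#include <stddef.h>

/* In-bounds counterpart of function2 (hypothetical name). buffer has three
   elements, so the only valid subscripts are 0, 1 and 2. */
size_t function2_fixed(char out[3])
{
    char buffer[3];
    char *bufPt;
    char *end = buffer + sizeof buffer;  /* one past the end: may be formed
                                            and compared, never dereferenced */
    size_t n = 0;

    buffer[0] = 'A';
    buffer[1] = 'C';
    buffer[sizeof buffer - 1] = 'B';     /* index 2, the last valid element */

    for (bufPt = buffer; bufPt < end; bufPt = bufPt + 1)
        out[n++] = *bufPt;               /* copies exactly three elements */
    return n;
}
```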

I answered: "[...] I believe 'function2' has undefined behavior.
In other words, the compiler is free to do anything it wants [...]"

Then someone commented: "No. Technically the program is fully
conforming. It is executing the program that gives undefined behaviour."

What does "Technically the program is fully conforming" mean?
Was my wording incorrect?

Perhaps source code cannot have undefined behavior, only executable
code can? I am somewhat confused. Could you please shed some light?
 
pete

Nudge said:
Hello,

Someone posted the following code to a different group:

void function2(int a, int b, int c) {
int exchange;
char buffer[3];
char *bufPt;
buffer[0] = 'A';
buffer[3] = 'B';
buffer[40] = 'B';
bufPt = buffer;
bufPt = bufPt + 1;
}

Note: buffer[3] and buffer[40] are out-of-bounds.

I answered: "[...] I believe 'function2' has undefined behavior.

It does.
In other words, the compiler is free to do anything it wants [...]"

There are some limitations. See N869, 3.18, [#2]
Then someone commented: "No. Technically the program is fully
conforming.

It isn't.
It is executing the program that gives undefined behaviour."

What does "Technically the program is fully conforming" mean?
Was my wording incorrect?

Perhaps source code cannot have undefined behavior, only executable
code can? I am somewhat confused. Could you please shed some light?

The C program contains undefined behavior.
The source code is the C program.
The rules of C do not require this particular kind of
undefined behavior to be detected by the compiler,
so the compiler is not required to generate any warning.

N869
3.18
[#1] undefined behavior
behavior, upon use of a nonportable or erroneous program
construct, of erroneous data, or of indeterminately valued
objects, for which this International Standard imposes no
requirements
[#2] NOTE Possible undefined behavior ranges from ignoring
the situation completely with unpredictable results, to
behaving during translation or program execution in a
documented manner characteristic of the environment (with or
without the issuance of a diagnostic message), to
terminating a translation or execution (with the issuance of
a diagnostic message).
 
Malcolm

Nudge said:
void function2(int a, int b, int c) {
int exchange;
char buffer[3];
char *bufPt;
buffer[0] = 'A';
buffer[3] = 'B';
buffer[40] = 'B';
bufPt = buffer;
bufPt = bufPt + 1;
}

Note: buffer[3] and buffer[40] are out-of-bounds.

I answered: "[...] I believe 'function2' has undefined behavior.
In other words, the compiler is free to do anything it wants [...]"

Then someone commented: "No. Technically the program is fully
conforming. It is executing the program that gives undefined behaviour."

What does "Technically the program is fully conforming" mean?
Was my wording incorrect?

Perhaps source code cannot have undefined behavior, only executable
code can? I am somewhat confused. Could you please shed some light?
A lot of compilers are pretty stupid, and will simply overwrite the memory
locations that would have corresponded to the right address, had the array
been large enough, with the values given. Needless to say, the results of
doing this are unpredictable. If the memory is not used for anything else,
the program may work "correctly"; if you overwrite the function return
address, you will probably get a crash; if you corrupt other data, you will
get wrong results somewhere else in the program.

The standard just tries to formalise this, by saying that the behaviour is
"undefined".

A clever compiler will detect the out of bounds access, and deliver a
warning. I don't know offhand whether it is allowed to reject the program,
or if it has to compile it to perform the illegal action. It doesn't much
matter, unless you are actually implementing a compiler.

Source code cannot be run, so cannot show any kind of behaviour, defined or
otherwise. A compiler can produce any executable code that does what the
standard says, so if fed a program containing undefined behaviour then
technically it could produce any output whatsoever. In practice the program
would either blindly trash memory or, in a protected environment, terminate
with an error message.
 
jacob navia

Nudge said:
Hello,

Someone posted the following code to a different group:

void function2(int a, int b, int c) {
int exchange;
char buffer[3];
char *bufPt;
buffer[0] = 'A';
buffer[3] = 'B';
buffer[40] = 'B';
bufPt = buffer;
bufPt = bufPt + 1;
}

Using the lcc-win32 compiler you get:
D:\lcc\mc59\test>lcc -c tub.c
Warning tub.c: 6 indexing array buffer[3] out of bounds (3)
Warning tub.c: 7 indexing array buffer[40] out of bounds (3)
0 errors, 2 warnings
 
Keith Thompson

jacob navia said:
Nudge said:
Hello,
Someone posted the following code to a different group:
void function2(int a, int b, int c) {
int exchange;
char buffer[3];
char *bufPt;
buffer[0] = 'A';
buffer[3] = 'B';
buffer[40] = 'B';
bufPt = buffer;
bufPt = bufPt + 1;
}

Using the lcc-win32 compiler you get:
D:\lcc\mc59\test>lcc -c tub.c
Warning tub.c: 6 indexing array buffer[3] out of bounds (3)
Warning tub.c: 7 indexing array buffer[40] out of bounds (3)
0 errors, 2 warnings

Using *any* C compiler, you get undefined behavior. lcc-win32
apparently does some compile-time checking in some cases (I presume it
doesn't do run-time checking). That's a nice feature, but don't let
it lull you into complacency; I doubt that it would be able to issue
the warning if the array size and index value weren't determinable at
compilation time.
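Keith's caveat matters because the index is often only known at run time, where no compile-time check can help. Standard C does no bounds checking of its own, so the test has to be written explicitly. A minimal sketch; `store_at` is a hypothetical helper, not a library function.

```c
#include <stddef.h>

/* Stores value at buf[index] only if index lies in [0, len).
   Returns 1 on success, 0 if the store would be out of bounds.
   size_t is unsigned, so one comparison covers both ends of the range. */
int store_at(char *buf, size_t len, size_t index, char value)
{
    if (index >= len)
        return 0;          /* refuse instead of invoking undefined behavior */
    buf[index] = value;
    return 1;
}
```

A caller would write `store_at(buffer, sizeof buffer, i, 'B')`, taking the size with `sizeof` on the array itself; inside a function that only receives a pointer, `sizeof` no longer gives the array length, so the length has to be passed along.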

(And since Jacob's response is about the behavior of lcc-win32, not
about the C language, it's arguably off-topic -- unless it was meant
as an example of the kind of diagnostics a C compiler can issue.)
 
jacob navia

Keith said:
Using *any* C compiler, you get undefined behavior. lcc-win32
apparently does some compile-time checking in some cases (I presume it
doesn't do run-time checking). That's a nice feature, but don't let
it lull you into complacency; I doubt that it would be able to issue
the warning if the array size and index value weren't determinable at
compilation time.

(And since Jacob's response is about the behavior of lcc-win32, not
about the C language, it's arguably off-topic -- unless it was meant
as an example of the kind of diagnostics a C compiler can issue.)

It proves that it is possible to issue diagnostics in some cases
and that compilers could be more explicit about undefined behavior.

To do it at run time, you should use the vector container (which accepts
the [ ] notation but does check the indexes).

jacob
 
Keith Thompson

jacob navia said:
Keith said:
Using *any* C compiler, you get undefined behavior. lcc-win32
apparently does some compile-time checking in some cases (I presume it
doesn't do run-time checking). That's a nice feature, but don't let
it lull you into complacency; I doubt that it would be able to issue
the warning if the array size and index value weren't determinable at
compilation time.
(And since Jacob's response is about the behavior of lcc-win32, not
about the C language, it's arguably off-topic -- unless it was meant
as an example of the kind of diagnostics a C compiler can issue.)

It proves that it is possible to issue diagnostics in some cases
and that compilers could be more explicit about undefined behavior.

To do it at run time, you should use the vector container (that accepts
the [ ] notation but does check the indexes)

What vector container? Are you talking about C++ (off-topic here),
something specific to lcc-win32 (also off-topic here), or something
else?
 
jacob navia

Containers are data structures that are used to organize storage.
Typical containers are vectors, tables, lists, hash tables, etc.

As you know, those structures are used daily by any serious C programmer.

lcc-win32 supports containers with standard notation

get_element(vector,5)

or using operator overloading

vector[5]

This is surely not blessed by the C standard, but it is used
in other languages (C++, Fortran, C#, etc).
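The kind of run-time index check described here can be written in portable standard C without any extension. A minimal sketch; the `vector` type and the `vector_new`, `set_element`, `get_element` and `vector_free` functions are hypothetical and are not the lcc-win32 API. Standard C has no operator overloading, so only the function-call form is available, and this version reports a bad index through its return value rather than aborting.

```c
#include <stdlib.h>

/* Hypothetical bounds-checked vector of ints (NOT the lcc-win32 API). */
typedef struct {
    size_t count;
    int *data;
} vector;

/* Creates a vector of count zero-initialized ints; returns NULL on failure. */
vector *vector_new(size_t count)
{
    vector *v = malloc(sizeof *v);
    if (v == NULL)
        return NULL;
    v->data = calloc(count, sizeof *v->data);
    if (v->data == NULL && count != 0) {
        free(v);
        return NULL;
    }
    v->count = count;
    return v;
}

/* Stores value at index; returns 0 instead of writing out of bounds. */
int set_element(vector *v, size_t index, int value)
{
    if (index >= v->count)
        return 0;
    v->data[index] = value;
    return 1;
}

/* Copies the element at index into *out; returns 0 if index is out of bounds. */
int get_element(const vector *v, size_t index, int *out)
{
    if (index >= v->count)
        return 0;
    *out = v->data[index];
    return 1;
}

void vector_free(vector *v)
{
    if (v != NULL) {
        free(v->data);
        free(v);
    }
}
```

Typical use is `if (!get_element(v, 5, &x))` followed by whatever error handling the program needs; the check costs one comparison per access.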
 
Flash Gordon

Containers are data structures that are used to organize storage.
Typical containers are vectors, tables, lists, hash tables, etc.

As you know, those structures are used daily by any serious C
programmer.

I've spent complete years without using vectors, tables, lists, hash
tables, etc. I used double buffering and arrays, but nothing involving
any complex data structures.
lcc-win32 supports containers with standard notation

get_element(vector,5)

or using operator overloading

vector[5]

This is surely not blessed by the C standard, but it is used
in other languages (C++, Fortran, C#, etc).

Since it is not part of standard C it is OT here. I'm sure this has been
mentioned to you before.
 
Richard Bos

pete said:
Nudge said:
I answered: "[...] I believe 'function2' has undefined behavior.

It does.
In other words, the compiler is free to do anything it wants [...]"

There are some limitations. See N869, 3.18, [#2]

No, there aren't. As the previous paragraph clearly says, undefined
behaviour is behaviour where the Standard makes _no_ demands. A
subsequent note listing some of the more common results of UB does not
invalidate the essential freedom of the implementation to do anything at
all, whether intentionally or accidentally, following UB.
It isn't.

That's a nonsensical remark. Yes, UB occurs when code is executed - but
in this case, we can predict that the code invoking the UB will _always_
be executed.
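The point that undefined behavior belongs to an execution can be seen by making the bad store conditional. In this sketch (the function `maybe_store` and its `flag` parameter are illustrative), the source contains an out-of-bounds subscript, yet a call with `flag == 0` never executes it and is well defined. In function2 the stores are unconditional, so every call has undefined behavior. A compiler may still warn about the guarded line, since it can see the constant index.

```c
/* The out-of-bounds store is guarded by flag. A call with flag == 0 never
   executes it and is well defined; a call with flag != 0 would have
   undefined behavior, exactly like every call of function2. */
int maybe_store(int flag)
{
    char buffer[3];

    buffer[0] = 'A';
    if (flag)
        buffer[3] = 'B';   /* UB if ever executed: index 3 is out of bounds */
    return buffer[0];
}
```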

Richard
 
jacob navia

Flash said:
I've spent complete years without using vectors, tables, lists, hash
tables, etc. I used double buffering and arrays, but nothing involving
any complex data structures.

Ahhh OK.
My experience is completely different. Hash tables improve
performance of element access, and lists and tables... well, I
just can't conceive programming without them.

Sophisticated data structures are essential to programming.
It is difficult to conceive serious programming without them,
unless you are just interested in making a VGA game display...
and even there, display lists, matrix coordinates, determinants
and surface modelling require complex data structures.

But you can of course get away with it by reducing the quality of
the software.

Can you give me an example of a serious program without
*any* of those elements?

jacob
 
pete

Richard said:
pete said:
Nudge said:
I answered: "[...] I believe 'function2' has undefined behavior.

It does.
In other words, the compiler is free to do anything it wants [...]"

There are some limitations. See N869, 3.18, [#2]

No, there aren't. As the previous paragraph clearly says, undefined
behaviour is behaviour where the Standard makes _no_ demands. A
subsequent note listing some of the more common results of UB does not
invalidate the essential freedom of the implementation to do anything
at all, whether intentionally or accidentally, following UB.

I'm of the opinion, from reading the note,
that if compilation is terminated,
then a diagnostic message must be issued.


[#2] NOTE Possible undefined behavior ranges from ignoring
the situation completely with unpredictable results, to
behaving during translation or program execution in a
documented manner characteristic of the environment (with or
without the issuance of a diagnostic message), to
terminating a translation or execution (with the issuance of
a diagnostic message).
 
CBFalconer

jacob said:
Flash Gordon wrote:
.... snip ...

Ahhh OK.
My experience is completely different. Hash tables improve
performance of element access, and lists and tables... well, I
just can't conceive programming without them.

Sophisticated data structures are essential to programming.
It is difficult to conceive serious programming without them,
unless you are just interested in making a VGA game display...
and even there, display lists, matrix coordinates, determinants
and surface modelling require complex data structures.

You are perfectly free to use them here, after you or someone else
builds them. These things do not exist in ISO standard C, as
discussed here, and are thus off-topic as language entities.
However there is nothing wrong with discussing use of defined
routines, [1] eg:

void *hshinsert(hshtbl *h, void *item);

with suitable description of the action of hshinsert, and the
definition of hshtbl*. You can even discuss the code that
implements such, which code (in portable standard C) should be
available for the usage discussions. Without that nobody can
adequately criticize the usage code.

[1] That particular routine is part of hashlib, freely available
at: <http://cbfalconer.home.att.net/download/hashlib.zip>
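For readers who have not written one, a hash table of the general kind hshinsert manages fits in a few dozen lines of portable C. This is a minimal sketch and not hashlib's code: the names `htable`, `ht_insert`, `ht_find` and `ht_free`, the fixed bucket count, the string keys and the int values are all assumptions made to keep it short. Collisions are handled by separate chaining.

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64            /* fixed size; a real table would grow */

struct node {
    const char *key;           /* not copied: the caller keeps it alive */
    int value;
    struct node *next;         /* chain of entries sharing a bucket */
};

typedef struct {
    struct node *bucket[NBUCKETS];
} htable;

/* djb2-style string hash: h = h * 33 + c. */
static unsigned long hash_str(const char *s)
{
    unsigned long h = 5381;
    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Returns the node holding key, or NULL if key is not present. */
static struct node *ht_lookup(const htable *t, const char *key)
{
    struct node *n;
    for (n = t->bucket[hash_str(key) % NBUCKETS]; n != NULL; n = n->next)
        if (strcmp(n->key, key) == 0)
            return n;
    return NULL;
}

/* Inserts key, or updates its value if already present.
   Returns 1 on success, 0 if memory ran out. */
int ht_insert(htable *t, const char *key, int value)
{
    struct node *n = ht_lookup(t, key);
    if (n == NULL) {
        size_t i = hash_str(key) % NBUCKETS;
        n = malloc(sizeof *n);
        if (n == NULL)
            return 0;
        n->key = key;
        n->next = t->bucket[i];
        t->bucket[i] = n;
    }
    n->value = value;
    return 1;
}

/* Returns a pointer to the stored value, or NULL if key is absent. */
int *ht_find(htable *t, const char *key)
{
    struct node *n = ht_lookup(t, key);
    return n != NULL ? &n->value : NULL;
}

/* Frees every node (but not the keys, which the caller owns). */
void ht_free(htable *t)
{
    size_t i;
    for (i = 0; i < NBUCKETS; i++) {
        struct node *n = t->bucket[i];
        while (n != NULL) {
            struct node *next = n->next;
            free(n);
            n = next;
        }
        t->bucket[i] = NULL;
    }
}
```

A table is started empty with `htable t = {{0}};`, which sets every bucket pointer to null.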
 
Keith Thompson

jacob navia said:
Containers are data structures that are used to organize
storage. Typical containers are vectors, tables, lists, hash tables,
etc.

As you know, those structures are used daily by any serious C programmer.

I wasn't asking what a container is; I was asking what "vector
container" you were referring to.
lcc-win32 supports containers with standard notation

get_element(vector,5)

or using operator overloading

vector[5]

This is surely not blessed by the C standard, but it is used
in other languages (C++, Fortran, C#, etc).

None of those other languages are topical in comp.lang.c. If
lcc-win32 supports operator overloading, it's compiling a language
that is not C, though it may or may not be a superset of C.

If someone asks about out-of-bounds array access in comp.lang.c,
pointing them to an lcc-win32-specific language extension is no more
or less appropriate than pointing them to C++ vectors.

If you're going to advertise lcc-win32 here, could you at least add an
"[OT]" marker to the subject?
 
Mark McIntyre

Containers are data structures that are used to organize storage.
Typical containers are vectors, tables, lists, hash tables, etc.

I /think/ all your readers know this. Do not presume to teach your granny
to suck eggs...
As you know, those structures are used daily by any serious C programmer.

Erm? I've programmed C for nigh on 15 years now and rarely use
complications of the type you describe. Are we talking the same language?
This is surely not blessed by the C standard, but it is used
in other languages (C++, Fortran, C#, etc).

In those other languages, I surely do use such flummery though...
 
Mark McIntyre

Richard said:
pete said:
Nudge wrote:
I answered: "[...] I believe 'function2' has undefined behavior.

It does.

In other words, the compiler is free to do anything it wants [...]"

There are some limitations. See N869, 3.18, [#2]

This must be a completely different section in the actual standard. 3.18
is a definition of the symbol used to denote the least integer greater than
or equal to some real. :)
I'm of the opinion, from reading the note,
that if compilation is terminated,
then a diagnostic message must be issued.

The note I believe you're referring to, 3.4.3 (#2), lists some of the
possible outcomes of UB. It's not intended to be a complete list of the
outcomes, as indeed 3.4.3 (#1) makes clear with the words "for which this
International Standard imposes no requirements."

By the way, a diagnostic must be emitted if the compiler encounters certain
classes of errors (a syntax error or a constraint violation). UB isn't
necessarily either of these. See 5.1.1.3 in the ISO standard for
details.
 
Flash Gordon

Ahhh OK.
My experience is completely different. Hash tables improve
performance of element access,

Only if you are having to look things up by name. I happen to know, for
example, that all grey levels between 0 and 255 in a histogram of the
number of pixels at each grey level can be indexed simply by the grey
level. No need for hashing.
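The histogram case in code: the grey level itself is the array index, so counting is one increment per pixel with no hashing or searching. A minimal sketch, assuming the image is a flat array of `unsigned char` pixels; `build_histogram` and `LEVELS` are hypothetical names. Sizing the table as `UCHAR_MAX + 1` (256 when `CHAR_BIT` is 8) guarantees every possible pixel value is a valid subscript, which keeps the code clear of the out-of-bounds problem discussed above.

```c
#include <limits.h>
#include <stddef.h>

/* One bin per possible unsigned char value: 256 when CHAR_BIT == 8. */
#define LEVELS (UCHAR_MAX + 1)

/* Counts how many pixels have each grey level. */
void build_histogram(const unsigned char *pixels, size_t npixels,
                     unsigned long hist[LEVELS])
{
    size_t i;
    for (i = 0; i < LEVELS; i++)
        hist[i] = 0;
    for (i = 0; i < npixels; i++)
        hist[pixels[i]]++;        /* grey level used directly as the index */
}
```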

Similarly with most of the comms I used to deal with: messages were
placed in defined positions by other SW, so my software looked at the
position of message type 28 if it wanted to look at message type 28. No
complex lookup involved.
and lists and tables... well, I
just can't conceive programming without them.

Sophisticated data structures are essential to programming.
It is difficult to conceive serious programming without them,
unless you are just interested in making a VGA game display...
and even there, display lists, matrix coordinates, determinants
and surface modelling require complex data structures.

But you can of course get away with it by reducing the quality of
the software.

Can you give me an example of a serious program without
*any* of those elements?

Some parts of one major piece of SW included:

Working out the ideal gain and offset to apply to imagery from a
detector based on a histogram of grey levels extending a bit beyond the
displayed range.

Calculating the position of the horizon in the imagery based on the
aircraft attitude (the starting data is not a vector in some convenient
coordinate system) which is just running through a series of equations.
Also, it does not involve doing the same thing to any two items in the
data describing the aircraft attitude, so no form of vector operator
would help.

Some simple comms work shifting data around.

A number of other tasks also not involving complex data structures.

I didn't need any complex data structures for any of this. The most
complex things were some simple structures and some simple arrays.

I don't have access to this stuff any more, so I can't go into any real
detail.
 
Peter Shaggy Haywood

Groovy hepcat Malcolm was jivin' on Sat, 18 Sep 2004 17:40:40 +0100 in
comp.lang.c.
Re: Out-of-bounds access and UB's a cool scene! Dig it!
Nudge said:
void function2(int a, int b, int c) {
int exchange;
char buffer[3];
char *bufPt;
buffer[0] = 'A';
buffer[3] = 'B';
buffer[40] = 'B';
bufPt = buffer;
bufPt = bufPt + 1;
}

Note: buffer[3] and buffer[40] are out-of-bounds.

I answered: "[...] I believe 'function2' has undefined behavior.
In other words, the compiler is free to do anything it wants [...]"

Then someone commented: "No. Technically the program is fully
conforming. It is executing the program that gives undefined behaviour."

What does "Technically the program is fully conforming" mean?
Was my wording incorrect?

Perhaps source code cannot have undefined behavior, only executable
code can? I am somewhat confused. Could you please shed some light?
A lot of compilers are pretty stupid, and will simply overwrite the memory
locations that would have corresponded to the right address, had the array
been large enough, with the values given. Needless to say, the results of
doing this are unpredictable. If the memory is not used for anything else,
the program may work "correctly"; if you overwrite the function return
address, you will probably get a crash; if you corrupt other data, you will
get wrong results somewhere else in the program.

The standard just tries to formalise this, by saying that the behaviour is
"undefined".

A clever compiler will detect the out of bounds access, and deliver a
warning. I don't know offhand whether it is allowed to reject the program,
or if it has to compile it to perform the illegal action. It doesn't much
matter, unless you are actually implementing a compiler.

Indeed it is allowed to reject the program. Undefined behaviour
means that anything is allowed.

Source code cannot be run, so cannot show any kind of behaviour, defined or otherwise.

Not so. Though C is usually a compiled language, there could be C
interpreters. So C source code could be run on a C interpreter.
Besides, C source code has behaviour when it is being compiled (or
interpreted).

--

Dig the even newer still, yet more improved, sig!

http://alphalink.com.au/~phaywood/
"Ain't I'm a dog?" - Ronny Self, Ain't I'm a Dog, written by G. Sherry & W. Walker.
I know it's not "technically correct" English; but since when was rock & roll "technically correct"?
 
jacob navia

Mark said:
Containers are data structures that are used to organize storage.
Typical containers are vectors, tables, lists, hash tables, etc. [snip]
As you know, those structures are used daily by any serious C programmer.


Erm? I've programmed C for nigh on 15 years now and rarely use
complications of the type you describe. Are we talking the same language?

No, I am sorry. We use different languages. Like your friend "Flash
Gordon" in this same thread, who says:

"I've spent complete years without using vectors, tables, lists, hash
tables, etc. I used double buffering and arrays, but nothing involving
any complex data structures."

You use a subset of C tailored for people who do not want to use
their brains much.

C is for dummies, C++ is for real programming, as everybody
should know by now. C should be kept in the basement and made
obsolete as fast as possible. Any discussion of real programming
and the real programming needs of people in 2004 should disappear
in favor of endless discussions about whether or not to use
scanf/fgets, etc.
 
Michael Mair

Hi there,

Containers are data structures that are used to organize storage.
Typical containers are vectors, tables, lists, hash tables, etc.
[snip]
As you know, those structures are used daily by any serious C
programmer.

Why is it that I see in my mind people dressed in their best suits and
with grave expressions on their faces... *g*
Seriously, what a "serious programmer" does or does not use daily is not
necessarily yours to say. I certainly do some serious programming but
apart from lists I seldom use anything of the above just because I do
not need them.

No, I am sorry. We use different languages. Like your friend "Flash
Gordon" in this same thread, who says:

"I've spent complete years without using vectors, tables, lists, hash
tables, etc. I used double buffering and arrays, but nothing involving
any complex data structures."

You use a subset of C tailored for people who do not want to use
their brains much.

C is for dummies, C++ is for real programming, as everybody
should know by now. C should be kept in the basement and made
obsolete as fast as possible. Any discussion of real programming
and the real programming needs of people in 2004 should disappear
in favor of endless discussions about whether or not to use
scanf/fgets, etc.

You know, after the usual trouble with pointers, sign errors and stuff
like forgotten dependencies in old makefiles the next likely source of
trouble in my case is handling file I/O.
That may sound strange but as I am stuck with a huge library and
application written in C which should be able to follow the "official"
CVS tree in some respects and should run on a variety of systems
including some clusters, there is not so much choice as to how to do
things.

If you have a problem with some regulars and are _very_ sure that it
is not only because each of you finds the other's view too narrow,
avoid them. Continued statements along the lines of "C is for
dummies" just result in your being seen as a troll and/or plonked.

Apart from that, I saw somewhere (maybe at some URL posted here)
results of a poll among C++ programmers which essentially said that
the vast majority of them think themselves better (C++) programmers
than the vast majority... So, one thing's to be said for C:
It teaches more humility ;-)


--Michael