Out-of-bounds nonsense

  • Thread starter Frederick Gotham
Old Wolf

Frederick said:
Andrey Tarasevich:


Yes. I like control. I _love_ control. That's why I opt for _proper_
programming languages like C and C++, and not mickey-mouse
languages like Java.

What is mickey-mouse about Java ?

I, for one, would find such a pointer very useful for debugging.

Currently my compiler includes a tool that will warn when I
step outside the bounds of an allocated block of memory.
But that won't help with code like:

struct S {
int x[4];
int y[4];
};
struct S s;

if I accidentally access s.x[5] . Which, I should add, I would
consider a bug (some of you would consider it a feature,
apparently).
So how do we get our hands on a "Range-liberal pointer", a pointer without
armbands? Must we have an intermediate cast to something like a void* or a
char* in order to liberate the pointer from its range restriction?

No, casts don't affect the pointer range. What do you mean by
"range-liberal pointer" ? The C standard is quite clear that you
cannot portably point outside the bounds of an object. The only
thing we are debating here is whether it is OK if the pointer
leaves the bounds of the object it was pointing to, but it is
still within the bounds of an object of which the original object
is a sub-object.

It is not a part of C that you can use pointer arithmetic on
a pointer to move it around any part of some flat address
space you might imagine.
int *const p = (int*)(char unsigned*)&arr;

The second cast is unnecessary; the expression "&arr" implies
a range of anywhere inside the object designated by "arr",
(and one-after-the-end of course).
 
Richard Heathfield

Flash Gordon said:
Joe Wright wrote:

If I had..
int a[2][2];
.. and wanted to treat it cavalierly, I might use..
int *p = (int*)a;
..and then treat p[0]..p[3]. That's legal isn't it?

I believe that would be perfectly legal because a (as opposed to a[0])
decays to a pointer to the start of entire array of arrays

Well, a actually decays to &a[0], which is a pointer to the first element in
a, i.e. it is a pointer to an array of two int, and it has type int (*)[2].

Can I just ask that people not use a as an identifier in Usenet discussions?
One advantage of foo and bar is that they are trivial for the eye to
distinguish from indefinite articles without having to engage the conscious
brain.
 
pemo

Frederick said:
[ This post deals with both C and C++, but does not alienate either
language because the language feature being discussed is common to
both languages. ]

Over on comp.lang.c, we've been discussing the accessing of array
elements via subscript indices which may appear to be out of range.
In particular, accesses similar to the following:

<snip>

I've been following /some/ of this discussion, and I must admit that I was
surprised by the standard's J.2 quote [An array subscript is out of range,
even if an object is apparently accessible].

So, I wonder if I could ask whether I've got this right - the first loop
below has UB, the second is fine, and the third is also ok according to
6.5.6 (8)?

#include <stdio.h>

int main(void)
{
    int q[4][3][2] =
    {
        {
            {1,},
        },
        {
            {2, 3},
        },
        {
            {4, 5},
            {6},
        },
    };

    int n;
    int i;
    int j;

    int * p;

    /* UB */
    for(n = 0; n < 24; ++n)
    {
        printf("%d ", q[0][0][n]);
    }

    puts("");

    /* OK */
    for(n = 0; n < 4; ++n)
    {
        for(i = 0; i < 3; ++i)
        {
            for(j = 0; j < 2; ++j)
            {
                printf("%d ", q[n][i][j]);
            }
        }
    }

    puts("");

    /* ?? from 6.5.6 - 8 */
    for(n = 0, p = (int *)&q[0]; n < 24; ++n)
    {
        printf("%d ", *(p + n));
    }

    return 0;
}
 
Frederick Gotham

Flash Gordon:
Wrong. As has been pointed out already look at the discussions on the
struct hack, also look at the defect report about it and the
justification for C99 including an officially sanctioned method for
solving the problem the struct hack is used to deal with.


If memory is mine to play with, I'll play with it however I like.
 
Richard Heathfield

Frederick Gotham said:
Flash Gordon:



If memory is mine to play with, I'll play with it however I like.

And C implementations often give you the power to do that, provided you are
prepared to pay the cost - i.e. that your program is not guaranteed to work
on /other/ C implementations.

Since memory is mine to play with, I can play with it like this:

void print(const char *s, int x, int y, unsigned char fg, unsigned char bg)
{
    unsigned char attr = (bg << 4) | fg;
    unsigned char *p = (unsigned char *)0xb8000000UL + 160 * y + 2 * x;
    while(*s)
    {
        *p++ = *s++;
        *p++ = attr;
    }
}

and, provided my implementation plays ball, and provided I have a monitor
that supports, and is currently in, 80-column text mode, I have a
moderately fast way to write to the screen. But if it doesn't or I don't,
all I have is a moderately fast way to crash the program, or possibly the
entire machine.

Sure as eggs is eggs, the above code works just fine on my MS-DOS machine.
So can I complain when the mainframe barfs on it? Well, yeah, I can, but
the complaint is groundless, because I stepped outside the bounds of the
Standard. So if it works anyway, fabulous, but if it doesn't, that's my
problem, not ISO's or the implementor's.
 
Flash Gordon

Frederick said:
Flash Gordon:


If memory is mine to play with, I'll play with it however I like.

Well, don't go to India and try to get a job with the company my
employer has outsourced its development to.
 
Frederick Gotham

Richard Heathfield:
And C implementations often give you the power to do that, provided you
are prepared to pay the cost - i.e. that your program is not guaranteed
to work on /other/ C implementations.


In saying that my memory is mine to play with, I'm saying:

If I allocate some memory for my own use, be it via static duration
objects, automatic objects, or via malloc, then I can do whatever I like
with that memory. That's the way C is supposed to be, right?

struct { int a; int b; int c; } obj;

int *p = (int*)&obj;

*p++ = 1;
*p++ = 2;
*p++ = 3;

if (sizeof obj >= 4*sizeof*p) *p++ = 4;
if (sizeof obj >= 5*sizeof*p) *p++ = 5;
if (sizeof obj >= 6*sizeof*p) *p++ = 6;
if (sizeof obj >= 7*sizeof*p) *p++ = 7;


Do you think the behaviour of the above code is undefined? Sure, it might
not write to a, b, and c respectively. And sure, it might write to padding
bytes... but it is still perfectly OK.

The following code should always be OK:

SomeType1 obj1;
SomeType2 obj2;

memcpy(&obj1,&obj2,sizeof obj1);

Sure, you'll probably end up with gibberish, but the code is perfectly OK.
 
Flash Gordon

Richard said:
Flash Gordon said:
Joe Wright wrote:

If I had..
int a[2][2];
.. and wanted to treat it cavalierly, I might use..
int *p = (int*)a;
..and then treat p[0]..p[3]. That's legal isn't it?
I believe that would be perfectly legal because a (as opposed to a[0])
decays to a pointer to the start of entire array of arrays

Well, a actually decays to &a[0], which is a pointer to the first element in
a, i.e. it is a pointer to an array of two int, and it has type int (*)[2].

You are quite correct, I was sloppy. It decays as you specified, but you
are allowed to use it to access all of the memory allocated by the 2D
array declaration. So, taking into account what you say below, I
believe that:
int foo[2][2];
int *ptr = (int*)foo;
ptr[3];
is valid since:

the pointer that foo decays to is guaranteed to be correctly aligned

the pointer that foo decays to points to a 2 element array where each
element is of type int[2] so I'm not exceeding the bounds allowed.
Can I just ask that people not use a as an identifier in Usenet discussions?
One advantage of foo and bar is that they are trivial for the eye to
distinguish from indefinite articles without having to engage the conscious
brain.

Agreed. I just followed on what others had done without thinking, which
was bad of me.
 
Richard Heathfield

Frederick Gotham said:
Richard Heathfield:



In saying that my memory is mine to play with, I'm saying:

If I allocate some memory for my own use, be it via static duration
objects, automatic objects, or via malloc, then I can do whatever I like
with that memory.

....within the requirements of the Standard. (If you choose to violate those
requirements for your own reasons, that's fine, but at that point the C
Standard no longer defines the behaviour of your program, and we lose the
common ground essential to discussion.)
That's the way C is supposed to be, right?

The way you think C is supposed to be is not necessarily in line with the
way ISO think it is supposed to be.

<weird code snipped>
 
Roberto Waltman

Frederick said:
...
The following code should always be OK:

SomeType1 obj1;
SomeType2 obj2;

memcpy(&obj1,&obj2,sizeof obj1);

Sure, you'll probably end up with gibberish, but the code is perfectly OK.

Not so:

/* in a system where sizeof long is 4 */
long obj1;
char obj2;
/* UB: attempts to copy 3 chars past obj2 */
memcpy(&obj1,&obj2,sizeof obj1);


Roberto Waltman
 
Charlton Wilbur

Frederick Gotham said:
In saying that my memory is mine to play with, I'm saying:

If I allocate some memory for my own use, be it via static duration
objects, automatic objects, or via malloc, then I can do whatever I like
with that memory. That's the way C is supposed to be, right?

No. That's the way C often is, but it's not guaranteed by the
standard and is likely to fail when you leave your cozy "all the world's
a Vax^WIntel x86 processor" enclave.

Do the wrong thing with your memory, and the computer crashes. How do
you know what the wrong thing is? The standard tells you.

Charlton
 
Frederick Gotham

Roberto Waltman:
Not so:

/* in a system where sizeof long is 4 */
long obj1;
char obj2;
/* UB: attempts to copy 3 chars past obj2 */
memcpy(&obj1,&obj2,sizeof obj1);


Of course, you're right.

Type1 obj1;
Type2 obj2;

if (sizeof obj2 >= sizeof obj1)
memcpy(&obj1,&obj2,sizeof obj1);
 
Richard Heathfield

Charlton Wilbur said:
Do the wrong thing with your memory, and the computer crashes.

Might crash. Or might do something more - um - creative. Or might do what
you expected. Or might do what you expected *and* something creative with
which to surprise you later on. Isn't programming exciting!?
How do
you know what the wrong thing is? The standard tells you.

Quite so.
 
Charlton Wilbur

Richard Heathfield said:
Charlton Wilbur said:


Might crash. Or might do something more - um - creative. Or might do what
you expected. Or might do what you expected *and* something creative with
which to surprise you later on. Isn't programming exciting!?

True. I learned C on MIPS machines with memory protection, and so
when I did the wrong thing with my memory, the program obligingly
crashed. As annoying as it was at the time, this was a great
pedagogical help.

Then I started working on Linux at home and MIPS or Alpha at school,
and learned what "portable" really meant -- and that was just among
different varieties of Unix.

Charlton
 
Clever Monkey

Frederick said:
Thanks Andrey, I've finally gotten the response I was looking for.

Andrey Tarasevich:


Yes. I like control. I _love_ control. That's why I opt for _proper_
programming languages like C and C++, and not mickey-mouse languages like
Java.
Oh please. Prefer one language over another all you like (heck, I
do), but blanket statements like this suggest you don't actually have
any significant experience with Java.

I hesitate to call any language a "Mickey-Mouse" language, including
venerable BASIC or Logo. Maybe Brainfuck or TRACEY can be considered
so, but the inventors of those languages are dead serious about what the
language is _for_, so even then I'm not so sure.

Having done more than my share of "porting" "portable" programs written
in C and C++, I'll take Java any day of the week for the sorts of
enterprise client-server development I do for a living.

C is a wonderful language exquisitely suitable for a great number of
purposes, but there is no way I want to maintain an internationalized,
multi-platform client-server app that requires (among other things)
robust UTF string handling, true exception handling and an easy way to
deliver fixes in the field. Been there, done that. Got the t-shirt.

Yes, all these things can be added to and approximated with C, but why
reinvent the wheel? Java is not just a language. It is a programming
environment for medium-to-large scale, multi-tiered computing. We get
paid to implement features and scale up to larger and larger iron.

Java allows that out of the box, without having to do stuff that
*doesn't* pay, like inventing specialized libraries or creating a
middle-tier. For this kind of computing no one cares whether or not a
multi-dimensional array decays to a pointer to contiguous memory or not.
It's just not important.

It's all about the right tool for the job. I like C. I'm the C guy
around here. But there is no way my company would be doing half the
business we are doing without Java (or some other similar toolset).

There are poorly written programs in any language, and there are poor
uses of a language. Sometimes, if you are extremely unlucky and the
gods hate you, there is a combination of the two.
 
Mark McIntyre

Mark McIntyre:
, the type of the object "arr" is: int[2][2]

No, that's its C declaration. Its type is array [2] of array [2] of
int.


Its type is int[2][2].

*sigh*.

No, that's its declaration.
Its type is array two of array two of int.

If you don't realise the difference, you need to go back to basics.
--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
 
Mark McIntyre

x and y aren't *necessarily* contiguous; there could be a gap between
them. In the array case being discussed, the representation is
specified by the standard, and there can be no gap.

Yeah, I was going to chuck in "assume word-aligned memory and no
struct packing" but couldn't be donkeyed.
--
Mark McIntyre
 
Herbert Rosenau

Roberto Waltman:



Of course, you're right.

Type1 obj1;
Type2 obj2;

if (sizeof obj2 >= sizeof obj1)
memcpy(&obj1,&obj2,sizeof obj1);
May result in undefined behavior when Type1 != Type2, as the
representations of different types are not required to have the same
padding bits and/or alignment requirements. memcpy can land in
undefined behavior here. Accessing obj1 thereafter can end in anything
and may not do what you think it should do.

--
Tschau/Bye
Herbert

Visit http://www.ecomstation.de the home of German eComStation
eComStation 1.2 in German is here!
 
Flash Gordon

Herbert said:
May result in undefined behavior when Type1 != Type2, as the
representations of different types are not required to have the same
padding bits and/or alignment requirements. memcpy can land in
undefined behavior here.

No, memcpy is required to treat the data as unsigned char so there are
no alignment issues, no padding and no trap representations. At least,
not during the memcpy.
Accessing obj1 thereafter can end in anything and may not do what you
think it should do.

That is indeed where you get problems of trap representations.
 
