Trap representations

Spiros Bousbouras

If we do a = b and b contains a trap representation, can someone
give me a real-world example, preferably mentioning specific
hardware, of what can go wrong? Is there any possibility for b
to contain the trap representation when no undefined behavior
has occurred earlier in the programme? The fact that the standard
says

Thus, an automatic variable can be initialized to a
trap representation without causing undefined behavior,
but the value of the variable cannot be used until a proper
value is stored in it.

makes me think that yes but I don't see how.

Par. 5 of 6.2.6.1 says

If the stored value of an object has such a representation
and is read by an lvalue expression that does not have
character type, the behavior is undefined.

Shouldn't that be "unsigned character type" ?
 
Peter Nilsson

Spiros Bousbouras said:
If we do a = b and b contains a trap representation,
can someone give me a real-world example, preferably
mentioning specific hardware, of what can go wrong?

Have you heard of signalling NaNs in floating point?
They're not integers, but basically the same thing
could happen.
Is there any possibility for b to contain the trap
representation when no undefined behavior has
occurred earlier in the programme? The fact that the
standard says

     Thus, an automatic variable can be initialized
to a trap representation without causing
undefined behavior, but the value of the
variable cannot be used until a proper value
is stored in it.

makes me think that yes but I don't see how.

void foo(void)
{
    int a;     /* no UB so far */
    int b = a; /* boom */
}

Par. 5 of 6.2.6.1 says

     If the stored value of an object has such a
representation and is read by an lvalue
expression that does not have character
type, the behavior is undefined.

Shouldn't that be "unsigned character type" ?

IMHO, yes. But the Committee wants to allow signed
character types to potentially have padding bits.
Why? Probably hysterical reasons. [Note: C++ does
not allow character types to have padding bits.]
 
user923005

If we do a = b and b contains a trap representation, can someone
give me a real-world example, preferably mentioning specific
hardware, of what can go wrong?

Anything can go wrong. The behavior is undefined. Something specific
that could happen is that Scott Nudds could fly out of your left
nostril. This happens on the DeathStation 9000.
I have used compilers which will throw an exception when they
encounter an uninitialized variable during program execution in debug
mode (I don't know how they accomplish this).
Another compiler might not behave that way.

An obvious example of the above behavior that happens on several
different hardware types is for uninitialized pointers. We have an
Itanium OpenVMS machine that has address calculation tests in the
compiler and it will (for instance) dump core if you access any
invalid address range.
Is there any possibility for b
to contain the trap representation when no undefined behavior
has occurred earlier in the programme? The fact that the standard
says

     Thus, an automatic variable can be initialized to a
     trap representation without causing undefined behavior,
     but the value of the variable cannot be used until a proper
     value is stored in it.

makes me think that yes but I don't see how.

The hardware throws an exception and the exception is unhandled. If a
data object is declared but not initialized it might contain anything,
including a trap representation. That is not surprising, because it
quite likely contains a random sequence of bits "in real life."
Par. 5 of 6.2.6.1 says

     If the stored value of an object has such a representation
     and is read by an lvalue expression that does not have
     character type, the behavior is undefined.

Shouldn't that be "unsigned character type" ?

It should. But that is a subset of the above, so strictly speaking I
would call it correct but incomplete.
 
gw7rib

Have you heard of signalling NaNs in floating point?
They're not integers, but basically the same thing
could happen.




  void foo(void)
  {
    int a;     /* no UB so far */
    int b = a; /* boom */
  }

I think you're missing the OP's point. If it's OK for a to have
whatever value it has, why is it problematic for b to also have that
value?

Of course, you could deliberately build a system which met the C
standard and nevertheless did undesirable things given the above code.
But would a non-maliciously-designed system give problems?
 
user923005

I think you're missing the OP's point. If it's OK for a to have
whatever value it has, why is it problematic for b to also have that
value?

It isn't if it is uninitialized. It is if you try to use it for
anything or even do this:
void foo(void)
{
    int a; /* no UB so far */
    a;
}

Of course, you could deliberately build a system which met the C
standard and nevertheless did undesirable things given the above code.
But would a non-maliciously-designed system give problems?

Yes. In fact, I would say that such a system is VASTLY SUPERIOR to
one that does not behave in this way for obvious reasons.
I wish that all machines had this capability and that every
uninitialized variable DID hold a trap representation on purpose.
The result would be systems that are much easier to verify for
correctness.
 
lawrence.jones

Spiros Bousbouras said:
If we do a = b and b contains a trap representation, can someone
give me a real-world example, preferably mentioning specific
hardware, of what can go wrong?

The most likely symptom is some kind of machine exception (fault, trap,
interrupt, whatever you want to call it). The most common example is
floating point data formats where some bit patterns don't represent
valid floating point numbers. Loading such a bit pattern into a
floating point register (which is one way to implement an assignment)
can cause an exception.
Is there any possibility for b
to contain the trap representation when no undefined behavior
has occurred earlier in the programme?

Yes. One way is if b is uninitialized (in which case it could contain
*anything*). Another way would be to have used memcpy() to copy an
invalid bit pattern into b.
 
jameskuyper

I think you're missing the OP's point. If it's OK for a to have
whatever value it has, why is it problematic for b to also have that
value?

You're assuming that the bit pattern in a actually represents a value.
The C standard permits the possibility that it does not. For instance,
the bit pattern might be the one which would otherwise represent a
negative zero; an implementation is permitted to treat that as a trap
representation. This might actually be necessary, if the hardware
treats negative zeros badly.
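
To make the negative-zero case concrete, here is a minimal sketch that models 16-bit ones' complement and sign-magnitude formats on paper. It does not query the real representation of int on whatever machine compiles it, and every function name in it is hypothetical, made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers: they model 16-bit signed formats and say nothing
   about the int representation of the machine running this code. */

/* Ones' complement: negative zero is the inverse of +0, i.e. all bits set. */
bool is_negative_zero_ones_complement(uint16_t bits)
{
    return bits == 0xFFFFu;
}

/* Sign-magnitude: negative zero is the sign bit set with a zero magnitude. */
bool is_negative_zero_sign_magnitude(uint16_t bits)
{
    return bits == 0x8000u;
}

/* Decode a ones' complement pattern. The modelled implementation declares
   negative zero to be a trap representation, so that pattern has no value
   and the function reports failure instead of storing anything. */
bool decode_ones_complement(uint16_t bits, int32_t *value)
{
    assert(value != NULL);
    if (is_negative_zero_ones_complement(bits))
        return false;
    if (bits & 0x8000u)
        *value = -(int32_t)(uint16_t)~bits;  /* invert to get the magnitude */
    else
        *value = (int32_t)bits;
    return true;
}
```

Every other pattern decodes to a number; only the one pattern the implementation has disowned yields nothing, which is all "trap representation" means in this model.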
 
jameskuyper

Spiros said:
If we do a = b and b contains a trap representation, can someone
give me a real-world example, preferably mentioning specific
hardware, of what can go wrong?

I'm not qualified to answer that, at least not in terms of specific
hardware and machine code instructions. I hope that someone else will
be able to do so.
.... Is there any possibility for b
to contain the trap representation when no undefined behavior
has occurred earlier in the programme?

Yes, plenty of ways. Each comment in the code below identifies the
name of a variable, and cites the section of the standard that
specifies that the value of that variable is indeterminate at that
point in the program. An indeterminate value may, among other things,
be a trap representation. Assuming that no unchecked error conditions
occur, this code should have well defined behavior (unless I've made a
mistake somewhere, which is certainly quite possible), despite the
fact that all these trap representations might have been created.

#include <setjmp.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#define INTS 4096

int *trap_representation( int count, ...)
{
    int b;
    // b: 6.7.8p10
    b = 1;
    *(char*)&b = 2;
    // b: 6.2.6.1p5
    b = 3;

    jmp_buf env;
    if(setjmp(env))
    {
        // b: 7.13.2.1p3
        va_list ap;
        va_start(ap, count);
        vprintf("%d\n", ap);
        // ap: 7.15p3
        va_end(ap);

        return &b;
    }

    FILE * file = fopen("dummy.dat", "r");
    if(file)
    {

        fclose(file);
        // file: 7.19.3p4
    }

    int *buf = malloc(INTS*sizeof *buf);
    if(buf)
    {
        // buf[0]: 7.20.3.3p2
        int *temp = realloc(buf, 2*INTS*sizeof *temp);

        if(temp)
        {
            // temp[INTS]: 7.20.3.4p2
            buf = temp;
        }
        free(buf);
        // buf: 6.2.4p2
    }

    longjmp(env, 4);
}

int main(void)
{
    trap_representation(5, 6);
    // the value returned by trap_representation: (6.2.4p2)
    return 0;
}

Par. 5 of 6.2.6.1 says

If the stored value of an object has such a representation
and is read by an lvalue expression that does not have
character type, the behavior is undefined.

Shouldn't that be "unsigned character type" ?

I believe that it used to be so restricted, but was deliberately
changed to provide broader guarantees.
 
Ben Bacarisse

I think you're missing the OP's point. If it's OK for a to have
whatever value it has, why is it problematic for b to also have that
value?

I think that is the OP's point but we must wait and see.

As I understand it, objects in C don't have values per se. They only
have a value when the bits are read and interpreted as being the
representation of a value of some type or other.

Looked at this way, it makes no sense to wonder why "it is OK for a
to have whatever value it has". The object a has lots of potential
values and some of them might be problematic. Accesses that treat the
object as an array of unsigned char are always OK, but access it as an
int and you might get a "boom". The classic example of this is a
union.

By writing:

int a;
int b;
memcpy(&b, &a, sizeof a);

you can copy what is in "a" into "b" without ever triggering a trap.
Thus "b" can be given the same bits as "a" even when the value
(interpreted as an int) is a trap representation.
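
A fuller sketch of the same idea, using float because that is where a problematic pattern is easiest to come by on current hardware. It assumes float is a 32-bit IEEE 754 type and that 0x7FA00000 is a signalling NaN pattern, which holds on most platforms but is not guaranteed by the C standard. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is IEEE 754 binary32 and this pattern is a signalling
   NaN, as on most current platforms. */
#define SNAN_BITS UINT32_C(0x7FA00000)

_Static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes a 32-bit float");

/* Copy the bytes of *src into *dst. Neither object is ever read through an
   lvalue of type float, so no floating-point load takes place. */
void copy_float_representation(float *dst, const float *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof *dst);
}

/* True if the two objects hold identical bytes. */
bool same_representation(const float *x, const float *y)
{
    return memcmp(x, y, sizeof *x) == 0;
}

/* Plant the pattern in a, copy it to b, and report whether the bytes match. */
bool demo_copy_without_reading(void)
{
    uint32_t bits = SNAN_BITS;
    float a, b;
    memcpy(&a, &bits, sizeof a);        /* a now holds the pattern */
    copy_float_representation(&b, &a);  /* b receives the same bytes */
    return same_representation(&a, &b);
}
```

The helpers take pointers rather than float arguments on purpose: passing a float by value would be a read through a float lvalue, which is exactly what the sketch is trying to avoid.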
 
Spiros Bousbouras

I think you're missing the OP's point. If it's OK for a to have
whatever value it has, why is it problematic for b to also have that
value?

Consider a and b as floating point variables. Suppose the CPU has
some special hardware to handle floating point: either several
floating point wegisters[1] or an ack-stay[1] of wegisters[1] that
are supposed to cause a trap if a signalling NaN is loaded into
them. IEEE goes to considerable effort to detail what signalling
NaNs are supposed to do.

If a and b are of different size, it is likely that the CPU will
load the value of a into a floating point wegister[1] and then store
it into b, if that's the fastest way of converting the size of
floating point types. If this CPU loads the uninitialized value
of a into a floating point wegister[1], KABOOM!

Unfortunately I don't know what a wegister or ack-stay is and
Wikipedia doesn't know either. Furthermore in your scenario is
it not possible that by accident variable a will have a
legitimate value ? But let's say that it contains a trap
representation ; is it the idea that your imaginary processor
automatically checks every floating value loaded onto a
wegister for having a trap representation and does something
special if it does ?
SNIP
A similar issue exists with pointers. Load a ([3456]86 architecture)
segment register with the segment portion of an invalid pointer,
and it traps. Leave the pointer in memory, and it's harmless.

Great , a real world example. So what characterises an invalid
pointer and what does the processor do ?

(I know it's OT but indulge me.)
 
Spiros Bousbouras

You're assuming that the bit pattern in a actually represents a value.

I think that when he says value he means bit pattern i.e. the
actual bits which exist on the physical memory regardless of
whether they count as a value according to the C standard.

So yes , that was part of my point. My question was inspired by
the fact that the standard devotes some space discussing trap
representations and padding bits when it talks about the
representation of integer types. I assume this happens because
the members of the committee know of specific hardware where
integers in particular can have these trap representations yet
it's possible (presumably without significant performance hit)
for an implementation to arrange for unsigned char to access
those values without problem. So for example a 4-byte int may
have a trap representation which might cause the hardware to
react badly but if you access the same 4 bytes as unsigned
char[4] then the hardware is happy. I'm hoping for a real world
example where things work this way.
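
The "same bytes viewed as unsigned char" access can be sketched like this. The helper name format_object_bytes is made up for the illustration; the only thing it relies on is that any object may be inspected through a pointer to unsigned char, whatever its bits would mean as an int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes the bytes of an object as two-digit hex values
   separated by single spaces. Returns the number of characters written (not
   counting the terminator), or 0 if there is nothing to print or the output
   buffer is too small. */
size_t format_object_bytes(const void *obj, size_t size,
                           char *out, size_t out_size)
{
    assert(obj != NULL && out != NULL);
    if (size == 0 || out_size < size * 3)  /* 3 chars per byte, incl. final NUL */
        return 0;

    const unsigned char *p = obj;  /* unsigned char has no trap representations */
    size_t used = 0;
    for (size_t i = 0; i < size; i++) {
        int n = snprintf(out + used, out_size - used, "%02x", (unsigned)p[i]);
        used += (size_t)n;
        if (i + 1 < size)
            out[used++] = ' ';     /* overwrite the NUL with a separator */
    }
    return used;
}
```

Called as format_object_bytes(&x, sizeof x, buf, sizeof buf), it never reads x through an lvalue of x's own type, so it stays within what 6.2.6.1 allows even if x holds a trap representation.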

The other thing is that if the hardware does not mind if a bit
pattern corresponding to a trap representation exists in memory
why does it mind when an assignment happens? An imaginary
example was given with floating point values and a real world
example with pointers but does anyone have an example with
integers ?
 
Spiros Bousbouras

I believe that it used to be so restricted, but was deliberately
changed to provide broader guarantees.

Does it actually provide any practical broader guarantees?
Par. 4 speaks about unsigned char and as far as I can tell
signed chars are allowed to have padding bits and trap
representations.
 
Keith Thompson

Spiros Bousbouras said:
The other thing is that if the hardware does not mind if a bit
pattern corresponding to a trap representation exists in memory
why does it mind when an assignment happens? An imaginary
example was given with floating point values and a real world
example with pointers but does anyone have an example with
integers ?

In principle, the same thing can happen as with floating-point and
pointer representations.

Suppose an implementation uses a 1's-complement representation and
makes the representation that would otherwise be -0 a trap
representation. An uninitialized object of type int might happen to
contain that bit pattern, and this in itself is harmless. But if you
read the value of the object, the representation will be loaded into a
register, and that (hypothetically) might cause a trap.

But note that a "trap representation" doesn't necessarily cause a
trap. It's really just a representation that the implementation
doesn't promise represents a value, and the standard therefore
refrains from defining the behavior for anything attempting to access
an object containing that representation. It could be simply that
some arithmetic operation is performed incorrectly.

Another example: Assume the hardware supports 16-bit 2's-complement
integers, with everything working as you'd expect. A C implementation
chooses to define INT_MAX as +32767 and INT_MIN as -32767, and the
documentation says that the representation that would otherwise have
represented the value -32768 is instead a trap representation. Now
it's a trap representation simply because the implementation says it
is, even though everything works properly. For example, this:
printf("%d\n", INT_MIN - 1);
might happen to print "-32768", but neither the implementation nor the
standard promises that it will do so.
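
The second example can be modelled in a few lines. This is only a sketch of the hypothetical implementation described above: the HYP_ names and hyp_read_int are invented here, and the code works on explicit 16-bit patterns rather than on the host's own int.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Limits of the hypothetical 16-bit implementation. */
#define HYP_INT_MAX   32767
#define HYP_INT_MIN (-32767)

/* The one pattern its documentation calls a trap representation: what plain
   two's complement hardware would read as -32768. */
#define HYP_TRAP_BITS 0x8000u

/* Interpret a 16-bit two's complement pattern under that implementation.
   Returns false for the trap representation and leaves *value untouched,
   because the implementation promises no value for it. */
bool hyp_read_int(uint16_t bits, int32_t *value)
{
    assert(value != NULL);
    if (bits == HYP_TRAP_BITS)
        return false;
    *value = (bits & 0x8000u) ? (int32_t)bits - 65536 : (int32_t)bits;
    return true;
}
```

Nothing in the arithmetic breaks for 0x8000; the pattern is excluded purely by the implementation's say-so, which is the point of the example.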
 
Nate Eldredge

Spiros Bousbouras said:
void foo(void)
{
    int a; /* no UB so far */
    int b = a; /* boom */
}

I think you're missing the OP's point. If it's OK for a to have
whatever value it has, why is it problematic for b to also have that
value?

Consider a and b as floating point variables. Suppose the CPU has
some special hardware to handle floating point: either several
floating point wegisters[1] or an ack-stay[1] of wegisters[1] that
are supposed to cause a trap if a signalling NaN is loaded into
them. IEEE goes to considerable effort to detail what signalling
NaNs are supposed to do.

If a and b are of different size, it is likely that the CPU will
load the value of a into a floating point wegister[1] and then store
it into b, if that's the fastest way of converting the size of
floating point types. If this CPU loads the uninitialized value
of a into a floating point wegister[1], KABOOM!

Unfortunately I don't know what a wegister or ack-stay is and
Wikipedia doesn't know either. Furthermore in your scenario is
it not possible that by accident variable a will have a
legitimate value ?
Certainly.

But let's say that it contains a trap
representation ; is it the idea that your imaginary processor
automatically checks every floating value loaded onto a
wegister for having a trap representation and does something
special if it does ?

The imaginary processor I have in mind (let's call it the "Pintel 387")
does precisely that. If a signaling NaN is loaded into a floating-point
register, er, wegister, a CPU exception occurs, which if it is not
masked or handled will usually cause the OS to kill the program.

Now in the case of assignment, a compiler could do it by copying the
bytes directly (via a general integer register) and not needing to load
the value into a floating point register. In fact it probably would,
since this is generally faster. But a dumb compiler might not do so,
and a trap would happen.

If you assign your variable containing a signaling NaN to a variable of
a different type, so that a conversion is needed, this is usually done
through the FPU and would probably cause a register load.
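
For anyone who wants to check whether a float object holds such a pattern without risking the register load, here is a rough sketch. It assumes the IEEE 754 binary32 layout with the "quiet" flag in the top fraction bit, which is the convention on x86 and ARM (some older architectures use the opposite sense), and it assumes float and uint32_t share the same byte order. The names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed binary32 field layout. */
#define F32_EXP_MASK  UINT32_C(0x7F800000)
#define F32_FRAC_MASK UINT32_C(0x007FFFFF)
#define F32_QUIET_BIT UINT32_C(0x00400000)

_Static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes a 32-bit float");

/* NaN: exponent field all ones and a non-zero fraction. */
bool f32_bits_are_nan(uint32_t bits)
{
    return (bits & F32_EXP_MASK) == F32_EXP_MASK && (bits & F32_FRAC_MASK) != 0;
}

/* Signalling NaN under the assumed convention: a NaN with the quiet bit clear. */
bool f32_bits_are_signaling_nan(uint32_t bits)
{
    return f32_bits_are_nan(bits) && (bits & F32_QUIET_BIT) == 0;
}

/* Inspect a float object through its bytes only, so the value is never
   loaded as a float. */
bool float_object_holds_signaling_nan(const float *object)
{
    uint32_t bits;
    assert(object != NULL);
    memcpy(&bits, object, sizeof bits);
    return f32_bits_are_signaling_nan(bits);
}
```

Whether a given compiler copies such an object through an integer register or through the FPU is its own business; this check only tells you that the bits are ones a floating-point load might object to.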
SNIP
A similar issue exists with pointers. Load a ([3456]86 architecture)
segment register with the segment portion of an invalid pointer,
and it traps. Leave the pointer in memory, and it's harmless.

Great , a real world example. So what characterises an invalid
pointer and what does the processor do ?

A "far" pointer on the 386 consists of a 16-bit selector and a 32-bit
offset. In order to dereference such a pointer, you must load the
selector into a special segment register, and then reference that
register in instructions that will use the pointer.

The selector is an index into a table of segments. A segment refers to
some region of memory, and the table contains information (a
"descriptor") about the segment, such as its base address, size, and
permissions. When the selector is loaded, the CPU checks that the
segment referenced is valid and accessible, and if so it caches the
descriptor information for future use. If you load a selector which
isn't a valid index into the table, or refers to an entry which is
marked as invalid, or which your program doesn't have permission to
access, the CPU raises a "general protection exception", which is
handled by the OS and ordinarily would cause it to kill your program.
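
The checks just described can be mimicked in a toy model. This is a heavily simplified sketch, not a description of what the silicon does: it has a single table, collapses every failure into one "fault" result, ignores the special handling of the null selector, and uses a struct layout and function name invented for the illustration. The selector field positions (index in bits 3 to 15, requested privilege level in bits 0 and 1) follow the usual x86 layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical descriptor table entry. */
struct descriptor {
    bool     present;  /* entry is marked valid */
    unsigned dpl;      /* privilege level needed to use the segment (0..3) */
    uint32_t base;     /* start of the segment in memory */
    uint32_t limit;    /* size of the segment minus one */
};

enum load_result { LOAD_OK, LOAD_FAULT };

/* Model of loading a selector into a data segment register. On success the
   descriptor is copied to *cached, standing in for the CPU's hidden cache. */
enum load_result load_segment_register(uint16_t selector,
                                       const struct descriptor *table,
                                       size_t entries,
                                       unsigned current_privilege,
                                       struct descriptor *cached)
{
    assert(table != NULL && cached != NULL);

    size_t   index = selector >> 3;   /* bits 3..15: table index */
    unsigned rpl   = selector & 3u;   /* bits 0..1: requested privilege */
    /* Numerically larger means less privileged; the weaker of the two counts. */
    unsigned effective = current_privilege > rpl ? current_privilege : rpl;

    if (index >= entries)               /* not a valid index into the table */
        return LOAD_FAULT;
    if (!table[index].present)          /* entry marked invalid */
        return LOAD_FAULT;
    if (effective > table[index].dpl)   /* no permission to access it */
        return LOAD_FAULT;

    *cached = table[index];
    return LOAD_OK;
}
```

The selector value can sit in memory indefinitely without any of these checks running; they happen only at the moment of the load, which is why the invalid pointer is harmless until then.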

As before, the compiler would not have any reason to load the selector
part of a far pointer into a segment register unless it is going to
dereference the pointer, and would probably avoid doing so because this
is an expensive operation due to all the checks that take place. So
this is maybe not the best example in the world. I don't offhand know
of a better one, though.

I'll note that this mechanism is mostly obsolete these days; it still
exists, but is generally not used by modern operating systems because it
is inconvenient. They set up a single segment which encompasses all of
memory, and from then on treat the address space as flat.
 
Peter Nilsson

jameskuyper said:
I believe that it used to be so restricted, but was
deliberately changed to provide broader guarantees.

Since it doesn't, I wonder what good it is? Even simple
constants like '\x80', '\xFF' and 'é' can apparently
still invoke undefined behaviour.
 
Guest

You're assuming that the bit pattern in a actually represents a value.

I think that when he says value he means bit pattern i.e. the
actual bits which exist on the physical memory regardless of
whether they count as a value according to the C standard.

So yes , that was part of my point. My question was inspired by
the fact that the standard devotes some space discussing trap
representations and padding bits when it talks about the
representation of integer types. I assume this happens because
the members of the committee know of specific hardware where
integers in particular can have these trap representations yet
it's possible (presumably without significant performance hit)
for an implementation to arrange for unsigned char to access
those values without problem. So for example a 4-byte int may
have a trap representation which might cause the hardware to
react badly but if you access the same 4 bytes as unsigned
char[4] then the hardware is happy. I'm hoping for a real world
example where things work this way.

I've an implementation (though not of C) that had trap values.
It was a 1's complement machine that had a representation for -0
(negative zero). Reading -0 generated a trap.
I don't know for sure but I'd assumed it did this using hardware.
Though, as you note to get C to work correctly you'd have to
have a way of turning off the trap mechanism.

The other thing is that if the hardware does not mind if a bit
pattern corresponding to a trap representation exists in memory
why does it mind when an assignment happens?  An imaginary
example was given with floating point values and a real world
example with pointers but does anyone have an example with
integers ?

load R1 with AddrOf(a)
load R2 with ContentsOf(R1) <-- trap generated here
load R1 with AddrOf(b)
store R2 at ContentsOf(R1)
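
The same sequence as a toy simulation in C, for readers who prefer it to pseudo-assembly. It models a 16-bit ones' complement machine whose word load faults on negative zero; the machine, TRAP_PATTERN, and the function names are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Negative zero on the modelled ones' complement machine. */
#define TRAP_PATTERN 0xFFFFu

/* Word-sized load into a register: fails on the trap pattern. */
bool load_word(const uint16_t *memory, size_t addr, uint16_t *reg)
{
    assert(memory != NULL && reg != NULL);
    if (memory[addr] == TRAP_PATTERN)
        return false;              /* the "trap generated here" step */
    *reg = memory[addr];
    return true;
}

/* a = b done as a load followed by a store; false means the load trapped
   and the destination was left alone. */
bool assign_word(uint16_t *memory, size_t dst, size_t src)
{
    uint16_t r2;
    if (!load_word(memory, src, &r2))
        return false;
    memory[dst] = r2;
    return true;
}

/* The same copy done one byte at a time: no word load, so no trap check. */
void copy_bytes(uint16_t *memory, size_t dst, size_t src)
{
    assert(memory != NULL);
    const unsigned char *from = (const unsigned char *)&memory[src];
    unsigned char *to = (unsigned char *)&memory[dst];
    for (size_t i = 0; i < sizeof memory[0]; i++)
        to[i] = from[i];
}
```

In this model the pattern is harmless in memory and survives a byte-wise copy, but the word-sized assignment never completes.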
 
James Kuyper

Spiros said:
....
The other thing is that if the hardware does not mind if a bit
pattern corresponding to a trap representation exists in memory
why does it mind when an assignment happens?

Typically because the trap representation causes no problems as long as
it's stored in RAM, but causes a great deal of problems when stored in a
register. Copying it from one location to another requires loading it
from RAM into a register, and then writing it to RAM from a register.
... An imaginary
example was given with floating point values and a real world
example with pointers but does anyone have an example with
integers ?

The standard goes out of its way to explicitly give implementations that
use 1's complement or sign-magnitude representations for signed integers
permission to treat the bit pattern that would otherwise represent
negative zero as a trap representation. I doubt that they would have
gone to that trouble if there hadn't been at least one significant
platform with an implementation of C where negative zeros were
problematic in some fashion; but I have no idea which platform(s), or
the details of why it was problematic.
 
lawrence.jones

I think you're missing the OP's point. If it's OK for a to have
whatever value it has, why is it problematic for b to also have that
value?

It's not -- what's problematic is accessing a's (invalid) value with an
ordinary operator (like =) that requires a valid value.
 
Nate Eldredge

I've an implementation (though not of C) that had trap values.
It was a 1's complement machine that had a representation for -0
(negative zero). Reading -0 generated a trap.
I don't know for sure but I'd assumed it did this using hardware.
Though, as you note to get C to work correctly you'd have to
have a way of turning off the trap mechanism.

Interesting. Can you say what this machine is? And is this at the
level of assembly, or in some other higher-level language?
 
Larry Gates

Anything can go wrong. The behavior is undefined. Something specific
that could happen is that Scott Nudds could fly out of your left
nostril. This happens on the DeathStation 9000.
I have used compilers which will throw an exception when they
encounter an uninitialized variable during program execution in debug
mode (I don't know how they accomplish this).
Another compiler might not behave that way.

It's been a long time since I've heard of Scott Nudds. I wonder how he's
doing with all the traffic he must have had in his sinuses.
--
larry gates

: And it goes against the grain of building small tools.
Innocent, Your Honor. Perl users build small tools all day long.
-- Larry Wall in <[email protected]>
 
