De-referencing NULL

R

rahul

#include <stdio.h>

int
main (void) {
char *p = NULL;
printf ("%c\n", *p);
return 0;
}

This snippet prints 0 (compiled with DJGPP on Win XP). Visual C++ 6.0
compiles the program with diagnostics, and the program crashes when
run. gcc on Linux gives a segmentation fault.

This looks like a bug in DJGPP. NULL can have different
representations, but how can the compiler allow dereferencing NULL
without any compile-time or run-time errors?

P.S.: I am not asking about DJGPP specifically; I just want to know
whether this behaviour is allowed by the C standard.
 
S

santosh

rahul said:
#include <stdio.h>

int
main (void) {
char *p = NULL;
printf ("%c\n", *p);
return 0;
}

This snippet prints 0 (compiled with DJGPP on Win XP). Visual C++ 6.0
compiles the program with diagnostics, and the program crashes when
run. gcc on Linux gives a segmentation fault.

This looks like a bug in DJGPP. NULL can have different
representations, but how can the compiler allow dereferencing NULL
without any compile-time or run-time errors?

P.S.: I am not asking about DJGPP specifically; I just want to know
whether this behaviour is allowed by the C standard.

The behaviour is defined to be undefined. Here is the relevant bit from
n1256.pdf:

6.5.3.2

4 The unary * operator denotes indirection. If the operand points to a
function, the result is a function designator; if it points to an
object, the result is an lvalue designating the object. If the operand
has type 'pointer to type', the result has type 'type'. If an
invalid value has been assigned to the pointer, the behavior of the
unary * operator is undefined.87)

Part of the indicated footnote (which is not normative):

Among the invalid values for dereferencing a pointer by the unary *
operator are a null pointer, [ ... ]

Since no constraint is violated, the implementation is not obliged to
emit a diagnostic, though one would be useful. But it's hard to do this
in all possible cases.
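
For completeness, here is a minimal defensive rewrite of the snippet (my own
sketch, not from the original post) that avoids the undefined behaviour by
testing the pointer before dereferencing it:

#include <stdio.h>

int main (void) {
    char *p = NULL;
    if (p != NULL)                 /* dereference only if p points at an object */
        printf ("%c\n", *p);
    else
        puts ("p is a null pointer");
    return 0;
}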
 
K

Keith Thompson

rahul said:
#include <stdio.h>

int
main (void) {
char *p = NULL;
printf ("%c\n", *p);
return 0;
}

This snippet prints 0 (compiled with DJGPP on Win XP). Visual C++ 6.0
compiles the program with diagnostics, and the program crashes when
run. gcc on Linux gives a segmentation fault.

This looks like a bug in DJGPP. NULL can have different
representations, but how can the compiler allow dereferencing NULL
without any compile-time or run-time errors?

P.S.: I am not asking about DJGPP specifically; I just want to know
whether this behaviour is allowed by the C standard.

The behavior is undefined, which means that any behavior is ok as far
as the standard is concerned. If your implementation produced no
diagnostics but made demons fly out of your nose, it might violate the
laws of physics, but it wouldn't violate the standard.
 
S

santosh

rahul said:
#include <stdio.h>

int
main (void) {
char *p = NULL;
printf ("%c\n", *p);
return 0;
}

This snippet prints 0 (compiled with DJGPP on Win XP). Visual C++ 6.0
compiles the program with diagnostics, and the program crashes when
run. gcc on Linux gives a segmentation fault.

This looks like a bug in DJGPP. NULL can have different
representations, but how can the compiler allow dereferencing NULL
without any compile-time or run-time errors?

I have already provided the reason why no compile-time diagnostics are
required. As for runtime diagnostics/exceptions, note that the
segfaults you are getting from Windows XP (compiled with MSVC) and
Linux are a response from the operating system. DJGPP, on the other hand,
is designed to produce programs that run under DOS, which does not have
memory protection and hence fails to trap the access to memory address
zero.

All the above is information beyond the scope of the C Standard, which
by categorising the behaviour as undefined, allows different
implementations to do whatever makes the most sense for them and avoids
defining this complex issue. If it had defined the behaviour in any
manner, there would have been some systems which would then have had to
emulate this behaviour and hence suffer considerable performance and
implementation problems.

This is merely a special case of C allowing access beyond an object.
 
R

Richard Bos

rahul said:
char *p = NULL;
printf ("%c\n", *p);
This snippet prints 0 (compiled with DJGPP on Win XP). Visual C++ 6.0
compiles the program with diagnostics, and the program crashes when
run. gcc on Linux gives a segmentation fault.

This looks like a bug in DJGPP.

It isn't. Dereferencing a null pointer has undefined behaviour. Crashing
is legal; printing a message and then crashing is legal; and so is
printing any random value, as if it were a normal dereference.

Richard
 
N

Nick Keighley

#include <stdio.h>

int
main (void) {
  char *p = NULL;
  printf ("%c\n", *p);
  return 0;
}

This snippet prints 0 (compiled with DJGPP on Win XP). Visual C++ 6.0
compiles the program with diagnostics, and the program crashes when
run. gcc on Linux gives a segmentation fault.

This looks like a bug in DJGPP. NULL can have different
representations, but how can the compiler allow dereferencing NULL
without any compile-time or run-time errors?

P.S.: I am not asking about DJGPP specifically; I just want to know
whether this behaviour is allowed by the C standard.

I've even used a compiler that made it a command line option.
You could choose if dereferencing NULL crashed or not.
Why you would want this option is a good question...
 
J

Jean-Marc Bourguet

Nick Keighley said:
I've even used a compiler that made it a command line option.
You could choose if dereferencing NULL crashed or not.
Why you would want this option is a good question...

The most probable reason is compatibility with a previous implementation
for which it was the (perhaps even documented) behaviour?

Yours,
 
S

santosh

Nick said:
I've even used a compiler that made it a command line option.
You could choose if dereferencing NULL crashed or not.
Why you would want this option is a good question...

To be notified of invalid dereferences (which dereferencing NULL always
is) when the underlying OS/hardware does not catch them?
 
S

santosh

santosh said:
I have already provided the reason why no compile-time diagnostics are
required. As for runtime diagnostics/exceptions, note that the
segfaults you are getting from Windows XP (compiled with MSVC) and
Linux are a response from the operating system. DJGPP, on the other hand,
is designed to produce programs that run under DOS, which does not
have memory protection and hence fails to trap the access to memory
address zero.

All the above is information beyond the scope of the C Standard, which
by categorising the behaviour as undefined, allows different
implementations to do whatever makes the most sense for them and
avoids defining this complex issue. If it had defined the behaviour in
any manner, there would have been some systems which would then have
had to emulate this behaviour and hence suffer considerable
performance and implementation problems.

This is merely a special case of C allowing access beyond an object.

And the above is subtly wrong and highly misleading.

A null pointer, by definition, *cannot* point anywhere, and thus a
dereference of a null pointer is always invalid. This has nothing to do
with the hardware or with memory addresses, but is dictated by the
semantics of C. It just so happens that on the vast majority of
implementations the value that a null pointer contains is the same, at
the bit level, as the value for memory address zero. Thus when a null
pointer is dereferenced, the implementation interprets the value as a
memory address and attempts to read it. This is caught on some systems
and not on others.

Conceptually a null pointer is distinct from a pointer containing the
address value zero, but under many flat-memory-model architectures
their representations are identical and hence, in the absence of any
special interpretation (which becomes cumbersome), dereferencing a null
pointer performs the same action as dereferencing a pointer containing
address zero.

Recently Harald van Dijk gave an example of an implementation (I think
it was the TenDRA C compiler) that interprets a null pointer as having
a value other than zero.
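
To illustrate the distinction (my own sketch, not part of the original post):
a pointer object whose representation is all-bits-zero is not guaranteed to be
a null pointer, whereas a pointer assigned the integer constant 0 is.

#include <stdio.h>
#include <string.h>

int main (void) {
    char *p;
    char *q = 0;              /* null pointer constant: q is guaranteed to be a null pointer */

    memset (&p, 0, sizeof p); /* all-bits-zero object representation: a null pointer on most
                                 implementations, but the standard does not require it */

    printf ("q %s a null pointer\n", (q == NULL) ? "is" : "is not");
    printf ("p %s a null pointer\n", (p == NULL) ? "is" : "is not");
    return 0;
}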
 
K

Keith Thompson

santosh said:
Conceptually a null pointer is distinct from a pointer containing the
address value zero, but under many flat memory model architectures
their representations are identical and hence, in the absence of any
special interpretation (which becomes cumbersome), deferencing a null
pointer performs the same action as deferencing a pointer containing
address zero.
[...]

Sort of -- except that there's not really such a thing as "the address
value zero" in C, at least not in the sense that you mean.

You can *convert* an integer value zero to a pointer type. If the
converted value happens to be a constant expression, the result is a
null pointer value, due to the special-case rule that C uses to define
null pointer constants. If it's a non-constant expression whose
current run-time value is zero, the result of the conversion is
implementation-defined, and may or may not correspond to "the address
value zero" on the underlying system, assuming such a thing is even
meaningful.

This means that converting a value of zero to a pointer type might
yield different results depending on whether the zero value is a
constant expression or not, which is a bit bizarre. But most systems
don't make this distinction because, as santosh said in text I've
snipped, most systems choose to use all-bits-zero as the null pointer
representation, avoiding the need for special-case code to handle the
special-case rule for null pointer constants. On most (but not all)
such systems, attempts to dereference a null pointer are caught with
no additional effort, since 00000000 is not a valid address.
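
A small illustration of that special-case rule (my own sketch, not from the
original post): the same zero value yields a guaranteed null pointer only when
it is a constant expression.

#include <stdio.h>

int main (void) {
    char *a = (char *) 0;      /* constant expression 0: a is a null pointer */

    int zero = 0;
    char *b = (char *) zero;   /* non-constant zero: the result of the conversion is
                                  implementation-defined and need not be a null pointer */

    printf ("a %s a null pointer\n", (a == NULL) ? "is" : "is not");
    printf ("b %s a null pointer\n", (b == NULL) ? "is" : "is not");
    return 0;
}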
 
J

Johannes Bauer

Keith said:
The behavior is undefined, which means that any behavior is ok as far
as the standard is concerned.

I've started wondering whether this is a good idea - on architectures
like the AVR, there are indeed memory-mapped I/O registers in the
address space at address 0x00. For instance on an ATmega32, writing to

*((volatile unsigned char*)0x00) = 0x00;

would set the TWBR (Two Wire Serial Interface Bit Rate Register). Is
this undefined behaviour, too? This would then mean the language is in
some way incompatible with the architecture, which is a really weird thing.

Regards,
Johannes
 
C

Chris Torek

Johannes Bauer said:
I've started wondering whether this is a good idea -

It is, in fact, a great idea, precisely because of what you say next.
on architectures like the AVR, there are indeed memory-mapped I/O
registers in the address space at address 0x00. For instance on an
ATmega32, writing to

*((volatile unsigned char*)0x00) = 0x00;

would set the TWBR (Two Wire Serial Interface Bit Rate Register). Is
this undefined behaviour, too?

Undefined by the C Standard, yes.

By being left undefined in the C Standard, implementors are allowed
to define it themselves. Implementors working on the ATmega32 can
define (*(volatile unsigned char *)0) as "access the TWBR".

Had the C Standard said "this must trap" or "this must access RAM",
implementors writing implementations for the ATmega32 would *not*
have been able to give you access to the TWBR in this obvious and
simple manner.

Undefined behavior is "bad" in that, if you use it, you lose the
guarantees of the C Standard. But it is "good" in that, by leaving
it undefined, you can use non-"Standard C" in ways guaranteed by
something *other* than the C Standard. There is nothing wrong with
doing this: you just need to know that your guarantees are coming
from "Frobozz Inc ATmega32 C" and not "ISO Standard C".
 
J

Johannes Bauer

Chris said:
It is, in fact, a great idea, precisely because of what you say next.

Okay, alright, I understood what you wrote. Thank you for your
comprehensive answer!

Regards,
Johannes
 
C

CBFalconer

rahul said:
#include <stdio.h>

int main (void) {
char *p = NULL;
printf ("%c\n", *p);
return 0;
}

This snippet prints 0 (compiled with DJGPP on Win XP). Visual C++
6.0 compiles the program with diagnostics, and the program crashes
when run. gcc on Linux gives a segmentation fault.

This looks like a bug in DJGPP. NULL can have different
representations, but how can the compiler allow dereferencing
NULL without any compile-time or run-time errors?

P.S.: I am not asking about DJGPP specifically; I just want to
know whether this behaviour is allowed by the C standard.

No, it is not allowable to dereference a null pointer; the behaviour
is undefined. The fact that DJGPP doesn't detect this fault is a
quality-of-implementation detail.
 
J

James Dow Allen

... how can the compiler allow dereferencing NULL
without any compile-time or run-time errors?

All dogs are mammals, but not all mammals are dogs.
Some portable code is written in C, but not all
C code need be portable. Many in this ng overlook
this simple truism, over and over and over again.

As a general rule, portability is a worthy goal, even
when it is completely irrelevant, and I'm not
trying to argue otherwise. However, one of C's original
virtues was its ability to yield *machine-specific*
code with higher-level syntax. I *don't* want to
encourage new C programmers to write non-portable code,
but to deny the possibility is historically inaccurate.

For example, the auto-increment feature of Nova hardware
was interesting:
    Autoincrement int *p;   /* hypothetical qualifier: a Nova auto-increment location */
    ...
    jj = *p;                /* the hardware makes this behave as *p++ */
One might deprecate such machine-specific features
(however charming they might seem, their non-portability
renders them almost useless for a practical modern
programmer), but to *forbid* a C programmer from
accessing them as above seems unnecessarily dogmatic.

Similarly, it is a fact that on some DEC systems
*0 == 0
and this meant that an (anal-retentive?) programmer
could save a microsecond with
if (*p) launch(p);
when proper logic dictated
if (p && *p) launch(p);

Too cute for your own good? Agreed. "Don't do
it unless you want to defy the laws of physics and
burn up your CPU!" This comment itself is "too cute."
(I daresay everyone in this ng would have burned up his
CPU by now if that's what *0 really did.)

I certainly never *knowingly* used the DEC "abbreviation"
if (*p)
when
if (p && *p)
was intended, but I *did* write code that was very time
critical and would have been happy to use the trick,
if needed.

James Dow Allen
 
A

Antoninus Twink

As others have already mentioned, it is of course undefined behavior,
which is not necessarily a bad thing in all contexts.

But if Atmel moved that register from address 0x00 to 0x01, or 0x08,
or indeed any other address, the result of reading from or writing to
that address would still be undefined. Any access to any hardware
device is completely undefined under the C standard, and that is
exactly as it should be.

This is complete nonsense, as you are well aware. The whole point of the
OP's query is that 0 in a pointer context is a null pointer constant,
whereas 1 and 8 are not.
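
A short illustration of that point (my own sketch): only a constant 0 in a
pointer context yields a null pointer; other integer constants are simply
converted, with implementation-defined results.

int main (void)
{
    char *a = (char *) 0;   /* null pointer constant: a is a null pointer */
    char *b = (char *) 1;   /* ordinary integer-to-pointer conversion:
                               implementation-defined, not required to be valid */
    char *c = (char *) 8;   /* likewise implementation-defined */

    (void) a; (void) b; (void) c;
    return 0;
}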
 
R

Richard Tobin

Johannes Bauer said:
I've started wondering whether this is a good idea - on architectures
like the AVR, there are indeed memory-mapped I/O registers in the
address space at address 0x00. For instance on an ATmega32, writing to

*((volatile unsigned char*)0x00) = 0x00;

would set the TWBR (Two Wire Serial Interface Bit Rate Register). Is
this undefined behaviour, too? This would then mean the language is in
some way incompatible with the architecture, which is a really weird thing.

I'm just going to address the last sentence.

You might consider the language incompatible with the architecture if
it said anything about what happened when address 0 was accessed, but
C doesn't do that. There is no necessary connection between C's null
pointer and the architecture's address 0, though they are the same in
most implementations. In particular, there's no reason that

(volatile unsigned char*)0x00

should produce the architecture's address 0. It could produce, say,
0xffffffff. The conversion of integers to pointers can involve arbitrary
computation, but null pointers can only be generated from constant
expressions, so the compiler could do the necessary conversion itself or
insert code to do it.

int x = 0;
(char *)x

is not guaranteed to produce a null pointer.

I doubt that this would be a good idea on balance, though such a compiler
might be useful for certain kinds of debugging.

-- Richard
 
F

Flash Gordon

Richard Tobin wrote, On 06/08/08 13:09:
It isn't. It's a bug in the program. It dereferences NULL.

It may not be a bug in the compiler, but it's certainly a deficiency
that it does not detect the error.

Or a feature, providing compatibility with old compilers which did not
detect the error. Specifically with programs that are playing about with
the interrupt vector table in DOS!
 
R

Richard Tobin

I wrote:
It may not be a bug in the compiler, but it's certainly a deficiency
that it does not detect the error.

Flash Gordon said:
Or a feature, providing compatibility with old compilers which did not
detect the error. Specifically with programs that are playing about with
the interrupt vector table in DOS!

So it's a backward-compatible deficiency.

-- Richard
 
