Linux Kernel Source--Unquestionably Undefined Behavior

Billy Bong

If you hang around this newsgroup enough, you'll soon find out that one
of the most insidious crimes you can commit in the C language is
undefined behavior.

What is undefined behavior?

According to the C Standard:

"behavior, upon use of a nonportable or erroneous program construct or of
erroneous data, for which this International Standard imposes no
requirements"

One of the many popular consequences of undefined behavior purported in
this newsgroup is that *anything* can happen. For example, the following
post, from an arguably c.l.c regular, apparently claims that undefined
behavior could start WWIII:

http://groups.google.com/group/comp.lang.c/browse_thread/thread/3686f35c51a00ce5/a452e5db10ca8f52?lnk=st&q=#a452e5db10ca8f52

Claiming that undefined behavior can start WWIII is not something to be
taken lightly. Especially in light of the fact that the Linux Kernel
Source commits undefined behavior.

Here is what the C Standard has to say about one aspect of undefined
behavior (section numbers are from the C99 Standard, and may differ from
the C90 Standard, but the wording in the two standards is effectively the
same):

In Section 7.1.3 (Reserved identifiers) of the C Standard, it states the
following:

"All identifiers that begin with an underscore and either an uppercase
letter or another underscore are always reserved for any use."

In Appendix J.2 (Undefined behavior) of the C Standard, it states the
following:

"The program declares or defines a reserved identifier, other than as
allowed by 7.1.4 (7.1.3)."

(Note that section 7.1.4 (Library functions), as specified above, is
immaterial to the discussion at hand.)

And, here are some facts ...

1. The Linux Kernel Source is NOT part of the implementation, and is
therefore NOT allowed to define any reserved identifiers as specified by
the C Standard.

2. The Linux Kernel Source defines identifiers that begin with a leading
underscore followed by an uppercase letter, in almost all of its header
files. For example, the file fs.h starts off with this:

#ifndef _LINUX_FS_H
#define _LINUX_FS_H

3. The Linux Kernel Source, as a result, commits undefined behavior.
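
To make the quoted clause of 7.1.3 concrete, here is a minimal sketch of a check for names that are "always reserved". The function name `is_always_reserved` is hypothetical, and the check covers only the one clause quoted above (underscore followed by an uppercase letter or another underscore), not the other reserved categories listed in 7.1.3.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only: returns 1 if `name`
 * matches the clause quoted above (a leading underscore followed by an
 * uppercase letter or another underscore), and 0 otherwise. It does
 * not model the remaining clauses of 7.1.3. */
int is_always_reserved(const char *name)
{
    if (name == NULL || name[0] != '_')
        return 0;
    /* Cast to unsigned char: passing a negative value to isupper()
     * would itself be undefined behavior. */
    return (name[1] == '_' || isupper((unsigned char)name[1])) ? 1 : 0;
}
```

Under this check, `_LINUX_FS_H` and `__foo` are flagged, while `LINUX_FS_H` and `_foo` are not.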

If you follow all of this, like you should, and you believe what many in
this newsgroup purport to be the consequences of undefined behavior, then
you owe it to yourself to seriously consider the ramifications of
undefined behavior, any time you run Linux.

Personally, I *do* buy into this whole undefined behavior thing, though
not necessarily to the extent that some of the extremists here in this
group seem to purport. And my buy-in is based on experience, as I'll now
describe.

Sometimes, when I boot into Linux, one of my monitors does not display
anything. When this happens, I immediately reboot into Windows XP, to see
if it's a software or a hardware problem. And--100% of the time--the
monitor comes up as expected in Windows XP. I then reboot into Linux,
and, most of the time, the monitor comes up as expected. I attribute this
behavior to undefined behavior in Linux, and sometimes wonder if it's the
result of the undefined behavior that results from using the
above-mentioned reserved identifiers (or perhaps due to the "kludges" and
"fixmes" so ubiquitously found in the Linux Kernel Source).

There are other things that I experience with Linux that I don't
experience with Windows, and I attribute these behaviors to the undefined
behavior in Linux. For example, I have an external USB drive that has one
Linux partition and one NTFS partition. Sometimes (a lot of times
recently) when I have this drive plugged in, the mouse cursor freezes for
about 5 seconds, and then unfreezes for about 760 milliseconds. This
behavior periodically repeats, over and over (in other words, cursor
freezes, then unfreezes, then freezes, then unfreezes, etc.). I do not
experience this problem when I boot into Windows XP. I attribute this
behavior to undefined behavior, as defined by the C Standard, and as you
have all taught me so well.

Thanks all for teaching me so well about undefined behavior.
 
Thomas Troeger

Billy said:
If you hang around this newsgroup enough, you'll soon find out that one
of the most insidious crimes you can commit in the C language is
undefined behavior.
[...]

#ifndef _LINUX_FS_H
#define _LINUX_FS_H

I don't understand your point. How is a preprocessor construct related
to limitations in the naming of C identifiers? After all, what the C
compiler gets to see is the output of the preprocessor (as with using
the `-E' option), and this will certainly not contain the mentioned
lines, underscore uppercase letter or not.
 
polas

Billy said:
If you hang around this newsgroup enough, you'll soon find out that one
of the most insidious crimes you can commit in the C language is
undefined behavior.
[...]
Thanks all for teaching me so well about undefined behavior.

I have two comments/observations

1) Since your previous post about the FIXMEs and kludges in the comments
of the Linux kernel source, there have been many replies from people
who obviously know a great deal more about C and software development
than you (and, for that matter, me). LISTEN TO THEM! If you do, that
will answer your initial question about the source and undermine any
preconceptions that are based upon the existence of these comments in
the code.

2) Are you a troll? Sorry, no offence intended, but I've got to ask: from
this post it seems a little more like Linux bashing than ANSI standard
C.

Nick
 
Ben Bacarisse

Thomas Troeger said:
Billy said:
If you hang around this newsgroup enough, you'll soon find out that
one of the most insidious crimes you can commit in the C language is
undefined behavior.
[...]

#ifndef _LINUX_FS_H
#define _LINUX_FS_H

I don't understand your point. How is a preprocessor construct related
to limitations in the naming of C identifiers? After all, what the C
compiler gets to see is the output of the preprocessor (as with using
the `-E' option), and this will certainly not contain the mentioned
lines, underscore uppercase letter or not.

In terms of the C standard, defining or using certain macros leads to
undefined behaviour. This is behaviour upon which the C standard
places no requirements. In this group, that is interpreted to mean
"anything can happen" but it does not mean "bad things *will* happen".

One of the possible behaviours is that the Linux kernel executes
correctly.
 
Randy Howard

[snip a bunch of gorp about UB]
And, here are some facts ...

1. The Linux Kernel Source is NOT part of the implementation, and is
therefore NOT allowed to define any reserved identifiers as specified by
the C Standard.

Hint: The Linux kernel is not compiled using a compiler configured for
adherence to any ANSI or ISO C standard. It is compiled with a "variant",
namely 'gcc', and uses a lot of extensions not found in the standard.

2. The Linux Kernel Source defines identifiers that begin with a leading
underscore followed by an uppercase letter, in almost all of its header
files. For example, the file fs.h starts off with this:

#ifndef _LINUX_FS_H
#define _LINUX_FS_H

3. The Linux Kernel Source, as a result, commits undefined behavior.

Hint: The Linux kernel is not compiled using a compiler configured for
adherence to any ANSI or ISO C standard. It is compiled with a "variant",
namely 'gcc', and uses a lot of extensions not found in the standard.

If you follow all of this, like you should, and you believe what many in
this newsgroup purport to be the consequences of undefined behavior, then
you owe it to yourself to seriously consider the ramifications of
undefined behavior, any time you run Linux.

If the Linux kernel were written and compiled as compliant, portable
standard C, you'd have a point, maybe, but it isn't, so you're just
full of it.

Sometimes, when I boot into Linux, one of my monitors does not display
anything. When this happens, I immediately reboot into Windows XP, to see
if it's a software or a hardware problem. And--100% of the time--the
monitor comes up as expected in Windows XP.

"The monitor comes up" being your enlightened technical description for
"pixels light up"?
... sometimes wonder if it's the
result of the undefined behavior that results from using the above
mentioned reserved identifiers (or perhaps due to the "kludges" and
"fixmes" so ubiquitously found in The Linux kernel Source).

Maybe you have bad hardware. Maybe you have a distribution that
doesn't match your hardware configuration well. Perhaps you messed it
up. Maybe you just don't know how to set it up any better than you
understand UB.

[more false conclusions snipped]
Thanks all for teaching me so well about undefined behavior.

No thanks are in order, you obviously still have much to learn.
 
Thomas Troeger

Maybe you have bad hardware. Maybe you have a distribution that
doesn't match your hardware configuration well. Perhaps you messed it
up. Maybe you just don't know how to set it up any better than you
understand UB.

I have to remark that, for me, Linux has crashed since 1993 only
because of buggy hardware components or unreasonable user behaviour like
writing into /dev/mem at an arbitrary address :) So your observations
are most likely just down to faulty or misconfigured hardware.
 
vippstar

If you hang around this newsgroup enough, you'll soon find out that one
of the most insidious crimes you can commit in the C language is
undefined behavior.
<rest of post>
The Linux kernel sure isn't ISO C code.
It's not C89, C90, C95 or C99 code, nor was it intended to be.
Therefore the standards for these languages do not apply.
Moreover, your post is OT.
 
Syren Baran

Billy said:
What is undefined behavior?

According to the C Standard:

"behavior, upon use of a nonportable or erroneous program construct or of
erroneous data, for which this International Standard imposes no
requirements"

Yawn. The behavior is well defined in the implementation used to compile
Linux, namely gcc. But you do have a very good point there: I don't know
if Linux will compile in MS Visual C++.

Claiming that undefined behavior can start WWIII is not something to be
taken lightly. Especially in light of the fact that the Linux Kernel
Source commits undefined behavior.

OK, the example you give later is an #ifndef and a #define. Not even
visible in userspace. Ah, I get it. You are applying for a job as jester.

1. The Linux Kernel Source is NOT part of the implementation, and is
therefore NOT allowed to define any reserved identifiers as specified by
the C Standard.

The kernel provides the basic features which in turn make it possible
to build a conforming implementation.

2. The Linux Kernel Source defines identifiers that begin with a leading
underscore followed by an uppercase letter, in almost all of its header
files. For example, the file fs.h starts off with this:

#ifndef _LINUX_FS_H
#define _LINUX_FS_H

3. The Linux Kernel Source, as a result, commits undefined behavior.

The wording is incorrect. You would have to say:
The act of compiling the Linux kernel with a strictly conforming C
compiler may result in undefined behavior.

If you follow all of this, like you should, and you believe what many in
this newsgroup purport to be the consequences of undefined behavior, then
you owe it to yourself to seriously consider the ramifications of
undefined behavior, any time you run Linux.

Do you think Windows will compile with a strictly conforming C or C++
compiler without any extensions? OH NOEES !!!!one!eleven! I have to find
a way to stop the aliens from constructing their universe implosion machine.

Thanks all for teaching me so well about undefined behavior.

Please come back when you have understood the difference between a
standard and an implementation. I do look forward to the jester's next
episode.
 
santosh

Syren said:
Yawn. The behavior is well defined in the implementation used to compile
Linux, namely gcc. But you do have a very good point there: I don't know
if Linux will compile in MS Visual C++.

It most certainly won't. However it /might/ do so with Intel's compiler,
which seeks, I understand, to implement most of gcc's extensions.

The wording is incorrect. You would have to say:
The act of compiling the linux kernel with a strictly conforming c
compiler may result in undefined behavior.

Undefined behaviour applies to runtime behaviour. The act of compiling
itself has to be defined. However any executable, if produced, will
exhibit undefined behaviour, which includes running exactly as the
programmer expected. But such behaviour *cannot* be relied upon.

Please come back when you have understood the difference between a
standard and an implementation. I do look forward to the jester's next
episode.

There has been a recent spate of anti-jacob, anti-lcc and anti-Linux
diatribes by a host of anonymous trolls. It's best to ignore them.
 
Philip Potter

santosh said:
Undefined behaviour applies to runtime behaviour. The act of compiling
itself has to be defined. However any executable, if produced, will
exhibit undefined behaviour, which includes running exactly as the
programmer expected. But such behaviour *cannot* be relied upon.

...unless you know your implementation provides stronger guarantees than
the C Standard. This necessarily implies that your code will no longer
be portable to implementations which don't provide similar guarantees,
but this is of no concern to the Linux kernel people.

There has been a recent spate of anti-jacob, anti-lcc and anti-Linux
diatribes by a host of anonymous trolls. It's best to ignore them.

Quite.
 
Army1987

Billy said:
If you hang around this newsgroup enough, you'll soon find out that one
of the most insidious crimes you can commit in the C language is
undefined behavior.

Sometimes it is justified to use UB, when you happen to know what that
behavior will be by some means other than the normative parts of the C
standard.

What is undefined behavior?

According to the C Standard:

"behavior, upon use of a nonportable or erroneous program construct or
of erroneous data, for which this International Standard imposes no
requirements"

Indeed.

Here is what the C Standard has to say about one aspect of undefined
behavior (section numbers are from the C99 Standard, and may differ from
the C90 Standard, but the wording in the two standards is effectively
the same):

In Section 7.1.3 (Reserved identifiers) of the C Standard, it states the
following:

"All identifiers that begin with an underscore and either an uppercase
letter or another underscore are always reserved for any use."

In Appendix J.2 (Undefined behavior) of the C Standard, it states the
following:

"The program declares or defines a reserved identifier, other than as
allowed by 7.1.4 (7.1.3)."

(Note that section 7.1.4 (Library functions), as specified above, is
immaterial to the discussion at hand.)

And, here are some facts ...

1. The Linux Kernel Source is NOT part of the implementation, and is
therefore NOT allowed to define any reserved identifiers as specified by
the C Standard.

2. The Linux Kernel Source defines identifiers that begin with a leading
underscore followed by an uppercase letter, in almost all of its header
files. For example, the file fs.h starts off with this:

#ifndef _LINUX_FS_H
#define _LINUX_FS_H

Whoever wrote that does know what the behavior is.

3. The Linux Kernel Source, as a result, commits undefined behavior.
[snip stuff pro WinXP]

Of course, you have read the source code of Windows XP and noticed that it
never has UB according to the C standard.

(Not to mention that one of the last times I used it, the mouse cursor
froze when I was just changing the window theme.)

*plonk*
 
Army1987

santosh said:
Undefined behaviour applies to runtime behaviour. The act of compiling
itself has to be defined.
IIUC, this doesn't apply if the compiler is able to prove beforehand that
any possible execution path of the program leads to UB.
 
santosh

Army1987 said:
IIUC, this doesn't apply if the compiler is able to prove beforehand
that any possible execution path of the program leads to UB.

Do you mean to say that the translation of a source for which the
compiler concludes that all possible paths of execution are undefined,
is itself undefined? So merely attempting to translate such a program
could cause demons to fly out of your nose? I suppose this is due to
the fact that the Standard does not impose any specific method of
translation, as long as the conceptual steps in 5.1.1.2 (n1256) are
followed?
 
Army1987

santosh said:
Do you mean to say that the translation of a source for which the
compiler concludes that all possible paths of execution are undefined,
is itself undefined? So merely attempting to translate such a program
could cause demons to fly out of your nose? I suppose this is due to
the fact that the Standard does not impose any specific method of
translation, as long as the conceptual steps in 5.1.1.2 (n1256) are
followed?

"Possible undefined behavior ranges from ignoring the situation
completely with unpredictable results, to behaving during translation or
program execution in a documented manner characteristic of the
environment (with or without the issuance of a diagnostic message), to
terminating a translation or execution (with the issuance of a
diagnostic message)."
 
Army1987

polas said:
On 24 Jan, 09:05, Billy Bong wrote:
["The Linux kernel has UB because it defines a reserved identifier"]
2) Are you a troll? Sorry, no offence intended, but I've got to ask: from
this post it seems a little more like Linux bashing than ANSI standard
C.

If he's not trolling, then there exists at least one person smart enough
to quote past Usenet postings and read the C standard and the Linux kernel
source, but dumb enough to think that the malfunctioning of Linux on his
PC can be due to the invasion of the implementation's namespace.

(And, in some sense, the Linux kernel *is* part of the gcc implementation...)
 
Gordon Burditt

One of the many popular consequences of undefined behavior purported in
this newsgroup is that *anything* can happen. For example, the following
post, from an arguably c.l.c regular, apparently claims that undefined
behavior could start WWIII:

It could start World War Nine (bypassing WWIII), and retroactively
alter the C standard to delete fflush(stdin) functionality. Oh,
wait, fflush(stdin) already got retroactively deleted from the
standard. Isn't temporal mechanics fun?

Claiming that undefined behavior can start WWIII is not something to be
taken lightly. Especially in light of the fact that the Linux Kernel
Source commits undefined behavior.

The Linux Kernel (and I'm pretty sure this will be true of just
about any OS) doesn't claim to be ANSI C, and it uses non-standard
extensions. This is pretty much guaranteed, as an OS needs to use
instruction set features that C doesn't expose, and things like cache
flushes, supervisor/user mode switching, non-memory-mapped I/O,
task switching, and the like may require instructions not generated
by a C compiler.
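
As a rough illustration of what "non-standard extensions" can look like, the sketch below uses two GNU C features, statement expressions and `__typeof__`, behind a `__GNUC__` check, with a plain ISO C fallback. The names `DEMO_MAX` and `demo_max_int` are made up for this example, and the thread itself does not say which extensions the kernel actually relies on.

```c
/* DEMO_MAX and demo_max_int are hypothetical names for illustration. */
#if defined(__GNUC__)
/* GNU C only: a statement expression plus __typeof__ lets the macro
 * evaluate each argument exactly once. Neither construct is ISO C99. */
#define DEMO_MAX(a, b) __extension__ ({ \
        __typeof__(a) demo_a_ = (a); \
        __typeof__(b) demo_b_ = (b); \
        demo_a_ > demo_b_ ? demo_a_ : demo_b_; })
#else
/* Standard C fallback: portable, but may evaluate an argument twice. */
#define DEMO_MAX(a, b) ((a) > (b) ? (a) : (b))
#endif

/* Small wrapper so the macro can be exercised from ordinary code. */
int demo_max_int(int x, int y)
{
    return DEMO_MAX(x, y);
}
```

Code that drops the fallback branch is simply not ISO C, which is the situation being described: its behavior is defined by the compiler's documentation rather than by the Standard.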
1. The Linux Kernel Source is NOT part of the implementation, and is
therefore NOT allowed to define any reserved identifiers as specified by
the C Standard.

2. The Linux Kernel Source defines identifiers that begin with a leading
underscore followed by an uppercase letter, in almost all of its header
files. For example, the file fs.h starts off with this:

#ifndef _LINUX_FS_H
#define _LINUX_FS_H

3. The Linux Kernel Source, as a result, commits undefined behavior.

One of the possible and PRACTICAL issues here is that an ANSI C
implementation might pre-define _LINUX_FS_H (which it's allowed to
do), and therefore nothing on the inside of fs.h will be compiled.
This may cause errors or warnings about functions with no prototype,
undefined structures, typedefs, or variables, and there may be
problems with macros expected to be defined not being defined.
(Presumably there was some reason it was included in the first
place). Probable result: it won't compile.

But it never claimed to be ANSI C in the first place.
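
The practical failure mode described above can be sketched in a single file. Here `DEMO_FS_H` is a hypothetical stand-in for `_LINUX_FS_H` (so that the example does not itself define a reserved name), and the first `#define` plays the role of an implementation that happens to predefine the guard.

```c
/* Stand-in for an implementation that predefines the guard name.
 * DEMO_FS_H is a hypothetical substitute for _LINUX_FS_H. */
#define DEMO_FS_H

/* Contents of a hypothetical header, pasted inline for the demo. */
#ifndef DEMO_FS_H
#define DEMO_FS_H
#define DEMO_FS_BODY_SEEN 1   /* marks that the header body was compiled */
#endif

/* Returns 1 if the header body was compiled, 0 if the guard skipped it. */
int demo_header_body_seen(void)
{
#ifdef DEMO_FS_BODY_SEEN
    return 1;
#else
    return 0;
#endif
}
```

Because the guard name is already defined when the header is reached, its whole body is skipped and `demo_header_body_seen()` returns 0. In a real header, every declaration inside the guard would silently disappear in the same way.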
 
Thad Smith

Syren said:
The wording is incorrect. You would have to say:
The act of compiling the linux kernel with a strictly conforming c
compiler may result in undefined behavior.

Even more accurate is that compiling and running the Linux kernel
results in behavior which is not defined by Standard C.

Just because Standard C doesn't define the behavior of a particular program
doesn't mean that it is useless. It may be adequately defined by something
else.

The purpose of noting behavior which is undefined by Standard C is to
distinguish code or code portions which are not inherently portable to
compilers adhering to the Standard.
 
santosh

Thad said:
Even more accurate is that compiling and running the Linux kernel
results in behavior which is not defined by Standard C.

Just because Standard C doesn't define the behavior of a particular
program doesn't mean that it is useless. It may be adequately defined
by something else.

Yes. In this case it is the combination of gcc and the hardware and its
protocols. What the OP has got to realise is that the definition of
undefined behaviour as given in ISO 9899:1999 is valid only for the
language as defined by that document. And the C as used in the Linux
kernel is emphatically not Standard C.

The purpose of noting behavior which is undefined by Standard C is to
distinguish code or code portions which are not inherently portable to
compilers adhering to the Standard.

Relying on undefined behaviour is portable, but not very useful. It's
relying on implementation defined behaviour that is likely not
portable.
 
