Code review requested

bart.vandewoestyne · Sep 20, 2012

Hello group,

Just as a fun-project and to learn new things, I've started writing my version of the Tiger compiler from the book 'Modern Compiler Implementation in C' by Andrew W. Appel. My code can be found at the following GitHub repo:

https://github.com/BartVandewoestyne/c/tree/master/books/Modern_Compiler_Implementation_in_C

With some help from code that I found on the Internet, I have finished everything from the first chapter now:

https://github.com/BartVandewoestyne/c/tree/master/books/Modern_Compiler_Implementation_in_C/chap01

Because I also consider this as an exercise in writing good C-style (more specifically C99), I would like to have my code reviewed. Please share me your thoughts on the following:

1) General comments concerning my C-programming style is greatly appreciated!

2) The maxargs.c and interp.c still give me the warning "control reaches end of non-void function". I know why this is, but I am wondering what wouldbe the cleanest way to get rid of this warning. Considering for example the function maxargs_expList from maxargs.c, i just *know* that expList->kind is only one of the two proposed values in the switch statement. If anything else appears, that is an error. What is in your opinion the best solution in this case, leading to code that compiles without warnings?

Thanks!
Bart

Ben Bacarisse · Sep 20, 2012

Just as a fun-project and to learn new things, I've started writing my
version of the Tiger compiler from the book 'Modern Compiler
Implementation in C' by Andrew W. Appel. My code can be found at the
following GitHub repo:

https://github.com/BartVandewoestyne/c/tree/master/books/Modern_Compiler_Implementation_in_C

With some help from code that I found on the Internet, I have finished
everything from the first chapter now:

https://github.com/BartVandewoestyne/c/tree/master/books/Modern_Compiler_Implementation_in_C/chap01

Because I also consider this as an exercise in writing good C-style
(more specifically C99), I would like to have my code reviewed.
Please share me your thoughts on the following:

1) General comments concerning my C-programming style is greatly
appreciated!

Looks good to me.

A couple of points: I'd strive for more consistency in layout. I see
space around function arguments in some places and not others (f( x ) vs
f(x)) and I see some operators surrounded by space and not others. You
have both switch( x ) and switch (x) in the same source file. It's not
a big point, but I find it a little distracting.

You use TRUE and FALSE for functions that return bool. The header that
defines bool also defined true and false, so why use your own?

Slightly more significant is that I was puzzled by why count_exp can
never return zero. This might well be correct, but it's the kind of
things that merits a comment. I think there is a connection between
this and the next point.

My only big comment... A lot of the code is boilerplate to deal with
a large number of structure types (well, unions within a structure). I
would probably search for some way to simplify this. I don't know the
book, so it's possible you have to do it this way.

One big example of this that the code seems to have two kinds of
expression list -- one used only for the last node. This look peculiar
to me and does seem to add complexity. Is there some benefit to it?
Does the book require it? Have I misunderstood?

2) The maxargs.c and interp.c still give me the warning "control
reaches end of non-void function". I know why this is, but I am
wondering what would be the cleanest way to get rid of this warning.
Considering for example the function maxargs_expList from maxargs.c, i
just *know* that expList->kind is only one of the two proposed values
in the switch statement. If anything else appears, that is an error.
What is in your opinion the best solution in this case, leading to
code that compiles without warnings?

My preference is to put the redundant return in place, preceded by an
assert. Then I prepare to be amazed at how often I see the assert fire!

Bart Vandewoestyne · Sep 20, 2012

[...]
A couple of points: I'd strive for more consistency in layout. I see
space around function arguments in some places and not others (f( x ) vs
f(x)) and I see some operators surrounded by space and not others. You
have both switch( x ) and switch (x) in the same source file. It's not
a big point, but I find it a little distracting.

I totally agree!

You use TRUE and FALSE for functions that return bool. The header that
defines bool also defined true and false, so why use your own?

The book has a

#define TRUE 1
#define FALSE 0

in util.h. Do you mean that is not necessary because TRUE and FALSE are defined in a standard header?

Slightly more significant is that I was puzzled by why count_exp can
never return zero. This might well be correct, but it's the kind of
things that merits a comment. I think there is a connection between
this and the next point.

My educated guess is that it's related to the fact that for the simple language described in the book, the print statement always has at least one argument.

My only big comment... A lot of the code is boilerplate to deal with
a large number of structure types (well, unions within a structure). I
would probably search for some way to simplify this. I don't know the
book, so it's possible you have to do it this way.

Indeed. A lot of this is code taken from the book... so I try to stick with that for now...

One big example of this that the code seems to have two kinds of
expression list -- one used only for the last node. This look peculiar
to me and does seem to add complexity. Is there some benefit to it?
Does the book require it? Have I misunderstood?

The grammar of the simple language described in the book also has this. So yes, the book requires it.

[...]
My preference is to put the redundant return in place, preceded by an
assert. Then I prepare to be amazed at how often I see the assert fire!

Using assert(0) at 'unreachable' places indeed seemed like the best and cleanest solution to me. It makes the warnings go away and on top of that, i get extra debugging facilities.

New version is online at

https://github.com/BartVandewoestyne/c/tree/master/books/Modern_Compiler_Implementation_in_C/chap01

Comments still welcome! If no more comments, up to Chapter 2!

Regards,
Bart

Ben Bacarisse · Sep 20, 2012

Bart Vandewoestyne said:
The book has a

#define TRUE 1
#define FALSE 0

in util.h. Do you mean that is not necessary because TRUE and FALSE
are defined in a standard header?

Yes, the same place that the type bool is defined (stdbool.h, new in
C99).

What gets defined in stdbool.h are true and false rather than TRUE and
FALSE, so if you want to use code form the book you should probably not
change. It suggests, though, that the book is not using C99 so using
it's exercises to learn C99 is going to be problematic.

Indeed. A lot of this is code taken from the book... so I try to
stick with that for now...

The grammar of the simple language described in the book also has
this. So yes, the book requires it.

That doesn't follow. You might be able to (well, I'll stick my neck out
-- you can) build a structure from that grammar that requires fewer
special cases.

<snip>

ImpalerCore · Sep 20, 2012

[...]
A couple of points: I'd strive for more consistency in layout. I see
space around function arguments in some places and not others (f( x ) vs
f(x)) and I see some operators surrounded by space and not others. You
have both switch( x ) and switch (x) in the same source file. It's not
a big point, but I find it a little distracting.

Click to expand...

I totally agree!

You use TRUE and FALSE for functions that return bool. The header that
defines bool also defined true and false, so why use your own?

Click to expand...

The book has a

#define TRUE 1
#define FALSE 0

in util.h. Do you mean that is not necessary because TRUE and FALSE are defined in a standard header?

Slightly more significant is that I was puzzled by why count_exp can
never return zero. This might well be correct, but it's the kind of
things that merits a comment. I think there is a connection between
this and the next point.

Click to expand...

My educated guess is that it's related to the fact that for the simple language described in the book, the print statement always has at least one argument.

My only big comment... A lot of the code is boilerplate to deal with
a large number of structure types (well, unions within a structure). I
would probably search for some way to simplify this. I don't know the
book, so it's possible you have to do it this way.

Click to expand...

Indeed. A lot of this is code taken from the book... so I try to stickwith that for now...

One big example of this that the code seems to have two kinds of
expression list -- one used only for the last node. This look peculiar
to me and does seem to add complexity. Is there some benefit to it?
Does the book require it? Have I misunderstood?

Click to expand...

The grammar of the simple language described in the book also has this. So yes, the book requires it.

[...]
My preference is to put the redundant return in place, preceded by an
assert. Then I prepare to be amazed at how often I see the assert fire!

Click to expand...

Using assert(0) at 'unreachable' places indeed seemed like the best and cleanest solution to me. It makes the warnings go away and on top of that, i get extra debugging facilities.

New version is online at

https://github.com/BartVandewoestyne/c/tree/master/books/Modern_Compi...

Comments still welcome! If no more comments, up to Chapter 2!

The only thing that I'm not fond of is the typedefs used to hide the
pointer type. The pointer '*' symbol is an important type modifier
and I like to show them explicitly for easier semantic understanding
of a function interface. When I look at your code, I have to lookup
each identifier where its defined or in the documentation to remember
whether the interface is pass-by-value or pass-by-reference, which
gets more problematic when you have a large interface with many
types. While I can get used to most naming conventions, my preference
is to make the pointer attribute of a type explicit. By the same
token, I prefer using 'char*' explicitly over something like 'string'.

On a different note, I prefer to place the asterisk '*' next to the
type rather than next to the variable, as in 'struct table*
my_table'. I view the pointer modification as a 'type' modifier, not
a 'variable' modifier as C's syntax suggests. It's just a personal
quirk that's non-standard, so feel free to ignore it.

I do not particularly like the way you use typedefs to alias struct
types. I prefer to typedef a struct type to the same name (as in
'typedef struct c_list c_list;'), or to place the trailing suffix in
the struct identifier.

struct Table_;
typedef struct Table_ Table;

I also think that '#include "stdlib.h"' should be '#include
<stdlib.h>' unless you are writing your own version for some reason.
The quotes are used to reference headers local to a library or
application. In this case, the standard library is not part of your
application, so it's best to display that explicitly using
'<stdlib.h>'. You do this in "util.h" but not in other files.

Just my preferences, take them as such.

Best regards,
John D.

Ian Collins · Sep 20, 2012

On 09/21/12 03:30 AM, Bart Vandewoestyne wrote:

[wrapping yours lines on Usenet is good practice]

Using assert(0) at 'unreachable' places indeed seemed like the best and cleanest solution to me. It makes the warnings go away and on top of that, i get extra debugging facilities.

Rather than assert(0), I prefer assert(!"Some useful diagnostic"). This
helps you when it fires and documents why the assert is there in the code.

Bart Vandewoestyne · Sep 20, 2012

Rather than assert(0), I prefer assert(!"Some useful diagnostic"). This
helps you when it fires and documents why the assert is there in the code..

I have been thinking of this, but i picked assert(0) because I thought assert(!"Some useful diagnostic") is not standard-conforming. However, if I compile with gcc and the -std=c99 option i get no warnings, and thinking about it, the expression !"Some useful diagnostic" probably also converts to a 'scalar expression' as required by the C99 standard, so is probably completely standard-conforming (please correct me if I'm wrong).

Updated code online at https://github.com/BartVandewoestyne/c/tree/master/books/Modern_Compiler_Implementation_in_C/chap01

Thanks!
Bart

Bart Vandewoestyne · Sep 20, 2012

Yes, the same place that the type bool is defined (stdbool.h, new in
C99).

What gets defined in stdbool.h are true and false rather than TRUE and
FALSE, so if you want to use code form the book you should probably not
change. It suggests, though, that the book is not using C99 so using
it's exercises to learn C99 is going to be problematic.

I just checked. The book mentions:

First published: 1998
Reprinted with corrections: 1999
First paperback edition: 2004

Since it was first published in 1998 and since TRUE and FALSE are defined, i think we can indeed assume it is C90 code. I will therefore stick to C90. It's

ImpalerCore · Sep 20, 2012

I have been thinking of this, but i picked assert(0) because I thought assert(!"Some useful diagnostic") is not standard-conforming. However, if I compile with gcc and the -std=c99 option i get no warnings, and thinking about it, the expression !"Some useful diagnostic" probably also convertsto a 'scalar expression' as required by the C99 standard, so is probably completely standard-conforming (please correct me if I'm wrong).

Updated code online athttps://github.com/BartVandewoestyne/c/tree/master/books/Modern_Compi...

About the boolean type used in your example, you can emulate a C99
bool in C90 pretty closely by using a typedef.

\code
#if defined(HAVE_STDBOOL_H)
# include <stdbool.h>
#else
/*!
* \brief Define a boolean type approximation.
*
* This enumeration defines a close enough approximation to the
* semantics used by the C99 \c bool type to use in the library
* implementation.
*
* This introduces a couple of caveats to avoid when using the
* \c bool type from the library. The first is to avoid conversions
* between other integer types and \c bool. There are semantic
* differences between \c _Bool conversions compared with using an
* enumeration. The other issue is that the \c true and \c false
* identifiers are not declared as macro identifiers. This implies
* that \c true and \c false cannot be used in preprocessor
* expressions. The use of \c true and \c false identifiers at the
* preprocessor level is not ubiquitous, so the issue is unlikely to
* cause problems.
*/
typedef enum { false, true } bool;
#endif
\endcode

You can also go the __STDC__ route for the preprocessor check instead
of the autoconf style check.

\code
#if (__STDC__ && __STDC_VERSION__ >= 199901L)
# include <stdbool.h>
#else
....
#endif
\endcode

Best regards,
John D.

Eric Sosman · Sep 20, 2012

I have been thinking of this, but i picked assert(0) because I thought assert(!"Some useful diagnostic") is not standard-conforming. However, if I compile with gcc and the -std=c99 option i get no warnings, and thinking about it, the expression !"Some useful diagnostic" probably also converts to a 'scalar expression' as required by the C99 standard, so is probably completely standard-conforming (please correct me if I'm wrong).

The utility of an assert() diagnostic depends on the person
reading the message. If it's the programmer, __FILE__ and __LINE__
are really all that's needed. If it's the end user (anyone without
access to the source), a case could be made for assert(!"BUG!") but
not much more. In neither case is detailed info of much use! (The
values of important variables could be useful, but assert() won't
display them.)

On the topic of TRUE and FALSE, just be sure you don't make
the mistake I once encountered in someone else's actual code:

typedef enum { TRUE, FALSE } bool;

Bart Vandewoestyne · Sep 20, 2012

Yes, the same place that the type bool is defined (stdbool.h, new in
C99).

What gets defined in stdbool.h are true and false rather than TRUE and
FALSE, so if you want to use code form the book you should probably not
change. It suggests, though, that the book is not using C99 so using
it's exercises to learn C99 is going to be problematic.

I just checked. The book mentions:

First published: 1998
Reprinted with corrections: 1999
First paperback edition: 2004

Since it was first published in 1998 and since TRUE and FALSE are
defined, i think we can indeed assume it is C90 code. I will therefore
stick to C90. It's a good exercise to get to know the differences
between C90 and C99.

Regards,
Bart

James Kuyper · Sep 20, 2012

On 09/20/2012 04:48 PM, ImpalerCore wrote:
....

\code
#if defined(HAVE_STDBOOL_H)

HAVE_STDBOOL_H is not defined by any version of the C standard; it's not
a portable way of checking for whether stdbool.h is supported.

# include <stdbool.h> ....
* This introduces a couple of caveats to avoid when using the
* \c bool type from the library. The first is to avoid conversions
* between other integer types and \c bool. There are semantic
* differences between \c _Bool conversions compared with using an
* enumeration.

It's not just integer types you need to worry about, (bool)0.5 and
(bool)&object won't work properly, either.

Bart Vandewoestyne · Sep 20, 2012

The only thing that I'm not fond of is the typedefs used to hide the
pointer type. The pointer '*' symbol is an important type modifier
and I like to show them explicitly for easier semantic understanding
of a function interface. When I look at your code, I have to lookup
each identifier where its defined or in the documentation to remember
whether the interface is pass-by-value or pass-by-reference, which
gets more problematic when you have a large interface with many
types. While I can get used to most naming conventions, my preference
is to make the pointer attribute of a type explicit. By the same
token, I prefer using 'char*' explicitly over something like 'string'.

I can agree with this, but I'm trying to follow the book's code and
conventions, so I guess I'll stick with it for now. Once I worked my
way through all the chapters and have a fully working Tiger compiler
(that will probably take me some months/years ;-) I might start
thinking of cleaning up the code and move it to C99.

On a different note, I prefer to place the asterisk '*' next to the
type rather than next to the variable, as in 'struct table*
my_table'. I view the pointer modification as a 'type' modifier, not
a 'variable' modifier as C's syntax suggests. It's just a personal
quirk that's non-standard, so feel free to ignore it.

I know this is a common discussion and I haven't decided for myself yet
what form I prefer the most... I think the most important is to choose
one and be consistent.

I do not particularly like the way you use typedefs to alias struct
types. I prefer to typedef a struct type to the same name (as in
'typedef struct c_list c_list;'), or to place the trailing suffix in
the struct identifier.

struct Table_;
typedef struct Table_ Table;

I agree, and the book is quite inconsistent here... sometimes I also
find it hard to know and remember what is what exactly... especially
with the underscores and capital vs non-capital letters. The code from
the book could definitely be made more consistent in my opinion.

I also think that '#include "stdlib.h"' should be '#include
<stdlib.h>' unless you are writing your own version for some reason.

Good catch! Thanks!

Regards,
Bart

ImpalerCore · Sep 20, 2012

On 09/20/2012 04:48 PM, ImpalerCore wrote:
...

HAVE_STDBOOL_H is not defined by any version of the C standard; it's not
a portable way of checking for whether stdbool.h is supported.

By standard means sure, but not all C compilers are C99 conforming.
On an older project, I had to deal with a compiler that had some C99
extensions but not true C99 support, so it had a __STDC_VERSION__ <
199901L but came with <stdbool.h>. But I agree, in the abstract the
__STDC_VERSION__ version is better. Since the code I write attempts
to be compatible with the enum workaround, I should probably shift to
the __STDC__ preprocessor symbols.

Still, 'HAVE_STDBOOL_H' is a standard style of identifier when you use
need autoconf to account for cross-platform compatibility. The real
world can be a mess at times.

It's not just integer types you need to worry about, (bool)0.5 and
(bool)&object won't work properly, either.

I have not experienced code that did those type conversions, so I
didn't document it at the time. Have you happened to experience
either of those type conversions in the wild?

Best regards,
John D.

Rui Maciel · Sep 20, 2012

Bart said:
I have been thinking of this, but i picked assert(0) because I thought
assert(!"Some useful diagnostic") is not standard-conforming. However, if
I compile with gcc and the -std=c99 option i get no warnings, and thinking
about it, the expression !"Some useful diagnostic" probably also converts
to a 'scalar expression' as required by the C99 standard, so is probably
completely standard-conforming (please correct me if I'm wrong).

A character string literal is used to initialize an array of static storage
duration. This means that whenever you specify a character string literal,
that expression will evaluate to a valid address. As that address won't be
0 then a character string literal will always evaluate to true.

Hence, !"Some useful diagnostic" will always evaluate to false and
assert(!"Some useful diagnostic") will always be triggered, all this in a
very standard way.

Rui Maciel

James Kuyper · Sep 20, 2012

I have not experienced code that did those type conversions, so I
didn't document it at the time. Have you happened to experience
either of those type conversions in the wild?

My project is still governed by a 1997 agreement with our client which
requires us to write code which "conforms" to C90 plus the two Technical
Corrigenda. Technically, that's not exactly a restrictive requirement -
"War and Peace" qualifies as conforming C code - but I try to follow the
spirit of the requirement, even if it was incorrectly worded.

I long ago adopted a policy of avoiding defining variables whose sole
job is to keep track of whether or not a condition is true; I prefer
re-stating the condition - it often leads to clearer code, and generally
does the right thing when the condition involves variables whose values
have changed since the last time it was evaluated. If the condition does
not actually need to be re-evaluated, most modern compilers can be
counted on to detect that fact (more reliably than I can), and use an
unnamed temporary for the same purpose that a named boolean variable
would have been used for. Therefore, I think this is a feature I won't
make much use of, even when I can do so.

However, if I ever do use _Bool, implicit conversions from pointer
values to _Bool stand a very good chance of being involved. I often
write if(p) rather than if(p!=NULL), which is essentially the same idea.

Keith Thompson · Sep 20, 2012

Rui Maciel said:
A character string literal is used to initialize an array of static storage
duration. This means that whenever you specify a character string literal,
that expression will evaluate to a valid address.

That's true in most contexts, including this one, but there are
exceptions:

sizeof "hello"
yields 6, the size of the array object, not the size of a pointer.

char s[] = "hello";
copies the array value to s; it doesn't decay to the address.

&"Waldo"
yields the memory address of the array object (thus answering
the age-old question).

As that address won't be
0 then a character string literal will always evaluate to true.

Hence, !"Some useful diagnostic" will always evaluate to false and
assert(!"Some useful diagnostic") will always be triggered, all this in a
very standard way.

More pedantically, the expression
!"Some useful diagnostic"
will evaluate to the int value 0 -- which is a false value. The unary
"!" operator always yields a value (either 0 or 1) of type int.p

Keith Thompson · Sep 20, 2012

ImpalerCore said:
I do not particularly like the way you use typedefs to alias struct
types. I prefer to typedef a struct type to the same name (as in
'typedef struct c_list c_list;'), or to place the trailing suffix in
the struct identifier.

struct Table_;
typedef struct Table_ Table;

[...]

Yes, if you're going to use a typedef for a struct type, there's no need
to use a different identifier than the struct tag (likewise for unions
and enums).

My own preference is not to use a typedef at all, unless I want to hide
the fact that the type is a struct. For example, I'd just write:

struct table {
/* member declarations */
};

and then refer to the type as "struct table". I don't see much
point in defining a second name (like "table") for a type that
already has a perfectly good name ("struct table"). A lot of
people like the advantage of not having to type the extra word,
which is a valid argument, but not one that I find convincing.

hormelfree · Sep 20, 2012

On the topic of TRUE and FALSE, just be sure you don't make the mistake I once > encountered in someone else's actual code: typedef enum { TRUE, FALSE } bool;

Well, it would work SOMETIMES...

Francois Grieu · Sep 21, 2012

I long ago adopted a policy of avoiding defining variables whose sole
job is to keep track of whether or not a condition is true; I prefer
re-stating the condition - it often leads to clearer code, and generally
does the right thing when the condition involves variables whose values
have changed since the last time it was evaluated.

That's certainly fine when performance and code density do not matter.

If the condition does
not actually need to be re-evaluated, most modern compilers can be
counted on to detect that fact (more reliably than I can), and use an
unnamed temporary for the same purpose that a named boolean variable
would have been used for.

I wish I knew any compiler than could do that for the 8-bit targets
(ST7-on-steroids, 8051-siblings, PIC18-and-then-some) that I often code for.
http://www.st.com/internet/com/TECH..._LITERATURE/PROGRAMMING_MANUAL/CD00004607.pdf
http://www.classic.nxp.com/acrobat_download2/various/80C51_FAM_PROG_GUIDE_1.pdf
http://ww1.microchip.com/downloads/en/DeviceDoc/39500a.pdf

From my mind's representation of their capability, all compilers I used for
these target miss that the same expression is used twice, except for
sub-expressions in the same expression, or perhaps expressions in a linear
sequence of statements that does not contain any function call. Even if this
common expression recognition triggers, I doubt it would use bit variables
and instructions often availabl. Hence, *if* speed or size is more relevant
than clarity, using an explicit intermediary bit variable would typically be
beneficial on these targets.

Francois Grieu

Convert string with control character in caret notation to realcontrol character string.	8	Sep 25, 2012
error: initializer element is not constant	5	Oct 3, 2012
Code sharing	2	Oct 15, 2024
Code Review requested: Postscript Interpreter	21	Dec 19, 2010
"0%\| \| 0/10 [00:00<?, ?it/s] " Python Never Ends Running	0	Nov 21, 2022
code review	127	Jun 28, 2012
Javascript shipping countdown timer code help	9	Dec 22, 2024
How does DataFrame.interpolation() work (in its source code)?	0	Mar 26, 2022

Code review requested

bart.vandewoestyne

Ben Bacarisse

Bart Vandewoestyne

Ben Bacarisse

ImpalerCore

Ian Collins

Bart Vandewoestyne

Bart Vandewoestyne

ImpalerCore

Eric Sosman

Bart Vandewoestyne

James Kuyper

Bart Vandewoestyne

ImpalerCore

Rui Maciel

James Kuyper

Keith Thompson

Keith Thompson

hormelfree

Francois Grieu

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads