polymorphic structs with "methods"

Peter Michaux

/*
I want to have heterogeneous lists but treat all nodes the same
without checking some sort of struct "type" member and then using a
switch statement to call the appropriate function for that type. This
is in an effort to make my code more modular. Updating the switch
statement when a new "type" is added is not appealing. I've been
working on an idea (surely not original) about having polymorphic
struct "objects" that have function pointer members for "methods". My
code example is below and I would like to know where it sits in the
range from "hideous worst practice ever" to "yep people do that sort
of thing". If there are known improvements I could make I would
appreciate any comments, a link to a web page or book title.
*/

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

typedef struct _Object {
    void (*prettyPrint)(struct _Object *);
    void (*destroy)(struct _Object *);
} Object;

/***********************************/

typedef struct _Number {
    /* obj member must be first in this struct
       for down casts of Number to Object */
    Object obj;
    double val;
} Number;

void prettyPrintNumber(Object *obj) {
    Number *num = (Number *)obj;
    printf("the number is: %g\n", num->val);
}

void destroyNumber(Object *obj) {
    Number *num = (Number *)obj;
    free(num);
}

Number *createNumber(double val) {
    Number *num = (Number *)malloc(sizeof(Number));
    num->obj.prettyPrint = prettyPrintNumber;
    num->obj.destroy = destroyNumber;
    num->val = val;
    return num;
}

/***********************************/

typedef struct _String {
    Object obj;
    char *val;
} String;

void prettyPrintString(Object *obj) {
    String *str = (String *)obj;
    printf("the string is: %s\n", str->val);
}

void destroyString(Object *obj) {
    String *str = (String *)obj;
    free(str->val);
    free(str);
}

String *createString(char *val) {
    String *str = (String *)malloc(sizeof(String));
    str->obj.prettyPrint = prettyPrintString;
    str->obj.destroy = destroyString;
    str->val = strdup(val);
    return str;
}

/***********************************/

int main(void)
{
    printf("hello, world\n");

    Object *n = (Object *)createNumber(21);
    /* don't need to know n is a Number to print it */
    n->prettyPrint(n);

    Object *s = (Object *)createString("test");
    s->prettyPrint(s);

    n->destroy(n);
    s->destroy(s);
    return 0;
}

/*
Thanks,
Peter
*/
 
Eric Sosman

Peter said:
/*
I want to have heterogeneous lists but treat all nodes the same
without checking some sort of struct "type" member and then using a
switch statement to call the appropriate function for that type. This
is in an effort to make my code more modular. Updating the switch
statement when a new "type" is added is not appealing. I've been
working on an idea (surely not original) about having polymorphic
struct "objects" that have function pointer members for "methods". My
code example is below and I would like to know where it sits in the
range from "hideous worst practice ever" to "yep people do that sort
of thing". If there are known improvements I could make I would
appreciate any comments, a link to a web page or book title.
*/

"Yep, people do that sort of thing." But in the rather
bare form you've presented, I think you'll find that you've
just traded one maintenance headache for another. You no
longer need to revisit all the switch statements when adding
a new type, but you *do* have to revisit all the existing
types when adding a new method. Also, as the repertoire of
methods grows, so too will the size of the Object struct and
the tedium of initializing each instance's pointers.

Some people have attacked both these problems by adding
one level of indirection. Instead of dragging all the function
pointers around in every instance, they carry just one pointer
to a single struct that contains all the method pointers for
the "class." For example,

struct Methods {
    void (*prettyPrint)(const void *);
    int (*compareTo)(const void *, const void *);
    void (*destroy)(void *);
};


const struct Methods numberMethods = {
    prettyPrintNumber, compareToNumber, destroyNumber };

struct Number {
    struct Methods *methods;
    double value;
};

struct Number * createNumber(double value) {
    struct Number *new = malloc(sizeof *new);
    if (new != NULL) {
        new->methods = numberMethods;
        new->value = value;
    }
    return new;
}


const struct Methods stringMethods = {
    prettyPrintString, compareToString, destroyString };

struct String {
    struct Methods *methods;
    char *string;
};

struct String * createString(const char *string) {
    struct String *new = malloc(sizeof *new);
    if (new != NULL) {
        new->methods = stringMethods;
        new->string = strdup(string); // non-Standard, BTW
        if (new->string == NULL) {
            free (new);
            new = NULL;
        }
    }
    return new;
}


typedef struct { const struct Methods *methods; } Object; // optional, helpful

void displayAndDestroy(Object *obj) {
    obj->methods->prettyPrint(obj);
    obj->methods->destroy(obj);
}

It is possible to use more elaborate schemes, too, which
may make it easier to manage "inheritance," dynamic creation of
new "classes," and so on.

However, I'd recommend against it. It's a kludge -- I don't
think there's any way to escape a lot of `void*' pointers and/or
a lot of casts, both of which are detrimental to type safety.
It's a kludge that was necessary a few decades ago, when C++
was a mish-mash of mutually incompatible semi-implementations
and Java hadn't been invented. But the toolchains are a lot
better than in the bad old days: C++ implementations are more
widespread and have converged toward their Standard, Java may
have been over-hyped at first but turns out to be quite useful,
and there are probably other languages that would meet most
reasonable sets of requirements.

Doing O-O in C is not putting lipstick on a pig -- C is not
a pig -- but it's a lot like teaching a pig to play the violin:
Just barely possible, perhaps, but not pleasing.
 
Peter Michaux

Hi Eric,

Thanks for the detailed reply...

"Yep, people do that sort of thing." But in the rather
bare form you've presented, I think you'll find that you've
just traded one maintenance headache for another. You no
longer need to revisit all the switch statements when adding
a new type, but you *do* have to revisit all the existing
types when adding a new method.

How does your suggestion below avoid revisiting all types when a new
method is added? I'm not suggesting the type of revisiting looks bad,
but at least a new function will need to be written and added to
"numberMethods", for example.
Also, as the repertoire of
methods grows, so too will the size of the Object struct and
the tedium of initializing each instance's pointers.
Indeed.

Some people have attacked both these problems by adding
one level of indirection. Instead of dragging all the function
pointers around in every instance, they carry just one pointer
to a single struct that contains all the method pointers for
the "class." For example,

struct Methods {
    void (*prettyPrint)(const void *);
    int (*compareTo)(const void *, const void *);
    void (*destroy)(void *);
};

const struct Methods numberMethods = {
    prettyPrintNumber, compareToNumber, destroyNumber };

struct Number {
    struct Methods *methods;
    double value;
};

struct Number * createNumber(double value) {
    struct Number *new = malloc(sizeof *new);

I haven't seen the above idiom before. Is it completely equivalent to
the following?

struct Number *new = malloc(sizeof struct Number);

And is there no need for the explicit cast?

struct Number *new = (struct Number *)malloc(sizeof struct Number);

if (new != NULL) {

Thanks for adding in the bits I forgot. I really do need to do more C
programming :)
new->methods = numberMethods;

The above line needs to be the following, correct?

new->methods = &numberMethods;

new->value = value;
}
return new;
}

const struct Methods stringMethods = {
    prettyPrintString, compareToString, destroyString };

struct String {
    struct Methods *methods;
    char *string;
};

struct String * createString(const char *string) {
    struct String *new = malloc(sizeof *new);
    if (new != NULL) {
        new->methods = stringMethods;
        new->string = strdup(string); // non-Standard, BTW
        if (new->string == NULL) {
            free (new);
            new = NULL;
        }
    }
    return new;
}

typedef struct { const struct Methods *methods; } Object; // optional, helpful

void displayAndDestroy(Object *obj) {
    obj->methods->prettyPrint(obj);
    obj->methods->destroy(obj);
}

It is possible to use more elaborate schemes, too, which
may make it easier to manage "inheritance," dynamic creation of
new "classes," and so on.

If I'm programming in C, I'll try not to go off the deep end. I only
want to use the type of system like you show above for objects that
will be used polymorphically. If I know the type of the object then I
have no problem calling destroyType(obj). This sort of polymorphic
problem in a non-OO language has made me wonder for a while about how
to do things in a functional language like Scheme, for example.
However, I'd recommend against it.

Recommend against the more elaborate systems or even the system you
present above?
It's a kludge -- I don't
think there's any way to escape a lot of `void*' pointers and/or
a lot of casts, both of which are detrimental to type safety.
It's a kludge that was necessary a few decades ago, when C++
was a mish-mash of mutually incompatible semi-implementations
and Java hadn't been invented. But the toolchains are a lot
better than in the bad old days: C++ implementations are more
widespread and have converged toward their Standard, Java may
have been over-hyped at first but turns out to be quite useful,
and there are probably other languages that would meet most
reasonable sets of requirements.

Doing O-O in C is not putting lipstick on a pig -- C is not
a pig -- but it's a lot like teaching a pig to play the violin:
Just barely possible, perhaps, but not pleasing.

Part of my exercise is to learn how to program better in C.

Thanks again for the comments and code. Much appreciated.

Peter
 
Chris Thomasson

Peter Michaux said:
/*
I want to have heterogeneous lists but treat all nodes the same
without checking some sort of struct "type" member and then using a
switch statement to call the appropriate function for that type. This
is in an effort to make my code more modular. Updating the switch
statement when a new "type" is added is not appealing. I've been
working on an idea (surely not original) about having polymorphic
struct "objects" that have function pointer members for "methods". My
code example is below and I would like to know where it sits in the
range from "hideous worst practice ever" to "yep people do that sort
of thing". If there are known improvements I could make I would
appreciate any comments, a link to a web page or book title.
*/
[...]

http://groups.google.com/group/comp.lang.c/browse_frm/thread/1b106926ba5db19f
 
Ben Bacarisse

Peter Michaux said:
On Mar 23, 3:32 pm, Eric Sosman <[email protected]> wrote:

I haven't seen the above idiom before. It is completely equivalent to
the following?

struct Number *new = malloc(sizeof struct Number);

You've missed the required brackets round the type-name:

struct Number *new = malloc(sizeof(struct Number));

sizeof has two forms, roughly:

sizeof <expression>
sizeof ( type-name )
And is there no need for the explicit cast?

No, no need at all. void * can be converted to any object pointer
type (and vice-versa), and this will happen without any diagnostic
simply due to the rules of assignment.
 
Malcolm McLean

Peter Michaux said:
Hi Eric,

Thanks for the detailed reply...



How does your suggestion below avoid revisiting all types when a new
method is added? I'm not suggesting the type of revisiting looks bad,
but at least a new function will need to be written and added to
"numberMethods", for example.
The decision to use procedural programming runs through your whole design.
The program is designed as a pyramid, with each brick knowing about only the
parts directly beneath it. Functions tend to get simpler and more general
as you go further down.
What this means is that there is no central dispatcher which needs to know
about every "method" (in C talk, "function") of an object and resolve it at
runtime.
Part of my exercise is to learn how to program better in C.
If you are coming from a Java background it is natural to want to emulate
Java constructs in C. However this is a bad idea. C isn't a crippled Java,
though it can be used as such. Think functions, and when you specify a
function think, "how can I make this function reusable?". Even if in fact
you don't reuse the function, your code will improve as a result.
 
Eric Sosman

Peter said:
Hi Eric,

Thanks for the detailed reply...

[...] You no
longer need to revisit all the switch statements when adding
a new type, but you *do* have to revisit all the existing
types when adding a new method.

How does your suggestion below avoid revisiting all types when a new
method is added? I'm not suggesting the type of revisiting looks bad,
but at least a new function will need to be written and added to
"numberMethods", for example.

That's in my telegraphically brief allusion to "more
elaborate schemes" -- I ran out of typing time before fleshing
that part out ...

The idea (or "an" idea) is that when adding a new method
you'd add a pointer to a "no such method" function to the
struct for each type that hasn't implemented the method yet
(or perhaps never will). Then if a piece of code invokes the
"logarithm" method on a SkipList object, you'll find out.

Still fancier schemes might use a dispatch table that's
filled in at run-time, when the "class" is defined or when
methods are called. Those I've seen have usually involved a
method ID of some kind, which is used during a one-time search
up the "inheritance tree" to discover the right function to
call (the result gets cached so the search is done just once).
But these schemes also usually involve an intermediate call
to a "method invoker," and a further loss of type safety from
wrapping all returned values in unions and/or using variadic
functions pretty much everywhere.
I haven't seen the above idiom before.

That, my lad, is The Comp.Lang.C Official Allocation Idiom.
Kneel, knock your forehead on the floor three times, and go
browse the group's archives.
Is it completely equivalent to
the following?

See Ben Bacarisse's reply.
The above line needs to be the following correct?

new->methods = &numberMethods;

Yes, it does. Sorry for any confusion.
Recommend against the more elaborate systems or even the system you
present above?

Both, really. I'm not against using a little bit of
self-identification in C data, but if polymorphism starts to
play a significant role in the program I'll either rethink
the design or revisit my choice of implementation language.
If I'm up on a ladder and see a loose nail, I *will* thump it
down with the handle of the screwdriver I'm carrying, but if
I find myself doing it a lot I'll climb back down for a hammer.
 
ymuntyan

"Yep, people do that sort of thing." But in the rather
bare form you've presented, I think you'll find that you've
just traded one maintenance headache for another. You no
longer need to revisit all the switch statements when adding
a new type, but you *do* have to revisit all the existing
types when adding a new method. Also, as the repertoire of
methods grows, so too will the size of the Object struct and
the tedium of initializing each instance's pointers.

Some people have attacked both these problems by adding
one level of indirection. Instead of dragging all the function
pointers around in every instance, they carry just one pointer
to a single struct that contains all the method pointers for
the "class." For example,

[snip example code]
It is possible to use more elaborate schemes, too, which
may make it easier to manage "inheritance," dynamic creation of
new "classes," and so on.

However, I'd recommend against it. It's a kludge -- I don't
think there's any way to escape a lot of `void*' pointers and/or
a lot of casts, both of which are detrimental to type safety.
It's a kludge that was necessary a few decades ago, when C++
was a mish-mash of mutually incompatible semi-implementations
and Java hadn't been invented. But the toolchains are a lot
better than in the bad old days: C++ implementations are more
widespread and have converged toward their Standard, Java may
have been over-hyped at first but turns out to be quite useful,
and there are probably other languages that would meet most
reasonable sets of requirements.

Doing O-O in C is not putting lipstick on a pig -- C is not
a pig -- but it's a lot like teaching a pig to play the violin:
Just barely possible, perhaps, but not pleasing.

Very possible and very effective, see http://gnome.org.
But is often hard and unpleasant to deal with, certainly.

Yevgen
 
Peter Michaux

[snip]
Part of my exercise is to learn how to program better in C.

If you are coming from a Java background

Not quite but I have definitely used OOP languages.
it is natural to want to emulate Java constructs in C.
However this is a bad idea. C isn't a crippled Java,
though it can be used as such.

I don't particularly want to program C as if it is another language.
One of the first things I was told about programming is "You can
program Fortran in any language" and it was meant to be a warning.
Think functions, and when you specify a
function think, "how can I make this function reusable?". Even if in fact
you don't reuse the function, your code will improve as a result.

I understand and completely appreciate this sentiment.

I am thinking of a few situations that involve iterating over a
structure (e.g. array, linked list, tree) and doing "something" to
each node. The structure may be a parse tree where each node is a
struct representing a different kind of language statement. The
structure may be a list of allocated objects that each require a
different destroy function to be called for garbage collection.

The solution to this problem I see in books and tutorials[1] is that
the iterator examines some "type" member of each node and then calls
the appropriate function. This retains type safety but when a new node
type is added to the program, adjustments need to be made in many
somewhat unrelated places. This is the switch statement problem I
dread and immediately makes me think the code was written in the
1970s. Perhaps an unfair judgement but using global variables to pass
multiple return values makes me think the same thing.

The solution that I've come to like working with OOP languages is that
the node itself knows the type-appropriate function that needs to be
called. This means when the garbage collector is zipping through a
list of garbage, it just calls each node's destroy function pointer
member. If a new node type is added to the program the garbage
collector does not need modification. That is an important positive to
me.

I could choose to use C++ for my experiments and use a blend of OOP
and procedural programming at will but using C++ doesn't appeal for
several reasons. I could list the reasons but they really boil down to
one: I've never liked big languages, I currently don't like big
languages, I doubt I will ever like big languages....or is that three
reasons? ;-)

How would you handle these heterogeneous structures in C?

Peter

[1] an example I recently read showing the switch problem is in the
following Lex/Yacc tutorial http://epaperpress.com/lexandyacc/index.html
 
santosh

Peter said:
[snip]
Part of my exercise is to learn how to program better in C.

If you are coming from a Java background

Not quite but I have definitely used OOP languages.
it is natural to want to emulate Java constructs in C.
However this is a bad idea. C isn't a crippled Java,
though it can be used as such.

I don't particularly want to program C as if it is another language.
One of the first things I was told about programming is "You can
program Fortran in any language" and it was meant to be a warning.
Think functions, and when you specify a
function think, "how can I make this function reusable?". Even if in
fact you don't reuse the function, your code will improve as a
result.

I understand and completely appreciate this sentiment.

I am thinking of a few situations that involve iterating over a
structure (e.g. array, linked list, tree) and doing "something" to
each node. The structure may be a parse tree where each node is a
struct representing a different kind of language statement. The
structure may be a list of allocated objects that each require a
different destroy function to be called for garbage collection.

How would you handle these heterogeneous structures in C?

Why not add a "constructor" function for each node type that fills in
the address of all the functions relevant for that node into a
predefined set of node members, which could then be called by the
iterating code. Instead of function pointers you could have "wrapper"
functions that provide a uniform interface to outside code and call
the "real" functions inside them. For the case of functions that do
resource deallocation, they could be collected under a
predefined "destructor" function.
 
Malcolm McLean

Peter Michaux said:
I am thinking of a few situations that involve iterating over a
structure (e.g. array, linked list, tree) and doing "something" to
each node. The structure may be a parse tree where each node is a
struct representing a different kind of language statement. The
structure may be a list of allocated objects that each require a
different destroy function to be called for garbage collection.

How would you handle these heterogeneous structures in C?
I'd say, probably, the program is fundamentally misdesigned for C. A parse
tree definitely makes sense in a language such as Lisp. In C, the parsing is
better expressed as a hierarchy of mutually recursive functions.

However it may be that this solution isn't viable for various reasons. You
can have a void pointer with a type field, and switch on the type to do
various operations. But you are working against the language.

Another solution is to make the structure more general, even if that means
wasting fields. Then you have just one destructor. You may still need a
switch on the type, but it is less overwhelming.

Don't try to code generic linked lists or trees. In C it is easier to
hardcode the logic for each example. Whether this says something good or
something bad about the C language is a moot point.
 
Flash Gordon

Malcolm McLean wrote, On 24/03/08 18:10:

Don't try to code generic linked lists or trees. In C it is easier to
hardcode the logic for each example. Whether this says something good or
something bad about the C language is a moot point.

Generic linked lists can be done in C. I have access to such a library
myself (not written by me and not owned by me so I can't publish it) and
it is working fine for a number of completely unrelated lists. Maybe one
day I will write my own based on my idea of how to do it.
 
Peter Michaux

I'd say, probably, the program is fundamentally misdesigned for C. A parse
tree definitely makes sense in a language such as Lisp. In C, the parsing is
better expressed as a hierarchy of mutually recursive functions.

Are there not thousands of language compilers and interpreters that
are written in C and produce parse trees? Are they all fundamentally
misdesigned for C?

Peter
 
Malcolm McLean

Peter Michaux said:
Are there not thousands of language compilers and interpreters that
are written in C and produce parse trees? Are they all fundamentally
misdesigned for C?
I don't know. Though I've written a Basic interpreter it's not really my
area of expertise. Certainly not all interpreters or compilers produce
objects called "parse trees". You need a parse tree of course, but the
recursive structure can be in the code rather than in the output.

There's nothing wrong with trees in C. The problem is when the items are all
tagged so that you have de facto polymorphism. That can become very
difficult to deal with, and the real answer is not to create such a
structure in the first place.
 
CBFalconer

Peter said:
Are there not thousands of language compilers and interpreters
that are written in C and produce parse trees? Are they all
fundamentally misdesigned for C?

Take a look at hashlib. It was originally designed with the idea
of use in compiler symbol table generation (note the ease of
opening, closing and nesting tables). Purely standard C, so no
portability problems. See:

<http://cbfalconer.home.att.net/download/>
 
jaysome

Take a look at hashlib. It was originally designed with the idea of use
in compiler symbol table generation (note the ease of opening, closing
and nesting tables). Purely standard C, so no portability problems.

I think hashlib has portability problems, insofar as it does not compile
without error with gcc on 64-bit Ubuntu Linux 7.10.

jaysome@ubuntu:~/downloads/hashlib$ uname -a
Linux ubuntu 2.6.22-14-generic #1 SMP Thu Jan 31 23:33:13 UTC 2008 x86_64
GNU/Linux

jaysome@ubuntu:~/downloads/hashlib$ make
cc -W -Wall -ansi -pedantic -O2 -gstabs+ -c -o hashlib.o hashlib.c
cc -W -Wall -ansi -pedantic -O2 -gstabs+ -c -o cokusmt.o cokusmt.c
In file included from cokusmt.c:59:
cokusmt.h:9:5: error: #error System long word size not suitable for
cokusMT
make: *** [cokusmt.o] Error 1

In <limits.h>

/* Maximum value an `unsigned long int' can hold. (Minimum is 0.) */
# if __WORDSIZE == 64
# define ULONG_MAX 18446744073709551615UL
# else
# define ULONG_MAX 4294967295UL
# endif

__WORDSIZE is 64 on my machine, and this is acceptable by both the C90
and C99 C standards.

Best regards
 
santosh

jaysome said:
Take a look at hashlib. It was originally designed with the idea of
use in compiler symbol table generation (note the ease of opening,
closing, and nesting tables). Purely standard C, so no portability
problems.

I think hashlib has portability problems, insofar as it does not
compile without error with gcc on 64-bit Ubuntu Linux 7.10.

jaysome@ubuntu:~/downloads/hashlib$ uname -a
Linux ubuntu 2.6.22-14-generic #1 SMP Thu Jan 31 23:33:13 UTC 2008
x86_64 GNU/Linux

jaysome@ubuntu:~/downloads/hashlib$ make
cc -W -Wall -ansi -pedantic -O2 -gstabs+ -c -o hashlib.o hashlib.c
cc -W -Wall -ansi -pedantic -O2 -gstabs+ -c -o cokusmt.o cokusmt.c
In file included from cokusmt.c:59:
cokusmt.h:9:5: error: #error System long word size not suitable for
cokusMT
make: *** [cokusmt.o] Error 1

In <limits.h>

/* Maximum value an `unsigned long int' can hold. (Minimum is 0.) */
# if __WORDSIZE == 64
# define ULONG_MAX 18446744073709551615UL
# else
# define ULONG_MAX 4294967295UL
# endif

__WORDSIZE is 64 on my machine, and this is acceptable by both the C90
and C99 C standards.

Well the README file that comes with hashlib says that "cokusmt.c" and
its associated header are purely to ensure that the regression test will
function on any system, though as you say, the files themselves do not
compile on systems where ULONG_MAX is not 4294967295.

Also the README file notes that under DJGPP, Cygwin or Linux the only
command needed to compile the package should be:

make hashlib

However this fails over here with the following error message:

$ make hashlib
cc -W -Wall -ansi -pedantic -O2 -gstabs+ -c -o hashlib.o hashlib.c
cc hashlib.o -o hashlib
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../lib/crt1.o: In function
`_start':../sysdeps/i386/elf/start.S:115: undefined reference to `main'
collect2: ld returned 1 exit status
make: *** [hashlib] Error 1
$

I think his Makefile is specific to DJGPP/Windows.
 
santosh

santosh said:
jaysome said:
Peter Michaux wrote:

I am thinking of a few situations that involve iterating over a
structure (e.g. array, linked list, tree) and doing "something"
to each node. The structure may be a parse tree where each node
is a struct representing a different kind of language statement.
The structure may be a list of allocated objects that each
require a different destroy function to be called for garbage
collection.

How would you handle these heterogeneous structures in C?

I'd say, probably, the program is fundamentally misdesigned for C.
A parse tree definitely makes sense in a language such as Lisp. In
C, the parsing is better expressed as a hierarchy of mutually
recursive functions.

Are there not thousands of language compilers and interpreters that
are written in C and produce parse trees? Are they all
fundamentally misdesigned for C?

Take a look at hashlib. It was originally designed with the idea of
use in compiler symbol table generation (note the ease of opening,
closing, and nesting tables). Purely standard C, so no portability
problems.

I think hashlib has portability problems, insofar as it does not
compile without error with gcc on 64-bit Ubuntu Linux 7.10.

jaysome@ubuntu:~/downloads/hashlib$ uname -a
Linux ubuntu 2.6.22-14-generic #1 SMP Thu Jan 31 23:33:13 UTC 2008
x86_64 GNU/Linux

jaysome@ubuntu:~/downloads/hashlib$ make
cc -W -Wall -ansi -pedantic -O2 -gstabs+ -c -o hashlib.o hashlib.c
cc -W -Wall -ansi -pedantic -O2 -gstabs+ -c -o cokusmt.o cokusmt.c
In file included from cokusmt.c:59:
cokusmt.h:9:5: error: #error System long word size not suitable for
cokusMT
make: *** [cokusmt.o] Error 1

In <limits.h>

/* Maximum value an `unsigned long int' can hold. (Minimum is 0.) */
# if __WORDSIZE == 64
# define ULONG_MAX 18446744073709551615UL
# else
# define ULONG_MAX 4294967295UL
# endif

__WORDSIZE is 64 on my machine, and this is acceptable by both the
C90 and C99 C standards.

Well the README file that comes with hashlib says that "cokusmt.c" and
its associated header are purely to ensure that the regression test will
function on any system, though as you say, the files themselves do not
compile on systems where ULONG_MAX is not 4294967295.

Also the README file notes that under DJGPP, Cygwin or Linux the only
command needed to compile the package should be:

make hashlib

However this fails over here with the following error message:

$ make hashlib
cc -W -Wall -ansi -pedantic -O2 -gstabs+ -c -o hashlib.o hashlib.c
cc hashlib.o -o hashlib
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../lib/crt1.o: In function
`_start':../sysdeps/i386/elf/start.S:115: undefined reference to `main'
collect2: ld returned 1 exit status
make: *** [hashlib] Error 1
$

I think his Makefile is specific to DJGPP/Windows.

Also for a package that is frequently advertised as being pure ISO C, he
should probably rethink the following statement in line 388 of
hashtest.c:

fflush(stdin);

:)
 
jaysome

santosh said:
jaysome said:
On Mon, 24 Mar 2008 17:46:01 -0500, CBFalconer wrote:

Peter Michaux wrote:

I am thinking of a few situations that involve iterating over a
structure (e.g. array, linked list, tree) and doing "something" to
each node. The structure may be a parse tree where each node is a
struct representing a different kind of language statement. The
structure may be a list of allocated objects that each require a
different destroy function to be called for garbage collection.

How would you handle these heterogeneous structures in C?

I'd say, probably, the program is fundamentally misdesigned for C.
A parse tree definitely makes sense in a language such as Lisp. In
C, the parsing is better expressed as a hierarchy of mutually
recursive functions.

Are there not thousands of language compilers and interpreters that
are written in C and produce parse trees? Are they all fundamentally
misdesigned for C?

Take a look at hashlib. It was originally designed with the idea of
use in compiler symbol table generation (note the ease of opening,
closing, and nesting tables). Purely standard C, so no portability
problems.

I think hashlib has portability problems, insofar as it does not
compile without error with gcc on 64-bit Ubuntu Linux 7.10.

jaysome@ubuntu:~/downloads/hashlib$ uname -a
Linux ubuntu 2.6.22-14-generic #1 SMP Thu Jan 31 23:33:13 UTC 2008 x86_64 GNU/Linux

jaysome@ubuntu:~/downloads/hashlib$ make
cc -W -Wall -ansi -pedantic -O2 -gstabs+ -c -o hashlib.o hashlib.c
cc -W -Wall -ansi -pedantic -O2 -gstabs+ -c -o cokusmt.o cokusmt.c
In file included from cokusmt.c:59:
cokusmt.h:9:5: error: #error System long word size not suitable for
cokusMT
make: *** [cokusmt.o] Error 1

In <limits.h>

/* Maximum value an `unsigned long int' can hold. (Minimum is 0.) */
# if __WORDSIZE == 64
# define ULONG_MAX 18446744073709551615UL
# else
# define ULONG_MAX 4294967295UL
# endif

__WORDSIZE is 64 on my machine, and this is acceptable by both the C90
and C99 C standards.

Well the README file that comes with hashlib says that "cokusmt.c" and
its associated header are purely to ensure that the regression test will
function on any system, though as you say, the files themselves do not
compile on systems where ULONG_MAX is not 4294967295.

Also the README file notes that under DJGPP, Cygwin or Linux the only
command needed to compile the package should be:

make hashlib

However this fails over here with the following error message:

$ make hashlib
cc -W -Wall -ansi -pedantic -O2 -gstabs+ -c -o hashlib.o hashlib.c
cc hashlib.o -o hashlib
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../lib/crt1.o: In function
`_start':../sysdeps/i386/elf/start.S:115: undefined reference to `main'
collect2: ld returned 1 exit status
make: *** [hashlib] Error 1
$

I think his Makefile is specific to DJGPP/Windows.

Also for a package that is frequently advertised as being pure ISO C, he
should probably rethink the following statement in line 388 of
hashtest.c:

fflush(stdin);

:)

Good catch.

I suspect that, based on the code:

if (0 == (i % 20000)) {
    printf("\r%lu inserted", i);
    fflush(stdin);
}

it should be:

fflush(stdout);

I recall reading a thread in here recently concerning whether or
not debuggers were useful. I recall that some in here said that reviewing
the source code could take the place of using a debugger. (In my
experience, that's bullshit--there is no substitute for a debugger).

It was as if they were saying that the source code could and would be
reviewed, and that by reviewing the source code, some/all bugs would be
uncovered (I seem to recall the author of hashlib even making such a
statement to that effect).

Apparently this/that was not the case in this particular instance.

The irony of that discussion is all the greater when one considers
that the use of "fflush(stdin);" is promptly identified as undefined
behavior in this newsgroup (correctly so).

But some in this newsgroup go on to claim that undefined behavior is
undefined behavior--there is no gray area--and furthermore that undefined
behavior (in any form) can smoke your hard drive or result in other forms
of nasty behavior, including but not limited to starting WWIII.

What's perhaps most ironic is that there is a lot of truth to this, but
not if you're running one of the state-of-the-art OSes like Linux or XP
or Vista or Mac OS X.

What's probably gonna do us all in is a C program, written by someone who
works for a government, that produces undefined behavior and runs on
something like Mac OS 9 or Windows 98 or some other crappy OS that
requires the honest people to be honest, with no safeguards in
place to ensure that the honest people were indeed honest, despite their
best intentions.

God have mercy on our souls when C undefined behavior smokes us all.
 
