polymorphic structs with "methods"

S

santosh

jaysome said:
santosh said:
jaysome wrote:

On Mon, 24 Mar 2008 17:46:01 -0500, CBFalconer wrote:

Peter Michaux wrote:

I am thinking of a few situations that involve iterating over a
structure (e.g. array, linked list, tree) and doing "something"
to each node. The structure may be a parse tree where each node
is a struct representing a different kind of language
statement. The structure may be a list of allocated objects
that each require a different destroy function to be called for
garbage collection.

How would you handle these heterogeneous structures in C?

I'd say, probably, the program is fundamentally misdesigned for
C. A parse tree definitely makes sense in a language such as
Lisp. In C, the parsing is better expressed as a hierarchy of
mutually recursive functions.

Are there not thousands of language compilers and interpreters
that are written in C and produce parse trees? Are they all
fundamentally misdesigned for C?

Take a look at hashlib. It was originally designed with the idea
of use in compiler symbol table generation (note the ease of
opening, closing
and nesting tables). Purely standard C, so no portability
problems.

I think hashlib has portability problems, insofar as it does not
compile without error with gcc on 64-bit Ubuntu Linux 7.10.

jaysome@ubuntu:~/downloads/hashlib$ uname -a Linux ubuntu
2.6.22-14-generic #1 SMP Thu Jan 31 23:33:13 UTC 2008 x86_64
GNU/Linux

jaysome@ubuntu:~/downloads/hashlib$ make cc -W -Wall -ansi
-pedantic
-O2 -gstabs+ -c -o hashlib.o hashlib.c cc -W -Wall -ansi
-pedantic
-O2 -gstabs+ -c -o cokusmt.o cokusmt.c In file included from
cokusmt.c:59:
cokusmt.h:9:5: error: #error System long word size not suitable for
cokusMT
make: *** [cokusmt.o] Error 1

In <limits.h>

/* Maximum value an `unsigned long int' can hold. (Minimum is 0.)
*/
# if __WORDSIZE == 64
# define ULONG_MAX 18446744073709551615UL # else
# define ULONG_MAX 4294967295UL
# endif

__WORDSIZE is 64 on my machine, and this is acceptable by both the
C90 and C99 C standards.

Well the README file that comes with hashlib says that "cokusmt.c"
and it's associated header are purely to ensure that regression test
will function on any system, though as you say, the files themselves
do not compile on systems where ULONG_MAX is not 4294967295.

Also the README file notes that under DJGPP, Cygwin or Linux the
only command needed to compile the package should be:

make hashlib

However this fails over here with the following error message:

$ make hashlib
cc -W -Wall -ansi -pedantic -O2 -gstabs+ -c -o hashlib.o
hashlib.c
cc hashlib.o -o hashlib
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../lib/crt1.o: In
function `_start':../sysdeps/i386/elf/start.S:115: undefined
reference to `main' collect2: ld returned 1 exit status make: ***
[hashlib] Error 1 $

I think his Makefile is specific to DJGPP/Windows.

Also for a package that is frequently advertised as being pure ISO C,
he should probably rethink the following statement in line 388 of
hashtest.c:

fflush(stdin);

:)

Good catch.

I suspect that, based on the code:

if (0 == (i % 20000)) {
printf("\r%lu inserted", i);
fflush(stdin);
}

it should be:

fflush(stdio);

Yes, and he should probably fix his makefile so that it doesn't try to
produce an executable for the 'make hashlib' invocation.

<concerning undefined behaviour>

AFAIK fflush(stdin) does not clear the stdin buffer on at least
glibc/Linux, so if code must use fflush(stdin), it has to be guarded by
#ifdefs, so that it is not compiled in for platforms that do not
implement it.

This is true for UB in general, since many cases of UB are actually
pretty well defined _for_ a particular implementation, but unlike
implementation defined behaviour, there is no requirement that it be
so. Thus portable code must at the very least conditionally compile
code invoking UB only under implementations that do define it as
desired.
 
S

santosh

santosh wrote:

<snip>

Also probably a minor typo in the comments in hashlib.c where in line 50
the author claims to have tested it on his system with 512 GB memory,
which I suppose should be 512 MB instead.
 
S

santosh

santosh said:
santosh wrote:

<snip>

Also probably a minor typo in the comments in hashlib.c where in line
50 the author claims to have tested it on his system with 512 GB
memory, which I suppose should be 512 MB instead.

Another issue. On my system (Linux/glibc) the 'runtests' driver fails.
Here are the details:

$ ./runtests
rm -f hashlib.o cokusmt.o markov.o hashtest.o
make hashtest.exe DEBUG=-DNDEBUG
make[1]: Entering directory `/home/santosh/tmp/hashlib'
cc -W -Wall -ansi -pedantic -O2 -gstabs+ -DNDEBUG -c -o hashlib.o
hashlib.c
cc -W -Wall -ansi -pedantic -O2 -gstabs+ -DNDEBUG -c -o cokusmt.o
cokusmt.c
cc -W -Wall -ansi -pedantic -O2 -gstabs+ -DNDEBUG -c -o hashtest.o
hashtest.c
gcc -o hashtest.exe hashtest.o hashlib.o cokusmt.o
make[1]: Leaving directory `/home/santosh/tmp/hashlib'
make markov.exe DEBUG=-DNDEBUG
make[1]: Entering directory `/home/santosh/tmp/hashlib'
cc -W -Wall -ansi -pedantic -O2 -gstabs+ -DNDEBUG -c -o markov.o
markov.c
markov.c: In function ?getcounts?:
markov.c:451: warning: unused parameter ?xtra?
markov.c: At top level:
markov.c:394: warning: ?showprefix? defined but not used
markov.c: In function ?main?:
markov.c:410: warning: ?litem? may be used uninitialized in this
function
gcc -o markov.exe hashlib.o cokusmt.o markov.o
make[1]: Leaving directory `/home/santosh/tmp/hashlib'
echo Test it all by executing "runtests"
Test it all by executing runtests
testcount value is
test number 1 against test1
../runtests: line 6: ./hashtest: No such file or directory
failed at test 1
$

The mistake here is obvious. The compile commands generate
a 'hashtest.exe' file while the test driver looks for a 'hashtest'
file. So I renamed 'hashtest.exe' to 'hashtest' and ran 'runtests'
again:

$ mv -v ./hashtest.exe ./hashtest
$ ./runtests
rm -f hashlib.o cokusmt.o markov.o hashtest.o
make hashtest.exe DEBUG=-DNDEBUG
make[1]: Entering directory `/home/santosh/tmp/hashlib'
cc -W -Wall -ansi -pedantic -O2 -gstabs+ -DNDEBUG -c -o hashlib.o
hashlib.c
cc -W -Wall -ansi -pedantic -O2 -gstabs+ -DNDEBUG -c -o cokusmt.o
cokusmt.c
cc -W -Wall -ansi -pedantic -O2 -gstabs+ -DNDEBUG -c -o hashtest.o
hashtest.c
gcc -o hashtest.exe hashtest.o hashlib.o cokusmt.o
make[1]: Leaving directory `/home/santosh/tmp/hashlib'
make markov.exe DEBUG=-DNDEBUG
make[1]: Entering directory `/home/santosh/tmp/hashlib'
cc -W -Wall -ansi -pedantic -O2 -gstabs+ -DNDEBUG -c -o markov.o
markov.c
markov.c: In function ?getcounts?:
markov.c:451: warning: unused parameter ?xtra?
markov.c: At top level:
markov.c:394: warning: ?showprefix? defined but not used
markov.c: In function ?main?:
markov.c:410: warning: ?litem? may be used uninitialized in this
function
gcc -o markov.exe hashlib.o cokusmt.o markov.o
make[1]: Leaving directory `/home/santosh/tmp/hashlib'
echo Test it all by executing "runtests"
Test it all by executing runtests
testcount value is
test number 1 against test1

test number 2 against test2

test number 3 against test3

test number 4 against test4

Files junk and test4.txt differ
failed at test 4
$

It's obvious that the makefile and the test files really must be fixed
so that they do not make use DOS/Windows idiosyncrasies.
 
F

Flash Gordon

jaysome wrote, On 25/03/08 05:56:
On Tue, 25 Mar 2008 10:44:52 +0530, santosh wrote:


Good catch.

I suspect that, based on the code:

if (0 == (i % 20000)) {
printf("\r%lu inserted", i);
fflush(stdin);
}

it should be:

fflush(stdio);

I think you mean
fflush(stdout);
I found this without a debugger ;-)

Chuck, I hope you are going to fix this quickly.
I recall reading a thread in here recently concerning whether or
not debuggers were useful. I recall that some in here said that reviewing
the source code could take the place of using a debugger. (In my
experience, that's bullshit--there is no substitute for a debugger).

So, Santosh, did you find this error in a debugger or by reading the code?
It was as if they were saying that the source code could and would be
reviewed, and that by reviewing the source code, some/all bugs would be
uncovered (I seem to recall the author of hashlib even making such a
statement to that effect).

Some definitely can because I *have* found and reported serious bugs
during code reviews of other peoples code that I have never seen before.
Apparently this/that was not the case in this particular instance.

We need to know how Santosh found the bug before that can be claimed.
The prevalent epitome of that discussion is the more ironic when one
considers that the use of "fflush(stdin);" is promptly identified as
undefined behavior in this newsgroup (correctly so).

But some in this newsgroup go on to claim that undefined behavior is
undefined behavior--there is no gray area--and furthermore that undefined
behavior (in any form) can smoke your hard drive or result in other forms
of nasty behavior, including but not limited to starting WWIII.

It has also been stated that behaviour C leaves undefined can be defined
by another standard or the platform, and *if* you have good reason for
depending on that documented behaviour for what C leaves undefined it
can be acceptable in the code, although it is not code that is topical here.
What's perhaps most ironic is that there is a lot of truth to this, but
not if you're running one of the state-of-the-art OSes like Linux or XP
or Vista or Mac OS X.

<snip>

It can still trash your data. However, most of the hyperbole about what
UB can lead to is deliberately completely OTT and is done to make the
point in a humorous and memorable way.
 
S

santosh

Flash said:
jaysome wrote, On 25/03/08 05:56:

I think you mean
fflush(stdout);
I found this without a debugger ;-)

Chuck, I hope you are going to fix this quickly.


So, Santosh, did you find this error in a debugger or by reading the
code?

I wouldn't call it "reading the code." I was just glancing rapidly
through the files when this one caught my eye, mainly I admit, due to
the excellent syntax highlighting of Vim.

But apart from this one instance Chuck's code seems pretty robust,
though Valgrind reports a few possible memory leaks for a particular
regression test, but that could be the test driver. I haven't looked
more closely.

However, the makefile is tailored towards Windows/DOS and it fails
out-of-the-box here. 'make hashlib' as the author advices as the
recommended method to build the package on all systems, fails. You need
to use plain make. Also a regression test seems to have failed - I
posted this elsethread. The test driver for UNIX systems tries to
invoke a non-existent executable name, which must currently be fixed by
a manual renaming.

These are some minor problems that have appeared so far. I haven't
really used the package nor examined the code closely.

<snip>
 
L

Leslie Sanford

:

How would you handle these heterogeneous structures in C?

Think about what a heterogeneous collection in a language like C++ is like.
It is a collection of pointers to a base class all of the instances have in
common. So to do this in C, you can have a collection of pointers of the
same type; the type represents a part of the functionality represented by
the instances. I will make a long story short. What you will wind up with
will look a lot like COM. You have a pointer to a struct. You ask it if it
supports a certain interface. If so, it returns to you a pointer to a struct
initialized with the functions for that interface and a pointer to the
instance. You store this in a collection.

Untested!

struct ISomeInterface
{
struct ISomeInterface *instance;

/* Member functions... */
};

struct Gear
{
struct ISomeInterface someInterface;

/* Other stuff.. */
};

struct Cog
{
struct ISomeInterface someInterface;

/* Other stuff.. */
};

struct Gear gear;
struct Cog cog;

void gearInitialize(&gear);
void cogInitialize(&cog);

struct ISomeInterface *someInterface1 = gearGetISomeInterface(&gear);
struct ISomeInterface *someInterface2 = cogGetISomeInterface(&cog);

/* put pointers to inteface in collection... */

Again, untested and written in a bit of a rush. Can be made more general,
but it will get you closer to looking something like COM (yuck).

I'm not necessarrily recommending this approach, but if I absolutely had to
do OO in C, it would look something like this.
 
F

Flash Gordon

santosh wrote, On 25/03/08 08:24:
I wouldn't call it "reading the code." I was just glancing rapidly
through the files when this one caught my eye, mainly I admit, due to
the excellent syntax highlighting of Vim.

<snip>

As you did not run the code Jaysome was still wrong to use this as
evidence that a debugger is desirable.
 
C

CBFalconer

santosh said:
.... snip ...

Also for a package that is frequently advertised as being pure
ISO C, he should probably rethink the following statement in line
388 of hashtest.c:

fflush(stdin);

That is a bug. It should read 'stdout', as is obvious on reading
it. Thanks.

See also my answer to jaysome.
 
C

CBFalconer

jaysome said:
CBFalconer said:
Peter Michaux wrote:
.... snip ...

Take a look at hashlib. It was originally designed with the idea of use
in compiler symbol table generation (note the ease of opening, closing
and nesting tables). Purely standard C, so no portability problems.

I think hashlib has portability problems, insofar as it does not compile
without error with gcc on 64-bit Ubuntu Linux 7.10.

jaysome@ubuntu:~/downloads/hashlib$ uname -a
Linux ubuntu 2.6.22-14-generic #1 SMP Thu Jan 31 23:33:13 UTC 2008 x86_64
GNU/Linux

jaysome@ubuntu:~/downloads/hashlib$ make
cc -W -Wall -ansi -pedantic -O2 -gstabs+ -c -o hashlib.o hashlib.c
cc -W -Wall -ansi -pedantic -O2 -gstabs+ -c -o cokusmt.o cokusmt.c
In file included from cokusmt.c:59:
cokusmt.h:9:5: error: #error System long word size not suitable for
cokusMT
make: *** [cokusmt.o] Error 1

In <limits.h>

/* Maximum value an `unsigned long int' can hold. (Minimum is 0.) */
# if __WORDSIZE == 64
# define ULONG_MAX 18446744073709551615UL
# else
# define ULONG_MAX 4294967295UL
# endif

__WORDSIZE is 64 on my machine, and this is acceptable by both the C90
and C99 C standards.

When I revised cokusmet to make it portable to all standard C
systems, I had (and still have) no means of testing it for longs
larger than 32 bits. So I put that error catch in. The idea was
to leave checking it out to the user with such a system.

I don't know that it doesn't work. I just don't know that it does.

As far as general operation of the makefile under Unix/Linux, that
should function. However it may require some revisions to generate
non .exe files. Notice that there is no need to use runtest.bat,
the script runtest (no extension) is available for bash.

The fflush(stdin) should be fflush(stdout). A definite bug.
 
S

santosh

CBFalconer said:
That is a bug. It should read 'stdout', as is obvious on reading
it. Thanks.

See also my answer to jaysome.

Okay, but did you also see one of my posts where it is mentioned
that "test 4" fails here? If you want, I can email you more details and
the files in question.
 
C

CBFalconer

santosh said:
Okay, but did you also see one of my posts where it is mentioned
that "test 4" fails here? If you want, I can email you more details
and the files in question.

No, I missed that one.
 
C

CBFalconer

CBFalconer said:
No, I missed that one.

Somewhere I announced earlier that I hoped to attack these problems
this weekend. However I have come down with the flu, and can't do
much more than moan, groan, and sleep. So it will be at least
another weekend before I get to anything. I invite the public to
generate the fixes.
 
A

Antoninus Twink

Somewhere I announced earlier that I hoped to attack these problems
this weekend. However I have come down with the flu, and can't do
much more than moan, groan, and sleep.

Couldn't have happened to a nicer guy!

Don't rush back to your computer, now...

And fix your damn sig.
 
S

Stephen Sprunk

Peter Michaux said:
/*
I want to have heterogeneous lists but treat all nodes the same
without checking some sort of struct "type" member and then using a
switch statement to call the appropriate function for that type. This
is in an effort to make my code more modular. Updating the switch
statement when a new "type" is added is not appealing. I've been
working on an idea (surely not original) about having polymorphic
struct "objects" that have function pointer members for "methods". My
code example is below and I would like to know where it sits in the
range from "hideous worst practice ever" to "yep people do that sort
of thing". If there are known improvements I could make I would
appreciate any comments, a link to a web page or book title.
*/

This is sometimes done, but if you need to do this extensively, think twice
about whether you really, really need to use C rather than some language
(like C++) that has all of these things built-in.

For an example of how ugly it can get when you try full-on OOP with C, look
at the source for a GTK+ program -- or at GTK+'s source itself.

Part of programming is learning to use the right tool for the job. C makes
a great hammer, but sometimes what you really need is a screwdriver.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

typedef struct _Object {
void (*prettyPrint)(struct _Object *);
void (*destroy)(struct _Object *);
} Object;

/***********************************/

typedef struct _Number {
/* obj member must be first in this struct
for down casts of Number to Object */
Object obj;
double val;
} Number;

void prettyPrintNumber(Object *obj) {
Number *num = (Number *)obj;
printf("the number is: %g\n", num->val);
}

You shouldn't allow any Object to be printed as if it were a Number, only
Numbers.

void prettyPrintNumber(Number *num) {
printf("the number is: %g\n", num->val);
}

This way, absent a bug-hiding cast, you'll get a compile-time error when you
call this the wrong way. Technically it's broken, since you'll end up
assigning it to a "method" pointer that wants an Object*, but that just
means you need a cast there; all pointers-to-struct have the same
representation, so it's safe in practice.
void destroyNumber(Object *obj) {
Number *num = (Number *)obj;
free(num);
}
Number *createNumber(double val) {
Number *num = (Number *)malloc(sizeof(Number));
num->obj.prettyPrint = prettyPrintNumber;
num->obj.destroy = destroyNumber;
num->val = val;
return num;
}

You've already messed up here, according to the general OO model... There
is no constructor or destructor for Object, which you should be chaining to
in the constructor and destructor for Number. (This gets important once you
have multiple layers of inheritance. If you want multiple inheritance too,
it gets even uglier.)

Object *constructObject(Object *obj) {
obj->prettyPrint = prettyPrintObject;
obj->destroy = destroyObject;
}

Object *newObject(void) {
Object *obj = malloc(sizeof (Object));
if (!obj) do_something();
return constructObject(obj);
}

void destroyObject(Object *obj) { }
void prettyPrintObject(Object *obj) { }

Now that you have those, the relevant items for Number:

Number *constructNumber(Number *num, double val) {
constructObject(&num.obj);
num->obj.prettyPrint = prettyPrintNumber; /* cast left out for clarity */
num->obj.destroy = destroyNumber; /* cast left out for clarity */
num->val = val;
return num;
}


Number *newNumber(double val) {
Number *num = malloc(sizeof *num);
if (!num) do_something();
return constructNumber(num);
}

void destroyNumber(number *num) {
/* val will destroy itself */
destroyObject(num->obj);
}

The reason I have broken out "new" and "construct" is that you can't chain
to a parent constructor _that also allocates_ using the above model. I
prefer to have the caller do (de)allocation, but I've kept a "new" form to
make adapting your code easier. There's a different form that makes the obj
member a pointer which solves that (and also solves multiple inheritance,
but introduces other problems with polymorphism...).

All of the above comments apply to your String objects, so I've removed that
code.
/***********************************/

int main(void)
{
printf("hello, world\n");

Object *n = (Object *)createNumber(21);

With my model:

Object *n = (Object *)newNumber(21);

You could also do this:

Number foo; constructNumber(&foo, 21);
Object *n = (Object *) foo;
/* don't need to know n is a Number to print it */
n->prettyPrint(n);

Object *s = (Object *)createString("test");

See above.
s->prettyPrint(s);

n->destroy(n);
s->destroy(s);

Since my destructors above don't actually deallocate:

n->destroy(n); free(n);
s->destroy(s); free(s);

If your Number and String are automatics, you can leave out the call to
free(). Or you might separate destroy() and destruct(), with the former
doing a free() after calling the latter, and the latter chaining up to its
parents' destruct()s.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,777
Messages
2,569,604
Members
45,233
Latest member
AlyssaCrai

Latest Threads

Top