Code Review requested: Postscript Interpreter

luserXtrog · Dec 19, 2010

I feel I need a fresh perspective (or many, ideally)
on my program. It's grown to where I can't quite keep
it all in my head and making new additions has become
a game of "how did I do this elsewhere?"

A zip file containing c and postscript source and a makefile
is available at:
http://code.google.com/p/xpost/downloads/list

I chose a BSD licence because I don't know any better.

There are probably too few comments.
So even comments like "this part needs more comments"
are desirable.

And in more than a few places I'm certainly guilty
of attempting to be cute and/or clever. But to all
appearances, it all works somehow.

A little toc:
arr.c arr.h array operators (functions)
bool.c bool.h boolean operators
color.c color.h color operators
control.c control.h control operators
dic.c dic.h dictionary operators
err.c err.h error handling (c-part)
err.ps error handling (ps-part)
file.c file.h file operators
global.c global.h global variables (yes, I know. bad. sorry)
all the stacks are here
init.c init.h initialize (c-part)
init.ps initialize (ps-part)
lim.h implementation limits
main.c main function and central loop
math.c math.h math operators
matrix.c matrix.h matrix operators
obj.h the object structure
oper.c oper.h the operator interface (function-pointer-objects)
paint.c paint.h just stroke
path.c path.h path construction (no curves)
poly.c poly.h polymorphic operators
squiggle.ps a doodle (no showpage)
sta.c sta.h stack manipulation operators
str.c str.h string operators
tok.c tok.h the token operator (the lexical scanner)
type.c type.h type and attribute operators
vm.c vm.h virtual memory (mmap requires POSIX)
x.c x.h the X11 functions

TIA

Barry Schwarz · Dec 19, 2010

On Sun, 19 Dec 2010 03:44:17 -0800 (PST), luserXtrog

snip

A little toc:
arr.c arr.h array operators (functions)
bool.c bool.h boolean operators
color.c color.h color operators
control.c control.h control operators
dic.c dic.h dictionary operators
err.c err.h error handling (c-part)
err.ps error handling (ps-part)
file.c file.h file operators
global.c global.h global variables (yes, I know. bad. sorry)
all the stacks are here
init.c init.h initialize (c-part)
init.ps initialize (ps-part)
lim.h implementation limits
main.c main function and central loop
math.c math.h math operators

Why would you introduce unnecessary confusion by naming your header
file the same as a standard header?

Gene · Dec 20, 2010

I feel I need a fresh perspective (or many, ideally)
on my program. It's grown to where I can't quite keep
it all in my head and making new additions has become
a game of "how did I do this elsewhere?"

A zip file containing c and postscript source and a makefile
is available at: http://code.google.com/p/xpost/downloads/list

I chose a BSD licence because I don't know any better.

There are probably too few comments.
So even comments like "this part needs more comments"
are desirable.

And in more than a few places I'm certainly guilty
of attempting to be cute and/or clever. But to all
appearances, it all works somehow.

I count a bit over 2,700 sloc. It's typical for an inexperienced
programmer to start losing control of a program at about this size if
there's been no design work or scaffolding before coding. If that's
what's happened, you won't regain control. Get a good book on data
structures and another on software design. Read them. Start over.
Chalk this one up to a learning experience.

BartC · Dec 20, 2010

Gene said:
I count a bit over 2,700 sloc. It's typical for an inexperienced
programmer to start losing control of a program at about this size if
there's been no design work or scaffolding before coding. If that's
what's happened, you won't regain control. Get a good book on data
structures and another on software design. Read them. Start over.
Chalk this one up to a learning experience.

I had a quick look. 2700 loc seems tiny for any sort of interpreter.

My main criticism might be that it is split up into too many files,
averaging just 100 lines per module and 23 lines per header file.

I'm not surprised it's difficult to keep it all together. In fact I'd be
tempted to put it all into one file.

luserXtrog · Dec 20, 2010

This is already the fourth start-over! For a sense of where I started,
you could search for the thread "Embarrassing Spaghetti Code Needs
Stylistic Advice" in clc about a year ago.

I had a quick look. 2700 loc seems tiny for any sort of interpreter.

Well it's still incomplete. I imagine the size will more than double
by the time I get all the standard operators finished.

My main criticism might be that it is split up into too many files,
averaging just 100 lines per module and 23 lines per header file.

I'm not surprised it's difficult to keep it all together. In fact I'd be
tempted to put it all into one file.

The first version was a single file. Then I read somewhere that
source files should be no more than 200-300 lines (Art of Unix
Programming, maybe?). Thereafter, the single file began to seem
unwieldy. So when I started the first rewrite I tried to keep things
smaller.

With version 3, I tried partitioning along "logical groupings" of
functions (mostly following the categories from the Postscript
manual itself). With this fourth try, I've tried to strictly
follow the categories from the manual and let the file sizes take
care of themselves. Learning how to use ctags has made editing
multiple files almost as easy as the single file was.

Adding a new operator is probably the most troublesome part, lately.
It requires an operator function in the .c file, a declaration in
the .h file, and an entry in the OPERATORS macro (at the end, to
avoid recompiling everything) in oper.h. And the makefile doesn't
really know all the dependencies so init.c has to be touched
so the new operator can get installed in the dictionary at startup.
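
A minimal sketch of the X-macro idiom, which is one common way to let a single list drive both the declarations and the startup table, so that adding an operator means one new list line plus the function body. This is not xpost's actual oper.h: the list contents, struct oper, optab, oplookup and the int-taking signatures are all made up for illustration (real operators would work on the operand stack).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical operator list: one X(name, function) line per operator. */
#define OPERATORS(X) \
    X("add", Oadd)   \
    X("sub", Osub)   \
    X("neg", Oneg)

/* Expansion 1: declare every operator function from the list. */
#define X(name, fn) int fn(int a, int b);
OPERATORS(X)
#undef X

/* Hypothetical table entry pairing a name with its function. */
struct oper {
    const char *name;
    int (*fn)(int, int);
};

/* Expansion 2: build the name-to-function table from the same list. */
#define X(name, fn) { name, fn },
static const struct oper optab[] = { OPERATORS(X) };
#undef X

/* The operator bodies are the only part still written by hand. */
int Oadd(int a, int b) { return a + b; }
int Osub(int a, int b) { return a - b; }
int Oneg(int a, int b) { (void)b; return -a; }

/* Linear lookup by name; returns NULL for an unknown operator. */
const struct oper *oplookup(const char *name)
{
    for (size_t i = 0; i < sizeof optab / sizeof optab[0]; i++)
        if (strcmp(optab[i].name, name) == 0)
            return &optab[i];
    return NULL;
}
```

With this layout, oplookup("add")->fn(2, 3) yields 5, and a name missing from the list yields NULL. The trade-off is that every file expanding the list recompiles when the list changes, which is the cost the append-at-the-end habit tries to dodge.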

luser- -droog · Dec 20, 2010

On Sun, 19 Dec 2010 03:44:17 -0800 (PST), luserXtrog

snip

Why would you introduce unnecessary confusion by naming your header
file the same as a standard header?

I must have thought that if I could keep them straight, everyone
else could too. Fallacious perhaps, but all too common among us
introverts.

Oh, and I apologize for forgetting to set followups in the original.
I fully intended to decide which group should house the thread
and set the followup right up until I completely forgot and just
hit send.

Malcolm McLean · Dec 20, 2010

The first version was a single file. Then I read somewhere that
source files should be no more than 200-300 lines (Art of Unix
Programming, maybe?). Thereafter, the single file began to seem
unwieldy. So when I started the first rewrite I tried to keep things
smaller.

What's important is that source files should be organised logically,
holding related functions (which you seem to have done), and with
controlled dependencies - the last is the hard part and often there
are forces pulling you both ways.

luserXtrog · Dec 20, 2010

What's important is that source files should be organised logically,
holding related functions (which you seem to have done), and with
controlled dependencies - the last is the hard part and often there
are forces pulling you both ways.

My use of header files got rather convoluted while trying to avoid
circular dependencies. So I stuffed everything in a controlled order
in one place (global.h) and had everything else include that. I'm
hoping if I side-step the issue long enough eventually I'll be
running rings around it.
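
For comparison, the usual way to break a header cycle without a catch-all header is include guards plus forward declarations: a header that only stores a pointer to another module's struct does not need that module's header at all. The sketch below packs two imagined headers into one file so it compiles on its own. The struct layouts and dict_length are assumptions for illustration, not the real contents of obj.h or dic.h.

```c
#include <stddef.h>

/* ---- what a header like obj.h might contain (hypothetical) ---- */
#ifndef OBJ_H
#define OBJ_H
struct dict;                  /* forward declaration: dic.h is not needed */
struct object {
    int type;
    struct dict *d;           /* a pointer to an incomplete type is legal */
};
#endif

/* ---- what a header like dic.h might contain (hypothetical) ---- */
#ifndef DIC_H
#define DIC_H
struct object;                /* forward declaration: obj.h is not needed */
struct dict {
    size_t n;                 /* number of slots in use */
    struct object *slots;     /* again only a pointer, so no cycle */
};
size_t dict_length(const struct dict *d);
#endif

/* ---- a source file like dic.c includes both and sees full types ---- */
size_t dict_length(const struct dict *d)
{
    return d ? d->n : 0;      /* treat a null dictionary as empty */
}
```

Only code that dereferences the pointer or needs the struct's size has to see the full definition, and that code normally lives in a .c file that can include both headers in any order.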

BartC · Dec 20, 2010

luserXtrog said:
The first version was a single file. Then I read somewhere that
source files should be no more than 200-300 lines (Art of Unix
Programming, maybe?). Thereafter, the single file began to seem
unwieldy. So when I started the first rewrite I tried to keep things
smaller.

If your editing tools present all the entities in the project (functions,
variables, types, macros, etc) as a kind of database, then their location in
a specific file, and the number of such files, becomes less important.

With the low-level tools I use, which rely on me *remembering* where
everything is, then too many files can generate real problems.

As it is, the byte-sizes of your modules are more like the line-counts of
mine...

But the 200-300 lines per source file rule sounds nonsense to me, if you
have to know yourself where everything lives (Unix -- or is it Linux --
source code is supposed to be 4Mloc, which would make it some 16,000 files
according to that rule; a tad unmanageable.)

(In one interpreter project of mine, the core of it occupies three modules:
the interpreter itself (7500 lines, half of that in-line asm), implementing
its operators (4500) and implementing its built-in functions (3500). And
I still have trouble knowing where a function lives! (This is for bytecode,
so no parsing is needed.) A newer, more ambitious project however averages
1300 lines per file, but is in early stages so that figure will grow.)

The other thing I noticed is that you have a lot of names starting with "O",
which look a bit like "0" (apart from names which are just "o"); in other
words, a bit strange...

ImpalerCore · Dec 20, 2010

I feel I need a fresh perspective (or many, ideally)
on my program. It's grown to where I can't quite keep
it all in my head and making new additions has become
a game of "how did I do this elsewhere?"

A zip file containing c and postscript source and a makefile
is available at: http://code.google.com/p/xpost/downloads/list

I chose a BSD licence because I don't know any better.

There are probably too few comments.
So even comments like "this part needs more comments"
are desirable.

And in more than a few places I'm certainly guilty
of attempting to be cute and/or clever. But to all
appearances, it all works somehow.

<snip>

I think you're getting to the point where you're going to need a
documentation system. First off, browsing for functionality in a
browser is much easier than grepping files. You forget things like
order of arguments, semantics for elements in a struct, or the meaning
of return values. I recommend spending some time learning and
creating some documentation in Doxygen (or similar) to get a feel for
what's possible. Without a good system of documentation, you will
likely spend lots of additional time rereading code to learn how to
use the functions you've created months ago. Here's an example of
some doxygenated comments from my list library.

\code snippet
/*!
* \struct c_list
* \brief The \c c_list struct is used as the list node for a
* double-linked list.
*/
struct c_list
{
  /*!
   * \brief This variable references the list node's object, which
   *        can be a pointer to any type, and may point to a
   *        dynamically allocated object.
   */
  void* object;

  /*! \brief This variable links to the previous object in the list. */
  struct c_list* prev;

  /*! \brief This variable links to the next object in the list. */
  struct c_list* next;
};

#if defined(C_ALIAS_TYPES)
/*! \brief Alias the <tt>struct c_list</tt> type. */
typedef struct c_list c_list;
#endif

/*!
* \brief Adds a new object at the front of a \c c_list.
* \param list A \c c_list.
* \param object The reference to the new object.
* \return The start of the new \c c_list.
*
* \usage
* \include list/c_list_insert_front_example.c
*
* The example above should display the following.
*
* \code
* A coders haiku
* --------------
* A double linked list
* In the right circumstances
* Points to good design
* \endcode
*/
struct c_list* c_list_insert_front( struct c_list* list, void* object );

/*!
* \brief Adds a new object at the end of a \c c_list.
* \param list A \c c_list.
* \param object The reference to the new object.
* \return The start of the new \c c_list.
*
* The return value is the start of the new list, which may have
* changed.
*
* Note that \c c_list_insert_back has to traverse the entire list to
* find the end, which is inefficient when adding multiple objects.
* A common idiom to avoid the inefficiency is to insert the objects
* at the front of the list and reverse the list when all the objects
* have been added.
*
* \usage
* \include list/c_list_insert_back_example.c
*
* The example above should display the following.
*
* \code
* A coders haiku
* --------------
* Inserting objects
* End of a very long list
* Extra long coffee break
* \endcode
*/
struct c_list* c_list_insert_back( struct c_list* list, void* object );
\endcode

\code snippet c_list_insert_front_example.c
#include <stdio.h>
#include <stdlib.h> /* EXIT_SUCCESS */
#include <string.h>
#include <VH/common/config.h>
#include <VH/common/macros.h>
#include <VH/common/alloc.h>
#include <VH/common/strops.h>
#include <VH/common/list.h>

int main( void )
{
  c_list* haiku = NULL;
  c_list* l;
  size_t i;
  char* s;

  /* Lines are listed in reverse because each one is pushed on the front. */
  char* haiku_strings[] = {
    "Points to good design",
    "In the right circumstances",
    "A double linked list"
  };

  for ( i = 0; i < C_ARRAY_N( haiku_strings ); ++i )
  {
    s = c_strdup( haiku_strings[i] );
    if ( s ) {
      haiku = c_list_insert_front( haiku, s );
    }
  }

  printf( "A coders haiku\n" );
  printf( "--------------\n" );
  for ( l = haiku; l != NULL; l = l->next ) {
    printf( "%s\n", (char*)l->object );
  }

  c_list_free( haiku, c_free );

  return EXIT_SUCCESS;
}
\endcode

In my documentation, I describe the parameters, return values,
semantic details if needed, and provide an example that demonstrates
its usage with expected results.

The drawback of this is that it is a *lot* more work, especially if
you want to create (non-boring) examples that demonstrate something
interesting about the semantics of the function. The payoff is that
you can go back at a later time and grok the function much easier than
having to reread code you wrote months or years ago. And if you make
the time to go through the documentation, any semantic quirks that pop
up (like NULL pointers) will be addressed in the documentation, and
hopefully won't bite you again or someone that follows you. Again let
me re-emphasize, doing what I do is a *lot* of extra work, and may not
be compatible in some work environments.

Second, I think you may want to look at partitioning functionality
into a couple of libraries. Library is the basic component of reuse
in C, and if or when you do something new, the work you put into the
functionality for the postscript parser will be easier to apply to
something else if pertinent pieces are nicely encapsulated in a
library.

That's all I got for now.

Best regards,
John D.

luser- -droog · Dec 21, 2010

<snip>

I think you're getting to the point where you're going to need a
documentation system. First off, browsing for functionality in a
browser is much easier than grepping files. You forget things like
order of arguments, semantics for elements in a struct, or the meaning
of return values. I recommend spending some time learning and
creating some documentation in Doxygen (or similar) to get a feel for
what's possible. Without a good system of documentation, you will
likely spend lots of additional time rereading code to learn how to
use the functions you've created months ago.

Agreed. One of my dreams for the project is to make it a self-documenting
literate program (the uber-quine) producing a pdf book describing itself.
But I really should learn some sort of documenting system now to get the
whole process started.

Here's an example of
some doxygenated comments from my list library.

In my documentation, I describe the parameters, return values,
semantic details if needed, and provide an example that demonstrates
its usage with expected results.

I've tried to avoid the need for this level of detail in comments
by building, as directly as I could, a mapping between the published
standard and the semantics of the program. Hence parameters and return
values for the O* functions are directly from the Adobe book.
But, of course, that has the drawback that anyone who doesn't own
the book can't make as much sense of the program.

Point taken. Each function should have some description.

The drawback of this is that it is a *lot* more work, especially if
you want to create (non-boring) examples that demonstrate something
interesting about the semantics of the function. The payoff is that
you can go back at a later time and grok the function much easier than
having to reread code you wrote months or years ago. And if you make
the time to go through the documentation, any semantic quirks that pop
up (like NULL pointers) will be addressed in the documentation, and
hopefully won't bite you again or someone that follows you. Again let
me re-emphasize, doing what I do is a *lot* of extra work, and may not
be compatible in some work environments.

Second, I think you may want to look at partitioning functionality
into a couple of libraries. Library is the basic component of reuse
in C, and if or when you do something new, the work you put into the
functionality for the postscript parser will be easier to apply to
something else if pertinent pieces are nicely encapsulated in a
library.

Indeed. I've been trying to build up the graphics functionality as
a library. It's a lot of work just to track down the sources (texts
and journals from 70s-80s), let alone understanding and implementing
the algorithms. (I've lost count of how many times I've read about
the Bresenham line drawing algorithm; I'm still not sure I "get it.")
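
For reference, here is a compact sketch of the integer-only, all-octant form of Bresenham's algorithm. The point struct and the line_points name are invented for this example and have nothing to do with the xpost sources. The idea is that err holds a scaled measure of how far the current pixel has drifted from the ideal line: a step in x adds dy (which is zero or negative here) and a step in y adds dx (zero or positive), so the two kinds of step keep pulling err back toward zero.

```c
#include <stdlib.h>

/* Hypothetical pixel coordinate type for this sketch. */
struct point { int x, y; };

/*
 * Walk the line from (x0,y0) to (x1,y1), storing at most max points
 * in out. Returns the total number of points on the line, which may
 * be larger than max.
 */
int line_points(int x0, int y0, int x1, int y1, struct point *out, int max)
{
    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;   /* x extent, x step */
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;  /* negated y extent */
    int err = dx + dy;                              /* running error    */
    int n = 0;

    for (;;) {
        if (n < max) { out[n].x = x0; out[n].y = y0; }
        n++;
        if (x0 == x1 && y0 == y1)
            break;                                  /* reached the end  */
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }      /* step along x     */
        if (e2 <= dx) { err += dx; y0 += sy; }      /* step along y     */
    }
    return n;
}
```

Calling line_points(0, 0, 5, 2, buf, 8) visits (0,0), (1,0), (2,1), (3,1), (4,2), (5,2): six points, with y advancing only when the accumulated error says the ideal line has moved closer to the next row.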

As for this project, I'm having some trouble envisioning which pieces
should be partitioned off. They all seem so interrelated! The parser
(just a scanner, really; I think there's one point where it recurses
and that's only for scanning literal procedures) has to know about
the object types and how to create each of them.
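
To make the "recurses only for literal procedures" shape concrete, here is a toy scanner sketch. It is not tok.c: the token kinds, struct token and scan are hypothetical, and the lexical rules are far cruder than PostScript's (for instance, it would split 12abc into a number and a name, and it knows nothing about strings or comments).

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch; not xpost's object types. */
enum kind { T_END, T_INT, T_NAME, T_PROC, T_ERROR };

struct token {
    enum kind kind;
    long value;   /* integer value for T_INT, element count for T_PROC */
};

static void skip_space(const char **p)
{
    while (isspace((unsigned char)**p))
        ++*p;
}

/* Scan one token starting at *p and advance *p past it. */
struct token scan(const char **p)
{
    struct token t = { T_END, 0 };

    skip_space(p);
    if (**p == '\0')
        return t;

    if (**p == '{') {                 /* the single recursive case */
        ++*p;
        t.kind = T_PROC;
        for (;;) {
            skip_space(p);
            if (**p == '}') { ++*p; return t; }
            struct token inner = scan(p);
            if (inner.kind == T_END || inner.kind == T_ERROR) {
                t.kind = T_ERROR;     /* unterminated or bad body */
                return t;
            }
            t.value++;                /* count body elements */
        }
    }
    if (**p == '}') {                 /* closing brace with no opener */
        ++*p;
        t.kind = T_ERROR;
        return t;
    }
    if (isdigit((unsigned char)**p)) {
        t.kind = T_INT;
        while (isdigit((unsigned char)**p))
            t.value = t.value * 10 + (*(*p)++ - '0');
        return t;
    }
    t.kind = T_NAME;                  /* anything else is a name */
    while (**p && !isspace((unsigned char)**p) && **p != '{' && **p != '}')
        ++*p;
    return t;
}
```

Even in this toy the coupling described above is visible: scan has to know every kind of object it can produce, so splitting it into a library means the object definitions have to move with it or sit in a shared header.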

I think the virtual memory store for composite objects (dictionaries,
arrays, and strings) might be the best thing to break off first. I've just
discovered in another thread that my simplistic implementation (with each
save-level as an anonymous mmap) doesn't duplicate a legacy quirk of the
original Adobe implementation (restoring an earlier save-level doesn't
roll back string contents) which all other interpreters have followed.

So I need to modify the implementation of this part without disturbing the
rest of the program. "Modularity to the rescue?"
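
A toy sketch of the behaviour in question, under heavy assumptions: a single save level, a fixed-size array instead of an anonymous mmap, and a per-byte flag marking string contents. None of these names (struct vm, vm_save, vm_restore, is_string) come from vm.c. The only thing it models is the rule stated above, that restore rolls everything back except the bytes inside strings.

```c
#include <stddef.h>
#include <string.h>

enum { VM_SIZE = 64 };                   /* arbitrary size for the sketch */

/* Hypothetical one-level VM: live memory plus one snapshot. */
struct vm {
    unsigned char mem[VM_SIZE];          /* live composite-object storage */
    unsigned char is_string[VM_SIZE];    /* 1 where a byte is string data */
    unsigned char saved[VM_SIZE];        /* snapshot taken by vm_save     */
};

/* Take a snapshot of all live memory. */
void vm_save(struct vm *vm)
{
    memcpy(vm->saved, vm->mem, VM_SIZE);
}

/* Roll back to the snapshot, leaving string contents as they are now. */
void vm_restore(struct vm *vm)
{
    for (size_t i = 0; i < VM_SIZE; i++)
        if (!vm->is_string[i])
            vm->mem[i] = vm->saved[i];
}
```

If the quirk can be kept behind two entry points like these, the rest of the interpreter would not need to know whether the storage underneath is an mmap per save level or something else.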

That's all I got for now.

Much obliged.

luser- -droog · Dec 21, 2010

I feel I need a fresh perspective (or many, ideally)
on my program. It's grown to where I can't quite keep
it all in my head and making new additions has become
a game of "how did I do this elsewhere?"

A zip file containing c and postscript source and a makefile
is available at: http://code.google.com/p/xpost/downloads/list

I chose a BSD licence because I don't know any better.

There are probably too few comments.
So even comments like "this part needs more comments"
are desirable.

And in more than a few places I'm certainly guilty
of attempting to be cute and/or clever. But to all
appearances, it all works somehow.

A little toc:
arr.c arr.h array operators (functions)
bool.c bool.h boolean operators
color.c color.h color operators
control.c control.h control operators
dic.c dic.h dictionary operators
err.c err.h error handling (c-part)
err.ps error handling (ps-part)
file.c file.h file operators
global.c global.h global variables (yes, I know. bad. sorry)
all the stacks are here
init.c init.h initialize (c-part)
init.ps initialize (ps-part)

init.ps defines a procedure just before it begins executing
user statements that shows off its one trick. The code
suggests you can run it two ways, but 'fill' isn't implemented
yet in this version so only 'stroke' works:

634(1)02:44 AM:xpost 0> xpost
initgraphics...found a TrueColor class visual at default depth.
drawWindow()
Xpost Version 0c
PS>{stroke}wheel

I probably should've turned off the 'printf's before uploading.
The text just indicates how the lines are being batched up for
X11.

luser- -droog · Dec 22, 2010

I feel I need a fresh perspective (or many, ideally)
on my program. It's grown to where I can't quite keep
it all in my head and making new additions has become
a game of "how did I do this elsewhere?"

A zip file containing c and postscript source and a makefile
is available at: http://code.google.com/p/xpost/downloads/list

I have uploaded a revised version which includes
commentary for all functions and at the tops of
all files. Should increase legibility.

A little toc:
arr.c arr.h array operators (functions)
bool.c bool.h boolean operators
color.c color.h color operators
control.c control.h control operators
dic.c dic.h dictionary operators
err.c err.h error handling (c-part)
err.ps error handling (ps-part)
file.c file.h file operators
global.c global.h global variables (yes, I know. bad. sorry)
all the stacks are here
init.c init.h initialize (c-part)
init.ps initialize (ps-part)
lim.h implementation limits
main.c main function and central loop
math.c math.h math operators
matrix.c matrix.h matrix operators
obj.h the object structure
oper.c oper.h the operator interface (function-pointer-objects)
paint.c paint.h just stroke
path.c path.h path construction (no curves)
poly.c poly.h polymorphic operators
squiggle.ps a doodle (no showpage)
sta.c sta.h stack manipulation operators
str.c str.h string operators
tok.c tok.h the token operator (the lexical scanner)
type.c type.h type and attribute operators
vm.c vm.h virtual memory (mmap requires POSIX)
x.c x.h the X11 functions

I'm investigating using Cairo for the graphics.
That would eliminate 10 files.

luser- -droog · Dec 29, 2010

Switching to cairo has dramatically accelerated my efforts.
As per suggestions, I have
- reduced the number of files (by consolidating the graphics)
- increased file sizes (by writing more functions)
- added comments for all operators (even those that don't exist)

http://code.google.com/p/xpost/downloads/list

Any advice or comments are greatly appreciated.

One question.
When including a header from a location not in the compiler
search path, is it better to pack the path into the #include
directive, thus

#include <cairo/cairo.h>

or as a command-line option via the makefile, thus

CFLAGS=-I/usr/include/cairo

?

Jorgen Grahn · Dec 29, 2010

["Followup-To:" header set to comp.lang.c.]

One question.
When including a header from a location not in the compiler
search path, is it better to pack the path into the #include
directive, thus

#include <cairo/cairo.h>

or as a command-line option via the makefile, thus

CFLAGS=-I/usr/include/cairo

?

IMHO,

#include <cairo/cairo.h>

is the better one. It says the file is cairo/cairo.h, relative to some
base include path. One well-known such path is /usr/include/.

Alternatively, do what the cairo documentation says.

/Jorgen

luser- -droog · Dec 29, 2010

["Followup-To:" header set to comp.lang.c.]

On Wed, 2010-12-29, luser- -droog wrote:

...

One question.
When including a header from a location not in the compiler
search path, is it better to pack the path into the #include
directive, thus

#include <cairo/cairo.h>

or as a command-line option via the makefile, thus

CFLAGS=-I/usr/include/cairo

?

IMHO,

#include <cairo/cairo.h>

is the better one. It says the file is cairo/cairo.h, relative to some
base include path. One well-known such path is /usr/include/.

Sadly, it doesn't work. cairo.h can't find its other files.

Alternatively, do what the cairo documentation says.

Yeah. I saw all that pkg-config stuff at
http://cairographics.org/FAQ/ .
I'm not sure why, but I don't like it.
Probably misunderstanding masquerading as fear.

tlvp · Dec 30, 2010

Switching to cairo has dramatically accelerated my efforts.
As per suggestions, I have
- reduced the number of files (by consolidating the graphics)
- increased file sizes (by writing more functions)
- added comments for all operators (even those that don't exist)

http://code.google.com/p/xpost/downloads/list

Any advice or comments are greatly appreciated.

One question.
When including a header from a location not in the compiler
search path, is it better to pack the path into the #include
directive, thus

#include <cairo/cairo.h>

Like Jorgen, I'd prefer the include line above -- but on the grounds that any human being reviewing the code learns at one glance where cairo.h is, and needn't go chasing through the makefile's CFLAGS options.

Cheers, -- tlvp

Nick Keighley · Dec 30, 2010

I count a bit over 2,700 sloc. It's typical for an inexperienced
programmer to start losing control of a program at about this size if
there's been no design work or scaffolding before coding. If that's
what's happened, you won't regain control. Get a good book on data
structures and another on software design. Read them. Start over.
Chalk this one up to a learning experience.

It's not impossible to regain control. Refactorise madly. Though
personally I'd do the redesign. Hopefully there will be lots of
utilities to salvage from the first design.

Nick Keighley · Dec 30, 2010

I had a quick look. 2700 loc seems tiny for any sort of interpreter.

well certainly for a Postscript interpreter!

My main criticism might be that it is split up into too many files,
averaging just 100 lines per module and 23 lines per header file.

I'm not surprised it's difficult to keep it all together. In fact I'd be
tempted to put it all into one file.

sailing perilously close to my personal limits on file size. If he's
going to implement a complete Postscript interpreter I'd expect it to
get too large for a single file.

Jorgen Grahn · Dec 31, 2010

["Followup-To:" header set to comp.lang.c.]

On Wed, 2010-12-29, luser- -droog wrote:

...

One question.
When including a header from a location not in the compiler
search path, is it better to pack the path into the #include
directive, thus

#include <cairo/cairo.h>

or as a command-line option via the makefile, thus

CFLAGS=-I/usr/include/cairo

?

IMHO,

#include <cairo/cairo.h>

is the better one. It says the file is cairo/cairo.h, relative to some
base include path. One well-known such path is /usr/include/.

Sadly, it doesn't work. cairo.h can't find its other files.

*Checking on my own system, which has this "cairo" thing installed*

Yeah, /usr/include/cairo/cairo.h contains lines like

#include <cairo-features.h>
#include <cairo-deprecated.h>

It's very unclear to me why they chose to do it that way -- it would
have worked if they had written

#include "cairo-features.h"
#include "cairo-deprecated.h"

because that causes the search to include the directory where
<cairo/cairo.h> was found. Perhaps that's a gcc-ism, and they need to
support some compiler which doesn't do it like that? Most other
libraries on my system either use (a) the second form above, or
(b) the equivalent of #include <cairo/cairo-features.h>.

Anyway, then they're really not intending you to say <cairo/cairo.h> and you
cannot do it. It's an unfortunate choice IMHO to *both* make that
decision and install in a non-standard location (/usr/include/cairo/)
because that forces them to ...

Yeah. I saw all that pkg-config stuff at
http://cairographics.org/FAQ/ .
I'm not sure why, but I don't like it.
Probably misunderstanding masquerading as fear.

.... invent strange ways for your build system to find the right
compiler flags. That's what that pkg-config stuff is, nothing more.

% pkg-config --cflags --libs cairo
-D_REENTRANT -I/usr/include/cairo -I/usr/include/freetype2
-I/usr/include/directfb -I/usr/include/libpng12
-I/usr/include/pixman-1 -lcairo

/Jorgen
