Code to automatically write C get/set functions from a set of typedef structs?

David Mathog

I have been doing a bit of work lately with Windows Enhanced
Metafiles. The structures for these are defined in the include file
wingdi.h, and for most structures there are corresponding GDI get/set
functions. Imagine for a minute that you wanted to write basic get/set
functions in C from scratch with as little work as possible. It
seems like it should be (largely) possible to do so automatically by
recursively processing the structs. This cannot be an original idea;
has anybody ever seen this done before?

This is what I mean. Start from lines like these in the header
file (pointer variants of struct names removed for clarity):

typedef struct POINTL { LONG x; LONG y; } POINTL;
typedef struct RECTL { LONG left; LONG top; LONG right; LONG bottom; } RECTL;
typedef struct tagEMR { DWORD iType; DWORD nSize; } EMR;
typedef struct tagEMRARC { EMR emr; RECTL rclBox; POINTL ptlStart; POINTL ptlEnd; }
    EMRARC, EMRARCTO, EMRCHORD, EMRPIE;

The generated functions would then look like this, for instance:

int getEMR(void *ptr, DWORD *iType, DWORD *nSize){
    int offset=0;
    offset+=getDWORD((char *)ptr+offset,iType);
    offset+=getDWORD((char *)ptr+offset,nSize);
    return(offset);
}

/* Sketch only: the calls below assume struct-level variants of the
   getters (taking EMR **, RECTL **, POINTL **), unlike getEMR above. */
int getEMRPIE(void *ptr, EMR **emr, RECTL **rclBox, POINTL **ptlStart, POINTL **ptlEnd){
    int offset=0;
    offset += getEMR((char *)ptr+offset,emr);
    offset += getRECTL((char *)ptr+offset,rclBox);
    offset += getPOINTL((char *)ptr+offset,ptlStart);
    offset += getPOINTL((char *)ptr+offset,ptlEnd);
    return(offset);
}

The set functions would be pretty similar except with EMR *emr, RECTL
*rclBox and so forth.
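
For illustration, here is a minimal sketch of what the hand-written primitives and one generated get/set pair might look like. It assumes DWORD is an unsigned 32-bit value stored little-endian (uint32_t stands in for the Windows typedef), and the names setDWORD and setEMR are hypothetical; only getDWORD and getEMR appear in the post above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for the Windows DWORD typedef. */
typedef uint32_t DWORD;

/* Read one little-endian DWORD from ptr; return the bytes consumed. */
int getDWORD(const void *ptr, DWORD *out)
{
    const unsigned char *p = ptr;
    assert(ptr != NULL && out != NULL);
    *out = (DWORD)p[0] | ((DWORD)p[1] << 8) |
           ((DWORD)p[2] << 16) | ((DWORD)p[3] << 24);
    return 4;
}

/* Write one little-endian DWORD to ptr; return the bytes produced. */
int setDWORD(void *ptr, DWORD value)
{
    unsigned char *p = ptr;
    assert(ptr != NULL);
    p[0] = (unsigned char)(value & 0xFF);
    p[1] = (unsigned char)((value >> 8) & 0xFF);
    p[2] = (unsigned char)((value >> 16) & 0xFF);
    p[3] = (unsigned char)((value >> 24) & 0xFF);
    return 4;
}

/* The kind of function a generator could emit for tagEMR. */
int getEMR(const void *ptr, DWORD *iType, DWORD *nSize)
{
    const unsigned char *p = ptr;
    int offset = 0;
    offset += getDWORD(p + offset, iType);
    offset += getDWORD(p + offset, nSize);
    return offset;
}

/* Hypothetical matching setter: same walk, values instead of pointers. */
int setEMR(void *ptr, DWORD iType, DWORD nSize)
{
    unsigned char *p = ptr;
    int offset = 0;
    offset += setDWORD(p + offset, iType);
    offset += setDWORD(p + offset, nSize);
    return offset;
}
```

Decoding byte by byte rather than casting the buffer to a struct pointer sidesteps alignment and host byte-order questions, at the cost of a few more instructions per field.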

I understand that in this particular set of structs there is a further
complication because some fields are the offsets to data (from the
ptr) and not actually the data itself. For those one would recurse
through structs that are read and set up the function with extra "get"
calls and parameters. For instance, the EMRTEXT struct has two such
offsets:

offString offset to the characters
offDx offset to the intercharacter spacing array

and any struct that references an EMRTEXT would automatically add
those parameters to its call list and would add get functions to
retrieve them. However, since the field itself is always an integer
offset, the tool cannot know the type of the data it points to, so it
would put in something like this for the programmer to fix up later:

#define TYPE_offString /* FIXME */
#define TYPE_offDx /* FIXME */
int getSOMESTRUCT(.....TYPE_offString **offString, TYPE_offDx **offDx){
....
offset+=getNOTDEFINEDYET(ptr,offString); /* FIXME, may need count */
offset+=getNOTDEFINEDYET(ptr,offDx); /* FIXME, may need count */
return(offset);
}
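
One way the "may need count" fix-up could be handled is with a single bounds-checked helper that turns an offset field into a pointer. This is only a sketch: resolveOffset is a hypothetical name, and it assumes the offset is measured from the start of the record (the ptr above) and that the caller already knows the element count (for EMRTEXT that would presumably come from a field such as nChars) and the record size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return a pointer to `count` elements of `elemSize`
   bytes that start `off` bytes into the record, or NULL if that range
   does not fit inside the record's `recSize` bytes. */
const void *resolveOffset(const void *rec, uint32_t recSize,
                          uint32_t off, uint32_t count, uint32_t elemSize)
{
    uint32_t bytes;

    assert(rec != NULL);
    /* Reject counts whose byte length would overflow 32 bits. */
    if (elemSize != 0 && count > UINT32_MAX / elemSize)
        return NULL;
    bytes = count * elemSize;
    if (off > recSize || bytes > recSize - off)
        return NULL;
    return (const unsigned char *)rec + off;
}
```

A generated getSOMESTRUCT could then call this once per offset field and leave only the element type for the programmer to fill in.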

Some work would still be necessary, like writing the basic things such
as "getDWORD", but this approach should automate at least 95% of the
work.

Thoughts?
 
Guest

David Mathog wrote:
> This cannot be an original idea, has anybody ever seen this done
> before?

I think the Ark used code generation for its original code.

Fowler's "Domain Specific Languages" is a good read
 
Ian Collins

David Mathog wrote:
> This cannot be an original idea, has anybody ever seen this done
> before?

If it can be done with C, it invariably has.

Why don't you just define the structures in something other than C and
generate whatever code you want from there?
 
BGB

actually, processing C code directly and generating code from it is also
possible (and saves the effort of having to manually convert the data
into some other format).


to simplify matters, if it is known that the data will only be of a
certain form or be processed in certain ways, it may not be necessary
even to have a full-language parser, but the code can instead be read
line-by-line or similar, and anything which matches the intended pattern
will be processed (though there is sometimes the issue of making the
matcher general enough that it can deal with common variations, yet
specific enough that it avoids "false positives").


some of this is partly because although the actual C rules for
declarations are fairly hairy, the vast majority of them follow simple
patterns that are fairly easy to detect.


although, I have not done this in particular (parsing structs, and
generating get/set functions), I have done similar: parsing structs and
generating database entries, parsing function declarations and
generating headers, parsing special prototypes or declarations and
generating wrapper functions, ...
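
A minimal sketch of that line-by-line approach, applied to the original question, might look like the following. It is not anyone's actual tool: emitGetter and append are hypothetical names, and the matcher deliberately accepts only one-line typedefs whose fields are all of the form "TYPE name;" with a single-token type. If several typedef names follow the closing brace, only the first is used.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define MAX_FIELDS 16

/* Append formatted text to out at *pos; return -1 if it does not fit. */
static int append(char *out, size_t outSize, size_t *pos, const char *fmt, ...)
{
    va_list ap;
    int len;
    va_start(ap, fmt);
    len = vsnprintf(out + *pos, outSize - *pos, fmt, ap);
    va_end(ap);
    if (len < 0 || (size_t)len >= outSize - *pos) return -1;
    *pos += (size_t)len;
    return 0;
}

/* Match "typedef struct TAG { TYPE name; ... } NAME;" on one line and write
   the text of a getter for NAME into out. Returns the number of fields, or
   -1 if the line does not match the pattern or out is too small. */
int emitGetter(const char *line, char *out, size_t outSize)
{
    char types[MAX_FIELDS][64], names[MAX_FIELDS][64], structName[64];
    const char *open, *close, *p;
    int n = 0, used = 0, i;
    size_t pos = 0;

    assert(line != NULL && out != NULL && outSize > 0);
    if (strncmp(line, "typedef struct", 14) != 0) return -1;
    open = strchr(line, '{');
    close = strrchr(line, '}');
    if (open == NULL || close == NULL || close < open) return -1;
    if (sscanf(close + 1, " %63[A-Za-z0-9_]", structName) != 1) return -1;

    /* Consume "TYPE name;" fields until something else turns up. */
    for (p = open + 1; n < MAX_FIELDS; p += used, n++) {
        used = 0;
        if (sscanf(p, " %63[A-Za-z0-9_] %63[A-Za-z0-9_] ;%n",
                   types[n], names[n], &used) != 2 || used == 0)
            break;
    }
    p += strspn(p, " \t");
    if (n == 0 || p != close) return -1;   /* leftovers: not our pattern */

    if (append(out, outSize, &pos, "int get%s(const void *ptr", structName))
        return -1;
    for (i = 0; i < n; i++)
        if (append(out, outSize, &pos, ", %s *%s", types[i], names[i]))
            return -1;
    if (append(out, outSize, &pos, "){\n    int offset = 0;\n")) return -1;
    for (i = 0; i < n; i++)
        if (append(out, outSize, &pos,
                   "    offset += get%s((const char *)ptr + offset, %s);\n",
                   types[i], names[i]))
            return -1;
    if (append(out, outSize, &pos, "    return offset;\n}\n")) return -1;
    return n;
}
```

Anything that does not fit the pattern, such as a two-token type like "unsigned char", is rejected rather than guessed at, which is the "avoid false positives" side of the trade-off described above.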
 
Ian Collins

BGB wrote:
> actually, processing C code directly and generating code from it is also
> possible (and saves the effort of having to manually convert the data
> into some other format).

True enough. But often the C code is one of the outputs of a
conversion. Database tables are one example where it is easier to
describe the data in a table (I use OpenOffice) and generate all sorts
of outputs from there.

Using a format with readily available tools is a big win.
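
The spreadsheet or XML route needs an external tool, but the same "describe once, generate many outputs" idea can be roughly approximated inside C itself with X-macros. The sketch below is a substitute technique, not what is described above; RECTL_FIELDS, FieldInfo, rectlFields and findField are hypothetical names, and int32_t stands in for the Windows LONG typedef.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: stand-in for the Windows LONG typedef. */
typedef int32_t LONG;

/* The single "table": one row per field (type, name). */
#define RECTL_FIELDS(X) \
    X(LONG, left)       \
    X(LONG, top)        \
    X(LONG, right)      \
    X(LONG, bottom)

/* Output 1: the struct definition. */
#define AS_MEMBER(type, name) type name;
typedef struct RECTL { RECTL_FIELDS(AS_MEMBER) } RECTL;

/* Output 2: a run-time table of field names, offsets and sizes. */
typedef struct FieldInfo {
    const char *name;
    size_t offset;
    size_t size;
} FieldInfo;

#define AS_INFO(type, name) { #name, offsetof(RECTL, name), sizeof(type) },
const FieldInfo rectlFields[] = { RECTL_FIELDS(AS_INFO) };

/* Look up a field by name; returns NULL if there is no such field. */
const FieldInfo *findField(const char *name)
{
    size_t i;
    assert(name != NULL);
    for (i = 0; i < sizeof rectlFields / sizeof rectlFields[0]; i++)
        if (strcmp(rectlFields[i].name, name) == 0)
            return &rectlFields[i];
    return NULL;
}
```

Adding a row to RECTL_FIELDS updates both the struct and the field table, so the two cannot drift apart; the price is that the struct definition is no longer readable as plain C.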
 
BGB

it depends on the task.

some of my tools generate C code and headers using big text-files as input.


other ones parse C code, or headers, and may spit out more C code or
headers following after any information or embedded commands located
within the source files or headers (often written in a form which is
transparent to the normal C compiler).

some others spit out a database, currently because a database can be
processed more efficiently by later tools (for example, running a full
parser and churning through a bunch of headers can be fairly costly, but
if a database is spit out by the tool, then this can be queried more
efficiently later, or used by the application at run-time).
 
Ian Collins

BGB wrote:
> it depends on the task.
>
> some of my tools generate C code and headers using big text-files as input.

But you had to write the tools, didn't you?

One recent project I worked on used XML files as the source of just
about everything including the documentation. All that was required to
generate stuff was a collection of XSLT transforms.
 
Nobody

BGB wrote:
> some of this is partly because although the actual C rules for
> declarations are fairly hairy, the vast majority of them follow simple
> patterns that are fairly easy to detect.

The main problem is the preprocessor. The syntax of a C source file is
little more than a sequence of preprocessor tokens, with no higher-level
structure. The BNF grammar which most people think of as "C syntax"
describes what comes out of the preprocessor, not what goes into it.

But if you just run the code through the preprocessor, its output is going
to contain a lot of stuff that you probably don't want to wrap; you don't
want to generate set/get methods for the members of e.g. FILE, right?

This issue has to be dealt with by anyone who implements e.g. syntax
highlighting or auto-indent. Invariably, the solution is to simply ignore
preprocessor directives and treat macro names as identifiers. If you abuse
the preprocessor, you lose.
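
As a rough sketch of the "ignore preprocessor directives" step, a scanner can drop every directive line before any pattern matching happens. The function name stripDirectives is hypothetical, and the sketch assumes a directive is any line whose first non-blank character is '#', plus its backslash-continued lines; macro names in the remaining text are simply left alone, so they look like ordinary identifiers to whatever runs next.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy src to out, dropping preprocessor directive lines (including
   backslash-continued ones). Returns 0 on success, -1 if out is too small. */
int stripDirectives(const char *src, char *out, size_t outSize)
{
    size_t pos = 0;
    int continued = 0;   /* previous directive line ended with a backslash */

    assert(src != NULL && out != NULL && outSize > 0);
    while (*src != '\0') {
        const char *eol = strchr(src, '\n');
        size_t len = eol ? (size_t)(eol - src) + 1 : strlen(src);
        size_t text = eol ? len - 1 : len;     /* length without newline */
        size_t lead = strspn(src, " \t");
        int directive = continued || (lead < text && src[lead] == '#');

        if (directive) {
            continued = text > 0 && src[text - 1] == '\\';
        } else {
            if (pos + len >= outSize) return -1;
            memcpy(out + pos, src, len);
            pos += len;
        }
        src += len;
    }
    out[pos] = '\0';
    return 0;
}
```

This does nothing about conditionally compiled blocks: both branches of an #if survive, which is exactly the limitation being described.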
 
BGB

in tools of this sort (where one is processing the code line-by-line and
by matching patterns), the typical practice is to simply ignore the
preprocessor (and thus anything defined in included headers, ...).

yes, the preprocessor can do all manner of things, but the tool doesn't
need to really care, since it only cares about whatever code matches the
patterns it is looking for (ignoring everything else), and it matches
directly against the source text (and if the programmer does something
that renders the pattern invisible to the tool, so be it, the tool won't
see it).

in this case, a lot of these tools don't apply BNFs or parse the full
syntax, but more typically will work by "splitting each line into tokens
and checking against known patterns" (typically taking whitespace into
account as well, ...).


some of these tools then make impositions on the allowed format of C
declarations as well, for example, one of my major ones does not allow
type-names to be spread over multiple tokens (for example, "unsigned
char c;").

instead, there is a common typedef ("typedef unsigned char byte;"), and
all such declarations are written as "byte c;".

some other tools go and interpret such types as if they were built-in,
and other ones just don't care (a type is a type).


after a while, a person may not notice or care too much that they are
working in a restricted subset of the language (except maybe when
importing code from elsewhere and having it blow up the tools due to
deviating from the established rules, leaving the option to either alter
the code or fiddle with the tool).

a recent example of this being adding some code which used function
declarations like "int foo (...)" rather than "int foo(...)", and so
the tool was thrown off by the added space. another case involved
including comments in the arguments list:
int foo(
type arga, //comment
type argb, //comment
...
)
which again confused the tool, causing it to produce broken output (it
didn't strip off comments initially, as comments were used in many cases
to encode commands for the tool).

in both cases, the tool was modified to accept the additional syntax.
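
A fix of that kind could be as simple as normalizing each declaration before matching it. The sketch below is only a guess at the shape of such a change, not the actual tool: normalizeDecl is a hypothetical name, it removes whitespace before every '(' (so it would also alter things like function-pointer declarators), and it treats any "//" as a comment even inside a string literal. A tool that reads commands from comments would have to extract those first.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Rewrite s in place: drop // comments, turn each whitespace run into a
   single space, and remove whitespace that comes directly before '('. */
void normalizeDecl(char *s)
{
    char *in = s, *out = s;

    assert(s != NULL);
    while (*in != '\0') {
        if (in[0] == '/' && in[1] == '/') {
            /* Skip the comment up to, but not including, the newline. */
            while (*in != '\0' && *in != '\n') in++;
        } else if (isspace((unsigned char)*in)) {
            while (isspace((unsigned char)*in)) in++;
            /* Emit one space unless at the start, the end, before '(',
               or right after a space that was already written. */
            if (out != s && out[-1] != ' ' && *in != '\0' && *in != '(')
                *out++ = ' ';
        } else {
            *out++ = *in++;
        }
    }
    if (out != s && out[-1] == ' ') out--;   /* trailing space left by a comment */
    *out = '\0';
}
```

After this pass, both "int foo (...)" and a multi-line argument list with trailing comments collapse to the single-line form the matcher already understands.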


in other cases, I do have tools which have full parsers, and these need
to be able to deal with "whatever comes their way".


or such...
 
Ian Collins

Nobody wrote:
> This issue has to be dealt with by anyone who implements e.g. syntax
> highlighting or auto-indent. Invariably, the solution is to simply ignore
> preprocessor directives and treat macro names as identifiers. If you abuse
> the preprocessor, you lose.

Things are better in modern tools, for example the NetBeans (and
presumably Eclipse) editor is able to correctly highlight conditionally
compiled code blocks. It can also expand macros in line, which is very
cute.
 
