meta pre-processor

David Thompson

| On 20 July, 22:27, (e-mail address removed) wrote:
|
|> I'm starting to design a meta pre-processor for the C language.  It could
|> also be used for other languages, but C is my focus.  The purpose of this
|> pre-processor is to select sections of code from files to be included in
|> other files.  The goal is the ability to code related items such as structs,
|> and functions or macros that use those structs, in a single file, while the
|> actual code needs to be split up into separate .c and .h files, and combined
|> with the parts from other code, for compiling.  This is to allow my code to be
|> organized more like modules.
|
|
| I think you've re-discovered Literate Programming

Except that my intentions are not related to documentation specifically,
but rather, code modularization and organization. I suppose that can be
thought of as literate programming to the extent that such organization

Exactly. Literate programming, as proposed (invented?) by Knuth et al,
focusses on presenting the pieces of your program in a 'logical,
understandable (to humans)' fashion, and then automatically reordering
and reformatting as needed to make compilable code.

Unfortunately, AFAICT it takes a lot of skill (even genius) to do this
well, and I believe also some knowledge about -- and thus constraints
on -- your audience. So it hasn't caught on widely.

The same or similar tools can also reformat documentation embedded in
source code. This has proven easier and more popular, and it makes it
more likely to have correct doc which is a good thing -- but it is NOT
LP, even though it is sometimes miscalled that.

What you say sounds much closer to LP, though perhaps not exactly.
Which is not a bad thing. Knuth's desiderata are generally good.
 
Guest

|> Well, maybe I will consider environment variable substitution:
|>
|> ### target code "${FUNCTION_NAME}.c"
|> ### target funproto "header.h"
|
| I would say that it's pointless to expose this division into header files and
| .c files. Your processor can emit complete translation units which can be passed
| straight to a compiler. These would just be temporary files with generated
| names. Or even just pipes.
|
| I suggest you integrate it with a build system; you want to support
| incremental recompiling right? That can't be based on timestamps of your files.

Information from many files needs to be accessed by many OTHER files. So a
single file cannot just be piped to a compiler. Its work product will be
more than just the code for a single compilation unit. Part of what this
is going to be used for will be different ways to split the files in certain
cases. Input file XYZZY.x may produce FOO.c and BAR.c in one case, and may
produce only FOOBAR.c in another case. Part of the Makefile will be produced at
the same time to accommodate it.

I already have a hacked-together system, where headers are extracted from
the source files. Yes, it does create time dependency cycles that make
cannot resolve, requiring rebuilding from scratch for the projects I do use
this on. If one source file is modified, the header has to be rebuilt since
it depends on every source file it can be extracted from, and then every
source file depends on the header. You wouldn't want to do this in a project
that takes hours to compile from scratch. But for projects that take just a
few minutes, recompiling from scratch is no big deal and incremental compiles
are overrated. The ability to have modular organization is more beneficial
for programmer time than incremental compiling, anyway.


|> ### begin code
|> ### begin funproto
|> double ${FUNCTION_NAME}( int foo, double bar )
|> ### end funproto
|> {
|> bar += (double) foo;
|> return bar;
|> }
|> ### end code
|
| If we modify the code above, but not the prototype, how will you ensure that
| only the code is recompiled, but not anything which depends on the prototype?

I don't feel the need to do that. Incremental compiling isn't as important
to me.


| If interfaces and implementations are split into separate files, then you have
| the file timestamps.

Product files could have code segments from more than one source file.
Imagine a graph from N source files to M intermediate files. Not all of
the N*M vectors will be used. Most won't. But a substantial portion of them
could be.
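
A rough sketch of that graph, assuming it is kept as a plain mapping from
each product file to the set of source files it draws on. The function name
stale_products is made up, and PLUGH.x, PLOVER.x, and BAZ.c are hypothetical
file names added next to the XYZZY.x, FOO.c, and BAR.c used earlier.

```python
def stale_products(product_sources, changed_sources):
    """Return the product files that draw on at least one changed source."""
    changed = set(changed_sources)
    return sorted(
        product
        for product, sources in product_sources.items()
        if changed & set(sources)
    )


# Hypothetical edges of the source -> product graph (only the ones in use).
PRODUCT_SOURCES = {
    "FOO.c": {"XYZZY.x"},
    "BAR.c": {"XYZZY.x", "PLUGH.x"},
    "header.h": {"XYZZY.x", "PLUGH.x", "PLOVER.x"},
    "BAZ.c": {"PLOVER.x"},
}
```

Here a change to PLUGH.x alone would mark BAR.c and header.h for
regeneration and leave FOO.c and BAZ.c alone.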


| You might want some kind of hash based scheme; your preprocessor can detect
| when an interface has changed by a change in the hash.

I presume this means keeping a hash of the previous run of the meta
preprocessor, along with the previous timestamps. If the final result
has the same hash, either keep the previous file (if it has not already
been discarded or replaced), or set the timestamp on the new
file to what was stored with the hash. This could work, and I think it
is a good idea. It would still need to be done after all files are done
since these meta product files can be sourced from more than one file.
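
A minimal sketch of how that might look, assuming all product texts are
first collected in memory and committed in one pass at the end. The names
commit_products and .meta-manifest.json are made up for illustration, and
SHA-256 is just one reasonable choice of hash.

```python
import hashlib
import json
import os


def commit_products(products, out_dir, manifest_name=".meta-manifest.json"):
    """Write {filename: text} products, keeping old mtimes for unchanged ones.

    Returns the sorted list of file names whose content changed.
    """
    manifest_path = os.path.join(out_dir, manifest_name)
    try:
        with open(manifest_path, encoding="utf-8") as f:
            previous = json.load(f)
    except FileNotFoundError:
        previous = {}  # first run: everything counts as changed

    current, changed = {}, []
    for name, text in sorted(products.items()):
        data = text.encode("utf-8")
        digest = hashlib.sha256(data).hexdigest()
        path = os.path.join(out_dir, name)
        old = previous.get(name)
        if old is not None and old["sha256"] == digest:
            if not os.path.exists(path):
                # Same content but the file is gone: recreate it and put
                # back the timestamp that was stored with the hash.
                with open(path, "wb") as f:
                    f.write(data)
                os.utime(path, ns=(old["mtime_ns"], old["mtime_ns"]))
            current[name] = old  # otherwise leave the existing file untouched
        else:
            with open(path, "wb") as f:
                f.write(data)
            changed.append(name)
            current[name] = {
                "sha256": digest,
                "mtime_ns": os.stat(path).st_mtime_ns,
            }

    with open(manifest_path, "w", encoding="utf-8") as f:
        json.dump(current, f, indent=2)
    return changed
```

Because unchanged products keep their old modification time, a
timestamp-based make would not see them as newer than the objects built
from them.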


| The integrated build system can then only generate the text that needs to be
| generated and compile only what needs to be compiled.

IMHO, incremental compile is overrated. Projects as large as the Linux
kernel could benefit from it, but for projects as critical as the Linux kernel
I would be sure to compile from scratch at the end, anyway. Incremental
is good to test changes to code when the next step the programmer will do
is make more changes. But once it is time even for a beta release or a
release candidate, I always do a full compile from scratch (none of my
projects take more than 10 minutes on a 2.66 GHz quad processor but I would
do this even if it took a few hours ... and that would be a HUGE project).
 
Guest

| On 21 Jul 2009 15:21:51 GMT, (e-mail address removed) wrote:
|
|> | On 20 July, 22:27, (e-mail address removed) wrote:
|> |
|> |> I'm starting to design a meta pre-processor for the C language.  It could
|> |> also be used for other languages, but C is my focus.  The purpose of this
|> |> pre-processor is to select sections of code from files to be included in
|> |> other files.  The goal is the ability to code related items such as structs,
|> |> and functions or macros that use those structs, in a single file, while the
|> |> actual code needs to be split up into separate .c and .h files, and combined
|> |> with the parts from other code, for compiling.  This is to allow my code to be
|> |> organized more like modules.
|> |
|> |
|> | I think you've re-discovered Literate Programming
|>
|> Except that my intentions are not related to documentation specifically,
|> but rather, code modularization and organization. I suppose that can be
|> thought of as literate programming to the extent that such organization
|
| Exactly. Literate programming, as proposed (invented?) by Knuth et al,
| focusses on presenting the pieces of your program in a 'logical,
| understandable (to humans)' fashion, and then automatically reordering
| and reformatting as needed to make compilable code.
|
| Unfortunately, AFAICT it takes a lot of skill (even genius) to do this
| well, and I believe also some knowledge about -- and thus constraints
| on -- your audience. So it hasn't caught on widely.
|
| The same or similar tools can also reformat documentation embedded in
| source code. This has proven easier and more popular, and it makes it
| more likely to have correct doc which is a good thing -- but it is NOT
| LP, even though it is sometimes miscalled that.
|
| What you say sounds much closer to LP, though perhaps not exactly.
| Which is not a bad thing. Knuth's desiderata are generally good.

His concept is to do this in a documentary way. It would be more like
prose to describe what is happening, and sufficient clues or hints in
that prose for code to be derived. My concept doesn't create the prose
so much as it is just modularization. I want to be able to create a module
of code that adds a feature to a program, even if the implementation
of that feature really requires integration in a number of places. What
we often see is this coming in the form of a patch that is applied to
multiple files. But that method is troublesome to apply, the way patches
are normally applied, as the entire project progresses towards future
versions. Subsequent changes have to consider the past patch as already
applied. That cripples the ability to make a patch be optional.

One project I am working on is a small image building library for 2D images
based on shape objects and their transforms. It will include a few basic
shapes and transformations. I want it to be modular so that more shapes
and transformations can be added, or they can even be removed or replaced.
In this case, it's an object class, too. But I want to be able to put all
of the class definition in one file, even though parts will be exported to
headers for both internal use and API exposure, as well as code to be
compiled. Doing this to produce documentation such as man pages is
also of interest, but I have not yet attempted that (this might approximate
the literate programming concept).
 
Nobody

| If we modify the code above, but not the prototype, how will you ensure that
| only the code is recompiled, but not anything which depends on the prototype?
|
| If interfaces and implementations are split into separate files, then you have
| the file timestamps.
|
| You might want some kind of hash based scheme; your preprocessor can detect
| when an interface has changed by a change in the hash.
|
| The integrated build system can then only generate the text that needs to be
| generated and compile only what needs to be compiled.

When generating source code with a pre-processor, it's common to only
update the output file if it has changed, e.g.:

foo.c: foo.c.in
	preprocess < foo.c.in > foo.c.tmp
	if [ -f foo.c ] && cmp foo.c foo.c.tmp ; then \
		rm foo.c.tmp ; \
	else \
		mv -f foo.c.tmp foo.c ; \
	fi
 
