Compiling Data Files Into a Program

cppaddict

Let's say you want to implement a Dictionary class, which contains a
vector of DictionaryEntry. Assume each DictionaryEntry has two
members, a word and a definition.

Now assume your program needs to create a Dictionary *object* to be
populated with values that come from a text file with a format like
this:

<dict.txt>

APPLE
a fruit

ANT
an insect

....etc...

</dict.txt>

Clearly it would not be hard to write a parser that went through the
text file and populated the object. This, however, makes the program
dependent on an uncompiled text file, which could be a problem if,
e.g., the words and definitions were all top secret.

One solution which I don't like is to write a program that converts
the text file into a .cpp file which, in turn, defines the dictionary
object you'll need and populates it. The result of the conversion
might look like:

<dict.cpp>
Dictionary SECRET_DICTIONARY;

DictionaryEntry E1("APPLE","a fruit");
SECRET_DICTIONARY.addEntry(E1);

DictionaryEntry E2("ANT","an insect");
SECRET_DICTIONARY.addEntry(E2);

....etc...
</dict.cpp>
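
One detail worth noting: calls such as SECRET_DICTIONARY.addEntry(E1); are statements, so they cannot sit at file scope the way the listing above shows them. A generated file would more likely wrap them in a function. The sketch below is one possible shape; the class bodies, buildSecretDictionary and secretDictionary are assumptions for illustration, not the poster's actual code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical minimal versions of the two classes from the post.
struct DictionaryEntry {
    DictionaryEntry(const std::string& w, const std::string& d)
        : word(w), definition(d) {}
    std::string word;
    std::string definition;
};

class Dictionary {
public:
    void addEntry(const DictionaryEntry& e) { entries_.push_back(e); }
    std::size_t size() const { return entries_.size(); }
    const DictionaryEntry& at(std::size_t i) const { return entries_.at(i); }

private:
    std::vector<DictionaryEntry> entries_;
};

// ---- what a generated dict.cpp could contain ----
Dictionary buildSecretDictionary() {
    Dictionary d;
    d.addEntry(DictionaryEntry("APPLE", "a fruit"));
    d.addEntry(DictionaryEntry("ANT", "an insect"));
    return d;
}

// The function-local static is built on first use, which sidesteps
// initialization-order problems between translation units.
const Dictionary& secretDictionary() {
    static const Dictionary dict = buildSecretDictionary();
    return dict;
}
```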

Then your client program could #include dict.cpp and use
SECRET_DICTIONARY as needed. Of course, this requires you to:

1. Write, compile, and run the conversion program to produce dict.cpp
2. #include dict.cpp in your program, and then compile that.

Thus a two-step compilation process. Is there a better way to handle
this situation?

Thanks for any suggestions,
cpp
 
Joe Laughlin

cppaddict said:
[snip]

Encrypt the text file that contains the definitions.
 
Dan Mills

cppaddict wrote:

Clearly it would not be hard to write a parser that went through the
text file and populated the object. This, however, makes the program
dependent on an uncompiled text file, which could be a problem if,
e.g., the words and definitions were all top secret.

In that case it would be better to associate an MD5 hash of the word with a
definition, as otherwise anyone running strings on your executable would
have the word available. I am not sure that there is much you can do about
the definitions themselves, but the keyword you can protect with a one-way
hash.

cppaddict wrote:

One solution which I don't like is to write a program that converts
the text file into a .cpp file which, in turn, defines the dictionary
object you'll need and populates it. The result of the conversion
might look like:

<Snip>

Well if you are going for something that is actually secret then having the
plain text converted on a development machine it never leaves would surely
be a good thing?

Sure, you have to write the utility, but that is not that hard; a simple sed
or awk script springs to mind. It may even be doable with 'tr' (not sure).

cppaddict wrote:

Then your client program could #include dict.cpp and use
SECRET_DICTIONARY as needed.

Why not link dict.o? It may be easier on your platform to produce
an object file from a flat text file than it is to produce C++, and even if
C++ is the intermediate target, I would still not include it directly.
Declare the object extern and let the linker sort it out!

cppaddict wrote:

Of course, this requires you to:

1. Write, compile, and run the conversion program to produce dict.cpp
2. #include dict.cpp in your program, and then compile that.

Thus a two step compilation process. Is there a better way to handle
this situation?

What is the problem with a two-step process? That is why make exists.

dictionary.cpp: dictionary.csv script.awk
awk .... whatever

dictionary.o: dictionary.cpp
CC dictionary.cpp -odictionary.o (or whatever)

Thus any change to the dictionary or to the awk script that produces it will
result in dictionary.cpp being regenerated and then in dictionary.o being
regenerated followed by (I assume) a relink.

There are (in most cases) highly platform dependent ways to include a data
file as an object at link time, but they tend to be non portable.
Depending on your platform, I would look at the man pages for binutils.

I would also note that if I can read the executable I can probably find your
word list even if it is linked in as an object. Also, any platform dependent
trick like this will probably result in (at best) a C style string which is
the entire contents of the file.

In general I would put the data file in whatever location your OS has for
platform independent data files, and possibly do something clever with the
permissions. Platform dependent tricks have a nasty tendency to bite you!

Regards, Dan.
 
cppaddict

Dan, thanks very much for your comments. A couple of points of
clarification:

Dan Mills wrote:
Why not link dict.o? It may be easier on your platform to produce
an object file from a flat text file than it is to produce C++, and even if
C++ is the intermediate target, I would still not include it directly.
Declare the object extern and let the linker sort it out!

How would you do this? Assuming the existence of dict.o, what is the
code to extern it?

Dan Mills wrote:
I would also note that if I can read the executable I can probably find your
word list even if it is linked in as an object. Also, any platform dependent
trick like this will probably result in (at best) a C style string which is
the entire contents of the file.

How is it that someone could read the strings in the executable?
Wouldn't that require difficult reverse engineering?

Thanks again,
cpp
 
Daniel T.

cppaddict said:
How is it that someone could read the strings in the executable.
Wouldn't that require difficult reverse engineering?

No, a simple hex editor (which can be downloaded from any number of web
sites for free) can let someone look at your code. What they will see is
a bunch of garbage (where the actual code is) and occasionally some
blocks of text like, "Apple\0" and "a fruit\0".

Joe Laughlin had the best solution. While developing the program, keep
the dict.txt file unencrypted, but write the code so that adding a
decrypter will be a simple matter of changing one line of code. Then,
when everything works and you are ready to ship, add that line, encrypt
the file and test. If everything works fine, hand it over to your
customers. QED.
 
cppaddict

Daniel T. said:
No, a simple hex editor (which can be downloaded from any number of web
sites for free) can let someone look at your code. What they will see is
a bunch of garbage (where the actual code is) and occasionally some
blocks of text like, "Apple\0" and "a fruit\0".

Very interesting.... and good to know.

Thanks,
cpp
 
Dan Mills

cppaddict said:
Very interesting.... and good to know.


Or even easier, the 'strings' utility will scan any file (as a raw stream of
bytes) and print any run of printable ASCII longer than a specified
number of characters (sometimes very revealing on Word documents....).
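
To show how little work that takes, here is a small sketch of the same idea. It is not the real utility's source, just an approximation of the behaviour described above; findStrings and the default minimum length of 4 are assumptions for the example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Collects every run of printable ASCII that is at least minLen bytes long,
// which is roughly what a 'strings'-style scan reports.
std::vector<std::string> findStrings(const std::string& bytes,
                                     std::size_t minLen = 4) {
    std::vector<std::string> found;
    std::string run;
    for (unsigned char c : bytes) {
        if (c >= 0x20 && c <= 0x7E) {        // printable ASCII, space included
            run.push_back(static_cast<char>(c));
        } else {
            if (run.size() >= minLen) found.push_back(run);
            run.clear();
        }
    }
    if (run.size() >= minLen) found.push_back(run);   // run ending at EOF
    return found;
}
```

Fed the raw bytes of an executable, a scan like this would list string literals such as the dictionary words alongside whatever other text the binary contains.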

As to how to link an object file with the rest of your code, that is
platform dependent but assuming the object file contains a C (to avoid name
mangling issues) object called dict of type array of dictionary, I would
stick


typedef struct {
    const char *word;
    const char *definition;
} dictionary; /* Or whatever your structure is */

extern dictionary dict[];

in a suitable header file (say dict.h). This should then allow your code
using dict to compile to object files, then just link them all together in
whatever way your build environment provides.

As you have not told us what toolchain you are using it is kind of hard to
be more precise.

Regards, Dan (who is much more comfortable in C than C++ which is why the
above has a C flavour).
 
Thomas Matthews

cppaddict said:
[snip]

On one of my embedded systems applications, I used an assembly language
file to contain the data. The assembly language has a directive for
including a file as binary data. We just placed a global symbol at the
beginning as well as some alignment operators:
ALIGN 3
EXPORT Dictionary_Data
Dictionary_Data
INCBIN "Dictionary_Data.txt"
END

We then refer to the data using the "extern" C keyword. This was a lot
simpler and less error prone than using the C language array
initialization syntax. The assembly language option also allowed us
to force the data onto a given alignment boundary, which was required
by the hardware. Standard C at the time (before C11 added _Alignas)
could not guarantee that data is placed on a given alignment boundary.
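
Newer language revisions can express an alignment request directly: C11 added _Alignas and C++11 added alignas. The sketch below shows the C++ form. The 8-byte boundary is an assumption for illustration only, the data is written inline because the assembler-side symbol is not available here, and isAligned is a helper invented for the example.

```cpp
#include <cstddef>
#include <cstdint>

// 8 is an assumed requirement; substitute whatever boundary the hardware
// needs. The bytes are given inline instead of coming from INCBIN.
alignas(8) const unsigned char Dictionary_Data[] =
    "APPLE\na fruit\n\nANT\nan insect\n";

// Size of the payload, not counting the string literal's trailing '\0'.
const std::size_t Dictionary_Data_Size = sizeof Dictionary_Data - 1;

// True when p sits on a multiple of boundary bytes.
bool isAligned(const void* p, std::size_t boundary) {
    return reinterpret_cast<std::uintptr_t>(p) % boundary == 0;
}
```

This still leaves the array-initialization drawback mentioned above: the file contents have to be turned into source text by some tool, whereas the assembler directive pulls the file in as-is.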

--
Thomas Matthews

 
puppet_sock

cppaddict said:
Let's say you want to implement a Dictionary class, which contains a
vector of DictionaryEntry. Assume each DictionaryEntry has two
members, a word and a definition.
[snip]

What you are really talking about here is a single-table database.
C++ is not specifically a database-ish language. If you really need
a very big dictionary, you really want a database to go with it.
So, you should either:
1) buy a standard database tool and do it in that, or
2) roll your own.

Which you do depends on how much your own time is worth, how much
you expect this system to expand, whether there will ever be another
table, what kind of interrogation of the table(s) you want to do, etc.

socks
 
Ron Samuel Klatchko

cppaddict said:
One solution which I don't like is to write a program that converts
the text file into a .cpp file which, in turn, defines the dictionary
object you'll need and populates it. The result of the conversion
might look like:

<dict.cpp>
Dictionary SECRET_DICTIONARY;

DictionaryEntry E1("APPLE","a fruit");
SECRET_DICTIONARY.addEntry(E1);

DictionaryEntry E2("ANT","an insect");
SECRET_DICTIONARY.addEntry(E2);

...etc...
</dict.cpp>

Then your client program could #include dict.cpp and use
SECRET_DICTIONARY as needed. Of course, this requires you to:

1. Write, compile, and run the conversion program to produce dict.cpp
2. #include dict.cpp in your program, and then compile that.

Thus a two step compilation process. Is there a better way to handle
this situation?

Actually that method is fine and is used all over the place.

#include is normally handled by the preprocessor. From this point of view,
the preprocessor is simply a conversion program that reads multiple source
files and creates a single file that is handed off to the actual compiler.

lex/yacc (or flex/bison) are even more like what you propose. You write
a lexer or a parser in a special language that more closely models what
you are doing. lex/yacc then convert the file into a C file which is
compiled normally.

So go ahead and do it that way. Give a dictionary file a special extension
so build tools can recognize what it is.

samuel
 
