Compiling Data Files Into a Program

Discussion in 'C++' started by cppaddict, Jun 11, 2004.

  1. cppaddict

    cppaddict Guest

    Let's say you want to implement a Dictionary class, which contains a
    vector of DictionaryEntry. Assume each DictionaryEntry has two
    members, a word and a definition.
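    A minimal sketch of the two classes as described, assuming `std::string` members. Only `addEntry` is named in the post, so the accessors and the linear `lookup` helper are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One word/definition pair.
class DictionaryEntry {
public:
    DictionaryEntry(const std::string& word, const std::string& definition)
        : word_(word), definition_(definition) {}
    const std::string& word() const { return word_; }
    const std::string& definition() const { return definition_; }
private:
    std::string word_;
    std::string definition_;
};

// A dictionary holding a vector of entries; lookup is a simple linear scan.
class Dictionary {
public:
    void addEntry(const DictionaryEntry& e) { entries_.push_back(e); }

    // Hypothetical helper: returns the definition, or nullptr if absent.
    const std::string* lookup(const std::string& word) const {
        for (const DictionaryEntry& e : entries_)
            if (e.word() == word) return &e.definition();
        return nullptr;
    }

    std::size_t size() const { return entries_.size(); }
private:
    std::vector<DictionaryEntry> entries_;
};
```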

    Now assume your program needs to create a Dictionary *object* to be
    populated with values that come from a text file with a format like
    this:

    <dict.txt>

    APPLE
    a fruit

    ANT
    an insect

    ....etc...

    </dict.txt>

    Clearly it would not be hard to write a parser that went through the
    text file and populated the object. This, however, makes the program
    dependent on an uncompiled text file, which could be a problem if,
    e.g., the words and definitions were all top secret.

    One solution which I don't like is to write a program that converts
    the text file into a .cpp file which, in turn, defines the dictionary
    object you'll need and populates it. The result of the conversion
    might look like:

    <dict.cpp>
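    // addEntry() calls are statements, which cannot appear at namespace
    // scope, so the generated code fills the dictionary inside a function.
    Dictionary makeSecretDictionary()
    {
        Dictionary d;
        d.addEntry(DictionaryEntry("APPLE", "a fruit"));
        d.addEntry(DictionaryEntry("ANT", "an insect"));
        // ....etc...
        return d;
    }

    Dictionary SECRET_DICTIONARY = makeSecretDictionary();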
    </dict.cpp>

    Then your client program could #include dict.cpp and use
    SECRET_DICTIONARY as needed. Of course, this requires you to:

    1. Write, compile, and run the conversion program to produce dict.cpp
    2. #include dict.cpp in your program, and then compile that.

    Thus a two step compilation process. Is there a better way to handle
    this situation?
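    For reference, the conversion step itself can be quite small. A minimal sketch, assuming the dict.txt layout shown above (word line, definition line, blank line) and a hypothetical dictionary.h header declaring Dictionary and DictionaryEntry. It emits the addEntry() calls inside a function because C++ does not allow statements at namespace scope:

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Escapes characters that would otherwise end or corrupt a C++ string literal.
std::string escapeForCpp(const std::string& s)
{
    std::string out;
    for (char c : s) {
        switch (c) {
        case '\\': out += "\\\\"; break;
        case '"':  out += "\\\""; break;
        default:   out += c;
        }
    }
    return out;
}

// Reads "WORD\ndefinition\n\n..." records and writes C++ source that builds
// SECRET_DICTIONARY through Dictionary::addEntry(). "dictionary.h" is a
// hypothetical header name.
void generateDictCpp(std::istream& in, std::ostream& out)
{
    out << "#include \"dictionary.h\"\n\n"
        << "static Dictionary makeSecretDictionary()\n{\n"
        << "    Dictionary d;\n";
    std::string line, word;
    bool haveWord = false;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back(); // tolerate CRLF files
        if (line.empty()) continue;                 // blank lines separate entries
        if (!haveWord) { word = line; haveWord = true; continue; }
        out << "    d.addEntry(DictionaryEntry(\"" << escapeForCpp(word)
            << "\", \"" << escapeForCpp(line) << "\"));\n";
        haveWord = false;
    }
    out << "    return d;\n}\n\n"
        << "Dictionary SECRET_DICTIONARY = makeSecretDictionary();\n";
}

// Convenience wrapper: dict.txt contents in, generated dict.cpp source out.
std::string generateDictCpp(const std::string& text)
{
    std::istringstream in(text);
    std::ostringstream out;
    generateDictCpp(in, out);
    return out.str();
}
```

    A tiny main() calling generateDictCpp(std::cin, std::cout), with its output redirected to dict.cpp, would complete the tool. In this sketch a trailing word with no definition line is silently dropped.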

    Thanks for any suggestions,
    cpp
     
    cppaddict, Jun 11, 2004
    #1

  2. Joe Laughlin

    Joe Laughlin Guest

    cppaddict wrote:
    > [snip]
    >
    > Clearly it would not be hard to write a parser that went
    > through the text file and populated the object. This,
    > however, makes the program dependent on an uncompiled
    > text file, which could be a problem if, e.g., the words and
    > definitions were all top secret.
    >
    > [snip]


    Encrypt the text file that contains the definitions.
     
    Joe Laughlin, Jun 12, 2004
    #2

  3. Dan Mills

    Dan Mills Guest

    cppaddict wrote:

    <Snip>

    > Clearly it would not be hard to write a parser that went through the
    > text file and populated the object. This, however, makes the program
    > dependent on an uncompiled text file, which could be a problem if,
    > e.g., the words and definitions were all top secret.


    In that case it would be better to associate an MD5 hash of the word with a
    definition, as otherwise anyone running strings on your executable would
    have the word available. I am not sure that there is much you can do about
    the definitions themselves, but the keyword you can protect with a one-way
    hash.
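    A sketch of the hashed-key idea, with one loudly labeled substitution: FNV-1a stands in for MD5 only to keep the example free of library dependencies. FNV-1a is not a one-way cryptographic hash, so a real program would call MD5 (or better, SHA-256) from a crypto library instead. `HashedDictionary` and `addHashed` are hypothetical names; the conversion program would emit the precomputed hash constants, so the words themselves never reach the executable:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// FNV-1a, 64-bit. NOT a one-way cryptographic hash: it only stands in for
// MD5/SHA-256 so the example has no library dependency.
std::uint64_t fnv1a64(const std::string& s)
{
    std::uint64_t h = 14695981039346656037ULL;   // FNV-1a 64-bit offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;                   // FNV-1a 64-bit prime
    }
    return h;
}

// Hypothetical dictionary keyed by the hash of each word. Only the
// definitions are stored as text; the words are not.
class HashedDictionary {
public:
    // wordHash would be a constant emitted by the conversion program.
    void addHashed(std::uint64_t wordHash, const std::string& definition) {
        table_[wordHash] = definition;
    }

    // Hashes the query and looks it up; nullptr if the word is unknown.
    const std::string* lookup(const std::string& word) const {
        auto it = table_.find(fnv1a64(word));
        return it == table_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<std::uint64_t, std::string> table_;
};
```

    Two caveats: hash collisions, while rare, would return the wrong definition, and if the keys are ordinary words an attacker can hash a word list and compare, so this hides the keys from strings without making them truly secret.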

    > One solution which I don't like is to write a program that converts
    > the text file into a .cpp file which, in turn, defines the dictionary
    > object you'll need and populates it. The result of the conversion
    > might look like:


    <Snip>

    Well, if you are going for something that is actually secret, then having
    the plain text converted on a development machine it never leaves would
    surely be a good thing?

    Sure, you have to write the utility, but that is not that hard; a simple sed
    or awk script springs to mind. It may even be doable with 'tr' (not sure).

    > Then your client program could #include dict.cpp and use


    Why not link dict.o? It may be that it is easier on your platform to produce
    an object file from a flat text file than it is to produce C++, and even if
    C++ is the intermediate target, I would still not #include it directly.
    Use extern declarations and let the linker sort them out!

    > SECRET_DICTIONARY as needed. Of course, this requires you to:
    >
    > 1. Write, compile, and run the conversion program to produce dict.cpp
    > 2. #include dict.cpp in your program, and then compile that.
    >
    > Thus a two step compilation process. Is there a better way to handle
    > this situation?


    What is the problem with a two-step process? That is why make exists.

    dictionary.cpp: dictionary.csv script.awk
    	awk .... whatever

    dictionary.o: dictionary.cpp
    	CC -c dictionary.cpp -o dictionary.o (or whatever)

    Thus any change to the dictionary or to the awk script that produces it will
    result in dictionary.cpp being regenerated and then in dictionary.o being
    regenerated followed by (I assume) a relink.

    There are (in most cases) highly platform-dependent ways to include a data
    file as an object at link time, but they tend to be non-portable.
    Depending on your platform, I would look at the man pages for binutils
    (ld and objcopy in particular).

    I would also note that if I can read the executable I can probably find your
    word list even if it is linked in as an object. Also, any platform-dependent
    trick like this will probably result in (at best) a C-style string which is
    the entire contents of the file.
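    As an illustration of such a trick: with GNU binutils on typical ELF toolchains, `ld -r -b binary -o dict.o dict.txt` wraps the file's raw bytes in an object file and defines symbols such as `_binary_dict_txt_start` and `_binary_dict_txt_end` (derived from the input file name). The bytes are not even NUL-terminated, so the program still has to parse them. The sketch below parses a `[begin, end)` range in the dict.txt format; it takes the range as parameters so it can be tested without the linked object, and `parseDictionary`/`Entry` are hypothetical names:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// With `ld -r -b binary -o dict.o dict.txt` (GNU-specific), the data would be
// reachable through declarations like:
//   extern "C" const char _binary_dict_txt_start[];
//   extern "C" const char _binary_dict_txt_end[];
// and parsed with parseDictionary(_binary_dict_txt_start, _binary_dict_txt_end).

using Entry = std::pair<std::string, std::string>;   // (word, definition)

// Parses "WORD\ndefinition\n\n..." from a byte range that is NOT NUL-terminated.
std::vector<Entry> parseDictionary(const char* begin, const char* end)
{
    std::vector<Entry> entries;
    std::string word;
    bool haveWord = false;
    const char* p = begin;
    while (p < end) {
        const char* eol = p;
        while (eol < end && *eol != '\n') ++eol;      // find the end of this line
        std::string line(p, eol);
        if (!line.empty() && line.back() == '\r') line.pop_back();
        p = (eol < end) ? eol + 1 : end;              // step past the '\n'
        if (line.empty()) continue;                   // blank lines separate entries
        if (!haveWord) { word = line; haveWord = true; }
        else { entries.emplace_back(word, line); haveWord = false; }
    }
    return entries;
}
```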

    In general I would put the data file in whatever location your OS has for
    platform-independent data files, and possibly do something clever with the
    permissions; platform-dependent tricks have a nasty tendency to bite you!

    Regards, Dan.
    --
    And on the evening of the first day, the lord said.... LX1, Go!
    And there was light.
    The email address *IS* valid, do not remove the spamblock.
     
    Dan Mills, Jun 12, 2004
    #3
  4. cppaddict

    cppaddict Guest

    Dan, thanks very much for your comments. A couple points of
    clarification:

    >Why not link dict.o, it may be that it is easier on your platform to produce
    >an object file from a flat text file than it is to produce C++ and even if
    >C++ is the intermediate target, I would still not include it directly.
    >extern and let the linker sort them out!


    How would you do this? Assuming the existence of dict.o, what is the
    code to extern it?


    >I would also note that if I can read the executable I can probably find your
    >word list even if it is linked in as an object. Also, any platform-dependent
    >trick like this will probably result in (at best) a C-style string which is
    >the entire contents of the file.


    How is it that someone could read the strings in the executable?
    Wouldn't that require difficult reverse engineering?

    Thanks again,
    cpp
     
    cppaddict, Jun 12, 2004
    #4
  5. Daniel T.

    Daniel T. Guest

    cppaddict <> wrote:

    >>I would also note that if I can read the executable I can probably find your
    >>word list even if it is linked in as an object. Also, any platform-dependent
    >>trick like this will probably result in (at best) a C-style string which is
    >>the entire contents of the file.

    >
    >How is it that someone could read the strings in the executable?
    >Wouldn't that require difficult reverse engineering?


    No, a simple hex editor (which can be downloaded from any number of web
    sites for free) can let someone look at your code. What they will see is
    a bunch of garbage (where the actual code is) and occasionally some
    blocks of text like, "Apple\0" and "a fruit\0".

    Joe Laughlin had the best solution. While developing the program, keep
    the dict.txt file unencrypted, but write the code so that adding a
    decrypter will be a simple matter of changing one line of code. Then,
    when everything works and you are ready to ship, add that line, encrypt
    the file and test. If everything works fine, hand it over to your
    customers. QED.
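    One way to keep the decrypter a one-line change is to route the file contents through a single decode step chosen at one call site. A minimal sketch; `Decoder`, `identityDecoder`, `xorDecoder` and `loadDictionaryText` are hypothetical names. The XOR "decoder" is only a placeholder showing where the hook goes: XOR with a repeating key is trivially reversible obfuscation, not encryption, and a shipped product would call a vetted cipher library there instead:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// A decoder turns the raw file contents into plain dict.txt text.
using Decoder = std::function<std::string(const std::string&)>;

// During development the file is stored as plain text.
std::string identityDecoder(const std::string& raw) { return raw; }

// PLACEHOLDER ONLY: XOR with a repeating key is obfuscation, not encryption.
// Applying it twice with the same key restores the original bytes.
std::string xorDecoder(const std::string& raw, const std::string& key)
{
    assert(!key.empty());
    std::string out(raw);
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = static_cast<char>(out[i] ^ key[i % key.size()]);
    return out;
}

// Reads the whole stream and hands it to the decoder. Going from plain to
// encrypted data means changing only the decoder passed in.
std::string loadDictionaryText(std::istream& in, const Decoder& decode)
{
    std::string raw((std::istreambuf_iterator<char>(in)),
                    std::istreambuf_iterator<char>());
    return decode(raw);
}
```

    For a real file, open it in binary mode (`std::ifstream f("dict.txt", std::ios::binary)`) so encrypted bytes are not altered by newline translation. Also note that a key compiled into the program can itself be recovered, so this raises the bar rather than guaranteeing secrecy.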
     
    Daniel T., Jun 12, 2004
    #5
  6. cppaddict

    cppaddict Guest

    >No, a simple hex editor (which can be downloaded from any number of web
    >sites for free) can let someone look at your code. What they will see is
    >a bunch of garbage (where the actual code is) and occasionally some
    >blocks of text like, "Apple\0" and "a fruit\0".


    Very interesting.... and good to know.

    Thanks,
    cpp
     
    cppaddict, Jun 12, 2004
    #6
  7. Dan Mills

    Dan Mills Guest

    cppaddict wrote:

    >>No, a simple hex editor (which can be downloaded from any number of web
    >>sites for free) can let someone look at your code. What they will see is
    >>a bunch of garbage (where the actual code is) and occasionally some
    >>blocks of text like, "Apple\0" and "a fruit\0".

    >
    > Very interesting.... and good to know.



    Or even easier, the 'strings' utility will scan any file (as a raw stream of
    bytes) and print every run of printable ASCII at least a specified number of
    characters long (sometimes very revealing on Word documents....).
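    To see why strings (or a hex editor) is so effective, here is a rough sketch of the same kind of scan: it reports every run of printable ASCII (plus tab) at least `minLen` bytes long, with 4 mirroring GNU strings' default minimum. `printableRuns` is a hypothetical name, not part of any tool:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns every run of printable ASCII characters (plus tab) that is at
// least minLen bytes long, roughly what `strings` reports for a raw file.
std::vector<std::string> printableRuns(const std::string& bytes, std::size_t minLen = 4)
{
    std::vector<std::string> runs;
    std::string current;
    for (unsigned char c : bytes) {
        bool printable = (c >= 0x20 && c <= 0x7e) || c == '\t';
        if (printable) {
            current += static_cast<char>(c);
        } else {
            if (current.size() >= minLen) runs.push_back(current);
            current.clear();                     // a non-printable byte ends the run
        }
    }
    if (current.size() >= minLen) runs.push_back(current);
    return runs;
}
```

    Run over an executable built with the dict.cpp approach, a scan like this would typically surface both the words and the definitions, since string literals are stored as plain bytes.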

    As to how to link an object file with the rest of your code, that is
    platform-dependent, but assuming the object file contains a C (to avoid name
    mangling issues) object called dict of type array of dictionary, I would
    stick


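    struct dictionary {      /* Or whatever your structure is */
        const char *word;
        const char *definition;
    };

    #ifdef __cplusplus
    extern "C" {             /* C linkage, so the name is not mangled */
    #endif
    extern struct dictionary dict[];
    #ifdef __cplusplus
    }
    #endif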

    in a suitable header file (say dict.h). This should then allow your code
    using dict to compile to object files, then just link them all together in
    whatever way your build environment provides.

    As you have not told us what toolchain you are using it is kind of hard to
    be more precise.
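    To make the header idea concrete, here is a single-file sketch. In a real build the declaration would live in dict.h, the definition in the generated dict.c or dict.cpp (compiled once to dict.o), and the lookup in client code. The `{0, 0}` sentinel and the `findDefinition` helper are assumptions, since an extern array of unknown bound carries no length:

```cpp
#include <cassert>
#include <cstring>

/* ---- dict.h: usable from both C and C++ ---- */
struct dictionary {
    const char *word;
    const char *definition;
};

#ifdef __cplusplus
extern "C" {
#endif
extern struct dictionary dict[];   /* ends with a {0, 0} entry (assumed convention) */
#ifdef __cplusplus
}
#endif

/* ---- dict.c / dict.cpp: normally generated; the C linkage declared above is
   inherited by this definition when compiled as C++ ---- */
struct dictionary dict[] = {
    { "APPLE", "a fruit" },
    { "ANT",   "an insect" },
    { 0, 0 }                       /* sentinel marking the end of the table */
};

// ---- client code: hypothetical linear search up to the sentinel ----
const char *findDefinition(const char *word)
{
    for (const dictionary *e = dict; e->word != nullptr; ++e)
        if (std::strcmp(e->word, word) == 0)
            return e->definition;
    return nullptr;
}
```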

    Regards, Dan (who is much more comfortable in C than C++, which is why the
    above has a C flavour).

     
    Dan Mills, Jun 13, 2004
    #7
  8. Thomas Matthews

    cppaddict wrote:

    > [snip]
    >
    > Thus a two step compilation process. Is there a better way to handle
    > this situation?


    On one of my embedded systems applications, I used an assembly language
    file to contain the data. The assembly language has a directive for
    including a file as binary data. We just placed a global symbol at the
    beginning as well as some alignment operators:
    ALIGN 3
    EXPORT Dictionary_Data
    Dictionary_Data
    INCBIN "Dictionary_Data.txt"
    END

    We then refer to the data using the C "extern" keyword. This was a lot
    simpler and less error-prone than using the C language array
    initialization syntax. The assembly language option also allowed us
    to force the data onto a given alignment boundary, which was required
    by the hardware. The C language cannot guarantee that data is placed
    on a given alignment boundary.
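    That last point reflects the standards of 2004. Since C++11 (and C11's `_Alignas`), the language itself can request a stricter alignment for an object, so the alignment half of this requirement no longer needs assembly, although getting the file's bytes in still does (INCBIN, a generated initializer, or a linker trick). A minimal sketch; the 8-byte boundary is only an example, and `kDictionaryData` and `isAligned` are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// alignas asks the compiler to place this array on an 8-byte boundary.
// The bytes are a tiny stand-in for real data, which would still have to
// be generated or embedded some other way.
alignas(8) const unsigned char kDictionaryData[] = {
    'A', 'P', 'P', 'L', 'E', '\n', 'a', ' ', 'f', 'r', 'u', 'i', 't', '\n'
};

// True if p sits on an `alignment`-byte boundary (alignment must be a power of two).
bool isAligned(const void* p, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```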

    --
    Thomas Matthews

    C++ newsgroup welcome message:
    http://www.slack.net/~shiva/welcome.txt
    C++ Faq: http://www.parashift.com/c++-faq-lite
    C Faq: http://www.eskimo.com/~scs/c-faq/top.html
    alt.comp.lang.learn.c-c++ faq:
    http://www.raos.demon.uk/acllc-c++/faq.html
    Other sites:
    http://www.josuttis.com -- C++ STL Library book
     
    Thomas Matthews, Jun 14, 2004
    #8
  9. cppaddict

    Guest

    cppaddict <> wrote in message news:<>...
    > Let's say you want to implement a Dictionary class, which contains a
    > vector of DictionaryEntry. Assume each DictionaryEntry has two
    > members, a word and a definition.

    [snip]

    What you are really talking about here is a single-table database.
    C++ is not specifically a database-ish language. If you really need
    a huge dictionary, you really want a database to go with it.
    So, you should either:
    1) buy a standard database tool and do it in that, or
    2) roll your own.

    Which you do depends on how much your own time is worth, how much
    you expect this system to expand, whether there will ever be another
    table, what kind of interrogation of the table(s) you want to do, etc.
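    For the "roll your own" option, about the simplest read-only single-table store is a vector of rows sorted once by key and queried with binary search. A minimal sketch; `WordTable`, `find` and `withPrefix` are hypothetical names, and the prefix query is just one example of the kind of "interrogation" the choice depends on:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// A read-only, single-table store: rows sorted by key, binary search for lookups.
class WordTable {
public:
    using Row = std::pair<std::string, std::string>;   // (word, definition)

    explicit WordTable(std::vector<Row> rows) : rows_(std::move(rows)) {
        std::sort(rows_.begin(), rows_.end());          // sort once, query many times
    }

    // O(log n) exact lookup; returns nullptr when the word is not present.
    const std::string* find(const std::string& word) const {
        auto it = firstNotLess(word);
        return (it != rows_.end() && it->first == word) ? &it->second : nullptr;
    }

    // All words starting with prefix; matches are contiguous in sorted order.
    std::vector<std::string> withPrefix(const std::string& prefix) const {
        std::vector<std::string> out;
        for (auto it = firstNotLess(prefix);
             it != rows_.end() && it->first.compare(0, prefix.size(), prefix) == 0; ++it)
            out.push_back(it->first);
        return out;
    }

private:
    std::vector<Row>::const_iterator firstNotLess(const std::string& key) const {
        return std::lower_bound(rows_.begin(), rows_.end(), key,
            [](const Row& r, const std::string& k) { return r.first < k; });
    }

    std::vector<Row> rows_;
};
```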
    socks
     
    , Jun 14, 2004
    #9
  10. Ron Samuel Klatchko

    cppaddict <> wrote in message news:<>...
    > One solution which I don't like is to write a program that converts
    > the text file into a .cpp file which, in turn, defines the dictionary
    > object you'll need and populates it. The result of the conversion
    > might look like:
    >
    > [snip]
    >
    > Thus a two step compilation process. Is there a better way to handle
    > this situation?


    Actually that method is fine and is used all over the place.

    #include is normally handled by the preprocessor. From this point of view,
    the preprocessor is simply a conversion program that reads multiple source
    files and creates a single file that is handed off to the actual compiler.

    lex/yacc (e.g. flex/bison) are even more like what you propose. You write
    a lexer or a parser in a special language that more closely models what
    you are doing. lex/yacc then convert the file into a C file which is
    compiled normally.

    So go ahead and do it that way. Give a dictionary file a special extension
    so build tools can recognize what it is.

    samuel
     
    Ron Samuel Klatchko, Jun 14, 2004
    #10