C source cruncher wanted

Discussion in 'C Programming' started by David Given, Oct 12, 2005.

  1. David Given Guest

    I have a project where I need to distribute 300kB of C source code as part
    of a shell script, and would like to compress it as much as possible.

    Does anyone know where I can get a (open source) tool for crunching C
    programs? That is, removal of whitespace, comments, extraneous characters,
    renaming identifiers to make them as short as possible, etc. Actual
    outright obfuscation is not my goal; I just want to reduce the source size.

    (Yes, I know I could use a tool such as gzip, but for various reasons I'd
    like to make the actual source code as small as possible as well.)

    It seems to be rather hard to find crunchers these days --- I know there
    certainly used to be some, and you can still get them for Javascript, but I
    can't find anything that works on C...

    I'm using a Unix environment.

    --
    +- David Given --McQ-+ "I must have spent at least ten minutes out of my
    | | life talking to this joker like he was a sane
    | () | person. I want a refund." --- Louann Miller, on
    +- www.cowlark.com --+ rasfw

    David Given, Oct 12, 2005
    #1

  2. Skarmander Guest

    David Given wrote:
    > I have a project where I need to distribute 300kB of C source code as part
    > of a shell script, and would like to compress it as much as possible.
    >
    > Does anyone know where I can get a (open source) tool for crunching C
    > programs? That is, removal of whitespace, comments, extraneous characters,
    > renaming identifiers to make them as short as possible, etc. Actual
    > outright obfuscation is not my goal; I just want to reduce the source size.
    >
    > (Yes, I know I could use a tool such as gzip, but for various reasons I'd
    > like to make the actual source code as small as possible as well.)
    >
    > It seems to be rather hard to find crunchers these days --- I know there
    > certainly used to be some, and you can still get them for Javascript, but I
    > can't find anything that works on C...
    >

    Likely because it makes people go "now what good is that", like I'm
    going right now. Now what good is that? :)

    It's easy enough to write something like that, though. Just grab any
    random C parser + pretty printer from the net and modify it so it prints
    small instead of pretty.

    Renaming identifiers is slightly trickier because you have to take care
    to do it only for non-external symbols (if your code is self-contained,
    this doesn't matter). Additional complications arise depending on
    whether you want to collapse units into one or not, and whether you're
    willing to use #defines or not (I wouldn't bother; too much opportunity
    for error). Then there's the ISO C limit on logical source line length
    (509 characters in C89, raised to 4095 in C99) that you'll have to
    respect if you want the code to remain portable.
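
    As a rough illustration of the renaming step, the sketch below hands out
    the shortest identifiers first and skips C keywords such as "do" and
    "if". The function names (nth_name, next_short_name) are made up for
    this example. A real cruncher would also have to skip any name still in
    use elsewhere in the program: external symbols, library functions,
    macros.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Letters may start an identifier; letters and digits may follow.
   Underscores are left out to stay clear of reserved names. */
static const char FIRST[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
static const char REST[]  = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
#define NFIRST (sizeof FIRST - 1)   /* 52 */
#define NREST  (sizeof REST - 1)    /* 62 */

/* Write the n-th shortest identifier (0 -> "a", 52 -> "aa") into buf.
   Returns its length, or -1 if buf is too small or n is out of range. */
int nth_name(unsigned long n, char *buf, size_t size)
{
    size_t len = 1, i;
    unsigned long count = NFIRST;          /* how many names have this length */

    while (n >= count) {                   /* find the length that contains n */
        if (count > ULONG_MAX / NREST)
            return -1;
        n -= count;
        count *= NREST;
        len++;
    }
    if (len + 1 > size)
        return -1;
    buf[len] = '\0';
    for (i = len - 1; i > 0; i--) {        /* trailing characters, last one first */
        buf[i] = REST[n % NREST];
        n /= NREST;
    }
    buf[0] = FIRST[n];
    return (int)len;
}

/* Return 1 if s is a C89/C99 keyword that a generated name could hit. */
static int is_keyword(const char *s)
{
    static const char *const kw[] = {
        "auto", "break", "case", "char", "const", "continue", "default", "do",
        "double", "else", "enum", "extern", "float", "for", "goto", "if",
        "inline", "int", "long", "register", "restrict", "return", "short",
        "signed", "sizeof", "static", "struct", "switch", "typedef", "union",
        "unsigned", "void", "volatile", "while"
    };
    size_t i;

    for (i = 0; i < sizeof kw / sizeof kw[0]; i++)
        if (strcmp(s, kw[i]) == 0)
            return 1;
    return 0;
}

/* Produce the next unused short name, advancing *counter past keywords. */
int next_short_name(unsigned long *counter, char *buf, size_t size)
{
    int len;

    do {
        len = nth_name((*counter)++, buf, size);
    } while (len > 0 && is_keyword(buf));
    return len;
}
```

    Starting a counter at 0 and calling next_short_name once per non-external
    identifier gives 52 one-character names before any two-character name is
    needed.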

    But I'll still go on record as saying it's not worth it. Any platform
    that can compile C has gzip (or is capable of decompressing the format,
    at least). The source produced this way is nearly useless to
    maintainers, especially if identifiers are renamed, whether obfuscation
    is your goal or not.

    You also don't save on compilation time: either the code is stable, in
    which case a one-time saving of this magnitude is probably irrelevant,
    or it's not stable, in which case you have to "compress" it every time
    you change the original, which takes more time than just feeding it to
    the compiler, unless your compiler really sucks. About the only thing I
    can imagine this is good for is reduced transmission times over a
    network, but again, gzip is your friend. In fact, HTTP has built-in support.

    S.
    Skarmander, Oct 12, 2005
    #2

  3. Dale Guest

    David Given <> wrote in
    news:m073f.444$:

    > -----BEGIN PGP SIGNED MESSAGE-----
    > Hash: SHA1
    >
    > I have a project where I need to distribute 300kB of C source code as
    > part of a shell script, and would like to compress it as much as
    > possible.
    >
    > Does anyone know where I can get a (open source) tool for crunching C
    > programs? That is, removal of whitespace, comments, extraneous
    > characters, renaming identifiers to make them as short as possible,
    > etc. Actual outright obfuscation is not my goal; I just want to reduce
    > the source size.
    >
    > (Yes, I know I could use a tool such as gzip, but for various reasons
    > I'd like to make the actual source code as small as possible as well.)
    >
    > It seems to be rather hard to find crunchers these days --- I know
    > there certainly used to be some, and you can still get them for
    > Javascript, but I can't find anything that works on C...
    >
    > I'm using a Unix environment.


    Well, there's always sed. You could use that to remove all the spaces,
    tabs, newline characters and whatever else you want to be rid of. Just
    replace the unwanted characters with nothing (e.g., s/\ //g to get rid of
    spaces).
    Dale, Oct 12, 2005
    #3
  4. Eric Sosman Guest

    David Given wrote On 10/12/05 08:07,:
    > -----BEGIN PGP SIGNED MESSAGE-----
    > Hash: SHA1
    >
    > I have a project where I need to distribute 300kB of C source code as part
    > of a shell script, and would like to compress it as much as possible.
    >
    > Does anyone know where I can get a (open source) tool for crunching C
    > programs? That is, removal of whitespace, comments, extraneous characters,
    > renaming identifiers to make them as short as possible, etc. Actual
    > outright obfuscation is not my goal; I just want to reduce the source size.
    > [...]


    CB Falconer (anybody know why he's been so silent of late?)
    has made mention of an identifier-renaming program he wrote;
    you might be able to modify it to squeeze out excess white space
    at the same time. I don't have a link to his code repository,
    but if you Google your way through some of his postings to this
    group you'll probably find it.

    (Still, I've got to echo Skarmander's question: "Now, what
    good is that?")

    --
    Eric Sosman, Oct 12, 2005
    #4
  5. Kevin Handy Guest

    Dale wrote:
    > David Given <> wrote in
    > news:m073f.444$:
    >
    >
    >>-----BEGIN PGP SIGNED MESSAGE-----
    >>Hash: SHA1
    >>
    >>I have a project where I need to distribute 300kB of C source code as
    >>part of a shell script, and would like to compress it as much as
    >>possible.
    >>
    >>Does anyone know where I can get a (open source) tool for crunching C
    >>programs? That is, removal of whitespace, comments, extraneous
    >>characters, renaming identifiers to make them as short as possible,
    >>etc. Actual outright obfuscation is not my goal; I just want to reduce
    >>the source size.
    >>
    >>(Yes, I know I could use a tool such as gzip, but for various reasons
    >>I'd like to make the actual source code as small as possible as well.)
    >>
    >>It seems to be rather hard to find crunchers these days --- I know
    >>there certainly used to be some, and you can still get them for
    >>Javascript, but I can't find anything that works on C...
    >>
    >>I'm using a Unix environment.

    >
    >
    > Well, there's always sed. You could use that to remove all the spaces,
    > tabs, newline characters and whatever else you want to be rid of. Just
    > replace the unwanted characters with nothing (e.g., s/\ //g to get rid of
    > spaces).


    Idon'tthinkyouwanttoremoveallspaces,especiallyinquotedstrings.

    I really don't understand the need for this "crunching", unless
    it is for obfuscation, and then there are probably better ways
    of doing that. Running it through a "pretty-printer" like indent
    would "decrunch" the source.

    Now, if the original poster would specify why he wants to do
    this, we can comment on it intelligently.

    Kevin Handy, Oct 12, 2005
    #5
  6. David Given Guest

    Kevin Handy wrote:
    [...]
    > I really don't understand the need for this "crunching", unless
    > it is for obfuscation, and then there are probably better ways
    > of doing that. Running it through a "pretty-printer" like indent
    > would "decrunch" the source.
    >
    > Now, if the original poster would specify why he wants to do
    > this, we can comment on it intelligently.


    Surely that's irrelevant? I do know what I'm looking for, and I do have
    specific reasons for wanting it.

    FWIW, what I've got is a build utility consisting of a shell script
    containing a script and a chunk of source code which is the interpreter for
    the script. When the utility is run for the first time, it will unpack the
    interpreter, compile it, stash the binary somewhere, and then use it to
    invoke the script.

    The interpreter is currently pretty chunky. I want people to be able to
    deploy the utility by just dropping it into a source distribution, which
    means I want to make it as small as possible. Being able to read the code
    isn't an issue, because if you're developing, you use the full,
    uncompressed source.

    I'm currently building several versions of the shell script package, using
    different encodings for the interpreter source. The uncompressed version is
    about 400kB. The non 7-bit clean version, which is diff unfriendly and uses
    a gzip compressed data chunk, is 100kB. The 7-bit clean version, which uses
    gzip and then uuencode, is 150kB. If I can reduce the size of the
    interpreter source then I can reduce the size of the package, even if it is
    using gzip. It's worth noting that using 'cobfusc -dem' I can reduce the
    source code size by 40%, which reduces the gzip compressed version by 25%,
    so using a code cruncher *is* useful; but cobfusc was not intended for code
    compression, so I can't achieve any further savings.

    None of this is particularly on-topic, which is why I didn't mention it to
    begin with...

    --
    +- David Given --McQ-+ "They laughed at Newton. They laughed at Einstein.
    | | Of course, they also laughed at Bozo the Clown."
    | () | --- Carl Sagan
    +- www.cowlark.com --+
    David Given, Oct 12, 2005
    #6
  7. In article <MIb3f.9528$>, David Given <> writes:
    >
    > FWIW, what I've got is a build utility consisting of a shell script
    > containing a script and a chunk of source code which is the interpreter for
    > the script. When the utility is run for the first time, it will unpack the
    > interpreter, compile it, stash the binary somewhere, and then use it to
    > invoke the script.


    Ah, it's "Revenge of the Shell Archive".

    > The interpreter is currently pretty chunky. I want people to be able to
    > deploy the utility by just dropping it into a source distribution, which
    > means I want to make it as small as possible. Being able to read the code
    > isn't an issue, because if you're developing, you use the full,
    > uncompressed source.
    >
    > I'm currently building several versions of the shell script package, using
    > different encodings for the interpreter source. The uncompressed version is
    > about 400kB. The non 7-bit clean version, which is diff unfriendly and uses
    > a gzip compressed data chunk, is 100kB. The 7-bit clean version, which uses
    > gzip and then uuencode, is 150kB.


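    uuencode isn't the tightest encoding, but its expansion ratio is closer
    to 4:3 than 5:3: every 3 bytes become 4 characters, plus a length
    character and a newline per 45-byte line (about 38% overall). Base64
    with 76-character lines comes to about 35%, so switching should only
    save you a few KB.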
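
    The overhead of each encoding can be estimated with a little arithmetic.
    The sketch below assumes traditional uuencode lines (45 input bytes each,
    written as one length character, 60 encoded characters and a newline) and
    MIME-style Base64 (76 encoded characters per line). It ignores the
    begin/end lines, and the function names are made up for this example.

```c
/* Size of a uuencoded body for n input bytes (begin/end lines excluded). */
unsigned long uu_size(unsigned long n)
{
    unsigned long full = n / 45;      /* complete 45-byte lines */
    unsigned long rest = n % 45;      /* bytes on the final short line */
    unsigned long size = full * 62;   /* 1 length char + 60 chars + newline */

    if (rest)
        size += 1 + 4 * ((rest + 2) / 3) + 1;
    return size;
}

/* Size of Base64 output for n input bytes, wrapped at 76 characters. */
unsigned long b64_size(unsigned long n)
{
    unsigned long chars = 4 * ((n + 2) / 3);   /* 3 bytes -> 4 characters */
    unsigned long lines = (chars + 75) / 76;   /* one newline per line */

    return chars + lines;
}
```

    For a 100000-byte gzip chunk this gives 137782 bytes for uuencode and
    135091 bytes for Base64.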

    (Are you using gzip with maximum compression?)

    > If I can reduce the size of the
    > interpreter source then I can reduce the size of the package, even if it is
    > using gzip.


    Someone already suggested modifying a source reformatter (like
    indent), since you basically need a C parser and a backend that
    writes C source in something close to its minimal representation.
    Personally, I wouldn't bother with renaming identifiers - I think
    that's past the point of diminishing returns.

    It shouldn't be hard to remove comments (assuming no pathological
    cases; see the thread starting at [1]) and leading/trailing
    whitespace from your source, if you want to do it yourself. That
    alone should get you some savings.
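
    A minimal sketch of that do-it-yourself approach follows. It replaces
    each /* */ comment with a single space, drops // comments, and trims
    leading and trailing blanks from every line, while copying string and
    character literals untouched. It deliberately ignores the pathological
    cases (trigraphs, or a backslash-newline splitting a comment delimiter),
    and the name strip_comments is made up for this example.

```c
#include <stddef.h>

static int is_blank(char c)
{
    return c == ' ' || c == '\t';
}

/* Copy src to dst without comments and without leading or trailing
   blanks on any line. dst must hold strlen(src) + 1 bytes.
   Returns the output length. */
size_t strip_comments(const char *src, char *dst)
{
    size_t n = 0;
    size_t line = 0;   /* index in dst where the current output line starts */

    while (*src) {
        if (src[0] == '/' && src[1] == '*') {          /* block comment */
            src += 2;
            while (*src && !(src[0] == '*' && src[1] == '/'))
                src++;
            if (*src)
                src += 2;
            if (n > line)
                dst[n++] = ' ';    /* a comment separates tokens like a space */
        } else if (src[0] == '/' && src[1] == '/') {   /* line comment (C99) */
            while (*src && *src != '\n')
                src++;
        } else if (*src == '"' || *src == '\'') {      /* literal: copy as is */
            char q = *src;
            dst[n++] = *src++;
            while (*src && *src != q) {
                if (*src == '\\' && src[1])
                    dst[n++] = *src++;
                dst[n++] = *src++;
            }
            if (*src)
                dst[n++] = *src++;
        } else if (*src == '\n') {                     /* trim trailing blanks */
            while (n > line && is_blank(dst[n - 1]))
                n--;
            dst[n++] = *src++;
            line = n;
        } else if (is_blank(*src) && n == line) {      /* skip leading blanks */
            src++;
        } else {
            dst[n++] = *src++;
        }
    }
    while (n > line && is_blank(dst[n - 1]))           /* last line, no newline */
        n--;
    dst[n] = '\0';
    return n;
}
```

    Newlines are kept, so preprocessor directives and the line-length limit
    are unaffected by this pass.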


    1. http://groups.google.com/group/comp.lang.c/msg/41a4486b8ae7dcc1

    --
    Michael Wojcik

    World domination has encountered a momentary setback. Talk amongst
    yourselves. -- Darby Conley
    Michael Wojcik, Oct 13, 2005
    #7
  8. David Given <> writes:

    >-----BEGIN PGP SIGNED MESSAGE-----
    >Hash: SHA1


    >I have a project where I need to distribute 300kB of C source code as part
    >of a shell script, and would like to compress it as much as possible.


    >Does anyone know where I can get a (open source) tool for crunching C
    >programs? That is, removal of whitespace, comments, extraneous characters,
    >renaming identifiers to make them as short as possible, etc. Actual
    >outright obfuscation is not my goal; I just want to reduce the source size.


    IIRC there's something like that in "A Book on C" by Kelley & Pohl.
    --
    Jan van den Broek
    0xAFDAD00D
    http://huizen.dds.nl/~balglaas/
    the Swampster, Oct 17, 2005
    #8
