html compression tools (command line)

Discussion in 'HTML' started by Errol Smith, Sep 18, 2004.

  1. Errol Smith

    Errol Smith Guest

    Hi,

    Does anyone know of command line tools for html compression?
    The only one I am aware of is htmlcrunch
    (http://www.markusstengel.de/htmlcr.html) but, frankly, this does not
    perform very well (often makes the input file bigger!).
    This is for my website compression tool 'webpack'
    (http://www.kludgesoft.com/nix/webpack.html - blatant plug :), for
    which I am trying to avoid writing a better html compressor myself.
    Rather not re-invent the wheel, you know.
    Also if there is any information around on making html more
    compressible, I would appreciate pointers to information/tools (the
    only method I've heard of is making all html tags lower case, but
    there may be other methods).
    Any assistance appreciated!

    Errol Smith
    errol <at> ros (dot) com [period] au
     
    Errol Smith, Sep 18, 2004
    #1

  2. rf

    rf Guest

    "Errol Smith" <> wrote in message
    news:...
    > Hi,
    >
    > Does anyone know of command line tools for html compression?


    This has been discussed here before. The general consensus is that it is a
    waste of time. Look to other things first. Image compression: a badly
    compressed image will waste far more bandwidth than compressing the HTML
    will save. Number of images: 10 images on a page results in 10 round trips
    back to the server, an elapsed time of hundreds of milliseconds, perhaps
    even a number of seconds. Compressing the HTML might save ten or so
    milliseconds.

    > This is for my website compression tool 'webpack'
    > (http://www.kludgesoft.com/nix/webpack.html - blatant plug :),


    I note that you don't compress *this* page :) You even have great sequences
    of cr/lf in there.

    --
    Cheers
    Richard.
     
    rf, Sep 18, 2004
    #2

  3. Jim Higson

    Jim Higson Guest

    Errol Smith wrote:

    > Hi,
    >
    > Does anyone know of command line tools for html compression?
    > The only one I am aware of is htmlcrunch
    > (http://www.markusstengel.de/htmlcr.html) but, frankly, this does not
    > perform very well (often makes the input file bigger!).
    > This is for my website compression tool 'webpack'
    > (http://www.kludgesoft.com/nix/webpack.html - blatant plug :), for
    > which I am trying to avoid writing a better html compressor myself.
    > Rather not re-invent the wheel, you know.
    > Also if there is any information around on making html more
    > compressible, I would appreciate pointers to information/tools (the
    > only method I've heard of is making all html tags lower case, but
    > there may be other methods).
    > Any assistance appreciated!
    >
    > Errol Smith
    > errol <at> ros (dot) com [period] au


    gzip is command line!
    With mod_gzip or mod_gunzip on an Apache server all your pages are sent
    gzipped, completely transparently to most browsers (even IE!) but expanded
    for the very few that can't handle content-encoding gzip.

    It will reduce page size by about 60%, but should be in addition to, rather
    than instead of, compact markup.
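
    (For a rough local check of what that buys, assuming GNU gzip is installed
    and using a placeholder file name:)

        # uncompressed size of the page, in bytes
        wc -c < index.html
        # size after gzip at maximum compression - roughly what mod_gzip would send
        gzip -9 -c index.html | wc -c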

    Regardless of compression, I try to keep pages below 10k. Like others have
    said, it is easy to have images larger than this size. Well-compressed 8-bit
    PNGs and JPEGs should help here.

    http://www.innerjoin.org/apache-compression/howto.html
     
    Jim Higson, Sep 18, 2004
    #3
  4. Jim Higson

    Jim Higson Guest

    Jim Higson wrote:

    > Errol Smith wrote:
    >
    >> Hi,
    >>
    >> Does anyone know of command line tools for html compression?
    >> The only one I am aware of is htmlcrunch
    >> (http://www.markusstengel.de/htmlcr.html) but, frankly, this does not
    >> perform very well (often makes the input file bigger!).
    >> This is for my website compression tool 'webpack'
    >> (http://www.kludgesoft.com/nix/webpack.html - blatant plug :), for
    >> which I am trying to avoid writing a better html compressor myself.
    >> Rather not re-invent the wheel, you know.
    >> Also if there is any information around on making html more
    >> compressible, I would appreciate pointers to information/tools (the
    >> only method I've heard of is making all html tags lower case, but
    >> there may be other methods).
    >> Any assistance appreciated!
    >>
    >> Errol Smith
    >> errol <at> ros (dot) com [period] au

    >
    > gzip is command line!
    > With mod_gzip or mod_gunzip on an Apache server all your pages are sent
    > gzipped, completely transparently to most browsers (even IE!) but expanded
    > for the very few that can't handle content-encoding gzip.
    >
    > Will reduce page size by about 60%, but should be in addition to, rather
    > than instead of, compact markup.
    >
    > Regardless of compression, I try to keep pages below 10k. Like others have
    > said, it is easy to have images larger than this size. Well compressed
    > 8bit PNGs and jpegs should help here.
    >
    > http://www.innerjoin.org/apache-compression/howto.html



    Incidentally, I'd forget about making pages more compressible prior to
    gzipping; information theory is not on your side.

    I'd also forget about writing a better compression tool than gzip, unless
    you are SERIOUSLY into maths. 7zip might give slightly better results in
    some cases, but AFAIK no browsers accept it as a Content-Encoding.
     
    Jim Higson, Sep 18, 2004
    #4
  5. Jim Higson

    Jim Higson Guest

    Jim Higson wrote:

    > Jim Higson wrote:
    >
    >> Errol Smith wrote:
    >>
    >>> Hi,
    >>>
    >>> Does anyone know of command line tools for html compression?
    >>> The only one I am aware of is htmlcrunch
    >>> (http://www.markusstengel.de/htmlcr.html) but, frankly, this does not
    >>> perform very well (often makes the input file bigger!).
    >>> This is for my website compression tool 'webpack'
    >>> (http://www.kludgesoft.com/nix/webpack.html - blatant plug :), for
    >>> which I am trying to avoid writing a better html compressor myself.
    >>> Rather not re-invent the wheel, you know.
    >>> Also if there is any information around on making html more
    >>> compressible, I would appreciate pointers to information/tools (the
    >>> only method I've heard of is making all html tags lower case, but
    >>> there may be other methods).
    >>> Any assistance appreciated!
    >>>
    >>> Errol Smith
    >>> errol <at> ros (dot) com [period] au

    >>
    >> gzip is command line!
    >> With mod_gzip or mod_gunzip on an Apache server all your pages are sent
    >> gzipped, completely transparently to most browsers (even IE!) but
    >> expanded for the very few that can't handle content-encoding gzip.
    >>
    >> Will reduce page size by about 60%, but should be in addition to, rather
    >> than instead of, compact markup.
    >>
    >> Regardless of compression, I try to keep pages below 10k. Like others
    >> have said, it is easy to have images larger than this size. Well
    >> compressed 8bit PNGs and jpegs should help here.
    >>
    >> http://www.innerjoin.org/apache-compression/howto.html

    >
    >
    > Incidentally, I'd forget about making pages more compressible prior to
    > gzipping; information theory is not on your side.
    >
    > I'd also forget about writing a better compression tool than gzip, unless
    > you are SERIOUSLY into maths. 7zip might give slightly better results in
    > some cases, but AFAIK no browsers accept it as a Content-Encoding.


    Man, gotta stop replying to myself, but...

    Check out Perl's HTML::Clean; it already does much of what you are (maybe)
    trying to do. Anyone on an Apache server can use it as a filter for dynamic
    content, or apply it offline for static pages.

    http://www.perl.com/pub/a/2003/04/17/filters.html
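
    (A minimal command-line sketch, assuming HTML::Clean is installed from CPAN;
    the file names are placeholders:)

        # strip whitespace, comments and redundant markup at the most aggressive level
        perl -MHTML::Clean -e '
            my $h = HTML::Clean->new(shift);
            $h->level(9);
            $h->strip();
            print ${ $h->data() };
        ' index.html > index.min.html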
     
    Jim Higson, Sep 18, 2004
    #5
  6. Errol Smith

    Errol Smith Guest

    On Sat, 18 Sep 2004 08:17:12 GMT, "rf" <rf@.invalid> wrote:

    >> Does anyone know of command line tools for html compression?

    >
    >This has been discussed here before. The general consensus is that it is a
    >waste of time. Look to other things first: Image compression. A badly
    >compressed image will waste far more bandwidth than compressing the HTML
    >will save; Number of images: 10 images on a page results in 10 round trips
    >back to the server, an elapsed time of hundreds of milliseconds, perhaps even
    >a number of seconds. Compressing the HTML might save ten or so milliseconds.


    I know, I read previous posts, but I will not be discouraged, as I
    believe in the every-byte-counts theory :)
    My tool is intended to cover all bases anyway - it already optimises
    JPG, GIF & PNG images. My tool's aim is to be the last step prior to
    publishing, just to automatically shave off a few K here & there. It
    won't help you if you save your JPGs at 100% quality and use Word
    as your html editor ;)
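
    (On the image side, lossless optimisers chain nicely from the command line;
    a sketch assuming jpegtran and gifsicle are installed, with placeholder file
    names:)

        # losslessly optimise a JPEG: rebuild Huffman tables, drop metadata
        jpegtran -optimize -copy none photo.jpg > photo.opt.jpg
        # losslessly re-optimise a GIF
        gifsicle -O2 anim.gif > anim.opt.gif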

    >> This is for my website compression tool 'webpack'
    >> (http://www.kludgesoft.com/nix/webpack.html - blatant plug :),

    >
    >I note that you don't compress *this* page :) You even have great sequences
    >of cr/lf in there.


    Actually I _do_, but like I said, htmlcrunch is not very good :)

    Errol Smith
    errol <at> ros (dot) com [period] au
     
    Errol Smith, Sep 19, 2004
    #6
  7. Errol Smith

    Errol Smith Guest

    On Sat, 18 Sep 2004 13:01:45 +0100, Jim Higson wrote:
    >>>> Does anyone know of command line tools for html compression?

    ....
    >>>> Also if there is any information around on making html more
    >>>> compressible, I would appreciate pointers to information/tools (the
    >>>> only method I've heard of is making all html tags lower case, but
    >>>> there may be other methods).
    >>>
    >>> gzip is command line!
    >>> With mod_gzip or mod_gunzip on an Apache server all your pages are sent
    >>> gzipped, completely transparently to most browsers (even IE!) but
    >>> expanded for the very few that can't handle content-encoding gzip.
    >>>
    >>> Will reduce page size by about 60%, but should be in addition to, rather
    >>> than instead of, compact markup.
    >>>
    >>> Regardless of compression, I try to keep pages below 10k. Like others
    >>> have said, it is easy to have images larger than this size. Well
    >>> compressed 8bit PNGs and jpegs should help here.
    >>>
    >>> http://www.innerjoin.org/apache-compression/howto.html

    >>
    >>
    >> Incidentally, I'd forget about making pages more compressible prior to
    >> gzipping; information theory is not on your side.
    >>
    >> I'd also forget about writing a better compression tool than gzip, unless
    >> you are SERIOUSLY into maths. 7zip might give slightly better results in
    >> some cases, but AFAIK no browsers accept it as a Content-Encoding.

    >
    >Man, gotta stop replying to myself, but...
    >
    >Check out Perl's HTML::Clean; it already does much of what you are (maybe)
    >trying to do. Anyone on an Apache server can use it as a filter for dynamic
    >content, or apply it offline for static pages.
    >
    >http://www.perl.com/pub/a/2003/04/17/filters.html


    Jim,

    Thank you very much for your (multiple) replies!
    I am aware of the gzip functionality in webservers/browsers; I am
    more interested in html cleaning/optimising (i.e. "compact markup").
    This means that browsers and/or servers not supporting those encoding
    methods still benefit, plus even with gzip encoding, the resultant
    compressed file will still be smaller than if the html had not been
    compacted first.
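
    (That claim is easy to check from the shell; a sketch with placeholder file
    names, assuming GNU gzip:)

        # gzipped size of the original vs. the compacted page, in bytes
        gzip -9 -c index.html     | wc -c
        gzip -9 -c index.min.html | wc -c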
    As for the "making more compressible" idea, I know this is a niche topic
    and there is probably not much to be gained, but it interests me anyway
    :) (I do have some knowledge/experience of compression). I can see
    how making the case of all tags consistent would improve compression
    (more dictionary matches), but there may be more to it.
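
    (For example, a naive sketch that lowercases tag names only - it ignores
    attribute names and makes no attempt to skip <script> or <pre> content, so
    treat it as a starting point rather than a safe tool:)

        perl -pe 's{<(/?[A-Za-z][A-Za-z0-9]*)}{"<" . lc $1}ge' index.html > index.lc.html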
    Oh, and I am definitely NOT looking to write a new compressor like
    gzip etc, only an HTML compacter :) (7zip is very good but not fully
    cross platform yet. If there is going to be any new kind of encoding
    standard I would expect it to be .bz2, though it may not be suitable
    for on-the-fly compression due to its large block size).
    Perl's HTML::Clean looks like what I need - I will have to experiment
    with it (but first remember how to use Perl! :)
    Thanks again, I will keep hunting.

    Errol Smith
    errol <at> ros (dot) com [period] au
     
    Errol Smith, Sep 19, 2004
    #7
  8. Jim Higson

    Jim Higson Guest

    Errol Smith wrote:

    > On Sat, 18 Sep 2004 13:01:45 +0100, Jim Higson wrote:
    >>>>> Does anyone know of command line tools for html compression?

    > ...
    >>>>> Also if there is any information around on making html more
    >>>>> compressible, I would appreciate pointers to information/tools (the
    >>>>> only method I've heard of is making all html tags lower case, but
    >>>>> there may be other methods).
    >>>>
    >>>> gzip is command line!
    >>>> With mod_gzip or mod_gunzip on an Apache server all your pages are sent
    >>>> gzipped, completely transparently to most browsers (even IE!) but
    >>>> expanded for the very few that can't handle content-encoding gzip.
    >>>>
    >>>> Will reduce page size by about 60%, but should be in addition to, rather
    >>>> than instead of, compact markup.
    >>>>
    >>>> Regardless of compression, I try to keep pages below 10k. Like others
    >>>> have said, it is easy to have images larger than this size. Well
    >>>> compressed 8bit PNGs and jpegs should help here.
    >>>>
    >>>> http://www.innerjoin.org/apache-compression/howto.html
    >>>
    >>>
    >>> Incidentally, I'd forget about making pages more compressible prior to
    >>> gzipping; information theory is not on your side.
    >>>
    >>> I'd also forget about writing a better compression tool than gzip,
    >>> unless you are SERIOUSLY into maths. 7zip might give slightly better
    >>> results in some cases, but AFAIK no browsers accept it as a
    >>> Content-Encoding.

    >>
    >>Man, gotta stop replying to myself, but...
    >>
    >>Check out Perl's HTML::Clean; it already does much of what you are (maybe)
    >>trying to do. Anyone on an Apache server can use it as a filter for
    >>dynamic content, or apply it offline for static pages.
    >>
    >>http://www.perl.com/pub/a/2003/04/17/filters.html

    >
    > Jim,
    >
    > Thank you very much for your (multiple) replies!
    > I am aware of the gzip functionality in webservers/browsers; I am
    > more interested in html cleaning/optimising (i.e. "compact markup").
    > This means that browsers and/or servers not supporting those encoding
    > methods still benefit, plus even with gzip encoding, the resultant
    > compressed file will still be smaller than if the html had not been
    > compacted first.
    > As for the "making more compressible" I know this is a niche topic
    > and there is probably not much to be gained but it interests me anyway
    > :) (I do have some knowledge/experience of compression). I can see
    > how making the case of all tags consistent would improve compression
    > (more dictionary matches), but there may be more to it.
    > Oh, and I am definitely NOT looking to write a new compressor like
    > gzip etc, only an HTML compacter :) (7zip is very good but not fully
    > cross platform yet. If there is going to be any new kind of encoding
    > standard I would expect it to be .bz2, though it may not be suitable
    > for on-the-fly compression due to its large block size).
    > Perl's HTML::Clean looks like what I need - I will have to experiment
    > with it (but first remember how to use Perl! :)
    > Thanks again, I will keep hunting.
    >
    > Errol Smith
    > errol <at> ros (dot) com [period] au


    I have a better idea of what you're trying to do now. I quite like the idea.
    I don't think very well made pages could be shrunk much, but for some guy's
    homepage you might be onto something. Some ideas:

    * Replacing class and id names with single letter identifiers in the html
    and css? Might not save much if the file is gzipped since they're repeated
    strings anyway, but might be worth a few bytes. Will also make the code
    harder to read, so personally I would avoid.
    * Replacing long URLs to pages on the same site with hrefs to symlinks on
    the server, with much smaller names? Static pages only, I'm afraid.
    * Lossy PNG compression (google for it!) and conversion of PNGs to indexed
    * Stripping comments, lf and cr (see the sketch after this list). I don't
    like this much because I think you should be able to look at the html of a
    site, but it would save a little space.
    * A thumbnail maker that makes the thumbs from lossless versions of the
    artwork, not the published jpeg version, so images aren't compressed twice.
    I do this on sites I create; my thumbs are a *little* smaller and a *tiny*
    bit higher quality because of it.
    * automatic replacing of img tags with objects, where it is more compact, or
    with divs and css where the image isn't content. Not sure how you could
    tell, mind.
    * Decision-tree induction to convert font tags to css. A lot of bad code
    could be made smaller this way.
    * Moving embedded css into a separate file, where the same rules are used on
    several pages.
    * Check out advpng, which shrinks PNG images down by a few percent or so.
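
    (A rough sketch of the comment/whitespace-stripping idea as a perl one-liner;
    it leaves IE conditional comments alone and makes no attempt to protect <pre>
    or <script> content, so it is a starting point, not a production-safe tool:)

        perl -0777 -pe 's/<!--(?!\[).*?-->//gs; s/[ \t]*[\r\n]+[ \t]*/\n/g' index.html > index.stripped.html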

    Ok!
     
    Jim Higson, Sep 19, 2004
    #8
  9. Sam Hughes

    Sam Hughes Guest

    Jim Higson <> wrote in
    news::


    > * Check out advpng, which shrinks PNG images down by a few percent or so.


    How does it compare to PNGOUT?
     
    Sam Hughes, Sep 19, 2004
    #9
  10. Jim Higson

    Jim Higson Guest

    Sam Hughes wrote:

    > Jim Higson <> wrote in
    > news::
    >
    >
    >> * Check out advpng, which shrinks PNG images down by a few percent or so.

    >
    > How does it compare to PNGOUT?


    I just did a little test with the 9 small images you see at the top of my
    client's page here:

    http://www.masmodels.com/portfolio

    advpng : 57.1k
    pngout : 55.7k (97.5% of advpng)

    so pngout is *slightly* better

    However, I sometimes run my scripts on a Linux/PPC computer, so programs
    distributed as i386 binaries only are not much use to me.

    About pngout - I don't think I'll use it; personally I don't much like
    software I can't modify, and I like even less being directed to a 38k HTML
    file (plus images), where I am asked to wait in line for a 28k download!
    --
    Jim
     
    Jim Higson, Sep 19, 2004
    #10
  11. Errol Smith

    Errol Smith Guest

    On Sun, 19 Sep 2004 15:20:23 +0100, Jim Higson <> wrote:

    >I have a better idea of what you're trying to do now. I quite like the idea.
    >I don't think very well made pages could be shrunk much, but for some guy's
    >homepage you might be onto something. Some ideas:


    That's the aim. People who hand-optimise their pages probably won't
    gain much but this is for lazy people like me :)

    >* Replacing class and id names with single letter identifiers in the html
    >and css? Might not save much if the file is gzipped since they're repeated
    >strings anyway, but might be worth a few bytes. Will also make the code
    >harder to read, so personally I would avoid.
    >* Replacing long URLs to pages on the same site with hrefs to symlinks on
    >the server, with much smaller names? Static pages only, I'm afraid.
    >* Lossy PNG compression (google for it!) and conversion of PNGs to indexed


    I hadn't thought of most of these things. The idea of re-coding a whole
    site using smaller URLs has tempted me, but that makes the site harder to
    use, as people (and search engines) look at the URL to get an idea of where
    they are.

    >* stripping comments, lf and cr. I don't like this much because I think you
    >should be able to look at the html of a site, but would save a little
    >space.


    This is what my original post was asking about! :) I think it's OK to do
    this, as long as you are only interested in saving space. You can always
    run htmltidy on the code to make it readable again.

    >* A thumbnail maker that makes the thumbs from lossless versions of the
    >artwork, not the published jpeg version so images aren't compressed twice.
    >I do this on sites I create, my thumbs are a *little* smaller and a *tiny*
    >bit higher quality because of it
    >* automatic replacing of img tags with objects, where it is more compact, or
    >with divs and css where the image isn't content. Not sure how you could
    >tell, mind.
    >* Decision-tree induction to convert font tags to css. A lot of bad code
    >could be made smaller this way
    >* Moving embedded css into a separate file, where the same rules are used on
    >several pages.


    More ideas! I will add them to my "investigate further" pile! thanks

    >* Check out advpng, which shrinks PNG images down by a few percent or so.


    I'm just downloading it and will see how it compares with pngcrush
    when I get a bit more time.
    Thanks for your many suggestions!
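
    (One way to run that comparison, assuming pngcrush and advpng are installed;
    file names are placeholders, and note that advpng rewrites files in place,
    so work on copies:)

        cp image.png crush.png adv.png
        pngcrush -brute crush.png crush.out.png && wc -c < crush.out.png
        advpng -z -4 adv.png                    && wc -c < adv.png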



    Errol Smith
    errol <at> ros (dot) com [period] au
     
    Errol Smith, Sep 21, 2004
    #11
  12. Sam Hughes

    Sam Hughes Guest

    Errol Smith <> wrote in
    news::

    > On Sun, 19 Sep 2004 15:20:23 +0100, Jim Higson <> wrote:
    >
    >>* Check out advpng, which shrinks PNG images down by a few percent or so.

    >
    > I'm just downloading it and will see how it compares with pngcrush
    > when I get a bit more time.


    Compare it to PNGOUT, why not - it usually outcompresses pngcrush, YMMV.
     
    Sam Hughes, Sep 21, 2004
    #12
  13. Errol Smith

    Errol Smith Guest

    On Sat, 18 Sep 2004 13:01:45 +0100, Jim Higson wrote:

    >Check out Perl's HTML::Clean; it already does much of what you are (maybe)
    >trying to do. Anyone on an Apache server can use it as a filter for dynamic
    >content, or apply it offline for static pages.
    >
    >http://www.perl.com/pub/a/2003/04/17/filters.html


    I've given HTML::Clean a try and it produces more compact html than
    htmlcrunch, BUT it introduced some errors in a couple of test files
    that htmlcrunch processed OK, so it's not perfect. (I am not really
    surprised, because it has been about 4 years since the last update.)
    I will look into it, but Perl is not a language I am familiar with, so
    I don't know what I'll be able to achieve. (The author hasn't
    responded to my email.)


    Errol Smith
    errol <at> ros (dot) com [period] au
     
    Errol Smith, Sep 24, 2004
    #13
