press.js - yet another javascript compressor


nick

I'd like to hear this group's reaction to a javascript compression
script I've been working on. It uses the LZW algorithm and base85
encoding to squeeze large scripts down to size.

Quick test...

used this: http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js
original size: 72173
compressed: 44782

You can test it here:
http://pressjs.googlecode.com/svn/trunk/build/test.html

Browse the source:
http://code.google.com/p/pressjs/source/browse/#svn/trunk/src

I'd love to hear what you guys think, esp. any way we could optimize
it for speed or size, or if you catch any bugs / memory leaks /
namespace pollution / stupid programming fails / etc. Thanks!
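
For anyone curious about the algorithm itself, here is a minimal LZW
sketch in javascript. This is an illustration of the technique, not the
press.js source; press.js additionally packs the resulting integer
codes with base85.

  // Compress a string of 8-bit characters into an array of integer codes.
  function lzwCompress(input) {
    var dict = {}, nextCode = 256, result = [], w = '';
    // Seed the table with all single-character strings (codes 0-255).
    for (var i = 0; i < 256; ++i) dict[String.fromCharCode(i)] = i;
    for (var j = 0; j < input.length; ++j) {
      var c = input.charAt(j);
      if (dict.hasOwnProperty(w + c)) {
        w += c;                      // extend the current match
      } else {
        result.push(dict[w]);        // emit the longest known prefix
        dict[w + c] = nextCode++;    // learn the new string
        w = c;
      }
    }
    if (w !== '') result.push(dict[w]);
    return result;
  }

  // Rebuild the string; the decompressor grows the same table on the fly.
  function lzwDecompress(codes) {
    if (codes.length === 0) return '';
    var dict = {}, nextCode = 256;
    for (var i = 0; i < 256; ++i) dict[i] = String.fromCharCode(i);
    var w = dict[codes[0]], output = w;
    for (var j = 1; j < codes.length; ++j) {
      // The one special case: a code that was only just defined.
      var entry = dict.hasOwnProperty(codes[j])
        ? dict[codes[j]]
        : w + w.charAt(0);
      output += entry;
      dict[nextCode++] = w + entry.charAt(0);
      w = entry;
    }
    return output;
  }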
 

Sean Kinsey



I'm sorry to say that your attempt to 'compress' code has failed. Did
you ever take into consideration that gzip (used to serve compressed
files) also uses LZW (and in a more efficient way than you do)?

A quick test I did with an input file of 56.3KB:
Direct compression using 7-Zip into a .gz archive: 12KB
Compression using pressjs, then compressed into a .gz archive: 20.9KB

And the same using a minified version of the same script:
Direct compression using 7-Zip into a .gz archive: 4.51KB
Compression using pressjs, then compressed into a .gz archive: 7.68KB

Not to mention the added overhead of having to decompress the file
after the UA has downloaded it.

The only scenario where this method would be beneficial is where gzip
is not used on the server, bad caching directives are used causing the
file to be downloaded in full each time, and the extra time used
downloading is higher than the extra time needed to decompress.
Hopefully that isn't a too-common scenario.

But hey, it was probably fun to create :)
 

nick

I'm sorry to say that your attempt to 'compress' code has failed. Did
you ever take into consideration that gzip (used to serve compressed
files) also uses LZW (and in a more efficient way than you do)?

Yeah, I thought about that but I figured the point of javascript
compressors was that they would be used in environments where gzip
compression on the server is not an option (many shared hosts, which
many people seem content to use, for some reason don't use gzip).
A quick test I did with an input file of 56.3KB:
Direct compression using 7-Zip into a .gz archive: 12KB
Compression using pressjs, then compressed into a .gz archive: 20.9KB
And the same using a minified version of the same script:
Direct compression using 7-Zip into a .gz archive: 4.51KB
Compression using pressjs, then compressed into a .gz archive: 7.68KB

I wonder if encoding to base64 would yield better compression ratios
afterwards? Maybe still not as good as using gzip on the uncompressed
file though.
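
(For reference, the fixed expansion ratios differ: base64 turns 3 bytes
into 4 characters, about 33% overhead, while base85 turns 4 bytes into
5 characters, about 25%. So the base64 output would start out larger;
the open question is whether its smaller alphabet then gzips better.)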

I just did a similar test with Dean Edwards' "packer" with the "Base62
encode" and "Shrink variables" options on and it manages to get a
similar gzip-compressed size to the gzip-compressed size of the
original... If I can achieve a similar gzip-compressed size after
pressing, I think this should be at least as useful as packer (not
sure what this group's opinion of packer is, though).
Not to mention the added overhead of having to decompress the file
after the UA has downloaded it.

True, although the size overhead is only about 1200 bytes (and
shrinking), and the processing overhead is negligible.
The only scenario where this method would be beneficial is where gzip
is not used on the server, bad caching directives are used causing the
file to be downloaded in full each time, and the extra time used
downloading is higher than the extra time needed to decompress.
Hopefully that isn't a too-common scenario.

It's more common than you might think (shared hosting).
But hey, it was probably fun to create :)

It was :) Thanks for the comments.
 

nick

nick :

"Où qu'il réside, même aux îles Caïmans, tout Français inscrit au rôle
paiera son dû dès Noël" (length 92) "compresses" to 118 characters.

Well, you obviously used the wrong text.

"banana cabana banana cabana banana cabana banana cabana banana
cabana" (length 69) compresses to 44 characters! ;)

From http://code.google.com/p/pressjs/source/browse/trunk/src/compdict.js:

  // Populate table with all possible character codes.
  for(var i = 0; i < 256; ++i) {
    var str = String.fromCharCode(i);
    this.hashtable[str] = this.nextcode++;
  }  

What about character codes >= 256?

I'm pretty sure those characters aren't allowed in a javascript
document? I'm not really sure what's going on there, though; I was
puzzled by that bit as well. See my next paragraph.
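
One common workaround, though not what press.js does today, is to
reduce the input to UTF-8 bytes before compressing, so that a 256-entry
table really does cover every symbol. The classic idiom:

  // Each character of the result has a code below 256 (the UTF-8 bytes).
  function toUtf8Bytes(str) {
    return unescape(encodeURIComponent(str));
  }

  // The inverse, applied after decompression.
  function fromUtf8Bytes(bytes) {
    return decodeURIComponent(escape(bytes));
  }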
My general impression is that you are complicating things for no reason.
Why use constructors, prototypes and fancy "//#" pseudo-cpp directives?
Just one file which defines the two functions that compress and expand
would be much easier both to write and to review.

Yeah, that stuff is all part of another GPL program I ripped off to
make this compressor, which in turn is a pretty much direct port of
some C++ code, so it has a very class-like design. I've been going
through and making it more object-based, and trying to learn the
algorithm at the same time. Eventually I'd like to replace all of that
code, but for now I just wanted to see if this whole idea was viable.

Well, the cpp directives were my idea. I like to be able to separate
the files into logical units, and ifdef comes in handy when building
multiple similar-but-different targets (like stand-alone vs embedded
decompressor).

I'm definitely considering merging instream and outstream
functionalities into the compressor / decompressor, but I think I like
the dictionaries in separate files for now.
(I assume that you are doing this for fun, for the challenge of writing
a compressor in javascript. If it is in order to reduce bandwidth
in real applications on the Web, enabling gzip on the server is much
more efficient.)

Yeah, I'm mostly doing it to see if it can be done. Next I want to
experiment with a different compression algorithm or one of the
variations on LZW. Server-side gzip is obviously the better
alternative if it's available; however that's not always the case (see
my response to Sean) and so we have things like "packer" and maybe
this thing.
 

David Mark

nick said:
Yeah, I thought about that but I figured the point of javascript
compressors was that they would be used in environments where gzip
compression on the server is not an option (many shared hosts, which
many people seem content to use, for some reason don't use gzip).

Mine doesn't; still I wouldn't use something like this. The largest
beneficiaries of GZIP are dial-up users, and modems have built-in
compression. ;)
I wonder if encoding to base64 would yield better compression ratios
afterwards?

I seriously doubt it.
Maybe still not as good as using gzip on the uncompressed
file though.

Almost certainly not.
I just did a similar test with Dean Edwards' "packer" with the "Base62
encode" and "Shrink variables" options on and it manages to get a
similar gzip-compressed size to the gzip-compressed size of the
original... If I can achieve a similar gzip-compressed size after
pressing, I think this should be at least as useful as packer (not
sure what this group's opinion of packer is, though).

Packer is a complete waste of time.
True, although the size overhead is only about 1200 bytes (and
shrinking), and the processing overhead is negligible.

Define negligible.
It's more common than you might think (shared hosting).

Shared hosting doesn't automatically fit that bill. Mine doesn't have
GZIP, but I don't use bad "caching directives". And again, modem-based
compression makes all of these "packers" a waste of time.
 

nick

Mine doesn't; still I wouldn't use something like this. The largest
beneficiaries of GZIP are dial-up users, and modems have built-in
compression. ;)

I had never heard of it before, but I found a good article on it here:

http://ixbtlabs.com/articles/compressv44vsv42bis/

It looks like text compresses particularly well in their tests.

Is this kind of thing usually enabled by default, or do modem users
have to jump through a bunch of hoops to set it up? I also found a lot
of instructions for enabling modem compression.
I seriously doubt it.

Only one way to find out. ;)
Almost certainly not.

Might be close. Packer-packed scripts can be slightly smaller than
their non-packed equivalents when gzipped.
Packer is a complete waste of time.

Heh, thought you might say that. Packer has failed to work properly
with at least one script I've written... not sure whose fault that
was, but I've never felt comfortable using it.
[...] the processing overhead is negligible.
Define negligible.

I don't notice any time going by at all, and I'm using an old laptop
from 2003 with one gig of RAM, downclocked to 1.07 GHz so it doesn't
catch on fire. I guess that's not a very scientific test though.
 

David Mark

nick said:
I had never heard of it before, but I found a good article on it here:

http://ixbtlabs.com/articles/compressv44vsv42bis/

It looks like text compresses particularly well in their tests.

Yes, extremely well.
Is this kind of thing usually enabled by default, or do modem users
have to jump through a bunch of hoops to set it up? I also found a lot
of instructions for enabling modem compression.

It typically works right out of the box. Has for decades. Skip the
articles about modem init strings. They haven't been a concern for the
average user in decades.
Only one way to find out. ;)


Might be close. Packer-packed scripts can be slightly smaller than
their non-packed equivalents when gzipped.

But then you have to download Packer and wait for it to decompress the
content. It's a waste of time.
Heh, thought you might say that. Packer has failed to work properly
with at least one script I've written... not sure whose fault that
was, but I've never felt comfortable using it.

Even if it worked flawlessly, it would still be a waste of time. The
fact that it introduces an additional point of failure is just a
"bonus". ;)
[...] the processing overhead is negligible.
Define negligible.

I don't notice any time going by at all, and I'm using an old laptop
from 2003 with one gig of RAM, downclocked to 1.07 GHz so it doesn't
catch on fire. I guess that's not a very scientific test though.

No.
 

David Mark

nick said:
GG did something weird with the quotes, sorry about that.

That's alright. Thunderbird did something very weird with the post as
well. It threw an error (too many IP connections to the server or some
such BS), put it in the Sent folder, but didn't send it. Just so
happens I noticed and pasted it into GG. That's probably what caused
whatever weirdness you are referring to.
 

nick

nick said:
Is [modem compression] usually enabled by default, or do modem users
have to jump through a bunch of hoops to set it up? I also found a lot
of instructions for enabling modem compression.
It typically works right out of the box.  Has for decades.  Skip the
articles about modem init strings.  They haven't been a concern for the
average user in decades.

I was seeing stuff like this:

http://technet.microsoft.com/en-us/library/cc754722(WS.10).aspx

It doesn't mention what the default setting might be, of course.
Even if [packer] worked flawlessly, it would still be a waste of time.  
The fact that it introduces an additional point of failure is just a
"bonus".  ;)

Hopefully press.js will remain bonus-free, and prove to be an
interesting time-waster.
[...] the processing overhead is negligible.
Define negligible.
I don't notice any time going by at all [...]

"Show me where it's slow!" ;)
 

Sean Kinsey

From http://code.google.com/p/pressjs/source/browse/trunk/src/compdict.js:

  // Populate table with all possible character codes.
  for(var i = 0; i < 256; ++i) {
    var str = String.fromCharCode(i);
    this.hashtable[str] = this.nextcode++;
  }  
What about character codes >= 256?

I'm pretty sure those characters aren't allowed in a javascript
document? I'm not really sure what's going on there, though; I was
puzzled by that bit as well. See my next paragraph.

I'm pretty sure UTF-8 is supported quite well by most javascript
parsers :)
 

David Mark

nick said:
nick said:
Is [modem compression] usually enabled by default, or do modem users
have to jump through a bunch of hoops to set it up? I also found a lot
of instructions for enabling modem compression.
It typically works right out of the box. Has for decades. Skip the
articles about modem init strings. They haven't been a concern for the
average user in decades.

I was seeing stuff like this:

http://technet.microsoft.com/en-us/library/cc754722(WS.10).aspx

That's simply an instruction on how to enable/disable hardware-based
compression in Windows for installed modems.
It doesn't mention what the default setting might be, of course.
On.
Even if [packer] worked flawlessly, it would still be a waste of time.
The fact that it introduces an additional point of failure is just a
"bonus". ;)

Hopefully press.js will remain bonus-free, and prove to be an
interesting time-waster.

Godspeed. :)
[...] the processing overhead is negligible.
Define negligible.
I don't notice any time going by at all [...]

"Show me where it's slow!" ;)

Try counting one-one thousand, two-one thousand... Or perhaps a
stopwatch. :)
 

nick

nick :

They are. Obviously in string literals and property names, but also
in identifiers. See the specs, 7.6 Identifier Names and Identifiers.
There is no way you can assume that String.prototype.charCodeAt returns
a byte - the specs say "a nonnegative integer less than 2^16".

Hmm... so I wonder how this passes the "Où qu'il réside" test? Were
all of those char codes < 256?

If 16-bit characters are allowed in string literals, I should be able
to switch to a base-256 encoding, no?
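
Something along these lines, as a sketch; it assumes every code fits in
16 bits, and quote, backslash and line-terminator code units would
still need escaping inside a string literal, which claws back some of
the savings:

  // Emit each LZW code as a single UTF-16 code unit instead of a
  // base85 group.
  function codesToString(codes) {
    var chunks = [];
    for (var i = 0; i < codes.length; ++i) {
      chunks.push(String.fromCharCode(codes[i]));
    }
    return chunks.join('');
  }

  // Recover the codes in the decompressor.
  function stringToCodes(str) {
    var codes = [];
    for (var i = 0; i < str.length; ++i) {
      codes.push(str.charCodeAt(i));
    }
    return codes;
  }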
Why? It seems to me that two straightforward functions is all that
is needed. A bit like those:

http://stackoverflow.com/questions/294297/javascript-implementation-o...

Thanks for the link; I definitely need to look into that.
(Except that they don't work - they fail my "French taxpayers" test
string much worse than your implementation.)

Mine is able to "compress" and decompress that string, it just fails
at making it any smaller.
Now, *that* I understand :)


I think it is. I am not sure that it will earn or save anybody a cent,
but that is not always the point.

Thanks for the vote of confidence. I'm sure it will be useful for
something, if only as a learning experiment.
If your shared server uses Apache, the SetOutputFilter DEFLATE
directive can be set in the .htaccess file on your directory. It
won't work if the deflate module hasn't been enabled globally, but
if it hasn't and the sysadmin refuses to do so even when you ask,
you may try and see if one of his competitors will.

That may come in handy! PHP output buffers can also do the trick in
situations where the above won't work, but they're a pain in the ass
to use for non-PHP/HTML files.

Apparently IE (has/had) screwed-up handling of gzipped content,
caching all gzipped files regardless of any caching directives. Has
anyone seen that in practice?
 

Thomas 'PointedEars' Lahn

Johannes said:
nick :

Yes. Any character in the Latin-1 Supplement is represented by a number
between 0x0080 and 0x00FF in UTF-16,

No, in Unicode.
which is what javascript uses.

| A conforming [ECMAScript] implementation [...] shall interpret characters
| in conformance with the Unicode Standard, Version 3.0 or later and
| ISO/IEC 10646-1 with either UCS-2 or UTF-16 as the adopted encoding form,
| implementation level 3. If the adopted ISO/IEC 10646-1 subset is not
| otherwise specified, it is presumed to be the BMP subset, collection 300.
| If the adopted encoding form is not otherwise specified, it is presumed
| to be the UTF-16 encoding form.

Learn the difference between character set and encoding.
So, for French (except "œ" and "Œ"), Spanish, German, Portuguese, Danish
and a few others, you should be all right. But it won't work with Greek,
Russian or Chinese, and certainly not with Egyptian hieroglyphs which
require *two* 16-bit char codes.

Modern Greek characters, Cyrillic as used in Russian, and Han characters
as they are used e.g. in Standard Mandarin usually require one _UTF-16
code unit_, but characters from CJK Extensions B and C and the
Compatibility Ideographs Supplement require two of them.

Egyptian hieroglyphs require two _UTF-16 code units_. This is however
unrelated to the fact that their code points require at least two 16-bit
words to be represented in binary. It is a misconception to think of UTF-8,
UTF-16 or UTF-32 as encodings that combine char(acter) codes to represent
another character.

Learn the difference between characters and code units.

<http://unicode.org/faq/>
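
A quick javascript illustration of the distinction, using U+20000 (a
CJK Extension B ideograph) as an arbitrary example:

  // One character outside the BMP, written as its surrogate pair:
  var s = "\uD840\uDC00";  // U+20000
  s.length;          // 2: length counts UTF-16 code units, not characters
  s.charCodeAt(0);   // 0xD840, a high surrogate, which is not a character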


PointedEars
 

Andrea Giammarchi

Packer is a complete waste of time.

you never waste an opportunity to be arrogant, do you?

Dean's packer was revolutionary for its time and it is still widely
adopted, improved and maintained, regardless of what *you* think.

A bit more respect for those devs who have always been there, teaching
and explaining with valid software and/or experiments, would probably
be more appropriate for this group, wouldn't it?

Br,
Andrea Giammarchi
 

Andrea Giammarchi

... and btw, for the record, this press.js is a nice experiment as
well. The "decompressor" uses a lot of unnecessary spaces and notation,
but even if that were improved, other guys have already explained the
side effects.

The fact that hosts do not allow gzip means nothing to me: you can gzip
and deflate at build time, then serve the already gzipped/deflated
files with the proper headers, so the host does nothing different from
serving a plain file, and it won't be overloaded by runtime
compression.

If you want an example, here is one of my projects that does exactly
what I have described:
http://code.google.com/p/php-client-booster/
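
As a sketch of such a build step (Node.js and its built-in zlib are
used here purely for illustration; the file name is a placeholder, and
any tool that writes foo.js.gz next to foo.js works the same way):

  var fs = require('fs');
  var zlib = require('zlib');

  // Write app.js.gz next to app.js at build time, at maximum compression.
  function precompress(path) {
    var raw = fs.readFileSync(path);
    fs.writeFileSync(path + '.gz', zlib.gzipSync(raw, { level: 9 }));
  }

  precompress('build/app.js');  // hypothetical build output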

Best Regards,
Andrea Giammarchi
 

David Mark

Andrea said:
you never waste an opportunity to be arrogant, do you?

I was asked what I thought of that tool and answered. You never waste
an opportunity to drop in and change the subject to me, do you? JFTR, I
am not an appropriate topic here.
Dean's packer was revolutionary for its time and it is still widely
adopted, improved and maintained, regardless of what *you* think.

A complete waste of time. Harmful even. I do all I can to try to steer
people away from using it. What's it to you?
A bit more respect for those devs who have always been there, teaching
and explaining with valid software and/or experiments, would probably
be more appropriate for this group, wouldn't it?

Always there? Valid software? What are you babbling about now?
 

David Mark

Andrea said:
... and btw, for the record,

You still don't understand newsgroups, do you?
this press.js is a nice experiment as
well.

Nice in what way?
The "decompressor" uses lot of unnecessary spaces and notation
but even if improved other guys already explained the side effect.

Why don't you quote what you are replying to? It would make things so
much easier. :)
The fact that hosts do not allow gzip means nothing to me:

Odd. It means that some hosts do not allow GZIP.
you can gzip and deflate at build time, then serve the already
gzipped/deflated files with the proper headers, so the host does
nothing different from serving a plain file, and it won't be
overloaded by runtime compression.

That really depends on the host environment, doesn't it? You can't
serve such files without content negotiation. You would have to have
two of each resource as well. Sounds like a nightmare, assuming the
host environment allows for it at all. I promise you can't do that with
static files on my host.
If you want an example, here is one of my projects that does exactly
what I have described:
http://code.google.com/p/php-client-booster/

I'll pass. But thanks anyway!
 

Andrea Giammarchi

As usual, you don't read/know what you are talking about but you
pretend to understand ...

You still don't understand newsgroups, do you?

true, since I can't find a way to ignore your comments: totally
pointless, arrogant, and useless, since AFAIK you don't provide any
help, ever

Nice in what way?
in the way that ... as an experiment, it is a nice one. I have created
a GIF compressor for JS and tried LZMA (7-Zip) as well. These are nice
experiments if you are interested in compression algos.
For the meaning of "nice" itself I suggest Dictionary Online or
something similar
Odd.  It means that some hosts do not allow GZIP.

... oh, really? ... indeed I have explained why later ...
That really depends on the host environment, doesn't it?

no, it doesn't, it is the UA that sends the headers. You can send back
binary data (e.g. files)? Then you can send pre-compressed files as
well, as long as the way to set the response headers works (which host
doesn't allow that? a list, please)

You would have to have two of each resource as well.  Sounds like a nightmare

it's called automation, it's used in every IT development process. You
don't have to do anything, the automation (build process) does it for
you. You'll never serve two files, simply the best one per request and
only if necessary (ETag in da house), automatically, but why am I
wasting time again with somebody that ...
I'll pass.  But thanks anyway!

how do you learn things? Uh wait ... you don't have to, do ya? I think
you do big time but never mind, I am off again since all I had to say
is there, now have fun with the rest of your usual flame ;-)

Best Regards,
Andrea Giammarchi
 

David Mark

Andrea said:
As usual, you don't read/know what you are talking about but you
pretend to understand ...

With regard to what? Hint: quote.
true, since I can't find a way to ignore your comments: totally
pointless, arrogant, and useless, since AFAIK you don't provide any
help, ever

LOL. You don't read this group at all then?
in the way that ... as an experiment, it is a nice one. I have created
a GIF compressor for JS and tried LZMA (7-Zip) as well. These are nice
experiments if you are interested in compression algos.
Great.

For the meaning of "nice" itself I suggest Dictionary Online or
something similar

I didn't ask what the word meant. Talk about pointless comments.
... oh, really? ... indeed I have explained why later ...

Why what?
no, it doesn't, it is the UA that sends the headers. You can send back
binary data (e.g. files)? Then you can send pre-compressed files as
well, as long as the way to set the response headers works (which host
doesn't allow that? a list, please)

You can't just plop GZIP files on the server. You have to do content
negotiation and would need two copies of each static file (one
compressed, one not). I know how to do it, I just choose not to. I
suspect I will switch hosts soon anyway now that traffic is getting heavy.
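
To make "content negotiation" concrete, here is a sketch of what the
server has to do per request (Node.js purely for illustration; the path
and port are hypothetical, and a real setup would also handle ETag and
caching headers):

  var http = require('http');
  var fs = require('fs');

  http.createServer(function (req, res) {
    var gzipOk = /\bgzip\b/.test(req.headers['accept-encoding'] || '');
    var path = 'static/app.js';  // hypothetical layout
    var headers = {
      'Content-Type': 'application/javascript',
      'Vary': 'Accept-Encoding'  // caches must key on the encoding
    };
    if (gzipOk && fs.existsSync(path + '.gz')) {
      headers['Content-Encoding'] = 'gzip';
      res.writeHead(200, headers);
      fs.createReadStream(path + '.gz').pipe(res);
    } else {
      res.writeHead(200, headers);
      fs.createReadStream(path).pipe(res);
    }
  }).listen(8080);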
it's called automation, it's used in every IT development process.

Gee, I never thought of that. :)
You
don't have to do anything, the automation (build process) does it for
you.

You are a wizard!
You'll never serve two files, simply the best one per request and
only if necessary (ETag in da house), automatically, but why am I
wasting time again with somebody that ...

I never said anything about serving two files. What would that even
mean? And I don't know why you waste so much time.
how do you learn things? Uh wait ... you don't have to, do ya?
I think
you do big time but never mind, I am off again since all I had to say
is there, now have fun with the rest of your usual flame ;-)

My usual flame?
 
