Switching From YUICompressor to Closure Compiler Using ANT <apply>


Garrett Smith

I have been using an ANT task with YUI Compressor to apply minification
to all of my JavaScript, using an <apply>[1] task.

The result is much smaller file sizes, plus helpful info in the console
(my console is in Eclipse).

I tried Google Closure Compiler[2] on a few files and noticed that it does
munge a little more than YUI Compressor (around 10% more). This tool
also emits some helpful warnings to the console.

To account for the difference in the way Closure Compiler jar is
invoked, only a few small changes to my ANT apply task were necessary.

The relevant snippet applies compiler.jar to a fileset, which includes
all of my js files.

<target name="js.minify" depends="js.rollups">
  <apply executable="java" parallel="false" verbose="true"
         dest="${build}" taskname="js.compile">
    <fileset dir="${build}" includes="**/*.js"/>
    <arg line="-jar"/>
    <arg path="compiler.jar"/>
    <arg line="--js"/>
    <srcfile/>
    <arg line="--js_output_file"/>
    <mapper type="glob" from="*.js" to="*-min.js"/>
    <targetfile/>
  </apply>
</target>

Explanation:
Relevant "compiler.jar" arguments:
--js - the name of the file to compress
--js_output_file - the name of the output file

In ANT:
<srcfile> - the file that gets fed to the executable (compiler.jar).
This filename must be preceded by the arg line "--js".
<targetfile> - the name of the output file.
This filename must be preceded by the arg line "--js_output_file".

As a sort of pseudo command line, it would look like:
java -jar compiler.jar --js <srcfile> --js_output_file <targetfile>

To generate the correct command line arguments, the <arg>, <srcfile> and
<targetfile> elements appear in the following order:

<arg line="--js"/>
<srcfile/>
<arg line="--js_output_file"/>
<mapper type="glob" from="*.js" to="*-min.js"/>
<targetfile/>
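
The ordering can also be sketched outside of ANT. This is only an illustration, not taken from the thread: the helper names toTargetFile and buildCompilerArgs are hypothetical, and the paths are kept relative for readability, whereas the real task passes whatever paths ANT resolves.

```javascript
// Hypothetical helper: mirrors <mapper type="glob" from="*.js" to="*-min.js"/>.
function toTargetFile(srcFile) {
  return srcFile.replace(/\.js$/, "-min.js");
}

// Hypothetical helper: builds the argument list in the same order as the
// <arg>, <srcfile/> and <targetfile/> elements appear inside <apply>.
function buildCompilerArgs(srcFile) {
  return [
    "-jar", "compiler.jar",
    "--js", srcFile,
    "--js_output_file", toTargetFile(srcFile)
  ];
}

console.log(["java"].concat(buildCompilerArgs("lib/dom.js")).join(" "));
// java -jar compiler.jar --js lib/dom.js --js_output_file lib/dom-min.js
```

With parallel="false", the task runs one such command per file in the fileset.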

[1]http://ant.apache.org/manual/CoreTasks/apply.html
[2]http://code.google.com/closure/compiler/docs/gettingstarted_app.html
 

David Mark

Garrett Smith said:
I have been using an ANT task with YUI Compressor to apply minification
to all of my JavaScript, using an <apply>[1] task.

Great. I use BAT files myself.

The result is much smaller file sizes, plus helpful info in the console
(my console is in Eclipse).

Mostly a waste of time these days (let the servers and agents handle
compression). Definitely a bad idea unless you test everything in
"minified" form (and that is a pain). If you are worried about dial-up
users, modems have compression built-in. A few extra KB won't
matter to broadband users. ;)

I tried Google Closure Compiler on a few files and noticed that it does
munge a little more than YUI Compressor (around 10% more). This tool
also emits some helpful warnings to the console.

You would trust a JS tool from Google?

To account for the difference in the way Closure Compiler jar is
invoked, only a few small changes to my ANT apply task were necessary.

So you are going to switch horses just like that? Seems like a bad
idea.
 

Garrett Smith

kangax said:
Garrett said:
I have been using an ANT task with YUI Compressor to apply minification
to all of my JavaScript, using an <apply>[1] task.

The result is much smaller file sizes, plus helpful info in the console
(my console is in Eclipse).

I would not rely on Closure Compiler just yet.

I would like to understand more what it does. I am not advocating it for
production. I guess you could say I'm a free QA for Google.

I have noticed that, given the input:

if(obj.prop) {
meth(obj.prop);
}

the output:

obj.prop&&meth(obj.prop);

I can't see a problem with that, as both cases go through [[ToBoolean]].
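
A quick way to check that claim, as a sketch that is not from the thread: the run helper and the sample values are made up for illustration, and meth simply records its calls.

```javascript
// Hypothetical helper: runs either the if form or the && form and
// reports how many times meth was called.
function run(useIf, value) {
  var calls = [];
  var obj = { prop: value };
  function meth(x) { calls.push(x); }
  if (useIf) {
    if (obj.prop) { meth(obj.prop); }
  } else {
    obj.prop && meth(obj.prop);
  }
  return calls.length;
}

// A mix of falsy and truthy values, all of which go through ToBoolean.
var samples = [0, "", null, undefined, NaN, 1, "a", {}, []];
var same = samples.every(function (v) {
  return run(true, v) === run(false, v);
});
console.log(same); // true
```

The rewrite is only safe in statement position, where the value of the && expression is discarded.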
They have some nice ideas on decreasing file size; ideas that
YUICompressor doesn't implement.

Such as?

Yet, some things seem to be optimized just a little too much. Look at
what happens with function expressions:

var f = function(){};

becomes:

function f(){}

With this in mind, it's easy to think of an example that results in
different behavior before and after munging:

alert(f);
var f = function(){};

becomes:

alert(f);function f(){};

Oh, no, that changes program behavior.
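
The difference can be observed without alert. This is a minimal sketch, not from the thread; the wrapper functions and the names before and after are only there to keep the two cases apart.

```javascript
// Original form: only the variable declaration is hoisted, so f is
// still undefined when it is read on the first line.
var before = (function () {
  var seen = typeof f;
  var f = function () {};
  return seen;
})();

// Rewritten form: a function declaration is initialised on entering
// the scope, so g is already a function on the first line.
var after = (function () {
  var seen = typeof g;
  function g() {}
  return seen;
})();

console.log(before, after); // undefined function
```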
I don't understand how they could make such an obvious mistake; obvious to
anyone understanding the difference between function expressions and
function declarations.

I'm not sure if there is a reasonable explanation for the decision
to change an assignment expression into a FunctionDeclaration.

Compilation tools such as this offer potential that has not yet been
realized.

This "quirk" happens in "simple" mode; the "advanced" one does more harmful
things (which, thankfully, Google admits and warns about).

Btw, if you care about Identifiers of NFEs, it's good to know that they
modify (rename) those too.

I see, yes, they do, don't they.

Finally, it seems that JScript conditional compilation statements are
stripped as well (so, for example, your `isMaybeLeak` test would become
defunct).

Ah, yeah, you're right about that.

You can try the compressor online at:
<URL: http://closure-compiler.appspot.com/home>

Thanks for the heads-up.


js to js compilation has untapped potential, particularly with lexical
analysis of functions.

For example, a called function can be inlined where it shares scope.

Take two functions, a and b, with a shared scope e.

function e(){
  function a(){
    b("10");
  }
  function b(x){ alert(x); }
  return a;
}

A possible inline optimization:

function e(){
  function a(){
    alert("10");
  }
  return a;
}

That optimization is possible because a and b share scope and do not
use |with| or |arguments|. The result is smaller and more efficient.
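
To compare the two versions outside a browser, here is a small sketch that is not from the thread. The report parameter is a hypothetical stand-in for alert, and makeOriginal and makeInlined are illustrative names for the two forms of e.

```javascript
// report stands in for alert so the sketch runs outside a browser.
function makeOriginal(report) {
  function a() { b("10"); }
  function b(x) { report(x); }
  return a;
}

// Same function after inlining the call to b.
function makeInlined(report) {
  function a() { report("10"); }
  return a;
}

var out1 = [], out2 = [];
makeOriginal(function (x) { out1.push(x); })();
makeInlined(function (x) { out2.push(x); })();
console.log(out1.join() === out2.join()); // true
```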
 

Thomas 'PointedEars' Lahn

Stefan said:
If size is the metric we're optimizing for, then JS minimization + gzip
compression will produce smaller files than gzip alone (obviously).

If you think that would be obvious, you have not understood gzip.


PointedEars
 

Gregor Kofler

Stefan Weiss meinte:
If size is the metric we're optimizing for, then JS minimization + gzip
compression will produce smaller files than gzip alone (obviously).

Not necessarily.

Gregor
 

Thomas 'PointedEars' Lahn

Stefan said:
I'm not talking about edge cases. Source files where all comments and
unnecessary white space and punctuation have been removed will result in
smaller compressed files.

You will have to prove that.


PointedEars
 

Lasse Reichstein Nielsen

Thomas 'PointedEars' Lahn said:
You will have to prove that.

Removing just comments will definitely give a smaller result for any
reasonable and general compression strategy. Comments (non-trivial
ones, at least) contain information, and unless you have a
pathological example, it's information that's unlikely to also occur
in the remaining code. It will cause extra bits in the resulting
compressed data.

Whether removing whitespace and punctuation makes a difference isn't as
obvious, but it's very unlikely to make the result larger (as in: If you
can find a case where it does, please show us).


More generally: consistently and structurally removing characters from
a file is unlikely to make it compress worse. What you are removing
will not increase the total information entrophy of the data. It is
important that the removal is consistent (otherwise the pattern in
where something is removed will itself carry information).
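
One way to try this out, as a rough sketch that is not from the thread: it assumes Node.js and its built-in zlib module, and both the generated sample source and the regex-based comment stripper are made up for illustration. The exact byte counts depend on the input, so none are quoted here.

```javascript
var zlib = require("zlib");

// Hypothetical sample: 50 small commented functions.
var parts = [];
for (var i = 0; i < 50; i++) {
  parts.push(
    "/* Step " + i + ": adds the offset for item number " + i + " to the total. */\n" +
    "function add" + i + "(total, offset) {\n" +
    "    return total + offset + " + i + ";\n" +
    "}\n"
  );
}
var source = parts.join("");

// Crude comment stripper; adequate here because the sample contains no
// string or regex literals that include "/*".
var stripped = source.replace(/\/\*[\s\S]*?\*\/\n?/g, "");

var gzSource = zlib.gzipSync(Buffer.from(source)).length;
var gzStripped = zlib.gzipSync(Buffer.from(stripped)).length;

console.log({
  raw: source.length,
  rawStripped: stripped.length,
  gz: gzSource,
  gzStripped: gzStripped
});
```

For this input the gzipped size without comments comes out smaller than the gzipped size with them; real scripts would need to be measured individually.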

/L
 

Lasse Reichstein Nielsen

Stefan Weiss said:
On 15/11/09 16:49, Richard Cornford wrote:

Maybe not much, but "a+b" still compresses better than "a + b". You'll
typically get these cases several times per line. It adds up.

The first time it happens, yes. But if your '+' is always flanked by
spaces, the " + " sequence will quickly compress as well as "+".
I.e., the impact is probably not significant.
But that's purely speculation, of course.

....
I don't agree that the size difference is "extremely small". I just
confirmed this (again) on four relatively large files (concatenated
scripts which are actually deployed like this). Here's what I get:

  orig  |  min  | gzip  | min+gzip
--------+-------+-------+----------
  230K  |  72K  |  58K  |   27K
  378K  | 125K  |  84K  |   42K
  441K  | 152K  |  99K  |   50K
  149K  |  55K  |  27K  |   17K

This is definitely a significant improvement. Have you tried just
removing the comments to see how well it compresses then?

/L
 

Thomas 'PointedEars' Lahn

Lasse said:
Removing just comments will definitely give a smaller result for any
reasonable and general compression strategy. Comments (non-trivial
ones, at least) contain information, and unless you have a
pathological example, it's information that's unlikely to also occur
in the remaining code. It will cause extra bits in the resulting
compressed data.

On the contrary.

Whether removing whitespace and punctuation makes a difference isn't as
obvious, but it's very unlikely to make the result larger (as in: If you
can find a case where it does, please show us).

Et tu, Brute? That is not the way it works.

More generally: consistently and structurally removing characters from
a file is unlikely to make it compress worse.

That this statement is wrong for gzip follows from the definition of the
DEFLATE (LZ77 + Huffman coding) algorithm that gzip uses. LZ77 replaces
duplicate series of bytes within the sliding window with backreferences of
max. 8 bytes (CMIIW), and the Huffman coding for a symbol is shorter the
more likely it is to occur.

What you are removing will not increase the total information entrophy of
the data.

The what?


PointedEars
 

Lasse Reichstein Nielsen

Thomas 'PointedEars' Lahn said:
Lasse Reichstein Nielsen wrote:

That this statement is wrong for gzip follows from the definition of the
DEFLATE (LZ77 + Huffman coding) algorithm that gzip uses. LZ77 replaces
duplicate series of bytes within the sliding window with backreferences of
max. 8 bytes (CMIIW), and the Huffman coding for a symbol is shorter the
more likely it is to occur.

Exactly.

Removing whitespace in a structured way will not make such encoding
worse. Converting " + " to "+" *consistently* will give the same number
of back-references, only to a shorter string. However, removing the
whitespace will make the sliding window more efficient - it can span a
larger percentage of the source code, giving more chances of having
a repeated occurrence still in the window.

Likewise, removing comments will possibly reduce the number of
back-references, if the remaining source code contains words or
phrases that also occur in the comments. However, if the comments come
after the source code, we are completely removing the back-reference.
If the source comes after the comments, the source becomes the first
occurrence of the text, but it shouldn't take up more space than the
first occurrence originally in the comments would.
And again, removing the comments makes room for more code in the
sliding window.

All in all, removing content in a consistent and structured way will
generally improve compression, especially using a sliding-window
based compression method.

The what?

http://en.wikipedia.org/wiki/Information_entropy

/L
 

Thomas 'PointedEars' Lahn

Lasse said:
Exactly.

Removing whitespace in a structured way will not make such encoding
worse.

But that applies only if whitespace can be removed without the need for
it to be replaced by something else to keep semantic equivalence.

Converting " + " to "+" *consistently* will give the same number
of back-references, only to a shorter string. However, removing the
whitespace will make the sliding window more efficient - it can span a
larger percentage of the source code, giving more chances of having
a repeated occurrence still in the window.

I hadn't thought of that, though. But we are not talking about white-space
around operators, are we?

All in all, removing content in a consistent and structured way will
generally improve compression, especially using a sliding-window
based compression method.

But we are not talking about arbitrary input either.

That is not what you wrote, though. Hence the question. Consider your
statement being double-checked, then.


PointedEars
 

Lasse Reichstein Nielsen

Thomas 'PointedEars' Lahn said:
But that applies only if whitespace can be removed without the need for
it to be replaced by something else to keep semantic equivalence.

True.


I hadn't thought of that, though. But we are not talking about white-space
around operators, are we?

I'm not sure what the "JS compilers" that this started out with do
exactly. I was thinking mainly of a pure minifier.

A minifier should remove any unnecessary whitespace. That typically
means indentation and unnecessary token separations. Newlines can be
icky because of semicolon insertion, but most of the time they can
be removed as well (and when not, they can be consistently converted
to a single semicolon). Some whitespace between tokens cannot be
removed (e.g. "a + +b;"), but the large majority can.
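
The "a + +b" case can be checked directly. A minimal sketch, not from the thread: new Function is used only so that the syntax error can be caught at run time, and the variable names are illustrative.

```javascript
// With the space kept, this is binary + applied to unary +b.
var spaced = new Function("a", "b", "return a + +b;");
console.log(spaced(1, "2")); // 3

// With all whitespace removed, "++" is read as a single token and the
// function body no longer parses.
var collapsedIsValid = true;
try {
  new Function("a", "b", "return a++b;");
} catch (e) {
  collapsedIsValid = !(e instanceof SyntaxError);
}
console.log(collapsedIsValid); // false
```

A minifier can still shorten this to "a+ +b"; only the space between the two plus signs has to stay.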
But we are not talking about arbitrary input either.

That's why it works. Javascript is already very structured (as a
formal language, that's to be expected). Removing comments and
reducing whitespace doesn't change the overall structure of the
source, nor will renaming variables (if the same renaming is used for
the same variable name everywhere it's used). Because the program is
still the same, we will still have the same tokens and token
sequences, giving the same opportunities for compression by
back-reference.

The important point is that the removals won't introduce information
into the program. If we only removed some of the whitespaces (e.g.,
the ones with indices that correspond to a one in the binary expansion
of pi), then we would be introducing information in the process. By
being completely consistent, we introduce nothing that wasn't given by
the source already (and if the original author wasn't consistent in
his whitespace placement, we might even remove some unnecessary
information).

That is not what you wrote, though. Hence the question. Consider your
statement being double-checked, then.

ACK. That was my mistake.

/L
 

Thomas 'PointedEars' Lahn

Stefan said:
It happens once for each operator, or whatever is usually surrounded by
optional white space.

AIUI, with gzip (and DEFLATE's LZ77) that happens once only if the file
size does not exceed the size of the sliding window (32K or less, at the
implementation's discretion) + 1 Byte. See also RFC 1951.


PointedEars
 

Dr J R Stockton

In comp.lang.javascript message <t5pvf5d98m167osh3b3n9hb5rjihcmqu8m@4ax.com>,
Sun, 15 Nov 2009 12:38:24, Hans-Georg Michna <hans-(e-mail address removed)> posted:
are you aware that what really matters is the size of the
gz-compressed file that is actually delivered to the end user?
You have to compare file sizes after an additional gz
compression.

Not necessarily. Those who are given a limited amount of Web space by
their ISPs will be interested in keeping the size of the files, as
stored on the server, down. They may also be interested in keeping
bandwidth down, if that is also limited. In fact, I use about the same
percentage of each limit at present.
 

Garrett Smith

kangax said:
Garrett said:
kangax said:
Garrett Smith wrote:
I have been using an ANT task with YUI Compressor to apply minification
to all of my JavaScript, using an <apply>[1] task.

The result is much smaller file sizes, plus helpful info in the console
(my console is in Eclipse).

I would not rely on Closure Compiler just yet.
[snip]

I am probably going to revert to YUI Compressor.

I haven't found a good way around the conditional comment removal and
they do not intend to support preserving conditional comments.

http://code.google.com/p/closure-compiler/issues/detail?id=47&can=1

WONTFIX.

For example, removal of "dead" branches, inlining, etc.

[snip]

I haven't seen the dead branches removal.

You bet :)


Yep. I noticed that there are more optimizations possible there. One
just needs to know ECMAScript syntax and what could be changed into
something shorter without changing program behavior (or at least
changing it in unobservable way).

if (foo) throw 0.1;

becomes:

if(foo)throw 0.1;

but could be easily shortened to:

if(foo)throw.1;

Wow yeah, I could really use that, too. :-D

Odd that here they change throw.1 to throw 0.1;
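
That the two spellings mean the same thing can be confirmed with a short sketch, which is not from the thread; thrownBy is a hypothetical helper that compiles a function body and returns whatever it throws.

```javascript
// Hypothetical helper: compile the given body as a function, call it,
// and hand back the thrown value.
function thrownBy(body) {
  try {
    new Function(body)();
  } catch (e) {
    return e;
  }
}

console.log(thrownBy("throw.1;") === thrownBy("throw 0.1;")); // true
console.log(thrownBy("throw.1;")); // 0.1
```

The short form parses because ".1" is a complete numeric literal, so no separator is needed after the throw keyword.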

Interesting.

[...]
(in advanced mode)

I don't really know about the modes.

function f(){ return 'foo'; }
alert(f());

becomes:

alert("foo");

but:

(function(){
function f(){ return 'foo'; }
alert(f());
})();

becomes:

(function(){function a(){return"foo"}alert(a())})();

instead of shorter (and functionally identical):

(function(){alert("foo")})();

or even:

alert("foo");

Closure optimizations require a thorough understanding of scope (function
and eval), else the result will be bugs and missed optimizations:

function e(){
  var unusedVar;
  function a(){
    b(12);
  }
  function b(s){
    alert(s);
  }
  function c(){
    throw.2;
  }
}

Calling e() does absolutely nothing. It can be optimized to:-

function e(){}

- and the result would have identical program behavior, would be smaller,
and would be more efficient to interpret.

Yet with Closure Compiler:-

function e(){function c(){a(12)}function a(b){alert(b)}function d(){throw 0.2;}var f};

The promise of dead code removal falls short.

Warnings that the code is unused would be more useful.

e.g. "function x is declared but is never used [file.js, line 0]".

since, as I understand, it doesn't really matter if function is called
from within global code or from within anonymous function, unless this
calling function is something like a reference to global `eval` and so
could declare variables in a wrong place (which would then make it an
indirect eval call — something that could throw exception as allowed by
specification) or if non-standard extensions are involved (e.g. `caller`
from JS 1.5).

An indirect eval can throw an EvalError for ES3. I was aware of Opera
throwing for that during migration to an ES4 draft. That feature was
abandoned along with ES4.

If a global property (function f) is moved to a local property,
and invoked inline, any other scripts referencing -f- identifier
fail.

Inlined function calls for local scope would be safer, as the
interpreter (and the person invoking it) would not have to know about
any other global references to -f-.

The problem with eval can be demonstrated in an example:

function e(){
  function a(){
    b(12);
  }
  function b(s){
    alert(s);
  }
  function c(){
    eval("b(23)");
  }
}

If |eval| is used directly, identifiers cannot be removed from e.

If |eval| is used indirectly, problems would also occur, but that would
be expected.
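
The difference between the two kinds of eval call can be sketched as follows. This is not from the thread; outer and localHelper are illustrative names, and the comma expression (0, eval) is one common way to force an indirect call.

```javascript
function outer() {
  function localHelper() { return 23; }

  // Direct eval: the code runs in outer's scope, so localHelper is
  // visible. Renaming or removing localHelper would break this call.
  var direct = eval("localHelper()");

  // Indirect eval: the code runs in global scope, where no localHelper
  // exists, so typeof reports "undefined".
  var indirect = (0, eval)("typeof localHelper");

  return [direct, indirect];
}

console.log(outer().join(" ")); // 23 undefined
```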

Regardless, I would rather be warned of possible dead code than have it
removed automatically. That way I can make the discrimination of what
needs to be removed (and looking over the code once again is a good
thing; not a waste of time at all).
 

Thomas 'PointedEars' Lahn

Garrett said:
Closure optimizations require a thorough understanding of scope (function
and eval), else the result will be bugs and missed optimizations:

function e(){
var unusedVar;
function a(){
b(12);
}
function b(s){
alert(s);
}
function c(){
throw.2;
}
}

Calling e() does absolutely nothing. It can be optimized to:-

function e(){}

- and the result would have identical program behavior, would be smaller,
and would be more efficient to interpret.

Yet with Closure Compiler:-

function e(){function c(){a(12)}function a(b){alert(b)}function d(){throw 0.2;}var f};

The promise of dead code removal falls short.

Warnings that the code is unused would be more useful.

e.g. "function x is declared but is never used [file.js, line 0]".

IMHO, doing that is the task of the IDE (e.g. Eclipse JSDT, which does it),
not of the source code compressor.

[...]
Regardless, I would rather be warned of possible dead code than have it
removed automatically. That way I can make the discrimination of what
needs to be removed (and looking over the code once again is a good
thing; not a waste of time at all).

See above.


PointedEars
 

Garrett Smith

kangax said:
Garrett said:
kangax said:
Garrett Smith wrote:
kangax wrote:
Garrett Smith wrote:
I have been using an ANT task with YUI Compressor to apply
minification to all of my JavaScript, using an <apply>[1] task.

The result is much smaller file sizes, plus helpful info in the
console (my console is in Eclipse).

I would not rely on Closure Compiler just yet.
[snip]

[snips]
function e(){
var unusedVar;
function a(){
b(12);
}
function b(s){
alert(s);
}
function c(){
throw.2;
}
}

Calling e() does absolutely nothing. It can be optimized to:-

function e(){}

or to nothing at all, since `e` is not referenced anywhere :) (and if
that's all the code there is, obviously)

I think there's a problem there:

this["e"]();

If |e| is removed altogether, a TypeError would result.

Any other less obvious way of getting the global object makes the
proposed optimization not possible.
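
A sketch of that failure, not from the thread. It uses globalThis in place of a global-code `this`, and an indirect eval to stand in for a plain script that declares e in global code; both are assumptions made so the example does not depend on a browser page. The delete step simulates a minifier that dropped the declaration.

```javascript
// Indirect eval evaluates the text as global code, so the declaration
// creates a property of the global object, as a plain script would.
(0, eval)("function e() { return 'called'; }");
console.log(globalThis["e"]()); // called

// Simulate a minifier that removed the "unreferenced" declaration.
delete globalThis.e;

var failed = false;
try {
  globalThis["e"]();
} catch (err) {
  failed = err instanceof TypeError;
}
console.log(failed); // true
```

Nothing in the source text of the second call mentions the identifier e outside a string, which is why a tool cannot spot the reference by scanning identifiers.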
There are many more optimizations possible. They are only scratching the
surface here. For example, function inlining that they perform looks
very simplistic:

[snip example]

Compiler doesn't build a Tree of [[Scope]]. That is where the best
optimizations could be realized, but they miss that.

[...]
It should be safe to perform inlining in global scope too, as long as
minifier is aware of all the code. It gets tricky if parts of that code
are in html, though :)

Removing global identifiers would be a problem for lazy-loaded scripts
that may want to use that identifier, for frames that use that identifier,
and wherever square-bracket notation is used.

It is unsafe to remove global identifiers altogether.
 
