PHP function to compact javascript code?

Matt Kruse

I am far from a PHP expert, and I've been struggling to create a function
which will take a javascript .js file and "compact" it as much as possible.
Meaning, remove all comments and unnecessary whitespace. "Obfuscation" is
not necessary. Obviously, the compacted javascript should run identically to
the original javascript.

Can anyone point me to an existing function that does this?

Thanks!
 
JDS

Can anyone point me to an existing function that does this?

I can't. But a better question is, why bother? Is your javascript stuff
really so huge that taking out the whitespace will matter that much?
 
RobG

Matt said:
I am far from a PHP expert, and I've been struggling to create a function
which will take a javascript .js file and "compact" it as much as possible.
Meaning, remove all comments and unnecessary whitespace. "Obfuscation" is
not necessary. Obviously, the compacted javascript should run identically to
the original javascript.

Can anyone point me to an existing function that does this?

Thanks!

<URL:http://www.crockford.com/javascript/jsmin.html>


" JSMin is a filter which removes comments and unnecessary whitespace
from JavaScript files. It typically reduces filesize by half, resulting
in faster downloads. It also encourages a more expressive programming
style because it eliminates the download cost of clean, literate
self-documentation. "

Obfuscation is not performed, uglification is. ;-)
 
Matt Kruse

JDS said:
I can't. But a better question is, why bother? Is your javascript
stuff really so huge that taking out the whitespace will matter that
much?

Whitespace reduction often isn't as dramatic as comment removal, but with
both combined - yes.

If comments are used liberally and documentation for a library is included
within the source file itself, a js file could be, say, 50k or more.

When compacted, the same file could be 25k or even less. Since it has the
same functionality, there's no reason _not_ to compact it and save some
bandwidth and download time.
 
Tim Roberts

Colin McKinnon said:
...
Regular expressions can be used for all of this in PHP, although I'd
probably use str_replace for most of the whitespace stuff:

$js=str_replace("\t", ' ', $js); // tab chars
$js=str_replace("\r", '', $js); // strip CR
$js=str_replace("\n\n", "\n", $js); // double NL
$js=str_replace('  ', ' ', $js); // double space -> single space

It isn't that easy. This won't strip // type comments, and it will screw
up all of the string constants. It isn't rocket science, but you basically
need to implement a miniature Javascript parser to make this work reliably.
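Tim's point can be made concrete with a small character scanner. The sketch below is written in JavaScript, since that is the language being compacted, and `stripComments` is a hypothetical name. It treats // and /* as comment starts only when they are not inside a quoted string. It assumes well-formed input and does not recognise regular-expression literals, so a regex such as /"/ can still confuse it.

```javascript
// Minimal sketch: remove // and /* */ comments from JavaScript source while
// leaving single- and double-quoted string literals untouched.
// Regular-expression literals are NOT recognised.
function stripComments(src) {
  let out = "";
  let i = 0;
  while (i < src.length) {
    const c = src[i];
    const next = src[i + 1];
    if (c === '"' || c === "'") {
      // Copy the whole string literal, skipping over backslash escapes
      // so that \" or \' does not end it early.
      let j = i + 1;
      while (j < src.length && src[j] !== c) {
        j += src[j] === "\\" ? 2 : 1;
      }
      out += src.slice(i, j + 1);
      i = j + 1;
    } else if (c === "/" && next === "/") {
      // Line comment: skip up to, but not including, the newline.
      while (i < src.length && src[i] !== "\n") i++;
    } else if (c === "/" && next === "*") {
      // Block comment: replace with one space so the tokens on either
      // side cannot run together, then skip past the closing */.
      const end = src.indexOf("*/", i + 2);
      out += " ";
      i = end === -1 ? src.length : end + 2;
    } else {
      out += c;
      i++;
    }
  }
  return out;
}
```

For example, `stripComments('var u = "http://x/*y*/"; // c')` keeps the URL string intact and drops only the trailing comment. A whitespace pass can then safely follow, because the strings have been identified.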
 
ASM

No version for Mac ? :-(
1) It ain't PHP. I want PHP :)

I do not know PHP, but I have heard something about gzip...

If your server has the right library
and you can serve your pages as *.php:

<?php ob_start('ob_gzhandler'); ?>
<!DOCTYPE HTML PUBLIC
blah blah and rest of page
</html>

will compress the file on the fly via gzip
(I've seen gains of up to 90% on pages with no images).

You can also compress your files at home with gzip (*.gz)
and then upload them to your server.

Every browser (even NC3)
can decompress this data on the fly in a few milliseconds.
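ASM's ob_gzhandler line compresses output on the PHP side. To see why gzip and comment stripping complement each other, here is the same idea outside PHP, sketched with the zlib module built into Node.js. The Node environment and the `gzipStats` helper name are assumptions for illustration, not what ASM described. Repetitive text such as indentation and boilerplate comments compresses very well.

```javascript
const zlib = require("zlib");

// Gzip a piece of script text and report the size before and after.
// `data` holds the compressed bytes so the result can be checked.
function gzipStats(text) {
  const original = Buffer.byteLength(text, "utf8");
  const data = zlib.gzipSync(text);
  return { original, gzipped: data.length, ratio: data.length / original, data };
}

const sample = "/* comment */\nfunction add(a, b) { return a + b; }\n".repeat(200);
const stats = gzipStats(sample);
console.log(stats.original, "bytes ->", stats.gzipped, "bytes gzipped");
```

Comments still cost bytes after gzip, so stripping them and compressing the transfer are not mutually exclusive.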
 
Martin Bialasinski

Matt Kruse said:
1) It ain't PHP. I want PHP :)

Convert it from C to PHP :)

But if you can compile C on your server, I would not bother and just
use system() to call it.

http://jscompact.sourceforge.net/ is an alternative to jsmin that
uses spidermonkey. Using a real JS engine, it should be even
better.


Bye,
Martin
 
Dr John Stockton

JRS: In article <[email protected]>, dated
Sun, 14 Aug 2005 21:17:43, seen in JDS
I can't. But a better question is, why bother? Is your javascript stuff
really so huge that taking out the whitespace will matter that much?

Very probably. Though the percentage of superfluous material is as
important as the overall hugeness.

I wrote a tool to indent Pascal; that work enables me to realise some of
the problems. One must be able to recognise, reliably, whether one is
in string or in comment, even though strings may contain comment markers
and comment may contain string quotes and quotes may be escaped.
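To show why that string/comment distinction matters, consider the tempting shortcut: a single regular expression that deletes everything from "/*" to the next "*/". It knows nothing about string literals. `naiveStripBlockComments` is a hypothetical name used only for this demonstration.

```javascript
// The shortcut that fails: delete everything from "/*" to the next "*/"
// with no notion of string literals.
function naiveStripBlockComments(src) {
  return src.replace(/\/\*[\s\S]*?\*\//g, "");
}

const sample = 'var s = "/* not a comment"; var t = 1; /* real */';
console.log(naiveStripBlockComments(sample)); // var s = "
```

The "/*" inside the string is taken as a comment opener, so the deletion runs on to the real comment's "*/" and takes the statement `var t = 1;` with it.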

For a worst(?)-case, consider
eval("X = 3 /* rhubarb's nice \" *" + "/ + 4")
in which the comment closer does not appear in the source file.

It would be easier if the author used a subset of javascript : for
example, no eval, no literal // in a string (use /\/ instead).

It might be better if some comments were preserved : say /** ... **/ .
Then an author could put /** (c) Fred 1984 **/ without losing it.
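The preserved-comment convention is easy to honour once comments are found. The sketch below assumes the author's subset, with no comment markers inside strings or regex literals, and uses the hypothetical name `stripBlockCommentsKeepMarked`. It removes ordinary block comments but keeps those written as /** ... **/.

```javascript
// Remove /* ... */ comments, except those that both start with "/**" and
// end with "**/". The length check stops the empty comment "/**/" from
// counting as marked.
// Assumes no comment markers occur inside strings or regex literals.
function stripBlockCommentsKeepMarked(src) {
  return src.replace(/\/\*[\s\S]*?\*\//g, (m) =>
    m.length >= 6 && m.startsWith("/**") && m.endsWith("**/") ? m : ""
  );
}
```

A JSDoc-style `/** doc */` ends in " */" rather than "**/", so it is removed. Only comments that follow the convention on both ends survive.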

Then there's probably little to gain from removing superfluous
whitespace within a line, especially if it is consistently-sized and
therefore compressible (I remove trailing whitespace from all my pages
automatically as part of the checking process, using MiniTrue).

One should check javascript-specific editors; a tool that understands
the language should be able to omit superfluities. Something like what
AMAYA and HTML-kit can (potentially?) do for HTML.

Another approach would be to remove all //-to-line-end, then all /** ...
**/ comment, then all leading and trailing whitespace, then all blank
lines. An author writing code to be so processed would probably get
over 95% of possible reduction. AFAICS, MiniTrue could easily do that,
though it might require four passes.
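The four passes can be sketched as chained regular-expression replacements. This assumes the author-imposed subset described earlier: no literal // in strings or regexes, and no eval tricks. `fourPass` is a hypothetical name. Pass 2 follows the wording above and removes comments written as /** ... **/, so plain /* */ comments are left alone.

```javascript
// Four regex passes, valid only for source written to the restricted subset.
function fourPass(src) {
  return src
    .replace(/\/\/.*$/gm, "")             // 1. "//" to end of line
    .replace(/\/\*\*[\s\S]*?\*\*\//g, "") // 2. /** ... **/ comments
    .replace(/^[ \t]+|[ \t]+$/gm, "")     // 3. leading and trailing whitespace
    .replace(/^\s*\n/gm, "");             // 4. blank lines
}

const input = "// header\nvar a = 1; // one\n/** doc **/\n   var b = 2;   \n\nvar c = a + b;\n";
console.log(fourPass(input));
```

The same passes turn `var x = 1 /* //test */ + 2;` into `var x = 1 /*`, because pass 1 does not know it is inside a comment. That is why the subset restriction carries so much weight.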
 
Randy Webb

Matt Kruse said the following on 8/14/2005 10:32 PM:
Whitespace reduction often isn't as dramatic as comment removal, but with
both combined - yes.

If comments are used liberally and documentation for a library is included
within the source file itself, a js file could be, say, 50k or more.

When compacted, the same file could be 25k or even less. Since it has the
same functionality, there's no reason _not_ to compact it and save some
bandwidth and download time.

Read the file in with PHP.
Split it on line boundaries (if it doesn't already) so that you end up
with an array that is the same length as the number of lines of code.

Loop through the array and remove any empty entries (blank lines in the
code), entries that start with \\.

Then you would have to write a loop that would find comments that start
with \* and find the next entry that ended in *\ and remove them.

Should remove all the comments (unless they are inline comments :) )

var k = "My mama"; //This var keeps track of my Mama

Sounds like fun to try to write it but it's been a while for PHP, might
give me a good reason to freshen up on it. If not, write it in JS and
let you convert it to PHP :)
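Randy's line-based plan translates fairly directly into JavaScript, as he suggests. Here `stripByLines` is a hypothetical name, and // and /* are used where the post writes \\ and \*. It drops blank lines, lines that start with //, and /* ... */ blocks that start and end at line boundaries. As Randy notes, inline comments survive. As the reply below points out, comment markers inside strings can still fool it.

```javascript
// Line-based comment removal: works only when comments occupy whole lines.
function stripByLines(src) {
  const out = [];
  let inBlock = false; // inside a multi-line /* ... */ comment
  for (const line of src.split(/\r?\n/)) {
    const t = line.trim();
    if (inBlock) {
      if (t.endsWith("*/")) inBlock = false;
      continue;
    }
    if (t === "" || t.startsWith("//")) continue; // blank or // line
    if (t.startsWith("/*")) {
      if (!t.endsWith("*/")) inBlock = true;       // block continues
      continue;
    }
    out.push(line);
  }
  return out.join("\n");
}
```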
 
Vladdy

Matt said:
I am far from a PHP expert, and I've been struggling to create a function
which will take a javascript .js file and "compact" it as much as possible.
Meaning, remove all comments and unnecessary whitespace. "Obfuscation" is
not necessary. Obviously, the compacted javascript should run identically to
the original javascript.

Can anyone point me to an existing function that does this?

Thanks!
Works for me:

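function compress($code)
{
    // Note: none of these patterns know about string or regex literals.

    // Remove multi-line comments (a comment starting with "/*-" is kept)
    $mlcomment = '/\/\*(?!-)[\x00-\xff]*?\*\//';
    $code = preg_replace($mlcomment, "", $code);

    // Remove single-line comments; the look-behind skips "//" preceded by
    // ":" (as in "http://") without deleting the character before it
    $slcomment = '/(?<!:)\/\/.*/';
    $code = preg_replace($slcomment, "", $code);

    // Collapse runs of whitespace, including newlines, into one space
    $extra_space = '/\s+/';
    $code = preg_replace($extra_space, " ", $code);

    // Remove spaces around punctuation where they are not needed.
    // Caution: this can merge operators, e.g. "a - -b" becomes "a--b".
    $removable_space = '/\s?([\{\};\=\(\)\/\+\*-])\s?/';
    $code = preg_replace($removable_space, "\\1", $code);

    return $code;
}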

Make sure your JS code has all its ";" otherwise you will get syntax errors
after removing newlines that were being interpreted as ";"
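The semicolon warning can be checked directly. Once newlines are collapsed, automatic semicolon insertion no longer applies, so code that relied on it stops parsing. This is a small sketch assuming a JavaScript engine where `new Function` is available. It compiles the code as a function body without running it. `parses` and `joinLines` are hypothetical helper names.

```javascript
// True if `code` compiles as a function body in this engine.
function parses(code) {
  try {
    new Function(code); // compiles only; the body is never called
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// Collapse all whitespace, newlines included, as a compactor does.
function joinLines(code) {
  return code.replace(/\s+/g, " ");
}

console.log(parses("var a = 1\nvar b = 2"));            // true
console.log(parses(joinLines("var a = 1\nvar b = 2"))); // false
```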
 
Csaba Gabor

Randy said:
Read the file in with PHP.
Split it on line boundaries (if it doesn't already) so that you end up
with an array that is the same length as the number of lines of code.

Loop through the array and remove any empty entries (blank lines in the
code), entries that start with \\.

Then you would have to write a loop that would find comments that start
with \* and find the next entry that ended in *\ and remove them.

The above will not work (even if the \ are replaced with /). Consider
the javascript below. What, if anything, should appear on the screen?
Should the FF javascript console show any errors? How, if at all, does
your answer change if all the double quotes are replaced by single
quotes?

var foo="\npart 1/*\
/* part2\
// Hi mom";
// This is a comment for sure */ /* but is it a supercomment?"
alert("fu:\n"+foo);//"
//which leads into */ alert("bar:\n"+foo); /* */


// Csaba Gabor from Vienna */"'\
 
rh

Dr said:
JRS: In article <[email protected]>, dated
Sun, 14 Aug 2005 21:17:43, seen in JDS
[...]

For a worst(?)-case, consider
eval("X = 3 /* rhubarb's nice \" *" + "/ + 4")
in which the comment closer does not appear in the source file.

It would be easier if the author used a subset of javascript : for
example, no eval, no literal // in a string (use /\/ instead).

Compression of dynamically generated code wouldn't be included in the
requirement for a compression utility, I wouldn't think. Moreover,
recall "eval" (that isn't supposed to be used in any case ;-)) isn't
the only way to dynamically generate code in Javascript.

The only constraint I would anticipate would be that the input source
constitute a syntactically correct program.
[...]

Another approach would be to remove all //-to-line-end, then all /** ...
**/ comment, then all leading and trailing whitespace, then all blank
lines.

[...]

Curious. That approach would select the "//-to-line-end" based on what,
e.g. in the following statement? :

var test = /test////test/ is a RegExp literal followed by //
.test("test");

Also, what do you foresee happening with:

var x = 1 /* //test */ + 2;

or the myriad of literals that might take on the look of Javascript
code or comments?
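rh's first example depends on telling a regular-expression literal apart from a division sign. A common heuristic, similar in spirit to the one JSMin uses, looks at the last non-whitespace character before the "/". After punctuation such as ( , = : [ ! & | ? { } ; it is read as a regex. After an identifier, number or closing bracket it is read as division. The helper below applies that rule. `regexLiteralEnd` and `REGEX_PRECEDERS` are hypothetical names. The rule is an approximation: for example, it reads `return /x/` as a division because the preceding character is a letter.

```javascript
// Punctuation after which a "/" is assumed to open a regular-expression
// literal rather than be a division operator (heuristic only).
const REGEX_PRECEDERS = new Set(["(", ",", "=", ":", "[", "!", "&", "|", "?", "{", "}", ";"]);

// If the "/" at src[i] looks like the start of a regex literal, return the
// index just past its closing "/" (any flags are left to the caller);
// otherwise return -1.
function regexLiteralEnd(src, i) {
  if (src[i] !== "/" || src[i + 1] === "/" || src[i + 1] === "*") return -1;
  let k = i - 1;
  while (k >= 0 && /\s/.test(src[k])) k--; // last non-whitespace character
  if (k >= 0 && !REGEX_PRECEDERS.has(src[k])) return -1; // looks like division
  let inClass = false; // inside [...], where "/" does not end the literal
  for (let j = i + 1; j < src.length; j++) {
    const c = src[j];
    if (c === "\\") { j++; continue; } // skip the escaped character
    if (c === "\n") return -1;         // regex literals cannot span lines
    if (c === "[") inClass = true;
    else if (c === "]") inClass = false;
    else if (c === "/" && !inClass) return j + 1;
  }
  return -1; // unterminated
}
```

On rh's line this yields `/test/` followed by a // comment, which is also how a JavaScript engine reads it. The two lines therefore amount to `var test = /test/.test("test");`.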

../rh
 
Dr John Stockton

JRS: In article <[email protected]>,
dated Tue, 16 Aug 2005 20:15:28, seen in rh
Dr John Stockton wrote:
Another approach would be to remove all //-to-line-end, then all /** ...
**/ comment, then all leading and trailing whitespace, then all blank
lines.

[...]

Curious. That approach would select the "//-to-line-end" based on what,
e.g. in the following statement? :

var test = /test////test/ is a RegExp literal followed by //
.test("test");

You missed remembering an earlier paragraph, which read :

It would be easier if the author used a subset of javascript :
for example, no eval, no literal // in a string (use /\/
instead).

The effect of your slightly-hard-to-read statement can easily be
obtained, in a manner compatible with my suggestion, by preceding the
comment with a space, which also makes it more legible.


Please don't quote sigs.
 
rh

Dr said:
JRS: In article <[email protected]>,
dated Tue, 16 Aug 2005 20:15:28, seen in rh
Dr John Stockton wrote:
Another approach would be to remove all //-to-line-end, then all /** ...
**/ comment, then all leading and trailing whitespace, then all blank
lines.

[...]

Curious. That approach would select the "//-to-line-end" based on what,
e.g. in the following statement? :

var test = /test////test/ is a RegExp literal followed by //
.test("test");

You missed remembering an earlier paragraph, which read :

It would be easier if the author used a subset of javascript :
for example, no eval, no literal // in a string (use /\/
instead).

Not really. It didn't appear to continue to pertain following your
intervening suggestion to check out editors that understand the
language and then moving on to "Another approach". Even so, as I
believe you recognize, the qualification doesn't cover off the second
example:

var x = 1 /* //test */ + 2;

The effect of your slightly-hard-to-read statement can easily be
obtained, in a manner compatible with my suggestion, by preceding the
comment with a space, which also makes it more legible.

Constraints of the nature you suggest make the success of the
compression highly prone to authoring error. Of course, whoever is
creating the utility can decide to impose whatever constraints they
wish. On the other hand, those who choose utilities will do so based,
in part, on constraints they may face.

It's not a completely trivial endeavor to create a compression utility
that properly accommodates literals and comments, but then it really
shouldn't be much beyond, either. It most certainly would be worth the
added effort.

Please don't quote sigs.

The quoted sig was the result of an authoring error. ;-)

../rh
 
andreas.maurer1971

Hi,

First of all, I'm neither a JavaScript nor a RegExp expert, but maybe
with the following steps some of the pros can find a good function to
compact the code:

0) Remove everything from // to the end of the line, as long as a
semicolon (maybe followed by a space) is directly in front of the "//"
AND there is NO escape character in front of the semicolon.
1) Remove all line breaks => you should get a single but very long line.
2) Replace 2 or more consecutive spaces with a single space => repeat
this step until you have only single spaces.
Note: this will cause a problem if there are two or more wanted spaces
in a row.
3) Remove everything between /* and */ as long as a semicolon ";" is
directly (maybe followed by a single space) in front of the /* and
there is NO escape character in front of the semicolon.
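Taken literally, Andy's steps could be sketched as below. `andySteps` is a hypothetical name. Two interpretations are assumptions: in step 1, each line break is replaced by a space rather than deleted outright, and in steps 0 and 3, the comment text is deleted while the semicolon is kept.

```javascript
// Andy's four steps, applied in order (interpretations noted above).
function andySteps(src) {
  // 0) "//" to end of line, when directly preceded by ";" (plus at most one
  //    space) and that ";" is not preceded by a backslash
  let s = src.replace(/(^|[^\\]); ?\/\/.*$/gm, "$1;");
  // 1) Join everything into one long line
  s = s.replace(/\r?\n/g, " ");
  // 2) Runs of two or more spaces become a single space (one pass suffices)
  s = s.replace(/ {2,}/g, " ");
  // 3) "/* ... */" directly after ";" (plus at most one space)
  s = s.replace(/(^|[^\\]); ?\/\*[\s\S]*?\*\//g, "$1;");
  return s;
}

console.log(andySteps("var a = 1; // one\nvar b  =  2; /* two */\nf();"));
```

A // comment that does not follow a ";" (for example after "{") survives step 0. Once step 1 joins the lines, that comment swallows everything after it. This is one reason the replies keep coming back to real tokenizing.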

HTH,

Andy
 
Ira Baxter

Tim Roberts said:
Colin McKinnon

It isn't that easy. This won't strip // type comments, and it will screw
up all of the string constants. It isn't rocket science, but you basically
need to implement a miniature Javascript parser to make this work reliably.

Obfuscation may not be necessary, but parsing basically is necessary.
Our JavaScript obfuscator will also compress.
See
http://www.semanticdesigns.com/Products/Obfuscators/ECMAScriptObfuscator.html.
 
