$zip->AddTreeMatching() speed?

Patrick Flaherty

Hi,

Using perl to archive a bunch of things in a bunch of locations (into a ZIP
file).

Was using functions (for each tree) such as this:

sub process_file_develtree {
    $this_file = $File::Find::name;
    if ($this_file =~ m/.*\.(bas$|cpp$|pl$|[ce]$|xml$|sql$|vba$|vbs$)/i) {
        print $this_file . "\n";
        $zip->addFile($this_file);
    }
}

The above, I found, gave me absolute pathnames in the ZIP file. And on my
target machine I never found a way of unzipping except to do an unzip -l, get
the name, and then extract to stdout and capture the file that way.

Looking then more closely at the ZIP module, it _seems_ the only way to get
relative pathnames is to use

$zip->addTreeMatching( '.', undef,
'\.(bas$|cpp$|pl$|[ce]$|xml$|sql$|vba$|vbs$)' );

The undef being the key (to relative pathnames).

This works. But it seems _much_ slower.

Explanations? Ideas?
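
One way to check the impression of slowness is to time both approaches on the same tree. What follows is only a sketch, assuming Archive::Zip is installed. The test tree (20 directories of 10 small .pl files), the variable names, and the simplified '\.pl$' pattern are made up for illustration, and the timings it prints will vary with the machine and the module version.

```perl
use strict;
use warnings;

use Archive::Zip qw( :ERROR_CODES );
use Cwd qw( getcwd );
use File::Find;
use File::Path qw( make_path );
use File::Temp qw( tempdir );
use Time::HiRes qw( time );

# Hypothetical test tree: 20 directories with 10 small files each.
my $start_dir = getcwd();
my $root      = tempdir( CLEANUP => 1 );
for my $d ( 1 .. 20 ) {
    make_path("$root/dir$d");
    for my $f ( 1 .. 10 ) {
        open my $fh, '>', "$root/dir$d/file$f.pl"
            or die "Cannot create file: $!";
        print {$fh} "print qq{hello\\n};\n";
        close $fh or die "Cannot close file: $!";
    }
}
chdir $root or die "Cannot chdir to $root: $!";

# Approach 1: addTreeMatching with undef as the destination name.
my $t0       = time();
my $zip_tree = Archive::Zip->new;
$zip_tree->addTreeMatching( '.', undef, '\.pl$' ) == AZ_OK
    or die "addTreeMatching failed";
my $tree_secs = time() - $t0;

# Approach 2: File::Find plus addFile with an explicit member name.
$t0 = time();
my $zip_find = Archive::Zip->new;
find(
    {
        no_chdir => 1,    # stay in $root so relative names remain valid
        wanted   => sub {
            my $name = $File::Find::name;
            return unless $name =~ /\.pl\z/ && -f $name;
            ( my $member_name = $name ) =~ s{\A\./}{};    # drop leading ./
            $zip_find->addFile( $name, $member_name )
                or warn "Cannot add $name\n";
        },
    },
    '.'
);
my $find_secs = time() - $t0;

printf "addTreeMatching:      %.4f s, %d members\n",
    $tree_secs, $zip_tree->numberOfMembers;
printf "File::Find + addFile: %.4f s, %d members\n",
    $find_secs, $zip_find->numberOfMembers;

# Leave the temporary directory so it can be cleaned up at exit.
chdir $start_dir or die "Cannot chdir back to $start_dir: $!";
```

Neither archive is written to disk here, so the sketch only times how long it takes to collect the members. Timing writeToFileNamed separately would show whether compression dominates either way.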

thanx.

pat
 
A. Sinan Unur

Using perl to archive a bunch of things in a bunch of locations (into
a ZIP file).

Was using functions (for each tree) such as this:

It would have been better if you had posted a short but complete script.

sub process_file_develtree {
$this_file = $File::Find::name;
if ($this_file =~
m/.*\.(bas$|cpp$|pl$|[ce]$|xml$|sql$|vba$|vbs$)/i) {

This is unnecessarily hard to read:

/\.(bas|cpp|pl|[ce]|xml|sql|vba|vbs)\z/i

should be better.

print $this_file . "\n";
$zip->addFile($this_file);
}
}

The above, I found, gave me absolute pathnames in the ZIP file. And
on my target machine I never found a way of unziping except to do an
unzip -l. Get the name and then extract to stdout and capture the
file that way.

Looking then more closely at the ZIP module, it _seems_ the only way
to get relative pathnames

The fact that you want relative pathnames means all of these files live
under the same root. What is wrong with a chdir to that directory?

$zip->addTreeMatching( '.', undef,
'\.(bas$|cpp$|pl$|[ce]$|xml$|sql$|vba$|vbs$)' );

Why not mention which module you are using? I am going to assume
Archive::Zip. I am also not going to bother with writing separate
implementations and profiling them, but instead take your word that this
is slow.

In that case, I wonder why you don't want to use the second, optional,
argument to addFile:

#!/usr/bin/perl

use strict;
use warnings;

use Archive::Zip qw( :ERROR_CODES :CONSTANTS );
use File::Find;
use File::Spec::Functions qw( catfile canonpath );

# chdir with no argument changes to $ENV{HOME}
chdir or die "Cannot chdir to $ENV{HOME}: $!";

my $zip = Archive::Zip->new;

find(\&zip_adder, '.');

$zip->writeToFileNamed('test.zip') == AZ_OK
    or die "Cannot write to test.zip: $!";

sub zip_adder {
    # name relative to the starting directory, with the leading ./ removed
    my $this = canonpath $File::Find::name;
    if ( $this =~ /\.pl\z/i ) {
        # first argument: file on disk; second: name stored in the archive
        $zip->addFile(catfile($ENV{HOME}, $this), $this)
            or warn "Cannot add $this\n";
    }
}

__END__

D:\Home\asu1\UseNet\clpmisc> zip-test.pl

D:\Home\asu1\UseNet\clpmisc> unzip -l ..\..\test.zip

....

308 11/10/05 17:48 UseNet/clpmisc/proc/mon.pl
676 11/10/05 18:07 UseNet/clpmisc/proc/proc.pl
339 02/21/05 18:10 UseNet/clpmisc/s/c.pl
368 02/21/05 18:10 UseNet/clpmisc/s/s.pl
774 04/14/05 13:11 UseNet/clpmisc/Scalar-Util-Clone-0.04/Makefile.PL
112 01/31/06 17:01 UseNet/clpmisc/t/t.pl
544 03/15/05 15:18 UseNet/clpmisc/T1/Makefile.PL
383 01/25/05 15:34 UseNet/clpmisc/test/myren.pl

....
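
The effect of that second argument can also be seen in isolation. This is a minimal sketch, assuming Archive::Zip is installed; the temporary directory and the file name src/hello.pl are invented for the example. The first argument says where the file lives on disk, and the second is the name stored in the archive.

```perl
use strict;
use warnings;

use Archive::Zip;
use File::Path qw( make_path );
use File::Temp qw( tempdir );

# Create one throwaway file under a temporary (absolute) directory.
my $root = tempdir( CLEANUP => 1 );
make_path("$root/src");
my $path = "$root/src/hello.pl";
open my $fh, '>', $path or die "Cannot create $path: $!";
print {$fh} "print qq{hello\\n};\n";
close $fh or die "Cannot close $path: $!";

my $zip = Archive::Zip->new;

# Absolute path on disk, relative name inside the archive.
my $member = $zip->addFile( $path, 'src/hello.pl' )
    or die "Cannot add $path";

# List what the archive would contain.
print "$_\n" for $zip->memberNames;
```

Because the archive name is given explicitly, it no longer depends on the current directory or on how the path on disk was spelled.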
 
Patrick Flaherty

Hi,

thanx for your response. Mine below.

It would have been better if you had posted a short but complete script.

The script itself is not short (it covers a series of locations). I tried to
determine what an intelligible functional block would be. Apparently you don't
fully agree.

sub process_file_develtree {
$this_file = $File::Find::name;
if ($this_file =~
m/.*\.(bas$|cpp$|pl$|[ce]$|xml$|sql$|vba$|vbs$)/i) {

This is unnecessarily hard to read:

/\.(bas|cpp|pl|[ce]|xml|sql|vba|vbs)\z/i

should be better.

Don't quite understand your point here. Mostly you seem to be taking the '$'
(end-of-name)s off.

The fact that you want relative pathnames means all of these files live
under the same root. What is wrong with a chdir to that directory?

Yes, that's exactly what I use: chdir (in my new script, that is; "new" being
addTreeMatching() as opposed to the old addFile()).

$zip->addTreeMatching( '.', undef,
'\.(bas$|cpp$|pl$|[ce]$|xml$|sql$|vba$|vbs$)' );

Why not mention which module you are using? I am going to assume
Archive::Zip. I am also not going to bother with writing separate
implementations and profiling them, but instead take your word that this
is slow.

Yes, I'm using Archive::Zip.

In that case, I wonder why you don't want to use the second, optional,
argument to addFile:

Ah. The second, optional argument to addFile: that would seem to be the key
(um, so to speak).

Thanx. I'll give it a try.

pat
 
George

Patrick said:
m/.*\.(bas$|cpp$|pl$|[ce]$|xml$|sql$|vba$|vbs$)/i) {

A. Sinan Unur says...
This is unnecessarily hard to read:

/\.(bas|cpp|pl|[ce]|xml|sql|vba|vbs)\z/i

should be better.

Patrick said:
Don't quite understand your point here. Mostly you seem to be taking the '$'
(end-of-name)s off.

....and replacing them with one single "\z" at the end.

I find Sinan's regular expression easier to read, and it's equivalent to
the original regular expression except for the subtle difference
between "$" (which matches at the end, or before a newline at the end) and
"\z" (which matches only at the end), but I don't think this matters
very much.
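
The difference is easy to see with a name that ends in a newline. A small sketch, using only core Perl; the two sample names and the variable names are made up for the demonstration:

```perl
use strict;
use warnings;

# Same alternation, anchored two different ways.
my $with_dollar = qr/\.(bas|cpp|pl|[ce]|xml|sql|vba|vbs)$/i;
my $with_z      = qr/\.(bas|cpp|pl|[ce]|xml|sql|vba|vbs)\z/i;

for my $name ( "test.bas", "test.bas\n" ) {
    # Make the trailing newline visible when printing.
    ( my $shown = $name ) =~ s/\n/\\n/g;

    my $dollar_result = $name =~ $with_dollar ? 'match' : 'no match';
    my $z_result      = $name =~ $with_z      ? 'match' : 'no match';

    print "$shown: \$ gives $dollar_result, \\z gives $z_result\n";
}
```

Both patterns accept "test.bas", but only the "$" version also accepts "test.bas\n".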
 
A. Sinan Unur

Patrick said:
m/.*\.(bas$|cpp$|pl$|[ce]$|xml$|sql$|vba$|vbs$)/i) {

A. Sinan Unur says...
This is unnecessarily hard to read:

/\.(bas|cpp|pl|[ce]|xml|sql|vba|vbs)\z/i

should be better.

Patrick said:
Don't quite understand your point here. Mostly you seem to be taking
the '$' (end-of-name)s off.

...and replacing them with one single "\z" at the end.

And removing the completely unnecessary .* at the beginning.

I find Sinan's regular expression easier to read and it's equivalent
to the original regular expression ===> except for the subtle
difference between "$" (which matches the end or before newline at the
end) and "\z" (which matches only at the end), but I don't think this
matters very much.

It depends on whether one is comfortable with occasionally accepting
names such as "test.bas\n" into the archive.

By the way, note that I should have used non-capturing parentheses
above.
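
For reference, a sketch of the same pattern with a non-capturing group, (?:...), which groups the alternatives without saving the matched extension in $1. The list of file names is invented for the example, and the extension list follows the original post (ending in vbs):

```perl
use strict;
use warnings;

# (?:...) groups the alternation but does not capture it.
my $ext_re = qr/\.(?:bas|cpp|pl|[ce]|xml|sql|vba|vbs)\z/i;

# Hypothetical file names to filter.
my @names = qw( main.c Build.PL notes.txt query.SQL archive.c.bak );

my @wanted = grep { $_ =~ $ext_re } @names;

print "$_\n" for @wanted;
```

Only names whose last extension is in the list pass the filter, so notes.txt and archive.c.bak are dropped.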

Sinan
 
