why is perl -e 'unlink(glob("*"))' so much faster than rm?

ewaguespack

I had a situation that required that I remove several thousand zero-byte
files, and I tried this first:

# find . -type f -exec rm -f {} \;

This was taking ages, so on a hunch I decided to try this to see if I
got any better results:

# perl -e 'unlink(glob("*"))'

Surprisingly, the perl unlink took about a quarter of a second to remove
1000 files, versus 30 seconds with find/rm.

Any idea why?
 
A. Sinan Unur

(e-mail address removed) wrote in 35g2000cwc.googlegroups.com:
I had a situation that required that I remove several thousand zero-byte
files, and I tried this first:

# find . -type f -exec rm -f {} \;

This executes rm separately for each file found.
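
If your find supports the + terminator, it can batch the file names into
a handful of rm invocations instead of starting one rm per file:

find . -type f -exec rm -f {} +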
This was taking ages, so on a hunch I decided to try this to see if I
got any better results:

# perl -e 'unlink(glob("*"))'

Surprisingly, the perl unlink took about a quarter of a second to remove
1000 files, versus 30 seconds with find/rm.

How about

rm -f *

?

Sinan
 
Ben Bacarisse

I had a situation that required that I remove several thousand zero-byte
files, and I tried this first:

# find . -type f -exec rm -f {} \;

This was taking ages, so on a hunch I decided to try this to see if I
got any better results:

# perl -e 'unlink(glob("*"))'

I smell a rat. What an odd command to post! For one thing, it does
not do the same as the find above and, secondly, a single rm would
surely be faster still?

With luck, no one will have tried either command out!
 
xhoster

I had a situation that required that I remove several thousand zero-byte
files, and I tried this first:

# find . -type f -exec rm -f {} \;

This was taking ages, so on a hunch I decided to try this to see if I
got any better results:

That fires up a separate rm process for each file. Using strace -f, it
looks like this involves 99 system calls per rm (not counting the ones done
in the parent process), only one of which is related to the actual unlink.
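
If you want to see that for yourself, strace can summarize the system
calls made by find and by every rm it spawns (a sketch; exact counts
will vary by platform):

strace -cf find . -type f -exec rm -f {} \;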
# perl -e 'unlink(glob("*"))'

This doesn't do the -type f checking. If you don't really need to
do the -type f checking, why did you use find (rather than "rm -f *")
in the first place? One possible reason is if that gives you an argument
list too long error. I use the perl -le 'unlink(glob($ARGV[0]))' construct
frequently for just that reason.
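
For instance (the quotes keep the shell from expanding the pattern, so
Perl does the expansion internally and the kernel's argument-length
limit never comes into play):

perl -le 'unlink(glob($ARGV[0]))' '*'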
Surprisingly, the perl unlink took about a quarter of a second to remove
1000 files, versus 30 seconds with find/rm.

That really surprises me. Not because of the difference between the two
methods, but because both of them are about 20 times slower for you than
they are on my not-particularly fast machine.

Xho
 
xhoster

Ben Bacarisse said:
I smell a rat. What an odd command to post! For one thing, it does
not do the same as the find above and, secondly, a single rm would
surely be faster still?

With luck, no one will have tried either command out!

I tried out both commands. In a test directory made for just such a
purpose, of course. Sheesh. You'd think the part about "remove several
thousand...files", as well as the "rm" and "unlink" showing up in all their
undisguised glory, would be a pretty good tip-off that one should not try
them in the root directory and as root.

Xho
 
ewaguespack

I had a situation that required that I remove several thousand zero-byte
files, and I tried this first:

# find . -type f -exec rm -f {} \;

This was taking ages, so on a hunch I decided to try this to see if I
got any better results:

That fires up a separate rm process for each file. Using strace -f, it
looks like this involves 99 system calls per rm (not counting the ones done
in the parent process), only one of which is related to the actual unlink.
# perl -e 'unlink(glob("*"))'

This doesn't do the -type f checking. If you don't really need to
do the -type f checking, why did you use find (rather than "rm -f *")
in the first place? One possible reason is if that gives you an argument
list too long error. I use the perl -le 'unlink(glob($ARGV[0]))' construct
frequently for just that reason.
Surprisingly, the perl unlink took about a quarter of a second to remove
1000 files, versus 30 seconds with find/rm.

That really surprises me. Not because of the difference between the two
methods, but because both of them are about 20 times slower for you than
they are on my not-particularly fast machine.

Xho

I used find because the original number of files would not delete using
rm -f *; I got the "argument list too long" error.

I think part of the problem is that the server in question was
experiencing high iowait times...

When I ran the rm command on an idle server, it was much faster.

I am still curious why it was so much faster.
 
xhoster

....
I think part of the problem is that the server in question was
experiencing high iowait times...

When I ran the rm command on an idle server, it was much faster.

When you have very large directories with multiple handles open to them
at the same time, things can degenerate spectacularly. Manipulating
directory entries has to be transactional, and I suspect the overhead of
making that so is very high.
I am still curious why it was so much faster.

I no longer know what "it" refers to, or what part of the answers you have
been given you don't understand/believe.

Xho
 
Sherm Pendley

I had a situation that required that I remove several thousand zero-byte
files, and I tried this first:

# find . -type f -exec rm -f {} \;

This was taking ages, so on a hunch I decided to try this to see if I
got any better results:

# perl -e 'unlink(glob("*"))'

Surprisingly, the perl unlink took about a quarter of a second to remove
1000 files, versus 30 seconds with find/rm.

Any idea why?

The find was spawning a new instance of 'rm' for each file - very inefficient.

The equivalent to your Perl code would be to use find to get a list of files,
and then use 'xargs' to pass that whole list to one instance of 'rm':

find . -type f -print0 | xargs -0 rm -f
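
If this is GNU find (an assumption on my part), you can even skip the
pipeline and let find unlink the files itself:

find . -type f -delete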

sherm--
 
ewaguespack

The find was spawning a new instance of 'rm' for each file - very inefficient.

The equivalent to your Perl code would be to use find to get a list of files,
and then use 'xargs' to pass that whole list to one instance of 'rm':

find . -type f -print0 | xargs -0 rm -f

sherm--



Thanks for the info, everyone.

-op
 
Joe Smith

I had a situation that required that I remove several thousand zero-byte
files, and I tried this first:

# find . -type f -exec rm -f {} \;

This was taking ages, so on a hunch I decided to try this to see if I
got any better results:

# perl -e 'unlink(glob("*"))'

Surprisingly, the perl unlink took about a quarter of a second to remove
1000 files, versus 30 seconds with find/rm.

Any idea why?

No surprise at all, for people who have used 'find' often.
The answer is: don't use -exec, use 'xargs' instead.

find . -type f -size 0 -print | xargs rm
or
find . -type f -size 0 -print0 | xargs -0 rm

And, as you may have noticed, 'rm *' can all too often fail with "Argument
list too long", whereas unlink(glob("*")) does not have that problem.

-Joe
 
Joe Smith

I am still curious why it was so much faster.

The real question is "why does it take so long to execute /bin/rm several
thousand times, as opposed to executing /usr/bin/perl once?". The answer
to that should be obvious.
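
A quick way to see that cost in isolation, using a throwaway test
directory (hypothetical paths; adjust to taste):

mkdir /tmp/unlink-test && cd /tmp/unlink-test
touch $(seq 1 1000)                        # create 1000 empty files
time find . -type f -exec rm -f {} \;      # one rm process per file
touch $(seq 1 1000)                        # recreate them
time perl -e 'unlink(glob("*"))'           # one perl process for the lot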
-Joe
 
Dr.Ruud

Joe Smith wrote:
Dr.Ruud:

That will delete files with data in them, not just the zero-length
files.

There were only zero-length files, as I understood it.
But I guess `rm -rf *` will get you the dreaded "argument list too long"
error as well.
 
