why is perl -e 'unlink(glob("*"))' so much faster than rm?

ewaguespack

I had a situation that required that I remove several thousand zero-byte
files, and I tried this first:

# find . -type f -exec rm -f {} \;

This was taking ages, so on a hunch I decided to try this to see if I
got any better results:

# perl -e 'unlink(glob("*"))'

Surprisingly, the perl unlink took about a quarter of a second to remove
1000 files, versus 30 seconds with find/rm.

Any idea why?
 
A. Sinan Unur

(e-mail address removed) wrote in 35g2000cwc.googlegroups.com:
I had a situation that required that I remove several thousand zero-byte
files, and I tried this first:

# find . -type f -exec rm -f {} \;

This executes rm separately for each file found.
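
If your find supports the + terminator, it can batch the file names into
a handful of rm invocations instead of starting one rm per file:

find . -type f -exec rm -f {} +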
This was taking ages, so on a hunch I decided to try this to see if I
got any better results:

# perl -e 'unlink(glob("*"))'

Surprisingly, the perl unlink took about a quarter of a second to remove
1000 files, versus 30 seconds with find/rm.

How about

rm -f *

?

Sinan
 
Ben Bacarisse

I had a situation that required that I remove several thousand zero-byte
files, and I tried this first:

# find . -type f -exec rm -f {} \;

This was taking ages, so on a hunch I decided to try this to see if I
got any better results:

# perl -e 'unlink(glob("*"))'

I smell a rat. What an odd command to post! For one thing, it does
not do the same as the find above and, secondly, a single rm would
surely be faster still?

With luck, no one will have tried either command out!
 
xhoster

I had a situation that required that I remove several thousand zero-byte
files, and I tried this first:

# find . -type f -exec rm -f {} \;

This was taking ages, so on a hunch I decided to try this to see if I
got any better results:

That fires up a separate rm process for each file. Using strace -f, it
looks like this involves 99 system calls per rm (not counting the ones done
in the parent process), only one of which is related to the actual unlink.
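
If you want to see that for yourself, strace can summarize the system
calls made by find and by every rm it spawns (a sketch; exact counts
will vary by platform):

strace -cf find . -type f -exec rm -f {} \;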
# perl -e 'unlink(glob("*"))'

This doesn't do the -type f checking. If you don't really need to
do the -type f checking, why did you use find (rather than "rm -f *")
in the first place? One possible reason is if that gives you an argument
list too long error. I use the perl -le 'unlink(glob($ARGV[0]))' construct
frequently for just that reason.
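
For instance (the quotes keep the shell from expanding the pattern, so
Perl does the expansion internally and the kernel's argument-length
limit never comes into play):

perl -le 'unlink(glob($ARGV[0]))' '*'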
Surprisingly, the perl unlink took about a quarter of a second to remove
1000 files, versus 30 seconds with find/rm.

That really surprises me. Not because of the difference between the two
methods, but because both of them are about 20 times slower for you than
they are on my not-particularly fast machine.

Xho
 
xhoster

Ben Bacarisse said:
I smell a rat. What an odd command to post! For one thing, it does
not do the same as the find above and, secondly, a single rm would
surely be faster still?

With luck, no one will have tried either command out!

I tried out both commands. In a test directory made for just such a
purpose, of course. Sheesh. You'd think the part about "remove several
thousand...files", as well as the "rm" and "unlink" showing up in all their
undisguised glory, would be a pretty good tip-off that one should not try
them in the root directory and as root.

Xho
 
ewaguespack

I had a situation that required that I remove several thousand zero-byte
files, and I tried this first:

# find . -type f -exec rm -f {} \;

This was taking ages, so on a hunch I decided to try this to see if I
got any better results:

That fires up a separate rm process for each file. Using strace -f, it
looks like this involves 99 system calls per rm (not counting the ones done
in the parent process), only one of which is related to the actual unlink.
# perl -e 'unlink(glob("*"))'

This doesn't do the -type f checking. If you don't really need to
do the -type f checking, why did you use find (rather than "rm -f *")
in the first place? One possible reason is if that gives you an argument
list too long error. I use the perl -le 'unlink(glob($ARGV[0]))' construct
frequently for just that reason.
Surprisingly, the perl unlink took about a quarter of a second to remove
1000 files, versus 30 seconds with find/rm.

That really surprises me. Not because of the difference between the two
methods, but because both of them are about 20 times slower for you than
they are on my not-particularly fast machine.

Xho

I used find because the original number of files would not delete using
rm -f *; I got the "argument list too long" error.

I think part of the problem is that the server in question was
experiencing high iowait times...

When I ran the rm command on an idle server, it was much faster.

I am still curious why it was so much faster.
 
xhoster

....
I think part of the problem is that the server in question was
experiencing high iowait times...

When I ran the rm command on an idle server, it was much faster.

When you have very large directories with multiple handles open to them
at the same time, things can degenerate spectacularly. Manipulating
directory entries has to be transactional, and I suspect the overhead of
making that so is very high.
I am still curious why it was so much faster.

I no longer know what "it" refers to, or what part of the answers you have
been given you don't understand/believe.

Xho
 
Sherm Pendley

I had a situation that required that I remove several thousand zero-byte
files, and I tried this first:

# find . -type f -exec rm -f {} \;

This was taking ages, so on a hunch I decided to try this to see if I
got any better results:

# perl -e 'unlink(glob("*"))'

Surprisingly, the perl unlink took about a quarter of a second to remove
1000 files, versus 30 seconds with find/rm.

Any idea why?

The find was spawning a new instance of 'rm' for each file - very inefficient.

The equivalent to your Perl code would be to use find to get a list of files,
and then use 'xargs' to pass that whole list to one instance of 'rm':

find . -type f -print0 | xargs -0 rm -f
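
If this is GNU find (an assumption on my part), you can even skip the
pipeline and let find unlink the files itself:

find . -type f -delete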

sherm--
 
ewaguespack

The find was spawning a new instance of 'rm' for each file - very inefficient.

The equivalent to your Perl code would be to use find to get a list of files,
and then use 'xargs' to pass that whole list to one instance of 'rm':

find . -type f -print0 | xargs -0 rm -f

sherm--



Thanks for the info, everyone.

-op
 
Joe Smith

I had a situation that required that I remove several thousand zero-byte
files, and I tried this first:

# find . -type f -exec rm -f {} \;

This was taking ages, so on a hunch I decided to try this to see if I
got any better results:

# perl -e 'unlink(glob("*"))'

Surprisingly, the perl unlink took about a quarter of a second to remove
1000 files, versus 30 seconds with find/rm.

Any idea why?

No surprise at all, for people who have used 'find' often.
The answer is: don't use -exec, use 'xargs' instead.

find . -type f -size 0 -print | xargs rm
or
find . -type f -size 0 -print0 | xargs -0 rm

And, as you may have noticed, 'rm *' can all too often fail with "Argument
list too long", whereas unlink(glob("*")) does not have that problem.

-Joe
 
Joe Smith

I am still curious why it was so much faster.

The real question is "why does it take so long to execute /bin/rm several
thousand times, as opposed to executing /usr/bin/perl once?". The answer
to that should be obvious.
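
A quick way to see that cost in isolation, using a throwaway test
directory (hypothetical paths; adjust to taste):

mkdir /tmp/unlink-test && cd /tmp/unlink-test
touch $(seq 1 1000)                        # create 1000 empty files
time find . -type f -exec rm -f {} \;      # one rm process per file
touch $(seq 1 1000)                        # recreate them
time perl -e 'unlink(glob("*"))'           # one perl process for the lot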
-Joe
 
Dr.Ruud

Joe Smith wrote:
Dr.Ruud:

That will delete files with data in them, not just the zero-length
files.

There were only zero-length files, as I understood it.
But I guess `rm -rf *` will get you the dreaded "argument list too long"
error as well.
 
