why is perl -e 'unlink(glob("*"))' so much faster than rm ?

Discussion in 'Perl Misc' started by ewaguespack@gmail.com, Jul 17, 2006.

  1. Guest

    i had a situation that required that i remove several thousand zero
    byte files, and i tried this first:

    # find . -type f -exec rm -f {} \;

    this was taking ages, so on a hunch I decided to try this to see if I
    got any better results:

    # perl -e 'unlink(glob("*"))'

    surprisingly the perl unlink took about a quarter of a second to remove
    1000 files versus 30 seconds with find / rm

    any idea why?
    , Jul 17, 2006
    #1

  2. wrote in news:1153149395.583924.157680@35g2000cwc.googlegroups.com:

    > i had a situation that required that i remove several thousand zero
    > byte files, and i tried this first:
    >
    > # find . -type f -exec rm -f {} \;


    This executes rm separately for each file found.

    > this was taking ages, so on a hunch I decided to try this to see if I
    > got any better results:
    >
    > # perl -e 'unlink(glob("*"))'
    >
    > surprisingly the perl unlink took about a quarter of a second to remove
    > 1000 files versus 30 seconds with find / rm


    How about

    rm -f *

    ?

    Sinan
    A. Sinan Unur, Jul 17, 2006
    #2

  3. wrote:
    > i had a situation that required that i remove several thousand zero
    > byte files, and i tried this first:
    >
    > # find . -type f -exec rm -f {} \;
    >
    > this was taking ages, so on a hunch I decided to try this to see if I
    > got any better results:
    >
    > # perl -e 'unlink(glob("*"))'


    I smell a rat. What an odd command to post! For one thing, it does
    not do the same as the find above and, secondly, a single rm would
    surely be faster still?

    With luck, no one will have tried either command out!

    --
    Ben.
    Ben Bacarisse, Jul 17, 2006
    #3
  4. Dr.Ruud Guest

    Glenn Jackman wrote:

    >> rm -f *

    >
    > These solutions look in the current directory only.


    rm -rf *

    --
    Affijn, Ruud

    "Gewoon is een tijger."
    Dr.Ruud, Jul 17, 2006
    #4
  5. Guest

    wrote:
    > i had a situation that required that i remove several thousand zero
    > byte files, and i tried this first:
    >
    > # find . -type f -exec rm -f {} \;
    >
    > this was taking ages, so on a hunch I decided to try this to see if I
    > got any better results:


    That fires up a separate rm process for each file. Using strace -f, it
    looks like this involves 99 system calls per rm (not counting the ones done
    in the parent process), only one of which is related to the actual unlink.
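
    A rough way to see the per-file process cost, sketched in a scratch
    directory with echo standing in for rm. The directory and file names
    are made up for illustration, and the `{} +` form assumes a find that
    implements the POSIX batching variant of -exec:

```shell
#!/bin/sh
# Work in a throwaway directory so nothing real is touched.
demo_dir=$(mktemp -d)
for i in 1 2 3 4 5; do : > "$demo_dir/file$i"; done

# '\;' runs the command once per file: five invocations, five output lines.
per_file=$(find "$demo_dir" -type f -exec echo run {} \; | wc -l | tr -d ' ')

# '+' packs as many names as fit into one invocation: a single line here.
batched=$(find "$demo_dir" -type f -exec echo run {} + | wc -l | tr -d ' ')

echo "per-file invocations: $per_file, batched invocations: $batched"
rm -rf "$demo_dir"
```

    Each of those per-file invocations is a full fork and exec of the
    command, which is where the extra system calls come from.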

    > # perl -e 'unlink(glob("*"))'


    This doesn't do the -type f checking. If you don't really need to
    do the -type f checking, why did you use find (rather than "rm -f *")
    in the first place? One possible reason is if that gives you an argument
    list too long error. I use the perl -le 'unlink(glob($ARGV[0]))' construct
    frequently for just that reason.
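
    For comparison, a shell-only sketch of the same idea: keep the
    expanded file list away from exec so the limit never applies. It
    assumes printf is a shell builtin (true in bash and dash) and an
    xargs that supports -0; the scratch directory and file names are
    hypothetical:

```shell
#!/bin/sh
# Upper bound, in bytes, on the arguments plus environment of one exec call.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX on this system: $arg_max bytes"

demo_dir=$(mktemp -d)
for i in 1 2 3 4 5; do : > "$demo_dir/empty$i"; done

# The glob is expanded by the shell and consumed by the builtin printf,
# so it never passes through exec. xargs then splits the NUL-separated
# names into batches that fit under the limit.
(cd "$demo_dir" && printf '%s\0' * | xargs -0 rm -f --)

remaining=$(find "$demo_dir" -type f | wc -l | tr -d ' ')
echo "files remaining: $remaining"
rm -rf "$demo_dir"
```

    Like the perl one-liner, this does no -type f check; rm -f simply
    fails on any directories the glob picks up.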

    > surprisingly the perl unlink took about a quarter of a second to remove
    > 1000 files versus 30 seconds with find / rm


    That really surprises me. Not because of the difference between the two
    methods, but because both of them are about 20 times slower for you than
    they are on my not-particularly fast machine.

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    Usenet Newsgroup Service $9.95/Month 30GB
    , Jul 17, 2006
    #5
  6. Guest

    Ben Bacarisse <> wrote:
    > wrote:
    > > i had a situation that required that i remove several thousand zero
    > > byte files, and i tried this first:
    > >
    > > # find . -type f -exec rm -f {} \;
    > >
    > > this was taking ages, so on a hunch I decided to try this to see if I
    > > got any better results:
    > >
    > > # perl -e 'unlink(glob("*"))'

    >
    > I smell a rat. What an odd command to post! For one thing, it does
    > not do the same as the find above and, secondly, a single rm would
    > surely be faster still?
    >
    > With luck, no one will have tried either command out!


    I tried out both commands. In a test directory made for just such a
    purpose, of course. Sheesh. You'd think the part about "remove several
    thousand...files" as well as the "rm" and "unlink" showing up in all their
    undisguised glory would be a pretty good tip-off that one should not try
    them in root and as root.

    Xho

    , Jul 17, 2006
    #6
  7. Guest

    wrote:
    > wrote:
    > > i had a situation that required that i remove several thousand zero
    > > byte files, and i tried this first:
    > >
    > > # find . -type f -exec rm -f {} \;
    > >
    > > this was taking ages, so on a hunch I decided to try this to see if I
    > > got any better results:

    >
    > That fires up a separate rm process for each file. Using strace -f, it
    > looks like this involves 99 system calls per rm (not counting the ones done
    > in the parent process), only one of which is related to the actual unlink.
    >
    > > # perl -e 'unlink(glob("*"))'

    >
    > This doesn't do the -type f checking. If you don't really need to
    > do the -type f checking, why did you use find (rather than "rm -f *")
    > in the first place? One possible reason is if that gives you an argument
    > list too long error. I use the perl -le 'unlink(glob($ARGV[0]))' construct
    > frequently for just that reason.
    >
    > > surprisingly the perl unlink took about a quarter of a second to remove
    > > 1000 files versus 30 seconds with find / rm

    >
    > That really surprises me. Not because of the difference between the two
    > methods, but because both of them are about 20 times slower for you than
    > they are on my not-particularly fast machine.
    >
    > Xho


    I used find because there were too many files for rm -f *; it failed
    with the "argument list is too long" error.

    I think part of the problem is that the server in question was
    experiencing high iowait times.

    When I ran the rm command on an idle server it was much faster.

    I am still curious why it was so much faster.
    , Jul 17, 2006
    #7
  8. Guest

    wrote:
    > wrote:
    > > wrote:
    > > > i had a situation that required that i remove several thousand zero
    > > > byte files, and i tried this first:
    > > >
    > > > # find . -type f -exec rm -f {} \;
    > > >
    > > > this was taking ages, so on a hunch I decided to try this to see if I
    > > > got any better results:

    > >
    > > That fires up a separate rm process for each file. Using strace -f, it
    > > looks like this involves 99 system calls per rm (not counting the ones
    > > done in the parent process), only one of which is related to the actual
    > > unlink.
    > >

    ....
    > i think part of the problem is that the server in question was
    > experiencing high iowait times....
    >
    > when I ran the rm command on an idle server it was much faster.


    When you have very large directories with multiple handles open to them
    at the same time, things can degenerate spectacularly. Manipulating
    directory entries has to be transactional, and I suspect the overhead of
    making that so is very high.

    > I am still curious why it was so much faster.


    I no longer know what "it" refers to, or what part of the answers you have
    been given you don't understand/believe.

    Xho

    , Jul 17, 2006
    #8
  9. writes:

    > i had a situation that required that i remove several thousand zero
    > byte files, and i tried this first:
    >
    > # find . -type f -exec rm -f {} \;
    >
    > this was taking ages, so on a hunch I decided to try this to see if I
    > got any better results:
    >
    > # perl -e 'unlink(glob("*"))'
    >
    > surprisingly the perl unlink took about a quarter of a second to remove
    > 1000 files versus 30 seconds with find / rm
    >
    > any idea why?


    The find was spawning a new instance of 'rm' for each file - very inefficient.

    The equivalent to your Perl code would be to use find to get a list of files,
    and then use 'xargs' to pass that whole list to one instance of 'rm':

    find . -type f -print0 | xargs -0 rm -f
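
    A small sketch of why the -print0 / -0 pair matters, using a child
    sh that just reports how many arguments it received. The file names
    are invented for the demonstration, and it assumes the temporary
    directory path itself contains no whitespace:

```shell
#!/bin/sh
demo_dir=$(mktemp -d)
: > "$demo_dir/plain"
: > "$demo_dir/with space"
: > "$demo_dir/third"

# Newline-delimited names: xargs also splits on the space inside
# "with space", so the single child process sees 4 arguments.
split_count=$(find "$demo_dir" -type f -print | xargs sh -c 'echo "$#"' sh)

# NUL-delimited names: the 3 files arrive as exactly 3 arguments,
# still in a single child process.
safe_count=$(find "$demo_dir" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

echo "with -print: $split_count arguments; with -print0: $safe_count arguments"
rm -rf "$demo_dir"
```

    With rm in place of the counting sh, the first form would try to
    remove two names that do not exist and leave "with space" behind.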

    sherm--

    --
    Web Hosting by West Virginians, for West Virginians: http://wv-www.net
    Cocoa programming in Perl: http://camelbones.sourceforge.net
    Sherm Pendley, Jul 17, 2006
    #9
  10. Guest

    >
    > The find was spawning a new instance of 'rm' for each file - very inefficient.
    >
    > The equivalent to your Perl code would be to use find to get a list of files,
    > and then use 'xargs' to pass that whole list to one instance of 'rm':
    >
    > find . -type f -print0 | xargs -0 rm -f
    >
    > sherm--




    thanks for the info everyone.

    -op
    , Jul 17, 2006
    #10
  11. Joe Smith Guest

    Dr.Ruud wrote:
    > Glenn Jackman schreef:
    >
    >>> rm -f *

    >> These solutions look in the current directory only.

    >
    > rm -rf *


    That will delete files with data in them, not just the zero-length files.
    -Joe
    Joe Smith, Jul 19, 2006
    #11
  12. Joe Smith Guest

    wrote:
    > i had a situation that required that i remove several thousand zero
    > byte files, and i tried this first:
    >
    > # find . -type f -exec rm -f {} \;
    >
    > this was taking ages, so on a hunch I decided to try this to see if I
    > got any better results:
    >
    > # perl -e 'unlink(glob("*"))'
    >
    > surprisingly the perl unlink took about a quarter of a second to remove
    > 1000 files versus 30 seconds with find / rm
    >
    > any idea why?


    No surprise at all, for people who have used 'find' often.
    The answer is: don't use -exec, use 'xargs' instead.

    find . -type f -size 0 -print | xargs rm
    or
    find . -type f -size 0 -print0 | xargs -0 rm

    And, as you may have noticed, 'rm *' can all too often fail with "Argument
    list too long", whereas unlink(glob("*")) does not have that problem.
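
    To check that the -size 0 test really spares files with data in
    them, something like this can be run in a scratch directory first
    (the directory and file names are hypothetical):

```shell
#!/bin/sh
demo_dir=$(mktemp -d)
: > "$demo_dir/empty1"
: > "$demo_dir/empty2"
echo "keep me" > "$demo_dir/has_data"

# -size 0 matches only files that occupy zero blocks, i.e. empty ones,
# so has_data is never passed to rm.
find "$demo_dir" -type f -size 0 -print0 | xargs -0 rm -f

survivors=$(ls "$demo_dir")
echo "left behind: $survivors"
rm -rf "$demo_dir"
```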

    -Joe
    Joe Smith, Jul 19, 2006
    #12
  13. Joe Smith Guest

    wrote:

    > I am still curious why it was so much faster.


    The real question is "why does it take so long to execute /bin/rm several
    thousand times, as opposed to executing /usr/bin/perl once?". The answer
    to that should be obvious.
    -Joe
    Joe Smith, Jul 19, 2006
    #13
  14. Dr.Ruud Guest

    Joe Smith wrote:
    > Dr.Ruud:
    >> Glenn Jackman:


    >>>> rm -f *
    >>>
    >>> These solutions look in the current directory only.

    >>
    >> rm -rf *

    >
    > That will delete files with data in them, not just the zero-length
    > files.


    There were only zero-length files, is what I understood.
    But I guess `rm -rf *` will get you the dreaded "argument list is too
    long" as well.

    --
    Affijn, Ruud

    "Gewoon is een tijger."
    Dr.Ruud, Jul 19, 2006
    #14
