Re: list.sort(func) speed

Discussion in 'Python' started by Michael Peuser, Aug 31, 2003.

  1. "mackstann" <> schrieb im Newsbeitrag
    news:...
    > I have the following function that compares two filenames based on their
    > basename only:
    >
    > _basenameCmp = lambda self, *pair: cmp(*map(os.path.basename, pair))



    Please note that all os.path functions are *awfully* slow! If ever possible
    use a nonportable self made scheme.

    It drove me nearly crazy when I detected that os.path split and join
    operations took even longer than raw disk access in a directory scan
    program....

    (The other issue (decorated sorting) is more relevent of course.)

    Kindly
    Michael P
    Michael Peuser, Aug 31, 2003
    #1
    1. Advertising

  2. Michael Peuser

    Peter Otten Guest

    Michael Peuser wrote:

    > It drove me nearly crazy when I detected that os.path split and join
    > operations took even longer than raw disk access in a directory scan
    > program....


    Now, that was a fast disk :) I fear, that some data was already in the
    cache when your improved version was executed. Please post some sample
    code/timings for the unbelieving.

    > Please note that all os.path functions are *awfully* slow! If ever
    > possible use a nonportable self made scheme.


    If ever possible, use a *portable* scheme shipped with the distribution.
    Chances are that it is ahead of your ad hoc solution in terms of
    correctness. If you have written code that greatly improves on the current
    implementations, you might consider submitting a patch...

    > (The other issue (decorated sorting) is more relevent of course.)

    D'accord. Namely, basename() is not the bottleneck here.

    Backslashes-removingly yours,
    Peter
    Peter Otten, Aug 31, 2003
    #2
    1. Advertising

  3. Michael Peuser

    mackstann Guest

    On Sun, Aug 31, 2003 at 11:21:59AM +0200, Peter Otten wrote:
    > > (The other issue (decorated sorting) is more relevent of course.)

    > D'accord. Namely, basename() is not the bottleneck here.


    Not sure exactly what the bottleneck is, but take this code:

    print time.time()
    songs = [ pair[::-1] for pair in
    enumerate(map(os.path.basename, self.songs)) ]
    print time.time()
    songs.sort()
    print time.time()
    self.songs = [ self.songs[index] for basename,index in songs ]
    print time.time()

    and the output is:

    1062322841.2
    1062322841.25
    1062322841.25
    1062322841.26

    So it seems that the first list comp with [::-1] and basename and
    whatnot is the bottleneck, although I'm not sure exactly which aspect of
    it is the bottleneck. The time between the first two time.time()
    outputs is always either .05 or .06 seconds. I added this little
    function:

    def f(fname):
    place = fname.rfind("/")
    if place != -1:
    return fname[place:]
    else:
    return fname

    And changed the aforementioned line to this:

    songs = [ pair[::-1] for pair in
    enumerate(map(f, self.songs)) ]

    This reduced it to .04 seconds. So I can shave about 10-20 milliseconds
    using this, although I tend to avoid doing things like this, since it
    involves more code for me to worry about, and I definitely like to take
    advantage of Python's "batteries" that they say are included. :) (and
    generally work well and are silly to duplicate)

    So basename() is not the bottleneck, but accounts for perhaps 1/6th of
    the time needed overall. I just wonder if the other 5/6th could be
    reduced further by doing something that I'm not thinking of.

    --
    m a c k s t a n n mack @ incise.org http://incise.org
    The sooner all the animals are dead, the sooner we'll find their
    money.
    -- Ed Bluestone, "The National Lampoon"
    mackstann, Aug 31, 2003
    #3
  4. "Peter Otten" <> schrieb im Newsbeitrag
    news:bisf20$na3$01$-online.com...
    > Michael Peuser wrote:
    >
    > > It drove me nearly crazy when I detected that os.path split and join
    > > operations took even longer than raw disk access in a directory scan
    > > program....

    >
    > Now, that was a fast disk :) I fear, that some data was already in the
    > cache when your improved version was executed. Please post some sample
    > code/timings for the unbelieving.


    I did some weeks ago: news:bhd9vg$1s7$05$-online.com...

    >
    > > Please note that all os.path functions are *awfully* slow! If ever
    > > possible use a nonportable self made scheme.

    >
    > If ever possible, use a *portable* scheme shipped with the distribution.
    > Chances are that it is ahead of your ad hoc solution in terms of
    > correctness. If you have written code that greatly improves on the current
    > implementations, you might consider submitting a patch...


    I am definitly on your side what general portability is concerned. However
    the format of file names is quite well defined and had not changed in
    Windows Unix and MacOs for a long time ;-)

    It depends of course on your clients and they generally work with Windows
    and are happy when you say it *could* run on Unix. (I am a little bit
    unhappy about the situation in this NG, where a lot of posters are not aware
    of expectations, requirements and needs of Windows users. There *is*
    competition in the Windows market!)

    >
    > > (The other issue (decorated sorting) is more relevent of course.)

    > D'accord. Namely, basename() is not the bottleneck here.


    Of course not - this was a side isssue. Reducing a factor of a quadratic
    problem is generally better than a linear speed-up. Though it depends on the
    order of magnitude ;-)

    Kindly
    Michael P
    Michael Peuser, Aug 31, 2003
    #4
  5. Michael Peuser

    Peter Otten Guest

    Michael Peuser wrote:

    > I did some weeks ago: news:bhd9vg$1s7$05$-online.com...

    The link doesn't work for me, but I think I found it with google.

    > the format of file names is quite well defined and had not changed in
    > Windows Unix and MacOs for a long time ;-)


    I've taken a look into the (posix) path implementation, and there is a small
    overhead in basename() (it's implemented using os.path.split() which guards
    against special cases, and returns both folder and filename).

    You recommend win32api.FindFiles in the old thread, which is non-portable
    but definitely not a "self made scheme". I have to admit the numbers are
    striking. Wild guess: time is burnt reading directories rather than
    processing file name strings. Anyway, standard Python seems to have room
    for improvement in the area :-(

    Peter
    Peter Otten, Aug 31, 2003
    #5
  6. Michael Peuser

    Peter Otten Guest

    mackstann wrote:

    > So basename() is not the bottleneck, but accounts for perhaps 1/6th of
    > the time needed overall. I just wonder if the other 5/6th could be
    > reduced further by doing something that I'm not thinking of.


    I've tinkered with your code a bit and could not come up with something
    faster :-(

    However, your songs seem to correspond to files, so why not store them in
    your list as (filename, directory) or (filename, fullpath) tuples in the
    first place?

    Personally, I would go with a Song class, where the artist, length etc.
    could successively be added, but that would rather be trading speed for
    features...

    Peter
    Peter Otten, Aug 31, 2003
    #6
  7. "Peter Otten" <> schrieb im Newsbeitrag
    news:bisoke$ql$01$-online.com...

    > You recommend win32api.FindFiles in the old thread, which is non-portable
    > but definitely not a "self made scheme". I have to admit the numbers are
    > striking. Wild guess: time is burnt reading directories rather than
    > processing file name strings. Anyway, standard Python seems to have room
    > for improvement in the area :-(
    >

    One has to be flexible.... I recall a program change I made some weeks ago,
    where I changed os.path.splitext(x) into x[:-4] because I knew the
    extension of the some 2000 filenames I wanted to display on a Tk widget.
    This improves display time from 1,3 to 0,7 secs.

    Kindly
    Michael P
    Michael Peuser, Aug 31, 2003
    #7
  8. Michael Peuser

    Chad Netzer Guest

    On Sun, 2003-08-31 at 13:33, mackstann wrote:

    > No, but the thought of inserting things in order did not quite occur to
    > me. Can you give me an simple example of exactly how to go about this?


    Don't have time for an example at the moment. But the bisect module
    will do it for you.

    help('bisect')
    help('bisect.insort')

    The basic way in which it works (I assume, having not looked at the
    code), is a binary search algorithm, which is very fast.

    --
    Chad Netzer
    Chad Netzer, Aug 31, 2003
    #8
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. CRON
    Replies:
    24
    Views:
    200,238
    Adrienne Boswell
    Jun 20, 2006
  2. Johnny
    Replies:
    3
    Views:
    435
    Robert Kern
    Aug 23, 2005
  3. Hari Sekhon
    Replies:
    0
    Views:
    477
    Hari Sekhon
    Jun 20, 2006
  4. Vinko Vrsalovic

    int func() v/s int func(void)

    Vinko Vrsalovic, Jan 21, 2005, in forum: C Programming
    Replies:
    14
    Views:
    1,274
    Villy Kruse
    Jan 24, 2005
  5. Alex Vinokur
    Replies:
    6
    Views:
    337
    Tor Rustad
    Nov 18, 2006
Loading...

Share This Page