sort array, apply rearrangement to second

Discussion in 'Python' started by Victor Eijkhout, Mar 31, 2010.

  1. I have two arrays, made with numpy. The first one has values that I want
    to use as sorting keys; the second one needs to be sorted by those keys.
    Obviously I could turn them into a dictionary of pairs and sort by the
    first member, but I think that's not very efficient, at least in space,
    and this needs to be done as efficiently as possible.

    I could use a hand.

    Victor.
    --
    Victor Eijkhout -- eijkhout at tacc utexas edu
     
    Victor Eijkhout, Mar 31, 2010
    #1
    1. Advertising

  2. * Victor Eijkhout:
    > I have two arrays, made with numpy. The first one has values that I want
    > to use as sorting keys; the second one needs to be sorted by those keys.
    > Obviously I could turn them into a dictionary of pairs and sort by the
    > first member, but I think that's not very efficient, at least in space,
    > and this needs to be done as efficiently as possible.
    >
    > I could use a hand.


    Just do the pairing, but in a 'list', not a dictionary (a dictionary is
    unordered and can't be sorted). You need to keep track of which keys belong to
    which values anyway. And anything in Python is a reference: you're not copying
    the data by creating the pairs. That is, the space overhead is proportional to
    the number of items but is independent of the data size of each item.


    Cheers & hth.,

    - Alf
     
    Alf P. Steinbach, Mar 31, 2010
    #2
    1. Advertising

  3. Victor Eijkhout

    Steve Holden Guest

    Victor Eijkhout wrote:
    > I have two arrays, made with numpy. The first one has values that I want
    > to use as sorting keys; the second one needs to be sorted by those keys.
    > Obviously I could turn them into a dictionary of pairs and sort by the
    > first member, but I think that's not very efficient, at least in space,
    > and this needs to be done as efficiently as possible.
    >
    > I could use a hand.
    >

    Well, my first approach would be to do it as inefficiently as I can ( or
    at least no more efficiently than I can with a simple-minded approach)
    and then take it from there.

    If I believe this is not a premature optimization (a question about
    which I am currently skeptical) I'd suggest conversion to a list of
    pairs rather than a dict.

    Can you use zip() on numpy arrays? That would be the easiest way to
    create the list of pairs.

    regards
    Steve
    --
    Steve Holden +1 571 484 6266 +1 800 494 3119
    See PyCon Talks from Atlanta 2010 http://pycon.blip.tv/
    Holden Web LLC http://www.holdenweb.com/
    UPCOMING EVENTS: http://holdenweb.eventbrite.com/
     
    Steve Holden, Mar 31, 2010
    #3
  4. Victor Eijkhout

    MRAB Guest

    Victor Eijkhout wrote:
    > I have two arrays, made with numpy. The first one has values that I want
    > to use as sorting keys; the second one needs to be sorted by those keys.
    > Obviously I could turn them into a dictionary of pairs and sort by the
    > first member, but I think that's not very efficient, at least in space,
    > and this needs to be done as efficiently as possible.
    >
    > I could use a hand.
    >

    You could sort a list of the indices, using the first array to provide
    the keys.
     
    MRAB, Mar 31, 2010
    #4
  5. Victor Eijkhout

    Robert Kern Guest

    On 2010-03-30 18:25 , Victor Eijkhout wrote:
    > I have two arrays, made with numpy. The first one has values that I want
    > to use as sorting keys; the second one needs to be sorted by those keys.
    > Obviously I could turn them into a dictionary of pairs and sort by the
    > first member, but I think that's not very efficient, at least in space,
    > and this needs to be done as efficiently as possible.


    second[first.argsort()]

    Ask numpy questions on the numpy mailing list.

    http://www.scipy.org/Mailing_Lists

    --
    Robert Kern

    "I have come to believe that the whole world is an enigma, a harmless enigma
    that is made terrible by our own mad attempt to interpret it as though it had
    an underlying truth."
    -- Umberto Eco
     
    Robert Kern, Mar 31, 2010
    #5
  6. Robert Kern <> wrote:

    > second[first.argsort()]


    Really cool. Thanks.

    > Ask numpy questions on the numpy mailing list.


    I will. I thought that this question would have an answer in a generic
    python idiom.

    Victor.
    --
    Victor Eijkhout -- eijkhout at tacc utexas edu
     
    Victor Eijkhout, Mar 31, 2010
    #6
  7. Victor Eijkhout

    Robert Kern Guest

    On 2010-03-31 13:58 PM, Victor Eijkhout wrote:
    > Robert Kern<> wrote:
    >
    >> second[first.argsort()]

    >
    > Really cool. Thanks.
    >
    >> Ask numpy questions on the numpy mailing list.

    >
    > I will. I thought that this question would have an answer in a generic
    > python idiom.


    When dealing with numpy arrays, the generic Python idiom is often much slower.

    --
    Robert Kern

    "I have come to believe that the whole world is an enigma, a harmless enigma
    that is made terrible by our own mad attempt to interpret it as though it had
    an underlying truth."
    -- Umberto Eco
     
    Robert Kern, Mar 31, 2010
    #7
  8. Victor Eijkhout

    Steve Holden Guest

    Victor Eijkhout wrote:
    > Robert Kern <> wrote:
    >
    >> second[first.argsort()]

    >
    > Really cool. Thanks.
    >
    >> Ask numpy questions on the numpy mailing list.

    >
    > I will. I thought that this question would have an answer in a generic
    > python idiom.
    >
    > Victor.


    Not an unreasonable assumption, but it turns out that for most Python
    users (estimate PFTA: 97%) numpy/scipt is esoteric knowledge.

    regards
    Steve
    --
    Steve Holden +1 571 484 6266 +1 800 494 3119
    See PyCon Talks from Atlanta 2010 http://pycon.blip.tv/
    Holden Web LLC http://www.holdenweb.com/
    UPCOMING EVENTS: http://holdenweb.eventbrite.com/
     
    Steve Holden, Mar 31, 2010
    #8
  9. On Mar 30, 4:25 pm, (Victor Eijkhout) wrote:
    > I have two arrays, made with numpy. The first one has values that I want
    > to use as sorting keys; the second one needs to be sorted by those keys.
    > Obviously I could turn them into a dictionary  of pairs and sort by the
    > first member, but I think that's not very efficient, at least in space,
    > and this needs to be done as efficiently as possible.


    Alf's recommendation is clean and correct. Just make a list of
    tuples.

    FWIW, here's a little hack that does the work for you:

    >>> values = ['A', 'B', 'C', 'D', 'E']
    >>> keys = [50, 20, 40, 10, 30]
    >>> keyiter = iter(keys)
    >>> sorted(values, key=lambda k: next(keyiter))

    ['D', 'B', 'E', 'C', 'A']


    Raymond
     
    Raymond Hettinger, Mar 31, 2010
    #9
  10. Victor Eijkhout

    Steve Howell Guest

    On Mar 31, 1:09 pm, Raymond Hettinger <> wrote:
    > On Mar 30, 4:25 pm, (Victor Eijkhout) wrote:
    >
    > > I have two arrays, made with numpy. The first one has values that I want
    > > to use as sorting keys; the second one needs to be sorted by those keys..
    > > Obviously I could turn them into a dictionary  of pairs and sort by the
    > > first member, but I think that's not very efficient, at least in space,
    > > and this needs to be done as efficiently as possible.

    >
    > Alf's recommendation is clean and correct.  Just make a list of
    > tuples.
    >
    > FWIW, here's a little hack that does the work for you:
    >
    > >>> values = ['A', 'B', 'C', 'D', 'E']
    > >>> keys = [50, 20, 40, 10, 30]
    > >>> keyiter = iter(keys)
    > >>> sorted(values, key=lambda k: next(keyiter))

    >
    > ['D', 'B', 'E', 'C', 'A']
    >


    Another option:

    [values for i in sorted(range(len(keys)), key=lambda i: keys)]

    Sort the indexes according to keys values, then use indexes to get the
    values.

    It might read more clearly when broken out into two lines:

    >>> sorted_indexes = sorted(range(len(keys)), key = lambda i: keys)
    >>> sorted_indexes

    [3, 1, 4, 2, 0]
    >>> [values for i in sorted_indexes]

    ['D', 'B', 'E', 'C', 'A']

    The advantage of Raymond's solution is that he only creates one new
    Python list, whereas my solutions create an intermediate Python list
    of integers. I don't think my solution really is that space-wasteful,
    though, since by the time the second list gets created, any internal
    intermediate lists from CPython's sorted() implementation will
    probably have been cleaned up.
     
    Steve Howell, Apr 1, 2010
    #10
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Dirk Meusel
    Replies:
    1
    Views:
    3,074
    Dirk Meusel
    Aug 19, 2003
  2. Stefan Siegl
    Replies:
    1
    Views:
    997
    Marrow
    Jul 18, 2003
  3. rkk
    Replies:
    9
    Views:
    847
    CBFalconer
    Sep 24, 2006
  4. Navin
    Replies:
    1
    Views:
    767
    Ken Schaefer
    Sep 9, 2003
  5. yelipolok
    Replies:
    4
    Views:
    292
    John W. Krahn
    Jan 27, 2010
Loading...

Share This Page