Set like feature

Discussion in 'Python' started by Hari Pulapaka, Nov 15, 2004.

  1. Hi,

    I have a list of space delimited strings ending in a newline.
    Eg: a = ['a sfds sdf s df 34 ew\n', 'df sdf s f s ssf\n']

    Now inside each row, I have a space delimited list of fields.

    Now I want to compare the fields in each row of the array and see which
    fields do not match.

    Think of it as a 2 dimensional array of size m x n, and comparing each
    element on a column by column basis.

    I am using python2.2 so no sets. Can anyone think of an efficient way
    to do this?

    Thanks,

    Hari
    Hari Pulapaka, Nov 15, 2004
    #1

  2. Mitja

    On 15 Nov 2004 13:01:52 -0800, Hari Pulapaka wrote:

    > Hi,
    >
    > I have a list of space delimited strings ending in a newline.
    > Eg: a = ['a sfds sdf s df 34 ew\n', 'df sdf s f s ssf\n']
    >
    > Now inside each row, I have a space delimited list of fields.
    >
    > Now I want to compare the fields in each row of the array and see which
    > fields do not match.
    >
    > Think of it as a 2 dimensional array of size m x n, and comparing each
    > element on a column by column basis.
    >
    > I am using python2.2 so no sets. Can anyone think of an efficient way
    > to do this?


    If I understand the problem correctly, splitting the lines up and sorting
    them before comparison _is_ much better than a naive approach, though I
    don't know if that's what's best.

    --
    Mitja
    Mitja, Nov 15, 2004
    #2

  3. Hari Pulapaka wrote:

    > I have a list of space delimited strings ending in a newline.
    > Eg: a = ['a sfds sdf s df 34 ew\n', 'df sdf s f s ssf\n']
    >
    > Now inside each row, I have a space delimited list of fields.
    >
    > Now I want to compare the fields in each row of the array and see which
    > fields do not match.
    >
    > Think of it as a 2 dimensional array of size m x n, and comparing each
    > element on a column by column basis.
    >
    > I am using python2.2 so no sets. Can anyone think of an efficient way
    > to do this?


    Do you want to compare corresponding fields? That's the only way I can
    read that 'column by column basis', and thus I don't see what sets could
    possibly have to do with it.

    Do you want to compare each row with every other row? I also note in
    your example that the number of fields in each row appears to be
    variable, so how do you want to deal with 'missing' fields?

    Too many unanswered questions, I guess. But for some specified set of
    answers to those questions, you might do...:

    import sys

    def compare_fields(i, j, base, other):
        # zip with a long xrange stands in for enumerate, which 2.2 lacks
        for k, f1, f2 in zip(xrange(sys.maxint), base, other):
            if f1 != f2:
                print 'DIFF', i, j, k, repr(f1), repr(f2)

    def lots_of_compares(list_of_strings):
        list_of_lists_of_fields = [row.split() for row in list_of_strings]
        num_rows = len(list_of_lists_of_fields)
        # compare each row i with every later row j
        for i in xrange(num_rows):
            base_row = list_of_lists_of_fields[i]
            for j in xrange(i+1, num_rows):
                compare_fields(i, j, base_row, list_of_lists_of_fields[j])

    You can do better with enumerate, itertools and other things which 2.2
    didn't have, but sets wouldn't help. Now, I hope this clarifies the
    many unanswered questions which your 'specs' leave open, so you can work
    out exactly what you want.
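
For readers on a current interpreter, the remark about enumerate and itertools can be sketched roughly as follows. This is a Python 3 rewrite, not something that runs on 2.2, and it makes one assumption not in the post above: the differences are returned as tuples rather than printed, so the caller decides what to do with them. As in the 2.2 version, `zip` stops at the shorter row, so trailing fields of a longer row are ignored.

```python
from itertools import combinations

def compare_fields(i, j, base, other):
    """Return (i, j, k, f1, f2) for each column k where the two rows differ."""
    return [(i, j, k, f1, f2)
            for k, (f1, f2) in enumerate(zip(base, other))
            if f1 != f2]

def lots_of_compares(list_of_strings):
    """Compare every pair of rows, column by column."""
    rows = [row.split() for row in list_of_strings]
    diffs = []
    # combinations() yields each unordered pair of (index, row) exactly once
    for (i, base), (j, other) in combinations(enumerate(rows), 2):
        diffs.extend(compare_fields(i, j, base, other))
    return diffs

a = ['a sfds sdf s df 34 ew\n', 'df sdf s f s ssf\n']
for diff in lots_of_compares(a):
    print('DIFF', *diff)
```
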

    And, btw: upgrade to 2.4. Sets or no sets, the performance enhancement
    by itself will be vastly sufficient to repay whatever inconvenience you
    think the upgrade might cause.


    Alex
    Alex Martelli, Nov 15, 2004
    #3
  4. Alex Martelli wrote:
    >
    > Do you want to compare corresponding fields? That's the only way I can
    > read that 'column by column basis', and thus I don't see what sets could
    > possibly have to do with it.
    >
    > Do you want to compare each row with every other row? I also note in
    > your example that the number of fields in each row appears to be
    > variable, so how do you want to deal with 'missing' fields?


    I want to compare every element in each row with the element in the
    remaining rows having the same column position. The rows need not have
    the same number of elements, in which case I have to do some more
    thinking :)
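
One possible way to handle rows of unequal length, sketched in Python 3 (so not usable on 2.2 as-is), is to pad the shorter row and report the padded columns as mismatches. The `MISSING` sentinel and the `compare_uneven` name are hypothetical, and treating an absent field as a difference is an assumption; the thread leaves that policy open.

```python
from itertools import zip_longest

MISSING = object()  # hypothetical sentinel marking "no field in this column"

def compare_uneven(base, other):
    """Return (k, f1, f2) per differing column; absent fields become None."""
    diffs = []
    for k, (f1, f2) in enumerate(zip_longest(base, other, fillvalue=MISSING)):
        if f1 != f2:
            diffs.append((k,
                          None if f1 is MISSING else f1,
                          None if f2 is MISSING else f2))
    return diffs

print(compare_uneven('a b c'.split(), 'a x'.split()))
```
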

    I was thinking of making each row of the array as a set and then
    comparing each row of the array with the compare function being the set
    intersection operation.

    You have pretty much captured what I was thinking, and my solution is
    also similar to what you showed.


    > Too many unanswered questions, I guess. But for some specified set of
    > answers to those questions, you might do...:
    >
    > def compare_fields(i, j, base, other):
    >     for k, f1, f2 in zip(xrange(sys.maxint), base, other):
    >         if f1 != f2:
    >             print 'DIFF', i, j, k, repr(f1), repr(f2)
    >
    > def lots_of_compares(list_of_strings):
    >     list_of_lists_of_fields = [row.split() for row in list_of_strings]
    >     num_rows = len(list_of_lists_of_fields)
    >     for i in xrange(num_rows):
    >         base_row = list_of_lists_of_fields[i]
    >         for j in xrange(i+1, num_rows):
    >             compare_fields(i, j, base_row, list_of_lists_of_fields[j])
    >


    Thanks for your help.


    > You can do better with enumerate, itertools and other things which 2.2
    > didn't have, but sets wouldn't help. Now, I hope this clarifies the
    > many unanswered questions which your 'specs' leave open, so you can work
    > out exactly what you want.
    >
    > And, btw: upgrade to 2.4. Sets or no sets, the performance enhancement
    > by itself will be vastly sufficient to repay whatever inconvenience you
    > think the upgrade might


    Not in my hands.

    - Hari

    >
    > Alex
    Hari Pulapaka, Nov 15, 2004
    #4
  5. Hari Pulapaka wrote:

    > I want to compare every element in each row with the element in the
    > remaining rows having the same column position. The rows need not have
    > the same number of elements, in which case I have to do some more
    > thinking :)
    >
    > I was thinking of making each row of the array as a set and then
    > comparing each row of the array with the compare function being the set
    > intersection operation.


    Sets have no order, so that just wouldn't work the way you state it.
    Rows 'a b' and 'b a' would appear identical, so the "having the same
    column position" condition would not be respected.

    You could maybe use a set(enumerate(therow.split())) -- but intersecting
    such sets would be of dubious utility. Maybe you mean symmetric
    difference (union minus intersection), but even then you'd still have to
    proceed in order to investigate which item of that difference comes from
    which of the two rows (assuming you do care -- hard to tell from here).
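
The two points above can be illustrated with a small Python 3 sketch (the built-in `set` type postdates 2.2). The row contents and the `by_column` bookkeeping are made up for illustration; it shows that plain sets lose column position, and that a symmetric difference of `(index, field)` pairs still needs a second pass to tell which row each pair came from.

```python
row1 = 'a b c'.split()
row2 = 'b a c'.split()

# Plain sets forget position, so these two different rows look identical.
print(set(row1) == set(row2))

# Pairing each field with its column index keeps the position information.
s1 = set(enumerate(row1))
s2 = set(enumerate(row2))

# Symmetric difference: (index, field) pairs found in exactly one row.
mismatches = sorted(s1 ^ s2)
print(mismatches)

# The pairs no longer say which row they came from, so look that up again.
by_column = {}
for k, field in mismatches:
    source = 'row1' if (k, field) in s1 else 'row2'
    by_column.setdefault(k, {})[source] = field
print(by_column)
```
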

    I believe gadfly comes with a fast C-coded extension called kjbuckets
    which might help with this kind of thing (and a Python-coded
    'fallback', not all that fast but easily portable, too). You might want
    to investigate that, if there's a chance you could get C-coded
    extensions installed on your Python 2.2 installation.


    Alex
    Alex Martelli, Nov 15, 2004
    #5
  6. Mitja wrote:

    > > each element on a column by column basis.
    > >
    > > I am using python2.2 so no sets. Can anyone think of an efficient way
    > > to do this?

    >
    > If I understand the problem correctly, splitting the lines up and sorting
    > them before comparison _is_ much better than a naive approach, though I
    > don't know if that's what's best.


    Splitting, sure. Sorting would destroy the 'column by column basis'.
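
A tiny Python 3 illustration of why sorting loses the column information, using two made-up rows: compared position by position they differ in both columns, but once sorted they compare equal.

```python
row1 = 'a b'.split()
row2 = 'b a'.split()

# Positional comparison sees two mismatching columns.
positional = [k for k, (f1, f2) in enumerate(zip(row1, row2)) if f1 != f2]

# After sorting, the rows are equal and the column information is gone.
same_after_sort = sorted(row1) == sorted(row2)
print(positional, same_after_sort)
```
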


    Alex
    Alex Martelli, Nov 15, 2004
    #6
  7. Mitja

    On Tue, 16 Nov 2004 00:05:04 +0100, Alex Martelli wrote:

    > Mitja <> wrote:
    >
    >> > each element on a column by column basis.
    >> >
    >> > I am using python2.2 so no sets. Can anyone think of an efficient way
    >> > to do this?

    >>
    >> If I understand the problem correctly, splitting the lines up and
    >> sorting
    >> them before comparison _is_ much better than a naive approach, though I
    >> don't know if that's what's best.

    >
    > Splitting, sure. Sorting would destroy the 'column by column basis'.


    I wasn't sure what OP really wanted; I saw both the "column by column"
    thing and the bit about sets, which contradict each other, so I assumed he was
    after sets-like behavior. [wrongly, as later posts clarified]

    --
    Mitja
    Mitja, Nov 18, 2004
    #7