Set like feature

Hari Pulapaka · Nov 15, 2004

Hi,

I have a list of space delimited strings ending in a newline.
Eg: a = ['a sfds sdf s df 34 ew\n', 'df sdf s f s ssf\n']

Now inside each row, I have a space delimited list of fields.

Now I want to compare the fields in each row of the array and see which
fields do not match.

Think of it as a 2-dimensional array of size m x n, comparing each
element on a column by column basis.

I am using python2.2 so no sets. Can anyone think of an efficient way
to do this?

Thanks,

Hari

Mitja · Nov 15, 2004

Hari Pulapaka said:
I have a list of space delimited strings ending in a newline.
Eg: a = ['a sfds sdf s df 34 ew\n', 'df sdf s f s ssf\n']

Now inside each row, I have a space delimited list of fields.

Now I want to compare the fields in each row of the array and see which
fields do not match.

Think of it as a 2-dimensional array of size m x n, comparing each
element on a column by column basis.

I am using python2.2 so no sets. Can anyone think of an efficient way
to do this?

If I understand the problem correctly, splitting the lines up and sorting
them before comparison _is_ much better than a naive approach, though I
don't know if that's what's best.

Alex Martelli · Nov 15, 2004

Hari Pulapaka said:
I have a list of space delimited strings ending in a newline.
Eg: a = ['a sfds sdf s df 34 ew\n', 'df sdf s f s ssf\n']

Now inside each row, I have a space delimited list of fields.

Now I want to compare the fields in each row of the array and see which
fields do not match.

Think of it as a 2-dimensional array of size m x n, comparing each
element on a column by column basis.

I am using python2.2 so no sets. Can anyone think of an efficient way
to do this?

Do you want to compare corresponding fields? That's the only way I can
read that 'column by column basis', and thus I don't see what sets could
possibly have to do with it.

Do you want to compare each row with every other row? I also note in
your example that the number of fields in each row appears to be
variable, so how do you want to deal with 'missing' fields?

Too many unanswered questions, I guess. But for some specified set of
answers to those questions, you might do...:

import sys

def compare_fields(i, j, base, other):
    # zip stops at the shorter row, so surplus fields are ignored
    for k, f1, f2 in zip(xrange(sys.maxint), base, other):
        if f1 != f2:
            print 'DIFF', i, j, k, repr(f1), repr(f2)

def lots_of_compares(list_of_strings):
    list_of_lists_of_fields = [row.split() for row in list_of_strings]
    num_rows = len(list_of_lists_of_fields)
    for i in xrange(num_rows):
        # compare row i against every later row j
        base_row = list_of_lists_of_fields[i]
        for j in xrange(i+1, num_rows):
            compare_fields(i, j, base_row, list_of_lists_of_fields[j])

You can do better with enumerate, itertools and other things which 2.2
didn't have, but sets wouldn't help. Now, I hope this clarifies the
many unanswered questions which your 'specs' leave open, so you can work
out exactly what you want.
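
For illustration only, here is a rough sketch of what the "better with enumerate, itertools" version might look like. It assumes a modern Python 3 rather than the 2.2 under discussion, and the names diff_rows and MISSING are made up for this example. It also picks one possible answer to the 'missing fields' question: a column present in only one row is reported as a difference against a sentinel.

```python
from itertools import combinations, zip_longest

# Sentinel for a column that one row has and the other lacks.
# This is an assumption: the thread leaves the handling of missing fields open.
MISSING = object()

def diff_rows(lines):
    """Return (i, j, k, f1, f2) for every pair of rows i < j whose column k differs."""
    rows = [line.split() for line in lines]
    diffs = []
    # combinations over the enumerated rows yields each pair once, with i < j
    for (i, base), (j, other) in combinations(enumerate(rows), 2):
        # zip_longest pads the shorter row instead of truncating the longer one
        pairs = zip_longest(base, other, fillvalue=MISSING)
        for k, (f1, f2) in enumerate(pairs):
            if f1 != f2:
                diffs.append((i, j, k, f1, f2))
    return diffs
```

Returning a list instead of printing keeps the comparison separate from the reporting, so the caller can decide how to display or filter the differences.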

And, btw: upgrade to 2.4. Sets or no sets, the performance enhancement
by itself will be vastly sufficient to repay whatever inconvenience you
think the upgrade might cause.

Alex

Hari Pulapaka · Nov 15, 2004

Alex said:
Do you want to compare corresponding fields? That's the only way I can
read that 'column by column basis', and thus I don't see what sets could
possibly have to do with it.

Do you want to compare each row with every other row? I also note in
your example that the number of fields in each row appears to be
variable, so how do you want to deal with 'missing' fields?

I want to compare every element in each row with the element in the
remaining rows having the same column position. The rows need not have
the same number of elements, in which case I have to do some more
thinking.

I was thinking of turning each row of the array into a set and then
comparing each row of the array with the compare function being the set
intersection operation.

You have pretty much captured what I was thinking, and my solution is
also similar to what you showed.

Too many unanswered questions, I guess. But for some specified set of
answers to those questions, you might do...:

import sys

def compare_fields(i, j, base, other):
    # zip stops at the shorter row, so surplus fields are ignored
    for k, f1, f2 in zip(xrange(sys.maxint), base, other):
        if f1 != f2:
            print 'DIFF', i, j, k, repr(f1), repr(f2)

def lots_of_compares(list_of_strings):
    list_of_lists_of_fields = [row.split() for row in list_of_strings]
    num_rows = len(list_of_lists_of_fields)
    for i in xrange(num_rows):
        # compare row i against every later row j
        base_row = list_of_lists_of_fields[i]
        for j in xrange(i+1, num_rows):
            compare_fields(i, j, base_row, list_of_lists_of_fields[j])

Thanks for your help.

You can do better with enumerate, itertools and other things which 2.2
didn't have, but sets wouldn't help. Now, I hope this clarifies the
many unanswered questions which your 'specs' leave open, so you can work
out exactly what you want.

And, btw: upgrade to 2.4. Sets or no sets, the performance enhancement
by itself will be vastly sufficient to repay whatever inconvenience you
think the upgrade might cause.

Not in my hands.

- Hari

Alex Martelli · Nov 15, 2004

Hari Pulapaka said:
I want to compare every element in each row with the element in the
remaining rows having the same column position. The rows need not have
the same number of elements, in which case I have to do some more
thinking.

I was thinking of turning each row of the array into a set and then
comparing each row of the array with the compare function being the set
intersection operation.

Sets have no order, so that just wouldn't work the way you state it.
Rows 'a b' and 'b a' would appear identical, so the "having the same
column position" condition would not be respected.
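
A quick way to see this, sketched with the builtin set type (which needs Python 2.4 or later, so not the 2.2 in question; the variable names are just for this example):

```python
row1 = 'a b'.split()
row2 = 'b a'.split()

# As sets the two rows compare equal, because ordering is discarded.
sets_equal = set(row1) == set(row2)

# As lists they differ, because position matters.
lists_equal = row1 == row2

print(sets_equal, lists_equal)
```

Sets also collapse repeated fields, so a row like 'a a b' would look the same as 'a b'.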

You could maybe use a set(enumerate(therow.split())) -- but intersecting
such sets would be of dubious utility. Maybe you mean symmetric
difference (union minus intersection), but even then you'd still have to
proceed in order to investigate which item of that difference comes from
which of the two rows (assuming you do care -- hard to tell from here).
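
As a rough sketch of that idea (again assuming a Python with builtin set and enumerate, written here for Python 3; the function name positional_diff is hypothetical), the symmetric difference of the enumerated fields can be sorted by column and each item traced back to its row:

```python
def positional_diff(line1, line2):
    """Return sorted (column, field, source_row) triples for fields that differ."""
    s1 = set(enumerate(line1.split()))
    s2 = set(enumerate(line2.split()))
    result = []
    # s1 ^ s2 is the symmetric difference: (column, field) pairs in exactly one row
    for k, field in sorted(s1 ^ s2):
        # the set alone does not say which row a pair came from, so check membership
        source = 1 if (k, field) in s1 else 2
        result.append((k, field, source))
    return result

print(positional_diff('a b c', 'a x c d'))
```

Here column 1 shows up twice (once per row) and column 3 only once, since the first row has no field there. Whether that is more useful than a plain column-by-column loop depends on what the differences are needed for.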

I believe gadfly comes with a fast C-coded extension called kjbuckets
which might help with this kind of thing (and a Python-coded
'fallback', not all that fast but easily portable, too). You might want
to investigate that, if there's a chance you could get C-coded
extensions installed on your Python 2.2 installation.

Alex

Alex Martelli · Nov 15, 2004

Mitja said:
If I understand the problem correctly, splitting the lines up and sorting
them before comparison _is_ much better than a naive approach, though I
don't know if that's what's best.

Splitting, sure. Sorting would destroy the 'column by column basis'.

Alex

Mitja · Nov 18, 2004

Alex Martelli said:
Splitting, sure. Sorting would destroy the 'column by column basis'.

I wasn't sure what the OP really wanted; I saw both the "column by column"
thing and the bit about sets, which are contradictory, so I assumed he was
after set-like behavior. [wrongly, as later posts clarified]
