Beginner Question : Iterators and zip

moogyd · Jul 12, 2008

Hi group,

I have a basic question on the zip built-in function.

I am writing a simple text file comparison script, that compares line
by line and character by character. The output is the original file,
with an X in place of any characters that are different.

I have managed a solution for a fixed (3) number of files, but I want
a solution of any number of input files.

The outline of my solution:

for vec in zip(vec_list[0],vec_list[1],vec_list[2]):
    res = ''
    for entry in zip(vec[0],vec[1],vec[2]):
        if len(set(entry)) > 1:
            res = res+'X'
        else:
            res = res+entry[0]
    outfile.write(res)

So vec is a tuple containing a line from each file, and then entry is
a tuple containing a character from each line.

2 questions:
1) What is the general solution? Using zip in this way looks wrong. Is
there another function that does what I want?
2) I am using set to remove any repeated characters. Is there a
"better" way?

Any other comments/suggestions appreciated.

Thanks,

Steven

bruno.desthuilliers · Jul 12, 2008

2 questions
1) What is the general solution. Using zip in this way looks wrong. Is
there another function that does what I want

zip is (mostly) ok. What you're missing is how to use it for any
arbitrary number of sequences. Try this instead:

>>> lists = [range(5), range(5,11), range(11, 16)]
>>> lists
[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9, 10], [11, 12, 13, 14, 15]]
>>> for item in zip(*lists):
...     print item
...
(0, 5, 11)
(1, 6, 12)
(2, 7, 13)
(3, 8, 14)
(4, 9, 15)

>>> lists = [range(5), range(5,11), range(11, 16), range(16, 20)]
>>> for item in zip(*lists):
...     print item
...
(0, 5, 11, 16)
(1, 6, 12, 17)
(2, 7, 13, 18)
(3, 8, 14, 19)
The only caveat with zip() is that it will only use as many items as
there are in your shortest sequence, i.e.:

>>> zip(range(3), range(10))
[(0, 0), (1, 1), (2, 2)]
>>> zip(range(30), range(10))
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (8, 8), (9, 9)]
So you'd better pad your sequences to make them as long as the longest
one. There are idioms for doing this using the itertools module's
chain and repeat iterators, but I'll leave a concrete example as an
exercise to the reader !-)
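
One possible shape for that padding idiom, as a minimal sketch in Python 3 syntax (the session above is Python 2): each sequence is chained with an endless stream of a fill value and then cut at the length of the longest input. The helper name pad_to_longest and the fill value -1 are illustrative assumptions, not something from the posts in this thread.

```python
from itertools import chain, islice, repeat

def pad_to_longest(sequences, fill=None):
    """Return list copies of the sequences, each padded with `fill` to the longest length."""
    sequences = [list(seq) for seq in sequences]
    longest = max((len(seq) for seq in sequences), default=0)
    # chain(seq, repeat(fill)) never ends, so islice cuts it off after `longest` items.
    return [list(islice(chain(seq, repeat(fill)), longest)) for seq in sequences]

padded = pad_to_longest([range(3), range(10, 15)], fill=-1)
for item in zip(*padded):
    print(item)
```

With padded inputs, zip(*padded) yields one tuple per position of the longest sequence instead of stopping at the shortest.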

2) I am using set to remove any repeated characters. Is there a
"better" way ?

That's probably what I'd do too.

Any other comments/suggestions appreciated.

There's a difflib module in the standard lib. Did you give it a try ?
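
For reference, a small sketch of what difflib offers, in Python 3 syntax. Note that it compares two sequences at a time and reports changed lines rather than masking characters, so it is not a drop-in replacement for the script in the question. The sample lines and the file names a.txt and b.txt are made up for illustration.

```python
import difflib

old = ["one\n", "two\n", "three\n"]
new = ["one\n", "tWo\n", "three\n"]

# unified_diff yields header lines, then lines prefixed with ' ', '-' or '+'.
diff = list(difflib.unified_diff(old, new, fromfile="a.txt", tofile="b.txt"))
print("".join(diff), end="")
```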

Terry Reedy · Jul 13, 2008

2 questions
1) What is the general solution. Using zip in this way looks wrong. Is
there another function that does what I want

zip(*vec_list) will zip together all entries in vec_list
Do be aware that zip stops on the shortest iterable. So if vec[1] is
shorter than vec[0] and matches otherwise, your output line will be
truncated. Or if vec[1] is longer and vec[0] matches as far as it goes,
there will be no signal either.

res=res+whatever can be written as res+=whatever

2) I am using set to remove any repeated characters. Is there a
"better" way ?

I might have written a third loop to compare vec[0] to vec[1]..., but
your set solution is easier and prettier.

If speed is an issue, don't rebuild the output line char by char. Just
change what is needed in a mutable copy. I like this better anyway.

res = list(vec[0]) # if all ascii, in 3.0 use bytearray
for n, entry in enumerate(zip(vec[0],vec[1],vec[2])):
    if len(set(entry)) > 1:
        res[n] = 'X'
outfile.write(''.join(res)) # in 3.0, write(res)
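
Combining this with zip(*...) gives a version that handles any number of lines. The following is only a sketch in Python 3 syntax; the function name mask_differences and the marker parameter are illustrative assumptions, not part of the original script.

```python
def mask_differences(lines, marker="X"):
    """Return the first line with `marker` at every position where the lines disagree."""
    res = list(lines[0])  # mutable copy of the reference line
    # zip(*lines) yields one tuple of characters per column, for any number of lines.
    for n, entry in enumerate(zip(*lines)):
        if len(set(entry)) > 1:
            res[n] = marker
    return "".join(res)

print(mask_differences(["abcdef", "abXdef", "abcdeZ"]))  # prints abXdeX
```

Because zip stops at the shortest line, any characters of the first line beyond that point are copied through unmarked, which is the "no signal" case mentioned above.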

tjr

moogyd · Jul 13, 2008

On 12 Jul, 20:55, (e-mail address removed) wrote:

zip is (mostly) ok. What you're missing is how to use it for any
arbitrary number of sequences. Try this instead:

>>> lists = [range(5), range(5,11), range(11, 16)]
>>> lists
[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9, 10], [11, 12, 13, 14, 15]]
>>> for item in zip(*lists):
...     print item
...
(0, 5, 11)
(1, 6, 12)
(2, 7, 13)
(3, 8, 14)
(4, 9, 15)

What is this *lists operation called? I am having trouble finding any
reference to it in the Python docs or the book Learning Python.

There's a difflib package in the standard lib. Did you give it a try ?

I'll check it out, but I am a newbie, so I am writing this as a
(useful) learning exercise.

Thanks for the help

Steven

Terry Reedy · Jul 13, 2008

What is this *lists operation called? I am having trouble finding any
reference to it in the Python docs or the book Learning Python.

One might call this argument unpacking, but
Language Manual / Expressions / Primaries / Calls
simply calls it *expression syntax.
"If the syntax *expression appears in the function call, expression must
evaluate to a sequence. Elements from this sequence are treated as if
they were additional positional arguments; if there are positional
arguments x1,...,xN, and expression evaluates to a sequence
y1,...,yM, this is equivalent to a call with M+N positional arguments
x1,...,xN,y1,...,yM."

See Compound Statements / Function definitions for the mirror syntax in
definitions.
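
A short illustration of both directions, as a sketch in Python 3 syntax; the function describe and the sample vec_list are made up for the example.

```python
def describe(*args):
    """Mirror syntax in a definition: extra positional arguments are collected into a tuple."""
    return args

vec_list = ["abc", "abd", "abe"]

# In a call, *vec_list spreads the list into separate positional arguments,
# so these two calls produce the same result.
unpacked = list(zip(*vec_list))
spelled_out = list(zip(vec_list[0], vec_list[1], vec_list[2]))
assert unpacked == spelled_out

print(describe(1, *[2, 3]))  # prints (1, 2, 3)
```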

tjr

cokofreedom · Jul 14, 2008

zip(*vec_list) will zip together all entries in vec_list
Do be aware that zip stops on the shortest iterable. So if vec[1] is
shorter than vec[0] and matches otherwise, your output line will be
truncated. Or if vec[1] is longer and vec[0] matches as far as it goes,
there will be no signal either.

Do note that from Python 3.0 there is another form of zip,
itertools.zip_longest (izip_longest in Python 2.6), that will read
until all iterables are exhausted, with the shorter ones being filled
up with a settable default value. Very useful!
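
A minimal sketch of how that variant could be applied to the comparison, assuming the function meant is itertools.zip_longest from the Python 3 standard library. The helper name mask_with_padding is hypothetical, and None is chosen as the fill value because it can never equal a real character.

```python
from itertools import zip_longest  # named izip_longest in Python 2.6 and 2.7

def mask_with_padding(lines, marker="X"):
    """Mark every position where the lines differ, including past the end of shorter lines."""
    out = []
    # zip_longest keeps going until the longest line ends, padding the others with None.
    for entry in zip_longest(*lines, fillvalue=None):
        out.append(entry[0] if len(set(entry)) == 1 else marker)
    return "".join(out)

print(mask_with_padding(["abc", "abcde"]))  # prints abcXX
```

Here a missing character counts as a difference, so a length mismatch shows up in the output instead of being silently dropped.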

moogyd · Jul 14, 2008

Thanks,

It's starting to make sense

Steven
