Query regarding set([])?

vox

Hi,
I'm constructing a simple compare script and thought I would use
set([]) to generate the difference output. But I'm obviously doing
something wrong.

file1 contains 410 rows.
file2 contains 386 rows.
I want to know what rows are in file1 but not in file2.

This is my script:
s1 = set(open("file1"))
s2 = set(open("file2"))
s3 = set([])
s1temp = set([])
s2temp = set([])

s1temp = set(i.strip() for i in s1)
s2temp = set(i.strip() for i in s2)
s3 = s1temp-s2temp

print len(s3)

Output is 119. AFAIK 410-386=24. What am I doing wrong here?

BR,
Andy
 
Peter Otten

vox said:
I'm constructing a simple compare script and thought I would use
set([]) to generate the difference output. But I'm obviously doing
something wrong.

file1 contains 410 rows.
file2 contains 386 rows.
I want to know what rows are in file1 but not in file2.

This is my script:
s1 = set(open("file1"))
s2 = set(open("file2"))

Remove the following three lines:
s3 = set([])
s1temp = set([])
s2temp = set([])

s1temp = set(i.strip() for i in s1)
s2temp = set(i.strip() for i in s2)
s3 = s1temp-s2temp

print len(s3)

Output is 119. AFAIK 410-386=24. What am I doing wrong here?

You are probably misinterpreting len(s3). s3 contains lines occurring in
"file1" but not in "file2". Duplicate lines are only counted once, and the
order doesn't matter.

So there are 119 lines that occur at least once in "file1", but not in
"file2".

If that is not what you want you have to tell us what exactly you are
looking for.
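
To illustrate why len(s3) need not equal the difference in row counts, here is a small sketch. The lists lines1 and lines2 are made-up stand-ins for the contents of the two files, not the real data, and the code is written for Python 3 (print as a function) rather than the Python 2 used in the script above.

```python
# Made-up stand-ins for the rows of "file1" and "file2" (assumed data).
lines1 = ["a\n", "b\n", "b\n", "c\n", "d\n"]   # 5 rows, "b" appears twice
lines2 = ["a\n", "x\n", "y\n"]                 # 3 rows, only "a" is shared

s1 = set(line.strip() for line in lines1)      # duplicates collapse here
s2 = set(line.strip() for line in lines2)
diff = s1 - s2                                 # in lines1 but not in lines2

print(len(lines1) - len(lines2))  # row-count arithmetic: 2
print(sorted(diff))               # ['b', 'c', 'd']
print(len(diff))                  # 3
```

The row counts differ by 2, yet three distinct lines are missing from the second list, because most of its rows do not occur in the first list at all.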

Peter
 
vox

Peter Otten said:
You are probably misinterpreting len(s3). s3 contains lines occurring in
"file1" but not in "file2". Duplicate lines are only counted once, and the
order doesn't matter.

So there are 119 lines that occur at least once in "file1", but not in
"file2".

If that is not what you want you have to tell us what exactly you are
looking for.

Peter

Hi,
Thanks for the answer.

I am looking for a script that compares file1 and file2, for each line
in file1, check if line is present in file2. If the line from file1 is
not present in file2, print that line/write it to file3, because I
have to know what lines to add to file2.

BR,
Andy
 
David Robinow

vox said:
I am looking for a script that compares file1 and file2, for each line
in file1, check if line is present in file2. If the line from file1 is
not present in file2, print that line/write it to file3, because I
have to know what lines to add to file2.

Just copy file1 to file2.
(I'm pretty sure that's not what you want, but in explaining why it
should become clearer what you're trying to do.)
 
Dave Angel

vox said:
Hi,
Thanks for the answer.

I am looking for a script that compares file1 and file2, for each line
in file1, check if line is present in file2. If the line from file1 is
not present in file2, print that line/write it to file3, because I
have to know what lines to add to file2.

BR,
Andy

There's no more detail in that response. To the level of detail you
provide, the program works perfectly. Just loop through the set and
write the members to the file.
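
A minimal sketch of that loop, assuming Python 3 and the file names used in this thread. The function name write_missing is made up for illustration, and sorting the output is an added choice (a set has no order of its own), not something the original script does.

```python
def write_missing(path1, path2, out_path):
    """Write the lines found in path1 but not in path2 to out_path.

    Returns the number of lines written. Hypothetical helper for this thread.
    """
    with open(path1) as f1:
        s1 = set(line.strip() for line in f1)
    with open(path2) as f2:
        s2 = set(line.strip() for line in f2)

    missing = s1 - s2
    with open(out_path, "w") as out:
        for line in sorted(missing):   # sets are unordered; sort for stable output
            out.write(line + "\n")
    return len(missing)
```

Calling write_missing("file1", "file2", "file3") would then leave the lines to add in file3, once each.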

But you have some unspecified assumptions:
1) order doesn't matter
2) duplicates are impossible in the input file, or at least not
meaningful. So the correct output file could very well be smaller than
either of the input files.

And a few others that might matter:
3) the two files are both text files, with identical line endings
matching your OS default
4) the two files are ASCII, or at least 8 bit encoded, using the
same encoding (such as both UTF-8)
5) the last line of each file DOES have a trailing newline sequence
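
If some of these assumptions do not hold, a variant along the following lines may be closer to what is wanted. It is only a sketch for Python 3: the function name missing_lines and the default encoding "utf-8" are assumptions, and unlike the original script it strips only the line ending, not other whitespace. It keeps the order and any duplicates from the first file, lets text mode normalise "\r\n" to "\n", and copes with a last line that has no trailing newline.

```python
def missing_lines(path1, path2, encoding="utf-8"):
    """Return the lines of path1 that do not occur in path2.

    Order and duplicates from path1 are preserved. Hypothetical helper.
    """
    # Text mode with the default newline handling turns "\r\n" into "\n".
    with open(path2, encoding=encoding) as f2:
        seen = set(line.rstrip("\n") for line in f2)

    result = []
    with open(path1, encoding=encoding) as f1:
        for line in f1:
            line = line.rstrip("\n")   # also fine if the last line lacks "\n"
            if line not in seen:
                result.append(line)
    return result
```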
 
vox

Dave Angel said:
There's no more detail in that response. To the level of detail you
provide, the program works perfectly.  Just loop through the set and
write the members to the file.

But you have some unspecified assumptions:
    1) order doesn't matter
    2) duplicates are impossible in the input file, or at least not
meaningful.  So the correct output file could very well be smaller than
either of the input files.

And a few others that might matter:
    3) the two files are both text files, with identical line endings
matching your OS default
    4) the two files are ASCII, or at least 8 bit encoded, using the
same encoding  (such as both UTF-8)
    5) the last line of each file DOES have a trailing newline sequence

Thanks all for the input!
I guess I have to think it through a couple of times more. :)

BR,
Andy
 
Terry Reedy

vox said:
Hi,
I'm constructing a simple compare script and thought I would use
set([]) to generate the difference output. But I'm obviously doing
something wrong.

file1 contains 410 rows.
file2 contains 386 rows.
I want to know what rows are in file1 but not in file2.

This is my script:
s1 = set(open("file1"))
s2 = set(open("file2"))
s3 = set([])
s1temp = set([])
s2temp = set([])

s1temp = set(i.strip() for i in s1)
s2temp = set(i.strip() for i in s2)
s3 = s1temp-s2temp

print len(s3)

Output is 119. AFAIK 410-386=24. What am I doing wrong here?

That is true only if every line in s2 is also in s1. If there are lines
in s2 that are not in s1, then the number of lines in s1 not in s2 will be
larger than 24. s1 - s2 removes only the intersection of s1 and s2 from s1.
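
A small check of that relationship, using made-up sets rather than the real file contents (Python 3 syntax):

```python
# Assumed example data; "x" is in s2 but not in s1.
s1 = {"a", "b", "c", "d", "e"}
s2 = {"a", "b", "x"}

print(len(s1) - len(s2))        # 2: naive subtraction of the sizes
print(len(s1 - s2))             # 3: elements of s1 that are not in s2
print(len(s1) - len(s1 & s2))   # 3: size of s1 minus size of the intersection
```

The naive subtraction matches the size of the difference only when s2 is a subset of s1, that is, when s2 <= s1 is True.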
 
