Filtering two files with uncommon column

Madhur

I would like to know the best way of filtering two files
based on the following condition.

I have two files. The contents of the files are:

File 1
abc def hij
asd sss lmn
hig pqr mno

File 2
jih def asd
poi iuu wer
wer pqr jjj

I would like to have the output as:

Output

File1
asd sss lmn
File2
poi iuu wer

Basically, I want to compare the two files based on the second column.
If the second column matches in both files, do not print anything.
If a line in the first file has no match for its second column in the
second file, print it under the File1 header. If a line in the second
file has no match for its second column in the first file, print it
under the File2 header.

Thank you
Madhur
 
Chris

# Compare the two files line by line on the second column.
# This only works if both files list their keys in the same order.
with open('file1.txt') as file1, open('file2.txt') as file2:
    for file1_line, file2_line in zip(file1, file2):
        try:
            f1_col2 = file1_line.split()[1]
            f2_col2 = file2_line.split()[1]
        except IndexError:
            print('Not enough delimiters in line.')
            continue

        if f1_col2 != f2_col2:
            # Placeholder, not defined here: write each line out
            # under the header of the file it came from.
            outfile_data_to_relevant_files(file1_line, file2_line)

HTH
Chris
 
Madhur

If file2 is unordered, then the above logic does not work. How can I
tackle it?
 
Paul Rubin

Madhur said:
If file2 is unordered, then the above logic does not work. How can I
tackle it?

This sounds like a homework problem. Also, you are trying to reimplement
the unix "comm" command.
 
Chris

Madhur said:
If file2 is unordered, then the above logic does not work. How can I
tackle it?

Take a look at *nix's sort command; it can also sort based on a key.
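
To make the sort suggestion concrete, here is a minimal Python 3 sketch (not code posted in this thread) that sorts both inputs on the second column and then walks them in step. That is roughly what sorting on the key and then running a comm-style comparison on it would do. The names second_column and unmatched_sorted are made up for illustration, and the sketch assumes every line has at least two fields and that each key appears at most once per file.

```python
def second_column(line):
    """Return the second whitespace-separated field of a line."""
    return line.split()[1]


def unmatched_sorted(lines1, lines2):
    """Sort both inputs by second column, then merge-walk them.

    Returns (only_in_1, only_in_2): the lines whose key does not
    appear in the other input. Assumes keys are unique per input.
    """
    a = sorted(lines1, key=second_column)
    b = sorted(lines2, key=second_column)
    only1, only2 = [], []
    i = j = 0
    while i < len(a) and j < len(b):
        k1, k2 = second_column(a[i]), second_column(b[j])
        if k1 == k2:
            # Key present on both sides: print nothing, advance both.
            i += 1
            j += 1
        elif k1 < k2:
            # Key k1 cannot appear later in b, so it is unmatched.
            only1.append(a[i])
            i += 1
        else:
            only2.append(b[j])
            j += 1
    # Whatever is left on either side has no partner.
    only1.extend(a[i:])
    only2.extend(b[j:])
    return only1, only2
```

With the sample data from the original question, this should return the "asd sss lmn" line for the first file and the "poi iuu wer" line for the second. Sorting is what makes the in-step walk valid; without it, the line-by-line comparison breaks as soon as the files disagree on order.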
 
Martin Blume

I would like to know the best way of generating filter
of two files based upon the following condition
[...]
Sounds like homework. Here are some suggestions:

- For each file, create a dictionary (see help(dict)
  in the Python shell for details) and populate it with
  the values, so that e.g.
      d1['def'] = 'abc def hij'
  (help("".split), perhaps help("".strip))

- For each key in the first dictionary, check whether
  it exists in the second; if not, write out the value (the
  line extracted in the first step).
  (help(dict.iteritems), help(dict.has_key))
  (Note that for
      if a_dict.has_key("def"): pass
  one can also write
      if "def" in a_dict: pass
  but you won't find this in the simple on-line help,
  at least in my version)
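
The two steps above could be sketched roughly as follows in Python 3. This is an illustration, not code from the thread: the function names index_by_second_column and report_unmatched are invented here, lines with fewer than two fields are simply skipped, and a repeated key keeps only its last line. It uses the `in` test and `items()` because `has_key` and `iteritems` exist only in Python 2.

```python
def index_by_second_column(lines):
    """Map each line's second column to the full (stripped) line."""
    index = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or too-short line: nothing to index
        index[fields[1]] = line.strip()
    return index


def report_unmatched(lines1, lines2):
    """Return the report as a list of output lines.

    Lines whose second column is missing from the other input are
    listed under the "File1" or "File2" header respectively.
    """
    d1 = index_by_second_column(lines1)
    d2 = index_by_second_column(lines2)
    report = ["File1"]
    report += [line for key, line in d1.items() if key not in d2]
    report.append("File2")
    report += [line for key, line in d2.items() if key not in d1]
    return report


if __name__ == "__main__":
    # Assumed file names, matching the ones used earlier in the thread.
    with open("file1.txt") as f1, open("file2.txt") as f2:
        print("\n".join(report_unmatched(f1, f2)))
```

Because dictionary lookups do not depend on position, this works whether or not the two files are in the same order.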

HTH
Martin
 
Neil Cerutti

I would like to know the best way of generating filter of two files
based upon the following condition

As a bit of friendly advice, you'll get much more useful assistance if
you post your code.

If you don't have any code to show, write some. Unless it's a quine, a
program won't write itself.
 
Reedick, Andrew

-----Original Message-----
From: (e-mail address removed) On Behalf Of Madhur
Sent: Friday, January 18, 2008 4:23 AM
To: (e-mail address removed)
Subject: Filtering two files with uncommon column

Basically, I want to compare the two files based on the second column.
If the second column matches in both files, do not print anything.
If a line in the first file has no match for its second column in the
second file, print it under the File1 header. If a line in the second
file has no match for its second column in the first file, print it
under the File2 header.


I often do this to compare property files between environments. The
following algorithm works for any number of files by creating a dictionary
of lists (or hash of arrays in Perl-ese).

Create a dictionary
Index = -1
For file in files
    Index++
    For line in file
        col = match/split/regex the column
        If col not in dictionary
            Dictionary[col] = []

        extend dictionary[col] to length of index
        dictionary[col][index] = col

for col in sort(dictionary.keys()):
    extend dictionary[col] to length of index
    print dictionary[col]
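
One possible Python 3 reading of the pseudocode above is sketched here. It is an interpretation rather than the poster's actual code: the name presence_table is invented, each row is created at full length with None placeholders instead of being extended as files are read, and the slot holds the whole line rather than just the column value so that the output is easier to read.

```python
def presence_table(files):
    """Build {key: [line_or_None, ...]} with one slot per input file.

    `files` is a list of iterables of lines. The key is the second
    whitespace-separated column; a None slot means that file has no
    line with that key.
    """
    files = list(files)
    table = {}
    for index, lines in enumerate(files):
        for line in lines:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or too-short lines
            # Create a full-width row the first time a key is seen.
            row = table.setdefault(fields[1], [None] * len(files))
            row[index] = line.strip()
    return table


def print_table(table):
    """Print one row per key, in sorted key order."""
    for key in sorted(table):
        print(key, table[key])
```

Rows that contain a None are the lines with no counterpart in some other file, which is the case the original question asks about; rows with every slot filled are the matches.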




 
