Adding a column in a tab delimited txt file

G

Garry

Hi, I am new to python, hope someone can help me here:
I have a MS Access exported .txt file which is tab delimited in total
20 columns, now I need to add another column of zero at the 4th column
position and a column of zero at the 9th column position. What is the
best way to do this? Can I write a while loop to count the number of
tab I hit until the counter is 4 and then add a zero in between and
thru the whole file?

Thanks,
Garry
 
B

Blake Garretson

Garry said:
Can I write a while loop to count the number of
tab I hit until the counter is 4 and then add a zero in between and
thru the whole file?

Try using the split string method and then the insert list method.

Untested code:

###
infile = file("in.txt")
outfile = file("out.txt")

for line in f:
columns = line.split("\t")
columns.insert(4, "0")
outfile.write("\t".join(columns)+"\n")
###

The code might not be perfect, but you get the idea.
 
J

John Hunter

Garry> Hi, I am new to python, hope someone can help me here: I
Garry> have a MS Access exported .txt file which is tab delimited
Garry> in total 20 columns, now I need to add another column of
Garry> zero at the 4th column position and a column of zero at the
Garry> 9th column position. What is the best way to do this? Can I
Garry> write a while loop to count the number of tab I hit until
Garry> the counter is 4 and then add a zero in between and thru
Garry> the whole file?

Unless the file is terribly large, it will be easier to slurp the
whole thing into memory, manipulate some list structures, and then
dump back to the file.

There are a couple of nifty things to speed you along. You can use
string split methods to split the file on tabs and read the file into
a list of rows, each row split on the tabs.

rows = [line.split('\t') for line in file('tabdelim.dat')]

The next fun trick is to use the zip(*rows) to tranpose this into a
list of columns. You can then use the list insert method to insert
your column. Here I'm adding a last name column to the third column.

cols = zip(*rows) # transposes 2Dlist
cols.insert(2, ['Hunter', 'Sierig', 'Hunter', 'Hunter'])

Now all that is left is to transpose back to rows and write the new
file using the string method join to rejoin the columns with tabs

rows = zip(*cols) # transpose back
file('newfile.dat', 'w').writelines(['\t'.join(row) for row in rows])

This script takes an input file like

1 John 35 M
2 Miriam 31 F
3 Rahel 5 F
4 Ava 2 F

and generates an outfile

1 John Hunter 35 M
2 Miriam Sierig 31 F
3 Rahel Hunter 5 F
4 Ava Hunter 2 F

Damn cool!

Here is the whole script:

rows = [line.split('\t') for line in file('tabdelim.dat')]
cols = zip(*rows)
cols.insert(2, ['Hunter', 'Sierig', 'Hunter', 'Hunter'])
rows = zip(*cols)
file('newfile.dat', 'w').writelines(['\t'.join(row) for row in rows])

Cheers,
John Hunter
 
?

-

Hi, I am new to python, hope someone can help me here:
I have a MS Access exported .txt file which is tab delimited in total
20 columns, now I need to add another column of zero at the 4th column
position and a column of zero at the 9th column position. What is the
best way to do this?

I don't know the best way, but one way is this.

import re

infile = file("in.txt","r")
outfile = file("out.txt","w")

pattern = re.compile(r'^((?:[^\t]+\t){3})((?:[^\t]+\t){5})')
replace = '\g<1>0\t\g<2>0\t'

for line in infile:
outfile.write(pattern.sub(replace,line))
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,432
Messages
2,571,682
Members
48,796
Latest member
Greg L.

Latest Threads

Top