Garry> Hi, I am new to python, hope someone can help me here: I
Garry> have a MS Access exported .txt file which is tab delimited
Garry> in total 20 columns, now I need to add another column of
Garry> zero at the 4th column position and a column of zero at the
Garry> 9th column position. What is the best way to do this? Can I
Garry> write a while loop to count the number of tab I hit until
Garry> the counter is 4 and then add a zero in between and thru
Garry> the whole file?
Unless the file is terribly large, it will be easier to slurp the
whole thing into memory, manipulate some list structures, and then
dump back to the file.
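If the file does turn out to be too large to hold in memory, a line-at-a-time version is not much harder. The following is only a sketch: the function name `insert_zero_columns` and its parameters are made up for illustration, and it assumes the "4th" and "9th" positions refer to columns of the output file (0-based indices 3 and 8).

```python
def insert_zero_columns(in_path, out_path, positions=(3, 8), fill="0"):
    """Copy a tab-delimited file, inserting `fill` at each 0-based index.

    Assumption: `positions` are indices in the *output* row, so they are
    applied in ascending order.
    """
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            # Drop the newline so it does not stick to the last field.
            fields = line.rstrip("\n").split("\t")
            for pos in sorted(positions):
                fields.insert(pos, fill)
            dst.write("\t".join(fields) + "\n")
```

Only one line is held in memory at a time, so file size stops being a concern.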
There are a couple of nifty things to speed you along. You can use
string split methods to split the file on tabs and read the file into
a list of rows, each row split on the tabs.
rows = [line.split('\t') for line in open('tabdelim.dat')]
The next fun trick is to use zip(*rows) to transpose this into a
list of columns. You can then use the list insert method to insert
your column. Here I'm adding a last name column as the third column.
cols = list(zip(*rows)) # transposes the 2D list; list() is needed on Python 3
cols.insert(2, ['Hunter', 'Sierig', 'Hunter', 'Hunter'])
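To see the transpose trick on its own, here is a small in-memory sketch with made-up data and no file involved. It inserts a column of zeros rather than names, which is closer to the original question.

```python
# Three rows of two fields each (made-up data).
rows = [["1", "John"], ["2", "Miriam"], ["3", "Rahel"]]

# zip(*rows) groups the i-th field of every row, giving columns as tuples.
# list() is needed on Python 3, where zip returns an iterator.
cols = list(zip(*rows))
first_col = cols[0]  # ("1", "2", "3")

# Insert a new column of zeros at index 1, one entry per row.
cols.insert(1, ["0"] * len(rows))

# Transposing again turns the columns back into rows.
new_rows = [list(row) for row in zip(*cols)]
# new_rows == [["1", "0", "John"], ["2", "0", "Miriam"], ["3", "0", "Rahel"]]
```

Note that zip stops at the shortest input, so this only round-trips cleanly when every row has the same number of fields.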
Now all that is left is to transpose back to rows and write the new
file, using the string method join to rejoin the columns with tabs:
rows = zip(*cols) # transpose back
open('newfile.dat', 'w').writelines(['\t'.join(row) for row in rows])
This script takes an input file like:
1 John 35 M
2 Miriam 31 F
3 Rahel 5 F
4 Ava 2 F
and generates an output file:
1 John Hunter 35 M
2 Miriam Sierig 31 F
3 Rahel Hunter 5 F
4 Ava Hunter 2 F
Damn cool!
Here is the whole script:
rows = [line.split('\t') for line in open('tabdelim.dat')]
cols = list(zip(*rows)) # list() is needed on Python 3
cols.insert(2, ['Hunter', 'Sierig', 'Hunter', 'Hunter'])
rows = zip(*cols)
open('newfile.dat', 'w').writelines(['\t'.join(row) for row in rows])
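For the question as asked (zero columns at the 4th and 9th positions), the same idea might look like the sketch below. The function name `add_zero_columns` is hypothetical, and I am assuming the positions are counted in the output file. It also checks that all rows have the same length, since zip would otherwise silently drop trailing fields.

```python
def add_zero_columns(in_path, out_path, positions=(3, 8)):
    """Insert columns of "0" at the given 0-based output indices."""
    with open(in_path) as f:
        # Strip the newline so the last field stays clean.
        rows = [line.rstrip("\n").split("\t") for line in f]

    # zip() stops at the shortest row, so refuse ragged input.
    if len({len(row) for row in rows}) > 1:
        raise ValueError("rows have differing numbers of columns")

    cols = list(zip(*rows))          # transpose rows -> columns
    zeros = ("0",) * len(rows)       # one zero per row
    for pos in sorted(positions):    # ascending, so indices match the output
        cols.insert(pos, zeros)

    with open(out_path, "w") as f:
        # Transpose back and rejoin each row with tabs.
        f.writelines("\t".join(row) + "\n" for row in zip(*cols))
```

Calling `add_zero_columns('tabdelim.dat', 'newfile.dat')` on a 20-column file should then give 22 columns, with zeros in the 4th and 9th.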
Cheers,
John Hunter