How to extract columns like awk $1 $5

Anand S Bisen

Hi

Is there a simple way to extract words separated by a space in Python
the way I do it in awk '{print $4 $5}'? I am sure there should be some
way, but I don't know it.

Thanks
n00b
 
beliavsky

It takes a few more lines in Python, but you can do something like

for text in open("file.txt","r"):
    words = text.split()
    print words[4],words[5]
(assuming that awk starts counting from zero -- I forget).
 
Jeremy Sanders

Anand S Bisen said:
Is there a simple way to extract words separated by a space in Python
the way I do it in awk '{print $4 $5}'? I am sure there should be some
way, but I don't know it.

mystr = '1 2 3 4 5 6'
parts = mystr.split()
print parts[3:5]

Jeremy
 
Roy Smith

Anand S Bisen said:
Hi

Is there a simple way to extract words separated by a space in Python
the way I do it in awk '{print $4 $5}'? I am sure there should be some
way, but I don't know it.

Something along the lines of:

words = input.split()
print words[4], words[5]
 
Paul Rubin

Roy Smith said:
Something along the lines of:

words = input.split()
print words[4], words[5]

That throws an exception if there are fewer than 6 fields, which might
or might not be what you want.
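
If the exception is not wanted, one option is to check the length before indexing. Below is a minimal sketch of that idea, written for Python 3 (so print is a function, unlike the Python 2 snippets in this thread). The helper name get_field and its default argument are made up for illustration, and it uses the same zero-based indexes as the snippet above.

```python
def get_field(words, index, default=""):
    """Return words[index], or default if the line has too few fields."""
    # Hypothetical helper: guards a zero-based lookup against short lines.
    return words[index] if 0 <= index < len(words) else default

line = "one two three"
words = line.split()

print(get_field(words, 1))        # two
print(repr(get_field(words, 5)))  # '' (no IndexError for a short line)
```

Whether a silent empty string or a loud exception is better depends on whether short lines are expected in the input.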
 
Dan Valentine

Anand S Bisen said:
Is there a simple way to extract words separated by a space in Python
the way I do it in awk '{print $4 $5}'? I am sure there should be some
way, but I don't know it.

i guess it depends on how faithfully you want to reproduce awk's behavior
and options.

as several people have mentioned, strings have the split() method for
simple tokenization, but blindly indexing into the resulting sequence
can give you an out-of-range exception. out of range indexes are no
problem for awk; it would just return an empty string without complaint.

note that the index bases are slightly different: python sequences
start with index 0, while awk's fields begin with $1. there IS a $0,
but it means the entire unsplit line.
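
To make the off-by-one concrete, here is a small sketch of the mapping, written for Python 3 (print is a function there). The sample line and variable names are only illustrative.

```python
line = "a b c d e"
fields = line.split()

# awk's $4 and $5 are the 4th and 5th fields, i.e. Python indexes 3 and 4.
fourth, fifth = fields[3], fields[4]
print(fourth, fifth)      # d e

# The same two fields as a slice (the end index is exclusive).
print(fields[3:5])        # ['d', 'e']

# awk's $0 is the whole line, which here is just the original string.
whole = line
print(whole)              # a b c d e
```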

the split() method accepts a separator argument, which can be used to
replicate awk's -F option / FS variable.
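
One detail worth knowing here: calling split() with no argument splits on runs of whitespace, which is what awk does by default, while passing " " explicitly splits on every single space. A short Python 3 sketch of the difference, with made-up sample data:

```python
line = "alice  42   admin"          # fields separated by runs of spaces

default_split = line.split()        # like awk's default field splitting
space_split = line.split(" ")       # one split per space character

print(default_split)                # ['alice', '42', 'admin']
print(space_split)                  # ['alice', '', '42', '', '', 'admin']

# An explicit separator is the rough equivalent of awk -F: '{print $1}'.
passwd_line = "root:x:0:0:root:/root:/bin/bash"
first = passwd_line.split(":")[0]
print(first)                        # root
```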

so, if you want to closely approximate awk's behavior without fear of
exceptions, you could try a small function like this:


def awk_it(instring,index,delimiter=" "):
    try:
        return [instring,instring.split(delimiter)[index-1]][max(0,min(1,index))]
    except:
        return ""

a b c d e


- dan
 
Roy Smith

Dan Valentine said:
i guess it depends on how faithfully you want to reproduce awk's behavior
and options.

as several people have mentioned, strings have the split() method for
simple tokenization, but blindly indexing into the resulting sequence
can give you an out-of-range exception. out of range indexes are no
problem for awk; it would just return an empty string without complaint.

It's pretty easy to create a list type which has awk-ish behavior:

class awkList (list):
    def __getitem__ (self, key):
        try:
            return list.__getitem__ (self, key)
        except IndexError:
            return ""

l = awkList ("foo bar baz".split())
print "l[0] = ", repr (l[0])
print "l[5] = ", repr (l[5])

-----------

Roy-Smiths-Computer:play$ ./awk.py
l[0] = 'foo'
l[5] = ''

Hmmm. There's something going on here I don't understand. The ref
manual (3.3.5 Emulating container types) says for __getitem__(), "Note:
for loops expect that an IndexError will be raised for illegal indexes
to allow proper detection of the end of the sequence." I expected my
little demo class to therefore break for loops, but they seem to work
fine:
>>> import awk
>>> l = awk.awkList ("foo bar baz".split())
>>> l
['foo', 'bar', 'baz']
>>> for i in l:
...     print i
...
foo
bar
baz
''

Given that I've caught the IndexError, I'm not sure how that's working.
 
Carl Banks

Roy said:
Hmmm. There's something going on here I don't understand. The ref
manual (3.3.5 Emulating container types) says for __getitem__(), "Note:
for loops expect that an IndexError will be raised for illegal indexes
to allow proper detection of the end of the sequence." I expected my
little demo class to therefore break for loops, but they seem to work
fine:
>>> import awk
>>> l = awk.awkList ("foo bar baz".split())
>>> l
['foo', 'bar', 'baz']
>>> for i in l:
...     print i
...
foo
bar
baz
''

Given that I've caught the IndexError, I'm not sure how that's
working.


The title of that particular section is "Emulating container types",
which is not what you're doing, so it doesn't apply here. For built-in
types, iterators are at work. The list iterator probably doesn't even
call getitem, but accesses the items directly from the C structure.
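
A quick way to see the difference is to compare a list subclass with a class that really does emulate a container through __getitem__ alone. The sketch below is for Python 3 and was reasoned about against CPython; AwkList mirrors the class from the thread, while LegacySeq is a hypothetical class added only for contrast.

```python
class AwkList(list):
    """list subclass whose out-of-range lookups return '' (as in the thread)."""
    def __getitem__(self, key):
        try:
            return list.__getitem__(self, key)
        except IndexError:
            return ""

class LegacySeq:
    """Hypothetical class with only __getitem__, so iteration indexes it 0, 1, 2, ..."""
    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        return self._items[index]    # raises IndexError past the end

words = AwkList("foo bar baz".split())
print(list(words))                   # ['foo', 'bar', 'baz']
print(repr(words[5]))                # ''

# Here iteration stops only because __getitem__ eventually raises IndexError.
print(list(LegacySeq("foo bar baz".split())))   # ['foo', 'bar', 'baz']
```

AwkList inherits __iter__ from list, so the for loop never goes through the overridden __getitem__ and still ends after three items. If LegacySeq swallowed IndexError the same way, a for loop over it would never terminate, which is the situation the manual's note warns about.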
 
