how to extract columns like awk $1 $5

Discussion in 'Python' started by Anand S Bisen, Jan 7, 2005.

  1. Hi

    Is there a simple way to extract words separated by a space in Python,
    the way I do it in awk '{print $4 $5}'? I am sure there should be some way,
    but I don't know it.

    Thanks
    n00b
     
    Anand S Bisen, Jan 7, 2005
    #1

  2. Anand S Bisen

    Guest

    It takes a few more lines in Python, but you can do something like

    for text in open("file.txt","r"):
        words = text.split()
        print words[4],words[5]
    (assuming that awk starts counting from zero -- I forget).
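
For reference, awk numbers its fields from 1, so $4 and $5 correspond to indexes 3 and 4 of the list that split() returns. Note also that awk's `print $4 $5` joins the two fields with nothing between them, while `print $4, $5` puts a space between them. Below is a minimal sketch in Python 3 syntax (the snippets in this thread use the Python 2 print statement); the helper name `fields_4_and_5` and the sample lines are made up for illustration.

```python
def fields_4_and_5(line):
    """Return awk's $4 and $5 for one line, or None if the line is too short."""
    words = line.split()          # no argument: split on runs of whitespace
    if len(words) < 5:
        return None
    # awk's $4 is words[3] and $5 is words[4], because Python counts from 0
    return words[3], words[4]


# Hypothetical sample input; in practice the lines would come from a file.
sample_lines = ["a b c d e f", "too short"]
for line in sample_lines:
    print(fields_4_and_5(line))
```

The second sample line has only two fields, so the helper returns None for it instead of raising an IndexError.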
     
    , Jan 7, 2005
    #2

  3. On Fri, 07 Jan 2005 12:15:48 -0500, Anand S Bisen wrote:

    > Is there a simple way to extract words separated by a space in Python,
    > the way I do it in awk '{print $4 $5}'? I am sure there should be some way,
    > but I don't know it.


    mystr = '1 2 3 4 5 6'
    parts = mystr.split()
    print parts[3:5]

    Jeremy
     
    Jeremy Sanders, Jan 7, 2005
    #3
  4. Anand S Bisen

    Roy Smith Guest

    Anand S Bisen wrote:
    >Hi
    >
    >Is there a simple way to extract words separated by a space in Python,
    >the way I do it in awk '{print $4 $5}'? I am sure there should be some way,
    >but I don't know it.


    Something along the lines of:

    words = input.split()
    print words[4], words[5]
     
    Roy Smith, Jan 7, 2005
    #4
  5. Anand S Bisen

    Paul Rubin Guest

    Roy Smith writes:
    > Something along the lines of:
    >
    > words = input.split()
    > print words[4], words[5]


    That throws an exception if there are fewer than 6 fields, which might
    or might not be what you want.
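
If the exception is not what you want, one option is to check the length before indexing. This is only a sketch in Python 3 syntax; the helper name `field` and its `default` parameter are hypothetical, not something from the standard library.

```python
def field(words, n, default=""):
    """Return words[n] (0-based), or default when that index does not exist."""
    return words[n] if 0 <= n < len(words) else default


words = "only three fields".split()
print(repr(field(words, 2)))   # index 2 exists
print(repr(field(words, 5)))   # index 5 does not, so the default comes back
```

Returning an empty string for a missing field mirrors what awk does, but it can also hide malformed input, so whether to use it depends on the data.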
     
    Paul Rubin, Jan 7, 2005
    #5
  6. On Fri, 07 Jan 2005 12:15:48 -0500, Anand S Bisen wrote:

    > Is there a simple way to extract words separated by a space in Python,
    > the way I do it in awk '{print $4 $5}'? I am sure there should be some way,
    > but I don't know it.


    i guess it depends on how faithfully you want to reproduce awk's behavior
    and options.

    as several people have mentioned, strings have the split() method for
    simple tokenization, but blindly indexing into the resulting sequence
    can give you an out-of-range exception. out of range indexes are no
    problem for awk; it would just return an empty string without complaint.

    note that the index bases are slightly different: python sequences
    start with index 0, while awk's fields begin with $1. there IS a $0,
    but it means the entire unsplit line.

    the split() method accepts a separator argument, which can be used to
    replicate awk's -F option / FS variable.
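
A short illustration of the separator argument, in Python 3 syntax with made-up sample strings. One detail worth knowing: split() with no argument collapses runs of whitespace, which is closest to awk's default field splitting, while passing " " explicitly splits on every single space.

```python
line = "root:x:0:0"
print(line.split(":"))     # explicit separator, similar in spirit to awk -F:

spaced = "a  b   c"        # two spaces, then three spaces
print(spaced.split())      # no argument: runs of whitespace are collapsed
print(spaced.split(" "))   # explicit " ": empty strings appear between spaces
```

So a helper that defaults its delimiter to " " will see empty fields on input with repeated spaces; defaulting to None (which split() treats like no argument) avoids that.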

    so, if you want to closely approximate awk's behavior without fear of
    exceptions, you could try a small function like this:


    def awk_it(instring,index,delimiter=" "):
        try:
            return [instring,instring.split(delimiter)[index-1]][max(0,min(1,index))]
        except:
            return ""


    >>> print awk_it("a b c d e",0)
    a b c d e
    >>> print awk_it("a b c d e",1)
    a
    >>> print awk_it("a b c d e",5)
    e
    >>> print awk_it("a b c d e",6)



    - dan
     
    Dan Valentine, Jan 8, 2005
    #6
  7. Anand S Bisen

    Roy Smith Guest

    Dan Valentine wrote:

    > On Fri, 07 Jan 2005 12:15:48 -0500, Anand S Bisen wrote:
    >
    > > Is there a simple way to extract words separated by a space in Python,
    > > the way I do it in awk '{print $4 $5}'? I am sure there should be some way,
    > > but I don't know it.
    >
    > i guess it depends on how faithfully you want to reproduce awk's behavior
    > and options.
    >
    > as several people have mentioned, strings have the split() method for
    > simple tokenization, but blindly indexing into the resulting sequence
    > can give you an out-of-range exception. out of range indexes are no
    > problem for awk; it would just return an empty string without complaint.


    It's pretty easy to create a list type which has awk-ish behavior:

    class awkList (list):
        def __getitem__ (self, key):
            try:
                return list.__getitem__ (self, key)
            except IndexError:
                return ""

    l = awkList ("foo bar baz".split())
    print "l[0] = ", repr (l[0])
    print "l[5] = ", repr (l[5])

    -----------

    Roy-Smiths-Computer:play$ ./awk.py
    l[0] = 'foo'
    l[5] = ''

    Hmmm. There's something going on here I don't understand. The ref
    manual (3.3.5 Emulating container types) says for __getitem__(), "Note:
    for loops expect that an IndexError will be raised for illegal indexes
    to allow proper detection of the end of the sequence." I expected my
    little demo class to therefore break for loops, but they seem to work
    fine:

    >>> import awk
    >>> l = awk.awkList ("foo bar baz".split())
    >>> l
    ['foo', 'bar', 'baz']
    >>> for i in l:
    ...     print i
    ...
    foo
    bar
    baz
    >>> l[5]
    ''

    Given that I've caught the IndexError, I'm not sure how that's working.
     
    Roy Smith, Jan 8, 2005
    #7
  8. Anand S Bisen

    Carl Banks Guest

    Roy Smith wrote:
    > Hmmm. There's something going on here I don't understand. The ref
    > manual (3.3.5 Emulating container types) says for __getitem__(), "Note:
    > for loops expect that an IndexError will be raised for illegal indexes
    > to allow proper detection of the end of the sequence." I expected my
    > little demo class to therefore break for loops, but they seem to work
    > fine:
    >
    > >>> import awk
    > >>> l = awk.awkList ("foo bar baz".split())
    > >>> l
    > ['foo', 'bar', 'baz']
    > >>> for i in l:
    > ...     print i
    > ...
    > foo
    > bar
    > baz
    > >>> l[5]
    > ''
    >
    > Given that I've caught the IndexError, I'm not sure how that's working.


    The title of that particular section is "Emulating container types",
    which is not what you're doing, so it doesn't apply here. For built-in
    types, iterators are at work. The list iterator probably doesn't even
    call getitem, but accesses the items directly from the C structure.
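
This can be checked by recording every call to __getitem__. The sketch below is in Python 3 syntax and describes CPython's behavior as far as I know; the class name `AwkList` and the `lookups` attribute are made up for the experiment. A class that defines only __getitem__ and no __iter__ is a different case: there the for loop falls back to calling __getitem__ with 0, 1, 2, ... until IndexError is raised, which is the situation the manual's note describes.

```python
class AwkList(list):
    """List that returns "" for out-of-range indexes and records lookups."""

    def __init__(self, *args):
        super().__init__(*args)
        self.lookups = []              # every key passed to __getitem__

    def __getitem__(self, key):
        self.lookups.append(key)
        try:
            return super().__getitem__(key)
        except IndexError:
            return ""


items = AwkList("foo bar baz".split())
seen = [item for item in items]        # iteration uses the inherited list.__iter__
print(seen, items.lookups)             # lookups is still empty after the loop
print(repr(items[5]), items.lookups)   # explicit indexing does reach __getitem__
```

Because the loop never calls the overridden __getitem__, swallowing the IndexError there has no effect on how iteration ends.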
    --
    CARL BANKS
     
    Carl Banks, Jan 8, 2005
    #8
