reading a specific column from file

Discussion in 'Python' started by cesco, Jan 11, 2008.

  1. cesco

    cesco Guest

    Hi,

    I have a file containing four columns of data separated by tabs (\t)
    and I'd like to read a specific column from it (say the third). Is
    there any simple way to do this in Python?

    I found the linecache module quite interesting, but unfortunately
    it works (to my knowledge) only on lines, not columns.

    Any suggestions?

    Thanks and regards
    Francesco
     
    cesco, Jan 11, 2008
    #1

  2. cesco

    A.T.Hofkamp Guest

    On 2008-01-11, cesco <> wrote:
    > Hi,
    >
    > I have a file containing four columns of data separated by tabs (\t)
    > and I'd like to read a specific column from it (say the third). Is
    > there any simple way to do this in Python?
    >
    > I've found quite interesting the linecache module but unfortunately
    > that is (to my knowledge) only working on lines, not columns.
    >
    > Any suggestion?


    the csv module may do what you want.
     
    A.T.Hofkamp, Jan 11, 2008
    #2

  3. cesco wrote:

    > I have a file containing four columns of data separated by tabs (\t)
    > and I'd like to read a specific column from it (say the third). Is
    > there any simple way to do this in Python?


    use the "split" method and plain old indexing:

    for line in open("file.txt"):
        columns = line.split("\t")
        print(columns[2]) # indexing starts at zero

    also see the "csv" module, which can read all sorts of
    comma/semicolon/tab-separated spreadsheet-style files.

    > I've found quite interesting the linecache module


    the "linecache" module seems to be quite popular on comp.lang.python
    these days, but it's designed for a very specific purpose (displaying
    Python code in tracebacks), and is a really lousy way to read text files
    in the general case. please unlearn.

    </F>
     
    Fredrik Lundh, Jan 11, 2008
    #3
  4. cesco

    Chris Guest

    On Jan 11, 2:15 pm, cesco <> wrote:
    > Hi,
    >
    > I have a file containing four columns of data separated by tabs (\t)
    > and I'd like to read a specific column from it (say the third). Is
    > there any simple way to do this in Python?
    >
    > I've found quite interesting the linecache module but unfortunately
    > that is (to my knowledge) only working on lines, not columns.
    >
    > Any suggestion?
    >
    > Thanks and regards
    > Francesco


    for (i, each_line) in enumerate(open('input_file.txt')):
        try:
            column_3 = each_line.split('\t')[2].strip()
        except IndexError:
            print('Not enough columns on line %i of file.' % (i+1))
            continue

        do_something_with_column_3()  # placeholder for your own processing
     
    Chris, Jan 11, 2008
    #4
  5. cesco

    Peter Otten Guest

    A.T.Hofkamp wrote:

    > On 2008-01-11, cesco <> wrote:
    >> Hi,
    >>
    >> I have a file containing four columns of data separated by tabs (\t)
    >> and I'd like to read a specific column from it (say the third). Is
    >> there any simple way to do this in Python?
    >>
    >> I've found quite interesting the linecache module but unfortunately
    >> that is (to my knowledge) only working on lines, not columns.
    >>
    >> Any suggestion?

    >
    > the csv module may do what you want.


    Here's an example:

    >>> import csv
    >>> print(open("tmp.csv").read())
    alpha beta gamma delta
    one two three for

    >>> records = csv.reader(open("tmp.csv"), delimiter="\t")
    >>> [record[2] for record in records]
    ['gamma', 'three']
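
For anyone running this on Python 3, here is a minimal sketch of the same csv approach wrapped in a function. The name read_column and the temporary sample file are made up for illustration; the sample holds the same two rows as the session above.

```python
import csv
import os
import tempfile

def read_column(path, index, delimiter="\t"):
    """Return the value at position `index` from every row of a delimited file."""
    # The csv documentation recommends newline="" when opening files for csv.reader.
    with open(path, newline="") as handle:
        return [row[index] for row in csv.reader(handle, delimiter=delimiter)]

# Build a small tab-separated sample file so the sketch runs on its own.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("alpha\tbeta\tgamma\tdelta\n")
    tmp.write("one\ttwo\tthree\tfor\n")
    sample_path = tmp.name

third = read_column(sample_path, 2)  # index 2 is the third column
print(third)
os.remove(sample_path)
```

A row with fewer than three fields would raise IndexError here, so real data may need the kind of check shown earlier in the thread.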

    Peter
     
    Peter Otten, Jan 11, 2008
    #5
  6. cesco

    Ivan Novick Guest

    On Jan 11, 4:15 am, cesco <> wrote:
    > Hi,
    >
    > I have a file containing four columns of data separated by tabs (\t)
    > and I'd like to read a specific column from it (say the third). Is
    > there any simple way to do this in Python?


    You say you would like to "read" a specific column. I wonder if you
    meant read all the data and then just separate out the 3rd column, or
    if you really meant doing disk I/O only for the 3rd column of data,
    thereby making your read faster. The second seems more interesting
    but much harder, and I wonder if anyone has any ideas. As for
    just filtering out the third column, you have been given many
    suggestions already.

    Regards,
    Ivan Novick
    http://www.0x4849.net
     
    Ivan Novick, Jan 11, 2008
    #6
  7. > -----Original Message-----
    > From: python-list-bounces+jr9445= [mailto:python-
    > list-bounces+jr9445=] On Behalf Of Ivan Novick
    > Sent: Friday, January 11, 2008 12:46 PM
    > To:
    > Subject: Re: reading a specific column from file
    >
    >
    > You say you would like to "read" a specific column. I wonder if you
    > meant read all the data and then just separate out the 3rd column or
    > if you really mean only do disk IO for the 3rd column of data and
    > thereby making your read faster. The second seems more interesting
    > but much harder and I wonder if any one has any ideas.


    Do what databases do. If the columns are stored with a fixed size on
    disk, then you can simply compute the offset and seek to it. If the
    columns are of variable size, then you need to store (and maintain) the
    offsets in some kind of index.
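
The fixed-size case could be sketched roughly as follows. Everything here is an assumption for illustration: the 8-byte field width, the four-field record layout, and the helper names write_fixed_width and read_fixed_column. It only works if no value is longer than the field width.

```python
import os
import tempfile

FIELD_WIDTH = 8                # assumed: every field is padded to 8 bytes
FIELDS_PER_RECORD = 4          # assumed: four columns, as in the question
RECORD_SIZE = FIELD_WIDTH * FIELDS_PER_RECORD + 1  # +1 for the trailing "\n"

def write_fixed_width(path, rows):
    """Write rows with every field space-padded to FIELD_WIDTH bytes."""
    with open(path, "wb") as handle:
        for row in rows:
            line = "".join(value.ljust(FIELD_WIDTH) for value in row) + "\n"
            handle.write(line.encode("ascii"))

def read_fixed_column(path, column):
    """Seek straight to one field of each record instead of parsing whole lines."""
    values = []
    record_count = os.path.getsize(path) // RECORD_SIZE
    with open(path, "rb") as handle:
        for record in range(record_count):
            # Offset = start of the record + start of the field inside it.
            handle.seek(record * RECORD_SIZE + column * FIELD_WIDTH)
            values.append(handle.read(FIELD_WIDTH).decode("ascii").rstrip())
    return values

# Small self-contained demonstration using a temporary file.
fd, fixed_path = tempfile.mkstemp(suffix=".dat")
os.close(fd)
write_fixed_width(fixed_path, [["alpha", "beta", "gamma", "delta"],
                               ["one", "two", "three", "four"]])
fixed_third = read_fixed_column(fixed_path, 2)
print(fixed_third)
os.remove(fixed_path)
```

Note that the operating system typically still reads whole disk blocks behind each seek, so whether this is actually faster than splitting every line depends on the record size and should be measured.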



     
    Reedick, Andrew, Jan 11, 2008
    #7
  8. cesco

    Hai Vu Guest

    Here is another suggestion:

    col = 2 # third column
    filename = '4columns.txt'
    third_column = [line[:-1].split('\t')[col]
                    for line in open(filename, 'r')]

    third_column now contains a list of items in the third column.

    This solution is great for small files (up to a couple of thousand
    lines). For larger files, performance could be a problem, so you might
    need a different solution.
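
If the worry with larger files is holding every value in one list, one possible alternative is a generator that yields a value per line. This is only a sketch; iter_column and the temporary sample file are hypothetical names used for the demonstration.

```python
import os
import tempfile

def iter_column(path, index, delimiter="\t"):
    """Yield one column value per line without keeping the whole file in memory."""
    with open(path) as handle:
        for line in handle:
            yield line.rstrip("\n").split(delimiter)[index]

# Build a small tab-separated sample file so the sketch runs on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\tbeta\tgamma\tdelta\n")
    tmp.write("one\ttwo\tthree\tfour\n")
    lazy_path = tmp.name

# Collected into a list here only because the sample is tiny; on a large
# file you would loop over iter_column(...) and process each value in turn.
collected = list(iter_column(lazy_path, 2))
print(collected)
os.remove(lazy_path)
```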
     
    Hai Vu, Jan 17, 2008
    #8
  9. cesco

    John Machin Guest

    On Jan 17, 8:47 pm, Hai Vu <> wrote:
    > Here is another suggestion:
    >
    > col = 2 # third column
    > filename = '4columns.txt'
    > third_column = [line[:-1].split('\t')[col] for line in open(filename,
    > 'r')]
    >
    > third_column now contains a list of items in the third column.
    >
    > This solution is great for small files (up to a couple of thousand of
    > lines). For larger file, performance could be a problem, so you might
    > need a different solution.


    Using the maxsplit arg could speed it up a little:

    line[:-1].split('\t', col+1)[col]
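
A quick way to see what maxsplit changes, using a made-up line with more fields than needed: the split stops after col+1 cuts, so the rest of the line stays in one piece, and the wanted column is the same either way.

```python
line = "alpha\tbeta\tgamma\tdelta\textra\tfields\n"  # made-up sample line
col = 2  # third column

full = line[:-1].split("\t")              # splits on every tab
limited = line[:-1].split("\t", col + 1)  # stops after col + 1 splits

print(full)     # ['alpha', 'beta', 'gamma', 'delta', 'extra', 'fields']
print(limited)  # ['alpha', 'beta', 'gamma', 'delta\textra\tfields']

# Both approaches return the same value for the wanted column.
assert full[col] == limited[col] == "gamma"
```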
     
    John Machin, Jan 17, 2008
    #9
