reading specific lines of a file

Discussion in 'Python' started by Yi Xing, Jul 15, 2006.

  1. Yi Xing

    Yi Xing Guest

    Hi All,

    I want to read specific lines of a huge txt file (I know the line #).
    Each line might have different sizes. Is there a convenient and fast
    way of doing this in Python? Thanks.

    Yi Xing
    Yi Xing, Jul 15, 2006
    #1

  2. If the line number of the first line is 0 :

    source = open('afile.txt')
    for i, line in enumerate(source):
        if i == line_num:   # line_num: zero-based number of the wanted line
            print(line)
            break
    source.close()

    Pierre
    Pierre Quentel, Jul 15, 2006
    #2

  3. >>>>> Yi Xing (YX) wrote:

    >YX> Hi All,
    >YX> I want to read specific lines of a huge txt file (I know the line #). Each
    >YX> line might have different sizes. Is there a convenient and fast way of
    >YX> doing this in Python? Thanks.


    Not fast. You have to read all preceding lines.
    If you have to do this many times while the file does not change, you could
    build an index into the file.
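
    A minimal sketch of such an index, assuming the file does not change
    between building the index and reading from it. The names
    build_line_index and read_line are hypothetical, chosen only for
    illustration. The index is a plain list of byte offsets kept in memory.

```python
import os
import tempfile


def build_line_index(path):
    """Return a list where entry i is the byte offset at which line i starts."""
    offsets = []
    position = 0
    with open(path, "rb") as source:  # binary mode so offsets are exact byte counts
        for line in source:
            offsets.append(position)
            position += len(line)
    return offsets


def read_line(path, offsets, line_num):
    """Seek straight to line_num (zero-based) using the prebuilt offsets."""
    with open(path, "rb") as source:
        source.seek(offsets[line_num])
        return source.readline()


# Small demonstration on a temporary file with lines of different lengths.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"first\nsecond line\n3\n")
    demo_path = tmp.name

demo_offsets = build_line_index(demo_path)
third_line = read_line(demo_path, demo_offsets, 2)
os.remove(demo_path)
```

    Building the index still reads the whole file once; only the later
    lookups avoid reading the preceding lines.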
    --
    Piet van Oostrum
    URL: http://www.cs.uu.nl/~piet [PGP 8DAE142BE17999C4]
    Piet van Oostrum, Jul 15, 2006
    #3
  4. Bill Pursell

    Bill Pursell Guest

    Yi Xing wrote:
    > I want to read specific lines of a huge txt file (I know the line #).
    > Each line might have different sizes. Is there a convenient and fast
    > way of doing this in Python? Thanks.


    #!/usr/bin/env python

    import os,sys
    line = int(sys.argv[1])
    path = sys.argv[2]
    os.system("sed -n %dp %s"%(line,path))


    Some might argue that this is not really doing
    it in Python. In fact, I would argue that! But if
    you're at a command prompt and you want to
    see line 7358, it's much easier to type
    % sed -n 7358p
    than it is to write the python one-liner.
    Bill Pursell, Jul 15, 2006
    #4
  5. Simon Forman

    Simon Forman Guest

    Yi Xing wrote:
    > Hi All,
    >
    > I want to read specific lines of a huge txt file (I know the line #).
    > Each line might have different sizes. Is there a convenient and fast
    > way of doing this in Python? Thanks.
    >
    > Yi Xing


    I once had to do a lot of random access of lines in a multi gigabyte
    log file. I found that a very fast way to do this was to build an
    index file containing the int offset in bytes of each line in the log
    file.

    I could post the code if you're interested.
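
    The code itself was not posted in the thread. As a rough sketch of the
    idea only (not the original code), the index file could store one
    fixed-width integer per line, so that the entry for line n sits at a
    known position in the index. The 8-byte little-endian format and the
    names write_index_file and fetch_line are assumptions made for this
    example.

```python
import os
import struct
import tempfile

OFFSET = struct.Struct("<Q")  # one unsigned 64-bit little-endian integer per line


def write_index_file(log_path, index_path):
    """Write the starting byte offset of every line in log_path to index_path."""
    position = 0
    with open(log_path, "rb") as log, open(index_path, "wb") as index:
        for line in log:
            index.write(OFFSET.pack(position))
            position += len(line)


def fetch_line(log_path, index_path, line_num):
    """Return line line_num (zero-based) using one seek in each file."""
    with open(index_path, "rb") as index:
        index.seek(line_num * OFFSET.size)  # index entries are fixed width
        (position,) = OFFSET.unpack(index.read(OFFSET.size))
    with open(log_path, "rb") as log:
        log.seek(position)
        return log.readline()


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as workdir:
    demo_log = os.path.join(workdir, "demo.log")
    demo_index = os.path.join(workdir, "demo.idx")
    with open(demo_log, "wb") as out:
        out.write(b"alpha\nbeta gamma\ndelta\n")
    write_index_file(demo_log, demo_index)
    index_size = os.path.getsize(demo_index)
    fetched = fetch_line(demo_log, demo_index, 1)
```

    Keeping the index on disk means it survives between runs and never has
    to be loaded into memory as a whole.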

    Peace,
    ~Simon
    Simon Forman, Jul 15, 2006
    #5
  6. Yi Xing wrote:

    > I want to read specific lines of a huge txt file (I know the line #).
    > Each line might have different sizes. Is there a convenient and fast
    > way of doing this in Python? Thanks.


    I don't know how efficiently the `linecache` module in the standard
    library is implemented, but you might have a look at it.

    Ciao,
    Marc 'BlackJack' Rintsch
    Marc 'BlackJack' Rintsch, Jul 15, 2006
    #6
  7. Yi,
    Use the linecache module. The documentation states that:
    """
    The linecache module allows one to get any line from any file, while
    attempting to optimize internally, using a cache, the common case where
    many lines are read from a single file.

    >>> import linecache
    >>> linecache.getline('/etc/passwd', 4)
    'sys:x:3:3:sys:/dev:/bin/sh\012'
    """

    Please note that you cannot really skip over the lines unless each has
    a fixed, known size. (If all lines have a fixed, known size, they can
    be treated as 'records' and you can use seek() and other random-access
    magic. That is why it is sometimes a lot faster to use fixed-length
    rows in a database: searching gets faster at the expense of wasted
    space. But that is another topic for another discussion...)
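
    For the fixed-size case, a small sketch of what that seek() access
    could look like. The 16-byte record width and the name read_record are
    assumptions made for this example, not anything from the poster's
    project.

```python
import os
import tempfile

RECORD_SIZE = 16  # assumed width of every record in bytes, newline included


def read_record(path, record_num, record_size=RECORD_SIZE):
    """Return record record_num (zero-based) without reading earlier records."""
    with open(path, "rb") as source:
        source.seek(record_num * record_size)
        return source.read(record_size)


# Build a small file whose lines are padded to exactly RECORD_SIZE bytes.
names = [b"ann", b"bartholomew", b"cy"]
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    for name in names:
        tmp.write(name.ljust(RECORD_SIZE - 1) + b"\n")
    records_path = tmp.name

record = read_record(records_path, 1)
os.remove(records_path)
```

    The padding is the wasted space mentioned above: short values still
    occupy a full record.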

    So the point is that you won't be able to jump to line 15000 without
    reading lines 0-14999. You can either iterate over the rows by yourself
    or simply use the 'linecache' module as shown above. If I were you I
    would use the linecache, but of course you don't mention anything about
    the context of your project so it is hard to say.

    Hope this helps,
    Nick Vatamaniuc


    Yi Xing wrote:
    > Hi All,
    >
    > I want to read specific lines of a huge txt file (I know the line #).
    > Each line might have different sizes. Is there a convenient and fast
    > way of doing this in Python? Thanks.
    >
    > Yi Xing
    Nick Vatamaniuc, Jul 16, 2006
    #7
  8. John Machin

    John Machin Guest

    On 16/07/2006 2:54 PM, Nick Vatamaniuc top-posted:
    > Yi,
    > Use the linecache module.


    Yi, *don't* use the linecache module without carefully comparing the
    documentation and the implementation with your requirements.

    You will find that you have the source code on your computer -- mine
    (Windows box) is at c:\Python24\Lib\linecache.py. When you read right
    down to the end (it's not a large file, only 108 lines), you'll find this:

    try:
        fp = open(fullname, 'rU')
        lines = fp.readlines()
        fp.close()
    except IOError, msg:
        ## print '*** Cannot open', fullname, ':', msg
        return []
    size, mtime = stat.st_size, stat.st_mtime
    cache[filename] = size, mtime, lines, fullname

    Looks like it's caching the *whole* of *each* file. Not unreasonable
    given it appears to have been written to get source lines to include in
    tracebacks.
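
    One way to check this on your own installation is sketched below. It
    peeks at linecache.cache, the module's internal dictionary, so treat
    the tuple layout as an implementation detail that may differ between
    Python versions. The 1000-line temporary file is made up for the test.

```python
import linecache
import os
import tempfile

# Write a 1000-line file, then ask linecache for a single line of it.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.writelines("line %d\n" % i for i in range(1, 1001))
    cached_path = tmp.name

fifth = linecache.getline(cached_path, 5)  # linecache line numbers start at 1

# The cache entry is expected to be (size, mtime, lines, fullname);
# index 2 is the list of lines that linecache kept for this file.
cached_lines = linecache.cache[cached_path][2]
lines_held = len(cached_lines)

linecache.clearcache()
os.remove(cached_path)
```

    If lines_held equals the number of lines in the file, a single
    getline() call has pulled the entire file into memory.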

    It might just not be what you want if as you say you have "a huge txt
    file". How many megabytes is "huge"?

    Cheers,
    John

    John Machin, Jul 16, 2006
    #8
  9. Bill Pursell wrote:

    > Some might argue that this is not really doing
    > it in Python. In fact, I would argue that! But if
    > you're at a command prompt and you want to
    > see line 7358, it's much easier to type
    > % sed -n 7358p
    > than it is to write the python one-liner.


    'sed' is not recognized as an internal or external command,
    operable program or batch file.

    </F>
    Fredrik Lundh, Jul 16, 2006
    #9
  10. Fredrik Lundh wrote:

    > Bill Pursell wrote:
    >
    >> Some might argue that this is not really doing
    >> it in Python. In fact, I would argue that! But if
    >> you're at a command prompt and you want to
    >> see line 7358, it's much easier to type
    >> % sed -n 7358p
    >> than it is to write the python one-liner.

    >
    > 'sed' is not recognized as an internal or external command,
    > operable program or batch file.


    You're not using Windows, are you?
    Lawrence D'Oliveiro, Jul 16, 2006
    #10
  11. Yi Xing wrote:

    > I want to read specific lines of a huge txt file (I know the line #).
    > Each line might have different sizes. Is there a convenient and fast
    > way of doing this in Python? Thanks.


    open("myfile.txt").readlines()[LineNr]

    Convenient, yes. Fast, no. :)
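
    A variant that stays fairly convenient without holding every line in
    memory, sketched with itertools.islice. It still has to read all the
    preceding lines, so it is no faster in that respect. The name nth_line
    is made up for this example.

```python
import os
import tempfile
from itertools import islice


def nth_line(path, line_num):
    """Return line line_num (zero-based), or None if the file is too short."""
    with open(path) as source:
        # islice consumes and discards the earlier lines one at a time, so
        # the full list of lines is never built.
        return next(islice(source, line_num, line_num + 1), None)


with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("zero\none\ntwo\n")
    islice_path = tmp.name

found = nth_line(islice_path, 2)
missing = nth_line(islice_path, 10)
os.remove(islice_path)
```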
    Lawrence D'Oliveiro, Jul 16, 2006
    #11
  12. John Machin

    John Machin Guest

    On 16/07/2006 5:16 PM, Fredrik Lundh wrote:
    > Bill Pursell wrote:
    >
    >> Some might argue that this is not really doing
    >> it in Python. In fact, I would argue that! But if
    >> you're at a command prompt and you want to
    >> see line 7358, it's much easier to type
    >> % sed -n 7358p


    aarrbejaysus #1: You *don't* type the '%', you *do* need to specify an
    input file somehow.

    >> than it is to write the python one-liner.

    >
    > 'sed' is not recognized as an internal or external command,
    > operable program or batch file.


    aarrbejaysus #2: Download the installer from

    http://gnuwin32.sourceforge.net/packages/sed.htm
    John Machin, Jul 16, 2006
    #12
  13. Fredrik Lundh, Jul 16, 2006
    #13
