Re: efficient text file search.

Discussion in 'Python' started by Bill Scherer, Sep 11, 2006.

  1. Bill Scherer

    Bill Scherer Guest

    noro wrote:

    >Is there a more efficient method to find a string in a text file than:
    >
    >f=file('somefile')
    >for line in f:
    >    if 'string' in line:
    >        print 'FOUND'
    >
    >?
    >
    >BTW:
    >does "for line in f: " read a block of lines into memory, or does it
    >simply call f.readline() many times?
    >
    >thanks
    >amit
    >
    >

    If your file is sorted by some key in the data, you can build a very
    fast binary search with mmap in Python.
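
    A minimal sketch of what such a search might look like, in Python 3.
    Nothing here comes from the original post: the function name
    find_line, the tab separator, and the file layout are assumptions made
    for illustration. It assumes a non-empty file whose lines are sorted
    bytewise by their leading field and end with a newline.

```python
import mmap

def find_line(path, key, sep=b"\t"):
    """Return the first line whose leading field equals `key`, else None.

    Hypothetical helper: assumes lines are sorted bytewise by the field
    that precedes `sep`, and that the file is not empty.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            size = len(mm)
            lo, hi = 0, size  # lo is always the start of a line
            while lo < hi:
                mid = (lo + hi) // 2
                # Back up from mid to the start of the line containing it.
                nl = mm.rfind(b"\n", lo, mid)
                start = lo if nl == -1 else nl + 1
                end = mm.find(b"\n", start)
                if end == -1:
                    end = size
                field = mm[start:end].split(sep, 1)[0]
                if field < key:
                    lo = end + 1   # key can only be in a later line
                else:
                    hi = start     # this line or an earlier one
            if lo >= size:
                return None
            # lo now points at the first line whose field is >= key.
            end = mm.find(b"\n", lo)
            if end == -1:
                end = size
            line = mm[lo:end]
            return line if line.split(sep, 1)[0] == key else None
```

    Because the file is memory-mapped, each probe touches only the pages
    around mid, so a lookup needs roughly log2(file size) probes rather
    than a full read.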
     
    Bill Scherer, Sep 11, 2006
    #1

  2. Bill Scherer

    noro Guest

    Can you add some more info, or point me to a link? I haven't found
    anything about binary search with mmap() in the Python documentation.

    the files are very big...

    thanks
    amit
     
    noro, Sep 11, 2006
    #2

  3. Bill Scherer

    Steve Holden Guest

    noro wrote:
    > Bill Scherer wrote:
    >
    >>noro wrote:
    >>
    >>
    >>>Is there a more efficient method to find a string in a text file than:
    >>>
    >>>f=file('somefile')
    >>>for line in f:
    >>>    if 'string' in line:
    >>>        print 'FOUND'
    >>>
    >>>?
    >>>
    >>>BTW:
    >>>does "for line in f: " read a block of lines into memory, or does it
    >>>simply call f.readline() many times?
    >>>
    >>>thanks
    >>>amit
    >>>
    >>>

    >>
    >>If your file is sorted by some key in the data, you can build a very
    >>fast binary search with mmap in Python.

    >
    >
    > can you add some more info, or point me to a link, i haven't found
    > anything about binary search in mmap() in python documents.
    >
    > the files are very big...
    >

    [please don't "top-post": add your latest comments at the end so the
    story reads from the beginning].

    I think this is probably not going to help you. A binary search is only
    useful if you want to locate a value in an ordered list. Since your
    original posting made it seem like the text you are looking for could
    appear in any position in any line of the file a binary search doesn't
    do you any good at all (in fact it complicates things and slows them
    down unnecessarily) because you'd still need to look at all lines.
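
    For the "anywhere in any line" case, a plain linear scan is about the
    best available. As a rough sketch (Python 3; the name file_contains is
    made up for this example), the scan can be pushed down into mmap's own
    find() instead of a Python-level loop over lines:

```python
import mmap
import os

def file_contains(path, needle):
    """Return True if the bytes `needle` occur anywhere in the file."""
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return False  # mmap cannot map an empty file
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Still a linear scan over every byte, just done in C.
            return mm.find(needle) != -1
```

    This still reads the whole file in the worst case; it only removes the
    per-line overhead. Unlike the line-by-line loop, it can also match
    text that spans a line break.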

    Plus, if the lines are of variable length then you'd need to start by
    creating an index of them, meaning you'd have to go right through the
    file anyway.
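
    To make that concrete, here is a small sketch (Python 3, helper names
    invented for illustration) of building such an index of line start
    offsets. Note that build_line_offsets has to read every line once,
    which is exactly the full pass described above:

```python
def build_line_offsets(path):
    """Return a list holding the byte offset at which each line starts."""
    offsets = []
    pos = 0
    with open(path, "rb") as f:
        for line in f:          # one complete pass over the file
            offsets.append(pos)
            pos += len(line)
    return offsets

def read_line(path, offsets, n):
    """Return line number `n` (0-based) without its trailing newline."""
    with open(path, "rb") as f:
        f.seek(offsets[n])
        return f.readline().rstrip(b"\r\n")
```

    Once the offsets exist, any line can be fetched with a single seek, so
    the cost of the initial pass only pays off if several lookups follow.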

    regards
    Steve
    --
    Steve Holden +44 150 684 7255 +1 800 494 3119
    Holden Web LLC/Ltd http://www.holdenweb.com
    Skype: holdenweb http://holdenweb.blogspot.com
    Recent Ramblings http://del.icio.us/steve.holden
     
    Steve Holden, Sep 11, 2006
    #3
  4. Bill Scherer

    noro Guest

    I'm not sure.

    Each line in the text file has an index string. I can sort the file
    and use some binary tree search on it (I need to do a number of
    searches). There are 1219137 indexes in the file, so maybe a
    memory-efficient sort algorithm is in order.
    How can mmap help me?
    Is there any binary search algorithm for text files out there, or do I
    need to write one?
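
    For repeated lookups on roughly a million keys, one option that avoids
    sorting altogether is to read the file once and keep a dictionary from
    each index string to the byte offset of its line. This is only a
    sketch under assumptions not stated in the thread: the index string is
    taken to be the first tab-separated field, and load_index and lookup
    are hypothetical names.

```python
def load_index(path, sep=b"\t"):
    """Map each line's leading field to the byte offset of that line."""
    index = {}
    pos = 0
    with open(path, "rb") as f:
        for line in f:
            key = line.rstrip(b"\r\n").split(sep, 1)[0]
            index.setdefault(key, pos)  # keep the first occurrence
            pos += len(line)
    return index

def lookup(path, index, key):
    """Return the full line for `key`, or None if the key is absent."""
    pos = index.get(key)
    if pos is None:
        return None
    with open(path, "rb") as f:
        f.seek(pos)
        return f.readline().rstrip(b"\r\n")
```

    The dictionary holds only keys and integers, not the lines themselves,
    so memory use depends on the key lengths rather than on the file size.
    Each later search is then a hash lookup plus one seek.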


     
    noro, Sep 11, 2006
    #4