Using regular expressions to extract substrings from files

Discussion in 'Python' started by Timothy Hume, Sep 10, 2004.

  1. Timothy Hume

    Timothy Hume Guest

    Hi,

    I am new to Python, and was wondering if it is possible to operate on
    files using regular expressions.

    What I mean is this:
    - It is easy to search for a substring of a string using regular
    expressions.
    - Can I also search for a substring inside a file using regular
    expressions? The substring may span several lines (i.e. there may be
    embedded newline and carriage return characters).

    So far, the only way I know how to do this is to read the entire file into
    a string, and then parse the resulting string with regular expressions.
    This is OK for small files (in fact it is probably quite efficient,
    because the disc I/O is done all at once). However, once the files get
    large, there is the risk I will run out of memory. The closest UNIX tool I
    can think of to do this sort of job is grep, but that doesn't have the
    power and flexibility of Python.
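
    For reference, the read-everything approach described above might look
    like the following minimal sketch. The function name `find_in_file` is
    hypothetical and not from the original post. `re.DOTALL` lets `.` match
    newline characters, so a match can span several lines.

```python
import re

def find_in_file(path, pattern):
    """Read the whole file into memory and return the first match, or None."""
    # newline="" keeps "\r\n" sequences exactly as they appear in the file.
    with open(path, "r", newline="") as handle:
        text = handle.read()
    match = re.search(pattern, text, re.DOTALL)
    return match.group(0) if match else None
```

    Memory use grows with file size here, which is the limitation the rest
    of the post is concerned with.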

    Any ideas would be appreciated.

    Tim Hume
    Bureau of Meteorology Research Centre
    Melbourne
    Australia
     
    Timothy Hume, Sep 10, 2004
    #1

  2. Jason Lai

    Jason Lai Guest


    http://docs.python.org/lib/module-mmap.html
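
    A minimal sketch of how the mmap module linked above could be combined
    with a regular expression search. The function name `find_with_mmap` is
    hypothetical, and the pattern must be a bytes pattern because a mapped
    file exposes bytes. The operating system pages the data in as needed
    instead of the program reading it all into a string, though the
    available address space still limits how much can be mapped.

```python
import mmap
import re

def find_with_mmap(path, pattern):
    """Search a memory-mapped file with a bytes regex; return bytes or None."""
    with open(path, "rb") as handle:
        # Length 0 maps the whole file; mapping an empty file raises ValueError.
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            match = re.search(pattern, mapped, re.DOTALL)
            # group(0) copies the matched bytes out before the map is closed.
            return match.group(0) if match else None
```

    For example, `find_with_mmap("data.txt", rb"BEGIN.*?END")` would search
    a hypothetical file `data.txt` for the first span between the two markers.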
     
    Jason Lai, Sep 10, 2004
    #2

  3. Brian Szmyd

    Brian Szmyd Guest


    You could always call grep from Python if that works for you. Otherwise,
    you'll probably have to read the file in through a buffer and check the
    buffer each time. The problem is: what if the match spans two buffers?
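
    One possible way to handle a match that spans two buffers is to carry
    the end of each buffer over into the next search. The sketch below is
    an illustration only: the function name `search_in_chunks` and the
    parameters `max_match_len` and `chunk_size` are hypothetical, and it
    assumes you know an upper bound on the length of a match. Results can
    differ from a whole-file search for patterns whose match depends on
    text outside the current window (greedy quantifiers, lookarounds,
    anchors).

```python
import re

def search_in_chunks(path, pattern, max_match_len, chunk_size=1 << 20):
    """Return the first match found (as bytes), or None.

    Assumes no match is longer than max_match_len bytes.
    """
    regex = re.compile(pattern, re.DOTALL)
    tail = b""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                return None  # end of file, nothing found
            window = tail + chunk
            match = regex.search(window)
            if match:
                return match.group(0)
            # Keep enough trailing bytes that a match straddling the
            # boundary is fully contained in the next window.
            tail = window[-(max_match_len - 1):] if max_match_len > 1 else b""
```

    Memory use is bounded by roughly `chunk_size + max_match_len` bytes
    rather than by the size of the file.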

    As for matches that span lines, newline and carriage return characters
    count as whitespace (they match \s), so you can simply allow for them in
    your regular expression.
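
    A small illustration of that point, using a made-up sample string. It
    also shows that `.` does not match a newline unless `re.DOTALL` is set.

```python
import re

text = "start of record\r\ncontinues here"

# \s+ matches any run of whitespace, including "\r" and "\n",
# so the two words may sit on different lines.
spanning = re.search(r"record\s+continues", text)

# By default "." does not match "\n"; re.DOTALL changes that.
without_dotall = re.search(r"start.*here", text)
with_dotall = re.search(r"start.*here", text, re.DOTALL)
```

    Here `spanning` and `with_dotall` are match objects, while
    `without_dotall` is None.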

    -regards
    brian szmyd
     
    Brian Szmyd, Sep 10, 2004
    #3
