Re: Regex Generator From Multiple Files

Discussion in 'Python' started by MRAB, Jan 6, 2009.

  1. MRAB

    MRAB Guest

    James Pruitt wrote:
    > I am looking for a way given a number of files, say 3, that represent
    > technical support tickets in the same format to generate regular
    > expressions for the different fields automatically.
    >
    > An example of one line from each file:
    > Date: 12/30/2008 Room: 457 Building: Main
    > Date: 12/31/2008 Room: A21 Building: Annex
    > Date: 1/4/2009 Room: L69 Building: Library
    >
    > The program would then, possibly using the python diff library, generate
    > the regular expression needed to parse out different fields. In this
    > case it might return a tuple like
    > ("^Date:[\w]+(.*)[\w]+Room","Room:[\w]+(.*)[\w]+Building","Building:[\w]+(.*)[\w]+$")
    > that would match each of the fields based on the common data and sort of
    > assume that what doesn't change between them is data we are looking for.
    >
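The diff-based idea quoted above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the original poster's code: the helper names `common_tokens` and `build_pattern` are hypothetical, lines are compared token by token (split on whitespace) rather than character by character, and every token shared by all samples is treated as a field label followed by variable data. A value that happens to be identical in every sample would be mistaken for a label.

```python
import re
from difflib import SequenceMatcher

def common_tokens(lines):
    """Return the whitespace-separated tokens shared, in order, by every line."""
    common = lines[0].split()
    for line in lines[1:]:
        tokens = line.split()
        matcher = SequenceMatcher(None, common, tokens, autojunk=False)
        kept = []
        # Keep only the tokens that also appear, in order, in this line.
        for block in matcher.get_matching_blocks():
            kept.extend(common[block.a:block.a + block.size])
        common = kept
    return common

def build_pattern(labels):
    """Build one regex that captures the text following each label."""
    parts = [re.escape(label) + r"\s+(.*?)" for label in labels]
    return re.compile(r"^\s*" + r"\s+".join(parts) + r"\s*$")

samples = [
    "Date: 12/30/2008 Room: 457 Building: Main",
    "Date: 12/31/2008 Room: A21 Building: Annex",
    "Date: 1/4/2009 Room: L69 Building: Library",
]

labels = common_tokens(samples)
pattern = build_pattern(labels)
for line in samples:
    print(pattern.match(line).groups())
```

With more sample lines the chance of a data value being misread as a label drops, since it would have to repeat in every file.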

    Why not just assume that each field consists of a word terminated by a
    colon, then some text, then the next field or the end of the line?
    MRAB, Jan 6, 2009
    #1
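The suggestion in post #1 (a word ending in a colon, then some text, then the next field or the end of the line) might look something like the sketch below. The names `FIELD_RE` and `parse_fields` are hypothetical, and the pattern additionally assumes that a new field name is always preceded by whitespace, so a value such as `10:30` is not split; a value containing something like ` note:` would still be misread as a new field.

```python
import re

# A field name (word characters) followed by a colon, then the value,
# which runs lazily up to the next " name:" or the end of the line.
FIELD_RE = re.compile(r"(\w+):\s*(.*?)(?=\s+\w+:|\s*$)")

def parse_fields(line):
    """Return a dict mapping each field name in the line to its text."""
    return dict(FIELD_RE.findall(line))

for line in (
    "Date: 12/30/2008 Room: 457 Building: Main",
    "Date: 12/31/2008 Room: A21 Building: Annex",
    "Date: 1/4/2009 Room: L69 Building: Library",
):
    print(parse_fields(line))
```

This needs no sample comparison at all, which is the appeal of the simpler assumption: one fixed pattern handles any number of fields in any order.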

  2. Jeremy.Chen

    Jeremy.Chen Guest

    On Jan 6, 8:48 am, MRAB wrote:
    > James Pruitt wrote:
    > > I am looking for a way given a number of files, say 3, that represent
    > > technical support tickets in the same format to generate regular
    > > expressions for the different fields automatically.
    > >
    > > An example of one line from each file:
    > > Date: 12/30/2008 Room: 457 Building: Main
    > > Date: 12/31/2008 Room: A21 Building: Annex
    > > Date: 1/4/2009 Room: L69 Building: Library
    > >
    > > The program would then, possibly using the python diff library, generate
    > > the regular expression needed to parse out different fields. In this
    > > case it might return a tuple like
    > > ("^Date:[\w]+(.*)[\w]+Room","Room:[\w]+(.*)[\w]+Building","Building:[\w]+(.*)[\w]+$")
    > > that would match each of the fields based on the common data and sort of
    > > assume that what doesn't change between them is data we are looking for.
    >
    > Why not just assume that each field consists of a word terminated by a
    > colon, then some text, then the next field or the end of the line?


    Do you mean the sub method?
    -------------
    re.sub(r'(?i)(example)', self.captureRegxp, content)
    Jeremy.Chen, Jan 6, 2009
    #2
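For context on the `re.sub` call in post #2: `re.sub` accepts a function as the replacement, calls it once per match with the match object, and substitutes whatever string it returns. The post does not show `captureRegxp`, so the class and method below are hypothetical stand-ins, a minimal sketch that records each match and leaves the text unchanged.

```python
import re

class FieldCapturer:
    """Hypothetical stand-in for the object whose method is passed to re.sub."""

    def __init__(self):
        self.captured = []

    def capture(self, match):
        # Record the first group, then return the matched text unchanged
        # so that the substitution leaves the content as it was.
        self.captured.append(match.group(1))
        return match.group(0)

content = "An Example and another example."
capturer = FieldCapturer()
result = re.sub(r"(?i)(example)", capturer.capture, content)

print(capturer.captured)
print(result)
```

When only the matches are wanted and the text is not being rewritten, `re.findall` or `re.finditer` does the same job without the callback.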
