multi regexp analyzer ? or how to do...

Discussion in 'Python' started by joh12005@yahoo.fr, Jun 30, 2005.

  1. Guest

    Hello,

    Here is a problem I ran into. I would like to solve it with Python,
    even though I still have no clue how to do it.

    I had many small "text" files, so to speed up processing I copied
    them into one huge file, adding some kind of XML separator:

    <file name="...">
    [content]
    </file>

    The content is tab-separated data (columns); the values are strings.
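
    A minimal sketch of reading that layout back, assuming each
    <file name="..."> and </file> tag sits on its own line. The helper name
    iter_files and the sample data are made up for illustration:

```python
import re

# Assumption: the opening tag is alone on its line, exactly as in the post.
OPEN_TAG = re.compile(r'<file name="([^"]*)">\s*$')
CLOSE_TAG = "</file>"


def iter_files(lines):
    """Yield (name, rows) per <file> block; each row is a list of column strings."""
    name, rows = None, []
    for raw in lines:
        line = raw.rstrip("\n")
        m = OPEN_TAG.match(line.strip())
        if m:
            # Start of a new block: remember its name, reset the row buffer.
            name, rows = m.group(1), []
        elif line.strip() == CLOSE_TAG:
            if name is not None:
                yield name, rows
            name = None
        elif name is not None:
            # Content line: split the tab-separated columns.
            rows.append(line.split("\t"))


sample = [
    '<file name="a.txt">\n',
    "foo\tbar\tbaz\n",
    "one\ttwo\tthree\n",
    "</file>\n",
    '<file name="b.txt">\n',
    "x\ty\n",
    "</file>\n",
]
parsed = list(iter_files(sample))
```

    Because it is a generator, the huge file can be passed in as an open
    file object and is never loaded whole.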

    Now here comes the tricky part for me:

    I would like to be able to create some kind of matching rules using
    regular expressions. A rule should match data on one line (the smallest
    data unit for me) or on a set of lines. For example:

    if on this line, match the first column against this regexp and match
    the second column,
    and on the following line match the third column
    -> trigger something
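
    One possible way to write such a rule down, as a sketch only: a rule is
    a list of steps (one per consecutive line), and each step maps a
    0-based column index to a compiled regexp. The patterns and the names
    RULE, step_matches and rule_matches_at are hypothetical:

```python
import re

# One step per consecutive line; each step maps column index -> regexp.
# The patterns are invented for the example.
RULE = [
    {0: re.compile(r"^foo"), 1: re.compile(r"\d+")},  # this line: columns 1 and 2
    {2: re.compile(r"bar$")},                          # following line: column 3
]


def step_matches(step, row):
    """True if every column named in the step exists and matches its regexp."""
    return all(col < len(row) and rx.search(row[col]) for col, rx in step.items())


def rule_matches_at(rule, rows, start):
    """True if the rule's steps match rows[start], rows[start + 1], ... in order."""
    if start + len(rule) > len(rows):
        return False
    return all(step_matches(step, rows[start + k]) for k, step in enumerate(rule))


rows = [
    ["foobar", "42", "x"],
    ["a", "b", "rebar"],
    ["c", "d", "e"],
]
hits = [i for i in range(len(rows)) if rule_matches_at(RULE, rows, i)]
```

    Here hits lists the line indexes where the rule starts matching; the
    "trigger something" part would be a callback run for each of them.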

    So, here is what I tried:

    - take all the rules,
    - build some kind of analyzer for each rule,
    - keep the size L of the longest one,
    - then read each line of the huge file one by one,
    - inside a "file", create all the subsets of length <= L,
    - for each analyzer, see if it matches any of the subsets,
    - if it does...
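
    The steps above can be sketched roughly as follows. This is a guess at
    the shape of the approach, not the original code; compile_rule,
    scan_naive and the sample rules are assumptions:

```python
import re


def compile_rule(steps):
    """Turn [{column: pattern, ...}, ...] into the same shape with compiled regexps."""
    return [{col: re.compile(p) for col, p in step.items()} for step in steps]


def scan_naive(rules, rows):
    """Try every rule at every start line; yield (rule_name, start_index) on a match."""
    for start in range(len(rows)):
        for name, rule in rules.items():
            window = rows[start:start + len(rule)]
            if len(window) < len(rule):
                continue  # not enough lines left in this "file"
            if all(col < len(row) and rx.search(row[col])
                   for step, row in zip(rule, window)
                   for col, rx in step.items()):
                yield name, start


rules = {
    "two_line": compile_rule([{0: r"^foo", 1: r"\d+"}, {2: r"bar$"}]),
    "one_line": compile_rule([{0: r"^c$"}]),
}
rows = [
    ["foobar", "42", "x"],
    ["a", "b", "rebar"],
    ["c", "d", "e"],
]
found = list(scan_naive(rules, rows))
```

    The cost grows with (number of lines) x (number of rules) x (rule
    length), which is where the nested loops come from.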

    My trouble is here:

    "for each analyzer, see if it matches any of the subsets"

    It is really too slow. I have many, many rules, and since it is a "for
    loop inside a for loop", with another "for loop over the subset's lines"
    inside each rule, I need to speed it up. Do you have any ideas?

    I am thinking of having only "rules for one line" and keeping track of
    whether a rule is an "ending" one (it triggers something) or a "must
    continue" one, but this is still unclear to me for now...

    Another great thing would be some sort of dict with regexp
    keys...
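
    Something close to a dict with regexp keys can be approximated by
    joining the patterns into one alternation of named groups and reading
    the match's lastgroup. This is a sketch with a hypothetical class name
    (RegexDispatch); it assumes a non-empty mapping whose patterns contain
    no named groups or numbered backreferences of their own:

```python
import re


class RegexDispatch:
    """Map patterns to values using one combined regexp.

    lookup() returns the value for the leftmost match in the text;
    if several patterns match at that position, the earliest one wins.
    """

    def __init__(self, mapping):
        self._values = {}
        parts = []
        for i, (pattern, value) in enumerate(mapping.items()):
            group = f"g{i}"  # generated group name, one per pattern
            self._values[group] = value
            parts.append(f"(?P<{group}>{pattern})")
        self._combined = re.compile("|".join(parts))

    def lookup(self, text, default=None):
        m = self._combined.search(text)
        return self._values[m.lastgroup] if m else default


d = RegexDispatch({r"^foo": "starts_with_foo", r"\d+": "has_number"})
a = d.lookup("foobar")
b = d.lookup("abc 42")
c = d.lookup("xyz")
```

    Only one value comes back per lookup, so this fits "which rule applies
    to this column" dispatch better than "give me every rule that matches".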

    (It would actually also be great if I could use some kind of
    regexp-like operator to say that one can skip the content of 0 to n
    lines before matching, as if, in the example, I had changed
    "following..." to "skip at least 2 lines and match the third column on
    the next line". That would be great, but I still really have no idea
    how to even think about it.)
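
    One way to think about the skip operator, sketched under the assumption
    that each step carries a (min_skip, max_skip) range of lines that may
    be passed over before it is tried. The function name match_with_gaps
    and the tuple layout are invented for the example:

```python
import re


def step_matches(step, row):
    return all(col < len(row) and rx.search(row[col]) for col, rx in step.items())


def match_with_gaps(rule, rows, pos, k=0):
    """rule is a list of (min_skip, max_skip, step); max_skip=None means no limit.

    True if steps k, k+1, ... can be matched in order, starting the search at
    rows[pos] and skipping an allowed number of lines before each step.
    """
    if k == len(rule):
        return True
    min_skip, max_skip, step = rule[k]
    if max_skip is None:
        max_skip = len(rows)
    for skip in range(min_skip, max_skip + 1):
        i = pos + skip
        if i >= len(rows):
            break
        # Backtrack: try the next allowed skip if the rest fails.
        if step_matches(step, rows[i]) and match_with_gaps(rule, rows, i + 1, k + 1):
            return True
    return False


rule = [
    (0, 0, {0: re.compile(r"^foo"), 1: re.compile(r"\d+")}),  # this line
    (2, None, {2: re.compile(r"bar$")}),                       # skip at least 2 lines
]
rows = [
    ["foobar", "42", "x"],
    ["a", "b", "c"],
    ["d", "e", "f"],
    ["g", "h", "rebar"],
]
ok = match_with_gaps(rule, rows, 0)
too_close = match_with_gaps(rule, [rows[0], ["a", "b", "rebar"]], 0)
```

    A plain "following line" step is simply (0, 0, step), so rules with and
    without gaps share the same representation.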

    Many thanks to anybody who can help,

    best
     
    , Jun 30, 2005
    #1

  2. Paul McGuire Guest

    I'd propose a pyparsing implementation, but you don't give us many
    specifics. Is there any chance you could post some sample data, and
    one or two of the regexps you are using for matching?

    -- Paul
     
    Paul McGuire, Jun 30, 2005
    #2
