multi regexp analyzer ? or how to do...

Discussion in 'Python' started by joh12005@yahoo.fr, Jun 30, 2005.

  1. Guest

    Hello,

    Here is a problem I ran into. I would like to solve it with Python,
    even though I still have no clue how to do it.

    I had many small "text" files, so to speed up processing I copied
    them into one huge file, adding some kind of XML separator:

    <file name="...">
    [content]
    </file>

    The content is tab-separated data (columns); the values are strings.
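
A minimal sketch of reading that combined file back into per-file rows, assuming each `<file name="...">` and `</file>` tag sits on its own line. The helper name `iter_files` and the sample data are made up for illustration.

```python
import re

# Assumed layout: opening tag on its own line, tab-separated rows, then </file>.
OPEN_TAG = re.compile(r'<file name="([^"]*)">')

def iter_files(lines):
    """Yield (name, rows) per <file> block; each row is a list of column strings."""
    name, rows = None, []
    for raw in lines:
        line = raw.rstrip("\n")
        opened = OPEN_TAG.fullmatch(line.strip())
        if opened:
            name, rows = opened.group(1), []
        elif line.strip() == "</file>":
            if name is not None:
                yield name, rows
            name = None
        elif name is not None:
            rows.append(line.split("\t"))

# Hypothetical sample standing in for the huge file.
sample = [
    '<file name="a.txt">\n',
    'foo\tbar\tbaz\n',
    'x\ty\tz\n',
    '</file>\n',
    '<file name="b.txt">\n',
    'one\ttwo\n',
    '</file>\n',
]
parsed = list(iter_files(sample))
```

Because it is a generator, the same function can be fed an open file object and only holds one block in memory at a time.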

    Now here comes the tricky part for me:

    I would like to be able to create matching rules using regular
    expressions. A rule should match data on one line (the smallest
    data unit for me) or on a set of lines, for example:

    if on this line the first column matches this regexp and the
    second column matches too,
    and on the following line the third column matches
    -> trigger something
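
One possible way to write such a rule down, as a sketch only: a rule is a list of steps, one per line, and each step maps a 0-based column index to a compiled regexp. The patterns, the sample rows and the function names are placeholders, not taken from the post.

```python
import re

# A rule is a list of line steps; each step maps a column index to a regexp.
# These patterns are invented for the example.
rule = [
    {0: re.compile(r"^ERR"), 1: re.compile(r"\d+")},  # this line: columns 1 and 2
    {2: re.compile(r"timeout")},                      # following line: column 3
]

def step_matches(step, row):
    """True if every (column, regexp) condition of one step holds on this row."""
    return all(col < len(row) and rx.search(row[col]) for col, rx in step.items())

def rule_matches_at(rule, rows, start):
    """True if the rule's steps match consecutive rows beginning at index start."""
    if start + len(rule) > len(rows):
        return False
    return all(step_matches(step, rows[start + i]) for i, step in enumerate(rule))

rows = [
    ["ok", "1", "x"],
    ["ERR42", "code 7", "x"],
    ["next", "y", "timeout after 3s"],
]
# Row indexes at which the whole rule matches.
hits = [i for i in range(len(rows)) if rule_matches_at(rule, rows, i)]
```

Here the "trigger" would fire for the match starting at row index 1.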

    So here is what I tried:

    - take all the rules,
    - build some kind of analyzer for each rule,
    - keep the size L of the longest one,
    - then read the huge file line by line,
    - inside a "file", create all the subsets of length <= L
    - for each analyzer see if it matches any of the subsets
    - if it does...

    My trouble is here:

    "for each analyzer see if it matches any of the subsets"

    It is really too slow. I have many, many rules, and it is a "for
    loop inside a for loop", with another "for loop over the subset's
    lines" inside each rule. I need to speed that up. Do you have any ideas?

    I am thinking of having only "rules for one line" and keeping track
    of whether a rule is an "ending one" (which triggers something) or a
    "must continue" one, but this is still unclear to me for now...
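
A rough sketch of that "ending" versus "must continue" idea, under the assumption that rules are lists of per-line steps (column index to regexp). Instead of building subsets of lines, it keeps a list of partial matches and advances them one line at a time. The names `scan`, `rules` and the patterns are hypothetical.

```python
import re

def step_matches(step, row):
    """True if every (column, regexp) condition of one step holds on this row."""
    return all(col < len(row) and rx.search(row[col]) for col, rx in step.items())

def scan(rules, rows):
    """Yield (rule_name, row_index) each time a rule's last step matches."""
    active = []  # partial matches as (rule_name, index of the next step to test)
    for i, row in enumerate(rows):
        # Every rule may start on this line; partial matches try to continue.
        candidates = active + [(name, 0) for name in rules]
        active = []
        for name, pos in candidates:
            steps = rules[name]
            if step_matches(steps[pos], row):
                if pos + 1 == len(steps):
                    yield name, i                   # "ending" step: trigger
                else:
                    active.append((name, pos + 1))  # "must continue"

# Invented rules and data for illustration.
rules = {
    "err_then_timeout": [
        {0: re.compile(r"^ERR"), 1: re.compile(r"\d+")},
        {2: re.compile(r"timeout")},
    ],
    "single_ok": [{0: re.compile(r"^ok$")}],
}
rows = [
    ["ok", "1", "x"],
    ["ERR42", "code 7", "x"],
    ["next", "y", "timeout after 3s"],
]
events = list(scan(rules, rows))
```

Each line is read once, and a partial match that fails on the next line is simply dropped, so the work per line depends on the number of rules plus the number of live partial matches.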

    Another great thing would be some sort of dict with regexp
    keys...
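
Something close to a dict with regexp keys can be sketched with the standard `re` module by joining the patterns into one alternation with named groups and reading `lastgroup` from the match. This is an illustration with invented patterns; it reports only the first alternative that matches, and the patterns are kept free of their own groups to keep things simple.

```python
import re

# Hypothetical patterns; the keys become group names, so they must be identifiers.
patterns = {
    "error": r"ERR\d+",
    "warning": r"WARN\d+",
    "number": r"\d+",
}
combined = re.compile(
    "|".join(f"(?P<{name}>{pat})" for name, pat in patterns.items())
)

def classify(text):
    """Return the name of the alternative that matches the whole text, or None."""
    m = combined.fullmatch(text)
    return m.lastgroup if m else None

# The returned name can then index an ordinary dict of actions.
actions = {"error": "alert", "warning": "log", "number": "count"}
result = actions.get(classify("ERR42"))
```

One regexp call per column value replaces a loop over every pattern, which is where the saving would come from when there are many single-column rules.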

    (Actually, it would also be great if I could use some kind of regexp
    operator to say that 0 to n lines may be skipped before matching,
    as if in the example I had replaced "following..." with
    "skip at least 2 lines and match third column on next line". It would
    be great, but I still really have no idea how to even think about
    that.)
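
One way to think about the line-skipping operator, as a sketch only: give every step a minimum and maximum number of lines that may be skipped before it, and try each allowed position in turn. The step layout `(min_skip, max_skip, conditions)` and the function names are assumptions made for this example.

```python
import re

def step_matches(conds, row):
    """True if every (column, regexp) condition holds on this row."""
    return all(col < len(row) and rx.search(row[col]) for col, rx in conds.items())

def match_from(rule, rows, start, pos=0):
    """Return the index of the row matching the final step, or None.

    Each step is (min_skip, max_skip, conditions); max_skip=None means no limit.
    The skip counts rows ignored between the previous step's row and this one.
    """
    if pos == len(rule):
        return start - 1
    min_skip, max_skip, conds = rule[pos]
    last = len(rows) - 1 if max_skip is None else min(len(rows) - 1, start + max_skip)
    for i in range(start + min_skip, last + 1):
        if step_matches(conds, rows[i]):
            end = match_from(rule, rows, i + 1, pos + 1)
            if end is not None:
                return end
    return None

# "Match first column here, skip at least 2 lines, then match third column."
rule = [
    (0, 0, {0: re.compile(r"^ERR")}),
    (2, None, {2: re.compile(r"timeout")}),
]
rows = [
    ["ERR1", "a", "b"],
    ["x", "y", "timeout"],       # too early: no line skipped yet
    ["x", "y", "z"],
    ["x", "y", "timeout here"],  # two lines skipped, so this one counts
]
end_index = match_from(rule, rows, 0)
```

With unbounded skips this backtracking search can get slow on long files, so capping `max_skip` is probably wise when there are many rules.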

    Many thanks to anybody who can help,

    best
    , Jun 30, 2005
    #1

  2. Paul McGuire Guest

    I'd propose a pyparsing implementation, but you don't give us many
    specifics. Is there any chance you could post some sample data, and
    one or two of the regexps you are using for matching?

    -- Paul
    Paul McGuire, Jun 30, 2005
    #2
