re.I slowness

Discussion in 'Python' started by vvikram@gmail.com, Mar 30, 2006.

  1. Guest

    We process a lot of messages in a file, matching them against some
    regex pattern(s) that we keep in a db.
    If I compile the regex with re.I, the processing time is substantially
    longer than if I don't, i.e. using re.I is slow.

    However, more surprisingly, if we do something along the lines of:

    import re
    import string

    s = <regex string>
    s = s.lower()
    # map each lowercase letter to a two-character class, e.g. 'a' -> '[aA]'
    t = dict([(k, '[%s%s]' % (k, k.upper())) for k in string.ascii_lowercase])
    for k in t: s = s.replace(k, t[k])
    re.compile(s)
    .......

    it is much faster than simply using re.I.

    So the questions are:
    a) Why is re.I so slow in general?
    b) What is the underlying implementation, what (if anything) is wrong
    with the above method, and why is it not used instead?
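
    For anyone who wants to reproduce the comparison, here is a minimal
    timing sketch. The pattern, the input text and the helper name
    expand_case are made up for illustration, and the numbers will vary
    with the Python version and the patterns involved, so treat it as a
    harness rather than a result.

```python
import re
import string
import timeit

def expand_case(pattern):
    # Rewrite each ASCII letter as a two-character class, e.g. 'a' -> '[aA]'.
    # Only safe for plain literal text: escapes and existing character
    # classes would be corrupted by this rewrite.
    out = []
    for ch in pattern.lower():
        if ch in string.ascii_lowercase:
            out.append('[%s%s]' % (ch, ch.upper()))
        else:
            out.append(ch)
    return ''.join(out)

# Hypothetical pattern and input, chosen only for illustration.
pattern = 'error code'
text = ('some unrelated filler text ' * 200) + 'ERROR Code 42'

with_flag = re.compile(pattern, re.I)
expanded = re.compile(expand_case(pattern))

# Both approaches should find the same match before we time them.
assert with_flag.search(text).span() == expanded.search(text).span()

t_flag = timeit.timeit(lambda: with_flag.search(text), number=1000)
t_expanded = timeit.timeit(lambda: expanded.search(text), number=1000)
print('re.I:     %.4f s' % t_flag)
print('expanded: %.4f s' % t_expanded)
```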

    Thanks
    Vikram
     
    , Mar 30, 2006

  2. Paul McGuire Guest

    Can't tell you why re.I is slow, but perhaps this expression will make your
    RE transform a little plainer (no need to create that dictionary of uppers
    and lowers).

    s = <regex string>
    makeReAlphaCharLowerOrUpper = lambda c: (c.isalpha() and
        "[%s%s]" % (c.lower(), c.upper()) or c)
    s_optimized = "".join( makeReAlphaCharLowerOrUpper(k) for k in s)

    or

    s_optimized = "".join( map( makeReAlphaCharLowerOrUpper, s ) )
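
    To make the effect concrete, here is a small sketch of what this
    per-character rewrite produces. The names either_case and expand are
    mine, not anything from the re module, and the second example shows
    that the rewrite is only safe for plain literal text: it also touches
    letters that are part of escapes such as \d.

```python
import re

def either_case(c):
    # Same idea as the lambda above, written with a conditional expression.
    return '[%s%s]' % (c.lower(), c.upper()) if c.isalpha() else c

def expand(pattern):
    # Apply the rewrite to every character of the pattern string.
    return ''.join(either_case(c) for c in pattern)

print(expand('Ab1'))    # [aA][bB]1
print(expand(r'\d+'))   # \[dD]+  (no longer means "one or more digits")

# The literal pattern still matches in any case...
assert re.match(expand('Ab1'), 'aB1')
# ...but the rewritten escape no longer matches digits.
assert re.match(expand(r'\d+'), '123') is None
```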


    Just curious, but what happens if your RE contains something like this
    spelling check error finder:
    "[^c]ei"
    (looking for violations of "i before e except after c")

    Can []'s nest in an RE?
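
    As far as I can tell, Python's re module does not nest sets: a "["
    inside a set is just a literal character, and the first "]" closes
    the set. So the rewritten pattern would parse, but mean something
    different. A quick check, assuming the rewrite turns "[^c]ei" into
    the string shown below:

```python
import re

original = '[^c]ei'
# What the per-character rewrite produces for the pattern above.
rewritten = '[^[cC]][eE][iI]'
# A hand-written equivalent that keeps the negated set intact.
by_hand = '[^cC][eE][iI]'

print(bool(re.search(original, 'THEIR', re.I)))   # True
# The rewritten pattern now requires a literal ']' before 'ei'.
print(bool(re.search(rewritten, 'THEIR')))        # False
print(bool(re.search(rewritten, 'x]EI')))         # True
print(bool(re.search(by_hand, 'THEIR')))          # True
```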

    -- Paul
     
    Paul McGuire, Mar 30, 2006
