Performance on local constants?

Discussion in 'Python' started by William McBrine, Dec 22, 2007.

  1. Hi all,

    I'm pretty new to Python (a little over a month). I was wondering -- is
    something like this:

    s = re.compile('whatever')

    def t(whatnot):
        return s.search(whatnot)

    for i in xrange(1000):
        print t(something)

    significantly faster than something like this:

    def t(whatnot):
        s = re.compile('whatever')
        return s.search(whatnot)

    for i in xrange(1000):
        result = t(something)

    ? Or is Python clever enough to see that the value of s will be the same
    on every call, and thus only compile it once?

    --
    09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0 -- pass it on
    William McBrine, Dec 22, 2007
    #1

  2. William McBrine

    Paddy Guest

    On Dec 22, 10:53 am, William McBrine <> wrote:
    > Hi all,
    >
    > I'm pretty new to Python (a little over a month). I was wondering -- is
    > something like this:
    >
    > s = re.compile('whatever')
    >
    > def t(whatnot):
    >     return s.search(whatnot)
    >
    > for i in xrange(1000):
    >     print t(something)
    >
    > significantly faster than something like this:
    >
    > def t(whatnot):
    >     s = re.compile('whatever')
    >     return s.search(whatnot)
    >
    > for i in xrange(1000):
    >     result = t(something)
    >
    > ? Or is Python clever enough to see that the value of s will be the same
    > on every call, and thus only compile it once?
    >
    > --
    > 09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0 -- pass it on


    Python's re module does keep a cache of compiled patterns, but telling
    it to compile on every call still costs a lookup each time.

    Best to do as the docs say and compile your REs once, before use, if
    you can.

    The timeit module: http://www.diveintopython.org/performance_tuning/timeit.html
    will allow you to do your own timings.
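
    As a rough sketch of how such a timing could be set up from inside a
    script (written for Python 3 rather than the Python 2.5 of this thread;
    the function names and the sample text are made up for illustration):

```python
import re
import timeit

TEXT = "some long string containing a whatnot"

# Variant 1: compile once at module level and reuse the pattern object.
PATTERN = re.compile("whatnot")

def precompiled(text):
    return PATTERN.search(text)

# Variant 2: call re.compile() on every call.
def compile_each_call(text):
    return re.compile("whatnot").search(text)

# Variant 3: let re.search() handle the pattern string itself.
def module_level_search(text):
    return re.search("whatnot", text)

def time_variants(number=20000):
    """Return {function name: seconds taken for `number` calls}."""
    variants = (precompiled, compile_each_call, module_level_search)
    return {
        f.__name__: timeit.timeit(lambda f=f: f(TEXT), number=number)
        for f in variants
    }

if __name__ == "__main__":
    for name, seconds in time_variants().items():
        print(f"{name:22s} {seconds:.4f} s")
```

    The absolute numbers depend on the machine and the Python version, so
    only the ratio between the variants is worth reading.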

    - Paddy.
    Paddy, Dec 22, 2007
    #2

  3. William McBrine

    John Machin Guest

    On Dec 22, 9:53 pm, William McBrine <> wrote:
    > Hi all,
    >
    > I'm pretty new to Python (a little over a month). I was wondering -- is
    > something like this:
    >
    > s = re.compile('whatever')
    >
    > def t(whatnot):
    >     return s.search(whatnot)
    >
    > for i in xrange(1000):
    >     print t(something)
    >
    > significantly faster than something like this:
    >
    > def t(whatnot):
    >     s = re.compile('whatever')
    >     return s.search(whatnot)
    >
    > for i in xrange(1000):
    >     result = t(something)
    >
    > ?


    No.

    > Or is Python clever enough to see that the value of s will be the same
    > on every call,


    No. It doesn't have a crystal ball.

    > and thus only compile it once?


    But it is smart enough to maintain a cache, which achieves the desired
    result.
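
    One way to see the cache at work, as a small Python 3 sketch. That two
    calls hand back the very same object is an implementation detail of
    CPython's re module, not something the language promises:

```python
import re

first = re.compile("whatever")
second = re.compile("whatever")

# In CPython the second call is answered from re's internal cache,
# so both names refer to one and the same pattern object.
same_object = first is second
print(same_object)
```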

    Why don't you do some timings?

    While you're at it, try this:

    def t2(whatnot):
        return re.search('whatever', whatnot)

    and this:

    t3 = re.compile('whatever').search

    HTH,
    John
    John Machin, Dec 22, 2007
    #3
  4. William McBrine

    Duncan Booth Guest

    William McBrine <> wrote:

    > Hi all,
    >
    > I'm pretty new to Python (a little over a month). I was wondering -- is
    > something like this:
    >
    > s = re.compile('whatever')
    >
    > def t(whatnot):
    >     return s.search(whatnot)
    >
    > for i in xrange(1000):
    >     print t(something)
    >
    > significantly faster than something like this:
    >
    > def t(whatnot):
    >     s = re.compile('whatever')
    >     return s.search(whatnot)
    >
    > for i in xrange(1000):
    >     result = t(something)
    >
    > ? Or is Python clever enough to see that the value of s will be the same
    > on every call, and thus only compile it once?
    >


    The best way to answer these questions is always to try it out for
    yourself. Have a look at 'timeit.py' in the library: you can run
    it as a script to time simple things or import it from longer scripts.

    C:\Python25>python lib/timeit.py -s "import re;s=re.compile('whatnot')" "s.search('some long string containing a whatnot')"
    1000000 loops, best of 3: 1.05 usec per loop

    C:\Python25>python lib/timeit.py -s "import re" "re.compile('whatnot').search('some long string containing a whatnot')"
    100000 loops, best of 3: 3.76 usec per loop

    C:\Python25>python lib/timeit.py -s "import re" "re.search('whatnot', 'some long string containing a whatnot')"
    100000 loops, best of 3: 3.98 usec per loop

    So it looks like it takes a couple of microseconds overhead if you
    don't pre-compile the regular expression. That could be significant
    if you have simple matches as above, or irrelevant if the match is
    complex and slow.

    You can also try measuring the compile time separately:

    C:\Python25>python lib/timeit.py -s "import re" "re.compile('whatnot')"
    100000 loops, best of 3: 2.36 usec per loop

    C:\Python25>python lib/timeit.py -s "import re" "re.compile('<(?:p|div)[^>]*>(?P<pat0>(?:(?P<atag0>\\<a[^>]*\\>)\\<img[^>]+class\\s*=[^=>]*captioned[^>]+\\>\\</a\\>)|\\<img[^>]+class\\s*=[^=>]*captioned[^>]+\\>)</(?:p|div)>|(?P<pat1>(?:(?P<atag1>\\<a[^>]*\\>)\\<img[^>]+class\\s*=[^=>]*captioned[^>]+\\>\\</a\\>)|\\<img[^>]+class\\s*=[^=>]*captioned[^>]+\\>)')"
    100000 loops, best of 3: 2.34 usec per loop

    It makes no difference whether you use a trivial regular expression
    or a complex one: Python remembers (if I remember correctly) the last
    100 expressions it compiled, so after the first call the timings above
    measure a cache lookup rather than a real compilation, and the
    overhead will be pretty constant.
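
    To separate the cache lookup from a real compilation, one option is to
    empty the cache with re.purge() before each compile. A minimal Python 3
    sketch; the pattern is a shortened, made-up stand-in for the long one
    above, and the helper names are hypothetical:

```python
import re
import timeit

# Shortened stand-in pattern, for illustration only.
PATTERN = r"<(?:p|div)[^>]*>\s*<img[^>]+class\s*=[^=>]*captioned[^>]+>"

def cached_compile():
    # After the first call this is answered from re's cache.
    return re.compile(PATTERN)

def uncached_compile():
    # re.purge() clears the cache, so the pattern is compiled from scratch.
    re.purge()
    return re.compile(PATTERN)

def compare(number=200):
    """Return (seconds with cache, seconds without cache)."""
    return (
        timeit.timeit(cached_compile, number=number),
        timeit.timeit(uncached_compile, number=number),
    )

if __name__ == "__main__":
    cached, uncached = compare()
    print(f"cached:   {cached:.5f} s")
    print(f"uncached: {uncached:.5f} s")
```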
    Duncan Booth, Dec 22, 2007
    #4
  5. On Sat, 22 Dec 2007 10:53:39 +0000, William McBrine wrote:

    > Hi all,
    >
    > I'm pretty new to Python (a little over a month). I was wondering -- is
    > something like this:
    >
    > s = re.compile('whatever')
    >
    > def t(whatnot):
    >     return s.search(whatnot)
    >
    > for i in xrange(1000):
    >     print t(something)
    >
    > significantly faster than something like this:
    >
    > def t(whatnot):
    >     s = re.compile('whatever')
    >     return s.search(whatnot)
    >
    > for i in xrange(1000):
    >     result = t(something)
    >
    > ? Or is Python clever enough to see that the value of s will be the same
    > on every call, and thus only compile it once?



    Let's find out:


    >>> import re
    >>> import dis
    >>>
    >>> def spam(x):
    ...     s = re.compile('nobody expects the Spanish Inquisition!')
    ...     return s.search(x)
    ...
    >>> dis.dis(spam)
      2           0 LOAD_GLOBAL              0 (re)
                  3 LOAD_ATTR                1 (compile)
                  6 LOAD_CONST               1 ('nobody expects the Spanish Inquisition!')
                  9 CALL_FUNCTION            1
                 12 STORE_FAST               1 (s)

      3          15 LOAD_FAST                1 (s)
                 18 LOAD_ATTR                2 (search)
                 21 LOAD_FAST                0 (x)
                 24 CALL_FUNCTION            1
                 27 RETURN_VALUE



    No, the Python compiler doesn't know anything about regular expression
    objects, so it compiles a call to the RE engine which is executed every
    time the function is called.

    However, the re module keeps its own cache, so in fact the regular
    expression itself may only get compiled once regardless.
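
    The same point can be checked without reading opcodes, which differ
    between Python versions. A minimal Python 3 sketch that inspects the
    function's code object instead (the variable names are made up):

```python
import dis
import re

PATTERN_TEXT = "nobody expects the Spanish Inquisition!"

def spam(x):
    s = re.compile("nobody expects the Spanish Inquisition!")
    return s.search(x)

code = spam.__code__

# The function's constants hold the pattern as a plain string,
# not as a compiled pattern object ...
stores_plain_string = PATTERN_TEXT in code.co_consts

# ... and "compile" is a name that is looked up and called at run time.
looks_up_compile = "compile" in code.co_names

if __name__ == "__main__":
    print(stores_plain_string, looks_up_compile)
    dis.dis(spam)  # the exact opcodes vary between Python versions
```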

    Here's another approach that avoids the use of a global variable for the
    regular expression:

    >>> def spam2(x, s=re.compile('nobody expects the Spanish Inquisition!')):
    ...     return s.search(x)
    ...
    >>> dis.dis(spam2)
      2           0 LOAD_FAST                1 (s)
                  3 LOAD_ATTR                0 (search)
                  6 LOAD_FAST                0 (x)
                  9 CALL_FUNCTION            1
                 12 RETURN_VALUE

    What happens now is that the regex is compiled by the RE engine once,
    when the def statement is executed, then stored as the default value for
    the argument s. If you don't supply another value for s when you call the
    function, the default regex is used. If you do, the overriding value is
    used instead:

    >>> spam2("nothing")
    >>> spam2("nothing", re.compile('thing'))
    <_sre.SRE_Match object at 0xb7c29c28>


    I suspect that this will be not only the fastest solution, but also the
    most flexible.
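
    That the default value is computed only once can be made visible with a
    counter. A minimal Python 3 sketch; counting_compile is a hypothetical
    helper that exists only for this demonstration:

```python
import re

compile_calls = 0

def counting_compile(pattern):
    """Hypothetical helper: re.compile() plus a call counter."""
    global compile_calls
    compile_calls += 1
    return re.compile(pattern)

# The default value is evaluated here, once, when the def statement runs.
def spam2(x, s=counting_compile("nobody expects the Spanish Inquisition!")):
    return s.search(x)

# Calling the function repeatedly does not evaluate the default again.
for text in ("nothing", "nobody expects the Spanish Inquisition!", "spam"):
    spam2(text)

print(compile_calls)
```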



    --
    Steven
    Steven D'Aprano, Dec 22, 2007
    #5
  6. William McBrine

    Dustan Guest

    On Dec 22, 6:04 am, John Machin <> wrote:
    > t3 = re.compile('whatever').search


    Ack! No! Too Pythonic! GETITOFF! GETITOFF!!
    Dustan, Dec 22, 2007
    #6
  7. William McBrine

    Terry Reedy Guest

    "Steven D'Aprano" <> wrote in message
    news:...
    | >>> def spam2(x, s=re.compile('nobody expects the Spanish Inquisition!')):
    | ...     return s.search(x)
    |
    | I suspect that this will be not only the fastest solution, but also the
    | most flexible.

    'Most flexible' in a different way is

    def searcher(rex):
        crex = re.compile(rex)
        def _(txt):
            return crex.search(txt)
        return _

    One can then create and keep around multiple searchers based on different
    patterns, to be used as needed.
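
    For instance, a short Python 3 sketch of keeping two such searchers
    around. The patterns, the sample lines, and the names find_foo and
    find_digits are invented for illustration:

```python
import re

def searcher(rex):
    crex = re.compile(rex)       # compiled once per searcher
    def _(txt):
        return crex.search(txt)  # the inner function closes over crex
    return _

# Build each searcher once ...
find_foo = searcher("foo")
find_digits = searcher(r"\d+")

# ... then reuse them as often as needed.
lines = ["foo 42", "bar", "7 foos"]
foo_hits = [line for line in lines if find_foo(line)]
digit_hits = [find_digits(line).group() for line in lines if find_digits(line)]

print(foo_hits)
print(digit_hits)
```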

    tjr
    Terry Reedy, Dec 22, 2007
    #7
  8. William McBrine

    John Machin Guest

    On Dec 23, 5:38 am, "Terry Reedy" <> wrote:
    > "Steven D'Aprano" <> wrote in message
    >
    > news:...
    > | >>> def spam2(x, s=re.compile('nobody expects the Spanish Inquisition!')):
    > | ...     return s.search(x)
    > |
    > | I suspect that this will be not only the fastest solution, but also the
    > | most flexible.
    >
    > 'Most flexible' in a different way is
    >
    > def searcher(rex):
    >     crex = re.compile(rex)
    >     def _(txt):
    >         return crex.search(txt)
    >     return _
    >


    I see your obfuscatory ante and raise you several dots and
    underscores:

    class Searcher(object):
        def __init__(self, rex):
            self.crex = re.compile(rex)
        def __call__(self, txt):
            return self.crex.search(txt)

    Cheers,
    John
    John Machin, Dec 22, 2007
    #8
  9. William McBrine

    Terry Reedy Guest

    "John Machin" <> wrote in message
    news:...
    | On Dec 23, 5:38 am, "Terry Reedy" <> wrote:
    | > 'Most flexible' in a different way is
    | >
    | > def searcher(rex):
    | >     crex = re.compile(rex)
    | >     def _(txt):
    | >         return crex.search(txt)
    | >     return _
    | >
    |
    | I see your obfuscatory ante and raise you several dots and
    | underscores:

    I will presume you are merely joking, but for the benefit of any beginning
    programmers reading this, the closure above is a standard functional idiom
    for partial evaluation of a function (in this case, re.search(crex, txt))

    | class Searcher(object):
    |     def __init__(self, rex):
    |         self.crex = re.compile(rex)
    |     def __call__(self, txt):
    |         return self.crex.search(txt)

    while this is the equivalent OO version. Intermediate Python programmers
    should know both.

    tjr
    Terry Reedy, Dec 23, 2007
    #9
  10. William McBrine

    John Machin Guest

    On Dec 23, 2:39 pm, "Terry Reedy" <> wrote:
    > "John Machin" <> wrote in message
    >
    > news:...
    > | On Dec 23, 5:38 am, "Terry Reedy" <> wrote:
    > | > 'Most flexible' in a different way is
    > | >
    > | > def searcher(rex):
    > | >     crex = re.compile(rex)
    > | >     def _(txt):
    > | >         return crex.search(txt)
    > | >     return _
    > | >
    > |
    > | I see your obfuscatory ante and raise you several dots and
    > | underscores:
    >
    > I will presume you are merely joking, but for the benefit of any beginning
    > programmers reading this, the closure above is a standard functional idiom
    > for partial evaluation of a function (in this case, re.search(crex, txt))
    >
    > | class Searcher(object):
    > |     def __init__(self, rex):
    > |         self.crex = re.compile(rex)
    > |     def __call__(self, txt):
    > |         return self.crex.search(txt)
    >
    > while this is the equivalent OO version. Intermediate Python programmers
    > should know both.
    >


    Semi-joking; I thought that your offering of this:

    def searcher(rex):
        crex = re.compile(rex)
        def _(txt):
            return crex.search(txt)
        return _
    foo_searcher = searcher('foo')

    was somewhat over-complicated, and possibly slower than already-
    mentioned alternatives. The standard idiom etc etc it may be, but the
    OP was interested in getting overhead out of his re searching loop.
    Let's trim it a bit.

    step 1:
    def searcher(rex):
        crexs = re.compile(rex).search
        def _(txt):
            return crexs(txt)
        return _
    foo_searcher = searcher('foo')

    step 2:
    def searcher(rex):
        return re.compile(rex).search
    foo_searcher = searcher('foo')

    step 3:
    foo_searcher = re.compile('foo').search
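
    A quick way to check that the trimmed version behaves like the closure,
    and to time the two side by side. This is a Python 3 sketch; the sample
    text and the names closure_searcher, bound_searcher and time_both are
    made up here:

```python
import re
import timeit

def searcher(rex):
    crex = re.compile(rex)
    def _(txt):
        return crex.search(txt)
    return _

closure_searcher = searcher("foo")          # the original closure
bound_searcher = re.compile("foo").search   # the "step 3" bound method

TEXT = "a string with foo in it"

def time_both(number=20000):
    """Return (closure seconds, bound-method seconds) for `number` calls."""
    return (
        timeit.timeit(lambda: closure_searcher(TEXT), number=number),
        timeit.timeit(lambda: bound_searcher(TEXT), number=number),
    )

if __name__ == "__main__":
    closure_time, bound_time = time_both()
    print(f"closure:      {closure_time:.4f} s")
    print(f"bound method: {bound_time:.4f} s")
```

    The bound method skips one Python-level function call per search, which
    is the overhead the trimming is meant to remove; how much that matters
    is best measured on the real workload.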

    HTH,
    John
    John Machin, Dec 23, 2007
    #10
  11. Thanks for all the answers on this. (And sorry for the lousy Subject
    line; I couldn't think of a better one.)

    --
    09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0 -- pass it on
    William McBrine, Dec 26, 2007
    #11
  12. On Sun, 23 Dec 2007 03:55:07 -0300, John Machin <> wrote:
    > On Dec 23, 2:39 pm, "Terry Reedy" <> wrote:


    >> I will presume you are merely joking, but for the benefit of any
    >> beginning programmers reading this, the closure above is a standard
    >> functional idiom for partial evaluation of a function (in this case,
    >> re.search(crex, txt))

    >
    > Semi-joking; I thought that your offering of this:
    >
    > def searcher(rex):
    >     crex = re.compile(rex)
    >     def _(txt):
    >         return crex.search(txt)
    >     return _
    > foo_searcher = searcher('foo')
    >
    > was somewhat over-complicated, and possibly slower than already-
    > mentioned alternatives. The standard idiom etc etc it may be, but the
    > OP was interested in getting overhead out of his re searching loop.
    > Let's trim it a bit.
    >
    > step 1:
    > def searcher(rex):
    >     crexs = re.compile(rex).search
    >     def _(txt):
    >         return crexs(txt)
    >     return _
    > foo_searcher = searcher('foo')
    >
    > step 2:
    > def searcher(rex):
    >     return re.compile(rex).search
    > foo_searcher = searcher('foo')
    >
    > step 3:
    > foo_searcher = re.compile('foo').search


    Nice derivation! Like the word-stairs game: love -> rove -> rave -> have
    -> hate

    --
    Gabriel Genellina
    Gabriel Genellina, Dec 27, 2007
    #12
  13. I get class Searcher(object), but I can't for the life of me see why
    (except to be intentionally obtuse) one would use the def
    searcher(rex) pattern, which I assume you would call with
    searcher(r)(t), right?

    - mdf


    > >
    > > 'Most flexible' in a different way is
    > >
    > > def searcher(rex):
    > >     crex = re.compile(rex)
    > >     def _(txt):
    > >         return crex.search(txt)
    > >     return _
    > >

    >
    > I see your obfuscatory ante and raise you several dots and
    > underscores:
    >
    > class Searcher(object):
    >     def __init__(self, rex):
    >         self.crex = re.compile(rex)
    >     def __call__(self, txt):
    >         return self.crex.search(txt)
    >
    Matthew Franz, Dec 27, 2007
    #13
  14. William McBrine

    John Machin Guest

    On Dec 28, 7:53 am, "Matthew Franz" <> wrote:
    > I get class Searcher(object) but can't for the life of me see why
    > (except to be intentionally obtuse) one would use the def
    > searcher(rex) pattern which I assume you would call with
    > searcher(r)(t) right?
    >


    The whole point of the thread was performance across multiple searches
    for the one pattern. Thus one would NOT do
        searcher(r)(t)
    each time a search was required; one would do
        s = searcher(r)
    ONCE, and then do
        s(t)
    each time ...
    John Machin, Dec 27, 2007
    #14
  15. Thanks, that makes more sense. I got tripped up by the function
    returning a function thing and (for a while) thought _ was some sort
    of spooky special variable.

    - mdf

    > On Dec 28, 7:53 am, "Matthew Franz" <> wrote:
    > > I get class Searcher(object) but can't for the life of me see why
    > > (except to be intentionally obtuse) one would use the def
    > > searcher(rex) pattern which I assume you would call with
    > > searcher(r)(t) right?
    > >

    >
    > The whole point of the thread was performance across multiple searches
    > for the one pattern. Thus one would NOT do
    > searcher(r)(t)
    > each time a search was required; one would do
    > s = searcher(r)
    > ONCE, and then do
    > s(t)
    > each time ...




    --
    Matthew Franz
    http://www.threatmind.net/
    Matthew Franz, Dec 27, 2007
    #15