performance problem with time.strptime()

Discussion in 'Python' started by Nils Rüttershoff, Jul 2, 2009.

  1. Hi everyone,

    In my free time I translate scripts from open source projects or write
    my own, to train my python skills. ATM I convert the aplogmerge.pl from
    awstats. It merges multiple apache logfiles and sort the output by the
    timestamps of each line. My first version of this script hasn't a good
    performance, so I started profiling. It turned out that the script spend
    much time for converting the timestamps of the line into a struct_time
    object. Here a code example (Python 2.6.2 on Ubuntu 7.10):

    Rec = re.compile(r"^\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}\s-\s\d+\s\[(\d{2}/\w+/\d{4}:\d{2}:\d{2}:\d{2})\s\+\d{4}\].*")
    Line = '1.2.3.4 - 4459 [02/Jul/2009:01:50:26 +0200] "GET /foo HTTP/1.0" 200 - "-" "www.example.org" "-" "-" "-"'

    def strptime():
    m = Rec.match(Line)
    if m:
    date_string = m.group(1)
    # date_string example: '02/Jul/2009:01:50:26'
    return time.strptime(date_string, "%d/%b/%Y:%H:%M:%S")

    with timeit this functions takes approximate 125 sec and 29.004.081
    function calls (I've configured timeit with 1.000.000 runs). A look at
    the output of cProfile told me that more than the half time is spent in
    locale.py:

    1000002 11.712 0.000 19.592 0.000 locale.py:316(normalize)

    1000002 3.639 0.000 23.231 0.000 locale.py:382(_parse_localename)

    1000002 5.162 0.000 30.271 0.000 locale.py:481(getlocale)



    I studied the time documentation and thought that I had to set TZ in os
    environ:

    os.environ['TZ'] = 'Europe/Berlin'

    time.set()


    but that had no effect. I don't know why time.strptime() looks every
    time for my locale. Maybe it's a problem with my OS locale... However.
    I've introduced a work around, which works perfectly for my problem. For
    time comparison I could use any sort of time representation and so I
    convert to epoch:

    # needed to translate month str to dig repr

    Shortmonth = {'Jan' : '01',

    'Feb' : '02',

    'Mar' : '03',

    'Apr' : '04',

    'May' : '05',

    'Jun' : '06',

    'Jul' : '07',

    'Aug' : '08',

    'Sep' : '09',

    'Oct' : '10',

    'Nov' : '11',

    'Dec' : '12'}

    Rec = re.compile(r"^\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}\s-\s\d+\s\[(?P<day>\d{2})/(?P<month>\w+)/(?P<year>\d{4}):(?P<hour>\d{2}):(?P<min>\d{2}):(?P<sec>\d{2})\s\+\d{4}\].*")

    Line = '1.2.3.4 - 4459 [02/Jul/2009:01:50:26 +0200] "GET /foo HTTP/1.0" 200 - "-" "www.example.org" "-" "-" "-"'

    def epoch():

    m = Rec.match(Line)

    if m:

    result = m.groupdict()

    date_tuple = (result["year"], Shortmonth[result["month"]], result["day"], result["hour"], result["min"], result["sec"], -1, -1, -1)

    date_tuple = map(int,date_tuple)

    return time.mktime(date_tuple)


    with this workaround I had a speed up to 4 times; it tooks only 31 sec
    with only 5.000.009 function calls. Maybe this helps some of you, who
    had the similar problems with time conversion....

    ....But one big question remains: Why time.strptime() checks everytime
    the locale? had I missed something or could I have a problem with my OS
    locale?

    With python 3.1 there is no difference, unless that time.strptime() took
    approx 12 sec longer... :(

    regards, Nils



    Here a complete test script:

    #!/opt/python/2.6.2/bin/python

    import time

    import timeit

    import cProfile

    import re

    # needed to tranlate month str to dig repr

    Shortmonth = {'Jan' : '01',

    'Feb' : '02',

    'Mar' : '03',

    'Apr' : '04',

    'May' : '05',

    'Jun' : '06',

    'Jul' : '07',

    'Aug' : '08',

    'Sep' : '09',

    'Oct' : '10',

    'Nov' : '11',

    'Dec' : '12'}

    Rec1 = re.compile(r"^\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}\s-\s\d+\s\[(?P<day>\d{2})/(?P<month>\w+)/(?P<year>\d{4}):(?P<hour>\d{2}):(?P<min>\d{2}):(?P<sec>\d{2})\s\+\d{4}\].*")

    Rec2 = re.compile(r"^\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}\s-\s\d+\s\[(\d{2}/\w+/\d{4}:\d{2}:\d{2}:\d{2})\s\+\d{4}\].*")

    Line = '1.2.3.4 - 4459 [02/Jul/2009:01:50:26 +0200] "GET /foo HTTP/1.0" 200 - "-" "www.example.org" "-" "-" "-"'

    def epoch():

    m = Rec1.match(Line)

    if m:

    result = m.groupdict()

    date_tuple = (result["year"], Shortmonth[result["month"]], result["day"], result["hour"], result["min"], result["sec"], -1, -1, -1)

    date_tuple = map(int,date_tuple)

    return time.mktime(date_tuple)

    def strptime():

    m = Rec2.match(Line)

    if m:

    date_string = m.group(1)

    return time.strptime(date_string, "%d/%b/%Y:%H:%M:%S")

    if __name__ == "__main__":

    t1 = timeit.Timer("epoch()","from __main__ import epoch")

    t2 = timeit.Timer("strptime()", "from __main__ import strptime")

    cProfile.run("t1.timeit();print")

    print ""

    cProfile.run("t2.timeit();print")
     
    Nils Rüttershoff, Jul 2, 2009
    #1
    1. Advertising

  2. On Jul 2, 7:30 am, Nils Rüttershoff <> wrote:

    > Rec = re.compile(r"^\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}\s-\s\d+\s\[(\d{2}/\w+/\d{4}:\d{2}:\d{2}:\d{2})\s\+\d{4}\].*")
    > Line = '1.2.3.4 - 4459 [02/Jul/2009:01:50:26 +0200] "GET /foo HTTP/1.0" 200 - "-" "www.example.org" "-" "-" "-"'


    I'm not sure how much it will help but if you are only using the regex
    to get the date/time group element, it might be faster to replace the
    regex with:

    >>> date_string = Line.split()[3][1:-1]
     
    Casey Webster, Jul 2, 2009
    #2
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. George Trojan

    strptime performance

    George Trojan, Nov 3, 2003, in forum: Python
    Replies:
    3
    Views:
    740
    John Roth
    Nov 4, 2003
  2. Delaney, Timothy C (Timothy)

    RE: strptime performance

    Delaney, Timothy C (Timothy), Nov 4, 2003, in forum: Python
    Replies:
    1
    Views:
    449
    John Roth
    Nov 4, 2003
  3. kongkolvyu
    Replies:
    4
    Views:
    495
    Old Wolf
    Jan 4, 2006
  4. Greg Krohn
    Replies:
    1
    Views:
    991
    Denis Kasak
    Feb 17, 2009
  5. m_ahlenius

    problem with strptime and time zone

    m_ahlenius, Aug 24, 2010, in forum: Python
    Replies:
    5
    Views:
    2,156
    Alex Willmer
    Aug 25, 2010
Loading...

Share This Page