speed

Discussion in 'Python' started by Peter Kleiweg, Aug 19, 2004.

  1. I implemented a lexer in Pylly and compared it to the version I
    had written in Flex. Processing 219062 lines took 0.9 seconds in
    C (from Flex), and 5 minutes 54 seconds in Python (from Pylly), a
    ratio of 393 to 1.

    Is this normal for Python, or does Flex produce better parsers
    than Pylly? I have been looking at the code produced by Flex to
    see if I could translate it to Python automatically. But it has a
    lot of goto statements, and I haven't figured out how to
    translate those to Python efficiently.

    What are the average times used for text processing of Python
    compared to C?
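
    Each goto in a generated scanner jumps to the code for the next
    automaton state. In Python the usual substitute is an explicit state
    variable plus a transition table (here a dict), so a jump becomes an
    assignment inside a loop. This is a minimal sketch under assumptions:
    the two-token toy language, the state names and the tokenize() helper
    are hypothetical, and this is not the code Flex or Pylly actually emit.
    It still examines every character in the interpreter, so it should not
    be expected to come close to the speed of the C scanner.

```python
import string

# Hypothetical toy language: identifiers and integers separated by whitespace.
DIGITS = set(string.digits)
LETTERS = set(string.ascii_letters + "_")

# (state, character class) -> next state. A missing entry plays the role of
# "no goto applies": the current token ends here.
TRANSITIONS = {
    ("start", "digit"): "int",
    ("start", "letter"): "ident",
    ("int", "digit"): "int",
    ("ident", "letter"): "ident",
    ("ident", "digit"): "ident",
}

# States in which a token may end, mapped to the token type they produce.
ACCEPTING = {"int": "INT", "ident": "IDENT"}


def char_class(ch):
    """Map one character to the input class used in the transition table."""
    if ch in DIGITS:
        return "digit"
    if ch in LETTERS:
        return "letter"
    return "other"


def tokenize(text):
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        state, start = "start", pos
        # Each pass through this loop replaces one "goto next_state" in C.
        while pos < len(text):
            next_state = TRANSITIONS.get((state, char_class(text[pos])))
            if next_state is None:
                break
            state = next_state
            pos += 1
        if state not in ACCEPTING:
            raise ValueError("unexpected character %r at offset %d" % (text[pos], pos))
        tokens.append((ACCEPTING[state], text[start:pos]))
    return tokens
```

    For example, tokenize("x1 42 foo") returns
    [('IDENT', 'x1'), ('INT', '42'), ('IDENT', 'foo')].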

    --
    Peter Kleiweg L:NL,af,da,de,en,ia,nds,no,sv,(fr,it) S:NL,de,en,(da,ia)
    info: http://www.let.rug.nl/~kleiweg/ls.html
    Peter Kleiweg, Aug 19, 2004

  2. John Lenton

    On Thu, Aug 19, 2004 at 03:37:26PM +0200, Peter Kleiweg wrote:
    >
    > I implemented a lexer in Pylly and compared it to the version I
    > had written in Flex. Processing 219062 lines took 0.9 seconds in
    > C (from Flex), and 5 minutes 54 seconds in Python (from Pylly), a
    > ratio of 393 to 1.
    >
    > Is this normal for Python, or does Flex produce better parsers
    > than Pylly? I have been looking at the code produced by Flex to
    > see if I could translate it to Python automatically. But it has a
    > lot of goto statements, and I haven't figured out how to
    > translate those to Python efficiently.


    flex has an option to generate code without the gotos...

    --
    John Lenton () -- Random fortune:
    Don't read everything you believe.

    John Lenton, Aug 19, 2004

  3. John Lenton wrote:


    > flex has an option to generate code without the gotos...


    I have the latest version. I can't find it, either as a run-time
    option or as a build option.



    Peter Kleiweg, Aug 19, 2004
  4. John Lenton

    On Thu, Aug 19, 2004 at 04:16:24PM +0200, Peter Kleiweg wrote:
    > John Lenton wrote:
    >
    >
    > > flex has an option to generate code without the gotos...

    >
    > I have the latest version. I can't find it, either as a run-time
    > option or as a build option.


    hmm! you're right... I wonder what lexer it was, then? I definitely
    have a weak ref to the option in my head, but the owner has been gc'ed
    :(

    --
    John Lenton () -- Random fortune:
    There was a phone call for you.

    John Lenton, Aug 19, 2004
  5. Peter Kleiweg wrote:
    > I implemented a lexer in Pylly and compared it to the version I
    > had written in Flex. Processing 219062 lines took 0.9 seconds in
    > C (from Flex), and 5 minutes 54 seconds in Python (from Pylly), a
    > ratio of 393 to 1.
    >
    > Is this normal for Python, or does Flex produce better parsers
    > than Pylly? I have been looking at the code produced by Flex to
    > see if I could translate it to Python automatically. But it has a
    > lot of goto statements, and I haven't figured out how to
    > translate those to Python efficiently.
    >
    > What are the average times used for text processing of Python
    > compared to C?


    I don't know Pylly, but I guess it generates a parser using
    a finite automaton -- just like lex/flex, except it handles
    every single character in Python, whereas lex/flex will lead
    to compiled C code. That would explain the speed difference.
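
    The cost of handling each character in Python can be seen with a
    small, hypothetical experiment: count newlines once with an explicit
    Python loop and once with str.count, whose loop runs in C. The input
    text and function names are made up for illustration; absolute timings
    depend on the machine and Python version, so none are claimed here.

```python
import timeit

# Hypothetical input: 20,000 short lines.
text = "some words on a line\n" * 20000


def count_newlines_python(s):
    # One interpreted loop iteration per character, as in a pure-Python lexer.
    n = 0
    for ch in s:
        if ch == "\n":
            n += 1
    return n


def count_newlines_builtin(s):
    # Same result, but the per-character loop runs inside str.count (C code).
    return s.count("\n")


for fn in (count_newlines_python, count_newlines_builtin):
    # Bind fn as a default argument so each timing uses the intended function.
    seconds = timeit.timeit(lambda f=fn: f(text), number=5)
    print("%-24s %.4f s for 5 runs" % (fn.__name__, seconds))
```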

    When I have to parse something in Python, I try to do that
    using things like string.split(), string.find(), the "re"
    module etc. Those things are written in C, therefore they
    are fast enough for most applications. There are also some
    modules for specialized cases, such as "ConfigParser" and
    "shlex". See the Python Library Reference.

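    A minimal sketch of that approach, assuming a hypothetical token set:
    all token patterns are combined into one regular expression with named
    groups, so the character-level matching happens inside the "re" module
    (C code) and Python only sees whole tokens. Unlike Flex, which picks
    the longest match, Python's re takes the first alternative that
    matches, so the order of the patterns matters.

```python
import re

# Hypothetical token set. Python's re takes the first alternative that
# matches (not the longest, as Flex does), so order the patterns carefully.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
    ("ERROR", r"."),
]
MASTER = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))


def tokenize(text):
    """Yield (kind, lexeme) pairs; character matching happens inside re."""
    for m in MASTER.finditer(text):
        kind = m.lastgroup  # name of the alternative that matched
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise ValueError("unexpected character %r at offset %d" % (m.group(), m.start()))
        yield kind, m.group()
```

    For example, list(tokenize("x = 3.5 * (y + 42)")) starts with
    ('NAME', 'x'), ('OP', '='), ('NUMBER', '3.5'). For shell-like input,
    shlex.split('cp "my file" /tmp') already returns
    ['cp', 'my file', '/tmp'] without any hand-written scanner.
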
    Best regards
    Oliver

    --
    Oliver Fromme, Konrad-Celtis-Str. 72, 81369 Munich, Germany

    ``All that we see or seem is just a dream within a dream.''
    (E. A. Poe)
    Oliver Fromme, Aug 19, 2004
  6. Ayose

    Hi,

    On Thu, Aug 19, 2004 at 03:37:26PM +0200, Peter Kleiweg wrote:
    >
    > I implemented a lexer in Pylly and compared it to the version I
    > had written in Flex. Processing 219062 lines took 0.9 seconds in
    > C (from Flex), and 5 minutes 54 seconds in Python (from Pylly), a
    > ratio of 393 to 1.
    >
    > Is this normal for Python, or does Flex produce better parsers
    > than Pylly? I have been looking at the code produced by Flex to
    > see if I could translate it to Python automatically. But it has a
    > lot of goto statements, and I haven't figured out how to
    > translate those to Python efficiently.


    Don't try to translate the generated code to Python. Python code is
    (almost) always slower than C code, because C is compiled to machine
    code, while Python has to be interpreted by the VM. Besides, Python
    performs a lot of run-time checks.

    Try with PLY, <http://systems.cs.uchicago.edu/ply/>. If you have
    experience with flex/yacc in C, this module should be easy to use.

    You can also play with Psyco (a JIT compiler for x86) or even with
    Pyrex.

    But, IMHO, if you have to process very big files, don't do it with
    Python. Instead, write a simple C module, which uses your Flex parser
    and creates Python objects with that information. It should be trivial
    if you have experience with the C API. :)

    >
    > What are the average times used for text processing of Python
    > compared to C?
    >


    IMO, Python is a powerful language for almost everything, but in some
    cases it is a poor fit. One of these cases is intensive computing (like
    parsing a big file). Use the right tool =)

    --
    Ayose Cazorla León
    Debian GNU/Linux - setepo
    Ayose, Aug 20, 2004
  7. Another Python parser generator to look into is SimpleParse/mxTextTools

    <http://simpleparse.sourceforge.net/>

    We use it to parse and process large log files. In our case, a typical
    grammar contains over 250 productions and parsing a log file of 180
    Klines (100 MB) takes approx. 3 minutes. Processing the result from the
    parse step requires an additional 3 minutes. This is on a 2.4 GHz Xeon
    machine running Red Hat 8.

    Obviously these figures are very grammar and application specific. Your
    mileage may vary.
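
    As a back-of-the-envelope check, the timings quoted in this thread can
    be turned into lines per second. The grammars, machines and work done
    per token all differ, so these numbers only indicate orders of
    magnitude and are not a benchmark. "180 Klines" is assumed to mean
    180,000 lines, and only the SimpleParse parse step is counted.

```python
# Figures quoted in this thread: (input lines, wall-clock seconds).
# "180 Klines" is taken as 180,000 lines; only the SimpleParse parse step counts.
runs = {
    "Flex (C)": (219062, 0.9),
    "Pylly (Python)": (219062, 5 * 60 + 54),
    "SimpleParse (parse only)": (180000, 3 * 60),
}


def lines_per_second(lines, seconds):
    return lines / seconds


for name, (lines, seconds) in runs.items():
    print("%-26s %10.0f lines/s" % (name, lines_per_second(lines, seconds)))

# Pylly's slowdown relative to Flex: 354 s / 0.9 s, roughly 393 to 1.
ratio = runs["Pylly (Python)"][1] / runs["Flex (C)"][1]
print("Pylly/Flex time ratio: %.0f" % ratio)
```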

    /Jean Brouwers

    PS) A good reference is David Mertz' book "Text Processing in Python"

    <http://www.informit.com/title/0321112547>

    or several articles on (t)his web page

    <http://gnosis.cx/publish/tech_index_cp.html>




    Ayose wrote:

    > <http://systems.cs.uchicago.edu/ply/>.
    Jean Brouwers, Aug 20, 2004
  8. At some point, Ayose wrote:
    >...
    > But, IMHO, if you have to process very big files, don't do it with
    > python. Instead, write a simple C-module, which uses your Flex parser
    > and creates python objects with that information. It should be trivial
    > if you have experience with the C API. :)


    Or have a look at FlexModule at
    http://www.cs.utexas.edu/users/mcguire/software/fbmodule/
    which makes it really simple without experience with the C API.

    --
    |>|\/|<
    /--------------------------------------------------------------------------\
    |David M. Cooke
    |cookedm(at)physics(dot)mcmaster(dot)ca
    David M. Cooke, Aug 22, 2004