speed

Peter Kleiweg

I implemented a lexer in Pylly and compared it to the version I
had written in Flex. Processing 219062 lines took 0.9 seconds in
C (from Flex), and 5 minutes 54 seconds in Python (from Pylly), a
ratio of 393 to 1.

Is this normal for Python, or does Flex produce better parsers
than Pylly? I have been looking at the code produced by Flex to
see if I could translate it to Python automatically. But it has a
lot of goto statements, and I haven't figured out how to
translate those to Python efficiently.

What are the average times used for text processing of Python
compared to C?
 
John Lenton

> I implemented a lexer in Pylly and compared it to the version I
> had written in Flex. Processing 219062 lines took 0.9 seconds in
> C (from Flex), and 5 minutes 54 seconds in Python (from Pylly), a
> ratio of 393 to 1.
>
> Is this normal for Python, or does Flex produce better parsers
> than Pylly? I have been looking at the code produced by Flex to
> see if I could translate it to Python automatically. But it has a
> lot of goto statements, and I haven't figured out how to
> translate those to Python efficiently.

flex has an option to generate code without the gotos...

--
John Lenton ([email protected]) -- Random fortune:
Don't read everything you believe.

 
Peter Kleiweg

John Lenton wrote:

> flex has an option to generate code without the gotos...

I have the latest version. I can't find it, not as a run-time
option, nor as a build option.
 
John Lenton

> John Lenton wrote:
>
> > flex has an option to generate code without the gotos...
>
> I have the latest version. I can't find it, not as a run-time
> option, nor as a build option.

hmm! you're right... I wonder what lexer it was, then? I definitely
have a weak ref to the option in my head, but the owner has been gc'ed
:(

--
John Lenton ([email protected]) -- Random fortune:
There was a phone call for you.

 
Oliver Fromme

Peter Kleiweg said:
> I implemented a lexer in Pylly and compared it to the version I
> had written in Flex. Processing 219062 lines took 0.9 seconds in
> C (from Flex), and 5 minutes 54 seconds in Python (from Pylly), a
> ratio of 393 to 1.
>
> Is this normal for Python, or does Flex produce better parsers
> than Pylly? I have been looking at the code produced by Flex to
> see if I could translate it to Python automatically. But it has a
> lot of goto statements, and I haven't figured out how to
> translate those to Python efficiently.
>
> What are the average times used for text processing of Python
> compared to C?

I don't know Pylly, but I guess it generates a parser using
a finite automaton -- just like lex/flex, except that it handles
every single character in Python, whereas lex/flex generates C
code that gets compiled. That would explain the speed difference.
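
To make that guess concrete, here is a minimal sketch of a table-driven
automaton stepping through the input one character at a time in pure
Python. It is not Pylly's actual output (I haven't seen it); the token
classes, the table and the names classify and dfa_tokens are all made
up for illustration. The point is only that every character costs at
least one dictionary lookup and several bytecode instructions.

```python
# Hypothetical two-token lexer: runs of digits and runs of letters.
def classify(ch):
    """Map a character to a coarse character class."""
    if ch.isdigit():
        return "digit"
    if ch.isalpha():
        return "alpha"
    return "other"

# Transition table: (state, character class) -> next state.
TRANSITIONS = {
    ("start", "digit"): "number",
    ("start", "alpha"): "word",
    ("number", "digit"): "number",
    ("word", "alpha"): "word",
}

def dfa_tokens(text):
    """Return (token_type, lexeme) pairs, skipping all other characters."""
    tokens = []
    state, begin = "start", 0
    for pos, ch in enumerate(text):
        nxt = TRANSITIONS.get((state, classify(ch)))
        if nxt is None:
            # No transition: emit the token collected so far, if any,
            # then restart the automaton on the current character.
            if state != "start":
                tokens.append((state, text[begin:pos]))
            state = TRANSITIONS.get(("start", classify(ch)), "start")
            begin = pos
        else:
            if state == "start":
                begin = pos
            state = nxt
    # Flush a token that runs up to the end of the input.
    if state != "start":
        tokens.append((state, text[begin:]))
    return tokens

print(dfa_tokens("abc 123x"))
```

Flex builds the same kind of table, but the inner loop it generates is
a handful of machine instructions per character.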

When I have to parse something in Python, I try to do that
using things like string.split(), string.find(), the "re"
module etc. Those things are written in C, therefore they
are fast enough for most applications. There are also some
modules for specialized cases, such as "ConfigParser" and
"shlex". See the Python Library Reference.
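
As a rough sketch of what I mean, the following tokenizer leaves the
per-character scanning to the "re" module. The two token rules and the
names TOKEN_RE and re_tokens are invented for the example; a real lexer
would need its own patterns.

```python
import re

# One alternative per token type; the named group that matched tells us
# which kind of token was found. These two rules are illustrative only.
TOKEN_RE = re.compile(r"(?P<number>\d+)|(?P<word>[A-Za-z]+)")

def re_tokens(text):
    """Return (token_type, lexeme) pairs; unmatched characters are skipped."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]

print(re_tokens("abc 123x"))
```

The Python-level loop now runs once per token instead of once per
character, which is usually where most of the time goes. How close this
gets to Flex depends on the grammar, so measure it on your own data.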

Best regards
Oliver
 
Ayose

Hi,

> I implemented a lexer in Pylly and compared it to the version I
> had written in Flex. Processing 219062 lines took 0.9 seconds in
> C (from Flex), and 5 minutes 54 seconds in Python (from Pylly), a
> ratio of 393 to 1.
>
> Is this normal for Python, or does Flex produce better parsers
> than Pylly? I have been looking at the code produced by Flex to
> see if I could translate it to Python automatically. But it has a
> lot of goto statements, and I haven't figured out how to
> translate those to Python efficiently.

Don't try to translate the generated code to Python. Python code is
(almost) always slower than C code, because C is compiled to machine
code, while Python has to be interpreted by the VM. Besides, Python does
a lot of run-time checks.

Try with PLY, <http://systems.cs.uchicago.edu/ply/>. If you have
experience with flex/yacc in C, this module should be easy to use.

You can also play with Psyco (a JIT compiler for x86) or even with
Pyrex.

But, IMHO, if you have to process very big files, don't do it with
Python. Instead, write a simple C module which uses your Flex parser
and creates Python objects with that information. It should be trivial
if you have experience with the C API. :)

> What are the average times used for text processing of Python
> compared to C?

IMO, Python is a powerful language for almost everything, but in some
cases it is a bad fit. One of these cases is intensive computing (like
parsing a big file). Use the correct tool =)
 
Jean Brouwers

Another Python parser generator to look into is SimpleParse/mxTextTools

<http://simpleparse.sourceforge.net/>

We use it to parse and process large log files. In our case, a typical
grammar contains over 250 productions and parsing a log file of 180
Klines (100 MB) takes approx 3 min. Processing the result from the
parse step requires an additional 3 mins. This on a 2.4 GHz Xeon
machine running RedHat 8.

Obviously these figures are very grammar and application specific. Your
mileage may vary.

/Jean Brouwers

PS) A good reference is David Mertz' book "Text Processing in Python"

<http://www.informit.com/title/0321112547>

or several articles on (t)his web page

<http://gnosis.cx/publish/tech_index_cp.html>
 
