Snowball to Python compiler

Matt Chaput

On the slim chance that (a) somebody worked on something like this but
never uploaded it to PyPI, and (b) the person who did (a) or heard about
it is reading this list ;) --

I'm looking for some code that will take a Snowball program and compile
it into a Python script. Or, less ideally, a Snowball interpreter
written in Python.

(http://snowball.tartarus.org/)

Anyone heard of such a thing?

Thanks!

Matt
 
Paul Rubin

Matt Chaput said:
I'm looking for some code that will take a Snowball program and
compile it into a Python script. Or, less ideally, a Snowball
interpreter written in Python.

(http://snowball.tartarus.org/)

Anyone heard of such a thing?

I'd never seen Snowball before. It looks kind of interesting, and it
looks like it already has a way to compile to C. If you're using
it for IR on any scale, you're surely much better off using the C
routines with a C API wrapper than translating Snowball to
Python, which will be dog slow to interpret.
 
Terry Reedy

Paul Rubin said:
I'd never seen Snowball before. It looks kind of interesting, and it
looks like it already has a way to compile to C. If you're using
it for IR on any scale, you're surely much better off using the C
routines with a C API wrapper,

If the C routines are in a shared library, you should be able to write
the interface in Python with ctypes.
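
For illustration, a minimal ctypes sketch of that idea. It assumes the
Snowball C distribution has been built as a shared library that
ctypes.util.find_library can locate under the name "stemmer", and that it
exports the sb_stemmer_new / sb_stemmer_stem / sb_stemmer_length /
sb_stemmer_delete functions declared in libstemmer.h. The Python names
load_libstemmer and stem are made up for this example. Untested against any
particular build, so treat it as a starting point only.

```python
import ctypes
import ctypes.util


def load_libstemmer():
    """Return a ctypes handle to libstemmer, or None if it cannot be loaded."""
    # Assumption: the shared library is installed under the name "stemmer".
    path = ctypes.util.find_library("stemmer")
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    # Declare the signatures so pointers are not truncated on 64-bit systems.
    lib.sb_stemmer_new.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
    lib.sb_stemmer_new.restype = ctypes.c_void_p
    lib.sb_stemmer_stem.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_int]
    lib.sb_stemmer_stem.restype = ctypes.c_void_p
    lib.sb_stemmer_length.argtypes = [ctypes.c_void_p]
    lib.sb_stemmer_length.restype = ctypes.c_int
    lib.sb_stemmer_delete.argtypes = [ctypes.c_void_p]
    lib.sb_stemmer_delete.restype = None
    return lib


def stem(lib, word, algorithm="english"):
    """Stem one word by passing it to the C stemmer as UTF-8 bytes."""
    handle = lib.sb_stemmer_new(algorithm.encode("ascii"), b"UTF_8")
    if not handle:
        raise ValueError("unknown algorithm or encoding: %r" % algorithm)
    try:
        data = word.encode("utf-8")
        result = lib.sb_stemmer_stem(handle, data, len(data))
        if not result:
            raise MemoryError("sb_stemmer_stem returned NULL")
        # The result buffer belongs to the stemmer, so copy it out
        # before the stemmer is deleted.
        length = lib.sb_stemmer_length(handle)
        return ctypes.string_at(result, length).decode("utf-8")
    finally:
        lib.sb_stemmer_delete(handle)


if __name__ == "__main__":
    lib = load_libstemmer()
    if lib is None:
        print("libstemmer shared library not found")
    else:
        print(stem(lib, "running"))
```

Creating and deleting a stemmer per word keeps the sketch short; for real
indexing work you would create one stemmer and reuse it for every word.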
 
Stefan Behnel

Terry Reedy, 22.04.2011 05:48:
If the C routines are in a shared library, you should be able to write the
interface in Python with ctypes.

Since it appears that the code has to get compiled anyway, Cython is likely
a better option, as it makes it easier to write a fast and Pythonic wrapper.

From a quick look, Snowball also has a "-widechar" option that could allow
interfacing directly with Python's Unicode strings in 16-bit Unicode builds
(but not 32-bit builds!). That would provide for really fast wrappers that
do not even need an intermediate encoding step. And PEP 393 would
eventually make it possible to include both a UTF-8 and a 16-bit version
of the (prefixed) Snowball code and to switch between them depending on the
internal layout of the processed string, with the obvious fallback to UTF-8
encoding only for strings that really exceed the lower 16-bit Unicode range.
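
A rough pure-Python sketch of that dispatch decision, purely to make the idea
concrete. The function name and the three labels are invented here, and a
real wrapper would make this choice in C or Cython by looking at the string's
internal kind rather than scanning it. It also assumes my reading of PEP 393:
ASCII-only strings are stored as bytes that are already valid UTF-8, Latin-1
strings are stored one byte per character (so they fit neither compiled
variant directly), and only strings with code points up to U+FFFF beyond that
are stored as 16-bit units.

```python
def choose_snowball_variant(text):
    """Pick which (hypothetical) compiled Snowball variant could take `text`."""
    highest = max(map(ord, text), default=0)
    if highest < 0x80:
        # Pure ASCII: the stored bytes are already valid UTF-8.
        return "utf8-direct"
    if highest < 0x100:
        # Latin-1 layout: one byte per character, but not valid UTF-8,
        # so it has to be encoded first.
        return "utf8-encode"
    if highest <= 0xFFFF:
        # 16-bit layout: could go straight to a -widechar build.
        return "widechar-16"
    # Beyond the 16-bit range: fall back to encoding as UTF-8.
    return "utf8-encode"


if __name__ == "__main__":
    for sample in ["running", "caf\u00e9", "\u0431\u0435\u0433\u0430\u0442\u044c", "a\U0001F600"]:
        print(ascii(sample), choose_snowball_variant(sample))
```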

That sounds like a really nice project.

Stefan
 
