text analysis in python

Maurice Ling

Hi,

I'm a postgraduate and my project deals with a fair bit of text
analysis. I'm looking for some libraries and tools that are geared
towards text analysis (and text engineering). So far, the most
comprehensive toolkit in Python for my purpose is NLTK (Natural Language
Toolkit) by Edward Loper and Steven Bird, followed by mxTextTools. Are
there any OSS tools out there that are more comprehensive than NLTK?

In the Java world, there is GATE (General Architecture for Text
Engineering) and it seems very impressive. Is there something like that
for Python?

Thanks in advance.

Cheers
Maurice
 
Cameron Laird

Maurice said:
In the Java world, there is GATE (General Architecture for Text
Engineering) and it seems very impressive. Is there something like that
for Python?

I don't know if you're aware that, in a fairly strong sense,
anything "in the Java world" *is* "for Python". If you
program with Jython (for example--there are other ways to
achieve much the same end), your source code can be in
Python, but you have full access to any library coded in Java.
 
Maurice LING

Cameron said:
I don't know if you're aware that, in a fairly strong sense,
anything "in the Java world" *is* "for Python". If you
program with Jython (for example--there are other ways to
achieve much the same end), your source code can be in
Python, but you have full access to any library coded in Java.


Yes, I do know of Jython but have not used it in any
productive way. So I might need some assistance here... Say I code my
stuff in Jython (importing Java libraries) in a file "text.py"... Will
there be any issues when I try to import text.py into CPython?

My impression is that NLTK is more of a teaching tool than something for
production use. Please correct me if I'm wrong... The main reason I'm
looking at NLTK is that it is pure Python and is about the most comprehensive
text analysis toolkit in Python. Are there any projects that use NLTK?

Thanks and Cheers
Maurice
 
Mark Winrock

Maurice said:
Hi,

I'm a postgraduate and my project deals with a fair bit of text
analysis. I'm looking for some libraries and tools that are geared
towards text analysis (and text engineering). So far, the most
comprehensive toolkit in Python for my purpose is NLTK (Natural Language
Toolkit) by Edward Loper and Steven Bird, followed by mxTextTools. Are
there any OSS tools out there that are more comprehensive than NLTK?

In the Java world, there is GATE (General Architecture for Text
Engineering) and it seems very impressive. Is there something like that
for Python?

Thanks in advance.

Cheers
Maurice

You might try http://web.media.mit.edu/~hugo/montylingua/

"Liu, Hugo (2004). MontyLingua: An end-to-end natural
language processor with common sense. Available
at: web.media.mit.edu/~hugo/montylingua."
 
Maurice LING

Mark Winrock wrote:

You might try http://web.media.mit.edu/~hugo/montylingua/

"Liu, Hugo (2004). MontyLingua: An end-to-end natural
language processor with common sense. Available
at: web.media.mit.edu/~hugo/montylingua."


Thanks Mark. I've downloaded MontyLingua and it looks pretty cool. To
me, it seems geared towards people like myself who need
something to process written text but do not need the hardcore nuts and
bolts of a computational linguist. NLTK is more of the nuts-and-bolts
toolkit. GATE still seems more advanced than MontyLingua but to a
different end.

Is there anyone in this forum who is using or has used MontyLingua and
is happy to comment more on it? I'm happy to get more opinions.

Thanks and cheers
Maurice
 
Terry Reedy

Maurice LING said:
Say I code my stuff in Jython (importing Java libraries) in a file
"text.py"

Just to be clear, Jython is not a separate language that you code *in*, but
a separate implementation that you may code *for* slightly differently.

Maurice LING said:
... Will there be any issues when I try to import text.py into CPython?

If text.py is written in an appropriate version of Python, it itself will
cause no problem. However, when it imports Java code files, as opposed to
CPython bytecode files, CPython will choke.

Terry J. Reedy
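
Terry's point can be checked with a short sketch. The module below stands in for the hypothetical "text.py" from the question: it tries to import a Java class and falls back to a built-in type when that import fails. The function names and the fallback to list are assumptions made for illustration, not anything posted in this thread.

```python
import platform


def load_list_type():
    """Return java.util.ArrayList on Jython, or the built-in list elsewhere."""
    try:
        # Resolves only where Java packages are importable, i.e. on Jython.
        from java.util import ArrayList
    except ImportError:
        # CPython has no 'java' package, so the import fails right here.
        return list
    return ArrayList


def describe_runtime():
    """Name the running implementation, e.g. 'CPython' or 'Jython'."""
    return platform.python_implementation()


list_type = load_list_type()
print("%s uses %s" % (describe_runtime(), list_type.__name__))
```

Without the try/except, importing such a module into CPython stops with an ImportError at the Java import line, which is the "choke" described above.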
 
Steven Bethard

Maurice said:
In the Java world, there is GATE (General Architecture for Text
Engineering) and it seems very impressive. Is there something like that
for Python?

I worked with GATE this last summer and really hated it. Can't decide
whether that was just my growing distaste for Java or actually the GATE
API. Anyway, if you're looking for something like GATE that (in my
experience) runs significantly faster, you should look at Ellogon
(www.ellogon.org). It's written in C and Tcl, with C++, Java, Perl, and
Python bindings. And I believe, if you have any software already
written for GATE, Ellogon can run those modules directly. I've
personally never done so -- all my modules are written in Python (often
simple wrappers for things like MXPOST, MXTerminator, Charniak's parser,
etc.) I find the Python interface simple and easy to use, and they've
added a number of my suggestions to the API in the last release.

STeVe
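
Steven does not show his wrappers, so the following is only a minimal sketch of the general idea, assuming the external tool reads text on standard input and writes its result to standard output. It does not use the Ellogon API. The name run_tool and the stand-in command (a child Python process that upper-cases its input) are hypothetical; a real tagger or parser command line would take its place.

```python
import subprocess
import sys


def run_tool(command, text):
    """Send text to an external program on stdin and return its stdout."""
    result = subprocess.run(
        command,
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the tool exits non-zero
    )
    return result.stdout


# Stand-in for a real tool: a child Python process that upper-cases stdin.
STAND_IN = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]

print(run_tool(STAND_IN, "text analysis in python"))
```

Keeping the command as a plain argument list means the same wrapper can front different tools, as long as they follow the stdin/stdout assumption.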
 
Maurice LING

Terry said:
Just to be clear, Jython is not a separate language that you code *in*, but
a separate implementation that you may code *for* slightly differently.

Yes, I do get this point. Jython is just an implementation of the
Python virtual machine using Java. I do note that there are some
differences, such as that Jython can only handle pure Python modules.
However, I'm not enough of a language expert to tell apart the differences
between these 2 implementations of Python, Jython and CPython. If
someone cares to enlighten me, it will be my pleasure to consult. TIA.

Terry said:
If text.py is written in an appropriate version of Python, it itself will
cause no problem. However, when it imports Java code files, as opposed to
CPython bytecode files, CPython will choke.

In my example, the file "text.py" is coded in Jython, importing Java
libraries. I do get that I cannot import Java jar files directly into
CPython. What I do not get is what is so special about Jython that
it can "fool" CPython into using Java libraries... or is it that there will
always be a need for both a Java virtual machine and a Python virtual machine
when I use Java libraries in Jython... and import Jython-coded files
into CPython....

Cheers
Maurice
 
Dennis Lee Bieber

Maurice said:
Yes, I do get this point. Jython is just an implementation of the
Python virtual machine using Java. I do note that there are some

Pardon? I thought Jython directly used the Java VM... It is not a
Python VM at all. It's the same language at the source level, but a
totally different back-end.

Hence, it requires the JVM to be able to run anything that
imports a Java library. Pure Python (source code) is compatible because
the two implementations will "compile" into either JVM byte code
(Jython) or classic Python byte code (CPython).

The CPython /run time/ has no facilities for interpreting JVM
byte code and can not, therefore, process Java library imports.
Similarly, the JVM has no facilities for interfacing with CPython
compiled libraries.
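
A small sketch of the "same source, different back-end" point, using only standard-library calls. The source string and variable names are made up for illustration. The snippet compiles pure Python source with whichever implementation runs it, executes the result, and lists the bytecode only when that implementation is CPython.

```python
import dis
import platform

# Pure Python source: nothing here depends on Java or on C extensions.
source = "tokens = 'text analysis in python'.split()"

# compile() produces a code object for the implementation that is running.
code = compile(source, "<example>", "exec")

namespace = {}
exec(code, namespace)
print(platform.python_implementation(), namespace["tokens"])

# On CPython the code object holds CPython bytecode, which dis can list.
if platform.python_implementation() == "CPython":
    dis.dis(code)
```

The bytecode listing is specific to CPython. It is the compiled form that differs between the two implementations, not the source.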

 
Steve Holden

Maurice said:
Yes, I do get this point. Jython is just an implementation of the
Python virtual machine using Java. I do note that there are some
differences, such as that Jython can only handle pure Python modules.
However, I'm not enough of a language expert to tell apart the differences
between these 2 implementations of Python, Jython and CPython. If
someone cares to enlighten me, it will be my pleasure to consult. TIA.

That's not strictly correct. The Python virtual machine isn't
implemented at all in Jython; instead the JVM is used as the compilation
target.

Maurice said:
In my example, the file "text.py" is coded in Jython, importing Java
libraries. I do get that I cannot import Java jar files directly into
CPython. What I do not get is what is so special about Jython that
it can "fool" CPython into using Java libraries... or is it that there will
always be a need for both a Java virtual machine and a Python virtual machine
when I use Java libraries in Jython... and import Jython-coded files
into CPython....

Jython is pretty much a Python interpreter that compiles Python into JVM
bytecodes. Consequently the amount of "trickery" involved is rather
less, though clearly there is some (automated conversion between Java
and Python data types where appropriate, and automated signature-based
selection of the appropriate Java method being the two most obvious).

regards
Steve
 
