text analysis in python

Discussion in 'Python' started by Maurice Ling, Apr 3, 2005.

  1. Maurice Ling

    Maurice Ling Guest

    Hi,

    I'm a postgraduate and my project deals with a fair bit of text
    analysis. I'm looking for some libraries and tools that are geared
    towards text analysis (and text engineering). So far, the most
    comprehensive toolkit in Python for my purpose is NLTK (Natural Language
    Toolkit) by Edward Loper and Steven Bird, followed by mxTextTools. Are
    there any OSS tools out there that are more comprehensive than NLTK?

    In the Java world, there is GATE (General Architecture for Text
    Engineering) and it seems very impressive. Is there something like that
    for Python?

    Thanks in advance.

    Cheers
    Maurice
    Maurice Ling, Apr 3, 2005
    #1

  2. In article <>,
    Maurice Ling <> wrote:
    .
    .
    .
    >In the Java world, there is GATE (General Architecture for Text
    >Engineering) and it seems very impressive. Is there something like that
    >for Python?

    .
    .
    .
    I don't know if you're aware that, in a fairly strong sense,
    anything "in the Java world" *is* "for Python". If you
    program with Jython (for example--there are other ways to
    achieve much the same end), your source code can be in
    Python, but you have full access to any library coded in Java.
    Cameron Laird, Apr 3, 2005
    #2

  3. Maurice Ling

    Guest

    The book "Text Processing in Python" by David Mertz, available online
    at http://gnosis.cx/TPiP/ , may be helpful.
    , Apr 3, 2005
    #3
  4. Maurice Ling

    Maurice LING Guest

    .
    > I don't know if you're aware that, in a fairly strong sense,
    > anything "in the Java world" *is* "for Python". If you
    > program with Jython (for example--there are other ways to
    > achieve much the same end), your source code can be in
    > Python, but you have full access to any library coded in Java.


    Yes, I do know of Jython but have not used it in any
    productive way. So I might need some assistance here... Say I code my
    stuff in Jython (importing Java libraries) in a file "text.py"... Will
    there be any issues when I try to import text.py into CPython?

    My impression is that NLTK is more of a teaching tool than one for
    production use. Please correct me if I'm wrong... The main reason I'm
    looking at NLTK is that it is pure Python and is about the most comprehensive
    text analysis toolkit in Python. Are there any projects that use NLTK?

    Thanks and Cheers
    Maurice
    Maurice LING, Apr 3, 2005
    #4
  5. Maurice Ling

    Mark Winrock Guest

    Maurice Ling wrote:
    > Hi,
    >
    > I'm a postgraduate and my project deals with a fair bit of text
    > analysis. I'm looking for some libraries and tools that are geared
    > towards text analysis (and text engineering). So far, the most
    > comprehensive toolkit in Python for my purpose is NLTK (Natural Language
    > Toolkit) by Edward Loper and Steven Bird, followed by mxTextTools. Are
    > there any OSS tools out there that are more comprehensive than NLTK?
    >
    > In the Java world, there is GATE (General Architecture for Text
    > Engineering) and it seems very impressive. Is there something like that
    > for Python?
    >
    > Thanks in advance.
    >
    > Cheers
    > Maurice
    >
    >


    You might try http://web.media.mit.edu/~hugo/montylingua/

    "Liu, Hugo (2004). MontyLingua: An end-to-end natural
    language processor with common sense. Available
    at: web.media.mit.edu/~hugo/montylingua."
    Mark Winrock, Apr 3, 2005
    #5
  6. Maurice Ling

    Maurice LING Guest

    Mark Winrock wrote:


    >
    > You might try http://web.media.mit.edu/~hugo/montylingua/
    >
    > "Liu, Hugo (2004). MontyLingua: An end-to-end natural
    > language processor with common sense. Available
    > at: web.media.mit.edu/~hugo/montylingua."



    Thanks Mark. I've downloaded MontyLingua and it looks pretty cool. To
    me, it seems pretty much geared to people like myself who need
    something to process written text but do not need the hardcore nuts and
    bolts of a computational linguist. NLTK is more of the nuts-and-bolts
    toolkit. GATE still seems more advanced than MontyLingua but to a
    different end.

    Is there anyone in this forum who is using or has used MontyLingua and
    is happy to comment more on it? I'm happy to get more opinions.

    Thanks and cheers
    Maurice
    Maurice LING, Apr 3, 2005
    #6
  7. Maurice Ling

    Terry Reedy Guest

    "Maurice LING" <> wrote in message
    news:...
    >Say I code my stuff in Jython (importing Java libraries) in a file
    >"text.py"


    Just to be clear, Jython is not a separate language that you code *in*, but
    a separate implementation that you may code *for* slightly differently.

    >... Will there be any issues when I try to import text.py into CPython?


    If text.py is written in an appropriate version of Python, it itself will
    cause no problem. However, when it imports Java code files, as opposed to
    CPython bytecode files, CPython will choke.

    Terry J. Reedy
    Terry Reedy, Apr 3, 2005
    #7
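
    A minimal sketch of the point made in post #7, assuming a hypothetical
    module text.py: the Java import is attempted inside a try block, so the
    same file can be imported under both Jython and CPython. The helper names
    (running_on_jython, load_java_arraylist, tokenize) are made up for
    illustration; java.util.ArrayList is a standard Java class, and a
    sys.platform value starting with "java" is how Jython identifies itself.

```python
# text.py (hypothetical): importable under both CPython and Jython.
import sys


def running_on_jython():
    # Jython reports a sys.platform value beginning with "java".
    return sys.platform.startswith("java")


def load_java_arraylist():
    """Return java.util.ArrayList under Jython, or None elsewhere."""
    try:
        # Resolves only when the interpreter runs on a JVM (Jython).
        from java.util import ArrayList
    except ImportError:
        # CPython normally has no "java" package, so the import fails here.
        return None
    return ArrayList


def tokenize(text):
    # Pure-Python code path that works on either implementation.
    return text.split()


if __name__ == "__main__":
    print("Jython: %s" % running_on_jython())
    print("ArrayList available: %s" % (load_java_arraylist() is not None))
    print(tokenize("text analysis in python"))
```

    Without the try block, the import statement itself would raise
    ImportError under CPython, which is the "choke" described above.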
  8. Maurice Ling wrote:
    > In the Java world, there is GATE (General Architecture for Text
    > Engineering) and it seems very impressive. Is there something like that
    > for Python?


    I worked with GATE this last summer and really hated it. Can't decide
    whether that was just my growing distaste for Java or actually the GATE
    API. Anyway, if you're looking for something like GATE that (in my
    experience) runs significantly faster, you should look at Ellogon
    (www.ellogon.org). It's written in C and TCL, with C++, Java, Perl, and
    Python bindings. And I believe, if you have any software already
    written for GATE, Ellogon can run those modules directly. I've
    personally never done so -- all my modules are written in Python (often
    simple wrappers for things like MXPOST, MXTerminator, Charniak's parser,
    etc.) I find the Python interface simple and easy to use, and they've
    added a number of my suggestions to the API in the last release.

    STeVe
    Steven Bethard, Apr 3, 2005
    #8
  9. Maurice Ling

    Maurice LING Guest

    Terry Reedy wrote:

    > "Maurice LING" <> wrote in message
    > news:...
    >
    >>Say I code my stuff in Jython (importing Java libraries) in a file
    >>"text.py"

    >
    >
    > Just to be clear, Jython is not a separate language that you code *in*, but
    > a separate implementation that you may code *for* slightly differently.
    >

    Yes, I do get this point rightly. Jython is just an implementation of
    Python virtual machine using Java. I do note that there are some
    differences; for example, Jython can only handle pure Python modules.
    However, I'm not enough of a language expert to tell apart the differences
    between these two implementations of Python, Jython and CPython. If
    someone cares to enlighten me, I would be glad to learn. TIA.

    >
    >>... Will there be any issues when I try to import text.py into CPython?

    >
    >
    > If text.py is written in an appropriate version of Python, it itself will
    > cause no problem. However, when it imports Java code files, as opposed to
    > CPython bytecode files, CPython will choke.
    >

    In my example, the file "text.py" is coded in Jython, importing Java
    libraries. I do get that I cannot import Java jar files directly into
    CPython. What I do not get is what is so special about Jython that
    it can "fool" CPython into using Java libraries... or is it that there will
    always be a need for both a Java virtual machine and a Python virtual machine
    when I use Java libraries in Jython... and import Jython-coded files
    into CPython....

    Cheers
    Maurice
    Maurice LING, Apr 4, 2005
    #9
  10. On Mon, 04 Apr 2005 09:36:32 +1000, Maurice LING <>
    declaimed the following in comp.lang.python:

    > >

    > Yes, I do get this point rightly. Jython is just an implementation of
    > Python virtual machine using Java. I do note that there are some


    Pardon? I thought Jython directly used the Java VM... It is not a
    Python VM at all. It's the same language at the source level, but a
    totally different back-end.

    Hence, it requires the JVM to be able to run anything that
    imports a Java library. Pure Python (source code) is compatible because
    the two implementations will "compile" into either JVM byte code
    (Jython) or classic Python byte code (CPython).

    The CPython /run time/ has no facilities for interpreting JVM
    byte code and can not, therefore, process Java library imports.
    Similarly, the JVM has no facilities for interfacing with CPython
    compiled libraries.

    --
    > ============================================================== <
    > | Wulfraed Dennis Lee Bieber KD6MOG <
    > | Bestiaria Support Staff <
    > ============================================================== <
    > Home Page: <http://www.dm.net/~wulfraed/> <
    > Overflow Page: <http://wlfraed.home.netcom.com/> <
    Dennis Lee Bieber, Apr 4, 2005
    #10
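
    The CPython half of the description in post #10 can be observed with the
    standard library. The sketch below assumes a current CPython 3
    interpreter (not the 2005-era releases discussed in this thread) and uses
    the built-in compile() and the dis module to show that source text is
    first compiled to Python byte code, which the CPython run time then
    executes. The variable names are chosen only for illustration.

```python
import dis

# A one-line pure-Python program; per the post above, Jython would compile
# the same source to JVM byte code instead.
source = "total = 1 + 2"

# compile() turns source text into a CPython code object.
code_object = compile(source, "<example>", "exec")

# The raw Python byte code is stored on the code object as a bytes string.
bytecode = code_object.co_code

# dis lists the instructions the CPython virtual machine will run.
opnames = [instruction.opname for instruction in dis.get_instructions(code_object)]

# Executing the code object runs that byte code on the CPython VM.
namespace = {}
exec(code_object, namespace)

if __name__ == "__main__":
    print("%s of length %d" % (type(bytecode).__name__, len(bytecode)))
    print(opnames)
    print(namespace["total"])
```

    The exact instruction names differ between CPython versions, so the
    listing is best read as an illustration rather than a stable format.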
  11. Maurice Ling

    Steve Holden Guest

    Maurice LING wrote:
    > Terry Reedy wrote:
    >
    >> "Maurice LING" <> wrote in message
    >> news:...
    >>
    >>> Say I code my stuff in Jython (importing Java libraries) in a file
    >>> "text.py"

    >>
    >>
    >>
    >> Just to be clear, Jython is not a separate language that you code
    >> *in*, but a separate implementation that you may code *for* slightly
    >> differently.
    >>

    > Yes, I do get this point rightly. Jython is just an implementation of
    > Python virtual machine using Java. I do note that there are some
    > differences; for example, Jython can only handle pure Python modules.
    > However, I'm not enough of a language expert to tell apart the differences
    > between these two implementations of Python, Jython and CPython. If
    > someone cares to enlighten me, I would be glad to learn. TIA.
    >

    That's not strictly correct. The Python virtual machine isn't
    implemented at all in Jython; instead the JVM is used as the compilation
    target.

    >>
    >>> ... Will there be any issues when I try to import text.py into CPython?

    >>
    >>
    >>
    >> If text.py is written in an appropriate version of Python, it itself
    >> will cause no problem. However, when it imports Java code files, as
    >> opposed to CPython bytecode files, CPython will choke.
    >>

    > In my example, the file "text.py" is coded in Jython, importing Java
    > libraries. I do get that I cannot import Java jar files directly into
    > CPython. What I do not get is what is so special about Jython that
    > it can "fool" CPython into using Java libraries... or is it that there will
    > always be a need for both a Java virtual machine and a Python virtual machine
    > when I use Java libraries in Jython... and import Jython-coded files
    > into CPython....
    >

    Jython is pretty much a Python interpreter that compiles Python into JVM
    bytecodes. Consequently the amount of "trickery" involved is rather
    less, though clearly there is some (automated conversion between Java
    and Python data types where appropriate, and automated signature-based
    selection of the appropriate Java method being the two most obvious).

    regards
    Steve
    --
    Steve Holden +1 703 861 4237 +1 800 494 3119
    Holden Web LLC http://www.holdenweb.com/
    Python Web Programming http://pydish.holdenweb.com/
    Steve Holden, Apr 4, 2005
    #11
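
    A rough sketch of the two pieces of "trickery" named in post #11, under
    the assumption that the code runs on Jython. The function name
    java_interop_demo is hypothetical; java.lang.Math.max and
    java.lang.String.toUpperCase are standard Java methods. Under CPython the
    function simply returns None, since there is no JVM to call into.

```python
import sys


def java_interop_demo():
    """Call two Java methods from Python source; only meaningful on Jython."""
    if not sys.platform.startswith("java"):
        # Not running on a JVM, so Java classes cannot be imported.
        return None

    from java.lang import Math, String

    # Math.max is overloaded in Java (int, long, float, double). Jython
    # chooses an overload from the types of the Python arguments passed in.
    biggest = Math.max(3, 7)

    # The Python string is converted to a Java String on the way in, and
    # the result is converted back to a Python string on the way out.
    upper = String("gate").toUpperCase()

    return biggest, upper


if __name__ == "__main__":
    print(java_interop_demo())
```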