Pattern Classification Frameworks?

Evan Klitzke

Hi all,

What frameworks are there available for doing pattern classification?
I'm generally interested in the problem of mapping some sort of input
to one or more categories. For example, I want to be able to solve
problems like taking text and applying one or more tags to it like
"romance", "horror", "poetry", etc. This isn't really my research
specialty, but my understanding is that Bayesian classifiers are
generally used for problems like this. I've had CRM114 recommended to
me, but as far as I can tell there aren't any python bindings for
this. From a few searches online, I've come across the Open Bayes
project, which is a Python library for working with Bayesian networks.
It also appears that DSPAM has some Python bindings, but from the
cursory look I gave it, it's hard to tell how general-purpose the DSPAM
engine is. Has anyone worked with any of these frameworks? Are there
any other frameworks I should be aware of?

Also, as a sidenote, are there any texts that anyone can recommend to
me for learning more about this area? I'm a mathematician by training,
so I'm not afraid to jump into reasonably advanced statistics
papers/books if necessary.
 
Diez B. Roggisch

Evan said:
Hi all,

What frameworks are there available for doing pattern classification?
I'm generally interested in the problem of mapping some sort of input
to one or more categories. For example, I want to be able to solve
problems like taking text and applying one or more tags to it like
"romance", "horror", "poetry", etc. This isn't really my research
specialty, but my understanding is that Bayesian classifiers are
generally used for problems like this. I've had CRM114 recommended to
me, but as far as I can tell there aren't any python bindings for
this.

I've used the CRM114 classifier from Python. It wasn't too hard to come
up with a simple wrapper that only needs the crm114 binary to be installed
somewhere. The rest was dealt with in Python.

So if CRM114 fits your needs functionality-wise, you should go for it.
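
For anyone wanting to try the wrapper approach, here is a minimal sketch of driving an external classifier binary from Python with the standard subprocess module. The thread doesn't show CRM114's actual command line or output format, so the command is passed in as a parameter, and the function name classify_with_external and the STAND_IN command are hypothetical. The stand-in is just a small Python one-liner so the example runs anywhere; you would swap in the real invocation.

```python
import subprocess
import sys


def classify_with_external(command, text, timeout=30):
    """Send `text` to an external program on stdin and return its stdout.

    `command` is the argument list for the classifier binary. The real
    CRM114 invocation is NOT shown here; supply it yourself.
    """
    result = subprocess.run(
        command,
        input=text,            # fed to the child process on stdin
        capture_output=True,   # collect stdout and stderr
        text=True,             # work with str instead of bytes
        timeout=timeout,
        check=True,            # raise CalledProcessError on non-zero exit
    )
    return result.stdout.strip()


# Hypothetical stand-in for the real binary: prints a label based on a keyword.
STAND_IN = [
    sys.executable,
    "-c",
    "import sys; t = sys.stdin.read().lower(); "
    "print('horror' if 'ghost' in t else 'romance')",
]

if __name__ == "__main__":
    print(classify_with_external(STAND_IN, "A ghost haunts the old house."))
```

Keeping the command as a plain argument list means all the parsing of the classifier's output can stay on the Python side, which seems to be the spirit of the approach described above.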

Diez
 
Steven Bethard

Evan said:
What frameworks are there available for doing pattern classification?
I'm generally interested in the problem of mapping some sort of input
to one or more categories. For example, I want to be able to solve
problems like taking text and applying one or more tags to it like
"romance", "horror", "poetry", etc. This isn't really my research
specialty, but my understanding is that Bayesian classifiers are
generally used for problems like this.

In fact, a wide variety of classifiers are used in text classification,
including Bayesian approaches, support vector machines, conditional
random fields, etc.

Evan said:
Are there any other frameworks I should be aware of?

I have used (but not recently) Orange:

http://www.ailab.si/orange

I haven't used, but have been meaning to try, PyML:

http://pyml.sourceforge.net/

A more recent addition (whose documentation needs work) is:

http://montepython.sourceforge.net/

And here's a Summer of Code project to build an ML library:

http://projects.scipy.org/scipy/scipy/wiki/MachineLearning

These are all general-purpose machine learning frameworks. So they can
be applied to pretty much any classification problem (including the text
classification problems you're looking at). You just need to pick out a
set of relevant features to describe your data, and feed those features
along with your chosen labels to a machine learning algorithm.
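
To make the "features plus labels" idea concrete, here is a rough sketch of a multinomial naive Bayes text classifier in plain Python. It is not taken from any of the libraries above; the NaiveBayes class, the features function, and the toy training texts are all made up for illustration. The features are simply lowercase whitespace-separated words, and add-one (Laplace) smoothing is assumed.

```python
import math
from collections import Counter, defaultdict


def features(text):
    """Bag-of-words features: lowercase tokens split on whitespace."""
    return text.lower().split()


class NaiveBayes:
    """Illustrative multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()                       # all words seen in training

    def train(self, label, text):
        words = features(text)
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def scores(self, text):
        """Return the log-probability score of each label for `text`."""
        total_docs = sum(self.doc_counts.values())
        words = features(text)
        result = {}
        for label in self.doc_counts:
            total_words = sum(self.word_counts[label].values())
            # Start from the log prior, then add one log likelihood per word.
            score = math.log(self.doc_counts[label] / total_docs)
            for word in words:
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            result[label] = score
        return result

    def classify(self, text):
        """Return the single best-scoring label."""
        label_scores = self.scores(text)
        return max(label_scores, key=label_scores.get)


clf = NaiveBayes()
clf.train("romance", "love heart kiss wedding")
clf.train("horror", "ghost blood scream haunted")
clf.train("poetry", "verse rhyme stanza sonnet")

if __name__ == "__main__":
    print(clf.classify("a haunted house and a ghost"))
```

This picks exactly one label per text. For the "one or more tags" case, one common option is to train a separate yes/no classifier per tag and keep every tag whose classifier says yes.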

STeVe
 
Evan Klitzke

Thanks Steven (and Diez), the projects you pointed me to look like
great places to start.
 
Miki

Hello Evan,

Evan said:
What frameworks are there available for doing pattern classification?
...

Two Bayesian classifiers are SpamBayes (http://spambayes.sf.net) and
Reverend Thomas (http://www.divmod.org/projects/reverend).
IMO the latter will be easier to play with.

Evan said:
Also, as a sidenote, are there any texts that anyone can recommend to
me for learning more about this area?

A good book about NLP is http://nlp.stanford.edu/fsnlp/, which has a
chapter about text classification. http://www.cs.cmu.edu/~tom/mlbook.html
also has some good coverage of the subject.

HTH.
 
