[ANN] Classifier 1.2 with Bayesian and NEW LSI classification

Lucas Carlson

You may remember that I announced the Bayesian classifier a couple of
weeks ago. With the help of David Fayram, we added LSI classification
so that you can now do both:

b = Classifier::Bayes.new
lsi = Classifier::LSI.new

LSI stands for Latent Semantic Indexing, which can search, classify and
cluster data based on underlying semantic relations. It uses more
resources than the Bayesian classifier and even requires an external
library, but can still be Marshalled for Madeleine or DRb's sake. For
more information on the algorithms used, please consult
http://en.wikipedia.org/wiki/Latent_Semantic_Indexing

I also added an #untrain method to reverse the effects of training the
Bayesian classifier. LSI can also untrain itself. To upgrade, try:

gem update classifier

Or see this site:

http://rubyforge.org/projects/classifier/

Again, all feedback is appreciated.

-Lucas Carlson
http://tech.rufy.com/
 
George Moschovitis

Thanks for this library! I'll test this at the first opportunity and
email you my remarks.

George.
 
Charles Comstock

This is kind of off topic, but does anyone know if there is an
implementation of principal component analysis (PCA) easily usable from
Ruby? Bayesian is pretty powerful but you can do some pretty ridiculous
things with PCA. Essentially it's a method of compression on random
data, but it does this by finding correspondences in each matrix row.
Anyhow it's as useful as Bayesian methods for finding correspondences.
It might even be more useful given that you can then easily generate
data points that would occur near your input points. Though I suppose
that usefulness depends on what you're using it for. I thought about
implementing it back when I helped out with this project [1]. But given
time constraints it made more sense to just manually use MATLAB to do
it. It would be awesome to play with in Ruby though.
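
For anyone who wants to experiment, here is a rough sketch of PCA using
only Ruby's bundled matrix library. It is not part of Classifier; the
names pca, compress_and_rebuild and SAMPLE are made up for this example,
and the data is invented. It centers the data, takes the eigenvectors of
the covariance matrix, and uses the strongest one to compress each point
to a single number and rebuild an approximation.

```ruby
require 'matrix'

# Hypothetical helper, not part of the Classifier gem.
# rows: array of observations, each an array of numbers.
def pca(rows)
  data = Matrix.rows(rows.map { |row| row.map(&:to_f) })
  n = data.row_count
  means = data.column_vectors.map { |col| col.to_a.sum / n }
  centered = Matrix.build(n, data.column_count) { |i, j| data[i, j] - means[j] }

  # Sample covariance matrix: symmetric, so its eigenvectors are orthogonal.
  covariance = (centered.transpose * centered) / (n - 1).to_f
  eig = covariance.eigensystem

  # Sort components by the variance they explain, largest first.
  pairs = eig.eigenvalues.zip(eig.eigenvectors).sort_by { |variance, _| -variance }
  { means: means,
    centered: centered,
    variances: pairs.map(&:first),
    components: pairs.map { |_, vector| vector.normalize } }
end

# Compress each observation to one number, then rebuild an approximation.
def compress_and_rebuild(rows)
  model = pca(rows)
  axis = model[:components].first
  scores = (model[:centered] * axis).to_a
  rebuilt = scores.map do |score|
    (axis * score).to_a.each_with_index.map { |x, j| x + model[:means][j] }
  end
  [scores, rebuilt]
end

# Invented sample: five 2-D points lying close to the line y = x.
SAMPLE = [[2.0, 1.9], [0.0, 0.1], [-2.0, -2.0], [1.0, 1.1], [-1.0, -1.1]]
```

Calling compress_and_rebuild(SAMPLE) returns the one-number scores and
the rebuilt points. Because the sample lies almost on a line, the first
component carries nearly all the variance and the rebuilt points land
close to the originals. The stdlib eigensolver is fine for toy sizes;
real data would want a proper linear algebra library.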

Charles Comstock

[1] http://www.cs.wustl.edu/~jdt1/vision/final/
[2] http://www.imm.dtu.dk/~aam/
 
Lucas Carlson

If you would like to start this within the classifier framework, shoot
me an email.

-Lucas Carlson
 
Dave Fayram

Actually Charles, PCA and LSI are mathematically related.
Classifier::LSI is already doing all the hard math (in particular SVD)
on the dataset, but what I'm not doing is calculating a covariance
matrix for the data. I'm not sure what that'd buy in terms of data
mining. If you'd like to talk with me about getting this into the next
release of Classifier, please email me and we'll see if we can't get it
working (assuming we can figure out what it'd be useful for in terms of
data mining).

In Classifier::LSI, I just do SVD on a term-document matrix to reduce
its rank, then break apart the columns and do inner-products on the
resultant vectors. I've worked with it quite a bit now and I've
seen some really amazing results. As you can see in the unit tests,
it's pretty smart and isn't easily fooled by lots of text matches.
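
To make the steps above concrete, here is a toy sketch of the same idea
using only Ruby's bundled matrix library. This is not the Classifier::LSI
source: reduce_rank, cosine, document_similarity and TERM_DOC are names
invented for the example, and instead of calling an SVD routine it
relies on the fact that the right singular vectors of A are the
eigenvectors of A^T * A.

```ruby
require 'matrix'

# Invented toy corpus. Rows are terms, columns are documents d0..d4.
TERM_DOC = Matrix[
  # d0 d1 d2 d3 d4
  [1, 0, 0, 0, 0],  # ruby
  [1, 1, 0, 0, 0],  # gem
  [0, 1, 1, 0, 0],  # code
  [0, 0, 1, 0, 0],  # script
  [0, 0, 0, 1, 0],  # cat
  [0, 0, 0, 1, 1],  # dog
  [0, 0, 0, 0, 1]   # pet
]

# Rank-k approximation A_k = A * V_k * V_k^T, where the columns of V_k are
# the k strongest eigenvectors of A^T * A (the right singular vectors of A).
def reduce_rank(term_doc, k)
  a = term_doc.map(&:to_f)
  eig = (a.transpose * a).eigensystem
  top = eig.eigenvalues.zip(eig.eigenvectors)
           .sort_by { |value, _| -value }
           .first(k)
           .map { |_, vector| vector.normalize }
  v_k = Matrix.columns(top.map(&:to_a))
  a * v_k * v_k.transpose
end

# Normalised inner product of two document vectors.
def cosine(x, y)
  x.inner_product(y) / (x.norm * y.norm)
end

# Break the reduced matrix into columns and compare documents i and j.
def document_similarity(term_doc, k, i, j)
  docs = reduce_rank(term_doc, k).column_vectors
  cosine(docs[i], docs[j])
end
```

Here d0 ("ruby gem") and d2 ("code script") share no words, so their raw
cosine is 0. After reducing to rank 2, document_similarity(TERM_DOC, 2, 0, 2)
comes out close to 1.0 because both are linked through d1, while
document_similarity(TERM_DOC, 2, 0, 3) stays close to 0.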
 
