[ANN] Classifier 1.2 with Bayesian and NEW LSI classification

Discussion in 'Ruby' started by Lucas Carlson, Apr 25, 2005.

  1. You may remember that I announced the Bayesian classifier a couple of
    weeks ago. With the help of David Fayram, we added LSI classification
    so that you can now do both:

    require 'classifier'
    b = Classifier::Bayes.new
    lsi = Classifier::LSI.new

    LSI is a Latent Semantic Indexer, which can search, classify and cluster
    data based on underlying semantic relations. It uses more resources
    than the Bayesian classifier and even requires an external library, but
    it can still be marshaled for Madeleine's or DRb's sake. For more
    information on the algorithms used, please consult
    http://en.wikipedia.org/wiki/Latent_Semantic_Indexing

    I also added an #untrain method to reverse the effects of training the
    Bayesian classifier. LSI can also untrain itself. To upgrade, try:

    gem update classifier

    Or see this site:

    http://rubyforge.org/projects/classifier/
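For readers wondering what "untraining" means for a Bayesian classifier: in a word-count naive Bayes, training increments per-category word counts, so untraining can simply decrement the same counts. The following is a minimal sketch under that assumption. The `TinyBayes` class and all of its methods are hypothetical and are not the Classifier gem's code or API; the sketch also ignores category priors for brevity.

```ruby
# TinyBayes is a hypothetical sketch, not the Classifier gem's implementation.
class TinyBayes
  def initialize(*categories)
    # Per-category word counts, e.g. { "spam" => { "cheap" => 2 } }.
    @counts = categories.to_h { |c| [c, Hash.new(0)] }
    # Total number of words trained into each category.
    @totals = Hash.new(0)
  end

  def train(category, text)
    adjust(category, text, 1)
  end

  # Untraining applies the same update with the opposite sign, so training
  # and then untraining the same text restores the previous counts.
  def untrain(category, text)
    adjust(category, text, -1)
  end

  # Picks the category with the highest summed log-likelihood of the words
  # (class priors are ignored in this sketch).
  def classify(text)
    vocab = @counts.values.flat_map { |h| h.select { |_, n| n > 0 }.keys }.uniq.size + 1
    @counts.keys.max_by do |cat|
      # Add-one smoothing keeps unseen words from zeroing out a category.
      words(text).sum { |w| Math.log((@counts[cat][w] + 1.0) / (@totals[cat] + vocab)) }
    end
  end

  # Words with a positive count in the given category.
  def counts_for(category)
    @counts[category].select { |_, n| n > 0 }
  end

  private

  def words(text)
    text.downcase.scan(/[a-z']+/)
  end

  def adjust(category, text, delta)
    words(text).each do |w|
      @counts[category][w] += delta
      @totals[category] += delta
    end
  end
end

b = TinyBayes.new("interesting", "uninteresting")
b.train("interesting", "ruby lsi classifier released")
b.train("uninteresting", "buy cheap watches now")
puts b.classify("new ruby classifier") # prints "interesting"
```

Because `untrain` is the exact inverse of `train` here, a mistakenly trained document can be removed without retraining from scratch.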

    Again, all feedback is appreciated.

    -Lucas Carlson
    http://tech.rufy.com/
    Lucas Carlson, Apr 25, 2005
    #1

  2. Thanks for this library! I'll test it at the first opportunity and
    email you my remarks.

    George.
    George Moschovitis, Apr 25, 2005
    #2

  3. This is kind of off topic, but does anyone know of an implementation
    of principal component analysis (PCA) that is easily usable from Ruby?
    Bayesian methods are pretty powerful, but you can do some pretty
    ridiculous things with PCA. Essentially it's a compression method for
    arbitrary data, and it works by finding correspondences (correlations)
    between the rows of the data matrix. Anyhow, it's as useful as Bayesian
    methods for finding correspondences. It might even be more useful,
    given that you can then easily generate data points that would occur
    near your input points, though I suppose that depends on what you're
    using it for. I thought about implementing it back when I helped out
    with this project [1], but given time constraints it made more sense to
    just use MATLAB manually. It would be awesome to play with in Ruby,
    though.
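As a rough illustration of what PCA does, here is a minimal pure-Ruby sketch (no external library; all helper names are hypothetical) that finds the first principal component by power iteration on the sample covariance matrix, compresses each 2-D point to a single coordinate, and maps it back. Power iteration only yields the top component and assumes the starting vector is not orthogonal to it; a real implementation would use a full eigendecomposition or SVD.

```ruby
# Hypothetical helpers (not from any library): first principal component of
# a small data set via power iteration on the sample covariance matrix.

def mean_center(rows)
  n = rows.size.to_f
  means = rows.transpose.map { |col| col.sum / n }
  centered = rows.map { |r| r.zip(means).map { |x, m| x - m } }
  [centered, means]
end

# Sample covariance of already-centered data (one observation per row).
def covariance(centered)
  cols = centered.transpose
  cols.map { |a| cols.map { |b| a.zip(b).sum { |x, y| x * y } / (centered.size - 1) } }
end

def dot(a, b)
  a.zip(b).sum { |x, y| x * y }
end

def normalize(v)
  norm = Math.sqrt(dot(v, v))
  v.map { |x| x / norm }
end

# Power iteration converges to the eigenvector with the largest eigenvalue
# (the direction of greatest variance), assuming the start vector is not
# orthogonal to it. Returns [direction, variance along it].
def first_component(cov, iterations: 200)
  v = normalize(Array.new(cov.size, 1.0))
  iterations.times { v = normalize(cov.map { |row| dot(row, v) }) }
  [v, dot(v, cov.map { |row| dot(row, v) })]
end

points = [[2.0, 1.9], [1.0, 1.1], [3.0, 3.2], [4.0, 3.9]]
centered, means = mean_center(points)
pc, variance = first_component(covariance(centered))

# "Compress" each 2-D point to one coordinate along pc...
scores = centered.map { |r| dot(r, pc) }
# ...and map the coordinates back to approximate the original points.
approx = scores.map { |s| means.zip(pc).map { |m, c| m + s * c } }
p pc, variance
```

Picking new values for `scores` within the observed range and mapping them back the same way produces points that lie near the input data, which is the generative use described above.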

    Charles Comstock

    [1] http://www.cs.wustl.edu/~jdt1/vision/final/
    [2] http://www.imm.dtu.dk/~aam/
    Charles Comstock, Apr 25, 2005
    #3
  4. If you would like to start this within the classifier framework, shoot
    me an email.

    -Lucas Carlson
    Lucas Carlson, Apr 25, 2005
    #4
  5. Dave Fayram

    Actually, Charles, PCA and LSI are mathematically related.
    Classifier::LSI is already doing all the hard math (in particular, the
    SVD) on the dataset; what I'm not doing is calculating a covariance
    matrix for the data. I'm not sure what that would buy us in terms of
    data mining. If you'd like to talk with me about getting this into the
    next release of Classifier, please email me and we'll see if we can get
    it working (assuming we can figure out what it would be useful for in
    terms of data mining).
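The relationship described above can be written out as follows (standard linear algebra; the notation is ours, not from the thread). If X is the n-by-d data matrix after subtracting each column's mean, and X = UΣVᵀ is its thin SVD, the covariance matrix falls straight out of the SVD:

```latex
X = U \Sigma V^{\top}
\quad\Longrightarrow\quad
C = \frac{1}{n-1} X^{\top} X
  = \frac{1}{n-1} V \Sigma U^{\top} U \Sigma V^{\top}
  = V \, \frac{\Sigma^{2}}{n-1} \, V^{\top}

A \approx A_k = U_k \Sigma_k V_k^{\top}
```

So the principal directions are the right singular vectors of the centered matrix, with variances σᵢ²/(n-1). LSI applies the truncated SVD in the second line to the uncentered term-document matrix A instead, which is why the step missing for PCA is essentially the mean-centering that turns XᵀX into a covariance matrix.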

    In Classifier::LSI, I just do an SVD on a term-document matrix to reduce
    its rank, then split the reduced matrix into its column (document)
    vectors and compare them with inner products. I've worked with it quite
    a bit now and have seen some really amazing results (as you can see in
    the unit tests, it's pretty smart and isn't easily fooled by lots of
    surface text matches).
    Dave Fayram, Apr 25, 2005
    #5

Similar Threads
  1. [ANN] Bayesian Classification for Ruby
    Lucas Carlson, Apr 11, 2005, in forum: Ruby
    Replies: 14, Views: 226
    Last post: Dave Brown, Apr 13, 2005
  2. Matt Mower
    Replies: 8, Views: 166
    Last post: Lucas Carlson, Apr 21, 2005
  3. classifier lsi and ruby gsl
    Tom Reilly, May 4, 2005, in forum: Ruby
    Replies: 2, Views: 119
    Last post: Dave Fayram, May 5, 2005
  4. Replies: 9, Views: 131
    Last post: Cameron McBride, Feb 9, 2006
  5. Doing LSI at scale in Ruby
    Chris Kottom, May 26, 2011, in forum: Ruby
    Replies: 10, Views: 374
    Last post: Ryan Davis, May 29, 2011