Bayesian classifiers in Ruby?

Discussion in 'Ruby' started by Giles Bowkett, Nov 1, 2006.

  1. I'm researching existing Bayesian classifiers in Ruby -- it looks as
    if there are two, one called Bishop, a Python port, and another called
    Classifier.

    Has anybody worked with them? Any upsides, downsides? Both theoretical
    and practical perspectives. Partly to expand my brain and partly for
    the sake of putting some real software together.

    --
    Giles Bowkett
    http://www.gilesgoatboy.org
     
    Giles Bowkett, Nov 1, 2006
    #1

  2. Jaypee (Guest)

    Giles Bowkett wrote:
    > I'm researching existing Bayesian classifiers in Ruby -- it looks as
    > if there are two, one called Bishop, a Python port, and another called
    > Classifier.
    >
    > Has anybody worked with them? Any upsides, downsides? Both theoretical
    > and practical perspectives. Partly to expand my brain and partly for
    > the sake of putting some real software together.
    >

    I have used Bishop to classify the 488 articles of the draft European
    Constitution, and it was helpful.

    Bishop is very simple: it consists of just one file and a couple of
    classes. There is room for improvement in the way it tokenizes source
    code.

    Classifier is more complex, with multiple files and more classes.
    Moreover, it may use the GNU Scientific Library to perform its
    calculations.

    I recently tried to use both of them to help a teacher classify CS
    homework and analyze how many different solutions the students had
    come up with. I used a simple example: printing the numbers from 0 to
    10 in C. In one case I used a "for" loop, and in another a "while"
    loop. Then I tested a candidate program that used a "while" loop with
    different variable names. Bishop guessed that it was more like the
    "for" loop. Not very conclusive.

    Classifier did no better: it failed to tokenize the C source text. It
    tried to stem a keyword and failed.

    As long as you analyze natural language, both seem suited. They differ
    in complexity under the hood, but both have a very simple interface:
    define a category and train it, then call a guess method to evaluate
    candidates.
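
    The train-then-guess shape described above can be sketched in plain
    Ruby. This is only an illustration: TinyBayes, train, and guess are
    hypothetical names, not the Bishop or Classifier API, and the scoring
    is an assumed textbook multinomial naive Bayes with add-one smoothing.
    The tokenizer is passed in as a block so it can be swapped out.

```ruby
# Minimal naive Bayes sketch with a train/guess interface.
# TinyBayes is a hypothetical class, not the Bishop or Classifier API.
class TinyBayes
  def initialize(&tokenizer)
    # Default tokenizer (an assumption): lowercase runs of letters and digits.
    @tokenizer = tokenizer || ->(text) { text.downcase.scan(/[a-z0-9]+/) }
    @counts = Hash.new { |h, k| h[k] = Hash.new(0) } # category => token => count
    @docs   = Hash.new(0)                             # category => documents seen
  end

  # Record one training document under the given category.
  def train(category, text)
    @docs[category] += 1
    @tokenizer.call(text).each { |tok| @counts[category][tok] += 1 }
  end

  # Return the category with the highest log-probability score.
  def guess(text)
    tokens     = @tokenizer.call(text)
    vocab      = @counts.values.flat_map(&:keys).uniq.size
    total_docs = @docs.values.sum.to_f
    @docs.keys.max_by do |cat|
      total = @counts[cat].values.sum
      prior = Math.log(@docs[cat] / total_docs)
      # Add-one smoothing keeps unseen tokens from zeroing out a category.
      prior + tokens.sum { |tok| Math.log((@counts[cat][tok] + 1.0) / (total + vocab)) }
    end
  end
end

bayes = TinyBayes.new
bayes.train(:spam, "cheap pills buy now")
bayes.train(:spam, "buy cheap watches now")
bayes.train(:ham,  "meeting notes attached for review")
bayes.train(:ham,  "lunch meeting tomorrow")
puts bayes.guess("buy pills now")
```

    With a word-based tokenizer like this default, two C programs that
    differ only in loop keyword and variable names share few tokens, so
    the choice of tokenizer can matter a lot on source code.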

    J-P
     
    Jaypee, Nov 1, 2006
    #2

  3. Tom Reilly (Guest)

    I wrote a Bayesian classifier to classify nursing home calls. If you
    want, I can email you the source. It works with about 95% accuracy.
    Tom Reilly

    Giles Bowkett wrote:

    > I'm researching existing Bayesian classifiers in Ruby -- it looks as
    > if there are two, one called Bishop, a Python port, and another called
    > Classifier.
    >
    > Has anybody worked with them? Any upsides, downsides? Both theoretical
    > and practical perspectives. Partly to expand my brain and partly for
    > the sake of putting some real software together.
    >
     
    Tom Reilly, Nov 1, 2006
    #3
  4. Definitely! That would be very cool.

    On 11/1/06, Tom Reilly <> wrote:
    > I wrote a bayesian classifier to classify nursing home calls. If you
    > want I can email you the source. It works with about a 95% accuracy.
    > Tom Reilly
    >
    > Giles Bowkett wrote:
    >
    > > I'm researching existing Bayesian classifiers in Ruby -- it looks as
    > > if there are two, one called Bishop, a Python port, and another called
    > > Classifier.
    > >
    > > Has anybody worked with them? Any upsides, downsides? Both theoretical
    > > and practical perspectives. Partly to expand my brain and partly for
    > > the sake of putting some real software together.
    > >


    --
    Giles Bowkett
    http://www.gilesgoatboy.org
     
    Giles Bowkett, Nov 2, 2006
    #4
  5. > As long as you analyze natural language, both seem suited, although with
    > different degrees of complexity under the hood, both have a very simple
    > interface: define a category and train it. Then a guess interface to
    > evaluate candidates.


    I'm hoping to develop yet another spam filter. In that sense I can
    only say I'm sort of analyzing natural language. Not all of it is
    natural language; some of it is code. In the Paul Graham essay where
    he came up with this idea, if I remember right, he said that a font
    tag with the color red turned out to be the single most reliable
    indicator of spam. Obviously in HTML e-mail there are going to be
    similar trends. However, if the tokenizer is the only problem, that may
    be something I can change without too much stress.
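
    One way to let markup such as a red font tag reach the classifier is a
    tokenizer that keeps tag names and attribute values as tokens instead
    of stripping them. The sketch below is an assumption about how that
    could look; tokenize_html_mail and the "tag:" and "key=value" token
    formats are hypothetical and do not come from Bishop, Classifier, or
    Paul Graham's filter.

```ruby
# Hypothetical tokenizer sketch: keeps HTML tag names and attribute values
# as tokens alongside ordinary words, instead of discarding the markup.
def tokenize_html_mail(text)
  tokens = []
  # Opening tags become tokens like "tag:font"; attributes become "color=red".
  text.scan(/<\s*([a-zA-Z][a-zA-Z0-9]*)([^>]*)>/) do |name, attrs|
    tokens << "tag:#{name.downcase}"
    attrs.scan(/([a-zA-Z-]+)\s*=\s*["']?([^"'\s>]+)/) do |key, value|
      tokens << "#{key.downcase}=#{value.downcase}"
    end
  end
  # The remaining visible text is split into lowercase words; dollar signs,
  # apostrophes, and dashes are kept as part of a token.
  visible = text.gsub(/<[^>]*>/, " ")
  tokens.concat(visible.downcase.scan(/[a-z0-9$'-]+/))
end

p tokenize_html_mail('<font color="red">Buy NOW</font> for $5')
```

    A function like this could replace the word splitter in whichever
    classifier is used, provided the library exposes a hook for it; that
    part would need checking against the library's source.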

    --
    Giles Bowkett
    http://www.gilesgoatboy.org
     
    Giles Bowkett, Nov 2, 2006
    #5
  6. Giles Bowkett wrote:
    >> As long as you analyze natural language, both seem suited, although with
    >> different degrees of complexity under the hood, both have a very simple
    >> interface: define a category and train it. Then a guess interface to
    >> evaluate candidates.

    >
    >I'm hoping to develop yet another spam filter. in that sense I can
    >only say I'm sort of analyzing natural language. Not all of it is
    >natural language, some of it is code. In the Paul Graham thing where
    >he came up with this idea, if I remember right, he said that a font
    >tag with the color red turned out to be the single most reliable
    >indicator of spam. Obviously in HTML e-mail there are going to be
    >similar trends. However if the tokenizer is the only problem that may
    >be something I can change without too much stress.
    >


    Long ago, I wrote an interface to the ifile program, and I use
    that in my spam/email filtering. ifile is abandonware at the
    moment. I think I posted it on the Ruby mailing list at some
    point; you might try searching for it.


    _ Booker C. Bense


     
    Booker C. Bense, Nov 2, 2006
    #6