Giles Bowkett wrote:
I'm researching existing Bayesian classifiers in Ruby -- it looks as
if there are two: one called Bishop, a Python port, and another called
Classifier.
Has anybody worked with them? Any upsides, downsides? I'm interested in
both theoretical and practical perspectives, partly to expand my brain
and partly for the sake of putting some real software together.
I have used Bishop to classify the 488 articles of the draft European
Constitution, and it was helpful.
Bishop is very simple and consists of just one file and a couple of
classes. There is room for improvement in the way it tokenizes source
code.
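Since tokenization is where source code gives these libraries trouble, here is a minimal sketch of a C-aware tokenizer, assuming a regex scanner is good enough for small exercises like the ones below. `tokenize_c` and `C_TOKEN` are hypothetical names, not part of Bishop or Classifier. The idea is to keep keywords such as `for` and `while`, identifiers, literals and operators as distinct tokens and to skip stemming entirely.

```ruby
# Hypothetical tokenizer for small C programs (not Bishop's own code).
# Keywords, identifiers, numbers, string literals and operators each
# become one token; nothing is stemmed, so "for" and "while" stay distinct.
C_TOKEN = /
  [A-Za-z_]\w*                   # identifiers and keywords
  | \d+(?:\.\d+)?                # integer and decimal literals
  | "(?:\\.|[^"\\])*"            # string literals, escapes kept verbatim
  | ==|!=|<=|>=|\+\+|--|&&|\|\|  # two-character operators
  | [-+*\/%<>=!&|;,(){}\[\]]     # single-character operators and punctuation
/x

def tokenize_c(source)
  # Strip block and line comments first. This is naive: a comment marker
  # inside a string literal would also be removed.
  code = source.gsub(%r{/\*.*?\*/}m, " ").gsub(%r{//[^\n]*}, " ")
  # scan returns every match in order and silently skips whitespace.
  code.scan(C_TOKEN)
end

p tokenize_c("for (i = 0; i <= 10; i++) { /* loop */ }")
```

Feeding such token arrays to a classifier, instead of the library's natural-language tokenizer, avoids the stemming step on C keywords entirely.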
Classifier is more complex, with multiple files and more classes.
Moreover, it can optionally use the GNU Scientific Library (GSL) to
perform its calculations.
I recently tried both of them to help a teacher classify CS homework
assignments and analyze how many different solutions the students had
come up with. I used simple examples, such as printing the numbers from
0 to 10 in C: one training program used a "for" loop and another used a
"while" loop. Then I tested a candidate program that used a "while" loop
with different variable names. Bishop guessed that it was closer to the
"for" loop. Not very conclusive.
Classifier did no better: it failed to tokenize the C source text. It
tried to stem a keyword and failed.
As long as you are analyzing natural language, both seem well suited.
Although they differ in complexity under the hood, both have a very
simple interface: define a category and train it, then use the guess
interface to evaluate candidates.
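To make the "train a category, then guess" workflow concrete, here is a minimal multinomial naive Bayes sketch in plain Ruby. The class `TinyBayes` and its `train`/`guess` methods are hypothetical and only mimic the shape of that interface; they are not the actual API of Bishop or Classifier, and tokenization here is plain whitespace splitting.

```ruby
# Hypothetical minimal multinomial naive Bayes classifier.
# Names are illustrative, not Bishop's or Classifier's real API.
class TinyBayes
  def initialize
    @word_counts  = Hash.new { |h, k| h[k] = Hash.new(0) } # category => token => count
    @token_totals = Hash.new(0)                            # category => total tokens seen
    @doc_counts   = Hash.new(0)                            # category => training documents
    @vocabulary   = {}                                     # every token seen, used as a set
  end

  # Add one training document (an array of tokens) to a category.
  def train(category, tokens)
    @doc_counts[category] += 1
    tokens.each do |t|
      @word_counts[category][t] += 1
      @token_totals[category] += 1
      @vocabulary[t] = true
    end
  end

  # Return [[category, log_score], ...] sorted from best to worst.
  def guess(tokens)
    total_docs = @doc_counts.values.sum.to_f
    vocab_size = @vocabulary.size
    scores = @doc_counts.keys.map do |cat|
      # log P(category) + sum of log P(token | category), add-one smoothed
      score = Math.log(@doc_counts[cat] / total_docs)
      denom = (@token_totals[cat] + vocab_size).to_f
      tokens.each { |t| score += Math.log((@word_counts[cat][t] + 1) / denom) }
      [cat, score]
    end
    scores.sort_by { |_, s| -s }
  end
end

bayes = TinyBayes.new
bayes.train(:spam, "buy cheap pills now".split)
bayes.train(:ham, "meeting notes for monday".split)
p bayes.guess("cheap pills".split).first.first # prints :spam
```

Scores are kept in log space so that multiplying many small token probabilities does not underflow, and add-one (Laplace) smoothing keeps a token never seen in a category from zeroing out that category's score.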
J-P