Does anyone know of a library that can summarise text?
I've been looking at NLTK but haven't found anything yet. Any
help would be very welcome.
Well, summarizing text is one of those things that generally
takes a brain-cell or two to do. Automating the process would
require doing it either smartly (some sort of
neural-net/NLP/Markov-chain technology, which is a non-trivial
task--something one might consider braving in the 3rd or 4th-year
of a university computer-science program), or doing it fairly
dumbly. As an example of a "dumb" solution, you can use regexps
to trim off the first few words and the last few words and call
that a "summary":
... and it has a second line
... and a third line
... and the last line is the fourth line."""
'This is the...fourth line.'
You can adjust the "{8}" portions for more or fewer
leading/trailing context characters.
The regexp might need a bit of tweaking for somewhat short
strings, but if they're fairly short, one might not need to
summarize them.
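
For what it's worth, a minimal sketch of the "dumb" approach might look
something like the following. The pattern, the sample text, and the
name dumb_summary are made up for illustration and are not necessarily
the regexp used above; this version counts characters rather than
words, and the n parameter plays the role of the "{8}" portions.

```python
import re

def dumb_summary(text, n=8):
    """Keep the first n and last n characters, replacing the middle with '...'.

    Hypothetical helper for illustration only. Strings too short to
    match (fewer than 2*n + 1 characters) are returned unchanged.
    """
    # re.DOTALL lets "." match newlines, so multi-line text is handled.
    pattern = re.compile(r'^(.{%d}).+(.{%d})$' % (n, n), re.DOTALL)
    return pattern.sub(r'\1...\2', text)

# Made-up sample text, just to exercise the function.
s = """This is the first line
and it has a second line
and a third line
and the last line is the fourth line."""

print(repr(dumb_summary(s)))       # 'This is ...th line.'
print(repr(dumb_summary(s, n=4)))  # 'This...ine.'
print(repr(dumb_summary("short"))) # 'short'
```

Because it cuts on character counts, it will happily chop a word in
half; trimming on whitespace-delimited words instead would need a
different pattern.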
-tkc