Python-based unacceptable language filter

David Pratt

Hi. Is anyone aware of any Python-based unacceptable-language filter
code to scan for and detect bad language in text from uploads, etc.?

Many thanks.
David
 
Nigel Rowe

You might be able to adapt languagetool.
http://www.danielnaber.de/languagetool/features.html

Later versions have been ported to Java, but the old Python version of
languagetool is at http://tkltrans.sourceforge.net/#r03

Daniel Naber's thesis paper is at
http://www.danielnaber.de/languagetool/download/style_and_grammar_checker.pdf

Mind you, given the poor language skills of many native English speakers
(not to mention those for whom English is a second language), relying on
automated filters to enforce 'good' language seems a trifle extreme. This
post, for example, would probably not pass.

Cheers,
Nigel

PS. For the humour impaired, this g*d d*mm post was a f*cking joke, OK! :)

Mind you, the links are real.
 
Frithiof Andreas Jensen

Look up SpamBayes - if you can filter on terms like "dear friend" you can
filter on the inverse too, no? It needs samples of both acceptable and
unacceptable text to train on.
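
To illustrate the idea of a filter that learns from samples, here is a
minimal sketch of a toy naive Bayes word classifier. It is not SpamBayes
and does not use its API; the class name TinyBayesFilter, the labels "ok"
and "bad", and the training sentences are all hypothetical, chosen only
for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class TinyBayesFilter:
    """Toy two-class naive Bayes text classifier (hypothetical, not SpamBayes)."""

    LABELS = ("ok", "bad")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}

    def train(self, text, label):
        """Record one sample text under the label "ok" or "bad"."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def _log_score(self, tokens, label):
        counts = self.word_counts[label]
        total = sum(counts.values())
        vocab = len(set(self.word_counts["ok"]) | set(self.word_counts["bad"]))
        # Class prior from the number of samples seen, with add-one smoothing.
        score = math.log(
            (self.doc_counts[label] + 1) / (sum(self.doc_counts.values()) + 2)
        )
        for token in tokens:
            # Add-one smoothing; the extra +1 reserves mass for unseen words.
            score += math.log((counts[token] + 1) / (total + vocab + 1))
        return score

    def is_unacceptable(self, text):
        """Return True if the "bad" class scores higher than the "ok" class."""
        tokens = tokenize(text)
        return self._log_score(tokens, "bad") > self._log_score(tokens, "ok")


if __name__ == "__main__":
    # Made-up training samples; a real filter would need far more of them.
    f = TinyBayesFilter()
    f.train("thanks for the helpful reply", "ok")
    f.train("the upload worked fine", "ok")
    f.train("you darn fool", "bad")
    f.train("what a darn stupid upload", "bad")
    print(f.is_unacceptable("darn fool"))
    print(f.is_unacceptable("thanks for the reply"))
```

With so few samples the result depends heavily on the exact words seen, so
treat this as a demonstration of the approach rather than a usable filter.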
 
Andrew Gwozdziewycz

I think he may be referring to "bad" words and 'filthy' language. At
least that's what I got from the question.
There are many PHP implementations on the web that could be adapted
to Python fairly easily. Most of them are probably not ideal and
involve a lot of stuff like

    for n in badwords:
        texttofilter.replace(n, '<bad word deleted>')

If that's all you need, though, maybe it's not so bad.
 
Erik Max Francis

Andrew said:
I think he may be referring to "bad" words and 'filthy' language. At
least that's what I got from the question.
There are many PHP implementations on the web that could be adapted
to Python fairly easily. Most of them are probably not ideal and
involve a lot of stuff like

    for n in badwords:
        texttofilter.replace(n, '<bad word deleted>')

If that's all you need, though, maybe it's not so bad.

This is a no-op: Python strings are immutable, so replace() returns a new
string rather than changing the original, and that result is then
discarded. You meant:

    for badWord in badWords:
        textToFilter = textToFilter.replace(badWord, '<)!&%(#&)%>')
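
One caveat with plain substring replacement is that it also hits bad words
embedded in longer, innocent words. A minimal sketch of one way around
that, assuming the bad words consist only of letters, is to compile a
single case-insensitive regular expression with word boundaries. The
helper name make_censor and the sample words are hypothetical.

```python
import re


def make_censor(bad_words, replacement="<bad word deleted>"):
    """Return a function that replaces whole-word matches of bad_words."""
    # Longest first, so a longer entry wins over a shorter one it starts with.
    words = sorted(set(bad_words), key=len, reverse=True)
    if not words:
        # Nothing to filter: return the text unchanged.
        return lambda text: text
    # \b anchors each alternative at word boundaries; re.escape protects
    # any regex metacharacters in the word list.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b",
        re.IGNORECASE,
    )
    return lambda text: pattern.sub(replacement, text)


if __name__ == "__main__":
    censor = make_censor(["darn", "heck"])
    print(censor("Darn it, the darning needle broke, heck!"))
```

Here "Darn" and "heck" are replaced while "darning" is left alone, because
the pattern only matches whole words.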
 
