SpellChecker

abosalim

I used this code. It works fine, but only on a single word, not on a
whole text. I want to extend it to correct a text file rather than just
one word, but I don't know how. If you can help, please let me know.

This is the code:

import re, collections

def words(text): return re.findall('[a-z]+', text.lower())

def train(features):
    # Start every count at 1 so that unseen words get a small nonzero score.
    model = collections.defaultdict(lambda: 1)
    for f in features:
        model[f] += 1
    return model

NWORDS = train(words(open('big.txt').read()))

alphabet = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
    # All strings that are one edit away from word.
    n = len(word)
    return set([word[0:i]+word[i+1:] for i in range(n)] +                     # deletion
               [word[0:i]+word[i+1]+word[i]+word[i+2:] for i in range(n-1)] + # transposition
               [word[0:i]+c+word[i+1:] for i in range(n) for c in alphabet] + # alteration
               [word[0:i]+c+word[i:] for i in range(n+1) for c in alphabet])  # insertion

def known_edits2(word):
    # Known words that are two edits away from word.
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

def known(words): return set(w for w in words if w in NWORDS)

def correct(word):
    # Prefer the word itself, then one-edit candidates, then two-edit candidates.
    candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
    return max(candidates, key=lambda w: NWORDS[w])
 
Peter Otten

abosalim said:
I used this code. It works fine, but only on a single word, not on a
whole text. I want to extend it to correct a text file rather than just
one word, but I don't know how. If you can help, please let me know.

import re
import sys

def correct(word, _lookup={"teh": "the"}):
    """
    Replace with Norvig's implementation found at

    http://norvig.com/spell-correct.html
    """
    return _lookup.get(word.lower(), word)

def correct_word(word):
    corrected = correct(word)
    if corrected != word:
        if word.istitle():
            corrected = corrected.title()
        if word.isupper():
            corrected = corrected.upper()
        print >> sys.stderr, "correcting", word, "-->", corrected
    return corrected

def sub_word(match):
    return correct_word(match.group())

def correct_text(text):
    return re.compile("[a-z]+", re.I).sub(sub_word, text)

if __name__ == "__main__":
    text = "Teh faster teh better TEH BIGGER"
    print "original:", text
    print "corrected:", correct_text(text)
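
To apply the same substitution to a whole file, as the original question asks, the text can be read in, passed through the regex callback, and written back out. What follows is only a minimal Python 3 sketch under stated assumptions: `match_case`, `correct_file`, the `corrector` parameter, and the file paths are hypothetical names introduced here, and `correct` is a stand-in lookup that would be swapped for Norvig's function.

```python
import re

WORD_RE = re.compile(r"[a-z]+", re.IGNORECASE)

def correct(word, _lookup={"teh": "the"}):
    """Stand-in corrector (assumption): takes and returns a lowercase word."""
    return _lookup.get(word, word)

def match_case(original, corrected):
    """Copy the capitalisation pattern of `original` onto `corrected`."""
    if original.isupper():
        return corrected.upper()
    if original.istitle():
        return corrected.title()
    return corrected

def correct_text(text, corrector=correct):
    """Run `corrector` over every run of ASCII letters in `text`."""
    def sub_word(match):
        word = match.group()
        fixed = corrector(word.lower())
        if fixed == word.lower():
            return word  # nothing to fix; keep the original spelling and case
        return match_case(word, fixed)
    return WORD_RE.sub(sub_word, text)

def correct_file(src_path, dst_path, corrector=correct, encoding="utf-8"):
    """Read `src_path`, correct its text, and write the result to `dst_path`."""
    with open(src_path, encoding=encoding) as src:
        text = src.read()
    with open(dst_path, "w", encoding=encoding) as dst:
        dst.write(correct_text(text, corrector))
```

Calling `correct_file("in.txt", "out.txt")` (both paths are placeholders) would leave the input untouched and write the corrected copy next to it. Reading the whole file at once keeps the sketch short; a very large file might be better processed line by line.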


Peter

PS: Don't you get bored if you have all your code written for you?
 
Mike Kazantsev

abosalim said:
I used this code. It works fine, but only on a single word, not on a
whole text. I want to extend it to correct a text file rather than just
one word, but I don't know how. If you can help, please let me know.
...
def correct(word):
    candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
    return max(candidates, key=lambda w: NWORDS[w])

Here I assume that a "word" is any string consisting of letters; feel
free to substitute your own check for str.isalpha, such as one on word
length or case. Note that simple ops like concatenation work much faster
with buffers than with str / unicode.

text = 'some text to correct (anything, really)'
result = buffer('')

word, c = buffer(''), ''
for c in text:
    if c.isalpha(): word += c
    else:
        if word:
            result += correct(word)
            word = buffer('')
        result += c
# Flush the last word when the text ends with a letter.
if word: result += correct(word)
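
Python 3 has no `buffer` type, so the same character-by-character scan needs a different accumulator there. One common idiom is to collect the pieces in lists and join them once at the end; the sketch below assumes that approach. The `correct` stub and the names `correct_chars` and `corrector` are placeholders introduced for illustration, not part of the code above.

```python
def correct(word, _lookup={"teh": "the"}):
    """Stand-in corrector (assumption); swap in the real correct()."""
    return _lookup.get(word, word)

def correct_chars(text, corrector=correct):
    """Scan `text` one character at a time, correcting each run of letters."""
    pieces = []  # finished output fragments
    word = []    # letters of the word currently being read
    for c in text:
        if c.isalpha():
            word.append(c)
        else:
            if word:
                pieces.append(corrector("".join(word)))
                word = []
            pieces.append(c)  # pass non-letters through unchanged
    if word:
        # The text ended in the middle of a word; emit it as well.
        pieces.append(corrector("".join(word)))
    return "".join(pieces)
```

For instance, `correct_chars("teh cat (teh dog) teh")` returns `"the cat (the dog) the"` with this stub.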

--
Mike Kazantsev // fraggod.net


 
abosalim

Thanks a lot
 
