Spell-checking Python source code

Discussion in 'Python' started by John Zenger, Sep 8, 2007.

  1. John Zenger

    John Zenger Guest

    To my horror, someone pointed out to me yesterday that a web app I
    wrote has been prominently displaying a misspelled word. The word was
    buried in my code.

    Is there a utility out there that will help spell-check literal
    strings entered into Python source code? I don't mean spell-check
    strings entered by the user; I mean, go through the .py file, isolate
    strings, and tell me when the strings contain misspelled words. In an
    ideal world, my IDE would do this with a red wavy line.

    I guess a second-best thing would be an easy technique to open a .py
    file and isolate all strings in it.

    (I know that the better practice is to isolate user-displayed strings
    from the code, but in this case that just didn't happen.)
    John Zenger, Sep 8, 2007
    #1

  2. John Zenger wrote:
    > To my horror, someone pointed out to me yesterday that a web app I
    > wrote has been prominently displaying a misspelled word. The word was
    > buried in my code.
    >
    > Is there a utility out there that will help spell-check literal
    > strings entered into Python source code? I don't mean spell-check
    > strings entered by the user; I mean, go through the .py file, isolate
    > strings, and tell me when the strings contain misspelled words. In an
    > ideal world, my IDE would do this with a red wavy line.
    >
    > I guess a second-best thing would be an easy technique to open a .py
    > file and isolate all strings in it.
    >
    > (I know that the better practice is to isolate user-displayed strings
    > from the code, but in this case that just didn't happen.)
    >


    Use the re module, identify the strings and write them to another file,
    then open the file with your spell checker. Program shouldn't be more
    than 10 lines.
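
    A minimal sketch of the regex idea, assuming Python 3. The pattern and the names STRING_RE, find_strings and write_strings are illustrative, not from any library. A plain regex cannot tell code from comments, so quoted text inside a comment will also be reported, and string prefixes such as r or b are not captured.

```python
import re

# Hypothetical pattern: tries triple-quoted strings first, then
# single-line strings, allowing backslash escapes inside each.
STRING_RE = re.compile(
    r'"""(?:\\.|[^\\])*?"""'
    r"|'''(?:\\.|[^\\])*?'''"
    r'|"(?:\\.|[^"\\\n])*"'
    r"|'(?:\\.|[^'\\\n])*'",
    re.DOTALL,
)


def find_strings(source):
    """Return the raw text of every quoted string the pattern matches."""
    return STRING_RE.findall(source)


def write_strings(source, out):
    """Write one matched string per line to an open text file object."""
    for literal in find_strings(source):
        out.write(literal + "\n")
```

    The resulting file can then be opened in any spell checker.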
    Ricardo Aráoz, Sep 8, 2007
    #2

  3. John Zenger

    David Guest

    > >
    > > (I know that the better practice is to isolate user-displayed strings
    > > from the code, but in this case that just didn't happen.)
    > >

    >
    > Use the re module, identify the strings and write them to another file,
    > then open the file with your spell checker. Program shouldn't be more
    > than 10 lines.
    >
    >


    Have a look at the tokenize python module for the regular expressions
    for extracting strings (for all possible Python string formats). On a
    Debian box you can find it here: /usr/lib/python2.4/tokenize.py

    It would probably be simpler to hack a copy of that script so it
    writes all the strings in your source to a text file, which you then
    spellcheck.
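
    Rather than copying tokenize.py, the module can also be imported and used directly. The following is a rough sketch assuming Python 3, where tokenize.generate_tokens() yields named tuples; the function names iter_string_tokens and dump_strings are made up for illustration. On Python 3.12 and later, f-strings are tokenized differently and would not show up as STRING tokens.

```python
import io
import tokenize


def iter_string_tokens(source):
    """Yield (line_number, raw_literal) for each STRING token in source."""
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.STRING:
            yield tok.start[0], tok.string


def dump_strings(source, out):
    """Write each string literal, prefixed by its line number, to out."""
    for lineno, literal in iter_string_tokens(source):
        out.write("%d: %s\n" % (lineno, literal))
```

    Unlike a plain regex, the tokenizer skips quoted text that appears inside comments.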

    Another method would be to log all the strings your web app writes, to
    a text file, then run through your entire site, and then spellcheck
    your logfile.
    David, Sep 8, 2007
    #3
  4. David wrote:
    >>> (I know that the better practice is to isolate user-displayed strings
    >>> from the code, but in this case that just didn't happen.)
    >>>

    >> Use the re module, identify the strings and write them to another file,
    >> then open the file with your spell checker. Program shouldn't be more
    >> than 10 lines.
    >>
    >>

    >
    > Have a look at the tokenize python module for the regular expressions
    > for extracting strings (for all possible Python string formats). On a
    > Debian box you can find it here: /usr/lib/python2.4/tokenize.py
    >
    > It would probably be simpler to hack a copy of that script so it
    > writes all the strings in your source to a text file, which you then
    > spellcheck.
    >
    > Another method would be to log all the strings your web app writes, to
    > a text file, then run through your entire site, and then spellcheck
    > your logfile.
    >


    Nice module:

    import tokenize

    # Python 2 "tokeneater" callback: print every string literal token.
    def processStrings(type, token, (srow, scol), (erow, ecol), line):
        if tokenize.tok_name[type] == 'STRING':
            print tokenize.tok_name[type], token, \
                (srow, scol), (erow, ecol), line

    file = open("myprogram.py")

    tokenize.tokenize(
        file.readline,
        processStrings
    )

    How would you go about writing the output to a file? I mean, I would
    like to open the file at main level and pass a handle to the file to
    processStrings to write to it, finally close output file at main level.
    Probably a class with a processString method?
    Ricardo Aráoz, Sep 8, 2007
    #4
  5. John Zenger

    DaveM Guest

    On Sat, 08 Sep 2007 14:04:55 -0700, John Zenger <>
    wrote:

    > In an ideal world, my IDE would do this with a red wavy line.


    I can't help with your problem, but this is the first thing I turn off in
    Word. It drives me _mad_.

    Sorry - just had to share that.

    DaveM
    DaveM, Sep 9, 2007
    #5
  6. John Zenger writes:

    > In an ideal world, my IDE would do this with a red wavy line.


    You didn't mention which IDE you use; however, if you use Emacs, there
    is flyspell-prog-mode which does that for you (checks your spelling
    "on the fly", but only within comments and strings).

    Regards,
    David Trudgett

    --
    These are not the droids you are looking for. Move along.
    David Trudgett, Sep 9, 2007
    #6
  7. John Zenger

    Miki Guest

    >> In an ideal world, my IDE would do this with a red wavy line.
    >
    > You didn't mention which IDE you use; however, if you use Emacs, there
    > is flyspell-prog-mode which does that for you (checks your spelling
    > "on the fly", but only within comments and strings).

    Same in Vim (:set spell)

    HTH,
    --
    Miki <>
    http://pythonwise.blogspot.com
    Miki, Sep 9, 2007
    #7
  8. John Zenger

    David Guest

    > tokenize.tokenize(
    > file.readline,
    > processStrings
    > )
    >
    > How would you go about writing the output to a file? I mean, I would
    > like to open the file at main level and pass a handle to the file to
    > processStrings to write to it, finally close output file at main level.
    > Probably a class with a processString method?


    tokenize.tokenize() takes a callable object as its second arg. So you
    can use a class which you construct with the file, and you give it an
    appropriate __call__ method.

    http://docs.python.org/ref/callable-types.html

    Although with a short script a global var may be simpler.
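
    A sketch of the callable-class approach, assuming Python 3. There, tokenize.generate_tokens() returns an iterator instead of taking a callback, so the hypothetical helper feed_tokens below stands in for the old two-argument tokenize.tokenize() call. The class name StringWriter is also made up.

```python
import io
import tokenize


class StringWriter:
    """Callable that writes every STRING token to a file object."""

    def __init__(self, out):
        self.out = out  # file handle opened (and later closed) by the caller

    def __call__(self, type_, token, start, end, line):
        if type_ == tokenize.STRING:
            self.out.write("%d: %s\n" % (start[0], token))


def feed_tokens(readline, tokeneater):
    """Call tokeneater(type, string, start, end, line) for each token."""
    for tok in tokenize.generate_tokens(readline):
        tokeneater(*tok)
```

    The output file is opened at the top level, wrapped in StringWriter, passed to feed_tokens together with the source file's readline, and closed afterwards.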
    David, Sep 9, 2007
    #8
  9. John Zenger

    Benjamin Guest

    On Sep 8, 4:04 pm, John Zenger <> wrote:
    > To my horror, someone pointed out to me yesterday that a web app I
    > wrote has been prominently displaying a misspelled word. The word was
    > buried in my code.
    >
    > Is there a utility out there that will help spell-check literal
    > strings entered into Python source code? I don't mean spell-check
    > strings entered by the user; I mean, go through the .py file, isolate
    > strings, and tell me when the strings contain misspelled words. In an
    > ideal world, my IDE would do this with a red wavy line.
    >
    > I guess a second-best thing would be an easy technique to open a .py
    > file and isolate all strings in it.
    >
    > (I know that the better practice is to isolate user-displayed strings
    > from the code, but in this case that just didn't happen.)


    This is when it's good to put all your UI strings in a file and
    get the advantages of easy spell checking and the ability to
    translate the app.
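
    One possible shape for this, as a small sketch: a single module holds every user-visible string under a key, and a helper dumps them for spell checking. The module layout, the keys, and the names MESSAGES, get_message and dump_messages are all hypothetical.

```python
# messages.py (hypothetical module): every user-visible string lives here.
MESSAGES = {
    "welcome": "Welcome to the site!",
    "not_found": "Sorry, that page does not exist.",
}


def get_message(key):
    """Look up a user-visible string by its key."""
    return MESSAGES[key]


def dump_messages(out):
    """Write all messages, one per line, for a spell checker to read."""
    for key in sorted(MESSAGES):
        out.write(MESSAGES[key] + "\n")
```

    The rest of the app calls get_message("welcome") instead of embedding the text, so only this one file needs spell checking or translating.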
    Benjamin, Sep 10, 2007
    #9
  10. John Zenger

    David Guest

    On 9/9/07, David <> wrote:
    > > tokenize.tokenize(
    > > file.readline,
    > > processStrings
    > > )
    > >
    > > How would you go about writing the output to a file? I mean, I would
    > > like to open the file at main level and pass a handle to the file to
    > > processStrings to write to it, finally close output file at main level.
    > > Probably a class with a processString method?

    >


    You can also use closures for this.

    http://ivan.truemesh.com/archives/000411.html

    See the example in the "Assignment considered awkward" section. Since
    you're not assigning to your file variable in processStrings you don't
    have to use the list work-around mentioned later in the doc.
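
    A sketch of the closure version, assuming Python 3, where tokenize.generate_tokens() is iterated directly. The names make_string_writer and feed_tokens are hypothetical. The inner function only reads the out variable and never rebinds it, which is why no work-around is needed.

```python
import io
import tokenize


def make_string_writer(out):
    """Return a callback that writes STRING tokens to the file object out."""
    def process_strings(type_, token, start, end, line):
        # `out` is only read here, never rebound, so a plain closure works.
        if type_ == tokenize.STRING:
            out.write(token + "\n")
    return process_strings


def feed_tokens(source, callback):
    """Call callback(type, string, start, end, line) for every token."""
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        callback(*tok)
```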
    David, Sep 10, 2007
    #10