Re: How to detect typos in Python programs

Discussion in 'Python' started by Bob Gailer, Jul 25, 2003.

  1. Bob Gailer

    At 07:26 PM 7/25/2003 +0530, Manish Jethani wrote:

    >Hi all,
    >
    >Is there a way to detect typos in a Python program before
    >actually having to run it? Let's say I have a function like this:
    >
    > def server_closed_connection():
    >     session.abost()
    >
    >Here, abort() is actually misspelt. The only time my program
    >follows this path is when the server disconnects from its
    >end--and that's like once in 100 sessions. So sometimes I
    >release the program, people start using it, and then someone
    >reports this typo after 4-5 days of the release (though it's
    >trivial to fix manually at the user's end, or I can give a patch).
    >
    >How can we detect these kinds of errors at development time?
    >It's not practical for me to have a test script that can make
    >the program go through all (most) the possible code paths.


    Consider:
      use a regular expression to get a list of all the identifiers in the program
      count the occurrences of each by adding to/updating a dictionary
      sort and display the result

    program_text = """def server_closed_connection():
        session.abost()"""
    import re
    # list of all identifiers
    words = re.findall(r'([A-Za-z_]\w*)\W*', program_text)
    # dict of identifiers w/ occurrence count
    wordDict = {}
    for word in words:
        wordDict[word] = wordDict.get(word, 0) + 1
    wordList = sorted(wordDict.items())
    for wordCount in wordList:
        print('%-25s %3s' % wordCount)

    output (approximate, as I used tabs):

    abost 1
    def 1
    server_closed_connection 1
    session 1

    You can then examine this list for suspect names, especially those that
    occur once. We could apply some filtering to remove keywords and builtin names.
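
    A rough sketch of that filtering, assuming Python 3. The names `suspect_names` and `max_count` are made up for illustration and are not part of the code above. Like the regular expression above, it also picks up words inside comments and strings.

```python
# Sketch only (Python 3): count identifiers, then drop keywords and builtins.
import builtins
import keyword
import re
from collections import Counter

def suspect_names(program_text, max_count=1):
    """Return sorted names seen at most max_count times, minus keywords/builtins."""
    # Simple identifier pattern; it does not skip comments or strings.
    counts = Counter(re.findall(r'[A-Za-z_]\w*', program_text))
    known = set(keyword.kwlist) | set(dir(builtins))
    return sorted(name for name, n in counts.items()
                  if n <= max_count and name not in known)

program_text = """def server_closed_connection():
    session.abost()
"""
print(suspect_names(program_text))
# ['abost', 'server_closed_connection', 'session']
```

    `def` drops out because it is a keyword; the remaining single-use names still need a human eye.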

    We could add a comment at the start of the program containing all the valid
    names, and extend this process to report just the ones that are not in the
    valid list.
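
    One way the valid-names comment might look, again only a sketch for Python 3. The `# valid-names:` marker and the function `undeclared_names` are invented conventions for this example, not an existing tool.

```python
# Sketch only (Python 3). The "# valid-names:" marker is an invented convention.
import re

def undeclared_names(program_text, marker='# valid-names:'):
    """Return sorted identifiers in the code that the marker comment does not list."""
    valid = set()
    code_lines = []
    for line in program_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(marker):
            # Names on the marker line are whitespace-separated.
            valid.update(stripped[len(marker):].split())
        else:
            code_lines.append(line)
    found = set(re.findall(r'[A-Za-z_]\w*', '\n'.join(code_lines)))
    return sorted(found - valid)

program_text = """# valid-names: def server_closed_connection session abort
def server_closed_connection():
    session.abost()
"""
print(undeclared_names(program_text))   # ['abost']
```

    The price is that the list has to be kept up to date by hand, so a typo in the list itself would hide a typo in the code.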

    Bob Gailer

     
    Bob Gailer, Jul 25, 2003
    #1

  2. On Fri, 25 Jul 2003 12:20:57 -0600, Bob Gailer wrote:

    >consider:
    > use a regular expression to get a list of all the identifiers in the program
    > count occurrence of each by adding to/updating a dictionary
    > sort and display the result

    That's cool. If you want to go further and count only the names the program
    actually uses (excluding anything in comments), try:

    ====< prtok.py >========================================================
    #prtok.py
    import sys, tokenize, glob, token

    symdir={}

    def tokeneater(type, tokstr, start, end, line, symdir=symdir):
        if (type==token.NAME):
            TOKSTR = tokstr.upper()         #should show up for this file
            if symdir.has_key(TOKSTR):
                d = symdir[TOKSTR]
                if d.has_key(tokstr):
                    d[tokstr] += 1
                else:
                    d[tokstr] = 1
            else:
                symdir[TOKSTR]={ tokstr:1 }

    for fileglob in sys.argv[1:]:
        for filename in glob.glob(fileglob):
            symdir.clear()
            tokenize.tokenize(open(filename).readline, tokeneater)

            header = '\n====< '+filename+' >===='
            singlecase = []
            multicase = [key for key in symdir.keys()
                            if len(symdir[key])>1 or singlecase.append(key)]
            for key in multicase:
                if header:
                    print header
                    print ' (Multicase symbols)'
                    header = None
                for name, freq in symdir[key].items():
                    print '%15s:%-3s'% (name, freq),
                print
            if header: print header; header = None
            print ' (Singlecase symbols)'
            byfreq = [symdir[k].items()[0] for k in singlecase]
            byfreq = [(n,k) for k,n in byfreq]
            byfreq.sort()
            npr = 0
            for freq, key in byfreq:
                if header:
                    print header
                    header = None
                print '%15s:%-3s'% (key, freq),
                npr +=1
                if npr%4==3: print
            print
    ========================================================================
    Output from running it on itself and another little file (you can specify file glob expressions too):

    [18:55] C:\pywk\tok>prtok.py prtok.py gt.py

    ====< prtok.py >====
    (Multicase symbols)
    tokstr:6 TOKSTR:4
    NAME:1 name:2
    (Singlecase symbols)
    append:1 argv:1 clear:1
    def:1 end:1 import:1 keys:1
    len:1 line:1 open:1 or:1
    readline:1 sort:1 start:1 upper:1
    else:2 fileglob:2 has_key:2 items:2
    multicase:2 n:2 sys:2 token:2
    tokeneater:2 type:2 None:3 filename:3
    glob:3 npr:3 singlecase:3 tokenize:3
    d:4 freq:4 k:4 byfreq:5
    for:8 if:8 in:8 key:8
    header:10 print:10 symdir:11

    ====< gt.py >====
    (Singlecase symbols)
    __name__:1 argv:1 def:1
    if:1 fn:2 for:2 import:2
    in:2 main:2 print:2 sys:2
    arg:3 glob:3
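
    For anyone on Python 3, where `has_key`, the `print` statement and the callback form of `tokenize.tokenize` are gone, the same idea might be sketched roughly as below. This is not the script above: `case_groups` and the sample `source` are made up for illustration, and it only covers the grouping step, not the report formatting.

```python
# Sketch only (Python 3): group NAME tokens by upper-cased spelling.
import io
import token
import tokenize
from collections import Counter, defaultdict

def case_groups(source):
    """Map UPPERCASED name -> Counter of the exact spellings seen."""
    groups = defaultdict(Counter)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == token.NAME:      # comments and strings are other token types
            groups[tok.string.upper()][tok.string] += 1
    return groups

source = '''total = 0
Total = total + 1   # Total in a comment is not counted
'''
groups = case_groups(source)
# Names that appear with more than one capitalisation are the suspicious ones.
multicase = {k: dict(v) for k, v in groups.items() if len(v) > 1}
print(multicase)   # {'TOTAL': {'total': 2, 'Total': 1}}
```

    Because the tokenizer is doing the splitting, the `Total` inside the comment does not inflate the count.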

    Regards,
    Bengt Richter
     
    Bengt Richter, Jul 26, 2003
    #2

  3. On 26 Jul 2003 01:54:19 GMT, Bengt Richter wrote:
    [The posted code had the npr += 1 line in the wrong place. The increment needs to come after the conditional print:]

    --- prtok.py~   Fri Jul 25 18:52:53 2003
    +++ prtok.py    Fri Jul 25 19:58:24 2003
    @@ -43,6 +43,6 @@
                     print header
                     header = None
                 print '%15s:%-3s'% (key, freq),
    -            npr +=1
                 if npr%4==3: print
    +            npr +=1
             print

    Regards,
    Bengt Richter
     
    Bengt Richter, Jul 26, 2003
    #3
