making a typing speed tester

Discussion in 'Python' started by tavspamnofwd@googlemail.com, Nov 14, 2007.

  1. Guest

    Referred here from the tutor list.

    > I'm trying to write a program to test someone's typing speed and show
    > them their mistakes. However I'm getting weird results when looking
    > for the differences in longer (than 100 chars) strings:
    >
    > import difflib
    >
    > # a tape measure string (just makes it easier to locate a given index)
    > a = ('1-3-5-7-9-12-15-18-21-24-27-30-33-36-39-42-45-48-51-54-57-60-63-66-69'
    >      '-72-75-78-81-84-87-90-93-96-99-103-107-111-115-119-123-127-131-135-139'
    >      '-143-147-151-155-159-163-167-171-175-179-183-187-191-195--200')
    >
    > # now with a few mistakes
    > b = ('1-3-5-7-l-12-15-18-21-24-27-30-33-36-39o42-45-48-51-54-57-60-63-66-69-72-75-78'
    >      '-81-84-8k-90-93-96-9l-103-107-111-115-119-12b-1v7-131-135-139-143-147-'
    >      '151-m55-159-163-167-a71-175j179-183-187-191-195--200')
    >
    > s = difflib.SequenceMatcher(None, a, b)
    > ms = s.get_matching_blocks()
    >
    > print ms
    >
    > >>> [(0, 0, 8), (200, 200, 0)]

    >
    > Have I made a mistake or is this function designed to give up when the
    > input strings get too long? If so what could I use instead to compute
    > the mistakes in a typed text?


    ---------- Forwarded message ----------
    From: Evert Rol

    Hi Tom,

    Ok, I wasn't on the list last year, but I was a few days ago, so
    persistence pays off; partly, as I don't have a full answer.

    I got curious and looked at the source of difflib. There's a method
    __chain_b() which sets up the b2j variable, which contains the
    occurrences of characters in string b. So cutting b to 199
    characters, it looks like this:
    b2j= 19 {'a': [168], 'b': [122], 'm': [152], 'k': [86], 'v':
    [125], '-': [1, 3, 5, 7, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 42,
    45, 48, 51, 54, 57, 60, 63, 66, 69, 72, 75, 78, 81, 84, 87, 90, 93,
    96, 99, 103, 107, 111, 115, 119, 123, 127, 131, 135, 139, 143, 147,
    151, 155, 159, 163, 167, 171, 179, 183, 187, 191, 195, 196], 'l': [8,
    98], 'o': [39], 'j': [175], '1': [0, 10, 13, 16, 20, 50, 80, 100,
    104, 108, 109, 110, 112, 113, 116, 117, 120, 124, 128, 130, 132, 136,
    140, 144, 148, 150, 156, 160, 164, 170, 172, 176, 180, 184, 188, 190,
    192], '0': [29, 59, 89, 101, 105, 198], '3': [2, 28, 31, 32, 34, 37,
    62, 92, 102, 129, 133, 137, 142, 162, 182], '2': [11, 19, 22, 25, 41,
    71, 121, 197], '5': [4, 14, 44, 49, 52, 55, 74, 114, 134, 149, 153,
    154, 157, 174, 194], '4': [23, 40, 43, 46, 53, 83, 141, 145], '7':
    [6, 26, 56, 70, 73, 76, 106, 126, 146, 166, 169, 173, 177, 186], '6':
    [35, 58, 61, 64, 65, 67, 95, 161, 165], '9': [38, 68, 88, 91, 94, 97,
    118, 138, 158, 178, 189, 193], '8': [17, 47, 77, 79, 82, 85, 181,
    185]}

    This little detour is because of how b2j is built. Here's a part from
    the comments of __chain_b():

    # Before the tricks described here, __chain_b was by far the most
    # time-consuming routine in the whole module! If anyone sees
    # Jim Roskind, thank him again for profile.py -- I never would
    # have guessed that.

    And the part of the actual code reads:

        b = self.b
        n = len(b)
        self.b2j = b2j = {}
        populardict = {}
        for i, elt in enumerate(b):
            if elt in b2j:
                indices = b2j[elt]
                if n >= 200 and len(indices) * 100 > n: # <--- !!
                    populardict[elt] = 1
                    del indices[:]
                else:
                    indices.append(i)
            else:
                b2j[elt] = [i]

    So you're right: it has a cutoff at the (somewhat arbitrary) limit of
    200 characters. How exactly that works, I don't know (it needs more
    delving into the code), though it looks like there also needs to be a
    lot of indices (len(indices) * 100 > n); I guess that's caused in your
    strings by the dashes, '1's and '0's (that's why I printed the b2j
    dictionary).
    If you feel safe enough and are on a fast platform, you can probably
    raise that limit (or even put it somewhere as an optional variable in
    the code, which I would think is generally better).
    Not sure who the author of the module is (it isn't listed in the file
    itself), but perhaps you can find out and email him/her, to see what
    can be altered.
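
    To see which characters trip that check, here is a minimal standalone
    sketch that mirrors the quoted loop outside of difflib. The function
    name index_elements and its min_len and factor parameters are
    hypothetical, and the final purge step is an assumption about what
    happens to popular elements afterwards; this is not difflib's own API.

```python
def index_elements(b, min_len=200, factor=100):
    """Map each element of b to its positions, mimicking the quoted loop.

    Elements seen too often in a long sequence are treated as "popular"
    and left out of the index. min_len and factor are hypothetical knobs
    standing in for the hard-coded 200 and 100.
    """
    n = len(b)
    b2j = {}
    popular = set()
    for i, elt in enumerate(b):
        if elt in b2j:
            indices = b2j[elt]
            if n >= min_len and len(indices) * factor > n:
                # Seen again while already holding more than n/factor positions.
                popular.add(elt)
                del indices[:]
            else:
                indices.append(i)
        else:
            b2j[elt] = [i]
    # Assumption: popular elements are dropped from the index entirely.
    for elt in popular:
        del b2j[elt]
    return b2j, popular


# The mistyped string from the original question (200 characters).
b = ('1-3-5-7-l-12-15-18-21-24-27-30-33-36-39o42-45-48-51-54-57-60-63-66-69-72-75-78'
     '-81-84-8k-90-93-96-9l-103-107-111-115-119-12b-1v7-131-135-139-143-147-'
     '151-m55-159-163-167-a71-175j179-183-187-191-195--200')

kept, popular = index_elements(b)
print(sorted(popular))   # characters treated as too common to index
print(sorted(kept))      # characters that remain searchable

short_kept, short_popular = index_elements(b[:190])
print(sorted(short_popular))
```

    With the 200-character b this should flag '-' and every digit as
    popular, leaving only the mistyped letters indexed, which would
    explain why almost nothing matches. The 190-character slice stays
    below the cutoff, so nothing is dropped.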

    Hope that helps.

    Evert
    , Nov 14, 2007
    #1

  2. Guest

    I would use the time module to "time" the user. Then you should be
    able to compare the original string with the string the user typed
    using cmp.

    <code>
    # untested
    import time

    start = time.time()
    print 'some complicated long string'

    # you should use a GUI toolkit's textbox rather than
    # using a variable
    user_string = raw_input('Please type the string above as quickly and '
                            'accurately as you can:\n\n')
    end = time.time()
    print 'amount of time to complete: %s seconds' % (end-start)

    # do the comparison here
    # which I am not sure how to do right now
    </code>

    See the following for ideas on comparing similar strings/iterables:

    http://www.velocityreviews.com/forums/t345107-comparing-2-similar-strings.html
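
    One possible way to fill in the comparison step is sketched below.
    It is written for Python 3.2 or later (it relies on the autojunk
    argument of difflib.SequenceMatcher, which did not exist when this
    thread was written). The function name typing_report and the
    convention of counting five typed characters as one word are
    assumptions made for illustration. Note that cmp would only say
    whether the strings are equal or how they sort, not where they
    differ, so the sketch uses get_opcodes() instead.

```python
import difflib


def typing_report(target, typed, seconds):
    """Return (words_per_minute, mistakes) for one typing attempt.

    mistakes is a list of (tag, position_in_target, expected, got)
    tuples, one per non-matching region reported by SequenceMatcher.
    """
    if seconds <= 0:
        raise ValueError('seconds must be positive')
    # Assumption: one "word" is five typed characters.
    words_per_minute = (len(typed) / 5.0) * (60.0 / seconds)
    # autojunk=False switches off the "popular element" heuristic.
    matcher = difflib.SequenceMatcher(None, target, typed, autojunk=False)
    mistakes = [
        (tag, i1, target[i1:i2], typed[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != 'equal'
    ]
    return words_per_minute, mistakes


wpm, mistakes = typing_report('hello world', 'hellp world', 6.0)
print('speed: %.1f wpm' % wpm)
for tag, pos, expected, got in mistakes:
    print('%s at %d: expected %r, got %r' % (tag, pos, expected, got))
```

    In the timing snippet above, seconds would be end - start and typed
    would be user_string.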

    Mike
    , Nov 14, 2007
    #2

  3. On Wed, 14 Nov 2007 14:56:25 -0300, <> wrote:

    >> Have I made a mistake or is this function designed to give up when the
    >> input strings get too long? If so what could I use instead to compute
    >> the mistakes in a typed text?


    Yes, there are some limitations on how SequenceMatcher works.

    > ---------- Forwarded message ----------
    > From: Evert Rol
    > [...]
    > And the part of the actual code reads:


    > if n >= 200 and len(indices) * 100 > n: # <--- !!
    >     populardict[elt] = 1
    >     del indices[:]
    > else:
    >     indices.append(i)


    > So you're right: it has a stop at the (somewhat arbitrary) limit of
    > 200 characters. [...] If you feel safe enough and on a fast platform,
    > you can probably up that limit (or even put it somewhere as an
    > optional variable in the code, which I would think is generally
    > better).


    If you try with a slightly shorter text (190 chars, for example) you get
    the expected result, pretty fast:

    py> s = difflib.SequenceMatcher(None, a[:190], b[:190])
    py> ms = s.get_matching_blocks()
    py> print ms
    [(0, 0, 8), (9, 9, 30), (40, 40, 46), (87, 87, 11), (99, 99, 23),
     (123, 123, 2), (126, 126, 26), (153, 153, 15), (169, 169, 6),
     (176, 176, 14), (190, 190, 0)]

    So it appears that your strings are hitting that (arbitrary) limit. From
    the algorithm's point of view, your strings are a rather degenerate case:
    so many '-', '0' and '1' characters to match.
    Try increasing that 200 to something somewhat larger than your strings.
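
    For readers on a newer Python: from version 3.2 on, SequenceMatcher
    takes an autojunk argument, so the heuristic can be switched off
    without editing difflib itself. That option was not available when
    this thread was written. The sketch below compares the default
    behaviour with autojunk=False on the strings from the question; the
    helper name matched_chars is hypothetical.

```python
import difflib

# The two 200-character strings from the original question.
a = ('1-3-5-7-9-12-15-18-21-24-27-30-33-36-39-42-45-48-51-54-57-60-63-66-69'
     '-72-75-78-81-84-87-90-93-96-99-103-107-111-115-119-123-127-131-135-139'
     '-143-147-151-155-159-163-167-171-175-179-183-187-191-195--200')
b = ('1-3-5-7-l-12-15-18-21-24-27-30-33-36-39o42-45-48-51-54-57-60-63-66-69-72-75-78'
     '-81-84-8k-90-93-96-9l-103-107-111-115-119-12b-1v7-131-135-139-143-147-'
     '151-m55-159-163-167-a71-175j179-183-187-191-195--200')


def matched_chars(blocks):
    """Total number of characters covered by a list of matching blocks."""
    return sum(size for _, _, size in blocks)


default_blocks = difflib.SequenceMatcher(None, a, b).get_matching_blocks()

no_autojunk = difflib.SequenceMatcher(None, a, b, autojunk=False)
full_blocks = no_autojunk.get_matching_blocks()

print('matched with default settings:', matched_chars(default_blocks))
print('matched with autojunk=False: ', matched_chars(full_blocks))

# List every region where the typed text departs from the original.
for tag, i1, i2, j1, j2 in no_autojunk.get_opcodes():
    if tag != 'equal':
        print(tag, i1, repr(a[i1:i2]), '->', repr(b[j1:j2]))
```

    On older versions without autojunk, comparing the text in pieces
    shorter than 200 characters, as in the a[:190] example above, avoids
    the cutoff as well.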

    --
    Gabriel Genellina
    Gabriel Genellina, Nov 15, 2007
    #3