Don't understand SequenceMatcher from difflib

Discussion in 'Python' started by Antoon Pardon, Jun 21, 2011.

  1. I have the following code I wrote.

    ==============================================

    from difflib import SequenceMatcher

    import sys
    write = sys.stdout.write
    warn = sys.stderr.write

    def program(argv):
        ls1 = open(argv[1]).readlines()
        ls2 = open(argv[2]).readlines()
        matcher = SequenceMatcher(ls1, ls2)
        s1 = 0
        s2 = 0
        print ls1
        print ls2
        warn("*** %d %d \n" % (len(ls1), len(ls2)))
        for e1, e2, lg in matcher.get_matching_blocks():
            warn("*** %d %d %d\n" % (e1, e2, lg))
            # lines only in the first file
            for i in xrange(s1, e1):
                write('- ')
                write(ls1[i])
            # lines only in the second file
            for i in xrange(s2, e2):
                write('+ ')
                write(ls2[i])
            # lines common to both files
            for i in xrange(e1, e1+lg):
                write(' ')
                write(ls1[i])
            s1, s2 = e1 + lg, e2 + lg

    if __name__ == '__main__':
        program(sys.argv)

    ===============================================

    Now when I run it I get the following result:

    python diff.py map.0 map.1
    ['\n', 'begin\n', ' a1\n', ' a2\n', ' a3\n', ' a4\n', ' a5\n', 'end\n', '\n', 'begin\n', ' c1\n', ' c2\n', ' c3\n', ' c4\n', ' c5\n', ' c6\n', ' c7\n', 'end\n', '\n', 'begin\n', ' e1\n', ' e2\n', ' e3\n', ' e4\n', ' e5\n', ' e6\n', ' e7\n', ' e8\n', ' e9\n', 'end\n']
    ['\n', 'begin\n', ' a1\n', ' a2\n', ' a3\n', ' a4\n', ' a5\n', 'end\n', '\n', 'begin\n', ' c1\n', ' c2\n', ' c3\n', ' c4\n', ' c5\n', ' c6\n', ' c7\n', 'end\n', '\n', 'begin\n', ' d1\n', ' d2\n', ' d3\n', 'end\n', '\n', 'begin\n', ' e1\n', ' e2\n', ' e3\n', ' e4\n', ' e5\n', ' e6\n', ' e7\n', ' e8\n', ' e9\n', 'end\n']
    *** 30 36
    *** 36 0 0
    -
    - begin
    - a1
    - a2
    - a3
    - a4
    ....
    - Traceback (most recent call last):
      File "diff.py", line 31, in <module>
        program(sys.argv)
      File "diff.py", line 21, in program
        write(ls1[i])
    IndexError: list index out of range

    What I don't understand is: The first list is 30 items long and the second 36.
    But the first match I get after calling get_matching_blocks says the match starts
    at item 36 of the first list.

    Yes, I noticed it is the special last match with size 0, but even if that were
    correct because there are no common items, the first number of the match
    shouldn't be greater than the length of the first list.

    What am I doing wrong?
    Antoon Pardon, Jun 21, 2011
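
    A likely explanation for the "36 0 0" result, offered as a sketch rather than a
    definitive diagnosis: the constructor signature is
    SequenceMatcher(isjunk=None, a='', b='', autojunk=True), so with two positional
    arguments the first list lands in isjunk and the second list becomes a, while b
    stays empty. The only block returned is then the terminating dummy
    (len(a), len(b), 0), where a is really the 36-item second list. The short lists
    below are hypothetical stand-ins for map.0 and map.1, and the example is
    written to run under Python 3.

```python
from difflib import SequenceMatcher

# Small stand-ins for the two files in the post (hypothetical data).
ls1 = ['begin\n', ' a1\n', 'end\n']
ls2 = ['begin\n', ' a1\n', ' d1\n', 'end\n']

# Two positional arguments: ls1 is taken as isjunk, ls2 as a, and b stays ''.
# Only the terminating dummy block (len(a), len(b), 0) comes back.
wrong = SequenceMatcher(ls1, ls2)
wrong_blocks = [tuple(m) for m in wrong.get_matching_blocks()]
print(wrong_blocks)   # [(4, 0, 0)]

# Passing None for isjunk puts the two lists in the a and b slots.
right = SequenceMatcher(None, ls1, ls2)
right_blocks = [tuple(m) for m in right.get_matching_blocks()]
print(right_blocks)   # [(0, 0, 2), (2, 3, 1), (3, 4, 0)]
```

    With the None placeholder in place, the first number of every block stays within
    len(ls1), which is what the loop over get_matching_blocks() expects.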