How to check for single character change in a string?

tinnews

Can anyone suggest a simple/easy way to count how many characters have
changed in a string?

E.g. giving results as follows:-

   abcdefg     abcdefh         1
   abcdefg     abcdekk         2
   abcdefg     gfedcba         6


Note that position is significant, a character in a different position
should not count as a match.

Is there any simpler/neater way than just a for loop running through
both strings and counting non-matching characters?
 
Ian Kelly

tinnews said:
Can anyone suggest a simple/easy way to count how many characters have
changed in a string?

E.g. giving results as follows:-

   abcdefg     abcdefh         1
   abcdefg     abcdekk         2
   abcdefg     gfedcba         6


Note that position is significant, a character in a different position
should not count as a match.

Is there any simpler/neater way than just a for loop running through
both strings and counting non-matching characters?

No, but the loop approach is pretty simple:

sum(a == b for a, b in zip(str1, str2))
 
Roy Smith

tinnews said:
Can anyone suggest a simple/easy way to count how many characters have
changed in a string?

Depending on exactly how you define "changed", you're probably talking
about either Hamming distance or Levenshtein distance. I would start
with the Wikipedia articles on both those topics and explore from there.

There are Python packages for computing many of these metrics. For
example, http://pypi.python.org/pypi/python-Levenshtein/

tinnews said:
Is there any simpler/neater way than just a for loop running through
both strings and counting non-matching characters?

If you don't care about insertions and deletions (and it sounds like you
don't), then this is the way to do it. It's O(n), and you're not going
to get any better than that. It's a one-liner in Python:

>>> s1 = 'abcdefg'
>>> s2 = 'abcdekk'
>>> len([x for x in zip(s1, s2) if x[0] != x[1]])
2

But go read the Wikipedia articles. Computing distance between
sequences is an interesting, important, and well-studied topic. It's
worth exploring a bit.
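
For reference, the position-by-position count described above can be wrapped in a small function. This is only a sketch: the name `hamming` is made up for illustration, and counting the extra characters of a longer string as differences is an assumption (strictly, Hamming distance is defined for equal-length strings, and plain `zip` would silently ignore the extra tail).

```python
from itertools import zip_longest


def hamming(s1, s2):
    """Count the positions at which s1 and s2 differ.

    Assumption: when the lengths differ, each extra character in the
    longer string counts as one difference.
    """
    # zip_longest pads the shorter string with None, which never
    # compares equal to a character, so padded positions count as changes.
    return sum(a != b for a, b in zip_longest(s1, s2))


print(hamming("abcdefg", "gfedcba"))  # 6
```

If insertions and deletions should instead be treated as single edits, that is Levenshtein distance, which needs a different algorithm.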
 
Arnaud Delobelle

Roy Smith said:
len([x for x in zip(s1, s2) if x[0] != x[1]])

Heh, Ian Kelly's version:
sum(a == b for a, b in zip(str1, str2))

is cleaner than mine. Except that Ian's counts matches and the OP asked
for non-matches, but that's an exercise for the reader :)

Here's a variation on the same theme:

sum(map(str.__ne__, str1, str2))
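
A closely related sketch uses `operator.ne` in place of `str.__ne__`, so the same expression also works on lists or tuples rather than only on strings. The function name `count_mismatches` is hypothetical, chosen just for this example.

```python
from operator import ne


def count_mismatches(seq1, seq2):
    """Count positions where the two sequences hold unequal items."""
    # map() with two iterables stops at the shorter one, just like zip(),
    # so any extra tail on the longer sequence is ignored.
    return sum(map(ne, seq1, seq2))
```

Each comparison yields True or False, and `sum` adds them up as 1 and 0, which is why no explicit counter is needed.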
 
tinnews

Roy Smith said:
len([x for x in zip(s1, s2) if x[0] != x[1]])

Heh, Ian Kelly's version:
sum(a == b for a, b in zip(str1, str2))

is cleaner than mine. Except that Ian's counts matches and the OP asked
for non-matches, but that's an exercise for the reader :)

:)

I'm actually walking through a directory tree and checking that file
characteristics don't change in a sequence of files.

What I'm looking for is 'unusual' changes in file characteristics
(they're image files with camera information and such in them) in a
sequential list of files.

Thus if file001, file002, file003, file004 have the same camera type
I'm happy, but if file003 appears to have been taken with a different
camera something is probably amiss. I realise there will be *two*
character changes when going from file009 to file010 but I can cope
with that. I can't just extract the sequence number because in some
cases they have non-numeric names, etc.
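
A rough sketch of that check might look like the following. Everything here is an assumption for illustration: the records are taken to be `(filename, camera)` pairs already extracted from the image files, the names `name_diff` and `unusual_changes` are made up, and the threshold of two changed characters comes from the file009 to file010 case mentioned above.

```python
def name_diff(a, b):
    """Count differing positions; a length difference counts as changes too."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def unusual_changes(records, max_name_diff=2):
    """Return filenames whose camera differs from the previous file's camera.

    records: list of (filename, camera) pairs in directory order (assumed).
    Two files are treated as neighbours in a sequence only when their
    names differ in at most max_name_diff positions.
    """
    flagged = []
    # Pair each record with its successor.
    for (prev_name, prev_cam), (name, cam) in zip(records, records[1:]):
        in_sequence = name_diff(prev_name, name) <= max_name_diff
        if in_sequence and cam != prev_cam:
            flagged.append(name)
    return flagged
```

Note that a single odd file gets reported twice in effect: once when the camera changes, and again on the following file when it changes back.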
 
