String functions: what's the difference?

Harro de Jong

(absolute beginner here, sorry if this seems basic)

Section 7.10 of 'How to Think Like a Computer Scientist' contains this
discussion of string.find and other string functions:

(quote)
We can use these constants and find to classify characters. For example, if
find(lowercase, ch) returns a value other than -1, then ch must be lowercase:

def isLower(ch):
    return string.find(string.lowercase, ch) != -1

Alternatively, we can take advantage of the in operator, which determines
whether a character appears in a string:
def isLower(ch):
    return ch in string.lowercase

As yet another alternative, we can use the comparison operator:
def isLower(ch):
    return 'a' <= ch <= 'z'
If ch is between a and z, it must be a lowercase letter.

As an exercise, discuss which version of isLower you think will be
fastest. Can you think of other reasons besides speed to prefer one
or the other?

(end quote)

I've tried all three, but the function is so small (test a single letter) I
can't measure the difference. I'm using time.time() to see how long it takes to
execute the function.
I could use a loop to increase execution time, but then I might be measuring
mostly overhead.

I'd expect the third option to be the fastest (involves looking up 3 values,
where the others have to iterate through a-z), but am I right?
And reasons to prefer one? a-z doesn't contain all lowercase letters (it omits
accents and symbols), but other than that?
 
Martin v. Löwis

Harro said:
I've tried all three, but the function is so small (test a single letter) I
can't measure the difference. I'm using time.time() to see how long it takes to
execute the function.
I could use a loop to increase execution time, but then I might be measuring
mostly overhead.

Still, this is what you should do. Try the timeit.py module; it does the
loop for you. Surprisingly, one of the faster ways to do a loop is

nones = [None]*10000000

<start timer>
for x in nones:
    <action>
<stop timer>

This is fast because no Python integers are created to implement the
loop.
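
A minimal runnable sketch of that loop pattern, for anyone who wants to try it by hand. The helper name time_action, its repeat parameter, and the small is_lower test function are made up for illustration; time.time() is used only because it is the clock already mentioned in this thread.

```python
import time

def time_action(action, repeat=100000):
    """Call `action` `repeat` times, driving the loop with a pre-built list."""
    nones = [None] * repeat        # built before the timer starts
    start = time.time()
    for _ in nones:                # iterating yields the same None object each time
        action()
    return time.time() - start

def is_lower(ch):
    # Third version from the book: plain comparison.
    return 'a' <= ch <= 'z'

elapsed = time_action(lambda: is_lower('x'))
print("total: %.4f s, per call: %.3g s" % (elapsed, elapsed / 100000))
```

Note that the lambda wrapper adds a little call overhead of its own, so the per-call figure is an upper bound rather than an exact cost.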

Harro said:
I'd expect the third option to be the fastest (involves looking up 3 values,
where the others have to iterate through a-z), but am I right?

Just measure it for yourself. I just did, and the third option indeed
came out fastest, with the "in" operator only slightly slower.

Harro said:
And reasons to prefer one?

For what purpose? To find out whether a letter is lower-case?

Just use the .islower() method on the character for that.
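
A short sketch of how the method compares with the a-z range test. It assumes Unicode strings (the u'' prefix), since on plain byte strings in older Pythons the result for accented letters can depend on the locale; the function name is_lower_range is invented for this example.

```python
def is_lower_range(ch):
    # Range test from the book: only covers the 26 ASCII letters.
    return 'a' <= ch <= 'z'

# u'\xe9' is the letter e with an acute accent.
for ch in [u'x', u'X', u'7', u'\xe9']:
    print("%r  range: %s  islower: %s" % (ch, is_lower_range(ch), ch.islower()))
```

For the accented letter the range test answers False while .islower() answers True, which is the limitation Harro pointed out in his first post.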

Regards,
Martin
 
gry

First, don't apologize for asking questions. You read, you thought,
and you tested. That's more than many people on this list do. Bravo!

One suggestion: when asking questions here it's a good idea to always
briefly mention which version of python and what platform (linux,
windows, etc) you're using. It helps us answer your questions more
effectively.

For testing performance the "timeit" module is great. Try something
like:
python -mtimeit -s 'import string;from myfile import isLower' "isLower('x')"
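
The same measurement can also be done from inside a script. The sketch below is one possible way, not the only one: the three function names are made up, string.ascii_lowercase and the str.find method stand in for the book's string.lowercase and string.find (which exist only in Python 2), and passing a callable to timeit.Timer assumes Python 2.6 or later.

```python
import string
import timeit

LOWER = string.ascii_lowercase   # stand-in for string.lowercase

def is_lower_find(ch):
    return LOWER.find(ch) != -1

def is_lower_in(ch):
    return ch in LOWER

def is_lower_cmp(ch):
    return 'a' <= ch <= 'z'

for func in (is_lower_find, is_lower_in, is_lower_cmp):
    # f=func binds the current function so each Timer calls the right one.
    timer = timeit.Timer(lambda f=func: f('x'))
    best = min(timer.repeat(repeat=3, number=100000))
    print("%-14s %.4f s per 100000 calls" % (func.__name__, best))
```

Taking the minimum of several repeats is the usual way to reduce noise from other processes; the absolute numbers will differ from machine to machine.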

You didn't mention the test data, i.e. the character you're feeding to
isLower.
It might make a difference if the character is near the beginning or
end of the range.
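
To check whether the input character matters, one could time the same expression with a few different characters. This is only a sketch: the choice of 'a', 'z' and 'A' as test inputs is arbitrary, string.ascii_lowercase replaces the book's string.lowercase, and the module-level timeit.repeat function assumes Python 2.6 or later.

```python
import timeit

stmt_template = "string.ascii_lowercase.find(%r) != -1"
results = {}
for ch in ('a', 'z', 'A'):       # first letter, last letter, and a miss
    stmt = stmt_template % ch
    timings = timeit.repeat(stmt=stmt, setup="import string",
                            repeat=3, number=200000)
    results[ch] = min(timings)
    print("%r: %.4f s per 200000 runs" % (ch, results[ch]))
```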

As to reasons to prefer one or another implementation, one *very*
important question is "which one is clearer?". It may sound like a
minor thing, but when I'm accosted first thing in the
morning (pre-coffee) about a nasty urgent bug and sit down to pore over
code and face "string.find(string.lowercase, ch) != -1", I'm not happy.

Have fun with python!
-- George Young
 
Harro de Jong

gry said:
One suggestion: when asking questions here it's a good idea to always
briefly mention which version of python and what platform (linux,
windows, etc) you're using.

Of course, forgot about that. It's Python 2.4.2 for Windows.

gry said:
For testing performance the "timeit" module is great. Try something
like:
python -mtimeit -s 'import string;from myfile import isLower' "isLower('x')"

Thanks for the pointer. I was using time.time(), which I now see isn't
very accurate on Windows.

gry said:
You didn't mention the test data, i.e. the character you're feeding to
isLower.
It might make a difference if the character is near the beginning or
end of the range.

(slaps forehead) I used "A" as the input, of course that would make a
difference.

gry said:
....
Have fun with python!

If the few days I've spent on it so far are any indication, I will. This
is my first foray into programming since college; I like 'How to Think
Like a Computer Scientist' much better than my impenetrable C textbook.
 
Magnus Lycka

Harro said:
Thanks for the pointer. I was using time.time(), which I now see isn't
very accurate on Windows.

time.clock() is more accurate on Windows (and much less so on
Linux, where it also measures something completely different.)
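
One way to see how coarse a clock is on a given machine is to read it repeatedly and record the smallest step it ever reports. The sketch below does that for time.time and for timeit.default_timer, which picks a platform-appropriate clock (historically time.clock on Windows and time.time elsewhere, time.perf_counter on Python 3.3 and later). The helper name smallest_tick and its samples parameter are made up for this example.

```python
import time
import timeit

def smallest_tick(clock, samples=20):
    """Smallest positive step seen between successive readings of `clock`."""
    best = None
    for _ in range(samples):
        t0 = clock()
        t1 = clock()
        while t1 == t0:            # spin until the clock visibly changes
            t1 = clock()
        step = t1 - t0
        if step > 0 and (best is None or step < best):
            best = step
    return best

print("time.time:            %.9f s" % smallest_tick(time.time))
print("timeit.default_timer: %.9f s" % smallest_tick(timeit.default_timer))
```

A large smallest step for a clock means that very short measurements taken with it will mostly read as zero, which matches what Harro saw with time.time() on Windows.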
 
