Rookie Speaks


William S. Perrin

I'm a Python rookie. Anyone have any suggestions to streamline this
function? Thanks in advance.....


import urllib
from xml.dom import minidom

def getdata(myurl):
    sock = urllib.urlopen(myurl)
    xmlSrc = sock.read()
    sock.close()

    xmldoc = minidom.parseString(xmlSrc)

    def getattrs(weatherAttribute):
        a = xmldoc.getElementsByTagName(weatherAttribute)
        return a[0].firstChild.data

    currname = getattrs("name")
    currtemp = getattrs("fahrenheit")
    currwind = getattrs("wind")
    currdew = getattrs("dewpoint")
    currhumid = getattrs("relative_humidity")
    currbarom = getattrs("barometric_pressure")
    currcondi = getattrs("conditions")

    print "%13s\t%s\t%s\t%s\t%s\t%s\t%s" % (currname, currtemp,
        currwind, currbarom, currdew, currhumid, currcondi)
 

Samuel Walters

|Thus Spake William S. Perrin On the now historical date of Wed, 07 Jan
2004 17:37:57 -0600|
I'm a Python rookie. Anyone have any suggestions to streamline this
function? Thanks in advance.....

Please define "streamline" in this context.

Do you mean:
faster
smaller
easier to read
etc.

Sam Walters.
 

William S. Perrin

Sorry, I guess I mean: is it efficient? That is, if I called it 1000 times.....
 

Samuel Walters

|Thus Spake William S. Perrin On the now historical date of Wed, 07 Jan
2004 17:45:38 -0600|
Sorry, I guess I mean: is it efficient? That is, if I called it 1000 times.....

Ponders... Even "efficient" has a loose meaning here. Since you describe
running it a thousand times, I'll assume you mean speed of execution.

There's a saying: "Premature optimization is the root of all evil," which
is another way of saying "Try it, and if it's too slow, figure out what
the holdup is. If it's not too slow, don't mess with it." So, try
running it in the context you need it in. Nothing about your code screams
"bad implementation." In fact, it's quite clearly written. Still, you
won't know if it's too slow until you try it.

There's a way to get a definitive answer on how fast it's running: the
profile module in Python. Do some research on that module. If you're
confused about it, come back and ask more questions then.

Take into consideration that it may not be your code that's slow, but
rather the way you're getting your information. This is called being "I/O
bound." The holdup might not be the program, but instead the disk or the
network. After all, you can't process information until you have the
information. The profile module will help you to see if this is the
problem.
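
For instance, here is a minimal profiling sketch (assumptions: getdata is
defined in the same script, and the URL is purely hypothetical):

import profile
import pstats

# Profile 1000 calls; profile.run exec's the statement in __main__,
# so getdata must be defined in this script.
profile.run("for i in xrange(1000): getdata('http://example.com/wx.xml')",
            "getdata.prof")

# If most of the cumulative time sits in urlopen/read rather than in
# parseString, you are I/O bound, not CPU bound.
pstats.Stats("getdata.prof").sort_stats("cumulative").print_stats(10)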

One of the slicker solutions to slow code is the psyco module. It can
give an amazing speed boost to many processing-intensive functions, but it
can sometimes even slow down your program.
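
Enabling it is only a line or two (a sketch, assuming the third-party
psyco module is installed):

import psyco

psyco.bind(getdata)   # specialize just this one function
#psyco.full()         # or, the blunt instrument: compile everything it can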

If you'd like to see an example of both the psyco and profile modules in
action, let me know and I'll give you some more understandable code that I
once wrote to see what types of things psyco is good at optimizing.

HTH

Sam Walters.
 

sdd

William said:
I'm a Python rookie. Anyone have any suggestions to streamline this
function? Thanks in advance.....

...
    currname = getattrs("name")
    currtemp = getattrs("fahrenheit")
    currwind = getattrs("wind")
    currdew = getattrs("dewpoint")
    currhumid = getattrs("relative_humidity")
    currbarom = getattrs("barometric_pressure")
    currcondi = getattrs("conditions")

    print "%13s\t%s\t%s\t%s\t%s\t%s\t%s" % (currname, currtemp, currwind,
        currbarom, currdew, currhumid, currcondi)

How about:

    name, temp, wind, dew, humid, barom, condi = map(getattrs,
        "name fahrenheit wind dewpoint relative_humidity "
        "barometric_pressure conditions".split())

    print "%13s\t%s\t%s\t%s\t%s\t%s\t%s" % (name, temp, wind,
        barom, dew, humid, condi)


-Scott David Daniels
 

Lonnie Princehouse

Anytime you find yourself repeating the same pattern of
code (i.e. the getattrs bit), there's usually a more elegant
way of doing it.

def getdata(myurl):
    sock = urllib.urlopen(myurl)
    xmlSrc = sock.read()
    sock.close()

    xmldoc = minidom.parseString(xmlSrc)

    def getattrs(weatherAttribute):
        a = xmldoc.getElementsByTagName(weatherAttribute)
        return a[0].firstChild.data

    attributes = ['name', 'fahrenheit', 'wind',
                  'dewpoint', 'relative_humidity',
                  'barometric_pressure', 'conditions']

    current = {}

    for a in attributes:
        current[a] = getattrs(a)

    format_str = "%13s" + "\t%s"*(len(attributes)-1)
    print format_str % tuple([current[a] for a in attributes])


OR, if all you want is to print your numbers, skip the dictionary:

attributes = ['name', 'fahrenheit', 'wind',
              'dewpoint', 'relative_humidity',
              'barometric_pressure', 'conditions']

format_str = "%13s" + "\t%s"*(len(attributes)-1)
print format_str % tuple([getattrs(a) for a in attributes])
 

Peter Otten

William S. Perrin wrote:

I think your function has a sane design :) XML is slow by design, but in
your case it doesn't really matter, because it is probably I/O-bound, as
already pointed out by Samuel Walters.

Below is a slightly different approach that uses a class:

import urllib
from xml.dom import minidom

class Weather(object):
    def __init__(self, url=None, xml=None):
        """ Will accept either a URL or an XML string,
            preferably as a keyword argument """
        if url:
            if xml:
                # not sure what would be the right exception here
                # (ValueError?), so keep it generic for now
                raise Exception("Must provide either url or xml, not both")
            sock = urllib.urlopen(url)
            try:
                xml = sock.read()
            finally:
                sock.close()
        elif xml is None:
            raise Exception("Must provide either url or xml")
        self._dom = minidom.parseString(xml)

    def getAttrFromDom(self, weatherAttribute):
        a = self._dom.getElementsByTagName(weatherAttribute)
        return a[0].firstChild.data

    def asRow(self):
        # this will defeat lazy attribute lookup
        return "%13s\t%s\t%s\t%s\t%s\t%s\t%s" % (self.name,
            self.fahrenheit, self.wind, self.barometric_pressure,
            self.dewpoint, self.relative_humidity, self.conditions)

    def __getattr__(self, name):
        try:
            value = self.getAttrFromDom(name)
        except IndexError:
            raise AttributeError(
                "'%.50s' object has no attribute '%.400s'" %
                (self.__class__, name))
        # now set the attribute so it need not be looked up
        # in the dom next time
        setattr(self, name, value)
        return value

This has a slight advantage if you are interested only in a subset of the
attributes, say the temperature:

for url in listOfUrls:
print Weather(url).fahrenheit

Here getAttrFromDom() - the equivalent of your getattrs() - is only called
once per URL. The possibility of printing a tab-delimited row is still
there,
print Weather(url).asRow()

but will of course defeat this optimization scheme.
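
To make the caching visible (a small sketch; weatherXml stands for any XML
string that contains the expected tags):

w = Weather(xml=weatherXml)
print w.fahrenheit  # first access: __getattr__ walks the DOM, then caches
print w.fahrenheit  # second access: plain instance attribute, no DOM lookup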

Peter
 

Jacek Generowicz

Samuel Walters said:
If you'd like to see an example of both the psyco and profile modules in
action, let me know and I'll give you some more understandable code that I
once wrote to see what types of things psyco is good at optimizing.

I think this is generally interesting, and would be curious to see it,
if you'd care to share.
 

Samuel Walters

|Thus Spake Jacek Generowicz On the now historical date of Thu, 08 Jan
2004 11:43:01 +0100|
I think this is generally interesting, and would be curious to see it,
if you'd care to share.

Sure thing. The functions at the top are naive prime enumeration
algorithms. I chose them because they're each of a general "looping"
nature and I understand the complexity and methods of each of them. Some
use list methods (and hence are linearly indexed) and some use dictionary
methods (and hence are hash-bound). One of them, sieve_list, is commented
out because it has such dog performance that I decided I wasn't interested
in how well it optimized.

These tests are by no means complete, nor is this probably a good example
of profiling or the manner in which psyco is useful. It's just from an
area where I understood the algorithmic bottlenecks to begin with.

Sam Walters.

--
Never forget the halloween documents.
http://www.opensource.org/halloween/
""" Where will Microsoft try to drag you today?
Do you really want to go there?"""

from math import sqrt

def primes_list(Limits = 1, KnownPrimes = [ 2 ]):
    RetList = KnownPrimes
    for y in xrange(2, Limits + 1):
        w = y
        p, r = 0, 0
        for x in RetList:
            if x*x > w:
                RetList.append(w)
                break
            p, r = divmod(y, x)
            if r == 0:
                w = p
    return RetList

def primes_dict(Limits = 1, KnownPrimes = [ 2 ]):
    RetList = KnownPrimes
    RetDict = {}
    for x in KnownPrimes:
        RetDict[x] = 1
        w = x + x
        n = 2
        while w <= Limits + 1:
            RetDict[w] = n
            w += x
            n += 1
    p, r = 0, 0
    for y in xrange(2, Limits + 1):
        for x, z in RetDict.iteritems():
            if x*x > y:
                RetDict[y] = 1
                break
            p, r = divmod(y, x)
            if r == 0:
                RetDict[y] = p
                break
    return RetList

def sieve_list(Limits = 1, KnownPrimes = [ 2 ]):
    RetList = KnownPrimes
    CompList = [ ]
    for y in xrange(2, Limits + 1):
        if y not in CompList:
            w = y
            n = 1
            while w <= Limits:
                CompList.append(w)
                w += y
                n += 1
    return RetList

def sieve_list_2(Limits = 1, KnownPrimes = [ 2 ]):
    SieveList = [ 1 ]*(Limits)
    RetList = [ ]
    for y in xrange(2, Limits + 1):
        if SieveList[y-2] == 1:
            RetList.append(y)
            w = y + y
            n = 2
            while w <= Limits + 1:
                SieveList[w - 2] = n
                w += y
                n += 1
    return RetList

def sieve_dict(Limits = 1, KnownPrimes = [ 2 ]):
    SieveDict = { }
    RetList = KnownPrimes
    for x in KnownPrimes:
        SieveDict[x] = 1
        w = x + x
        n = 2
        while w <= Limits + 1:
            SieveDict[w] = n
            n += 1
            w += x

    for y in xrange(2, Limits + 1):
        if not SieveDict.has_key(y):
            RetList.append(y)
            w = y
            n = 1
            while w <= Limits + 1:
                SieveDict[w] = n
                w += y
                n += 1
    return RetList

if __name__ == "__main__":
    import sys
    import profile
    import pstats

    import psyco

    #this function wraps up all the calls that we wish to benchmark.
    def multipass(number, args):
        for x in xrange(1, number + 1):
            primes_list(args, [ 2 ])
            print ".",
            sys.stdout.flush()
            primes_dict(args, [ 2 ])
            print ".",
            sys.stdout.flush()
            #Do not uncomment this line unless you have a *very* long time to wait.
            #sieve_list(args)
            sieve_dict(args, [ 2 ])
            print ".",
            sys.stdout.flush()
            sieve_list_2(args, [ 2 ])
            print "\r \r%i/%i" % (x, number),
            sys.stdout.flush()
        print "\n"

    #number of times through the test
    passes = 5
    #find all primes up to maximum
    maximum = 1000000

    #create a profiling instance
    #adjust the argument based on your system.
    pr = profile.Profile(bias = 7.5e-06)

    #run the tests
    pr.run("multipass(%i, %i)" % (passes, maximum))
    #save them to a file.
    pr.dump_stats("primesprof")

    #remove the profiling instance so that we can get a clean comparison.
    del pr

    #create a profiling instance
    #adjust the argument based on your system.
    pr = profile.Profile(bias = 7.5e-06)

    #"recompile" each of the functions under consideration.
    psyco.bind(primes_list)
    psyco.bind(primes_dict)
    psyco.bind(sieve_list)
    psyco.bind(sieve_list_2)
    psyco.bind(sieve_dict)

    #run the tests
    pr.run("multipass(%i, %i)" % (passes, maximum))
    #save them to a file
    pr.dump_stats("psycoprimesprof")

    #clean up our mess
    del pr

    #load and display each of the run-statistics.
    pstats.Stats('primesprof').strip_dirs().sort_stats('cum').print_stats()
    pstats.Stats('psycoprimesprof').strip_dirs().sort_stats('cum').print_stats()
 

Tim Churches

|Thus Spake Jacek Generowicz On the now historical date of Thu, 08 Jan
2004 11:43:01 +0100|


Sure thing. The functions at the top are naive prime enumeration
algorithms. I chose them because they're each of a general "looping"
nature and I understand the complexity and methods of each of them. Some
use list methods (and hence are linearly indexed) and some use dictionary
methods (and hence are hash-bound). One of them, sieve_list, is commented
out because it has such dog performance that I decided I wasn't interested
in how well it optimized.

Out of curiosity I ran your code, and obtained these results:

Fri Jan 9 08:30:25 2004 primesprof

23 function calls in 2122.530 CPU seconds

....

Fri Jan 9 08:43:24 2004 psycoprimesprof

23 function calls in -3537.828 CPU seconds

Does that mean that Armin Rigo has slipped some form of Einsteinian,
relativistic compiler into Psyco? I am reminded of the well-known
limerick:

There once was a lady called Bright,
Who could travel faster than light.
She went out one day,
In a relative way,
And came back the previous night.

--

Tim C

PGP/GnuPG Key 1024D/EAF993D0 available from keyservers everywhere
or at http://members.optushome.com.au/tchur/pubkey.asc
Key fingerprint = 8C22 BF76 33BA B3B5 1D5B EB37 7891 46A9 EAF9 93D0



 

Samuel Walters

|Thus Spake Tim Churches On the now historical date of Fri, 09 Jan 2004
09:10:58 +1100|
Does that mean that Armin Rigo has slipped some form of Einsteinian,
relativistic compiler into Psyco?

No, no. It means one of two things: either you didn't adjust the constant
that tries to factor out the overhead of profiling, or the calls took so
long that the timer actually overflowed.

This will help you set the proper constant:

-----
import profile
import pprint

tests = 20
cycles = 10000
pr = profile.Profile()
proflist = []
for x in xrange(1, tests + 1):
    proflist.append(pr.calibrate(cycles))

pprint.pprint(proflist)
-----

Increase cycles until your results don't exhibit much of a spread, then
take the lowest of those values. This is the constant you set when
instantiating a profiling object. It is specific to each individual
machine.
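
For example (a sketch; 4.2e-06 is a made-up number, substitute the lowest
value your own calibration run printed):

import profile
pr = profile.Profile(bias = 4.2e-06)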

If it *still* gives you negative times, then the timer is overflowing and
you need to adjust the original script so that you're not running through
such a big list of numbers.

Then your apparent problems with causality should be solved.

Sam Walters.
 
