why am I getting a segmentation fault?

J

Jay donnell

I have a short multi-threaded script that checks web images to make
sure they are still there. I get a segmentation fault everytime I run
it and I can't figure out why. Writing threaded scripts is new to me so
I may be doing something wrong that should be obvious :(

google messes up the python code so here is a link to it.

http://kracomp.com/~jay/py.txt

This is the output of the script.
[jay@localhost scripts]$ ./py.py
update item set goodImage = 'yes' where productId='12603'
update item set goodImage = 'yes' where productId='18272'
update item set goodImage = 'yes' where productId='1927'
update item set goodImage = 'no' where productId='12709'
update item set goodImage = 'yes' where productId='32087'
update item set goodImage = 'no' where productId='25803'
Segmentation fault



Thanks in advance.
 
P

Paul McGuire

Jay donnell said:
I have a short multi-threaded script that checks web images to make
sure they are still there. I get a segmentation fault everytime I run
it and I can't figure out why. Writing threaded scripts is new to me so
I may be doing something wrong that should be obvious :(
<snip>

Here is a code excerpt from your link (the main routine, omits the class
definition for ImageChecker, which extends threading.Thread):

db = MySQLdb.connect(host="localhost", user="xxx", passwd="xxx", db="xxx")
cursor = db.cursor()
query = "select * from item order by rand() limit 0, 100"
#query = "select * from item"
cursor.execute(query)
result = cursor.fetchall()

maxThreads = 5

for r in result:
while(threading.activeCount() > maxThreads):
pass
flag='good'
#pass
#print str(r[0]) + ', ' + str(r[7])
tmp = r[7].split('/')
filename = tmp[-1]
#print 'filename ' + filename

filename = '/tmp/'+filename

threadList = []

#r[7] is the url of the image
#r[0] is the id for the row
imageChecker = ImageChecker(r[7], filename, r[0])
imageChecker.start()
threadList.append(imageChecker)

----------------------------------------------
1. What happens after you fall out of the loop "for r in result"? Shouldn't
you wait for the remaining threads to finish? Perhaps you need another
busy-loop to wait for the threads to finish, something like
while threading.activeCount() > 0: pass
2. Is this the best way to busy-wait? What about some kind of
thread.join()? At least throw a sleep call in there or something, or this
loop will churn and churn, diverting CPU from your threads that are actually
trying to do some real work.
3. I find it easier to work with named variables than numeric subscripts.
At the top of your for loop, try something like:
id,height,width,numBytes,whatever,slkdjf1,slkdjf2,url = r
This way you have much more meaningful names than r[0] and r[7], which you
later have to comment to explain whats going on!
4. filename=r[7].split('/')[-1] is not terribly portable. See if there is a
standard module for parsing filespecs (I'll bet there is).

-- Paul
 
J

Jeff Shannon

Paul said:
4. filename=r[7].split('/')[-1] is not terribly portable. See if there is a
standard module for parsing filespecs (I'll bet there is).

Indeed there is -- os.path. In particular, os.path.basename() seems
to do exactly that snippet is intending, in a much more robust (and
readable) fashion.

Jeff Shannon
Technician/Programmer
Credit International
 
J

John Machin

Jay said:
I have a short multi-threaded script that checks web images to make
sure they are still there. I get a segmentation fault everytime I run
it and I can't figure out why. Writing threaded scripts is new to me so
I may be doing something wrong that should be obvious :(

def run(self):
try:
self.site = urllib.urlopen(self.url)
self.f=open(self.filename, 'w')
self.im = Image.open(self.filename)
self.size = self.im.size
self.flag = 'yes'
self.q = "yadda yadda"

That's SIX names that don't need to be attributes of the object; they
are used _only_ in this method, so they can be local to the method,
saving you a whole lot of typing "self." and saving puzzlement &
careful scrutiny by those trying to read your code.

Back to your real problem: does it work when you set maxThreads to 1?
did it work before you added the threading code? what does your
debugger tell you about the location of the seg fault?

MOST IMPORTANTLY, sort this mess out:

self.q = "update item set goodImage = '" + self.flag + "' where
productId='" + str(self.id) + "'"
print self.q
self.cursor.execute(query) ###############################

### "query" is a global variable[YUK!] (see below) which isn't going to
do what you want. Looks like you meant "self.q".

self.db.close()

db = MySQLdb.connect(host="localhost", user="xxx", passwd="xxx",
db="xxx")
cursor = db.cursor()
query = "select * from item order by rand() limit 0, 100"


### Have you looked in your database to see if the script has actually
updated item.goodImage? Do you have a test plan?
 
J

Jay donnell

Thank you.

I made all the changes you recommended and everything seems to be
working.
 
J

Jay donnell

### Have you looked in your database to see if the >script has
actually
updated item.goodImage? Do you have a test plan?

Thank you for the help. Sorry for the messy code. I was under time
constraints. I had class, and I was rushing to get this working before
class. I should waited a day and read over it before I asked. Sorry
again.
 
J

John Machin

Jay said:
Thank you for the help. Sorry for the messy code. I was under time
constraints. I had class, and I was rushing to get this working before
class. I should waited a day and read over it before I asked. Sorry
again.

Apologise to _yourself_ for the messy code. If 'query' had been safely
tucked away as a local variable instead of a global, that problem
(typing 'query' instead of 'self.q') would have caused an exception on
the first time around.

A few points: Yes, don't rush. I've-forgotten-whom wrote something like
"Don't program standing up". Good advice.

Build things a piece at a time. Build your tests at the same time or
earlier. In this case, a stand-alone method or function that checked
that one file was OK, would have been a reasonable place to start. Then
add the code to query the db. Then add the threading stuff, gingerly.
If you build and test incrementally, then a segfault or other disaster
is highly likely to have been caused by the latest addition.

AND SOME MORE CRUFT:
def __init__(self, url, filename, id):
self.t = time.time() <<<<<<<<<======= never used again
threading.Thread.__init__(self)
self.db = MySQLdb.connect(host="localhost", user="xxx",
passwd="xxx", db="xxx")
# create a cursor
self.cursor = db.cursor() <<<=== should be self.db.cursor()
#### picks up the *GLOBAL* 'db'
self.url = url
self.filename = filename
self.id = id
===========================
threadList = []
[snip]
threadList.append(imageChecker)
===>>>> that doesn't achieve much!
============================

N.B. You still have two problems: (1) Your script as you said now
"seems to work". That doesn't sound like you have a test plan. (2) You
have shuffled your code around and the segfault went away; i.e. you
waved a dead chicken and the volcano stopped erupting. Most of the
changes suggested by others and myself were of a style/clarity nature.
The self.q/query thing would have caused a select instead an update;
hardly segfault territory. I wouldn't expect that busy-wait loop to
cause a segfault. You still don't know what caused the segfault. That
means you don't know how to avoid it in the future. You are still
living in the shadow of the volcano. Will the chicken trick work next
time?

Looking forward to the next episode,
John
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,768
Messages
2,569,574
Members
45,050
Latest member
AngelS122

Latest Threads

Top