Pre-defining an action to take when an expected error occurs

T

Tempo

Hello. I am getting the error that is displayed below, and I know
exactly why it occurs. I posted some of my program's code below, and if
you look at it you will see that the error terminates the program
pre-maturely. Becasue of this pre-mature termination, the program is
not able to execute it's final line of code, which is a very important
line. The last line saves the Excel spreadsheet. So is there a way to
make sure the last line executes? Thanks in advanced for all of the
help. Thank you.


Error
####

IndexError: list index out of range


Code Sample
###########

for rx in range(sh.nrows):
rx = rx +1
u = sh.cell_value(rx, 0)
u = str(u)
if u != end:
page = urllib2.urlopen(u)
soup = BeautifulSoup(page)
p = soup.findAll('span', "sale")
p = str(p)
p2 = re.findall('\$\d+\.\d\d', p)
for row in p2:
ws.write(r,0,row)

w.save('price_list.xls')
 
S

Steven D'Aprano

Hello. I am getting the error that is displayed below, and I know
exactly why it occurs. I posted some of my program's code below, and if
you look at it you will see that the error terminates the program
pre-maturely. Becasue of this pre-mature termination, the program is
not able to execute it's final line of code, which is a very important
line. The last line saves the Excel spreadsheet. So is there a way to
make sure the last line executes?

Two methods:

(1) Fix the bug so the program no longer terminates early. You are getting
an IndexError "list index out of range", so fix the program so it no
longer tries to access beyond the end of the list.

I'm guessing that your error is right at the beginning of the loop. You
say:

for rx in range(sh.nrows):
rx = rx +1

Why are you adding one to the loop variable? That's equivalent to:

for rx in range(1, sh.nrows + 1)

which probably means it skips row 0 and tries to access one row past the
end of sh. If all you want to do is skip row 0, do this instead:

for rx in range(1, sh.nrows)


(2) Stick a band-aid over the error with a try...except block, and hope
you aren't covering up other errors as well. While we're at it, let's
refactor the code a little bit...

# untested
def write_row(rx, sh, end, ws):
u = str(sh.cell_value(rx, 0))
if u != end:
soup = BeautifulSoup(urllib2.urlopen(u))
p = str(soup.findAll('span', "sale"))
for row in re.findall('\$\d+\.\d\d', p):
ws.write(r,0,row) # what's r? did you mean rx?


Now call this:

for rx in range(sh.nrows):
rx += 1 # but see my comments above...
try:
write_row(rx, sh, end, ws)
except IndexError:
pass
w.save('price_list.xls')
 
G

Gabriel Genellina

Hello. I am getting the error that is displayed below, and I know
exactly why it occurs. I posted some of my program's code below, and if
you look at it you will see that the error terminates the program
pre-maturely. Becasue of this pre-mature termination, the program is
not able to execute it's final line of code, which is a very important
line. The last line saves the Excel spreadsheet. So is there a way to
make sure the last line executes? Thanks in advanced for all of the
help. Thank you.

You want a try/finally block.
Read the Python Tutorial: http://docs.python.org/tut/node10.html



Gabriel Genellina
Softlab SRL





__________________________________________________
Preguntá. Respondé. Descubrí.
Todo lo que querías saber, y lo que ni imaginabas,
está en Yahoo! Respuestas (Beta).
¡Probalo ya!
http://www.yahoo.com.ar/respuestas
 
J

John Machin

Tempo said:
Hello. I am getting the error that is displayed below, and I know
exactly why it occurs. I posted some of my program's code below, and if
you look at it you will see that the error terminates the program
pre-maturely. Becasue of this pre-mature termination, the program is
not able to execute it's final line of code, which is a very important
line. The last line saves the Excel spreadsheet. So is there a way to
make sure the last line executes? Thanks in advanced for all of the
help. Thank you.


Error
####

IndexError: list index out of range
[snip]

Hi, Tempo, nice to see xlrd getting some usage :)

Are you sure that you *really* want to save a spreadsheet written by
your buggy program??? It is better to fix errors, rather than ignore
them.

However, as you asked how: To ensure that cleanup code is executed no
matter what happens, use try/except/finally as in the following
example.

HTH,
John


C:\junk>type tempo.py
import sys

def main(n):
return (10 ** n) / n

try:
try:
print "executing the application ..."
n = int(sys.argv[1])
if n < 0:
# simulate something nasty happening ...
import nosuchmodule
else:
x = main(n)
print "app completed normally: %r" % x
except KeyboardInterrupt:
# need this to escape when n is large
print "kbd interrupt ...."
raise
except ZeroDivisionError:
print "doh!"
raise
finally:
# code to be executed no matter what happens
print "finally ... cleaning up"

C:\junk>tempo.py 0
executing the application ...
doh!
finally ... cleaning up
Traceback (most recent call last):
File "C:\junk\tempo.py", line 14, in ?
x = main(n)
File "C:\junk\tempo.py", line 4, in main
return (10 ** n) / n
ZeroDivisionError: integer division or modulo by zero

C:\junk>tempo.py -1
executing the application ...
finally ... cleaning up
Traceback (most recent call last):
File "C:\junk\tempo.py", line 12, in ?
import nosuchmodule
ImportError: No module named nosuchmodule

C:\junk>tempo.py 3
executing the application ...
app completed normally: 333
finally ... cleaning up

C:\junk>tempo.py 10000000000
executing the application ...
kbd interrupt ....
finally ... cleaning up
Traceback (most recent call last):
File "C:\junk\tempo.py", line 14, in ?
x = main(n)
File "C:\junk\tempo.py", line 4, in main
return (10 ** n) / n
KeyboardInterrupt

C:\junk>
 
T

Tempo

Thanks for all of the help. It all has been very useful to an new
python programmer. I agree that I should fix the error/bug instead of
handeling it with a try/etc. However, I do not know why
"range(sh.nrows)" never gets the right amount of rows right. For
example, if the Excel sheet has 10 rows with data in them, the
statement "range(sh.nrows)" should build the list of numbers [0,
1,...9]. It should, but it doesn't do that. What it does is buld a list
from [0, 1...20] or more or a little less, but the point is that it
always grabs empy rows after the last row containing data. Why is that?
I have no idea why, but I do know that that is what is producing the
error I am getting. Thanks again for the responses that I have received
already, and again thanks for any further help. Thanks you.
 
S

Steve Lianoglou

if the Excel sheet has 10 rows with data in them, the
statement "range(sh.nrows)" should build the list of numbers [0,
1,...9]. It should, but it doesn't do that. What it does is buld a list
from [0, 1...20] or more or a little less, but the point is that it
always grabs empy rows after the last row containing data. Why is that?

Just a stab in the dark, but maybe there's some rogue whitespace in
some cell that's in a rowyou think is empty?

You could try to just select out a (small) region of your data, copy it
and paste it into a new spreadhseet to see if you're still getting the
problem.

-steve
 
J

John Machin

Tempo said:
Thanks for all of the help. It all has been very useful to an new
python programmer. I agree that I should fix the error/bug instead of
handeling it with a try/etc. However, I do not know why
"range(sh.nrows)" never gets the right amount of rows right. For
example, if the Excel sheet has 10 rows with data in them, the
statement "range(sh.nrows)" should build the list of numbers [0,
1,...9]. It should, but it doesn't do that. What it does is buld a list
from [0, 1...20] or more or a little less, but the point is that it
always grabs empy rows after the last row containing data. Why is that?
I have no idea why, but I do know that that is what is producing the
error I am getting. Thanks again for the responses that I have received
already, and again thanks for any further help. Thanks you.

So the xlrd package's Book.Sheet.nrows allegedly "never gets the right
amount of rows right"? Never?? Before making such rash statements in a
public forum [1], you might like to check exactly what you have in your
file. Here's how:

(1) Using OpenOffice.org Calc or Gnumeric (or Excel if you must), open
yourfile.xls and save it as yourfile.csv. Inspect yourfile.csv

(2) Use the runxlrd script that's supplied with xlrd:

runxlrd.py show yourfile.xls >yourfile_show.txt

Inspect yourfile_show.txt. You'll see things like:
cell A23: type=1, data: u'ascii'
cell B23: type=0, data: ''
cell C23: type=1, data: u'123456'
cell D23: type=0, data: ''
cell E23: type=4, data: 0
The cell-type numbers are in the docs, but briefly: 0 is empty cell, 1
is text, 2 is number, 3 is date, 4 is boolean, 5 is error. If you find
only type=0 in the last row, then indeed you have found a bug and
should report it to the package author (together with a file that
exhibits the problem).

You are likely to find that there are cells containing zero-length
strings, or strings that contain spaces. They *do* contain data, as
opposed to empty cells.

[1] There's a possibility that the package's author reads this
newsgroup, and I've heard tell that he's a cranky old so-and-so; you
wouldn't want him to take umbrage, would you?

HTH,
John
 
T

Tempo

John Machin thanks for all of your help, and I take responsibility for
the way I worded my sentences in my last reply to this topic. So in an
effort to say sorry, I want to make it clear to everybody that it seems
as though errors in my code and use of external programs (Excel in
particular) are making "range(sh.nrows)" have faulty results. I am
trying to pinpoint the spot in my code or use of Excel, before
"range(sh.nrows) is executed, that is bugged. John Machin, I am
thrilled that the package xlrd exists at all because it simplifies a
daunting task for a beginner programer--me. Its uses are not bound to
beginners either. So thanks for the package and your help to this point.
 
S

Steve Holden

John Machin wrote:
[...]
[1] There's a possibility that the package's author reads this
newsgroup, and I've heard tell that he's a cranky old so-and-so; you
wouldn't want him to take umbrage, would you?
Cranks doesn't even *begin* to describe it ...

regards
Steve
 
J

John Machin

John Machin thanks for all of your help, and I take responsibility for
the way I worded my sentences in my last reply to this topic. So in an
effort to say sorry, I want to make it clear to everybody that it seems
as though errors in my code and use of external programs (Excel in
particular) are making "range(sh.nrows)" have faulty results. I am
trying to pinpoint the spot in my code or use of Excel, before
"range(sh.nrows) is executed, that is bugged. John Machin, I am
thrilled that the package xlrd exists at all because it simplifies a
daunting task for a beginner programer--me. Its uses are not bound to
beginners either. So thanks for the package and your help to this point.

I'm sorry, too: I should have wrapped my post in <humour> ... </humour>
tags> :)

Of course it's up to you to decide the criteria for filtering out
accidental non-data from your spreadsheet. Note that this phenomenon is
not restricted to spreadsheets; one often sees text data files with
blank or empty lines on the end -- one's app just has to cope with that.

Here's an example of a function that will classify a bunch of cells for you:

def usefulness_of_cells(cells):
"""Score each cell:
as 0 if empty,
as 1 if zero-length text,
as 2 if text and value.isspace() is true,
otherwise as 3.
Return the highest score found.
"""
score = 0
for cell in cells:
if cell.ctype == xlrd.XL_CELL_EMPTY:
continue
if cell.ctype == xlrd.XL_CELL_TEXT:
if not cell.value:
if not score:
score = 1
continue
if cell.value.isspace():
score = 2
continue
return 3
return score

and here's an example of using it:

def number_of_good_rows(sheet):
"""Return 1 + the index of the last row with meaningful data in it."""
for rowx in xrange(sheet.nrows - 1, -1, -1):
score = usefulness_of_cells(sheet.row(rowx))
if score == 3:
return rowx+1
return 0

A note on using the isspace() method: ensure that you use it on
cell.value (which is Unicode), not on an 8-bit encoding (especially if
your locale is set to the default ("C")).

| >>> '\xA0'.isspace()
| False
| >>> u'\xA0'.isspace()
| True
| >>> import unicodedata as ucd
| >>> ucd.name(u'\xA0')
| 'NO-BREAK SPACE'

You can get these in spreadsheets when folk paste in stuff off a web
page that uses &nbsp; as padding (because HTML trims out
leading/trailing/multiple instances of SPACE). Puzzled the heck out of
me the first time I encountered it until I did:
print repr(data_that_the_users_were_shrieking_about)

Here's a tip: repr() in Python and "View > Page Source" in Firefox come
in very handy when you have "what you see is not what you've got" problems.

Anyway, I'll add something like the above functions in an examples
directory in the next release of xlrd (which is at alpha stage right
now). I'll also add in a Q&A section in the docs, starting with "Why
does xlrd report more rows than I see on the screen?" -- so do let us
know what you find down the end of your spreadsheet, in case it's a
strange beast that hasn't been seen before.

HTH,
John
 
T

Tempo

It worked. Those two functions (usefulness_of_cells &
number_of_good_rows) seem to work flawlessly...knock on wood. I have
run a number of different Excel spreadsheets through the functions, and
so far the functions have a 100% acuracy rating. The acuracy rating is
based on the functions' returned number of cells containing text,
excluding a space as text, against the actual, hand counted number of
cells with text. Thank you John Machin for all of your help. I am using
these two functions, with your name tagged to them both. Let me know if
that's a problem. Thank you again.
 
J

John Machin

Tempo said:
It worked. Those two functions (usefulness_of_cells &
number_of_good_rows) seem to work flawlessly...knock on wood. I have
run a number of different Excel spreadsheets through the functions, and
so far the functions have a 100% acuracy rating. The acuracy rating is
based on the functions' returned number of cells containing text,
excluding a space as text, against the actual, hand counted number of
cells with text.

So your worksheet(s) did have rows at the end with cells with spaces in
them?
Thank you John Machin for all of your help. I am using
these two functions, with your name tagged to them both. Let me know if
that's a problem. Thank you again.

Not a problem; like I said, I'll put those functions in the next
release as examples.

Cheers,
John
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,479
Members
44,899
Latest member
RodneyMcAu

Latest Threads

Top