How do I read Excel file in Python?

K

kath

How do I read an Excel file in Python?

I have found a package to read excel file, which can be used on any
platform.

http://www.lexicon.net/sjmachin/xlrd.htm
I installed and working on the examples, I found its printing of cell's
contents in a different manner.
print sh.row(rx)
[text:u'name', text:u'address', text:u'ph']
[text:u'sudhir', text:u'bangalore', number:1234.0]
[text:u'vinay', text:u'bangalore', number:3264.0]

I am bit confused with slicing. help me....


Thank you,
Regards,
Sudhir.
 
J

John Machin

kath said:
How do I read an Excel file in Python?

I have found a package to read excel file, which can be used on any
platform.

Hi Sudhir,
So far, so good :)
http://www.lexicon.net/sjmachin/xlrd.htm
I installed and working on the examples, I found its printing of cell's
contents in a different manner.
print sh.row(rx)
[text:u'name', text:u'address', text:u'ph']
[text:u'sudhir', text:u'bangalore', number:1234.0]
[text:u'vinay', text:u'bangalore', number:3264.0]

It helps when asking questions if you copy/paste exactly what is on
your screen;
in this case
print sh.row(rx)
would have given an error; you must have typed
for rx in range.....

A row is returned as a sequence of Cell objects. What you are seeing is
Python automatically doing repr(cell) on each cell in the row. The
Cell.__repr__ method formats it that way for debugging. Here are some
examples from a little test file of mine:
import xlrd
bk = xlrd.open_workbook('sjm1.xls')
sh = bk.sheet_by_index(0)
row0 = sh.row(0)
row0 [text:u'fubar', number:1.0, number:2.0]
firstcell = row0[0]
type(firstcell)
firstcell.ctype 1
# cell type 1 is text
firstcell.value u'fubar'
repr(firstcell)
"text:u'fubar'"


I am bit confused with slicing. help me....
None of the above is anything to do with slicing; is this a 2nd
problem?

Perhaps you are having trouble with this:Help on method row_slice in module xlrd.sheet:

row_slice(self, rowx, start_colx=0, end_colx=None) method of
xlrd.sheet.Sheet instance
##
# Returns a slice of the Cell objects in the given row.
sh.row_slice(rowx, lo, hi) gives the same result as sh.row(rowx)[lo:hi]
-- it is provided because the latter would be inefficient for getting a
small slice from a long row.

If you are having trouble with the general concept of slicing, perhaps
you might like to try the Python tutorial. Otherwise, please try to be
a bit more specific about what the confusion is.

HTH, and e-mail me if you prefer ...

Cheers,
John
 
K

kath

John said:
kath said:
How do I read an Excel file in Python?

I have found a package to read excel file, which can be used on any
platform.

Hi Sudhir,
So far, so good :)
http://www.lexicon.net/sjmachin/xlrd.htm
I installed and working on the examples, I found its printing of cell's
contents in a different manner.
import xlrd
book=xlrd.open_workbook("Calculation_file.xls")
book=xlrd.open_workbook("testbook.xls")
sh=book.sheet_by_index(0)
for row in range(sh.nrows):
print sh.row(rx)
[text:u'name', text:u'address', text:u'ph']
[text:u'sudhir', text:u'bangalore', number:1234.0]
[text:u'vinay', text:u'bangalore', number:3264.0]

It helps when asking questions if you copy/paste exactly what is on
your screen;
in this case
print sh.row(rx)
would have given an error; you must have typed
for rx in range.....

A row is returned as a sequence of Cell objects. What you are seeing is
Python automatically doing repr(cell) on each cell in the row. The
Cell.__repr__ method formats it that way for debugging. Here are some
examples from a little test file of mine:
import xlrd
bk = xlrd.open_workbook('sjm1.xls')
sh = bk.sheet_by_index(0)
row0 = sh.row(0)
row0 [text:u'fubar', number:1.0, number:2.0]
firstcell = row0[0]
type(firstcell)
firstcell.ctype 1
# cell type 1 is text
firstcell.value u'fubar'
repr(firstcell)
"text:u'fubar'"


I am bit confused with slicing. help me....
None of the above is anything to do with slicing; is this a 2nd
problem?

Perhaps you are having trouble with this:Help on method row_slice in module xlrd.sheet:

row_slice(self, rowx, start_colx=0, end_colx=None) method of
xlrd.sheet.Sheet instance
##
# Returns a slice of the Cell objects in the given row.
sh.row_slice(rowx, lo, hi) gives the same result as sh.row(rowx)[lo:hi]
-- it is provided because the latter would be inefficient for getting a
small slice from a long row.

If you are having trouble with the general concept of slicing, perhaps
you might like to try the Python tutorial. Otherwise, please try to be
a bit more specific about what the confusion is.

HTH, and e-mail me if you prefer ...

Cheers,
John


Hi, thanks for the reply. I just took some time reading help file and
came to know to there is nothing do with slicing. But I do have a
problem with date field in the excel file.

the date( 8/9/2006 ) in Excel file, i am getting the value as 38938.0,
which I get when I convert date values to general format in Excel. I
want the actual date value. How do get that?


Thank you.
regards,
Sudhir.
 
M

Matimus

the date( 8/9/2006 ) in Excel file, i am getting the value as 38938.0,
which I get when I convert date values to general format in Excel. I
want the actual date value. How do get that?

38938 appears to be the date in days since 1/1/1900. I'm sure someone
can help you figure out how to convert that to a more useful value.

-Matt
 
S

Steve Holden

Matimus said:
38938 appears to be the date in days since 1/1/1900. I'm sure someone
can help you figure out how to convert that to a more useful value.

I believe the win32all extension contains functionality to help with
that, thought it's a long time since I even ran PythonWin (sorry, Mark).

regards
Steve
 
S

Simon Brunning

38938 appears to be the date in days since 1/1/1900. I'm sure someone
can help you figure out how to convert that to a more useful value.
datetime.date(2006, 8, 11)
 
H

houdinihound

excel_date = 38938.0
Err, that's the wrong answer, isn't it? Perhaps it shoud be
datetime.date(1900, 1, 29)?

Actually was about to post same solution and got same results. (BTW
Simon, the OP date is Aug 9th, 2006). Scratched head and googled for
excel date calculations... found this bug where it treats 1900 as leap
year incorrectly:
http://www.ozgrid.com/Excel/ExcelDateandTimes.htm

Plus it treats 1 jan 1900 as day 1, not 0 so just subtract 2 in the
calc:datetime.date(2006, 8, 9)

HTH.
 
J

John Machin

Actually was about to post same solution and got same results. (BTW
Simon, the OP date is Aug 9th, 2006). Scratched head and googled for
excel date calculations... found this bug where it treats 1900 as leap
year incorrectly:
http://www.ozgrid.com/Excel/ExcelDateandTimes.htm

Plus it treats 1 jan 1900 as day 1, not 0 so just subtract 2 in the
calc:
datetime.date(2006, 8, 9)

.... and 2006-08-09 is only the correct answer if the spreadsheet was,
when last saved, using the 1900 ("Windows") date system, not the 1904
("Macintosh") date system.

All the OP needs to do is to read the documentation that comes with the
xlrd package. It describes the problems with Excel dates, and offers
functions for conversion between the Excel date numbers and (year,
month, day, hour, minute, second) tuples which of course are
interoperable with Python's datetime module and with mx.DateTime.

| >>> import xlrd
| >>> xlrd.xldate_as_tuple(38938.0, 0)
| (2006, 8, 9, 0, 0, 0)
| >>> xlrd.xldate_as_tuple(38938.0, 1)
| (2010, 8, 10, 0, 0, 0)
| >>>

Cheers,
John
 
S

Simon Brunning

Actually was about to post same solution and got same results. (BTW
Simon, the OP date is Aug 9th, 2006). Scratched head and googled for
excel date calculations... found this bug where it treats 1900 as leap
year incorrectly:
http://www.ozgrid.com/Excel/ExcelDateandTimes.htm

Ah - I was reading the OP's 8/9/2006 in the European way - DD/MM/YYYY.

One might argue over whether DD/MM/YYYY or MM/DD/YYYY are more
rational, but I find it best to avoid *both* those formats, 'cos they
are so easily confused.
 
G

Giles Brown

John said:
... and 2006-08-09 is only the correct answer if the spreadsheet was,
when last saved, using the 1900 ("Windows") date system, not the 1904
("Macintosh") date system.

John,
Just for me own curiosity, is this Excel concept of date numbers same
as the OLE
concept (see http://msdn2.microsoft.com/en-us/library/82ab7w69.aspx or
search "MFC DATE" on MSDN).

I put in some test cases for conversion code here:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/496683

But would be interested to add any additional info on variations on
this theme.

Cheers,
Giles
 
K

kath

John said:
... and 2006-08-09 is only the correct answer if the spreadsheet was,
when last saved, using the 1900 ("Windows") date system, not the 1904
("Macintosh") date system.

All the OP needs to do is to read the documentation that comes with the
xlrd package. It describes the problems with Excel dates, and offers
functions for conversion between the Excel date numbers and (year,
month, day, hour, minute, second) tuples which of course are
interoperable with Python's datetime module and with mx.DateTime.

| >>> import xlrd
| >>> xlrd.xldate_as_tuple(38938.0, 0)
| (2006, 8, 9, 0, 0, 0)
| >>> xlrd.xldate_as_tuple(38938.0, 1)
| (2010, 8, 10, 0, 0, 0)
| >>>

Cheers,
John


Hi,

Traceback (most recent call last):
File "D:\Python23\Testing area\Python and Excel\xlrdRead.py", line
30, in ?
temp=xlrd.xldate_as_tuple(sh.cell_value(rowx=r,colx=c),0)
File "D:\PYTHON23\Lib\site-packages\xlrd\xldate.py", line 61, in
xldate_as_tuple
xldays = int(xldate)
ValueError: invalid literal for int(): Date

because xlrd.xldate_as_tuple() function expects first argument to be an
integer. How do I convert an unicode character to integer, so that I
could get the date using xlrd.xldate_as_tuple() function.

Thank you,
sudhir.
 
J

John Machin

Giles said:
John,
Just for me own curiosity, is this Excel concept of date numbers same
as the OLE
concept (see http://msdn2.microsoft.com/en-us/library/82ab7w69.aspx or
search "MFC DATE" on MSDN).

I put in some test cases for conversion code here:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/496683

But would be interested to add any additional info on variations on
this theme.

Hi Giles,

Those OLE date numbers coincide with Excel 1900-system date numbers
from 1900-03-01 onwards. Excel treats day 60 as the non-existent
1900-02-29.

Cheers,
John
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,768
Messages
2,569,574
Members
45,048
Latest member
verona

Latest Threads

Top