Parsing Excel spreadsheets

Discussion in 'Python' started by andyhume@gmail.com, Dec 30, 2008.

  1. Guest

    Hi,

    Can anybody recommend an approach for loading and parsing Excel
    spreadsheets in Python. Any well known/recommended libraries for this?

    The only thing I found in a brief search was http://www.lexicon.net/sjmachin/xlrd.htm,
    but I'd rather get some more input before going with something I don't
    know.

    Thanks,
    Andy.
     
    , Dec 30, 2008
    #1
    1. Advertising

  2. r Guest

    On Dec 30, 10:07 am, "" <> wrote:
    > Hi,
    >
    > Can anybody recommend an approach for loading and parsing Excel
    > spreadsheets in Python. Any well known/recommended libraries for this?
    >
    > The only thing I found in a brief search washttp://www.lexicon.net/sjmachin/xlrd.htm,
    > but I'd rather get some more input before going with something I don't
    > know.
    >
    > Thanks,
    > Andy.


    xlrd(read only)
    also see pyExcelerator
     
    r, Dec 30, 2008
    #2
    1. Advertising

  3. Steve Holden Guest

    Tino Wildenhain wrote:
    > r wrote:
    >> On Dec 30, 10:07 am, "" <> wrote:
    >>> Hi,
    >>>
    >>> Can anybody recommend an approach for loading and parsing Excel
    >>> spreadsheets in Python. Any well known/recommended libraries for this?
    >>>
    >>> The only thing I found in a brief search
    >>> washttp://www.lexicon.net/sjmachin/xlrd.htm,
    >>> but I'd rather get some more input before going with something I don't
    >>> know.
    >>>
    >>> Thanks,
    >>> Andy.

    >>
    >> xlrd(read only)
    >> also see pyExcelerator

    >
    > which is now replaced by xlwt :)
    >

    I've had very good results with xlrd, having used it to extract data
    from spreadsheets for storage into a relational database. It's also
    cross-platform, which means you don't need to run under Windows.

    I don't know anything at all about xlwt.

    regards
    Steve
    --
    Steve Holden +1 571 484 6266 +1 800 494 3119
    Holden Web LLC http://www.holdenweb.com/
     
    Steve Holden, Dec 30, 2008
    #3
  4. On Dec 30, 10:07 am, "" <> wrote:
    > Hi,
    >
    > Can anybody recommend an approach for loading and parsing Excel
    > spreadsheets in Python. Any well known/recommended libraries for this?
    >
    > The only thing I found in a brief search washttp://www.lexicon.net/sjmachin/xlrd.htm,
    > but I'd rather get some more input before going with something I don't
    > know.
    >
    > Thanks,
    > Andy.


    If you need complete control of Excel, then you'll probably have to
    use COM (see Mark Hammond's PyWin32 package). Otherwise, the
    suggestions by the others will work.

    Mike
     
    Mike Driscoll, Dec 30, 2008
    #4
  5. John Machin Guest

    On Dec 31, 5:48 am, Mike Driscoll <> wrote:
    > On Dec 30, 10:07 am, "" <> wrote:
    >
    > > Hi,

    >
    > > Can anybody recommend an approach for loading and parsing Excel
    > > spreadsheets in Python. Any well known/recommended libraries for this?

    >
    > > The only thing I found in a brief search was http://www.lexicon.net/sjmachin/xlrd.htm,
    > > but I'd rather get some more input before going with something I don't
    > > know.

    >
    > > Thanks,
    > > Andy.

    >
    > If you need complete control of Excel, then you'll probably have to
    > use COM (see Mark Hammond's PyWin32 package). Otherwise, the
    > suggestions by the others will work.


    For avoidance of doubt, the OP should presume that you mean the *nett*
    suggestions by the others i.e. after Tino struck out pyExcelerator.
     
    John Machin, Dec 30, 2008
    #5
  6. brooklineTom Guest

    wrote:
    > Hi,
    >
    > Can anybody recommend an approach for loading and parsing Excel
    > spreadsheets in Python. Any well known/recommended libraries for this?
    >
    > The only thing I found in a brief search was http://www.lexicon.net/sjmachin/xlrd.htm,
    > but I'd rather get some more input before going with something I don't
    > know.
    >
    > Thanks,
    > Andy.


    I save the spreadsheets (in Excel) in xml format. I started with the
    standard xml tools (xml.dom and xml.dom.minidom). I built a
    pullparser, and then just crack them. The MS format is tedious and
    overly complex (like all MS stuff), but straightforward. Once I've
    cracked them into their component parts (headers, rows, cells, etc),
    then I walk through them doing whatever I want.

    I found this material to be no worse than doing similar crud with
    xhtml. I know there are various python packages around that do it, but
    I found the learning curve of those packages to be steeper than just
    grokking the spreadsheet structure itself.

    In spite of all the hair, the underlying MS structure really does have
    everything you'll need. My suggestion is just go for it, it isn't all
    that hard.
     
    brooklineTom, Dec 31, 2008
    #6
  7. John Machin Guest

    On Dec 31 2008, 4:02 pm, brooklineTom <> wrote:
    > wrote:
    > > Hi,

    >
    > > Can anybody recommend an approach for loading and parsing Excel
    > > spreadsheets in Python. Any well known/recommended libraries for this?

    >
    > > The only thing I found in a brief search washttp://www.lexicon.net/sjmachin/xlrd.htm,
    > > but I'd rather get some more input before going with something I don't
    > > know.

    >
    > > Thanks,
    > > Andy.

    >
    > I save the spreadsheets (in Excel) in xml format.


    Which means that you need to be on a Windows box with a licensed copy
    of Excel. I presume you talking about using Excel 2003 and saving as
    "XML Spreadsheet (*.xml)". Do you save the files manually, or using a
    COM script? What is the largest xls file that you've saved as xml, how
    big was the xml file, and how long did it take to parse the xml file?
    Do you extract formatting information or just cell contents?

    > I started with the
    > standard xml tools (xml.dom and xml.dom.minidom). I built a
    > pullparser, and then just crack them. The MS format is tedious and
    > overly complex (like all MS stuff), but straightforward.


    What do you think of the xml spat out by Excel 2007's (default) xlsx
    format?

    > Once I've
    > cracked them into their component parts (headers, rows, cells, etc),
    > then I walk through them doing whatever I want.
    >
    > I found this material to be no worse than doing similar crud with
    > xhtml. I know there are various python packages around that do it, but
    > I found the learning curve of those packages to be steeper than just
    > grokking the spreadsheet structure itself.


    I'm curious to know which are the "various python packages" with the
    so steep learning curves, and what the steep bits were.

    Cheers,
    John
     
    John Machin, Dec 31, 2008
    #7
  8. Joshua Moore Guest

    On Dec 30 2008, 8:07 am, "" <>
    wrote:
    > Hi,
    >
    > Can anybody recommend an approach for loading and parsing Excel
    > spreadsheets in Python. Any well known/recommended libraries for this?
    >
    > The only thing I found in a brief search washttp://www.lexicon.net/sjmachin/xlrd.htm,
    > but I'd rather get some more input before going with something I don't
    > know.
    >
    > Thanks,
    > Andy.


    Is saving the spreadsheet as a comma separated file an option for you?
    If so, it's not hard to throw something together for cvs files, but
    there are even modules for that.

    Josh
     
    Joshua Moore, Jan 1, 2009
    #8
  9. brooklineTom Guest

    On Dec 31 2008, 9:56 am, John Machin <> wrote:
    > On Dec 31 2008, 4:02 pm, brooklineTom <> wrote:
    >
    > > wrote:
    > > > Hi,

    >
    > > > Can anybody recommend an approach for loading and parsing Excel
    > > > spreadsheets in Python. Any well known/recommended libraries for this?

    >
    > > > The only thing I found in a brief search washttp://www.lexicon.net/sjmachin/xlrd.htm,
    > > > but I'd rather get some more input before going with something I don't
    > > > know.

    >
    > > > Thanks,
    > > > Andy.

    >
    > > I save the spreadsheets (in Excel) in xml format.

    >
    > Which means that you need to be on a Windows box with a licensed copy
    > of Excel. I presume you talking about using Excel 2003 and saving as
    > "XML Spreadsheet (*.xml)". Do you save the files manually, or using a
    > COM script? What is the largest xls file that you've saved as xml, how
    > big was the xml file, and how long did it take to parse the xml file?
    > Do you extract formatting information or just cell contents?


    1. The OP requested Excel files, by construction those must be
    generated with a licensed copy of Excel. I did the actual processing
    on both linux and windoze platforms.
    2. Yes, I used Excel 2003. I haven't looked at later versions.
    3. The largest file I used was about 228M, containing 36,393 hotel
    properties from Commission Junction. Each entry had 113 cells. The
    parsing overhead was minimal (on a per-entry basis) -- that's why I
    choose to use a pull-parser.
    4. I extracted primarily cell contents, though I did some very limited
    format handling (looking for non-text fields and such).

    > > I started with the
    > > standard xml tools (xml.dom and xml.dom.minidom). I built a
    > > pullparser, and then just crack them. The MS format is tedious and
    > > overly complex (like all MS stuff), but straightforward.

    >
    > What do you think of the xml spat out by Excel 2007's (default) xlsx
    > format?


    I haven't looked at this.

    > > Once I've
    > > cracked them into their component parts (headers, rows, cells, etc),
    > > then I walk through them doing whatever I want.

    >
    > > I found this material to be no worse than doing similar crud with
    > > xhtml. I know there are various python packages around that do it, but
    > > I found the learning curve of those packages to be steeper than just
    > > grokking the spreadsheet structure itself.

    >
    > I'm curious to know which are the "various python packages" with the
    > so steep learning curves, and what the steep bits were.


    I looked, briefly, at xlrd. I found and scanned a few alternatives,
    though I don't remember what the others were. I needed something I
    could incorporate into my own application framework, and I knew I
    didn't need most of the formatting information. I'm not in any way
    criticizing xlrd, it's simply that, based on its API summary, it seems
    focused on problems I didn't have to solve. I knew that I needed only
    a small subset of the xlrd behavior, and I concluded (perhaps
    incorrectly) that it would be easier to roll my own parser than find,
    extract, and then port (to my own framework) the corresponding parts
    of xlrd.

    I needed to extract the content of each row, cell by cell, and build
    data objects (in my framework) with the content of various cells. I
    also needed to build an "exception file" containing malformed entries
    that I could re-open with Excel after my code finished, so that the
    bogus entries could be manually corrected. What I mean by "malformed"
    entry is, for example, an address field that fails to correctly
    geocode or comment fields with confused utf8/unicode contents. My
    focus was on data content, as opposed to presentation. I needed to
    crack the cells into things like "string", "boolean", "float", and so
    on.

    Most importantly, I needed to do this one entry at a time -- I did
    *not* want to load the entire spreadsheet at once.

    I'm not saying that this couldn't be done with xlrd; only that I chose
    to roll my own and had minimal difficulty doing so.

    I hope this helps!
     
    brooklineTom, Jan 2, 2009
    #9
  10. Steve Holden Guest

    brooklineTom wrote:
    > On Dec 31 2008, 9:56 am, John Machin <> wrote:
    >> On Dec 31 2008, 4:02 pm, brooklineTom <> wrote:
    >>
    >>> wrote:
    >>>> Hi,
    >>>> Can anybody recommend an approach for loading and parsing Excel
    >>>> spreadsheets in Python. Any well known/recommended libraries for this?
    >>>> The only thing I found in a brief search washttp://www.lexicon.net/sjmachin/xlrd.htm,
    >>>> but I'd rather get some more input before going with something I don't
    >>>> know.
    >>>> Thanks,
    >>>> Andy.
    >>> I save the spreadsheets (in Excel) in xml format.

    >> Which means that you need to be on a Windows box with a licensed copy
    >> of Excel. I presume you talking about using Excel 2003 and saving as
    >> "XML Spreadsheet (*.xml)". Do you save the files manually, or using a
    >> COM script? What is the largest xls file that you've saved as xml, how
    >> big was the xml file, and how long did it take to parse the xml file?
    >> Do you extract formatting information or just cell contents?

    >
    > 1. The OP requested Excel files, by construction those must be
    > generated with a licensed copy of Excel. I did the actual processing
    > on both linux and windoze platforms.


    Well, even if Andy meant "Excel files" rather than "Excel-formatted
    files" there are many ways to come by these without having a licensed
    copy of Excel. FTP and email attachment come to mind most readily.

    How then to convert those to XML without Excel?

    [...]> I looked, briefly, at xlrd. I found and scanned a few alternatives,
    > though I don't remember what the others were. I needed something I
    > could incorporate into my own application framework, and I knew I
    > didn't need most of the formatting information. I'm not in any way
    > criticizing xlrd, it's simply that, based on its API summary, it seems
    > focused on problems I didn't have to solve. I knew that I needed only
    > a small subset of the xlrd behavior, and I concluded (perhaps
    > incorrectly) that it would be easier to roll my own parser than find,
    > extract, and then port (to my own framework) the corresponding parts
    > of xlrd.
    >

    My own case was similar, in that I only needed the value data. The
    approach I took was to install xlrd and use it. Job done.

    > I needed to extract the content of each row, cell by cell, and build
    > data objects (in my framework) with the content of various cells. I
    > also needed to build an "exception file" containing malformed entries
    > that I could re-open with Excel after my code finished, so that the
    > bogus entries could be manually corrected. What I mean by "malformed"
    > entry is, for example, an address field that fails to correctly
    > geocode or comment fields with confused utf8/unicode contents. My
    > focus was on data content, as opposed to presentation. I needed to
    > crack the cells into things like "string", "boolean", "float", and so
    > on.
    >
    > Most importantly, I needed to do this one entry at a time -- I did
    > *not* want to load the entire spreadsheet at once.
    >

    My data files weren't that large (IIRC the largest spreadsheet was about
    6MB), so I was quite happy to load the whole thing in memory, iterate
    over it and then write the results to the database as they were extracted.

    > I'm not saying that this couldn't be done with xlrd; only that I chose
    > to roll my own and had minimal difficulty doing so.
    >

    Given the constraints of your problem it seems like an intelligent approach.

    > I hope this helps!


    I'm sure it will.

    regards
    Steve
    --
    Steve Holden +1 571 484 6266 +1 800 494 3119
    Holden Web LLC http://www.holdenweb.com/
     
    Steve Holden, Jan 2, 2009
    #10
  11. Steve Holden Guest

    brooklineTom wrote:
    > On Dec 31 2008, 9:56 am, John Machin <> wrote:
    >> On Dec 31 2008, 4:02 pm, brooklineTom <> wrote:
    >>
    >>> wrote:
    >>>> Hi,
    >>>> Can anybody recommend an approach for loading and parsing Excel
    >>>> spreadsheets in Python. Any well known/recommended libraries for this?
    >>>> The only thing I found in a brief search washttp://www.lexicon.net/sjmachin/xlrd.htm,
    >>>> but I'd rather get some more input before going with something I don't
    >>>> know.
    >>>> Thanks,
    >>>> Andy.
    >>> I save the spreadsheets (in Excel) in xml format.

    >> Which means that you need to be on a Windows box with a licensed copy
    >> of Excel. I presume you talking about using Excel 2003 and saving as
    >> "XML Spreadsheet (*.xml)". Do you save the files manually, or using a
    >> COM script? What is the largest xls file that you've saved as xml, how
    >> big was the xml file, and how long did it take to parse the xml file?
    >> Do you extract formatting information or just cell contents?

    >
    > 1. The OP requested Excel files, by construction those must be
    > generated with a licensed copy of Excel. I did the actual processing
    > on both linux and windoze platforms.


    Well, even if Andy meant "Excel files" rather than "Excel-formatted
    files" there are many ways to come by these without having a licensed
    copy of Excel. FTP and email attachment come to mind most readily.

    How then to convert those to XML without Excel?

    [...]> I looked, briefly, at xlrd. I found and scanned a few alternatives,
    > though I don't remember what the others were. I needed something I
    > could incorporate into my own application framework, and I knew I
    > didn't need most of the formatting information. I'm not in any way
    > criticizing xlrd, it's simply that, based on its API summary, it seems
    > focused on problems I didn't have to solve. I knew that I needed only
    > a small subset of the xlrd behavior, and I concluded (perhaps
    > incorrectly) that it would be easier to roll my own parser than find,
    > extract, and then port (to my own framework) the corresponding parts
    > of xlrd.
    >

    My own case was similar, in that I only needed the value data. The
    approach I took was to install xlrd and use it. Job done.

    > I needed to extract the content of each row, cell by cell, and build
    > data objects (in my framework) with the content of various cells. I
    > also needed to build an "exception file" containing malformed entries
    > that I could re-open with Excel after my code finished, so that the
    > bogus entries could be manually corrected. What I mean by "malformed"
    > entry is, for example, an address field that fails to correctly
    > geocode or comment fields with confused utf8/unicode contents. My
    > focus was on data content, as opposed to presentation. I needed to
    > crack the cells into things like "string", "boolean", "float", and so
    > on.
    >
    > Most importantly, I needed to do this one entry at a time -- I did
    > *not* want to load the entire spreadsheet at once.
    >

    My data files weren't that large (IIRC the largest spreadsheet was about
    6MB), so I was quite happy to load the whole thing in memory, iterate
    over it and then write the results to the database as they were extracted.

    > I'm not saying that this couldn't be done with xlrd; only that I chose
    > to roll my own and had minimal difficulty doing so.
    >

    Given the constraints of your problem it seems like an intelligent approach.

    > I hope this helps!


    I'm sure it will.

    regards
    Steve
    --
    Steve Holden +1 571 484 6266 +1 800 494 3119
    Holden Web LLC http://www.holdenweb.com/
     
    Steve Holden, Jan 2, 2009
    #11
  12. John Machin Guest

    On Jan 3, 2:01 am, brooklineTom <> wrote:
    > On Dec 31 2008, 9:56 am, John Machin <> wrote:


    > > On Dec 31 2008, 4:02 pm, brooklineTom <> wrote:

    >
    > > > wrote:
    > > > > Hi,

    >
    > > > > Can anybody recommend an approach for loading and parsing Excel
    > > > > spreadsheets in Python. Any well known/recommended libraries for this?

    >
    > > > > The only thing I found in a brief search was http://www.lexicon.net/sjmachin/xlrd.htm,
    > > > > but I'd rather get some more input before going with something I don't
    > > > > know.

    >
    > > > > Thanks,
    > > > > Andy.

    >
    > > > I save the spreadsheets (in Excel) in xml format.

    >
    > > Which means that you need to be on a Windows box with a licensed copy
    > > of Excel. I presume you talking about using Excel 2003 and saving as
    > > "XML Spreadsheet (*.xml)". Do you save the files manually, or using a
    > > COM script? What is the largest xls file that you've saved as xml, how
    > > big was the xml file, and how long did it take to parse the xml file?
    > > Do you extract formatting information or just cell contents?

    >
    > 1. The OP requested Excel files, by construction those must be
    > generated with a licensed copy of Excel. I did the actual processing
    > on both linux and windoze platforms.


    My point was that however the original XLS files were created or
    acquired, the first step in your solution involves converting the XLS
    file to "XML Spreadsheet" format, which requires a copy of Excel on a
    Windows box. Many people start with an XLS file, no Excel and no
    Windows box, no COM, and users who can't be relied on to open a file
    and save it in the right format with the right name and extension.

    BTW, did you consider opening the XLS files with OpenOffice.org's Calc
    and saving it in their default ods format (chunks of XML in a zip
    file)?


    > 3. The largest file I used was about 228M,


    Is that the XLS file or the XML file?

    > containing 36,393 hotel
    > properties from Commission Junction. Each entry had 113 cells. The
    > parsing overhead was minimal (on a per-entry basis) -- that's why I
    > choose to use a pull-parser.
    > 4. I extracted primarily cell contents, though I did some very limited
    > format handling (looking for non-text fields and such).


    I don't understand "looking for non-text fields" as "format handling".
    To my way of thinking, knowing the data-type that Excel has assigned
    to a cell is close to essential for effective use of the contents. And
    you don't have to look very far: the ss:Type attribute tells you
    whether a cell's content is String, Number, DateTime, Boolean, or
    Error.

    > > > xhtml. I know there are various python packages around that do it, but
    > > > I found the learning curve of those packages to be steeper than just
    > > > grokking the spreadsheet structure itself.

    >
    > > I'm curious to know which are the "various python packages" with the
    > > so steep learning curves, and what the steep bits were.

    >
    > I looked, briefly, at xlrd. I found and scanned a few alternatives,
    > though I don't remember what the others were. I needed something I
    > could incorporate into my own application framework, and I knew I
    > didn't need most of the formatting information. I'm not in any way
    > criticizing xlrd, it's simply that, based on its API summary, it seems
    > focused on problems I didn't have to solve.


    Not so focused at all. The default behaviour of xlrd is to ignore
    formatting info as much as possible. AFAICT this is the mode used by
    most users.

    > I knew that I needed only
    > a small subset of the xlrd behavior, and I concluded (perhaps
    > incorrectly) that it would be easier to roll my own parser than find,
    > extract, and then port (to my own framework) the corresponding parts
    > of xlrd.


    Possibly incorrectly. If approached at the time, I would have said:
    (a) if desperate to DIY:
    (a1) ignore any code for old Excel versions (self.biff_version < 80)
    (a2) ignore any code for extracting formatting info
    (self.formatting_info)
    (a3) find the Sheet.put_cell* methods in sheet.py e.g.

    def put_cell(self, rowx, colx, ctype, value, xf_index):

    ignore the xf_index arg and subvert them to your own needs instead of
    filling up a big rectangular arena with data

    (b) if not really so desperate, talk to me about implementing an
    option in xlrd where callers can specify a callback to be used instead
    of the Sheet.put_cell* methods

    (c) What is all this "port to my own framework" caper anyway? If you
    need to extract data from a database, do you rummage in their code
    libraries and port the relevant bits to your own framework?

    >
    > I needed to extract the content of each row, cell by cell, and build
    > data objects (in my framework) with the content of various cells. I
    > also needed to build an "exception file" containing malformed entries
    > that I could re-open with Excel after my code finished, so that the
    > bogus entries could be manually corrected. What I mean by "malformed"
    > entry is, for example, an address field that fails to correctly
    > geocode or comment fields with confused utf8/unicode contents.


    Building objects in your own framework and checking data integrity is
    something that happens *after* you've got the basics for a cell (row
    id, column id, data type, data value). You have dodgy postal
    addresses? You're not alone, and it's not relevant to how you parse
    the spreadsheet or even whether the data source was a spreadsheet or a
    database query or a box of punched cards.

    > My
    > focus was on data content, as opposed to presentation. I needed to
    > crack the cells into things like "string", "boolean", "float", and so
    > on.


    "Crack"? It's not the Enigma code. It's not even rot13. Spreadsheet
    XML tells you the type (String, Boolean, ...). xlrd tells you the type
    (XL_CELL_TEXT, XL_CELL_BOOLEAN, ...).

    >
    > Most importantly, I needed to do this one entry at a time -- I did
    > *not* want to load the entire spreadsheet at once.
    >
    > I'm not saying that this couldn't be done with xlrd; only that I chose
    > to roll my own and had minimal difficulty doing so.
    >
    > I hope this helps!


    I think so. Thanks. You didn't directly address the "steep learning
    curve" question, but you explained enough of where you came from.

    It's a matter of "horses for courses". I guess some people might
    regard (xml.dom, xml.dom.minidom, DIY "pullparser") as having a
    slightly non-horizontal learning curve :)

    Cheers,
    John
     
    John Machin, Jan 3, 2009
    #12
  13. brooklineTom Guest

    On Jan 2, 7:04 pm, John Machin <> wrote:
    > On Jan 3, 2:01 am, brooklineTom <> wrote:
    >
    > <snip>
    >
    > My point was that however the original XLS files were created or
    > acquired, the first step in your solution involves converting the XLS
    > file to "XML Spreadsheet" format, which requires a copy of Excel on a
    > Windows box. Many people start with an XLS file, no Excel and no
    > Windows box, no COM, and users who can't be relied on to open a file
    > and save it in the right format with the right name and extension.


    True enough. I develop on a WinXP box and have Excel. I just used it.

    > BTW, did you consider opening the XLS files with OpenOffice.org's Calc
    > and saving it in their default ods format (chunks of XML in a zip
    > file)?


    No. As I said, I have Excel.

    > > 3. The largest file I used was about 228M,

    >
    > Is that the XLS file or the XML file?


    That's the xml file size. I just *love* file-bloat, don't you? :)

    >
    > <snip>
    >


    > > I knew that I needed only
    > > a small subset of the xlrd behavior, and I concluded (perhaps
    > > incorrectly) that it would be easier to roll my own parser than find,
    > > extract, and then port (to my own framework) the corresponding parts
    > > of xlrd.

    >
    > Possibly incorrectly. If approached at the time, I would have said:
    > (a) if desperate to DIY:
    > (a1) ignore any code for old Excel versions (self.biff_version < 80)
    > (a2) ignore any code for extracting formatting info
    > (self.formatting_info)
    > (a3) find the Sheet.put_cell* methods in sheet.py e.g.
    >
    > def put_cell(self, rowx, colx, ctype, value, xf_index):
    >
    > ignore the xf_index arg and subvert them to your own needs instead of
    > filling up a big rectangular arena with data
    >
    > (b) if not really so desperate, talk to me about implementing an
    > option in xlrd where callers can specify a callback to be used instead
    > of the Sheet.put_cell* methods
    >
    > (c) What is all this "port to my own framework" caper anyway? If you
    > need to extract data from a database, do you rummage in their code
    > libraries and port the relevant bits to your own framework?


    I'm sure xlrd is fine, I wasn't any way suggesting that any
    "improvements" are needed. I was just explaining what I did.

    >
    > <snip>
    >


    > "Crack"? It's not the Enigma code. It's not even rot13. Spreadsheet
    > XML tells you the type (String, Boolean, ...). xlrd tells you the type
    > (XL_CELL_TEXT, XL_CELL_BOOLEAN, ...).


    Sure, once you've sorted through the excess MS-specific stuff, dealt
    with the runs of empty cells, and so on.

    I needed to write an exception file, containing the original
    spreadsheet entries that failed, so that our curators could find and
    fix (or delete) them. Thus, I already had to do row-by-row handling of
    the input, I had to catch and handle exceptions, I had to know and
    remember enough about the original SS format/layout to replicate it in
    the exception file, and so on.

    > I think so. Thanks. You didn't directly address the "steep learning
    > curve" question, but you explained enough of where you came from.
    >
    > It's a matter of "horses for courses". I guess some people might
    > regard (xml.dom, xml.dom.minidom, DIY "pullparser") as having a
    > slightly non-horizontal learning curve :)


    *LOL*

    Yes. Sadly, I had already climbed the dom/minidom/pullparser curve for
    all the other stuff I have to do. In the space I work in, xml handling
    is pervasive enough that I already have all that stuff "in my hands"
    anyway.

    I wonder if the OP ever read any of this.... :)

    Thx,
    Tom
     
    brooklineTom, Jan 7, 2009
    #13
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Raoul Snyman
    Replies:
    14
    Views:
    11,143
    Curt_C [MVP]
    Apr 29, 2004
  2. John Machin
    Replies:
    15
    Views:
    778
  3. spirit
    Replies:
    4
    Views:
    565
    spirit
    Aug 3, 2007
  4. Jeff

    DTS Package using Excel Spreadsheets

    Jeff, Aug 28, 2003, in forum: ASP General
    Replies:
    0
    Views:
    175
  5. Colin Walls

    Reading AND writing Excel spreadsheets

    Colin Walls, Apr 30, 2005, in forum: Perl Misc
    Replies:
    0
    Views:
    89
    Colin Walls
    Apr 30, 2005
Loading...

Share This Page