XML pickle

castironpi

Readability of the Pickle module. Can one export to XML, at a cost in
speed and size, for the benefit of user-readability?

Pickle does something else. Functions do not export their code,
either as interpreter instructions, or source, or anything else; and
classes do not export their dictionaries, just their names. But it
does export in ASCII.

Pickle checks any __safe_for_unpickling__ and __setstate__ methods,
which enable a little encapsulating, but don't go far.

At the other end of the spectrum, there is an externally-readable
datafile:

<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook
    xmlns="urn:schemas-microsoft-com:office:spreadsheet"
    xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
  <Worksheet ss:Name="Sheet1">
    <Table>
      <Row>
        <Cell><Data ss:Type="String">abc</Data></Cell>
        <Cell><Data ss:Type="Number">123</Data></Cell>
      </Row>
    </Table>
  </Worksheet>
</Workbook>

Classes can be arranged to mimic this hierarchy:

class XMLable:
    def __init__( self, **kwar ):
        self.attrs= kwar    # keyword arguments hold both attributes and children
class Workbook( XMLable ):
    cattrs= {
        'xmlns': "urn:schemas-microsoft-com:office:spreadsheet",
        'xmlns:ss': "urn:schemas-microsoft-com:office:spreadsheet" }
class Worksheet( XMLable ):
    cattrs= { 'name': 'ss:Name' }    # Python keyword -> XML attribute name
class Table( XMLable ): pass
class Row( XMLable ): pass
class Cell( XMLable ): pass
class Data( XMLable ):
    cattrs= { 'type': 'ss:Type' }

data= Data( content= 'abc', type= 'String' )
cell= Cell( data= data )
row= Row( cells= [ cell ] )
table= Table( rows= [ row ] )
sheet= Worksheet( table= table, name= "Sheet1" )
book= Workbook( sheets= [ sheet ] )

(These might make things cleaner, but are not allowed:

#data= Data( 'abc', 'ss:Type'= 'String' )
#sheet= Worksheet( table= table, 'ss:Name'= "Sheet1" )

For keys can only be identifiers in keyword argument syntax.)

How close to this end can the standard library come? Is it more
prevalent than something else that's currently in it? What does the
recipe look like to convert this to XML, either using import xml or
not?

import pickle
print( pickle.dumps( book ) )

is not quite what I have in mind.

I guess I'm not convinced that 'is currently in use' has always been
or even is the standard by which standard library additions are
judged. If it's not, then I hold that XML is a good direction to go.
Will core developers listen to reason? Does +1 = +1?
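
For reference, a minimal sketch of the conversion asked about above, using only the standard library's xml.etree.ElementTree. The to_element method and the "fixed" class attribute are hypothetical names introduced here, not part of the original design, and writing 'xmlns' and 'ss:Name' as literal attribute strings is a shortcut rather than proper namespace handling.

```python
import xml.etree.ElementTree as ET

NS = "urn:schemas-microsoft-com:office:spreadsheet"

class XMLable:
    cattrs = {}   # hypothetical: Python keyword -> XML attribute name
    fixed = {}    # hypothetical: attributes written on every instance

    def __init__(self, **kwargs):
        self.attrs = kwargs

    def to_element(self):
        # The element tag is simply the class name.
        elem = ET.Element(type(self).__name__, dict(self.fixed))
        for key, value in self.attrs.items():
            if isinstance(value, XMLable):      # single child object
                elem.append(value.to_element())
            elif isinstance(value, list):       # list of child objects
                for child in value:
                    elem.append(child.to_element())
            elif key == "content":              # text payload
                elem.text = str(value)
            else:                               # plain attribute
                elem.set(self.cattrs.get(key, key), str(value))
        return elem

class Workbook(XMLable):
    fixed = {"xmlns": NS, "xmlns:ss": NS}
class Worksheet(XMLable):
    cattrs = {"name": "ss:Name"}
class Table(XMLable): pass
class Row(XMLable): pass
class Cell(XMLable): pass
class Data(XMLable):
    cattrs = {"type": "ss:Type"}

data = Data(content="abc", type="String")
cell = Cell(data=data)
row = Row(cells=[cell])
table = Table(rows=[row])
sheet = Worksheet(table=table, name="Sheet1")
book = Workbook(sheets=[sheet])

xml_text = ET.tostring(book.to_element(), encoding="unicode")
print(xml_text)
```

Whether Excel accepts the result is untested here; the sketch only shows that the object hierarchy can be walked into an element tree.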
 
George Sakkis

Readability of the Pickle module. Can one export to XML, at a cost
in speed and size, for the benefit of user-readability?

Take a look at gnosis.xml.pickle, it seems a good starting point.

George
 
castironpi

Take a look at gnosis.xml.pickle, it seems a good starting point.

George

The way the OP specifies it, dumps-loads pairs are broken: say if
Table and Worksheet are defined in different modules. He'd have to
have some kind of unifying pair sequence, that says that "Worksheet"
document elements come from WS.py, etc.
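
One way such a "unifying pair sequence" might look is a registry that maps element names to classes, so a loader can find the right class wherever it was defined. This is only a sketch under that assumption; REGISTRY, register, Node and load are hypothetical names, and the standard library parser stands in for whatever XML library is used.

```python
import xml.etree.ElementTree as ET

REGISTRY = {}   # element tag -> class, filled in by each defining module

def register(cls):
    """Class decorator: make cls the target for <ClassName> elements."""
    REGISTRY[cls.__name__] = cls
    return cls

class Node:
    def __init__(self, children=(), text=None, attrs=None):
        self.children = list(children)
        self.text = text
        self.attrs = dict(attrs or {})

@register
class Worksheet(Node):   # could live in WS.py
    pass

@register
class Table(Node):       # could live in a different module
    pass

def load(elem):
    # Raises KeyError for tags that no module has registered.
    cls = REGISTRY[elem.tag]
    return cls(children=[load(child) for child in elem],
               text=elem.text, attrs=elem.attrib)

sheet = load(ET.fromstring('<Worksheet Name="Sheet1"><Table/></Worksheet>'))
print(type(sheet).__name__, [type(c).__name__ for c in sheet.children])
```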
 
Stefan Behnel

Hi,

Readability of the Pickle module. Can one export to XML, at a cost in
speed and size, for the benefit of user-readability?

Regarding pickling to XML, lxml.objectify can do that:

http://codespeak.net/lxml/objectify.html

however:
Pickle does something else. Functions do not export their code,
either as interpreter instructions, or source, or anything else; and
classes do not export their dictionaries, just their names. But it
does export in ASCII.

Pickle checks any __safe_for_unpickling__ and __setstate__ methods,
which enable a little encapsulating, but don't go far.

I'm having a hard time understanding what you are trying to achieve. Could you
state that in a few words? That's usually better than asking for a way to do X
with Y. Y (i.e. pickling in this case) might not be the right solution for you.

Stefan
 
castironpi

Hi,



Regarding pickling to XML, lxml.objectify can do that:

http://codespeak.net/lxml/objectify.html

however:



I'm having a hard time understanding what you are trying to achieve. Could you
state that in a few words? That's usually better than asking for a way to do X
with Y. Y (i.e. pickling in this case) might not be the right solution for you.

Stefan

The example isn't so bad. It's not clear that it isn't already too
specific. Pickling isn't what I want. XML is persistent too.

XML could go a couple ways. You could export source, byte code, and
type objects. (Pickle could do that too, thence the confusion
originally.)

gnosis.xml and lxml have slightly different outputs. What I'm going
for has been approached a few different times a few different ways
already. If all I want is an Excel-readable file, that's one end of
the spectrum. If you want something more general, but still include
Excel, that's one of many decisions to make. Ideas.

How does lxml export: b= B(); a.b= b; dumps( a )?

It looks like he can create the XML from the objects already.
 
Stefan Behnel

The example isn't so bad. It's not clear that it isn't already too
specific. Pickling isn't what I want. XML is persistent too.

XML could go a couple ways. You could export source, byte code, and
type objects. (Pickle could do that too, thence the confusion
originally.)

What I meant was: please state what you are trying to do. What you describe
are the environmental conditions and possible solutions that you are thinking
of, but it doesn't tell me what problem you are actually trying to solve.

gnosis.xml and lxml have slightly different outputs. What I'm going
for has been approached a few different times a few different ways
already. If all I want is an Excel-readable file, that's one end of
the spectrum. If you want something more general, but still include
Excel, that's one of many decisions to make. Ideas.

How does lxml export: b= B(); a.b= b; dumps( a )?

It looks like he can create the XML from the objects already.

In lxml.objectify, the objects *are* the XML tree. It's all about objects
being bound to specific elements in the tree.

Stefan
 
castironpi

What I meant was: please state what you are trying to do. What you describe
are the environmental conditions and possible solutions that you are thinking
of, but it doesn't tell me what problem you are actually trying to solve.

What problem -am- I trying to solve? Map the structure -in- to XML.
In lxml.objectify, the objects *are* the XML tree. It's all about objects
being bound to specific elements in the tree.

Stefan

Objects first. Create. The use case is a simulated strategy
tournament.
 
Stefan Behnel

Hi,

http://catb.org/~esr/faqs/smart-questions.html#goal

What problem -am- I trying to solve? Map the structure -in- to XML.

http://catb.org/~esr/faqs/smart-questions.html#beprecise

Is it a fixed structure you have, or are you free to use whatever you like?

Objects first. Create.

http://catb.org/~esr/faqs/smart-questions.html#writewell

My guess is that this is supposed to mean: "I want to create Python objects
and then write their structure out as XML". Is that the right translation?

There are many ways to do so, one is to follow these steps:

http://codespeak.net/lxml/objectify.html#tree-generation-with-the-e-factory
http://codespeak.net/lxml/objectify.html#element-access-through-object-attributes
http://codespeak.net/lxml/objectify.html#python-data-types
then maybe this:
http://codespeak.net/lxml/objectify.html#defining-additional-data-classes
and finally this:
http://codespeak.net/lxml/tutorial.html#serialisation

But as I do not know enough about the problem you are trying to solve, except:
The use case is a simulated strategy tournament.

I cannot tell if the above approach will solve your problem or not.

Stefan
 
castironpi

Hi,



http://catb.org/~esr/faqs/smart-questions.html#beprecise

Is it a fixed structure you have, or are you free to use whatever you like?


http://catb.org/~esr/faqs/smart-questions.html#writewell

My guess is that this is supposed to mean: "I want to create Python objects
and then write their structure out as XML". Is that the right translation?

There are many ways to do so, one is to follow these steps:

http://codespeak.net/lxml/objectify...eak.net/lxml/objectify.html#python-data-types
then maybe this: http://codespeak.net/lxml/objectify.html#defining-additional-data-cla...
and finally this: http://codespeak.net/lxml/tutorial.html#serialisation

But as I do not know enough about the problem you are trying to solve, except:


I cannot tell if the above approach will solve your problem or not.

Stefan

I was trying to start a discussion on a cool OO design. Problem's
kind of solved; downer, huh?

I haven't completed it, but it's a start. I expect I'll post some
thoughts along with progress. Will Excel read it? We'll see.

A design difference:

Worksheet= lambda parent: etree.SubElement( parent, "Worksheet" )
Table= lambda parent: etree.SubElement( parent, "Table" )
sheet= Worksheet( book ) #parent
table= Table( sheet )
vs.

table= Table() #empty table
sheet= Worksheet( table= table ) #child

I want to call sheet.table sometimes. Is there a lxml equivalent?
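
For comparison, with parent-first construction the child can still be looked up by tag afterwards. The sketch below uses the standard library's xml.etree.ElementTree rather than lxml (an assumption made so it runs without extra packages), and find() here is a lookup by tag name, not attribute-style access.

```python
import xml.etree.ElementTree as ET

book = ET.Element("Workbook")
sheet = ET.SubElement(book, "Worksheet")   # parent-first, as in the lambdas above
table = ET.SubElement(sheet, "Table")

# Later on, fetch the child by tag instead of keeping a separate reference.
found = sheet.find("Table")
missing = sheet.find("Row")                # no such child yet

print(found is table, missing)
```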
 
castironpi

I was trying to start a discussion on a cool OO design.  Problem's
kind of solved; downer, huh?

I haven't completed it, but it's a start.  I expect I'll post some
thoughts along with progress.  Will Excel read it?  We'll see.

A design difference:

Worksheet= lambda parent: etree.SubElement( parent, "Worksheet" )
Table= lambda parent: etree.SubElement( parent, "Table" )
sheet= Worksheet( book ) #parent
table= Table( sheet )
vs.

table= Table() #empty table
sheet= Worksheet( table= table ) #child

I want to call sheet.table sometimes.  Is there a lxml equivalent?

Minimize redundancy. Are some possibilities being ignored, such as
reading a class structure from an existing Excel XML file or downloading
the official spec? And if one is coding on Windows, how bulky is the
equivalent COM code? One doesn't want to be re-coding the "wheel" if
it's big and hairy.
 
castironpi

Great!

--
 \     "I moved into an all-electric house. I forgot and left the |
  `\   porch light on all day. When I got home the front door wouldn't |
_o__)                                       open."  -- Steven Wright |
Ben Finney
 
castironpi

I cannot tell if the above approach will solve your problem or not.

Well, declare me a persistent object.

from lxml import etree

SS= '{urn:schemas-microsoft-com:office:spreadsheet}'
book= etree.Element( 'Workbook' )
book.set( 'xmlns', 'urn:schemas-microsoft-com:office:spreadsheet' )
sheet= etree.SubElement(book, "Worksheet")
sheet.set( SS+ 'Name', 'WSheet1' )
table= etree.SubElement(sheet, "Table")
row= etree.SubElement(table, "Row")
cell1= etree.SubElement(row, "Cell")
data1= etree.SubElement(cell1, "Data" )
data1.set( SS+ 'Type', "Number" )
data1.text= '123'
cell2= etree.SubElement(row, "Cell")
data2= etree.SubElement(cell2, "Data" )
data2.set( SS+ 'Type', "String" )
data2.text= 'abc'
out= etree.tostring( book, pretty_print= True, xml_declaration=True )
print( out )
open( 'xl.xml', 'wb' ).write( out )  # tostring() returns bytes on Python 3

Can you use set( '{ss}Type' ) somehow? And any way to make this look
closer to the original? But it works.

<?xml version='1.0' encoding='ASCII'?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet">
<Worksheet xmlns:ns0="urn:schemas-microsoft-com:office:spreadsheet"
ns0:Name="WSheet1">
<Table>
<Row>
<Cell>
<Data ns0:Type="Number">123</Data>
</Cell>
<Cell>
<Data ns0:Type="String">abc</Data>
</Cell>
</Row>
</Table>
</Worksheet>
</Workbook>
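
On the set( '{ss}Type' ) question: the braces hold the full namespace URI, not the prefix, and the prefix chosen at serialization time (ss, ns0, ...) does not change what a namespace-aware parser sees. A small check of that, using the standard library parser as a stand-in for lxml (an assumption made for portability):

```python
import xml.etree.ElementTree as ET

NS = "urn:schemas-microsoft-com:office:spreadsheet"

# Same element, once with the "ss" prefix and once with a generated "ns0" prefix.
with_ss = '<Data xmlns="%s" xmlns:ss="%s" ss:Type="String">abc</Data>' % (NS, NS)
with_ns0 = '<Data xmlns="%s" xmlns:ns0="%s" ns0:Type="String">abc</Data>' % (NS, NS)

a = ET.fromstring(with_ss)
b = ET.fromstring(with_ns0)

# Both parse to the same namespace-qualified tag and attribute name.
print(a.tag == b.tag, a.attrib == b.attrib)
print(a.attrib)
```

So the ns0 output differs from the target in spelling only; whether Excel itself is equally tolerant is not something this check can show.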
 
Stefan Behnel

Well, declare me a persistent object.

Ok, from now on, you are a persistent object. :)

from lxml import etree

SS= '{urn:schemas-microsoft-com:office:spreadsheet}'
book= etree.Element( 'Workbook' )
book.set( 'xmlns', 'urn:schemas-microsoft-com:office:spreadsheet' )
sheet= etree.SubElement(book, "Worksheet")
sheet.set( SS+ 'Name', 'WSheet1' )
table= etree.SubElement(sheet, "Table")
row= etree.SubElement(table, "Row")
cell1= etree.SubElement(row, "Cell")
data1= etree.SubElement(cell1, "Data" )
data1.set( SS+ 'Type', "Number" )
data1.text= '123'
cell2= etree.SubElement(row, "Cell")
data2= etree.SubElement(cell2, "Data" )
data2.set( SS+ 'Type', "String" )
data2.text= 'abc'
out= etree.tostring( book, pretty_print= True, xml_declaration=True )
print( out )
open( 'xl.xml', 'wb' ).write( out )  # tostring() returns bytes on Python 3
http://codespeak.net/lxml/tutorial.html#namespaces
http://codespeak.net/lxml/tutorial.html#the-e-factory
http://codespeak.net/lxml/objectify.html#tree-generation-with-the-e-factory


Can you use set( '{ss}Type' ) somehow?

What is 'ss' here? A prefix?

What about actually reading the tutorial?

http://codespeak.net/lxml/tutorial.html#namespaces

And any way to make this look
closer to the original?

What's the difference you experience?

Stefan
 
castironpi

Can you use set( '{ss}Type' ) somehow?
What is 'ss' here? A prefix?

What about actually reading the tutorial?

http://codespeak.net/lxml/tutorial.html#namespaces


What's the difference you experience?

Target:
<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook
    xmlns="urn:schemas-microsoft-com:office:spreadsheet"
    xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
  <Worksheet ss:Name="Sheet1">
    <Table>
      <Row>
        <Cell><Data ss:Type="String">abc</Data></Cell>
        <Cell><Data ss:Type="Number">123</Data></Cell>
      </Row>
    </Table>
  </Worksheet>
</Workbook>

It helped get me the working one, actually: the tutorial. 'ss' is,
and I don't know the jargon for it, a local variable, or namespace
variable, prefix?, or something, as in
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet". The
ElementMaker example is closest, I think, but it's working, so, ...

I'm more interested in a simplification of the construction code, and
at this point I can get goofy and brainstorm. Ideas?
 
castironpi

Something else that crept up is:

<?xml version='1.0' encoding='ASCII'?>
<Workbook xmlns="[hugethingA]">
<Worksheet xmlns:ns0="[hugethingA]" ns0:name="WSheet1">
</Worksheet>
<Styles>
<Style xmlns:ns1="[hugethingA]" ns1:ID="s21"/>
</Styles>
</Workbook>

Here xmlns:ns1 gets "redefined" because I didn't figure out how to
get the xmlns:ns0 definition into the Workbook tag. But too bad for me.
 
castironpi

Something else that crept up is:

<?xml version='1.0' encoding='ASCII'?>
<Workbook xmlns="[hugethingA]">
  <Worksheet xmlns:ns0="[hugethingA]" ns0:name="WSheet1">
  </Worksheet>
  <Styles>
    <Style xmlns:ns1="[hugethingA]" ns1:ID="s21"/>
  </Styles>
</Workbook>

Here xmlns:ns1 gets "redefined" because I didn't figure out how to
get the xmlns:ns0 definition into the Workbook tag.  But too bad for me.

In economics, they call it "economies of scale": the effect, and the
point, and past it, where the cost to produce N goods on a supply
curve on which 0 goods costs 0 exceeds that on one on which 0 goods
costs more than 0: the opposite of diminishing returns. Does the
benefit of encapsulating the specifics of the XML file, including the
practice, exceed the cost of it?

For an only slightly more complex result, the encapsulated version is
presented; and the hand-coded, unencapsulated one is left as an
exercise to the reader.

book= Workbook()
sheet= Worksheet( book, 'WSheet1' )
table= Table( sheet )
row= Row( table, index= '2' )
style= Style( book, bold= True )
celli= Cell( row, styleid= style )
datai= Data( celli, 'Number', '123' )
cellj= Cell( row )
dataj= Data( cellj, 'String', 'abc' )

46 lines of infrastructure, moderately packed. Note that:

etree.XML( etree.tostring( book ) )

succeeds.
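
The 46 lines of infrastructure are not shown in the post. Purely as an illustration of what such helpers could look like, here is a sketch on top of the standard library's xml.etree.ElementTree rather than lxml. Every function body below is an assumption, as are the attribute and element names ss:Index, ss:StyleID, Font and ss:Bold and the generated style IDs; none of them are taken from the original code or checked against Excel.

```python
import xml.etree.ElementTree as ET

NS = "urn:schemas-microsoft-com:office:spreadsheet"
ET.register_namespace("ss", NS)   # serialize with the "ss" prefix

def q(name):
    """Qualified name in {uri}local form."""
    return "{%s}%s" % (NS, name)

def Workbook():
    return ET.Element(q("Workbook"))

def Worksheet(book, name):
    return ET.SubElement(book, q("Worksheet"), {q("Name"): name})

def Table(sheet):
    return ET.SubElement(sheet, q("Table"))

def Row(table, index=None):
    row = ET.SubElement(table, q("Row"))
    if index is not None:
        row.set(q("Index"), str(index))
    return row

def Style(book, bold=False):
    # Create the shared <Styles> container on first use.
    styles = book.find(q("Styles"))
    if styles is None:
        styles = ET.SubElement(book, q("Styles"))
    style_id = "s%d" % (len(styles) + 1)          # hypothetical ID scheme
    style = ET.SubElement(styles, q("Style"), {q("ID"): style_id})
    if bold:
        ET.SubElement(style, q("Font"), {q("Bold"): "1"})
    return style

def Cell(row, styleid=None):
    cell = ET.SubElement(row, q("Cell"))
    if styleid is not None:
        cell.set(q("StyleID"), styleid.get(q("ID")))
    return cell

def Data(cell, type_, text):
    data = ET.SubElement(cell, q("Data"), {q("Type"): type_})
    data.text = text
    return data

# Same construction sequence as in the post above.
book = Workbook()
sheet = Worksheet(book, 'WSheet1')
table = Table(sheet)
row = Row(table, index='2')
style = Style(book, bold=True)
celli = Cell(row, styleid=style)
datai = Data(celli, 'Number', '123')
cellj = Cell(row)
dataj = Data(cellj, 'String', 'abc')

roundtrip = ET.fromstring(ET.tostring(book))   # parses back without error
```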
 
castironpi

In economics, they call it "economies of scale": the effect, and the
point, and past it, where the cost to produce N goods on a supply
curve on which 0 goods costs 0 exceeds that on one on which 0 goods
costs more than 0: the opposite of diminishing returns.  Does the
benefit of encapsulating the specifics of the XML file, including the
practice, exceed the cost of it?

And for all the management out there, yes. As soon as possible does
mean as crappy as possible. Extra is extra. Assume the sooner the
crappier and the theorem follows. (Now, corroborate the premise...)

P.S. Gluttony is American too.
 
castironpi

And for all the management out there, yes.  As soon as possible does
mean as crappy as possible.  Extra is extra.  Assume the sooner the
crappier and the theorem follows.  (Now, corroborate the premise...)

The sooner the crappier or the parties waste time.
 
Stefan Behnel

Something else that crept up is:

<?xml version='1.0' encoding='ASCII'?>
<Workbook xmlns="[hugethingA]">
<Worksheet xmlns:ns0="[hugethingA]" ns0:name="WSheet1">
</Worksheet>
<Styles>
<Style xmlns:ns1="[hugethingA]" ns1:ID="s21"/>
</Styles>
</Workbook>

Here xmlns:ns1 gets "redefined" because I didn't figure out how to
get the xmlns:ns0 definition into the Workbook tag. But too bad for me.

What about actually *reading* the links I post?

http://codespeak.net/lxml/tutorial.html#the-e-factory

Hint: look out for the "nsmap" keyword argument.

Stefan
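
The hint above is about lxml's nsmap keyword. For readers without lxml, a roughly analogous effect can be sketched with the standard library, where register_namespace fixes the prefix and the serializer emits one declaration on the root element. This swaps in xml.etree.ElementTree for lxml, so it illustrates the idea rather than the nsmap call itself; the helper q is a hypothetical name.

```python
import xml.etree.ElementTree as ET

NS = "urn:schemas-microsoft-com:office:spreadsheet"
ET.register_namespace("ss", NS)   # use "ss" instead of generated ns0, ns1, ...

def q(name):
    """Qualified name in {uri}local form."""
    return "{%s}%s" % (NS, name)

book = ET.Element(q("Workbook"))
ET.SubElement(book, q("Worksheet"), {q("Name"): "WSheet1"})
styles = ET.SubElement(book, q("Styles"))
ET.SubElement(styles, q("Style"), {q("ID"): "s21"})

out = ET.tostring(book, encoding="unicode")
print(out)
```

Note that this output prefixes the element names as well (ss:Workbook and so on) instead of using a default namespace. It is equivalent for a namespace-aware parser, but whether Excel accepts that form is not verified here.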
 
