How to parse multi-part content


Dave Kuhlman

Suppose that I have content that looks like what I've included at
the end of this message. Is there something in the standard
Python library that will help me parse it, break it into the parts
separated by the boundary strings, extract headers from each
sub-part, etc?

Do I need to add something like the following to the beginning?

Content-Type: multipart/related;
type="multipart/alternative";
boundary="-----------------------------1646970154570313593966717980"

I've tried working with the email, mimetools, and multifile
modules in the standard library. But my understanding of these
things is dim, and I have not had success.

Is there a beginner's guide somewhere that I should read?

In case you are curious, this is content posted to my Zope server
when I include an element '<input type="file" .../>' in my form.

Here is the content that I need to parse:


-----------------------------1646970154570313593966717980
Content-Disposition: form-data; name="xschemaContent"


-----------------------------1646970154570313593966717980
Content-Disposition: form-data; name="xschemaFile"; filename="po.xsd"
Content-Type: application/octet-stream

<xs:schema targetNamespace="http://openuri.org/easypo"
xmlns:po="http://openuri.org/easypo"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified">

<xs:element name="purchase-order">
<xs:complexType>
<xs:sequence>
<xs:element name="customer" type="po:customer"/>
<xs:element name="date" type="xs:dateTime"/>
<xs:element name="line-item" type="po:line-item"
minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="shipper" type="po:shipper"
minOccurs="0"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:complexType name="customer">
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="address" type="xs:string"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="line-item">
<xs:sequence>
<xs:element name="description" type="xs:string"/>
<xs:element name="per-unit-ounces" type="xs:decimal"/>
<xs:element name="price" type="xs:double"/>
<xs:element name="quantity" type="xs:integer"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="shipper">
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="per-ounce-rate" type="xs:decimal"/>
</xs:sequence>
</xs:complexType>
</xs:schema>

-----------------------------1646970154570313593966717980
Content-Disposition: form-data; name="which"

superclass
-----------------------------1646970154570313593966717980
Content-Disposition: form-data; name="Submit"

Submit
-----------------------------1646970154570313593966717980--
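On the question of prepending a header: the email package in the standard library can parse this body once a Content-Type header is placed in front of it. Two details differ from the header suggested above. Per RFC 2046 the delimiter line is "--" followed by the boundary, so the boundary parameter is the delimiter line minus its two leading hyphens. The type should also be multipart/form-data. The sketch below assumes the body arrives as bytes with CRLF line endings, as browsers send it. The function name parse_form_data is made up for illustration.

```python
from email import policy
from email.parser import BytesParser


def parse_form_data(body: bytes):
    """Split a raw multipart/form-data body into
    (name, filename, content_type, data) tuples using the email package.

    Assumption: the body starts with a "--boundary" delimiter line and uses
    CRLF line endings, as in the content posted above.
    """
    body = body.lstrip(b"\r\n")  # tolerate stray blank lines before the first delimiter
    first_line = body.split(b"\r\n", 1)[0]
    if not first_line.startswith(b"--"):
        raise ValueError("body does not start with a '--boundary' delimiter line")
    boundary = first_line[2:].decode("ascii")  # boundary = delimiter minus leading "--"

    # Prepend the header the email parser needs to treat the body as multipart.
    header = (
        "MIME-Version: 1.0\r\n"
        f'Content-Type: multipart/form-data; boundary="{boundary}"\r\n'
        "\r\n"
    ).encode("ascii")
    msg = BytesParser(policy=policy.HTTP).parsebytes(header + body)

    parts = []
    for part in msg.iter_parts():
        parts.append((
            part.get_param("name", header="content-disposition"),  # name="..."
            part.get_filename(),           # filename="..." or None
            part.get_content_type(),       # text/plain when the part has no Content-Type
            part.get_payload(decode=True), # the part's body as bytes
        ))
    return parts
```

On a server, the boundary can equally be taken from the request's own Content-Type header instead of from the first line of the body.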
 

John J. Lee

Dave Kuhlman said:
In case you are curious, this is content posted to my Zope server
when I include an element '<input type="file" .../>' in my form.
[...]

*Surely* Zope has a standard way of doing this. Try a Zope list?


John
 

Tim Roberts

Dave Kuhlman said:
Suppose that I have content that looks like what I've included at
the end of this message. Is there something in the standard
Python library that will help me parse it, break it into the parts
separated by the boundary strings, extract headers from each
sub-part, etc?
...
In case you are curious, this is content posted to my Zope server
when I include an element '<input type="file" .../>' in my form.

Actually, you get this because your <form> header has
enctype="multipart/form-data". It happens that file upload only works with
that enctype, but you can use it without a file upload.

That's why cgi.py knows how to parse this. Look at cgi.parse_multipart.
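To see where this format comes from, here is a rough sketch of the body a browser builds for a form with enctype="multipart/form-data". encode_multipart is a hypothetical helper, not a standard-library function. Unlike a browser, it emits plain fields before files instead of in form order, and it does no escaping, so it assumes names and filenames contain no quotes or line breaks.

```python
import uuid


def encode_multipart(fields, files, boundary=None):
    """Build a multipart/form-data body (hypothetical helper).

    fields: dict of name -> str value
    files:  dict of name -> (filename, content_type, data_bytes)
    Returns (content_type_header_value, body_bytes).
    """
    if boundary is None:
        # Browsers pick a random string that will not occur in the data.
        boundary = "-" * 27 + uuid.uuid4().hex
    delim = b"--" + boundary.encode("ascii")

    lines = []
    for name, value in fields.items():
        lines += [delim,
                  f'Content-Disposition: form-data; name="{name}"'.encode(),
                  b"",                      # blank line ends the part's headers
                  value.encode("utf-8")]
    for name, (filename, ctype, data) in files.items():
        lines += [delim,
                  f'Content-Disposition: form-data; name="{name}"; '
                  f'filename="{filename}"'.encode(),
                  f"Content-Type: {ctype}".encode(),
                  b"",
                  data]
    lines += [delim + b"--", b""]           # closing delimiter has a trailing "--"
    return f"multipart/form-data; boundary={boundary}", b"\r\n".join(lines)
```

Each part is separated by the same delimiter line, and every line ends in CRLF, which is exactly the structure in the content quoted earlier in the thread.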
 

Dave Kuhlman

John said:
Dave Kuhlman said:
In case you are curious, this is content posted to my Zope server
when I include an element '<input type="file" .../>' in my form.
[...]

*Surely* Zope has a standard way of doing this. Try a Zope list?

That's a good suggestion. Thanks. Zope people are Python people,
so they would give me the kind of help I'd need. I'll ask on the
Zope users list.

However, there is nothing Zope-specific about this. The content
was produced by my Web browser (actually two Web browsers that I
test with: Opera and Firefox).

Dave
 

Dave Kuhlman

Tim said:
Actually, you get this because your <form> header has
enctype="multipart/form-data". It happens that file upload only works
with that enctype, but you can use it without a file upload.

That's why cgi.py knows how to parse this. Look at cgi.parse_multipart.

Ah. A clue. I think you're telling me that it's the CGI
specification that I need to be reading, right? I'll read some of
that.

Per your suggestion, I tried cgi.parse_multipart() and also
class cgi.FieldStorage. They don't work. Or more correctly, I
don't know how to use them.

I guess I'll have to concede defeat, which in Python-speak means:
"It was easier to write it myself."

Basically, I wrote a little parser class ContentParser which
exposes a method get_content_by_name. This method returns the
body (what follows two carriage returns, up to the next
boundary line) for a given name, where name is the value of the
"name" field in the line:

Content-Disposition: form-data; name="xschemaFile"

I was in a bit of a hurry, so my solution (class ContentParser) is
not very elegant. But if anyone needs it, let me know.
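For readers who want something along those lines, here is a minimal sketch of a line-based parser with the same interface. It is not Dave's ContentParser, which was not posted; the class below is a hypothetical reconstruction of the described behaviour. It assumes CRLF line endings and that the delimiter string never appears inside a part's content.

```python
import re


class ContentParser:
    """Hypothetical sketch: index each part's body by its name="..." value."""

    # Matches the name="..." parameter (not filename="...") on the
    # Content-Disposition line.
    _NAME_RE = re.compile(r'Content-Disposition:.*?\bname="([^"]*)"', re.IGNORECASE)

    def __init__(self, body: bytes):
        text = body.decode("latin-1")  # latin-1 maps every byte, so nothing is lost
        delim = text.lstrip("\r\n").split("\r\n", 1)[0]  # first line: "--boundary"
        self._parts = {}
        for chunk in text.split(delim)[1:]:
            if chunk.startswith("--"):  # reached the closing "--boundary--"
                break
            # A chunk is: CRLF, header lines, blank line, body, CRLF.
            headers, _, content = chunk[2:].partition("\r\n\r\n")
            match = self._NAME_RE.search(headers)
            if match:
                # Drop the CRLF that belongs to the next delimiter line.
                self._parts[match.group(1)] = content[:-2].encode("latin-1")

    def get_content_by_name(self, name):
        """Return the body of the part with this name as bytes, or None."""
        return self._parts.get(name)
```

Splitting on the delimiter keeps the code short; a stricter parser would only treat the delimiter as a separator when it starts a line, as RFC 2046 requires.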

And, thanks for the suggestions.

Dave
 

Michael Foord

Dave Kuhlman said:
Ah. A clue. I think you're telling me that it's the CGI
specification that I need to be reading, right? I'll read some of
that.

Per your suggestion, I tried cgi.parse_multipart() and also
class cgi.FieldStorage. They don't work. Or more correctly, I
don't know how to use them.

I guess I'll have to concede defeat, which in Python-speak means:
"It was easier to write it myself."

Basically, I wrote a little parser class ContentParser which
exposes a method get_content_by_name. This method returns the
body (what follows two carriage returns, up to the next
boundary line) for a given name, where name is the value of the
"name" field in the line:

Content-Disposition: form-data; name="xschemaFile"

I was in a bit of a hurry, so my solution (class ContentParser) is
not very elegant. But if anyone needs it, let me know.

And, thanks for the suggestions.

Dave

If you are receiving this data in a Python script on a server from an
HTML form (i.e. a CGI script), then it's straightforward to do.

import cgi
theform = cgi.FieldStorage()

This parses the contents of the form into a dictionary-like object.
The HTML form that posted the information will assign each file (or
element of the form) a name.
You can access the saved data using:

thedata = theform['name'].value

Look in the cgi documentation for other attributes that uploaded
files will have. (There is a potential pitfall with 'list values' as
well, where several values share the same name; again, see the docs
for ways around this.)
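The cgi module was deprecated in Python 3.11 and removed in 3.13, so here is a sketch of the same dictionary-like access, including repeated names, built on the email package instead. FormData is a hypothetical class; its getfirst() and getlist() methods are named after the cgi.FieldStorage methods of the same names but are not that API. It assumes the raw request body and the request's Content-Type header value are available (under CGI, from standard input and the CONTENT_TYPE environment variable).

```python
from email import policy
from email.parser import BytesParser


class FormData:
    """Hypothetical dictionary-like view of a multipart/form-data body that
    keeps every value when several fields share the same name."""

    def __init__(self, body: bytes, content_type: str):
        # content_type is the request header value, e.g.
        # 'multipart/form-data; boundary=...'
        head = f"Content-Type: {content_type}\r\n\r\n".encode("ascii")
        msg = BytesParser(policy=policy.HTTP).parsebytes(head + body)
        self._values = {}
        for part in msg.iter_parts():
            name = part.get_param("name", header="content-disposition")
            # Append rather than overwrite, so repeated names are not lost.
            self._values.setdefault(name, []).append(part.get_payload(decode=True))

    def getlist(self, name):
        """All values for name, in the order they were posted."""
        return list(self._values.get(name, []))

    def getfirst(self, name, default=None):
        """The first value for name, or default if it is absent."""
        values = self._values.get(name)
        return values[0] if values else default

    def __contains__(self, name):
        return name in self._values
```

Returning a list from getlist() even when a name appears once avoids having to check whether a value is a single item or a list.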

Regards,

Fuzzyman
http://www.voidspace.org.uk/atlantibots/pythonutils.html
 

Dave Kuhlman

Dave said:
John said:
Dave Kuhlman said:
In case you are curious, this is content posted to my Zope server
when I include an element '<input type="file" .../>' in my form.
[...]

*Surely* Zope has a standard way of doing this. Try a Zope list?

That's a good suggestion. Thanks. Zope people are Python people,
so they would give me the kind of help I'd need. I'll ask on the
Zope users list.

However, there is nothing Zope-specific about this. The content
was produced by my Web browser (actually two Web browsers that I
test with: Opera and Firefox).

I was wrong. You were right. There is a Zope way to do this.
Thanks for pushing me to dig deeper. It's a much easier way, too.

If there are any Zopesters reading, here is how to do it:

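def my_external_method(request):  # original signature: (request, ...)
    # Retrieve a stream-like object for the uploaded file; 'myFileData'
    # is the name given to the <input type="file"> element in the form.
    myStream = request['myFileData']
    # Read the data from the stream object.
    data = myStream.read()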

My problem was that I was so sure that I had to retrieve and parse
the content in the body of the request.

Thanks for the help.

Dave
 
