Parsing HTTP messages

Chris Gray

My initial question is, what Python library do I use to parse HTTP
messages?

Trying to use the "email" module has led me to another question and I'm
not even sure where to ask this second question.

When I parse an HTTP request using the email module, I get a field name of
"GET http:", which isn't a field name or part of a header at all but part of
the "start-line" (request line) of the HTTP request.

Checking the HTTP/1.1 spec (RFC 2616) I find that HTTP messages use the
generic message format of RFC 822 (obsoleted by RFC 2822) and that:

"Both types of message consist of a start-line, one or more header fields
(also known as 'headers'), an empty line (i.e., a line with nothing
preceding the CRLF) indicating the end of the header fields, and an
optional message-body."

But my understanding of RFC (2)822 is that there is no such thing as a
"start-line" in that format, and so the "email" module is right in trying
to treat the HTTP "start-line" as a header and that that start-line should
be stripped out before feeding it the remainder of the message which _is_
in (2)822 format.

Am I (don't laugh) missing something here?

Chris Gray

Fredrik Lundh

Chris said:
> My initial question is, what Python library do I use to parse HTTP
> messages?

mimetools.Message is a good choice. httplib.HTTPMessage is a slightly
better choice (it's a subclass of mimetools.Message; see the httplib.py
source code for more info)

> But my understanding of RFC (2)822 is that there is no such thing as a
> "start-line" in that format, and so the "email" module is right in trying
> to treat the HTTP "start-line" as a header and that that start-line should
> be stripped out before feeding it the remainder of the message which _is_
> in (2)822 format.
>
> Am I (don't laugh) missing something here?

not really, as long as "stripped out" means "processed", not "ignored"
(the start line contains the HTTP method and the target URL)

</F>
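
The "process the start line, then parse the rest" approach described above can be sketched as follows. This is only an illustration, not code from the thread: it assumes a current Python 3, where mimetools is gone and httplib lives on as http.client, so it leans on the standard email.parser module instead. The function name parse_http_request and the sample request are made up for the example, and the message is assumed to be a well-formed str with CRLF line endings.

```python
from email.parser import Parser


def parse_http_request(raw):
    """Split a raw HTTP request into (method, target, version, headers, body).

    Hypothetical helper for illustration; assumes a well-formed request
    held in a str with CRLF line endings.
    """
    # The first empty line separates the head (start-line + headers) from the body.
    head, _, body = raw.partition("\r\n\r\n")
    # The first line of the head is the request line, not an RFC 822 header.
    start_line, _, header_text = head.partition("\r\n")
    # Process the start line rather than discarding it.
    method, target, version = start_line.split(" ", 2)
    # What remains is in RFC (2)822 header format, so the email parser can read it.
    headers = Parser().parsestr(header_text, headersonly=True)
    return method, target, version, headers, body


request = (
    "GET http://example.com/index.html HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Accept: text/html\r\n"
    "\r\n"
)
method, target, version, headers, body = parse_http_request(request)
print(method, target, version)
print(headers["Host"], headers["accept"])  # header lookup is case-insensitive
```

Real traffic arrives as bytes and may be malformed, so anything beyond experimentation would need error handling that this sketch leaves out.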

Chris Gray

> mimetools.Message is a good choice. httplib.HTTPMessage is a slightly
> better choice (it's a subclass of mimetools.Message; see the httplib.py
> source code for more info)

Thanks, Fredrik. I'll look at those.

> not really, as long as "stripped out" means "processed", not "ignored"
> (the start line contains the HTTP method and the target URL)

Granted. I was being kind of flip. I just meant that email.Message or
mimetools.Message can't process the HTTP start-line properly.

I also realized the source of my misunderstanding of the RFC. The "Both"
refers to "both HTTP requests and responses" not to "both MIME and HTTP
messages". All the lights are on now.

Thanks to all who responded,
Chris Gray
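
Since "Both" covers requests and responses, the start-line comes in two shapes: a request line (method, request URI, HTTP version) and a status line (HTTP version, status code, reason phrase). A rough sketch of telling them apart is below. The function name parse_start_line and the dictionary layout are invented for this example, and it assumes a well-formed HTTP/1.x start-line.

```python
def parse_start_line(line):
    """Classify an HTTP/1.x start-line as a request line or a status line.

    Hypothetical helper for illustration; assumes well-formed input.
    """
    first, _, rest = line.rstrip("\r\n").partition(" ")
    if first.startswith("HTTP/"):
        # Status line: HTTP-Version SP Status-Code SP Reason-Phrase
        code, _, reason = rest.partition(" ")
        return {"kind": "response", "version": first,
                "status": int(code), "reason": reason}
    # Request line: Method SP Request-URI SP HTTP-Version
    target, _, version = rest.rpartition(" ")
    return {"kind": "request", "method": first,
            "target": target, "version": version}


print(parse_start_line("GET /index.html HTTP/1.1"))
print(parse_start_line("HTTP/1.1 404 Not Found\r\n"))
```

Either way, the lines after the start-line up to the first empty line are the header block and can go to a header parser unchanged.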