How to decode a Subject like =?iso-8859-2?Q?=... in email in Python

Dan Polansky

When parsing messages using python's libraries email and mailbox, the
subject is often encoded using some kind of = notation. Apparently, the
encoding used in this notation is specified like =?iso-8859-2?Q?=... or
=?iso-8859-2?B?=. Is there a python library function to decode such a
subject, returning a unicode string? The use would be like

human_readable = cool_library.decode_equals(message['Subject'])

Thank you, Dan
 
Max M

Dan said:
> When parsing messages using python's libraries email and mailbox, the
> subject is often encoded using some kind of = notation. Apparently, the
> encoding used in this notation is specified like =?iso-8859-2?Q?=... or
> =?iso-8859-2?B?=. Is there a python library function to decode such a
> subject, returning a unicode string? The use would be like
>
> human_readable = cool_library.decode_equals(message['Subject'])


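import email.Header  # Python 2 name; Python 3 spells it email.header

parts = email.Header.decode_header(header)  # list of (string, charset) pairs
new_header = email.Header.make_header(parts)
human_readable = unicode(new_header)  # on Python 3: str(new_header)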
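
For readers on Python 3, the same three steps can be sketched as below. This is a minimal sketch, not part of the original reply: the helper name decode_subject and the sample subject are made up for illustration, and it assumes the lowercase email.header module that ships with Python 3.

```python
from email.header import decode_header, make_header


def decode_subject(raw_subject):
    """Return an RFC 2047 encoded header value as a plain str.

    decode_subject is a hypothetical helper name used only for this sketch.
    """
    # decode_header splits the value into (bytes-or-str, charset) pairs.
    parts = decode_header(raw_subject)
    # make_header rebuilds a Header object; str() joins the decoded parts.
    return str(make_header(parts))


# Sample subject invented for illustration (ISO-8859-2, "Q" encoding).
human_readable = decode_subject("=?iso-8859-2?Q?P=F8=EDli=B9?=")
print(human_readable)
```

A subject with no encoded words passes through unchanged, so the helper can be applied to every message without checking first.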



--

hilsen/regards Max M, Denmark

http://www.mxm.dk/
IT's Mad Science
 
Roman Neuhauser

# (e-mail address removed) / 2005-04-20 00:30:35 -0700:
> When parsing messages using python's libraries email and mailbox, the
> subject is often encoded using some kind of = notation. Apparently, the
> encoding used in this notation is specified like =?iso-8859-2?Q?=... or
> =?iso-8859-2?B?=.

That's RFC 2047 encoding. Both examples introduce an ISO-8859-2
string: the first variant says it's ascii-ized using the "Q" encoding
(a variant of Quoted-Printable), the other says the string is
"B"ase64-encoded.

> Is there a python library function to decode such a
> subject, returning a unicode string? The use would be like
>
> human_readable = cool_library.decode_equals(message['Subject'])

Quoting from http://docs.python.org/lib/module-email.Header.html:

>>> from email.Header import decode_header
>>> decode_header('=?iso-8859-1?q?p=F6stal?=')
[('p\xf6stal', 'iso-8859-1')]
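
To make the "Q" versus "B" distinction concrete, here is a small sketch for Python 3 that writes the same ISO-8859-1 bytes as both kinds of encoded word and decodes each one. The variable names are illustrative only; the "Q" word is the one from the documentation example, and the "B" word is built here with the standard base64 module.

```python
import base64
from email.header import decode_header

raw = "p\u00f6stal".encode("iso-8859-1")  # the bytes b'p\xf6stal'

# An RFC 2047 encoded word has the shape =?charset?encoding?payload?=
q_word = "=?iso-8859-1?q?p=F6stal?="  # "Q": non-ASCII bytes written as =XX
b_word = "=?iso-8859-1?B?%s?=" % base64.b64encode(raw).decode("ascii")

for word in (q_word, b_word):
    # decode_header undoes the Q or B step and reports the charset;
    # turning the bytes into text is a separate .decode(charset) call.
    (payload, charset), = decode_header(word)
    print(word, "->", payload, charset, "->", payload.decode(charset))
```

Both words should yield the same bytes and the same charset, which shows that the letter after the charset only selects how the bytes were made ASCII-safe.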
 
Neil Hodgson

Dan Polansky:
> When parsing messages using python's libraries email and mailbox, the
> subject is often encoded using some kind of = notation. Apparently, the
> encoding used in this notation is specified like =?iso-8859-2?Q?=... or
> =?iso-8859-2?B?=. Is there a python library function to decode such a
> subject, returning a unicode string? The use would be like
>
> human_readable = cool_library.decode_equals(message['Subject'])

Here is some code from a front end to Mailman moderation pages:

import email.Header
# sub is presumably the raw Subject value, e.g. message['Subject']
hdr = email.Header.make_header(email.Header.decode_header(sub))

Neil
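
On Python 3 there is also a route the thread does not cover: parsing the message with email.policy.default, in which case the header lookup already returns decoded text. The sketch below assumes that policy is acceptable for the application; the sample message bytes are invented for illustration.

```python
import email
from email import policy

# A tiny hand-written message; the addresses and subject are made up.
raw_message = (
    b"From: sender@example.com\r\n"
    b"Subject: =?iso-8859-2?Q?P=F8=EDli=B9?=\r\n"
    b"\r\n"
    b"body\r\n"
)

# With policy.default the parser decodes RFC 2047 encoded words itself,
# so no explicit decode_header / make_header step is needed.
msg = email.message_from_bytes(raw_message, policy=policy.default)
human_readable = str(msg["Subject"])
print(human_readable)
```

Without the policy argument the parser uses the older compatibility behaviour, and the subject comes back still encoded, which is the situation the replies above deal with.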
 
Dan Polansky

Max, thanks; that was helpful. Roman, your explanation was helpful as
well. Dan
 
