Decoding a Subject like =?iso-8859-2?Q?=... from email in Python

Discussion in 'Python' started by Dan Polansky, Apr 20, 2005.

  1. Dan Polansky

    Dan Polansky Guest

    When parsing messages using python's libraries email and mailbox, the
    subject is often encoded using some kind of = notation. Apparently, the
    encoding used in this notation is specified like =?iso-8859-2?Q?=... or
    =?iso-8859-2?B?=. Is there a python library function to decode such a
    subject, returning a unicode string? The use would be like

    human_readable = cool_library.decode_equals(message['Subject'])

    Thank you, Dan
    Dan Polansky, Apr 20, 2005
    #1

  2. Max M

    Max M Guest

    Dan Polansky wrote:
    > When parsing messages using python's libraries email and mailbox, the
    > subject is often encoded using some kind of = notation. Apparently, the
    > encoding used in this notation is specified like =?iso-8859-2?Q?=... or
    > =?iso-8859-2?B?=. Is there a python library function to decode such a
    > subject, returning a unicode string? The use would be like
    >
    > human_readable = cool_library.decode_equals(message['Subject'])



    import email.Header

    parts = email.Header.decode_header(header)
    new_header = email.Header.make_header(parts)
    human_readable = unicode(new_header)



    --

    hilsen/regards Max M, Denmark

    http://www.mxm.dk/
    IT's Mad Science
    Max M, Apr 20, 2005
    #2
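
The reply above uses the Python 2 spelling (email.Header, unicode). As a minimal sketch for Python 3, where the module is imported under the lowercase name email.header and str takes the place of unicode, the same two calls could be wrapped like this. The function name decode_subject is hypothetical and chosen only for illustration.

```python
from email.header import decode_header, make_header


def decode_subject(raw_subject):
    """Return an RFC 2047 encoded header value as a plain str (hypothetical helper)."""
    # decode_header splits the value into (bytes_or_str, charset) pairs.
    parts = decode_header(raw_subject)
    # make_header rebuilds a Header object; str() renders it as readable text.
    return str(make_header(parts))


human_readable = decode_subject('=?iso-8859-1?q?p=F6stal?=')
```

A header without any encoded words passes through unchanged, so the helper can be applied to every Subject regardless of how it was sent.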

  3. Roman Neuhauser

    # / 2005-04-20 00:30:35 -0700:
    > When parsing messages using python's libraries email and mailbox, the
    > subject is often encoded using some kind of = notation. Apparently, the
    > encoding used in this notation is specified like =?iso-8859-2?Q?=... or
    > =?iso-8859-2?B?=.


    That's RFC 2047 encoding. Both examples introduce an ISO-8859-2
    string; the first variant says it's made ASCII-safe using the "Q"
    encoding (a variant of quoted-printable), the other says the string
    is "B"ase64-encoded.

    > Is there a python library function to decode such a
    > subject, returning a unicode string? The use would be like
    >
    > human_readable = cool_library.decode_equals(message['Subject'])


    quoting from http://docs.python.org/lib/module-email.Header.html

    >>> from email.Header import decode_header
    >>> decode_header('=?iso-8859-1?q?p=F6stal?=')
    [('p\xf6stal', 'iso-8859-1')]

    --
    How many Vietnam vets does it take to screw in a light bulb?
    You don't know, man. You don't KNOW.
    Cause you weren't THERE. http://bash.org/?255991
    Roman Neuhauser, Apr 20, 2005
    #3
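
To illustrate the structure described in the reply above (a charset, then Q or B, then the payload), here is a rough Python 3 sketch that decodes a single encoded word by hand. The helper decode_encoded_word is hypothetical; it assumes exactly one well-formed encoded word and ignores folding, adjacent words, and malformed input, so the library's decode_header remains the practical choice.

```python
import base64
import quopri


def decode_encoded_word(word):
    """Decode one '=?charset?encoding?payload?=' token to str (illustrative only)."""
    if not (word.startswith('=?') and word.endswith('?=')):
        raise ValueError('not an RFC 2047 encoded word: %r' % word)
    # Drop the '=?' and '?=' delimiters, then split into the three fields.
    charset, encoding, payload = word[2:-2].split('?', 2)
    encoding = encoding.upper()
    if encoding == 'B':
        raw = base64.b64decode(payload)
    elif encoding == 'Q':
        # header=True makes '_' decode to a space, as the Q encoding requires.
        raw = quopri.decodestring(payload.encode('ascii'), header=True)
    else:
        raise ValueError('unknown encoding: %r' % encoding)
    # The charset field names the codec of the decoded bytes.
    return raw.decode(charset)


q_example = decode_encoded_word('=?iso-8859-1?q?p=F6stal?=')
b_example = decode_encoded_word('=?iso-8859-1?b?Sulq6Q==?=')
```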
  4. Neil Hodgson

    Neil Hodgson Guest

    Dan Polansky:

    > When parsing messages using python's libraries email and mailbox, the
    > subject is often encoded using some kind of = notation. Apparently, the
    > encoding used in this notation is specified like =?iso-8859-2?Q?=... or
    > =?iso-8859-2?B?=. Is there a python library function to decode such a
    > subject, returning a unicode string? The use would be like
    >
    > human_readable = cool_library.decode_equals(message['Subject'])


    Here is some code from a front end to Mailman moderation pages:

    import email.Header
    hdr = email.Header.make_header(email.Header.decode_header(sub))

    Neil
    Neil Hodgson, Apr 20, 2005
    #4
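
The original question starts from message['Subject'] on a parsed message, so the following sketch shows the same decode_header/make_header pair applied to a message object in Python 3. The sample message, the address in it, and the helper name readable_header are all made up for illustration.

```python
import email
from email.header import decode_header, make_header

# Hypothetical sample message with a Q-encoded ISO-8859-2 Subject.
RAW_MESSAGE = (
    "From: sender@example.com\n"
    "Subject: =?iso-8859-2?Q?P=F8=EDli=B9?=\n"
    "\n"
    "Body text.\n"
)


def readable_header(message, name):
    """Return header `name` of `message` decoded to str, or None if it is absent."""
    value = message[name]
    if value is None:
        return None
    return str(make_header(decode_header(value)))


msg = email.message_from_string(RAW_MESSAGE)
subject = readable_header(msg, "Subject")
```

Returning None for a missing header, rather than raising, is a design choice of this sketch; messages pulled from a mailbox do not always carry a Subject.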
  5. Dan Polansky

    Dan Polansky Guest

    Max, thanks; that was helpful. Roman, your explanation was helpful as
    well. Dan
    Dan Polansky, Apr 22, 2005
    #5
