email header decoding fails

ZeeGeek

It seems that the decode_header function in email.Header fails when
the string has the following form:

'=?gb2312?Q?=D0=C7=C8=FC?=(revised)'

That is, when a non-encoded string follows the encoded string without
any whitespace. In this case, decode_header treats the whole
string as non-encoded. Is there a workaround for this problem?

Thanks.
 
Gabriel Genellina

That header does not comply with RFC 2047 (MIME Part Three: Message Header
Extensions for Non-ASCII Text):

Section 5 (1)
An 'encoded-word' may replace a 'text' token (as defined by RFC 822)
in any Subject or Comments header field, any extension message
header field, or any MIME body part field for which the field body
is defined as '*text'. [...]
Ordinary ASCII text and 'encoded-word's may appear together in the
same header field. However, an 'encoded-word' that appears in a
header field defined as '*text' MUST be separated from any adjacent
'encoded-word' or 'text' by 'linear-white-space'.

Section 5 (3)
As a replacement for a 'word' entity within a 'phrase', for example,
one that precedes an address in a From, To, or Cc header. [...]
An 'encoded-word' that appears within a
'phrase' MUST be separated from any adjacent 'word', 'text' or
'special' by 'linear-white-space'.
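
To make the rule concrete, here is a rough sketch (Python 3) that flags encoded-words touching their neighbours without whitespace. The function name and the regular expression are my own simplification for '*text' fields such as Subject, not something taken from the email package or a full RFC 2047 parser.

```python
import re

# Simplified encoded-word pattern: =?charset?B-or-Q?encoded-text?=
# (an approximation for illustration, not the full RFC 2047 grammar)
ENCODED_WORD = re.compile(r'=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=')

def unseparated_encoded_words(header):
    """Return encoded-words that are not delimited by whitespace on both sides."""
    offenders = []
    for m in ENCODED_WORD.finditer(header):
        # Treat the start and end of the header as valid delimiters.
        before = header[m.start() - 1] if m.start() > 0 else ' '
        after = header[m.end()] if m.end() < len(header) else ' '
        if not (before.isspace() and after.isspace()):
            offenders.append(m.group(0))
    return offenders
```

Called on the header from the original post, it reports the single encoded-word, because '(revised)' follows it directly; with a space inserted before '(revised)' it reports nothing.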
 
ZeeGeek

Thank you very much, Gabriel.
 
Gabriel Genellina

The above only explains why decode_header refuses to decode it, and why
that is not a bug. But if you actually have to deal with such malformed
headers, some heuristics may help. For example, if you *know* your mails
typically specify gb2312 or iso-8859-1 encoding, you can look for things
that look like the example above and "fix" them before decoding.
 
ZeeGeek

Right now I'm using re.sub(r'(=\?([^\?]*\?){3}=)', r' \1 ', orig_string)
to detect every encoded string and place an extra space before and
after it. The whole string then complies with the standard, and
decode_header decodes it properly.
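
For anyone landing here later, this is roughly how that workaround might look as a self-contained Python 3 snippet. The helper names pad_encoded_words and decode_lenient are made up for this sketch, and the 'ascii' fallback for parts without a charset is an assumption; only re.sub and email.header.decode_header are real library calls. Newer Python 3 releases appear to be more lenient about missing whitespace, so the padding step may be unnecessary there.

```python
import re
from email.header import decode_header

# Same idea as the regex above: "=?" followed by three "?"-terminated
# fields and a closing "=". The inner group is non-capturing so that
# \1 always refers to the whole encoded-word.
_ENCODED_WORD = re.compile(r'(=\?(?:[^?]*\?){3}=)')

def pad_encoded_words(raw):
    """Surround every encoded-word with spaces so it stands alone."""
    return _ENCODED_WORD.sub(r' \1 ', raw)

def decode_lenient(raw, fallback='ascii'):
    """Pad the header, decode it, and join the pieces into one string."""
    pieces = []
    for chunk, charset in decode_header(pad_encoded_words(raw)):
        # decode_header yields bytes for decoded parts, but may yield
        # a plain str when the header contains no encoded-word at all.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or fallback, errors='replace')
        pieces.append(chunk)
    return ''.join(pieces)
```

One side effect to keep in mind: the inserted spaces can survive into the decoded text (for example, before "(revised)"), so you may want to normalise whitespace afterwards if that matters for your use.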
 
