Building email threads from unix mailboxes

Jed Parsons

What headers do I have to know about to build thread trees from Unix
mailboxes?

Is it enough to get the In-Reply-To header for each message and build a
dictionary of { Message-ID: message } pairs? Or is it more complicated
than that?

If there isn't already a module to do this (and apologies if there is
one and I don't know about it), are the current tools of choice the
'email' and 'mailbox' modules? (And I guess I'd want to use the mime
decoding tools in 'email' to deal with messages that come with
attachments or html or other stuff.)

Thanks for any tips,

Jed
 
Josiah Carlson

Jed said:
Is it enough to get the In-Reply-To header for each message and build a
dictionary of { Message-ID: message } pairs? Or is it more complicated
than that?

To be RFC 2822 compliant, In-Reply-To and References are sufficient.
Other clients may add more headers, and not all clients are RFC 2822
compliant.
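
For what it's worth, here is a rough sketch of pulling those two headers
out of a message with the standard 'email' module. The function name
thread_headers and the regular expression are made up for illustration;
the regex is deliberately loose because real-world headers often contain
junk around the <...> ids.

```python
import email
import re

# Loose match for anything that looks like <id@host>.
MSG_ID_RE = re.compile(r"<[^<>\s]+>")


def thread_headers(msg):
    """Return (message_id, parent_ids) for one email.message.Message.

    parent_ids lists candidate ancestors oldest first: everything in
    References, then the In-Reply-To id if it is not already there.
    """
    def ids(name):
        # A header may be missing or malformed; keep only <...> tokens.
        return MSG_ID_RE.findall(str(msg.get(name, "")))

    own = ids("Message-ID")
    parents = ids("References")
    for irt in ids("In-Reply-To")[:1]:
        if irt not in parents:
            parents.append(irt)
    return (own[0] if own else None), parents


# Usage sketch; "archive.mbox" is a placeholder path.
# import mailbox
# for msg in mailbox.mbox("archive.mbox"):
#     print(thread_headers(msg))
```

Iterating over a mailbox.mbox object yields message objects, so the same
function should work on a whole Unix mailbox.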

- Josiah
 
Jed Parsons

Thanks.

Is the References header a running list of all the In-Reply-To headers
so far in the thread?
 
Erik Max Francis

Jed said:
Is the References header a running list of all the In-Reply-To headers
so far in the thread?

It depends on the service. Some only keep a few of the last references,
some only one, some retain the full list from the very beginning (at
least as far as the RFC will allow).

Probably if you wanted to handle robust threading, you'd want to go by
In-Reply-To and References, backtracking manually (rather than relying
on any given References list to be complete), and then, for systems like
mail-to-news gateways which may break the In-Reply-To/References chain,
group by similar subjects posted around the same time.
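
A minimal sketch of that idea, assuming you have already reduced each
message to a list of ancestor ids (References first, In-Reply-To last).
The names link_threads and subject_key are hypothetical, and this is a
simplification, not a full threading algorithm: it links every adjacent
pair in each References list so that truncated lists can be completed
from other messages, and it offers a normalized subject as a fallback
grouping key. It leaves out the "around the same time" check.

```python
import re


def link_threads(messages):
    """messages: dict of message_id -> list of ancestor ids, oldest first.

    Returns dict child_id -> parent_id (None for thread roots). Ids that
    are referenced but missing from the mailbox still appear as entries.
    """
    parent = {mid: None for mid in messages}

    def is_ancestor(a, b):
        # True if a is b, or an ancestor of b under the links made so far.
        while b is not None:
            if a == b:
                return True
            b = parent.get(b)
        return False

    def link(older, newer, force):
        parent.setdefault(older, None)
        parent.setdefault(newer, None)
        # Never create a loop; only override an existing link when forced.
        if (force or parent[newer] is None) and not is_ancestor(newer, older):
            parent[newer] = older

    for mid, refs in messages.items():
        # Links implied by the References list fill gaps left by others.
        for older, newer in zip(refs, refs[1:]):
            link(older, newer, force=False)
        # A message's own last reference is the best guess for its parent.
        if refs:
            link(refs[-1], mid, force=True)
    return parent


def subject_key(subject):
    """Strip leading Re:/Fwd: markers so replies group with the original."""
    stripped = re.sub(r"^(\s*(re|fwd?)\s*:\s*)+", "", subject, flags=re.I)
    return stripped.strip().lower()
```

Messages that end up as roots but share a subject_key with another
thread are candidates for merging by hand or by date.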
 
Mark Rowe

Jed said:
What headers do I have to know about to build thread trees from Unix
mailboxes?

Is it enough to get the In-Reply-To header for each message and build a
dictionary of { Message-ID: message } pairs? Or is it more complicated
than that?

<http://www.jwz.org/doc/threading.html> has a good write-up about the
threading algorithm used by Netscape Mail and News 2.0 and 3.0.

Jed said:
If there isn't already a module to do this (and apologies if there is
one and I don't know about it), are the current tools of choice the
'email' and 'mailbox' modules? (And I guess I'd want to use the mime
decoding tools in 'email' to deal with messages that come with
attachments or html or other stuff.)

A.M. Kuchling has made a Python implementation of JWZ's algorithm
available.

Regards,

Mark Rowe
<http://bdash.net.nz/>
 
