parsing long 'To' and 'Cc' headers from email


Gerardo Herzig

Hi all. I'm trying to develop yet another email filter, just for fun for
now. I'm having a little trouble parsing long 'To' and 'Cc' headers.
Sometimes, for example, the 'To' header looks like

'(e-mail address removed), (e-mail address removed)'
other times it looks like
'"My self" <[email protected]>, "My brother" <[email protected]>',
and other times a \r\t appears inside the 'To' header. Any combination of
the above (and surely more) can occur.

The email.* package doesn't seem to parse these kinds of headers
'correctly'. What I want is a list of all the email addresses in
the 'To' header.

Does anyone know of a more sophisticated parser for this?

Thanks!
Gerardo
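For reference, the standard library already ships a parser aimed at exactly this: email.utils.getaddresses() takes a list of header values and returns (realname, address) pairs, splitting on commas outside quoted display names and skipping folding whitespace such as the \r\t case. The sketch below assumes Python 3; the sample message and addresses are made up, and recipient_addresses is a hypothetical helper name, not part of any library.

```python
from email import message_from_string
from email.utils import getaddresses

# Made-up sample message: the 'To' header mixes quoted display names and a
# bare address, and is folded onto a second line with "\r\n\t".
RAW = (
    "From: someone@example.com\r\n"
    'To: "My self" <me@example.com>,\r\n'
    '\t"My brother" <bro@example.com>, plain@example.org\r\n'
    "Cc: other@example.net\r\n"
    "Subject: test\r\n"
    "\r\n"
    "body\r\n"
)


def recipient_addresses(raw_message):
    """Hypothetical helper: return every address found in To and Cc."""
    msg = message_from_string(raw_message)
    # get_all() returns every occurrence of a header, or the default if absent.
    values = msg.get_all("To", []) + msg.get_all("Cc", [])
    # getaddresses() splits the comma-separated lists into (realname, address)
    # pairs; the folding whitespace is skipped while parsing.
    return [addr for _realname, addr in getaddresses(values)]


print(recipient_addresses(RAW))
# prints ['me@example.com', 'bro@example.com', 'plain@example.org', 'other@example.net']
```

Passing the values of several headers in one list follows the usage shown in the email.utils documentation; keep the (realname, address) pairs instead of dropping the names if the display names matter to the filter.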
 

Jonathan Gardner

> The email.* package doesn't seem to parse these kinds of headers
> 'correctly'. What I want is a list of all the email addresses in
> the 'To' header.
>
> Does anyone know of a more sophisticated parser for this?

If you're not interested in parsing the entire email message, you may
just want to run a regex on the message itself, looking for the "to"
header.

Here's a good start:

r"^to:\s*(.*)$"

You'll want to use the multi-line and case-insensitive options
(re.MULTILINE and re.IGNORECASE) when you use it.
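A minimal sketch of the regex approach described above, assuming Python 3's re module; raw_to_header is a hypothetical helper name. It compiles the pattern from the reply with both flags and returns the text after the first matching line.

```python
import re

# The pattern from the reply. re.MULTILINE makes ^ and $ match at every line
# boundary; re.IGNORECASE lets it match "To:", "TO:" and "to:".
TO_HEADER = re.compile(r"^to:\s*(.*)$", re.MULTILINE | re.IGNORECASE)


def raw_to_header(message_text):
    """Hypothetical helper: text after the first 'To:' line, or None."""
    match = TO_HEADER.search(message_text)
    if match is None:
        return None
    # strip() drops a trailing "\r" left over from CRLF line endings,
    # because "." matches "\r" but "$" only stops before "\n".
    return match.group(1).strip()
```

Two limitations are worth knowing. On a folded header such as "To: a@example.com,\r\n\tb@example.com" this returns only "a@example.com,", because "." never crosses a newline, so the \r\t case from the question needs the continuation lines joined first. And since the whole message is scanned, a body line starting with "to:" could match when the message has no To header. The captured text still has to be split into individual addresses, for example with email.utils.getaddresses().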