Best way to parse email recipient lists?

Shanti Braford

Hey all,

So... I'm trying to parse email recipient lists (entered by hand into
the "to", "cc" and "bcc" fields of a mail app by users).

These can obviously come in a wild variety of formats, and I'd like to
support as many as possible.

The other gotcha is that I'd like to keep as much of the name metadata
as possible.

I was under the impression that TMail's parser strips the name portion
from the "to", "cc" and "bcc" fields, leaving just an array of email
addresses. (Otherwise we could just use TMail; please let me know if
this is incorrect or if there's a workaround.)

Here are a few example scenarios (from relatively easy to a little
harder):
(e-mail address removed)
(e-mail address removed)
<[email protected]>
"Bob Smith" <[email protected]>
Bob Smith <[email protected]>
"Jones, Craig" <[email protected]>
"Summer Thomas" <[email protected]>; "Al Franken" <[email protected]>
"Clinton, Bill" <[email protected]>; "Obama, Barack"
<[email protected]>; "Jenny McCarthy" <[email protected]>
Bob <[email protected]>, <[email protected]>, James Blunt
<[email protected]>

etc...

Any ideas?

I've been working up regexes like crazy, but my regex-fu isn't quite
what it used to be. Are there any shortcuts, or do I need either one big
regex or many specific ones to match the various scenarios?

We're currently using this RegEx to detect when we have a single
properly formatted address (w/o a name attached):
http://tfletcher.com/lib/rfc822.rb
...but that's only one small portion of the problem.
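
One way to shrink the problem is to split the raw field into individual entries first, so that any per-address regex only ever sees one recipient. Below is a minimal sketch, assuming "," and ";" are the only separators and that display names containing them are wrapped in double quotes. The helper name split_recipients is made up for illustration, and the example.com addresses are placeholders.

```ruby
# Hypothetical helper: split a raw recipient string on "," or ";",
# ignoring separators that appear inside double quotes.
def split_recipients(raw)
  entries = []
  current = String.new
  in_quotes = false

  raw.each_char do |ch|
    if ch == '"'
      # Toggle quote state; keep the quote so the name stays intact.
      in_quotes = !in_quotes
      current << ch
    elsif !in_quotes && (ch == "," || ch == ";")
      # Separator outside quotes: close the current entry.
      entries << current
      current = String.new
    else
      current << ch
    end
  end
  entries << current

  # Collapse wrapped lines and extra spaces, then drop empty entries.
  entries.map { |e| e.gsub(/\s+/, " ").strip }.reject(&:empty?)
end

list = '"Clinton, Bill" <bill@example.com>; Bob <bob@example.com>, jane@example.com'
split_recipients(list).each { |entry| puts entry }
```

This does not handle escaped quotes or RFC-style comments in parentheses; those would need a real tokenizer.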

- Shanti
 
Michael Fleet

Shanti,

Try:

/(\W?([\w\s]+)\W+)?(\w[\w\+\-\.]+@[\w\-\.]+)\W?/i
(allows "+" signs in the mailbox, like (e-mail address removed); these are
valid in the local part, though some validators reject them)

/(\W?([\w\s]+)\W+)?(\w[\w\-\.]+@[\w\-\.]+)\W?/i
(without "+" signs)

These should break the addresses down into arrays of matches that you
can parse into:
display name
mailbox
domain

Let me know if this doesn't pass the tests. Better yet, send me a unit
test and I'll make it work. :)
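
For illustration, here is a rough sketch of pulling the display name, mailbox and domain out of a single entry with named captures. It uses a looser pattern than the ones above, it is not an RFC-compliant parser, and the method name parse_entry and the example.com addresses are assumptions made for the example.

```ruby
# Hypothetical pattern: an optional display name, then an address with or
# without angle brackets. Deliberately loose; it does not validate the
# address against the RFC grammar.
ENTRY_PATTERN = /\A(?:(?<name>.*?)\s*)?<?(?<mailbox>[^\s<>@"]+)@(?<domain>[^\s<>@"]+)>?\z/

# Returns a hash with :name, :mailbox and :domain, or nil if nothing matches.
def parse_entry(entry)
  m = entry.strip.match(ENTRY_PATTERN)
  return nil unless m

  # Strip the surrounding quotes from names like "Jones, Craig".
  name = m[:name].to_s.delete('"').strip
  { name: (name.empty? ? nil : name), mailbox: m[:mailbox], domain: m[:domain] }
end

[
  'bob@example.com',
  '<bob@example.com>',
  'Bob Smith <bob@example.com>',
  '"Jones, Craig" <craig@example.com>'
].each { |entry| p parse_entry(entry) }
```

Named captures keep the three parts readable at the call site, which is easier to maintain than counting numbered groups.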

also: http://www.zenspider.com/Languages/Ruby/QuickRef.html#11


Michael Fleet
Disinnovate
http://www.disinnovate.com/
 
ara.t.howard

harp:~ > cat a.rb
require 'tmail'
require 'yaml'

tmail = TMail::Mail::parse <<-msg
From (e-mail address removed) Thu Nov 9 08:55:15 2006
Date: Fri, 10 Nov 2006 00:52:17 +0900
From: Shanti Braford <[email protected]>
Reply-To: (e-mail address removed)
To: ruby-talk ML <[email protected]>
Newsgroups: comp.lang.ruby
Subject: Best way to parse email recipient lists?

Hey all,

So... I'm trying to parse email recipient lists (entered by hand into
the "to", "cc" and "bcc" fields of a mail app by users).
msg

%w( to from cc bcc ).each do |field|
  list = tmail.send("#{ field }_addrs") || []
  phrases = list.map{|a| a.phrase}

  y field => phrases.zip(list.map{|a| a.to_s})
end



harp:~ > ruby a.rb
to:
- - ruby-talk ML
- ruby-talk ML <[email protected]>
from:
- - Shanti Braford
- Shanti Braford <[email protected]>
cc: []

bcc: []





-a
 
