Shanti Braford
Hey all,
So... I'm trying to parse email recipient lists (entered by hand into
the "to", "cc" and "bcc" fields of a mail app by users).
These can obviously come in a wild variety of formats, and I'd like to
support as many as possible.
The other gotcha is that I'd like to keep as much of the name metadata
as possible.
I was under the impression that TMail's parser strips the name portion
of the "to", "cc" and "bcc" fields, leaving just an array of email
addresses. (Otherwise we could just use TMail. Please let me know if
this is incorrect or if there's a workaround.)
Here are a few example scenarios (from relatively easy to a little
harder):
(e-mail address removed)
(e-mail address removed)
<[email protected]>
"Bob Smith" <[email protected]>
Bob Smith <[email protected]>
"Jones, Craig" <[email protected]>
"Summer Thomas" <[email protected]>; "Al Franken" <[email protected]>
"Clinton, Bill" <[email protected]>; "Obama, Barack"
<[email protected]>; "Jenny McCarthy" <[email protected]>
Bob <[email protected]>, <[email protected]>, James Blunt
<[email protected]>
etc...
Any ideas?
I've been working up regexes like crazy, but my regex-fu isn't quite
what it used to be. Are there any shortcuts, or do I need one big regex
or many specific ones to match the various scenarios?
We're currently using this RegEx to detect when we have a single
properly formatted address (w/o a name attached):
http://tfletcher.com/lib/rfc822.rb
...but that's only one small portion of the problem.
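To make the question concrete, here is a rough sketch of the direction I have in mind: split the list on commas and semicolons that fall outside double quotes, then pull the name and address out of each entry. The helper names (split_recipients, parse_recipient, parse_recipient_list) and the example.com addresses are made up for illustration, and this is nowhere near full RFC 822 handling (no comments, groups, or escaped quotes).

```ruby
# Hypothetical helper: split on "," or ";" only when outside double quotes,
# so that "Jones, Craig" stays in one piece.
def split_recipients(list)
  parts = []
  current = +''
  in_quotes = false
  list.each_char do |ch|
    if ch == '"'
      in_quotes = !in_quotes
      current << ch
    elsif (ch == ',' || ch == ';') && !in_quotes
      parts << current.strip
      current = +''
    else
      current << ch
    end
  end
  parts << current.strip
  parts.reject(&:empty?)
end

# Hypothetical helper: extract an optional display name and the address
# from one entry such as '"Bob Smith" <bob@example.com>'.
def parse_recipient(entry)
  m = entry.match(/\A\s*(?:"([^"]*)"|([^<"]*?))\s*<([^<>\s]+)>\s*\z/)
  if m
    name = (m[1] || m[2]).to_s.strip
    { name: (name.empty? ? nil : name), email: m[3] }
  else
    # No angle brackets: assume the whole entry is a bare address.
    { name: nil, email: entry.strip }
  end
end

def parse_recipient_list(list)
  split_recipients(list).map { |entry| parse_recipient(entry) }
end
```

Each bare address that comes out of this could still be checked against the rfc822.rb regex above; the sketch only deals with separating entries and keeping the names.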
- Shanti