Complicated email parse or text extraction and database insertion

code_worthy

I am trying to strip some data out of numerous emails and place it in
my database. I know that this seems as if it has been done before.
But, this is a little different. First, the numerous emails all have a
set of data that needs to be extracted and inserted into the database.
Some of the data in the email is id, name, address, city, state, zip,
company, etc. The catch is that the data is formatted and presented
differently in each email. Take into consideration the following email
examples:
- excerpt from email #1
ID:.............. 12345
Name:............ JOHN DOE
Address:......... PO BOX 9999
City:............ Somecity
State:........... CA
Zip Code:........ 90210
===============================================================

Company Information:

1.:-
Company Name:....... Perl N PHP Scripts Welcome

- excerpt from email #2
Full Name -- Doe, John
Address -- PO BOX 9999
City -- Somecity St -- California
Zip -- 90210
Company Name -- Perl N PHP Scripts Welcome
ID -- 12345

- excerpt from email #3
Name.....Address.....City.....State.....Zip.....Identification
Number.....Company
John Doe.....PO Box
9999.....Somecity.....CA.....90210.....12345.....Perl N PHP Scripts
Welcome

- excerpt from email #4

Name.........Address.........City.........State.....Zip.......Identification
Number.....Company
JOHN DOE.....PO BOX
9999.....SOMECITY.....CA........90210.....12345.....................Perl
N PHP Scripts Welcome

Can anyone help me with either scripts that have already been
developed or suggestions on how to go about stripping out the needed
information from emails without knowing their format or the order of the
data? THANKS IN ADVANCE.
 
Gunnar Hjalmarsson

Can anyone help me with either scripts that have already been
developed or suggestions on how to go about stripping out the needed
information from emails without knowing their format or the order of the
data?

No.
 
Matt Garrish

I am trying to strip some data out of numerous emails and place it in
my database. I know that this seems as if it has been done before.
But, this is a little different. First, the numerous emails all have a
set of data that needs to be extracted and inserted into the database.
Some of the data in the email is id, name, address, city, state, zip,
company, etc. The catch is that the data is formatted and presented
differently in each email.

You're asking to find patterns where there are none (or you haven't looked
hard enough yet to distinguish them). The two options that spring to mind
are: 1) write a script that can process the most common formats and use
it to batch process as many emails as you can; and/or 2) clean up the data
manually first (e.g., convert it to XML).

Matt
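
As a rough illustration of the first option above, here is a minimal Python sketch. The thread itself contains no code, so the language choice and every name here (LABELS, KEY_VALUE, parse_key_value, parse_dotted_table, extract_fields) are assumptions made for illustration. It only covers the two layout families visible in the four sample excerpts: "label, separator, value" lines (emails #1 and #2) and a dot-separated header row followed by a data row (emails #3 and #4). Anything else comes back as None so it can be set aside for manual cleanup.

```python
import re

# Hypothetical label table: maps labels seen in the sample emails to
# canonical field names. Extend it as new formats turn up.
LABELS = {
    "id": "id", "identification number": "id",
    "name": "name", "full name": "name",
    "address": "address",
    "city": "city",
    "state": "state", "st": "state",
    "zip": "zip", "zip code": "zip",
    "company": "company", "company name": "company",
}

# Longest labels first, so "zip code" is tried before "zip".
_ALT = "|".join(re.escape(k) for k in sorted(LABELS, key=len, reverse=True))
# Separator is either ":" plus optional dot leaders (email #1) or "--" (email #2).
_SEP = r"[ \t]*(?::\.*|--)[ \t]*"
# A value runs until the next "label separator" on the same line, or end of line.
KEY_VALUE = re.compile(
    rf"\b({_ALT}){_SEP}(.*?)(?=[ \t]+(?:{_ALT}){_SEP}|[ \t]*$)",
    re.IGNORECASE | re.MULTILINE,
)


def parse_key_value(text):
    """Collect 'Label: value' / 'Label -- value' pairs (emails #1 and #2)."""
    record = {}
    for label, value in KEY_VALUE.findall(text):
        if value:
            # Keep the first value seen for each canonical field.
            record.setdefault(LABELS[label.lower()], value)
    return record


def parse_dotted_table(text):
    """Parse a dot-separated header row plus one data row (emails #3 and #4)."""
    fields, values = [], []
    for token in re.split(r"\.{3,}", text.strip()):
        flat = " ".join(token.split())  # undo line wrapping inside a cell
        if not values and flat.lower() in LABELS:
            fields.append(LABELS[flat.lower()])
        elif not values and "\n" in token.strip():
            # Last header cell and first data cell share one token,
            # separated only by the line break between the two rows.
            head, rest = token.strip().split("\n", 1)
            key = " ".join(head.split()).lower()
            if key not in LABELS:
                return None
            fields.append(LABELS[key])
            values.append(" ".join(rest.split()))
        else:
            values.append(flat)
    if fields and len(fields) == len(values):
        return dict(zip(fields, values))
    return None


def extract_fields(text):
    """Try each known layout in turn; None means 'needs manual review'."""
    record = parse_key_value(text)
    if len(record) >= 3:  # arbitrary threshold, tune for real data
        return record
    return parse_dotted_table(text)
```

Calling extract_fields on the body of each email gives either a dict keyed by canonical field name, ready to hand to whatever database layer is in use, or None. The None cases are the ones left over for the second option, manual cleanup. Note that the sketch does not normalize values, so "JOHN DOE", "John Doe" and "Doe, John" are returned exactly as they appear.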
 
