Heuristically processing documents

BJörn Lindqvist

I have a large set of documents in various text formats. I know that
each document contains its author's name, email and phone number.
Sometimes it also contains the author's home address.

The task is to find the author's name, email and phone number for as
many documents as possible. Since the documents are not in a specific
format, you have to do a lot of guessing, and approximate results are fine.

For example, to find the email you can use a simple regexp. If there
is a match, you can be certain that it is the author's email. But what
algorithms can you use to figure out the other information?
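
A minimal sketch of the email step in Python, to make the idea concrete. The pattern and the name find_email are assumptions for illustration only; the pattern is deliberately loose and is not a full email-address validator.

```python
import re

# Loose, illustrative pattern (an assumption, not a complete address grammar):
# a local part, an "@", then a domain with at least one dot.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def find_email(text):
    """Return the first email-like string found in text, or None."""
    match = EMAIL_RE.search(text)
    return match.group(0) if match else None
```

Calling find_email on the full text of a document gives either a candidate address or None, so documents without a match can be set aside for other heuristics.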