Words to numbers

william

I'm writing Perl scripts to extract data from email messages. Here
are two sample .txt files.
ACNI050124_05_04_59.txt

received fifteen thousand dollars from
an unaffiliated third party

Section 27A of the Securities Act of 1933 and Section 21E of the
Securities Exchange Act of 1934,

involve a number of risks
and uncertainties which could cause actual results to differ
materially from those presently anticipated.

ZLDV060318_19_32_11.txt
We have received one hundred thirty five thousand free trading shares
from a
third party not an officer, director or affiliate shareholder for our
services. We intend to
sell all these shares now, which could cause the stock to go down,
resulting in losses for you.
Do your due diligence before you invest.


I want to produce the following output in an Excel table.

filename                  dollars  shares
ACNI050124_05_04_59.txt   15000    -9
ZLDV060318_19_32_11.txt   -9       135000

-9 simply means that no information related to shares or dollars was
found in the file.

It seemed to be a simple task at first, but I realized that it is quite
complicated once I started to write the script. Any suggestions will be
highly appreciated.

William
 
Jim Gibson

william said:
I'm writing perl scripts to retrieve data from email messages. Here
are two .txt files.
ACNI050124_05_04_59.txt

received fifteen thousand dollars ...

ZLDV060318_19_32_11.txt
We have received one hundred thirty five thousand ...
I want to produce the following output in an Excel table.

filename                  dollars  shares
ACNI050124_05_04_59.txt   15000    -9
ZLDV060318_19_32_11.txt   -9       135000

-9 simply means that no information related to shares or dollars was
found in the file.

It seemed to be a simple task at first, but I realized that it is quite
complicated once I started to write the script. Any suggestions will be
highly appreciated.

It doesn't seem simple at all. You are trying to parse free-form
English written by various people and extract numerical data from
alphabetic number names. My suggestion is to give it up before you
start.
 
Ted Zlatanov

JG> In article
JG> <820d8d96-2839-45ed-8ca9-1bed871bb5f0@k37g2000hsf.googlegroups.com>,

(the comments are for the OP mainly)

Have you considered empty fields instead of special values to denote
the absence of a value? In particular, you may need negative numbers for
shares later if you want to indicate buy/sell modes.
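
A minimal sketch of the empty-field idea, assuming tab-separated output
that Excel can open. The format_row helper and the use of undef for "not
found" are hypothetical choices for illustration, not part of the OP's
script.

```perl
use strict;
use warnings;

# Hypothetical helper: build one tab-separated row, leaving a field empty
# when the value was not found (undef) instead of writing a sentinel like -9.
sub format_row {
    my ($filename, $dollars, $shares) = @_;
    my @cells = map { defined($_) ? $_ : '' } ($dollars, $shares);
    return join "\t", $filename, @cells;
}

print format_row('filename', 'dollars', 'shares'), "\n";
print format_row('ACNI050124_05_04_59.txt', 15000, undef), "\n";
print format_row('ZLDV060318_19_32_11.txt', undef, 135000), "\n";
```

With this layout, a negative share count stays available as a real value
should it ever be needed.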

JG> It doesn't seem simple at all. You are trying to parse free-form
JG> English written by various people and extract numerical data from
JG> alphabetic number names. My suggestion is to give it up before you
JG> start.

It's not impossible, and certainly it's interesting. Perhaps
http://web.media.mit.edu/~hugo/montylingua/ will be useful; it has Java
and Python interfaces and a Perl interface may be doable. At the very
least you can parse the montylingua analyzer output.

Ted
 
william

Thank you all for the suggestions. Nevertheless, I've accomplished the
number extraction with a Perl script. I first built a library of
possible misspellings and converted them to the correct spellings. Then
I used Perl to run a pattern search and convert the English number words
to Arabic numerals. Finally, I extracted the numbers using a kind of
fuzzy logic. As for the -9: only positive numbers are needed in my
research design, so I use -9 to indicate that the number is non-positive
or that no appropriate number could be found.
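
The three steps described above might be sketched roughly as follows.
This is not the actual script (which was not posted): the misspelling
table, the function names, and the pattern that allows "free trading"
before "shares" are all assumptions, chosen to fit the two sample
files, and the "fuzzy logic" is reduced here to taking the first
positive match.

```perl
use strict;
use warnings;

# Assumed misspelling table; the real library is not shown in the thread.
my %fix = (fourty => 'forty', thousnd => 'thousand', hundered => 'hundred');

my %small = (
    zero => 0, one => 1, two => 2, three => 3, four => 4, five => 5,
    six => 6, seven => 7, eight => 8, nine => 9, ten => 10,
    eleven => 11, twelve => 12, thirteen => 13, fourteen => 14,
    fifteen => 15, sixteen => 16, seventeen => 17, eighteen => 18,
    nineteen => 19, twenty => 20, thirty => 30, forty => 40, fifty => 50,
    sixty => 60, seventy => 70, eighty => 80, ninety => 90,
);
my %scale = (thousand => 1_000, million => 1_000_000, billion => 1_000_000_000);

# Convert a list of number words to an integer; unknown words ("and") are skipped.
sub words_to_number {
    my @words = @_;
    my ($total, $current) = (0, 0);
    for my $w (@words) {
        if    (exists $small{$w}) { $current += $small{$w} }
        elsif ($w eq 'hundred')   { $current = ($current || 1) * 100 }
        elsif (exists $scale{$w}) {
            $total += ($current || 1) * $scale{$w};
            $current = 0;
        }
    }
    return $total + $current;
}

# Return (dollars, shares) found in the text, with -9 where nothing was found.
sub extract_amounts {
    my ($text) = @_;
    $text = lc $text;
    $text =~ s/\b(\w+)\b/exists $fix{$1} ? $fix{$1} : $1/ge;  # step 1: fix misspellings
    $text =~ s/-/ /g;                                         # "thirty-five" -> "thirty five"
    my $num = join '|', keys(%small), keys(%scale), 'hundred', 'and';
    my %found = (dollars => -9, shares => -9);
    # step 2: find a run of number words followed by a unit
    while ($text =~ /\b((?:(?:$num)\s+)+)(?:free\s+trading\s+)?(dollars|shares)\b/g) {
        my ($phrase, $unit) = ($1, $2);
        my $n = words_to_number(split ' ', $phrase);
        # step 3: keep the first positive value seen for each unit
        $found{$unit} = $n if $n > 0 && $found{$unit} == -9;
    }
    return @found{qw(dollars shares)};
}

my ($dollars, $shares) =
    extract_amounts('received fifteen thousand dollars from an unaffiliated third party');
print "$dollars\t$shares\n";   # 15000 and -9
```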

Using Perl to do natural language processing is really very
interesting. Thank you all again for your input.

William
 
sln

-9-1-1
 
