Words to numbers

Discussion in 'Perl Misc' started by william, Sep 25, 2008.

  1. william

    william Guest

    I'm writing Perl scripts to retrieve data from email messages. Here
    are two .txt files.
    ACNI050124_05_04_59.txt

    received fifteen thousand dollars from
    an unaffiliated third party

    Section 27A of the Securities Act of 1933 and Section 21E of the
    Securities Exchange Act of 1934,

    involve a number of risks
    and uncertainties which could cause actual results to differ
    materially from those presently anticipated.

    ZLDV060318_19_32_11.txt
    We have received one hundred thirty five thousand free trading shares
    from a
    third party not an officer, director or affiliate shareholder for our
    services. We intend to
    sell all these shares now, which could cause the stock to go down,
    resulting in losses for you.
    Do your due diligence before you invest.


    I want to produce the following output in an Excel table.

    filename                 dollars  shares
    ACNI050124_05_04_59.txt  15000    -9
    ZLDV060318_19_32_11.txt  -9       135000

    -9 simply means that no information about shares or dollars was
    found in the file.

    It seemed like a simple task at first, but I realized it is quite
    complicated once I started writing the script. Any suggestions
    would be highly appreciated.

    William
     
    william, Sep 25, 2008
    #1

  2. william

    Jim Gibson Guest

    william wrote:

    > I'm writing Perl scripts to retrieve data from email messages. Here
    > are two .txt files.
    > ACNI050124_05_04_59.txt
    >
    > received fifteen thousand dollars ...
    >
    > ZLDV060318_19_32_11.txt
    > We have received one hundred thirty five thousand ...


    >
    >
    > I want to produce the following output in an Excel table.
    >
    > filename                 dollars  shares
    > ACNI050124_05_04_59.txt  15000    -9
    > ZLDV060318_19_32_11.txt  -9       135000
    >
    > -9 simply means that no information about shares or dollars was
    > found in the file.
    >
    > It seemed like a simple task at first, but I realized it is quite
    > complicated once I started writing the script. Any suggestions
    > would be highly appreciated.


    It doesn't seem simple at all. You are trying to parse free-form
    English written by various people and extract numerical data from
    alphabetic number names. My suggestion is to give it up before you
    start.

    --
    Jim Gibson
     
    Jim Gibson, Sep 26, 2008
    #2

  3. william

    Ted Zlatanov Guest

    On Thu, 25 Sep 2008 17:51:48 -0700 Jim Gibson wrote:

    JG> william wrote:

    >> I'm writing Perl scripts to retrieve data from email messages. Here
    >> are two .txt files.
    >> ACNI050124_05_04_59.txt
    >>
    >> received fifteen thousand dollars ...
    >>
    >> ZLDV060318_19_32_11.txt
    >> We have received one hundred thirty five thousand ...


    >> I want to produce the following output in an Excel table.
    >>
    >> filename                 dollars  shares
    >> ACNI050124_05_04_59.txt  15000    -9
    >> ZLDV060318_19_32_11.txt  -9       135000
    >>
    >> -9 simply means that no information about shares or dollars was
    >> found in the file.


    (the comments are for the OP mainly)

    Have you considered empty fields instead of special values to denote
    absence of value? Specifically, you may need negative numbers for
    shares later if you want to indicate buy/sell modes.
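    As a rough illustration of the empty-field idea, here is a minimal
    Perl sketch. It assumes comma-separated output (which Excel can
    open) and uses undef for "no value found"; csv_row is a made-up
    helper name, not something from the OP's script.

```perl
use strict;
use warnings;

# Format one output row as CSV. An undefined amount becomes an empty
# field, so no sentinel value such as -9 is needed.
sub csv_row {
    my ($file, $dollars, $shares) = @_;
    return join ',', $file, map { defined($_) ? $_ : '' } ($dollars, $shares);
}

print "filename,dollars,shares\n";
print csv_row('ACNI050124_05_04_59.txt', 15000, undef), "\n";
print csv_row('ZLDV060318_19_32_11.txt', undef, 135000), "\n";
```

    With this layout a negative share count stays available as real
    data, and a blank cell in the spreadsheet means "not found".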

    >>
    >> It seemed like a simple task at first, but I realized it is quite
    >> complicated once I started writing the script. Any suggestions
    >> would be highly appreciated.


    JG> It doesn't seem simple at all. You are trying to parse free-form
    JG> English written by various people and extract numerical data from
    JG> alphabetic number names. My suggestion is to give it up before you
    JG> start.

    It's not impossible, and certainly it's interesting. Perhaps
    http://web.media.mit.edu/~hugo/montylingua/ will be useful; it has Java
    and Python interfaces and a Perl interface may be doable. At the very
    least you can parse the montylingua analyzer output.

    Ted
     
    Ted Zlatanov, Sep 26, 2008
    #3
  4. william

    william Guest

    Thank you all for the suggestions. In the end, I accomplished the
    number extraction with a Perl script. First I build a library of
    possible misspellings and convert them to the correct spellings.
    Then I use Perl pattern matching to find the English number words
    and convert them to Arabic numerals. Finally, I extract the numbers
    using a kind of fuzzy logic. As for the -9: only positive numbers
    are needed in my research design, so I use -9 to indicate either a
    non-positive number or a number that could not be found.
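    The steps described above might look roughly like the following
    sketch. This is not William's actual script, which was never
    posted; the misspelling table %fix, the helper names
    words_to_number and amount_of, and the rule of allowing up to three
    words between the number and the unit are all assumptions made for
    illustration. It also does no validation of the number grammar, so
    an odd phrase such as "twenty twenty" is simply summed.

```perl
use strict;
use warnings;

# Hypothetical misspelling table; the real list is not shown in the thread.
my %fix = (fourty => 'forty', fiveteen => 'fifteen', thousnad => 'thousand');

my %small = (
    zero => 0, one => 1, two => 2, three => 3, four => 4, five => 5,
    six => 6, seven => 7, eight => 8, nine => 9, ten => 10,
    eleven => 11, twelve => 12, thirteen => 13, fourteen => 14,
    fifteen => 15, sixteen => 16, seventeen => 17, eighteen => 18,
    nineteen => 19, twenty => 20, thirty => 30, forty => 40,
    fifty => 50, sixty => 60, seventy => 70, eighty => 80, ninety => 90,
);
my %scale = (thousand => 1_000, million => 1_000_000, billion => 1_000_000_000);

# Convert a phrase such as "one hundred thirty five thousand" to 135000.
# Returns undef (in scalar context) if any word is not a number word.
sub words_to_number {
    my ($phrase) = @_;
    my ($total, $group, $seen) = (0, 0, 0);
    for my $word (split /[\s-]+/, lc $phrase) {
        next if $word eq '' or $word eq 'and';
        $word = $fix{$word} if exists $fix{$word};    # correct known misspellings
        if (exists $small{$word}) {
            $group += $small{$word};
        }
        elsif ($word eq 'hundred') {
            $group = ($group || 1) * 100;
        }
        elsif (exists $scale{$word}) {
            $total += ($group || 1) * $scale{$word};
            $group = 0;
        }
        else {
            return;
        }
        $seen = 1;
    }
    return $seen ? $total + $group : undef;
}

# Pattern matching a run of number words, longest alternatives first.
my $num_word = join '|', sort { length($b) <=> length($a) }
    (keys %small, keys %scale, keys %fix, 'hundred');
my $run = qr/\b((?:$num_word)(?:[\s-]+(?:and[\s-]+)?(?:$num_word))*)\b/i;

# Return the first spelled-out amount followed (within three words) by
# $unit, or -9 if none is found or the amount is not positive.
sub amount_of {
    my ($text, $unit) = @_;
    if ($text =~ /$run(?:\s+\w+){0,3}?\s+\Q$unit\E\b/i) {
        my $n = words_to_number($1);
        return $n if defined $n && $n > 0;
    }
    return -9;
}

my %sample = (
    'ACNI050124_05_04_59.txt' =>
        "received fifteen thousand dollars from\nan unaffiliated third party",
    'ZLDV060318_19_32_11.txt' =>
        "We have received one hundred thirty five thousand free trading shares\n"
      . "from a\nthird party not an officer, director or affiliate shareholder",
);
for my $file (sort keys %sample) {
    printf "%s\t%d\t%d\n", $file,
        amount_of($sample{$file}, 'dollars'),
        amount_of($sample{$file}, 'shares');
}
```

    The "up to three words" gap is what lets "one hundred thirty five
    thousand free trading shares" count as a share amount; a tighter or
    looser window is a judgment call that depends on the messages.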

    Using Perl to do natural language processing is really very
    interesting. Thank you all again for your input.

    William
     
    william, Oct 12, 2008
    #4
  5. william

    Guest

    On Sun, 12 Oct 2008 09:51:57 -0700 (PDT), william wrote:

    >Thank you all for the suggestions. In the end, I accomplished the
    >number extraction with a Perl script. First I build a library of
    >possible misspellings and convert them to the correct spellings.
    >Then I use Perl pattern matching to find the English number words
    >and convert them to Arabic numerals. Finally, I extract the numbers
    >using a kind of fuzzy logic. As for the -9: only positive numbers
    >are needed in my research design, so I use -9 to indicate either a
    >non-positive number or a number that could not be found.
    >
    >Using Perl to do natural language processing is really very
    >interesting. Thank you all again for your input.
    >
    >William


    -9-1-1
     
    , Oct 12, 2008
    #5
