Best way to extract numeric values from a report?

Discussion in 'Perl Misc' started by bobmct, May 7, 2009.

  1. bobmct

    bobmct Guest

    I'm reasonably familiar with perl as I have been programming with it
    for a number of years.

    I am trying to extract information from a text based report file and
    create a file of data records for use as input to some other perl

    The report file in question has about 13 columns of financial amounts
    such as -1,234,567.89 and I wish to extract each column and output to
    a record in a signed, efficient format for later use by these other
    perl programs.

    I am currently using unpack and pack but I am not having luck with the
    floating point values.

    What are some suggestions as to how to extract and export these
    values? Any good suggestions/recommendations/examples greatly

    signed: just another perl monger
    bobmct, May 7, 2009
    1. Advertisements

  2. bobmct

    J. Gleixner Guest

    What separates the columns?

    Use whatever that is, with split, and you should be on your way.
    Show us what you've tried.

    Treating it as a string, using a regular expression, would
    be much easier than pack, IMHO.
    J. Gleixner, May 7, 2009
    1. Advertisements

  3. How are the columns identified/separated? By fixed position within the
    line (i.e. characters 12-18 are always column 4) or by a known
    separator, e.g. a comma, a semicolon, or a space character?

    If the former then unpack is the tool of choice, if the latter then
    split() will work nicely.
    And if it's a CSV-format then one of the CSV-modules will work best,
    because they will deal with all the oddities like quoted values or
    escaped characters automatically.

    Jürgen Exner, May 8, 2009
  4. bobmct

    ccc31807 Guest

    First, I assume that you are dealing with some kind of delimited
    format. The algorithm is easy -- here is some pseudo-pseudo code:

    open INFILE, '<', 'text_based_report_file.txt';
    my ($c1, $c2, etc. all the way to $c13) = split /delimiter/, $_;
    #you can now access your values like $c1, $c2, ... $c13)
    close INFILE;
    It's not clear to me exactly how you want your output, but assuming
    that it's a signed float and all you need to do is delete the commas,
    this will do it:
    $c1 =~ s/,//g;
    How do you want to export the values? Seems to me that if you input
    them in a delimited file your work is done, that you don't need to
    export the values. If you need to 'export' the values into some data
    structure in memory, use an array of array references. You can then
    access each individual value like this:
    $arr_ref->[0][0] #gets the first column in the first row
    $arr_ref->[1][12] # gets the last column in the second row

    ccc31807, May 8, 2009
  5. For monetary values, you often want to store the amount in the smallest
    denomination. So -1,234,567.89 should be stored as -123456789 cents, not
    as -1234567.89 Dollars. The reason for this is that decimal fractions
    cannot be represented exactly in binary, so the amount is rounded to the
    nearest multiple of a power of two (which happens to be
    -1234567.88999999989755451679229736328125 in this case).

    If the amounts are always given with two digits after the decimal point,
    you can just strip the decimal point, too:

    $c1 =~ s/[,.]//g;
    $c1 =~ tr/.,//d;

    If the number of digits after the decimal point varies, you need to
    normalize them first. Be careful!

    Peter J. Holzer, May 8, 2009
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.