UTF-8 and Spreadsheet::ParseExcel

Discussion in 'Perl Misc' started by roberto0, Aug 17, 2005.

  1. roberto0

    roberto0 Guest

    Hello,

    I'm trying to parse a large number of multilingual Excel sheets such
    that I can load much of the data into an Oracle database. The problem
    is that there are a number of UTF-8 characters that are not recognized
    as "chars" by the DB and we need those fields to be searchable. The DB
    requirement is for my script to generate ASCII characters and/or
    transliterations from those UTF-8 characters. In other words, the DB
    people want "alpha" to replace the UTF-8 {GREEK SMALL LETTER ALPHA}.
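
    A minimal sketch of the kind of transliteration described above,
    assuming the text has already been decoded into Perl character
    strings. The %ascii_for table and the to_ascii() helper are
    hypothetical names used only for illustration; a real table would
    need many more entries.

```perl
use strict;
use warnings;

# Hypothetical lookup table: Unicode code point => ASCII replacement.
# Only three entries are shown here.
my %ascii_for = (
    0x03B1 => 'alpha',      # GREEK SMALL LETTER ALPHA
    0x03B5 => 'epsilon',    # GREEK SMALL LETTER EPSILON
    0x00B5 => 'micro',      # MICRO SIGN
);

# Replace every non-ASCII character in a decoded (character) string.
# Characters missing from the table become a visible U+XXXX marker
# rather than being dropped silently.
sub to_ascii {
    my ($text) = @_;
    $text =~ s{([^\x00-\x7F])}{
        my $cp = ord $1;
        exists $ascii_for{$cp} ? $ascii_for{$cp} : sprintf('U+%04X', $cp);
    }ge;
    return $text;
}

print to_ascii("5 \x{00B5}g of \x{03B1}-tocopherol"), "\n";
```

    Keying the table on code points rather than on raw bytes means the
    lookup only works once the input has been decoded correctly, which
    is exactly where the Excel files cause trouble.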

    This is all well and good, and I have scripts that do this rather well
    for Unicode or other UTF-8 files. The problem arises when I use
    Spreadsheet::ParseExcel to read MS Excel files. It seems that the
    parser only picks up the last half of the character (the last 4 bytes
    of the 8-byte character, I think). It then becomes impossible to
    differentiate between certain UTF-8 characters, since many have the
    same second half.

    For example, the UTF-8 symbols for {MICRO SIGN} and {GREEK SMALL LETTER
    EPSILON} are both gleaned from ParseExcel as <B5>. When I parse the same
    symbols from a plain Unicode text file, the characters are reported as
    <A3><B5> and <21><B5> respectively.

    I know ParseExcel uses OLE::Storage_Lite as its interface. Could the
    problem lie there?
    roberto0, Aug 17, 2005
    #1

  2. roberto0

    roberto0 Guest

    Actually, the MICRO SIGN is just <B5>, and GREEK SMALL LETTER
    EPSILON is <CE><B5>.
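
    To make the byte values above concrete, here is a small sketch that
    prints how the two characters are encoded, using the core Encode
    module. The hex_bytes() helper is a hypothetical name. Nothing in
    this thread confirms which encoding ParseExcel hands back; the
    UTF-16BE and ISO-8859-1 columns are included only as an assumption
    about where a lone <B5> could come from.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Return the bytes of $char in $encoding, written as <XX><YY>.
sub hex_bytes {
    my ($encoding, $char) = @_;
    my $bytes = encode($encoding, $char);
    return join '', map { sprintf '<%02X>', $_ } unpack 'C*', $bytes;
}

my @samples = (
    [ 'MICRO SIGN',                 "\x{00B5}" ],
    [ 'GREEK SMALL LETTER EPSILON', "\x{03B5}" ],
);

for my $sample (@samples) {
    my ($name, $char) = @$sample;
    printf "%-27s UTF-8 %-9s UTF-16BE %s\n",
        $name,
        hex_bytes('UTF-8',    $char),
        hex_bytes('UTF-16BE', $char);
}

# MICRO SIGN also exists in ISO-8859-1, where it is a single byte.
print "MICRO SIGN in ISO-8859-1: ", hex_bytes('iso-8859-1', "\x{00B5}"), "\n";
```

    In UTF-8 the two characters come out as <C2><B5> and <CE><B5>, and
    in UTF-16BE as <00><B5> and <03><B5>. In both cases only the first
    byte differs, so any step that keeps just the final byte would make
    them indistinguishable.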

    Someone suggested that the context of the files I'm parsing may be the
    key to determining the answer to my problem. However, the files I'm
    parsing aren't perfect, and the less I rely on the context, the better.


    Thanks in advance for any tips or advice,

    roberto0
    roberto0, Aug 17, 2005
    #2