problems with charsets

Discussion in 'Perl Misc' started by peter pilsl, Sep 5, 2003.

  1. peter pilsl

    I have a long CSV file that needs to be imported into an SQL database. My
    problem is that I don't know which charset this file is encoded in, and
    afterwards I would not know how to convert it to what I need (latin1 for
    output and utf8 for storage).
    For my current unicode<->latin1 conversion I use the Unicode::String module,
    but the file does not seem to be latin1 after all. (It comes from a
    Mac and I'm working on a Linux machine.)

    I'm aware that my problem is not really a Perl problem, but I
    use Perl to detect and convert the charset, so I hope it's OK here.

    An example of the text I need to process is available for download at
    http://www.goldfisch.at/temporary/text.cvs (it's only one line
    with 276 bytes).


    Thanks a lot for your help,
    peter





    --
    peter pilsl

    http://www.goldfisch.at
     
    peter pilsl, Sep 5, 2003
    #1

  2. On Fri, Sep 5, peter pilsl inscribed on the eternal scroll:

    > I have a long CSV file that needs to be imported into an SQL database. My
    > problem is that I don't know which charset this file is encoded in


    Text files are meaningless without the accompanying character coding
    (MIME terminology: "charset") meta-information, really. That isn't a
    Perl problem, no matter that you could use Perl as part of the
    solution.

    > afterwards I would not know how to convert it to what I need (latin1 for
    > output and utf8 for storage).


    Easy, once you identify the source coding.

    > For my current unicode<->latin1 conversion I use the Unicode::String module,
    > but the file does not seem to be latin1 after all. (It comes from a
    > Mac and I'm working on a Linux machine.)


    Sounds as if the coding is likely to be MacRoman. Damn it,
    that's exactly what it is.

    > I'm aware that my problem is not really a Perl problem, but I
    > use Perl to detect and convert the charset, so I hope it's OK here.


    Not really, but by chance it happens to be one of my specialist
    subjects...

    > An example of the text I need to process is available for download at
    > http://www.goldfisch.at/temporary/text.cvs (it's only one line
    > with 276 bytes).


    What I did was simply to view it in Mozilla and play with the
    view->coding settings until it started to make sense.
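
    The same trial-and-error idea can be sketched in Perl: decode the raw
    bytes under a few candidate encodings and look at which result reads
    sensibly. This is only an illustration; the sample bytes and the helper
    name try_encodings are made up here and are not taken from the file in
    question.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode the same octets under several candidate
# encodings and return a hash of encoding name => decoded text.
sub try_encodings {
    my ($octets, @names) = @_;
    my %result;
    for my $name (@names) {
        my $copy = $octets;               # decode() may alter its input, so work on a copy
        $result{$name} = decode($name, $copy);
    }
    return %result;
}

# Made-up sample: 0x8A, 0x9A and 0x9F are a-umlaut, o-umlaut and u-umlaut in MacRoman.
my $bytes = "M\x8Arz, sch\x9An, f\x9Fr";

my %guess = try_encodings($bytes, qw(iso-8859-1 cp1252 MacRoman));

# Print each interpretation as UTF-8 so it can be judged by eye.
binmode(STDOUT, ':encoding(UTF-8)');
for my $name (sort keys %guess) {
    print "$name: $guess{$name}\n";
}
```

    Only one of the candidates should produce readable German for this
    sample; the others show control characters or unrelated letters.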

    Now go to the Perl encoding pages to find out how to define the
    encoding layer (5.8.0+) or the explicit encode/decode calls to handle
    it. After that it's a doddle (a walk in the park, a trifle, or
    whatever).

    see also: http://www.perldoc.com/perl5.8.0/lib/Encode.html
    http://www.perldoc.com/perl5.8.0/lib/Encode/Supported.html
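
    As a rough sketch of the explicit-call route, assuming the source
    really is MacRoman: decode the octets into Perl characters, then
    encode them into whichever target is needed. The helper name
    macroman_to and the sample string are invented for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: convert MacRoman octets to the named target encoding.
sub macroman_to {
    my ($target, $octets) = @_;
    my $text = decode('MacRoman', $octets);   # octets -> Perl characters
    return encode($target, $text);            # characters -> target octets
}

# Made-up sample: "Gr", u-umlaut (0x9F), sharp s (0xA7), "e" in MacRoman.
my $mac = "Gr\x9F\xA7e";

my $utf8   = macroman_to('UTF-8', $mac);        # for storage
my $latin1 = macroman_to('iso-8859-1', $mac);   # for output

printf "utf8:   %s\n", join ' ', map { sprintf '%02X', ord } split //, $utf8;
printf "latin1: %s\n", join ' ', map { sprintf '%02X', ord } split //, $latin1;
```

    With the layer approach (5.8.0+), the decoding step happens on read
    instead, by opening the file with a mode such as
    '<:encoding(MacRoman)'.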

    good luck
     
    Alan J. Flavell, Sep 5, 2003
    #2

  3. peter pilsl

    Alan J. Flavell wrote:

    >
    > Now go to the Perl encoding pages to find out how to define the
    > encoding layer (5.8.0+) or the explicit encode/decode calls to handle
    > it. After that it's a doddle (a walk in the park, a trifle, or
    > whatever).
    >
    > see also: http://www.perldoc.com/perl5.8.0/lib/Encode.html
    > http://www.perldoc.com/perl5.8.0/lib/Encode/Supported.html
    >


    You are one of the good spirits in this group. As so many times before, you
    have helped me a lot with accurate information.
    Thanks a lot (a thousand thanks and all that)

    peter

    --
    peter pilsl

    http://www.goldfisch.at
     
    peter pilsl, Sep 7, 2003
    #3
