problems with charsets

peter pilsl

I have a long CSV file that needs to be imported into an SQL database. My
problem is that I don't know which charset this file is encoded in, and
afterwards I would not know how to convert it to what I need (latin1 for
output and utf8 for storage).
For my current unicode<->latin1 conversion I use the Unicode::String module,
but the file doesn't seem to be latin1 after all. (It comes from a
Mac and I'm working on a Linux machine.)

I'm aware that my problem is not really a Perl problem, but I
use Perl to detect and convert the charset, so I hope it's OK here.

An example of the text I need to process is available for download at
http://www.goldfisch.at/temporary/text.cvs (it's only one line
with 276 bytes).


Thanks a lot for your help,
peter
 
Alan J. Flavell

> I have a long CSV file that needs to be imported into an SQL database. My
> problem is that I don't know which charset this file is encoded in

Text files are meaningless without the accompanying character coding
(MIME terminology: "charset") meta-information, really. That isn't a
Perl problem, no matter that you could use Perl as part of the
solution.

> and afterwards I would not know how to convert it to what I need (latin1 for
> output and utf8 for storage).

Easy, once you identify the source coding.

> For my current unicode<->latin1 conversion I use the Unicode::String module,
> but the file doesn't seem to be latin1 after all. (It comes from a
> Mac and I'm working on a Linux machine.)

Sounds as if the coding is likely to be MacRoman. Damn it all,
that's exactly what it is.

> I'm aware that my problem is not really a Perl problem, but I
> use Perl to detect and convert the charset, so I hope it's OK here.

Not really, but by chance it happens to be one of my specialist
subjects...

> An example of the text I need to process is available for download at
> http://www.goldfisch.at/temporary/text.cvs (it's only one line
> with 276 bytes).

What I did was simply to view it in Mozilla and play with the
view->coding settings until it started to make sense.
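
The same trial-and-error can be done from Perl instead of a browser. The following is only a rough sketch: the list of candidate encodings is a guess, the sample bytes are made up for illustration (they are not taken from the file above), and decode_candidates is a hypothetical helper name. It decodes the same bytes under each candidate and prints the results so that they can be compared by eye.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Decode the same bytes under each candidate encoding so the results can
# be compared by eye. The candidate list is a guess, not exhaustive.
sub decode_candidates {
    my ($bytes) = @_;
    my %result;
    for my $name (qw(iso-8859-1 cp1252 MacRoman)) {
        my $copy = $bytes;                    # decode a copy, keep the input intact
        $result{$name} = decode($name, $copy);
    }
    return %result;
}

# Made-up sample: the bytes MacRoman would use for the German word "Grüße".
my $sample = "Gr\x9F\xA7e";

binmode(STDOUT, ':encoding(UTF-8)');          # assumes a UTF-8 terminal
my %seen = decode_candidates($sample);
print "$_: $seen{$_}\n" for sort keys %seen;
```

Whichever line reads as sensible text points at the likely source coding. For a real file, the sample would be replaced by bytes read from a handle opened with the ':raw' layer.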

Now go to the Perl encoding pages to find out how to define the
encoding layer (5.8.0+) or the explicit en/de/coding calls to handle
it. After that it's a doddle (a walk in the park, a trifle, or
whatever).

see also: http://www.perldoc.com/perl5.8.0/lib/Encode.html
http://www.perldoc.com/perl5.8.0/lib/Encode/Supported.html
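
As a rough sketch of both routes, assuming the source really is MacRoman: the first part uses the explicit decode/encode calls from Encode, the second uses encoding layers on the file handles. The helper names macroman_to and convert_file, the sample bytes and the file names are all made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Explicit calls: bytes in, characters in the middle, bytes out.
sub macroman_to {
    my ($target, $bytes) = @_;
    my $chars = decode('MacRoman', $bytes);   # MacRoman bytes -> Perl characters
    return encode($target, $chars);           # characters -> bytes in $target
}

my $mac_bytes = "Gr\x9F\xA7e";                # made-up sample, "Grüße" in MacRoman
my $utf8   = macroman_to('UTF-8',      $mac_bytes);   # for storage
my $latin1 = macroman_to('iso-8859-1', $mac_bytes);   # for output

# Encoding layers (5.8.0+): the file handles do the same conversion
# while reading and writing.
sub convert_file {
    my ($src, $dst) = @_;
    open(my $in,  '<:encoding(MacRoman)', $src) or die "Cannot read $src: $!";
    open(my $out, '>:encoding(UTF-8)',    $dst) or die "Cannot write $dst: $!";
    print {$out} $_ while <$in>;
    close $out or die "Cannot close $dst: $!";
    close $in;
}

# Hypothetical file names:
# convert_file('import.csv', 'import.utf8.csv');
```

One thing to watch on the latin1 side: with the default settings, encode replaces any character that has no Latin-1 equivalent with a substitution character, so the utf8 copy is the one to treat as the master.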

good luck
 
peter pilsl

Alan said:
> Now go to the Perl encoding pages to find out how to define the
> encoding layer (5.8.0+) or the explicit en/de/coding calls to handle
> it. After that it's a doddle (a walk in the park, a trifle, or
> whatever).
>
> see also: http://www.perldoc.com/perl5.8.0/lib/Encode.html
> http://www.perldoc.com/perl5.8.0/lib/Encode/Supported.html

You are one of the good spirits of this group. As so many times before, you
have helped me a lot with accurate information.
Thanks a lot (a thousand thanks and all that),

peter
 
