Dr.Ruud said:
P wrote:
Yes, that is much clearer. I'll assume that you have
Windows and maybe Cygwin.
Have you read perllocale, perluniintro, perlunicode,
perlebcdic?
Yes, I have, and while I consider myself slightly more
intelligent than a garden gnome, I must admit that these
issues concerning character encoding are beyond my comprehension
(at least at present).
Use the command:
for /f "tokens=4" %w in ('chcp') do dir >text.%w
to create a file called "text.437" (if your chcp is 437)
with the dir-output for the current directory.
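For reference, "tokens=4" works because an English-language chcp prints "Active code page: 437", so the number is the fourth whitespace-separated word. On a localized Windows the wording may differ. A minimal Perl sketch of the same idea, which simply takes the first run of digits; the helper name codepage_from_chcp and the sample string are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: extract the code page number from chcp's output
# and turn it into an Encode-style name such as "cp437".
sub codepage_from_chcp {
    my ($chcp_output) = @_;
    return $chcp_output =~ /(\d+)/ ? "cp$1" : undef;
}

# Sample text as printed by an English-language Windows. On a real
# machine this string would come from running chcp (e.g. with backticks).
my $sample = "Active code page: 437\n";
print codepage_from_chcp($sample), "\n";    # prints "cp437"
```

The result can be passed directly to Encode as an encoding name.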
I assume this is a demonstration, rather than part of a
solution? Or are you saying I'll have to write a temporary
file in this way to solve my problem?
Under Cygwin, you can use the command:
iconv -f CP437 -t UTF-8 text.437 > text.utf8
to convert the file from CP437 to UTF-8.
I don't have iconv.
But that second step can also be done with Perl.
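A rough sketch of what that could look like, using only the core Encode module and PerlIO encoding layers. The sub names cp437_to_utf8 and convert_file are invented for this example, and the file names are the ones from the iconv command above:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Convert a string of cp437 bytes into a string of UTF-8 bytes.
sub cp437_to_utf8 {
    my ($bytes) = @_;
    return encode('UTF-8', decode('cp437', $bytes));
}

# File-to-file version, roughly what the iconv command does.
sub convert_file {
    my ($src, $dst) = @_;
    open my $in,  '<:encoding(cp437)', $src or die "Can't read $src: $!";
    open my $out, '>:encoding(UTF-8)', $dst or die "Can't write $dst: $!";
    print {$out} $_ while <$in>;
    close $in;
    close $out or die "Can't close $dst: $!";
    return;
}

# In cp437 the byte 0x81 is "u" with diaeresis; in UTF-8 it becomes C3 BC.
my $utf8 = cp437_to_utf8("\x81ber");
print join(' ', map { sprintf '%02X', ord } split //, $utf8), "\n";
# prints "C3 BC 62 65 72"

# Usage for the files discussed here (not run in this sketch):
#   convert_file('text.437', 'text.utf8');
```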
(Almost) platform-independent way to see all available
encodings:
perl -MEncode -e "print join $/, Encode->encodings(':all')" |more
OK, this, and Mr King's reply tell me that Encode is capable
of doing this. I need 'cp437', 'cp850' and 'cp852'
(depending on which machine I'm using). For the rest of this
post I'll assume that I'll be using 'cp437'.
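Before relying on one of these, it may be worth checking two things: that the local Encode actually knows the encoding, and that the encoding can represent the characters in question. The sketch below does both. The helper can_represent and the test character (U+0161, s with caron) are assumptions chosen for illustration, not something taken from the script:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: true if every character of $string has a
# mapping in $encoding (encode() dies under FB_CROAK otherwise).
sub can_represent {
    my ($encoding, $string) = @_;
    my $ok = eval {
        Encode::encode($encoding, $string,
            Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 1 : 0;
}

my $sample = "\x{0161}";    # made-up test character: s with caron

for my $name (qw(cp437 cp850 cp852)) {
    my $enc = Encode::find_encoding($name);
    printf "%-6s %-10s can encode U+0161: %s\n",
        $name,
        $enc ? 'available' : 'missing',
        can_represent($name, $sample) ? 'yes' : 'no';
}
```

If a code page cannot represent a character, no amount of decoding from that code page will ever produce it, so the choice of code page matters for file names with Central European letters.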
Now it is your turn to create some code and try to make it
work.
Here's the script (stripped for the purposes of this post)
*before* tackling the encoding issues:
----------
#!/usr/bin/perl
use warnings;
use strict;
opendir(DIR, '.') or die "Can't open input directory: $!";
my %files = map { $_ => 1 } grep { $_ !~ m/^\.\.?$/ } readdir(DIR);
while (<DATA>) {
    chomp;
    if ( exists $files{$_} ) {
        print "$_ matches.\n";
    }
    else {
        print "$_ doesn't match.\n";
    }
}
__DATA__
Đorđe Balašević
----------
A file named "Đorđe Balašević" *does* exist in the CWD, yet
when I run this script I get:
ÄorÄ?e Bala-eviÄ? doesn't match.
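One way to narrow down a mismatch like this is to print both sides of the comparison as hex bytes, since two strings that look alike on screen can still differ byte for byte. A small sketch; the helper name hex_bytes is made up, and the sample bytes are only an assumption about what a UTF-8 encoded name might start with:

```perl
use strict;
use warnings;

# Hypothetical helper: show each character of a string as a hex code.
sub hex_bytes {
    my ($string) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $string;
}

# In the script above one could compare, for example:
#   print hex_bytes($_), "\n" for keys %files;   # names from readdir
#   print hex_bytes($_), "\n";                   # line read from DATA

# Sample: the UTF-8 bytes of U+0110 followed by "or".
print hex_bytes("\xC4\x90or"), "\n";    # prints "C4 90 6F 72"
```

If the two dumps differ, the hash key and the DATA line are in different encodings, and one of them has to be converted before the lookup.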
So I tried the following fix:
----------
# decode() comes from Encode; this needs "use Encode;" at the top
while (<DATA>) {
    chomp;
    my $key = decode('cp437', $_);
    if ( exists $files{$key} ) {
        print "$_ matches.\n";
    }
    else {
        print "$_ doesn't match.\n";
    }
}