Gary E. Ansok
One way to access the files in a directory is:

opendir DH, $dir or die "opendir: $!";
while (my $file = readdir DH) {
    next unless -f "$dir/$file";
    # do whatever needs to be done with "$dir/$file"
}
However, this fails given the combination of two facts:
1) $dir is encoded internally in UTF8 (even if $dir doesn't
contain any non-ASCII characters)
2) $file contains non-ASCII characters
The string "$dir/$file" then becomes UTF8-encoded as well, and
while it prints correctly and compares equal to the same string
without the UTF8 encoding, the internal encoding is apparently
what gets passed to the stat() (or open()) call, which then
fails with $! set to "No such file".
Is there a way to work around this without needing to
transcode all strings that might be UTF8-encoded? $dir is
being read in from a config file using a module (XML::Simple),
so I don't have a lot of control over how it's initialized.
I know I could recast the code to chdir() to $dir, but that
would be a significant change given the current code structure.
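One workaround that avoids transcoding everything would be to
strip the UTF8 flag from just the combined path before the
filetest, using utf8::downgrade (core in 5.8). This only works
when the path contains no characters above \x{FF}, which should
hold for readdir() results on a byte-oriented filesystem; a
sketch, assuming (as in the demo program below) that "t2"
exists:

```perl
use strict;
use warnings;

my $dir = 't2';    # assumed to exist, as in the demo program

opendir DH, $dir or die "opendir: $!";
while (my $file = readdir DH) {
    my $path = "$dir/$file";
    # Force $path back to its byte representation so stat()/open()
    # see the same bytes readdir() returned. With a true second
    # argument, downgrade returns false (instead of dying) if
    # $path contains characters above \x{FF}.
    utf8::downgrade($path, 1)
        or warn "cannot downgrade '$path'\n";
    next unless -f $path;
    # do whatever needs to be done with $path
}
closedir DH;
```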
This is on Solaris, using 5.8.0, though I've verified
similar behavior on Windows with 5.8.7. I've tried different
settings for LC_ALL, and it doesn't seem to make a difference.
Below is a more complete program that demonstrates the bug. It
assumes that a directory "t2" already exists, with a
suitably-named file in it (I used "fil\351.txt").
Thanks,
Gary Ansok
#! /opt/perl/5.8.0/bin/perl
use strict;
use warnings;

my $show_bug = 1;

my $dir = 't2';
if ($show_bug) {    # force $dir to be UTF8-encoded
    $dir .= "\x{100}";
    chop $dir;
}

print "Opening dir '$dir'\n";
opendir DH, $dir or die "opendir: $!";
while (my $file = readdir DH) {
    print "Checking file '$dir/$file'\n";
    next unless -f "$dir/$file";
    print "Found file '$dir/$file'\n";
}