Handling international characters in filenames on Win32

  • Thread starter Stéphane Bourdeaud

Stéphane Bourdeaud

Hi,

I am struggling with handling accented characters in Win32 long filenames
correctly in Perl.

If I try to get a recursive list of directories in a given path by doing
something like this:

my @list = `dir /a:d /b /s $path`;

where $path is a user-specified path (and yes, the path does exist, yes I
chomp @list before using it, and yes this does work when there are no
international characters in the subdirectory names).

If I then use the values stored in @list and one of them contains an
international character (such as an accented vowel), the call fails
because the path can't be found (even though it does exist).

Any ideas on how I could get Perl to store the path names correctly in that
array?

Any help would be appreciated.

Cheers,

S. Bourdeaud
 
Sisyphus

Sounds like one of those codeset conversion problems. DOS uses cp850 and
Windows uses cp1252, which is the same for 'normal' characters but
differs wrt wide characters. It's a fairly simple task to convert from
one to the other using Text::Iconv, which would be one way to get around
the problem. It's probably just as simple to do the conversion with the
Encode module, which is part of the Perl core as of Perl 5.8 (though
I've not used it).
Alternatively, if you use Perl functions (rather than a system command)
to fill @list then the problem might go away. (I think that will work
because it will keep you within the one codeset, but I'm unsure :)

Cheers,
Rob
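
For illustration, the Encode route suggested above might look like the sketch below. It assumes the console really is using cp850 and that the rest of the program expects cp1252; both are assumptions (running chcp in the console shows the active OEM code page), and the helper name oem_to_ansi is made up for this example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: re-encode a byte string from the console (OEM)
# code page to the Windows (ANSI) code page. The defaults are assumptions.
sub oem_to_ansi {
    my ($bytes, $oem, $ansi) = @_;
    $oem  ||= 'cp850';
    $ansi ||= 'cp1252';
    # decode() turns OEM bytes into Perl characters,
    # encode() turns those characters into ANSI bytes.
    return encode($ansi, decode($oem, $bytes));
}

# "e acute" is byte 0x82 in cp850 and byte 0xE9 in cp1252.
printf "%s\n", unpack('H*', oem_to_ansi("\x82"));    # prints e9

# On Windows, the same helper can be applied to the output of dir.
if ($^O eq 'MSWin32' && @ARGV) {
    my $path = $ARGV[0];
    my @list = map { oem_to_ansi($_) } `dir /a:d /b /s "$path"`;
    chomp @list;
    print "$_\n" for @list;
}
```

The converted names are byte strings again, so they can be handed straight to opendir, open and friends.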
 
Mihai N.

Some notes, additions and corrections:

Sisyphus said:
Sounds like one of those codeset conversion problems. DOS uses cp850 and
Windows uses cp1252

That is, DOS and the actual Windows console (which "simulates" the DOS prompt).
The DOS/console code page is also called the OEM code page.
The Windows code page is also called (not 100% correctly) the ANSI code page.
Also, 850 is the OEM code page used for most Western European languages.
For U.S. English systems the OEM code page is 437.

Sisyphus said:
which is the same for 'normal' characters but
differs wrt wide characters.

Not wide characters, just "high ASCII" or "accented characters".

Sisyphus said:
It's a fairly simple task using Text::Iconv
to convert from one to the other - which would be one way to get around
the problem. It's probably just as simple to convert from one to the
other using the Encode module which is part of the perl core with perl
5.8 (though I've not used it).

Main problem here: there are characters in 1252 that are not present
in 850/437.
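
One way to see that limitation in practice is to ask Encode to fail instead of silently substituting a character. This is only a sketch: the helper name fits_in_cp850 is invented for the example, and cp850 is again an assumed console code page.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: true if every character of $text exists in cp850.
sub fits_in_cp850 {
    my ($text) = @_;
    my $copy = $text;    # encode() with a CHECK value may modify its argument
    # FB_CROAK makes encode() die on a character it cannot map.
    return eval { encode('cp850', $copy, Encode::FB_CROAK); 1 } ? 1 : 0;
}

my $accented = decode('cp1252', "\xE9t\xE9");    # e acute, t, e acute
my $euro     = decode('cp1252', "\x80");         # euro sign, absent from cp850

print fits_in_cp850($accented) ? "representable\n" : "not representable\n";
print fits_in_cp850($euro)     ? "representable\n" : "not representable\n";
```

A name that fails this check cannot survive a round trip through the console code page, which is one more argument for not going through dir at all.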
 
Stéphane Bourdeaud

Rob,

Thanks for the feedback.
I worked on it all day today and came to the same conclusion as you.
I now need to find a perl function that will let me get the same result as
the dir command, that is a recursive list of all directories within a given
path...

Any ideas?

Glob is fine for listing files in a directory, but (and I am a beginner at
Perl so I am probably wrong) I haven't found a way to make it recursive.

Thanks again for the help.

Regards,

S. Bourdeaud.
 
J. Gleixner

Check CPAN for File::Find

http://search.cpan.org/
 
Jürgen Exner

Stéphane Bourdeaud said:
I now need to find a perl function that will let me get the same
result as the dir command, that is a recursive list of all
directories within a given path...

Hmmm, what's wrong with File::Find?

jue
 
Stéphane Bourdeaud

Thanks J.,

It looks like File::Find will do the job nicely for me.

Cheers,

S.Bourdeaud
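
For reference, a minimal File::Find sketch of the recursive directory listing could look like this. The sub name list_dirs is invented for the example, and nothing in the thread shows the code that was actually used. Unlike dir /a:d /b /s, the result includes the starting directory itself, and File::Find reports names with forward slashes.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return every directory at or below $root.
sub list_dirs {
    my ($root) = @_;
    my @dirs;
    # find() calls the sub for each entry; -d tests $_ (the current entry)
    # and $File::Find::name holds its path relative to the starting point.
    find(sub { push @dirs, $File::Find::name if -d }, $root);
    return @dirs;
}

my $path = @ARGV ? $ARGV[0] : '.';
print "$_\n" for list_dirs($path);
```

Because the names never pass through the console, they should stay in whatever encoding Perl's own file functions use, so they can be passed back to opendir or open unchanged. That is the expectation voiced earlier in the thread, not something verified here.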
 
