Handling international characters in filenames on Win32

Discussion in 'Perl Misc' started by Stéphane Bourdeaud, Apr 15, 2004.

  1. Hi,

    I am struggling with handling accented characters in Win32 long filenames
    correctly in Perl.

    If I try to get a recursive list of directories in a given path by doing
    something like this:

    my @list = `dir /a:d /b /s $path`;

    Here $path is a user-specified path (and yes, the path does exist, yes I
    chomp @list before using it, and yes this does work when there are no
    international characters in the subdirectory names).

    If I then look at the values stored in @list or try to use them, if they do
    contain an international character (such as an accented vowel), then the
    call fails because the path can't be found (even though it does exist).

    Any ideas on how I could get Perl to store the path names correctly in that
    array?

    Any help would be appreciated.

    Cheers,

    S. Bourdeaud
     
    Stéphane Bourdeaud, Apr 15, 2004
    #1

  2. Stéphane Bourdeaud

    Sisyphus Guest

    Sounds like one of those codeset conversion problems. DOS uses cp850 and
    windows uses cp1252 which is the same for 'normal' characters but
    differs wrt wide characters. It's a fairly simple task using Text::Iconv
    to convert from one to the other - which would be one way to get around
    the problem. It's probably just as simple to convert from one to the
    other using the Encode module which is part of the perl core with perl
    5.8 (though I've not used it).
    Alternatively, if you use perl functions (rather than a system command)
    to fill @list then the problem might go away. (I think that will work
    because it will keep you within the one codeset - but I'm unsure :)
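A minimal sketch of the Encode route mentioned above, not a tested fix for the original problem. It assumes the console (OEM) code page is cp850 and the Windows (ANSI) code page is cp1252; the helper name oem_to_ansi is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: console output arrives as cp850 bytes and the rest of the
# script expects cp1252 bytes. Other systems may use different code pages.
sub oem_to_ansi {
    my ($bytes) = @_;
    my $chars = decode('cp850', $bytes);   # OEM bytes -> Perl characters
    return encode('cp1252', $chars);       # characters -> ANSI bytes
}

# "é" is byte 0x82 in cp850 and byte 0xE9 in cp1252.
my $converted = oem_to_ansi("caf\x82");
print $converted eq "caf\xE9" ? "converted\n" : "unexpected result\n";
```

Each line captured from the dir command could be passed through the same helper after chomp, for example with map { oem_to_ansi($_) } @list.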

    Cheers,
    Rob
     
    Sisyphus, Apr 16, 2004
    #2

  3. Stéphane Bourdeaud

    Mihai N. Guest

    Some notes, additions and corrections:
    This applies to both DOS and the actual Windows console (which
    "simulates" the DOS prompt).
    The DOS/console code page is also called OEM code page.
    The Windows code page is also called (not 100% correct) ANSI code page.
    Also, 850 was the OEM used for most Western European languages.
    For English U.S. systems the OEM cp is 437.
    The characters that differ are not wide characters, just "high ASCII"
    or "accented characters".
    Main problem here: there are characters in 1252 that are not present
    in 850/437.
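The last note can be checked with the core Encode module. This is a small sketch; the helper name fits_in_codepage is hypothetical, and the euro sign (byte 0x80 in cp1252) is used here as one example of a character that cp850 and cp437 lack.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Returns 1 if every character of the cp1252 byte string $bytes can also
# be represented in the code page named $target, and 0 otherwise.
sub fits_in_codepage {
    my ($bytes, $target) = @_;
    my $chars = decode('cp1252', $bytes);
    # FB_CROAK makes encode() die on a character the target cannot hold.
    return eval { encode($target, $chars, Encode::FB_CROAK); 1 } ? 1 : 0;
}

for my $cp (qw(cp850 cp437)) {
    printf "%s: e-acute %s, euro sign %s\n", $cp,
        fits_in_codepage("\xE9", $cp) ? "fits" : "does not fit",
        fits_in_codepage("\x80", $cp) ? "fits" : "does not fit";
}
```

A filename containing such a character cannot survive a round trip through the console code page, which is one reason to avoid parsing dir output.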
     
    Mihai N., Apr 16, 2004
    #3
  4. Rob,

    Thanks for the feedback.
    I worked on it all day today and came to the same conclusion as you.
    I now need to find a perl function that will let me get the same result as
    the dir command, that is a recursive list of all directories within a given
    path...

    Any ideas?

    Glob is fine for listing files in a directory, but (and I am a beginner at
    Perl so I am probably wrong) I haven't found a way to make it recursive.

    Thanks again for the help.

    Regards,

    S. Bourdeaud.
     
    Stéphane Bourdeaud, Apr 16, 2004
    #4
  5. Stéphane Bourdeaud

    J. Gleixner Guest

    Check CPAN for File::Find

    http://search.cpan.org/
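File::Find ships with the standard Perl distribution, so no separate install should be needed. The sketch below collects directories recursively, roughly what the dir command in the question does; the helper name list_dirs is made up for illustration. Unlike dir /s, the starting directory itself is included in the result. Whether this handles every accented name correctly on a given Windows setup is not confirmed in this thread.

```perl
use strict;
use warnings;
use File::Find;

# Returns every directory at or below $path, similar in spirit to
# `dir /a:d /b /s` but without shelling out to the console.
sub list_dirs {
    my ($path) = @_;
    my @dirs;
    # find() calls the sub for each entry; $_ is the entry name within
    # the current directory, $File::Find::name is its path from $path.
    find(sub { push @dirs, $File::Find::name if -d $_ }, $path);
    return @dirs;
}

my $path = shift @ARGV;
$path = '.' unless defined $path;
print "$_\n" for list_dirs($path);
```

The returned paths use forward slashes as separators, which Perl's file functions accept on Win32.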
     
    J. Gleixner, Apr 16, 2004
    #5
  6. Hmmm, what's wrong with File::Find?

    jue
     
    Jürgen Exner, Apr 17, 2004
    #6
  7. Thanks J.,

    It looks like File::Find will do the job nicely for me.

    Cheers,

    S.Bourdeaud
     
    Stéphane Bourdeaud, Apr 17, 2004
    #7
