Handling international characters in filenames on Win32

Discussion in 'Perl Misc' started by Stéphane Bourdeaud, Apr 15, 2004.

  1. Hi,

    I am struggling with handling accented characters in Win32 long filenames
    correctly in Perl.

    If I try to get a recursive list of directories in a given path by doing
    something like this:

    my @list = `dir /a:d /b /s $path`;

    Here $path is a user-specified path (and yes, the path does exist, yes I
    chomp @list before using it, and yes this does work when there are no
    international characters in the subdirectory names).

    If I then look at the values stored in @list or try to use them, any value
    that contains an international character (such as an accented vowel) makes
    the call fail because the path can't be found (even though it does exist).

    Any ideas on how I could get Perl to store the path names correctly in that
    array?

    Any help would be appreciated.

    Cheers,

    S. Bourdeaud
     
    Stéphane Bourdeaud, Apr 15, 2004
    #1

  2. Sisyphus Guest

    Stéphane Bourdeaud wrote:
    > Hi,
    >
    > I am struggling with handling accented characters in Win32 long filenames
    > correctly in Perl.
    >
    > If I try to get a recursive list of directories in a given path by doing
    > something like this:
    >
    > my @list = `dir /a:d /b /s $path`;
    >
    > Where $path is a user specified path (and yes, the path does exist, yes I
    > chomp @list before using it, and yes this does work when there are no
    > international characters in the sub directories names).
    >
    > If I then look at the values stored in @list or try to use them, if they do
    > contain an international character (such as an accented vowel), then the
    > call fails because the path can't be found (even though it does exist).
    >
    > Any ideas on how I could get Perl to store the path names correctly in that
    > array?
    >
    > Any help would be appreciated.
    >


    Sounds like one of those codeset conversion problems. DOS uses cp850 and
    windows uses cp1252 which is the same for 'normal' characters but
    differs wrt wide characters. It's a fairly simple task using Text::Iconv
    to convert from one to the other - which would be one way to get around
    the problem. It's probably just as simple to convert from one to the
    other using the Encode module which is part of the perl core with perl
    5.8 (though I've not used it).
    Alternatively, if you use perl functions (rather than a system command)
    to fill @list then the problem might go away. (I think that will work
    because it will keep you within the one codeset - but I'm unsure :)
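
A minimal sketch of the Encode route mentioned above, assuming the console output really is cp850 and the rest of the program expects cp1252. The helper name oem_to_ansi and its defaults are illustrative and not from the thread; the actual code pages depend on the system.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Convert a byte string from the console (OEM) code page to the
# Windows (ANSI) code page. The defaults cp850 and cp1252 are the pair
# named in this thread; other systems may use different code pages.
sub oem_to_ansi {
    my ($bytes, $oem, $ansi) = @_;
    $oem  ||= 'cp850';
    $ansi ||= 'cp1252';
    my $chars = decode($oem, $bytes);   # OEM bytes -> Perl characters
    return encode($ansi, $chars);       # characters -> ANSI bytes
}

# Hypothetical usage with the backtick call from the original post:
# my @list = map { oem_to_ansi($_) } `dir /a:d /b /s $path`;
# chomp @list;
```

Decoding to Perl characters first and encoding second keeps the two code pages independent, so either one can be swapped without touching the other.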

    Cheers,
    Rob

    --
    To reply by email u have to take out the u in kalinaubears.
     
    Sisyphus, Apr 16, 2004
    #2

  3. Mihai N. Guest

    Some notes, additions and corrections:
    > Sounds like one of those codeset conversion problems. DOS uses cp850 and
    > windows uses cp1252

    That applies to DOS and to the actual Windows console (which "simulates"
    the DOS prompt).
    The DOS/console code page is also called the OEM code page.
    The Windows code page is also called (not 100% correctly) the ANSI code page.
    Also, 850 is the OEM code page used for most Western European languages.
    For U.S. English systems the OEM code page is 437.

    > which is the same for 'normal' characters but
    > differs wrt wide characters.

    Not wide characters, just "high ASCII" or "accented characters".

    > It's a fairly simple task using Text::Iconv
    > to convert from one to the other - which would be one way to get around
    > the problem. It's probably just as simple to convert from one to the
    > other using the Encode module which is part of the perl core with perl
    > 5.8 (though I've not used it).

    Main problem here: there are characters in 1252 that are not present
    in 850/437.
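
One way to see this limitation in practice is to ask Encode whether a string can be represented in a given code page at all. The sketch below is only an illustration; the helper name fits_code_page is made up for this example. It relies on FB_CROAK, which makes encode() die on a character that has no mapping in the target code page.

```perl
use strict;
use warnings;
use Encode qw(encode FB_CROAK);

# Return 1 if every character of $chars exists in $code_page, 0 otherwise.
sub fits_code_page {
    my ($chars, $code_page) = @_;
    my $copy = $chars;   # encode() with a CHECK value may modify its argument
    return eval { encode($code_page, $copy, FB_CROAK); 1 } ? 1 : 0;
}

# The euro sign (U+20AC) is in cp1252 but not in cp850:
# fits_code_page("\x{20AC}", 'cp1252') is true,
# fits_code_page("\x{20AC}", 'cp850')  is false.
```

A check like this could be run before converting names toward the OEM code page, so that unmappable names are reported instead of being silently altered.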


    --
    Mihai
    -------------------------
    Replace _year_ with _ to get the real email
     
    Mihai N., Apr 16, 2004
    #3
  4. Rob,

    Thanks for the feedback.
    I worked on it all day today and came to the same conclusion as you.
    I now need to find a perl function that will let me get the same result as
    the dir command, that is a recursive list of all directories within a given
    path...

    Any ideas?

    Glob is fine for listing files in a directory, but (and I am a beginner at
    Perl so I am probably wrong) I haven't found a way to make it recursive.

    Thanks again for the help.

    Regards,

    S. Bourdeaud.


    "Sisyphus" <> wrote in message
    news:407f2127$0$16598$...
    > Sounds like one of those codeset conversion problems. DOS uses cp850 and
    > windows uses cp1252 which is the same for 'normal' characters but
    > differs wrt wide characters. It's a fairly simple task using Text::Iconv
    > to convert from one to the other - which would be one way to get around
    > the problem. It's probably just as simple to convert from one to the
    > other using the Encode module which is part of the perl core with perl
    > 5.8 (though I've not used it).
    > Alternatively, if you use perl functions (rather than a system command)
    > to fill @list then the problem might go away. (I think that will work
    > because it will keep you within the one codeset - but I'm unsure :)
    >
    > Cheers,
    > Rob
    >
    > --
    > To reply by email u have to take out the u in kalinaubears.
    >
     
    Stéphane Bourdeaud, Apr 16, 2004
    #4
  5. J. Gleixner Guest

    Stéphane Bourdeaud wrote:
    > Rob,
    >
    > Thanks for the feedback.
    > I worked on it all day today and came to the same conclusion as you.
    > I now need to find a perl function that will let me get the same result as
    > the dir command, that is a recursive list of all directories within a given
    > path...
    >
    > Any ideas?
    >
    > Glob is fine for listing file in a directory, but (and I am a beginner at
    > Perl so I am probably wrong) I haven't found a way to make it recursive.
    >
    > Thanks again for the help.


    Check CPAN for File::Find

    http://search.cpan.org/
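
A minimal sketch of how File::Find could produce the recursive directory list asked for above, roughly the equivalent of `dir /a:d /b /s`. The function name list_subdirs is illustrative and not from the thread. File::Find ships with Perl, so nothing extra should need installing.

```perl
use strict;
use warnings;
use File::Find;

# Return every directory below $top (not $top itself).
sub list_subdirs {
    my ($top) = @_;
    my @dirs;
    find(sub {
        # Inside the callback, $_ is the entry name and find() has
        # already changed into its parent directory; $File::Find::name
        # is the path including $top. The starting directory shows up
        # as '.', so it is skipped here.
        push @dirs, $File::Find::name if -d $_ && $_ ne '.';
    }, $top);
    return @dirs;
}

# Hypothetical usage:
# my @list = list_subdirs($path);
```

Because the names come from Perl's own directory functions rather than from console output, no OEM-to-ANSI conversion step is involved, which is the reasoning suggested earlier in the thread.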
     
    J. Gleixner, Apr 16, 2004
    #5
  6. Stéphane Bourdeaud wrote:
    > I now need to find a perl function that will let me get the same
    > result as the dir command, that is a recursive list of all
    > directories within a given path...


    Hmmm, what's wrong with File::Find?

    jue
     
    Jürgen Exner, Apr 17, 2004
    #6
  7. Thanks J.,

    It looks like File::Find will do the job nicely for me.

    Cheers,

    S.Bourdeaud


    "J. Gleixner" <> wrote in message
    news:VOXfc.296$...
    > Check CPAN for File::Find
    >
    > http://search.cpan.org/
     
    Stéphane Bourdeaud, Apr 17, 2004
    #7
