File.unlink(nonwestern_filename) ---> Error on Windows

Discussion in 'Ruby' started by johan556@gmail.com, Jul 5, 2007.

  1. Guest

    Hi!

    I use Ruby on Windows, and tried to remove all files in a directory
    with the code given below. But if the directory contains files with
    filenames having non-western characters the operation fails.

    I first encountered this problem when using FileUtils.rm_r, and that
    method also fails (for the same reason I guess). This makes FileUtils
    quite useless in some situations. We have for example Subversion
    projects that contain files with Japanese characters (for testing that
    our product works with such characters), and I also tried with Arabic
    characters (stored in Unicode in NTFS in both cases).

    Is it possible to get Ruby to work with filenames containing
    non-western characters at all on Windows? If so, what should I do?

    /Johan Holmberg

    --------------
    Dir.chdir "nonwestern-files"

    for entry in Dir.entries(".")
      next if entry == "."
      next if entry == ".."
      n = File.unlink(entry)
      puts "failed to delete #{entry}" if n == 0
    end
    --------------
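
    A side note on the loop above: as far as I know, File.unlink raises a
    SystemCallError subclass (Errno::ENOENT and friends) on failure rather
    than returning 0, so the "n == 0" check is unlikely to ever fire. Below
    is a minimal sketch that rescues the error per entry instead. The helper
    name unlink_all is made up for illustration, and this only improves the
    error reporting; it does not by itself make non-western filenames work.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper (not part of the original post): tries to delete each
# named file inside +dir+ and returns the names that could not be removed.
def unlink_all(dir, names)
  failed = []
  names.each do |name|
    begin
      File.unlink(File.join(dir, name))
    rescue SystemCallError => e
      # File.unlink signals failure by raising, e.g. Errno::ENOENT.
      warn "failed to delete #{name}: #{e.class}"
      failed << name
    end
  end
  failed
end

# Small demo in a throwaway directory: one real file, one missing name.
Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, "a.txt"))
  entries = Dir.entries(dir) - [".", ".."]
  puts unlink_all(dir, entries + ["missing.txt"]).inspect
end
```

    Collecting the failures instead of stopping at the first one makes it
    easier to see exactly which filenames the platform refuses.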
     
    , Jul 5, 2007
    #1

  2. John Joyce Guest

    On Jul 5, 2007, at 8:00 AM, Johan Holmberg wrote:

    > Hi!
    >
    > I use Ruby on Windows, and tried to remove all files in a directory
    > with the code given below. But if the directory contains files with
    > filenames having non-western characters the operation fails.
    >
    > I first encountered this problem when using FileUtils.rm_r, and that
    > method also fails (for the same reason I guess). This makes FileUtils
    > quite useless in some situations. We have for example Subversion
    > projects that contain files with Japanese characters (for testing that
    > our product works with such characters), and I also tried with Arabic
    > characters (stored in Unicode in NTFS in both cases).
    >
    > Is it possible to get Ruby to work with filenames containing
    > non-western characters at all on Windows? If so, what should I do?
    >
    > /Johan Holmberg
    >
    > --------------
    > Dir.chdir "nonwestern-files"
    >
    > for entry in Dir.entries(".")
    > next if entry == "."
    > next if entry == ".."
    > n = File.unlink(entry)
    > puts "failed to delete #{entry}" if n == 0
    > end
    > --------------
    >


    First, make sure you set KCODE.

    Try using the Chars class from ActiveSupport (yes, it is a gem that is
    part of Rails, but it provides a great deal of UTF-8 processing).
     
    John Joyce, Jul 5, 2007
    #2

  3. Guest

    On 7/5/07, John Joyce wrote:
    > On Jul 5, 2007, at 8:00 AM, wrote:
    > >
    > > I use Ruby on Windows, and tried to remove all files in a directory
    > > with the code given below. But if the directory contains files with
    > > filenames having non-western characters the operation fails.
    > >

    >
    > First make sure you set the KCODE
    >


    Using KCODE does not change anything. I have tried:

    $ ruby -Ke rm-files.rb
    $ ruby -Ks rm-files.rb
    $ ruby -Ku rm-files.rb
    $ ruby -Ka rm-files.rb
    $ ruby -Kn rm-files.rb

    The problematic files are stored with a name that is a 16-bit
    character string in NTFS (what I called Unicode in my earlier mail,
    perhaps one should call it "almost UTF-16" or UCS-2, I don't know the
    finer details). Anyway, I don't think setting KCODE solves my problem.

    > Try using the chars class from ActiveSupport (yes it is a gem that is
    > part of Rails but it provides a great deal of utf-8 processing)
    >


    See above. I don't think NTFS stores Unicode filenames in UTF-8.

    My assumption when I started looking at this problem was that a
    filename I got from one function (Dir.entries) would be directly
    usable in another function (File.unlink). That was quite naive, I
    realize :)

    But it is still a real problem. As it is now, FileUtils.rm_r does not
    work on an arbitrary file tree. As soon as the tree contains a file
    with a "wrong" filename, it fails. Maybe this is just a consequence of
    the way Ruby is ported to Windows.

    /Johan Holmberg
     
    , Jul 5, 2007
    #3

  4. John Joyce Guest

    On Jul 5, 2007, at 11:32 AM, Johan Holmberg wrote:

    > On 7/5/07, John Joyce <> wrote:
    >> On Jul 5, 2007, at 8:00 AM, wrote:
    >> >
    >> > I use Ruby on Windows, and tried to remove all files in a directory
    >> > with the code given below. But if the directory contains files with
    >> > filenames having non-western characters the operation fails.
    >> >

    >>
    >> First make sure you set the KCODE
    >>

    >
    > Using KCODE does not change anything. I have tried:
    >
    > $ ruby -Ke rm-files.rb
    > $ ruby -Ks rm-files.rb
    > $ ruby -Ku rm-files.rb
    > $ ruby -Ka rm-files.rb
    > $ ruby -Kn rm-files.rb
    >
    > The problematic files are stored with a name that is a 16-bit
    > character string in NTFS (what I called Unicode in my earlier mail,
    > perhaps one should call it "almost UTF-16" or UCS-2, I don't know the
    > finer details). Anyway, I don't think setting KCODE solves my problem.
    >


    Translation between UTF-16 and UTF-8 shouldn't be a problem.
    Check out unicode.org for more on this than you really want to know,
    or there is a nice blog article at joelonsoftware.
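
    To illustrate the conversion itself, here is a minimal sketch using
    String#encode. That method arrived in Ruby 1.9, which postdates this
    thread, so treat it as an assumption about your Ruby version. The
    helper names are made up. The sketch only shows that the two encodings
    round-trip cleanly; it says nothing about which form Ruby actually
    hands to the Windows file APIs.

```ruby
# Hypothetical helpers: convert a UTF-8 filename to the 16-bit form
# (UTF-16LE) and back again.
def to_utf16le(name)
  name.encode("UTF-16LE")
end

def from_utf16le(bytes)
  # dup so a frozen or shared string is not modified in place
  bytes.dup.force_encoding("UTF-16LE").encode("UTF-8")
end

name = "\u65E5\u672C\u8A9E.txt"  # three Japanese characters plus ".txt"
wide = to_utf16le(name)

puts wide.bytesize               # 14: seven characters, two bytes each
puts from_utf16le(wide) == name  # true: the round trip is lossless
```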

    >> Try using the chars class from ActiveSupport (yes it is a gem that is
    >> part of Rails but it provides a great deal of utf-8 processing)
    >>

    >
    > See above. I don't think NTFS stores Unicode filenames in UTF-8.
    >
    > My assumption when starting to look at this problem was: that a
    > filename that I got from one function (Dir.entries) would be directly
    > usable in another function (File.unlink). That was quite naive I
    > realize :)
    >
    > But it is still a real problem. As it is now, FileUtils.rm_r does not
    > work on an arbitrary file-tree. As soon as it contains a file with
    > "wrong" filename it fails. Maybe this is just a consequence of the way
    > Ruby is ported to Windows.
    >
    > /Johan Holmberg
    >


    Some file utilities are specifically non-windows. That may be part of
    the problem you are having.
    Many of those file utilities out there are Ruby versions of utilities
    found on *nix systems. Sorry about that.
    Much of that is documented in the pickaxe book (v.2) in the second
    half of the book. (sorry again, I'm not saying RTFM, just that it is
    noted there.)

    The win32utils will hopefully do the job. Let us know what works!
    This kind of problem is common for lots of people.
     
    John Joyce, Jul 6, 2007
    #4

  5. Paul Battley Guest

    Hi,

    On 05/07/07, Johan Holmberg wrote:
    > The problematic files are stored with a name that is a 16-bit
    > character string in NTFS (what I called Unicode in my earlier mail,
    > perhaps one should call it "almost UTF-16" or UCS-2, I don't know the
    > finer details). Anyway, I don't think setting KCODE solves my problem.


    I haven't used Windows for a long while, but unless something has
    changed in the newest releases, Ruby uses the Windows legacy code page
    for interacting with the system, which is by default Windows-1252 on
    English systems, Shift_JIS on Japanese systems, etc.

    Internally, Windows is all Unicode, as is NTFS (I think it's UTF-16,
    but that's not really important for this discussion), but applications
    using legacy code pages can't communicate strings outside that code
    page to the OS.

    That means that if you set the legacy code page to Shift_JIS, you can
    read and write Japanese file names, but not Arabic ones. If you set it
    to Windows-1252, you can use acute accents, but can't touch Japanese
    files.
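
    The code page limitation can be sketched with plain string transcoding.
    This assumes Ruby 1.9 or later for String#encode, and the helper name
    representable? is made up for illustration. It does not touch the file
    system at all; it only checks whether a name has any representation in
    a given legacy code page.

```ruby
# Hypothetical helper: true if +name+ can be expressed in +code_page+.
def representable?(name, code_page)
  name.encode(code_page)
  true
rescue Encoding::UndefinedConversionError
  # At least one character has no mapping in the target code page.
  false
end

japanese = "\u65E5\u672C\u8A9E.txt"        # Japanese characters
arabic   = "\u0639\u0631\u0628\u064A.txt"  # Arabic characters
accented = "caf\u00E9.txt"                 # Latin letter with acute accent

puts representable?(japanese, "Shift_JIS")     # true
puts representable?(arabic,   "Shift_JIS")     # false
puts representable?(accented, "Windows-1252")  # true
puts representable?(japanese, "Windows-1252")  # false
```

    A name that fails this check cannot be passed intact through an API
    that only accepts strings in that code page, which would match the
    failures described earlier in the thread.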

    I am led to believe that there is a UTF-8 code page in Windows, and it
    is possible to set the legacy code page on an
    application-by-application basis, at least on XP (though you might
    need a separate Power Toy or similar to do it). If you can get that to
    work, it might be possible to manipulate files via the UTF-8
    representation of their name. I've never seen it done, though, so this
    is entirely hypothetical.

    Paul.
     
    Paul Battley, Jul 6, 2007
    #5