File.unlink(nonwestern_filename) ---> Error on Windows

johan556

Hi!

I use Ruby on Windows, and tried to remove all files in a directory
with the code given below. But if the directory contains files whose
names have non-Western characters, the operation fails.

I first encountered this problem when using FileUtils.rm_r, and that
method also fails (for the same reason, I guess). This makes FileUtils
quite useless in some situations. For example, we have Subversion
projects that contain files with Japanese characters (for testing that
our product works with such characters), and I also tried with Arabic
characters (stored in Unicode in NTFS in both cases).

Is it possible to get Ruby to work with filenames containing
non-western characters at all on Windows? If so, what should I do?

/Johan Holmberg

--------------
Dir.chdir "nonwestern-files"

for entry in Dir.entries(".")
  next if entry == "."
  next if entry == ".."
  n = File.unlink(entry)
  puts "failed to delete #{entry}" if n == 0
end
--------------
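
A side note on the script above: as far as the Ruby documentation goes, File.unlink returns the number of names passed and raises an exception on any error, so the `n == 0` check probably never fires. A minimal sketch of the same loop that reports failures by rescuing the exception instead might look like this. The helper name `unlink_all` is hypothetical and not part of the original script, and the sketch does not by itself fix the code page problem.

```ruby
# Hypothetical helper: try to delete every entry in dir and
# collect the ones that could not be removed.
def unlink_all(dir)
  failures = []
  Dir.entries(dir).each do |entry|
    next if entry == "." || entry == ".."
    path = File.join(dir, entry)
    begin
      File.unlink(path)
    rescue SystemCallError => e
      # Errno::ENOENT, Errno::EACCES etc. are all SystemCallError subclasses.
      failures << [entry, e.class]
    end
  end
  failures
end
```

Calling `unlink_all("nonwestern-files")` and printing the returned list would show which names Ruby could not hand back to the operating system, and with which error.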
 
John Joyce

First, make sure you set the KCODE.

Then try using the chars class from ActiveSupport (yes, it is a gem that
is part of Rails, but it provides a great deal of UTF-8 processing).
 
johan556

> First make sure you set the KCODE

Using KCODE does not change anything. I have tried:

$ ruby -Ke rm-files.rb
$ ruby -Ks rm-files.rb
$ ruby -Ku rm-files.rb
$ ruby -Ka rm-files.rb
$ ruby -Kn rm-files.rb

The problematic files are stored with a name that is a 16-bit
character string in NTFS (what I called Unicode in my earlier mail,
perhaps one should call it "almost UTF-16" or UCS-2, I don't know the
finer details). Anyway, I don't think setting KCODE solves my problem.

> Try using the chars class from ActiveSupport (yes it is a gem that is
> part of Rails but it provides a great deal of utf-8 processing)

See above. I don't think NTFS stores Unicode filenames in UTF-8.

My assumption when I started looking at this problem was that a
filename I got from one function (Dir.entries) would be directly
usable in another function (File.unlink). That was quite naive, I
realize :)

But it is still a real problem. As it is now, FileUtils.rm_r does not
work on an arbitrary file tree. As soon as the tree contains a file with
a "wrong" filename, it fails. Maybe this is just a consequence of the way
Ruby is ported to Windows.

/Johan Holmberg
 
John Joyce

Translation between UTF-16 and UTF-8 shouldn't be a problem.
Check out unicode.org for more on this than you really want to know, or
there is a nice blog article at joelonsoftware.
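
To illustrate the conversion itself, here is a minimal sketch. It assumes Ruby 1.9 or later, where String#encode is available (it was not in the Ruby 1.8 builds this thread is about), and the helper names `to_utf16le` and `to_utf8` are made up for the example. It only shows that the two representations convert losslessly; it says nothing about which representation Ruby hands to Windows when it opens or deletes a file.

```ruby
# Hypothetical helpers built on String#encode (Ruby 1.9+).
def to_utf16le(name)
  name.encode("UTF-16LE")
end

def to_utf8(wide_name)
  wide_name.encode("UTF-8")
end

# A Japanese file name, written with \u escapes so the source file
# encoding does not matter.
name = "\u65E5\u672C\u8A9E.txt"
wide = to_utf16le(name)

puts "UTF-8 bytes:    #{name.bytesize}"
puts "UTF-16LE bytes: #{wide.bytesize}"
puts "round trip ok:  #{to_utf8(wide) == name}"
```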

Some file utilities are specifically non-Windows, which may be part of
the problem you are having.
Many of those file utilities are Ruby versions of utilities
found on *nix systems. Sorry about that.
Much of that is documented in the second half of the pickaxe book (v.2).
(Sorry again, I'm not saying RTFM, just that it is noted there.)

The win32utils will hopefully do the job. Let us know what works!
This kind of problem is common for lots of people.
 
Paul Battley

Hi,

> The problematic files are stored with a name that is a 16-bit
> character string in NTFS (what I called Unicode in my earlier mail,
> perhaps one should call it "almost UTF-16" or UCS-2, I don't know the
> finer details). Anyway, I don't think setting KCODE solves my problem.

I haven't used Windows for a long while, but unless something has
changed in the newest releases, Ruby uses the Windows legacy code page
for interacting with the system, which is by default Windows-1252 on
English systems, Shift_JIS on Japanese systems, etc.

Internally, Windows is all Unicode, as is NTFS (I think it's UTF-16,
but that's not really important for this discussion), but applications
using legacy code pages can't communicate strings outside that code
page to the OS.

That means that if you set the legacy code page to Shift_JIS, you can
read and write Japanese file names, but not Arabic ones. If you set it
to Windows-1252, you can use acute accents, but can't touch Japanese
files.
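
The limitation described above can be sketched in Ruby itself. This is only an illustration, assuming Ruby 1.9 or later for String#encode; the helper name `representable?` is hypothetical, and the check models the code page restriction rather than calling any Windows API.

```ruby
# Hypothetical helper: can this file name be expressed in the given
# legacy code page at all? If not, an application limited to that code
# page has no way to pass the name to the OS.
def representable?(name, code_page)
  name.encode(code_page)
  true
rescue EncodingError
  # Raised (as Encoding::UndefinedConversionError) when a character
  # has no equivalent in the target encoding.
  false
end

names = {
  "Japanese" => "\u65E5\u672C\u8A9E.txt",
  "Arabic"   => "\u0645\u0644\u0641.txt",
  "accented" => "caf\u00E9.txt"
}

["Shift_JIS", "Windows-1252"].each do |code_page|
  names.each do |label, name|
    puts "#{code_page}: #{label} name representable? #{representable?(name, code_page)}"
  end
end
```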

I am led to believe that there is a UTF-8 code page in Windows, and it
is possible to set the legacy code page on an
application-by-application basis, at least on XP (though you might
need a separate Power Toy or similar to do it). If you can get that to
work, it might be possible to manipulate files via the UTF-8
representation of their name. I've never seen it done, though, so this
is entirely hypothetical.

Paul.
 
