Iconv weirdness on Windows XP

Wilson Bilkovich

Is anyone else having this problem?
This is a One-Click install of Ruby, with the iconv package installed.
charset.dll and iconv.dll are in c:\windows\system32
iconv.so is in the appropriate place deep within the Ruby folder hierarchy.

Here's a sample IRB session (sorry about the wrapping):
Microsoft Windows XP [Version 5.1.2600] (C) Copyright 1985-2001 Microsoft Corp.

irb(main):001:0> require 'iconv'
=> true
irb(main):002:0> Iconv.iconv('utf-8', 'X-UNKNOWN', 'Hello, world')
Errno::ENOENT: No such file or directory - iconv("utf-8", "X-UNKNOWN")
        from (irb):2:in `iconv'
        from (irb):2
irb(main):003:0> Iconv.iconv('utf-8', 'X-UNKNOWN', 'Hello, world')
(irb):3: [BUG] rb_sys_fail(iconv("utf-8", "X-UNKNOWN")) - errno == 0
ruby 1.8.2 (2004-12-25) [i386-mswin32]

This application has requested the Runtime to terminate it in an
unusual way. Please contact the application's support team for more
information.

C:\Bin>

I first ran into this when running part of the TMail test suite, but I
can now easily duplicate it in IRB.

Thanks,
--Wilson.
 
Dave Burt

Wilson Bilkovich asked:
Is anyone else having this problem?
This is a One-Click install of Ruby, with the iconv package installed.
charset.dll and iconv.dll are in c:\windows\system32
iconv.so is in the appropriate place deep within the Ruby folder
hierarchy.

Here's a sample IRB session (sorry about the wrapping):
Microsoft Windows XP [Version 5.1.2600] (C) Copyright 1985-2001 Microsoft

irb(main):001:0> require 'iconv'
=> true
irb(main):002:0> Iconv.iconv('utf-8', 'X-UNKNOWN', 'Hello, world')
Errno::ENOENT: No such file or directory - iconv("utf-8", "X-UNKNOWN")
        from (irb):2:in `iconv'
        from (irb):2

The first problem here is ENOENT. That's C's "not found", presumably
referring to the character set "X-UNKNOWN" (not any "file or directory").
I'm told [1], this can occur because the config.charset aliases are not
available. But, actually, I can't find x-unknown in the aliases file, and
don't know enough about iconv to know if it should be handling it as an
intrinsic type.

For x-unknown, maybe try substituting "char" or the empty string ""; this
means the locale-dependent default encoding (so isn't the same as
x-unknown).

irb(main):003:0> Iconv.iconv('utf-8', 'X-UNKNOWN', 'Hello, world')
(irb):3: [BUG] rb_sys_fail(iconv("utf-8", "X-UNKNOWN")) - errno == 0
ruby 1.8.2 (2004-12-25) [i386-mswin32]

This application has requested the Runtime to terminate it in an
unusual way. Please contact the application's support team for more
information.

C:\Bin>

I'm kinda guessing, but it looks like this is a problem with the Iconv
library failing to set errno for the second failure.
I first ran into this when running part of the TMail test suite, but I
can now easily duplicate it in IRB.

Thanks for putting the effort in; it makes responding easy.

Cheers,
Dave

[1] Nobu Nakada,
http://www.ruby-talk.org/cgi-bin/scat.rb/ruby/ruby-talk/158960
 
Wilson Bilkovich

The first problem here is ENOENT. That's C's "not found", presumably
referring to the character set "X-UNKNOWN" (not any "file or directory").
I'm told [1], this can occur because the config.charset aliases are not
available. But, actually, I can't find x-unknown in the aliases file, and
don't know enough about iconv to know if it should be handling it as an
intrinsic type.

For x-unknown, maybe try substituting "char" or the empty string ""; this
means the locale-dependent default encoding (so isn't the same as
x-unknown).

When I switch 'X-UNKNOWN' to 'char' in the test fixture, everything works fine.
Specifically, the code that freaks out is:
def convert_to(text, to, from)
  return text unless to && from
  text ? Iconv.iconv(to, from, text).first : ""
rescue Iconv::IllegalSequence, Errno::EINVAL
  # the 'from' parameter specifies a charset other than what the text
  # actually is...not much we can do in this case but just return the
  # unconverted text.
  #
  # Ditto if either parameter represents an unknown charset, like
  # X-UNKNOWN.
  text
end

Given that, it looks like you're right about the library not setting
the proper error environment, because the method that invokes it is
expecting it to throw an error.
Oddly, though, ENOENT isn't on the list of exceptions to rescue there.
Adding it to the list doesn't stop the crash bug, unfortunately.

I tried putting config.charset in both c:/windows/system32, and in
c:/ruby/lib/ruby/1.8/i386-mswin32 (where iconv.so goes), to no avail.
 
nobuyoshi nakada

Hi,

At Wed, 14 Dec 2005 00:17:39 +0900,
Dave Burt wrote in [ruby-talk:170489]:
irb(main):003:0> Iconv.iconv('utf-8', 'X-UNKNOWN', 'Hello, world')
(irb):3: [BUG] rb_sys_fail(iconv("utf-8", "X-UNKNOWN")) - errno == 0
ruby 1.8.2 (2004-12-25) [i386-mswin32]

This application has requested the Runtime to terminate it in an
unusual way. Please contact the application's support team for more
information.

C:\Bin>

I'm kinda guessing, but it looks like this is a problem with the Iconv
library failing to set errno for the second failure.

This failure often occurs if msvcrt DLL versions mismatch.
e.g., ruby and iconv.so use msvcr71.dll whereas iconv.dll uses
msvcrt.dll.
 
Dave Burt

Wilson Bilkovich wrote:
When I switch 'X-UNKNOWN' to 'char' in the test fixture, everything works
fine.
Cool.

Specifically, the code that freaks out is:
def convert_to(text, to, from)
...
rescue Iconv::IllegalSequence, Errno::EINVAL
...
Given that, it looks like you're right about the library not setting
the proper error environment, because the method that invokes it is
expecting it to throw an error.
Oddly, though, ENOENT isn't on the list of exceptions to rescue there.
Adding it to the list doesn't stop the crash bug, unfortunately.

It shouldn't be Iconv::IllegalSequence - that's a "bad character" in the
"from" string. There's an exception defined in the library,
Iconv::InvalidEncoding - that makes sense, but apparently isn't used here.
The solution may be to use that instead of Errno.

I haven't seen Errno::EINVAL thrown by iconv, although that's what
ruby-mswin32-1.8.2 does in this situation. And it doesn't crash.

So you've uncovered a bug in my iconv package for the One-Click Installer.
My best guess is that it's an incompatibility problem between One-Click's
ruby core and iconv.so from the ruby-mswin32 package.
I tried putting config.charset in both c:/windows/system32, and in
c:/ruby/lib/ruby/1.8/i386-mswin32 (where iconv.so goes), to no avail.

Nobu's post seemed to indicate the aliases file wasn't going to work on
Windows. It may need an option compiled in to the binary or something. I
don't know.

Cheers,
Dave
 
Dave Burt

I wrote just before:
My best guess is that it's an incompatibility problem between One-Click's
ruby core and iconv.so from the ruby-mswin32 package.

OK, so Nobu's confirmed this. (Thanks, Nobu.)

Meanwhile:
It shouldn't be Iconv::IllegalSequence - that's a "bad character" in the
"from" string. There's an exception defined in the library,
Iconv::InvalidEncoding - that makes sense, but apparently isn't used here.
The solution may be to use that instead of Errno.

I haven't seen Errno::EINVAL thrown by iconv, although that's what
ruby-mswin32-1.8.2 does in this situation. And it doesn't crash.

ruby-mswin32 1.8.4 preview 1 actually throws InvalidEncoding in this case,
and this version of iconv.so doesn't demonstrate this crashing bug with
One-Click Installer 1.8.2-15 (which I assume we're all using).

Should I update my Iconv for One-Click package to use this new version? I'm
concerned about potential incompatibility issues between it and Ruby 1.8.2.

In any case, you can get this version of iconv.so from this zip:
ftp://ftp.ruby-lang.org/pub/ruby/binaries/mswin32/ruby-1.8.4-preview1-i386-mswin32.zip

So this test fixture is still going to fail. It will need to be updated like
so:
-rescue Iconv::IllegalSequence, Errno::EINVAL
+rescue Iconv::IllegalSequence, Iconv::InvalidEncoding
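
For anyone reading this on Ruby 2.0 or later, where Iconv is no longer in the standard library, the same "return the text unconverted" guard can be sketched with String#encode. This is a swapped-in technique rather than the TMail code: the exception classes are the Encoding ones instead of the Iconv ones, and the CONVERSION_ERRORS constant is a name made up for this example.

```ruby
# Sketch only: String#encode stands in for Iconv.iconv here.
# CONVERSION_ERRORS is a hypothetical name, not part of TMail or Ruby.
CONVERSION_ERRORS = [
  Encoding::ConverterNotFoundError,    # unknown charset name, e.g. X-UNKNOWN
  Encoding::InvalidByteSequenceError,  # bytes are not valid in the 'from' charset
  Encoding::UndefinedConversionError   # character has no mapping in the 'to' charset
].freeze

def convert_to(text, to, from)
  return text unless to && from
  return "" unless text
  text.encode(to, from)
rescue *CONVERSION_ERRORS
  # Same policy as the original helper: hand back the unconverted text.
  text
end

# An unknown source charset no longer raises; the input comes back as-is.
puts convert_to("Hello, world", "utf-8", "X-UNKNOWN")
```

Listing the "unknown charset" error explicitly is the point of the change above: the rescue clause has to name whatever class the converter actually raises for a bad encoding name.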

Cheers,
Dave
 
Curt Hibbs

I wrote just before:

OK, so Nobu's confirmed this. (Thanks, Nobu.)

I'm currently working on the One-Click Installer for 1.8.4, and this
is one of the things I want to make sure I get right. Any help/advice
would be more than welcome (especially since I know next-to-nothing
about iconv).

Thanks,
Curt
 
Wilson Bilkovich

I wrote just before:

OK, so Nobu's confirmed this. (Thanks, Nobu.)

Meanwhile:


ruby-mswin32 1.8.4 preview 1 actually throws InvalidEncoding in this case,
and this version of iconv.so doesn't demonstrate this crashing bug with
One-Click Installer 1.8.2-15 (which I assume we're all using).

Should I update my Iconv for One-Click package to use this new version? I'm
concerned about potential incompatibility issues between it and Ruby 1.8.2.

In any case, you can get this version of iconv.so from this zip:
ftp://ftp.ruby-lang.org/pub/ruby/binaries/mswin32/ruby-1.8.4-preview1-i386-mswin32.zip

So this test fixture is still going to fail. It will need to be updated like
so:
-rescue Iconv::IllegalSequence, Errno::EINVAL
+rescue Iconv::IllegalSequence, Iconv::InvalidEncoding
Cool. Thank you for this. I replaced my iconv.so file with the one
from the zip file above, and as you predicted, things now work this
way:
C:\ruby\lib\ruby\gems\1.8\gems\actionmailer-1.1.5\test>ruby mail_service_test.rb
Loaded suite mail_service_test
Started
.......F......................................
Finished in 0.453 seconds.

1) Failure:
test_decode_message_with_unknown_charset(ActionMailerTest)
[mail_service_test.rb:718]:
Exception raised:
Class: <Iconv::InvalidEncoding>
Message: <"invalid encoding (\"utf-8\", \"X-UNKNOWN\")">
---Backtrace---
/../lib/action_mailer/vendor/tmail/quoting.rb:82:in `iconv'
/../lib/action_mailer/vendor/tmail/quoting.rb:82:in `convert_to'
/../lib/action_mailer/vendor/tmail/quoting.rb:17:in `unquoted_body'
/../lib/action_mailer/vendor/tmail/quoting.rb:43:in `body'
mail_service_test.rb:718:in `test_decode_message_with_unknown_charset'
mail_service_test.rb:718:in `assert_nothing_raised'
mail_service_test.rb:718:in `test_decode_message_with_unknown_charset'
---------------

46 tests, 137 assertions, 1 failures, 0 errors

Time for a change to that 'rescue' clause, I'd say.

Thanks,
--Wilson.
 
Wilson Bilkovich

I'm currently working on the One-Click Installer for 1.8.4, and this
is one of the things I want to make sure I get right. Any help/advice
would be more than welcome (especially since I know next-to-nothing
about iconv).
It's working fine for me now with the 1.8.4 object file, and I'm
running OneClick1.8.2-15.
Are there any other Iconv test suites out there I can run, to make
sure it didn't mangle anything up?

Thanks,
--Wilson.
 
Dave Burt

Wilson said:
Cool. Thank you for this. I replaced my iconv.so file with the one
from the zip file above, and as you predicted, things now work this
way:
......F......................................
...
Time for a change to that 'rescue' clause, I'd say.

Nobu informs us that this replacement of Errno -> Exception is one of two
minor changes in iconv between 1.8.2 and 1.8.4. I've updated my zip to
include the super-versioned iconv.so.

I'm glad it's working for you now, and thanks for raising the issue and
helping solve it.

Cheers,
Dave
 
Dave Burt

Curt said:
I'm currently working on the One-Click Installer for 1.8.4, and this
is one of the things I want to make sure I get right. Any help/advice
would be more than welcome (especially since I know next-to-nothing
about iconv).

Well, the core of the problem seems to be a difference in compiler versions
between your One-Click binaries and the "borrowed" mswin32 binary. The
problem shows itself in this case when a C errno is used instead of a Ruby
Exception.

Nobu says in [ruby-core:06889] you should compile it from source as part of
the One-Click build:
IMHO, it would be no longer a good idea to assemble pre-built
binaries now, on Windows. I suspect packagers may have to
build all binaries from sources by themselves.

Wilson said:
It's working fine for me now with the 1.8.4 object file, and I'm
running OneClick1.8.2-15.
Are there any other Iconv test suites out there I can run, to make
sure it didn't mangle anything up?

I don't know of any, sorry.

Cheers,
Dave
 
Curt Hibbs

Thanks Dave, I already had you on my list of people to contact if/when
I run into trouble.

I'm planning to go through all the extensions included with One-Click
installer (many of which I pick up in binary form) to see how they
were compiled.

Curt

Curt said:
I'm currently working on the One-Click Installer for 1.8.4, and this
is one of the things I want to make sure I get right. Any help/advice
would be more than welcome (especially since I know next-to-nothing
about iconv).

Well, the core of the problem seems to be a difference in compiler versions
between your One-Click binaries and the "borrowed" mswin32 binary. The
problem shows itself in this case when a C errno is used instead of a Ruby
Exception.

Nobu says in [ruby-core:06889] you should compile it from source as part of
the One-Click build:
IMHO, it would be no longer a good idea to assemble pre-built
binaries now, on Windows. I suspect packagers may have to
build all binaries from sources by themselves.

Wilson said:
It's working fine for me now with the 1.8.4 object file, and I'm
running OneClick1.8.2-15.
Are there any other Iconv test suites out there I can run, to make
sure it didn't mangle anything up?

I don't know of any, sorry.

Cheers,
Dave
 
Wilson Bilkovich

Thanks Dave, I already had you on my list of people to contact if/when
I run into trouble.

I'm planning to go through all the extensions included with One-Click
installer (many of which I pick up in binary form) to see how they
were compiled.

Cool. This issue has bugged me, so I'm writing a test suite for Iconv,
in order to better understand it.
 
Wilson Bilkovich

Cool. This issue has bugged me, so I'm writing a test suite for Iconv,
in order to better understand it.
Hrm. The Win32 version of iconv seems to support far fewer encodings.
There are only 296 encodings that are valid, out of a total of 959
supported by iconv according to iconv -l.
That same test script yields 936 encodings on a SuSE Linux box.
Oddly, there are some encodings that work on Win32, but not on the
Linux system. This one is an example:
Iconv.new('ISO_646.IRV:1991','ISO_646.IRV:1991').iconv('')
Works in Windows, raises Errno::EINVAL in Linux.
On the other hand:
Iconv.new('500','500').iconv('')
raises Iconv::InvalidEncoding in Win32, works fine in Linux
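
A survey like this one can be sketched as a small helper that tries each name and sorts it by whether the trial conversion raised. The helper name partition_encodings and the sample name list are made up for illustration, and rescuing StandardError assumes that whatever a given iconv.so raises (Errno::EINVAL, Iconv::InvalidEncoding, ...) descends from it. The runnable probe uses String#encode as a stand-in, since Iconv is not in the standard library of current Rubies; the Iconv form from the post is shown in a comment.

```ruby
# Split encoding names into those the probe accepts and those it rejects.
# The block performs the trial conversion; any StandardError counts as a rejection.
def partition_encodings(names)
  names.partition do |name|
    begin
      yield name
      true
    rescue StandardError
      false
    end
  end
end

# With Iconv loaded, the probe from the post would be:
#   partition_encodings(names) { |n| Iconv.new(n, n).iconv('') }
# Stand-in probe using String#encode instead of Iconv:
valid, invalid = partition_encodings(%w[UTF-8 ISO-8859-1 X-UNKNOWN]) do |name|
  "a".encode("UTF-16LE", name)
end
puts "valid: #{valid.inspect}"
puts "invalid: #{invalid.inspect}"
```

Running the same name list through the helper on each platform and diffing the two "valid" lists would show exactly which names differ between builds.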

Where does the Win32 version get its list of encodings?

Fun stuff.
--Wilson.
 
