Possible bug, or just confusing documentation of class Iconv in the Ruby standard library.

tirkal

Hello

The documentation of the Iconv class (in the stdlib iconv package; rdoc at
http://www.ruby-doc.org/stdlib/libdoc/iconv/rdoc/classes/Iconv.html)
leads one to believe that Iconv.conv is equivalent to Iconv.iconv and
Iconv#iconv. As I'm going to demonstrate, this is not the case.
Furthermore, the behavior of both Iconv.iconv and Iconv#iconv seems
strange and surprising, and quite possibly buggy.

Here is an IRB demonstration of this (ruby 1.8.2 (2005-04-11) [i386-linux]):

irb(main):007:0> Iconv.conv('utf-8', 'windows-1255', "\xe0")
"\327\220" # appropriate output for the given input
irb(main):008:0> Iconv.conv('utf-8', 'windows-1255', "\xe0\xe1")
"\327\220\327\221" # appropriate output for the given input
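
As a cross-check of what the correct output looks like, here is a minimal sketch using String#encode. It assumes Ruby 1.9 or later, so it does not apply to the 1.8.2 setup above (Iconv itself was later removed from the standard library). String#encode(dst, src) reads the receiver's bytes as src, the same (to, from) order as Iconv.conv; the helper name cp1255_to_utf8 is made up for illustration.

```ruby
# Sketch for Ruby 1.9+ (not Iconv): String#encode(dst, src) treats the
# receiver's bytes as `src` and transcodes them to `dst` in one complete call.
# cp1255_to_utf8 is a hypothetical helper name.
def cp1255_to_utf8(bytes)
  bytes.encode('UTF-8', 'Windows-1255')
end

p cp1255_to_utf8("\xe0").bytes      # => [215, 144], i.e. "\327\220"
p cp1255_to_utf8("\xe0\xe1").bytes  # => [215, 144, 215, 145]
```

Because the whole string is converted in one complete call, no character is left pending between calls.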

irb(main):009:0> Iconv.iconv('utf-8', 'windows-1255', "\xe0")
["", "\327\220"] # strange output. Why an array? Why the empty-string first element?
irb(main):010:0> Iconv.iconv('utf-8', 'windows-1255', "\xe0\xe1")
["\327\220", "\327\221"] # again, why an array? And why split the string?
irb(main):011:0> Iconv.iconv('utf-8', 'windows-1255', "\xe0\xe1\xe2")
["\327\220\327\221", "\327\222"]
irb(main):012:0> Iconv.iconv('utf-8', 'windows-1255', "\xe0\xe1\xe2\xe3")
["\327\220\327\221\327\222", "\327\223"]

irb(main):016:0> Iconv.new('utf-8', 'windows-1255').iconv("\xe0")
"" # last character of the string dropped
irb(main):017:0> Iconv.new('utf-8', 'windows-1255').iconv("\xe0\xe1")
"\327\220"
irb(main):018:0> Iconv.new('utf-8', 'windows-1255').iconv("\xe0\xe1\xe2")
"\327\220\327\221"
irb(main):019:0> Iconv.new('utf-8', 'windows-1255').iconv("\xe0\xe1\xe2\xe3")
"\327\220\327\221\327\222"

Adding a newline character at the end of the input text solves the
problem with Iconv#iconv, and partly solves the problem with
Iconv.iconv:

irb(main):020:0> Iconv.iconv('utf-8', 'windows-1255', "\xe0\xe1\xe2\xe3" + "\n")
["\327\220\327\221\327\222\327\223\n"] # still an array, but at least it contains the correct, non-split conversion.
irb(main):021:0> Iconv.new('utf-8', 'windows-1255').iconv("\xe0\xe1\xe2\xe3" + "\n")
"\327\220\327\221\327\222\327\223\n" # correct output, including the last character.
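
The three observations above (the split arrays from Iconv.iconv, the missing last character from Iconv#iconv, and the effect of a trailing newline) all fit one reading: the windows-1255 converter holds back the most recent Hebrew letter, presumably because a following combining mark could still modify it (an assumption, not checked against the converter's source), and releases it only when another character arrives or the converter is flushed. The sketch below is a hypothetical pure-Ruby model of that reading, not Iconv's real implementation; HoldBackConverter, model_iconv, model_conv and convert_and_flush are names invented for illustration, and the model only knows ASCII plus the letters 0xE0..0xFA. It treats Iconv.iconv as "convert each string, then flush and append the remainder only if it is non-empty", which reproduces every array shown above.

```ruby
# Hypothetical model, NOT Iconv's implementation: a Windows-1255 -> UTF-8
# converter that holds back the most recent Hebrew letter until the next
# byte arrives or the converter is flushed.
class HoldBackConverter
  def initialize
    @held = nil # pending letter (UTF-8 string) or nil
  end

  # Converts str and returns whatever can be emitted so far.
  # Passing nil flushes the pending letter, mimicking Iconv#iconv(nil).
  def iconv(str)
    out = String.new(encoding: Encoding::UTF_8)
    if str.nil?
      out << @held.to_s
      @held = nil
      return out
    end
    str.each_byte do |b|
      out << @held.to_s # any new byte releases the pending letter
      @held = nil
      if b.between?(0xE0, 0xFA)
        # Hebrew letters alef..tav are held back (0xE0 -> U+05D0, ...)
        @held = (0x05D0 + b - 0xE0).chr(Encoding::UTF_8)
      elsif b < 0x80
        out << b.chr # ASCII, e.g. "\n", is emitted immediately
      else
        raise ArgumentError, format('byte 0x%02X is outside this model', b)
      end
    end
    out
  end
end

# Model of Iconv.iconv: one element per input string, then the flushed
# remainder appended only when it is non-empty.
def model_iconv(*strs)
  cd = HoldBackConverter.new
  result = strs.map { |s| cd.iconv(s) }
  tail = cd.iconv(nil)
  result << tail unless tail.empty?
  result
end

# Model of Iconv.conv: the same pieces joined into one string.
def model_conv(str)
  model_iconv(str).join
end

# Model of using Iconv#iconv safely: convert, then flush with nil.
def convert_and_flush(cd, str)
  cd.iconv(str) + cd.iconv(nil)
end

p model_iconv("\xe0").map(&:bytes)                           # [[], [215, 144]]
p HoldBackConverter.new.iconv("\xe0\xe1").bytes              # [215, 144] (bet still pending)
p convert_and_flush(HoldBackConverter.new, "\xe0\xe1").bytes # [215, 144, 215, 145]
```

If the reading is right, the real-class counterpart of convert_and_flush on 1.8 would be `cd.iconv(str) << cd.iconv(nil)`, since the rdoc describes iconv(nil) as putting the converter back into its initial shift state and returning the bytes needed for that; this has not been verified here on 1.8.2.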

Hope that helps,
Tirkal
 
