Nikolai Weibull
Hi!
As some of you know, the character-encodings library is a bit stale.
It currently can’t be used from Ruby 1.9 (you may ask yourself why you
would, I suppose) because the Encoding namespace is taken; there
have been some compilation problems where gcc on Cygwin/MinGW doesn’t
support the visibility attribute; and the tests depend on an ancient
version of RSpec. I am in the process of fixing these wrongs, but I
need your help.
The big problem for me is figuring out how to namespace it. But
before anyone tries to come up with a solution, let me describe my
vision of this library’s future.
Character-encodings will be a library that allows you to deal with
UTF-8-encoded Strings in Ruby 1.8 and with collation, normalization,
Unicode-table lookup and other Unicode-specific tasks in Ruby 1.[89].
My original vision was that this library would support many more
encodings, but the internet has spoken and UTF-8 is the future. (I
also had a hope that Ruby programmers were going to begin namespacing
their projects a bit better, but Ruby programmers prefer libraries
called “Hpricot” over libraries called “Parsers::HTML”.) Ruby 1.9 adds
support for a range of encodings that I’m not at all interested in and
I think that this library needs to be more focused to have any sort of
future.
Therefore, I would like to rename the library and its namespaces to
reflect this change. The apt name “Unicode” is, sadly, already taken.
I was thinking of “Runicode”, but that’s perhaps a bit lame.
A second question is one of API design. How should you, from Ruby
1.8, be able to create a UTF-8-aware String? Currently you write
either u"äbc" or +"äbc". I don’t like this style anymore. I don’t
want to pollute Kernel or String unnecessarily. I would like to be
able to provide an API that would allow you to run the same .rb file
in both 1.8 and 1.9 and get the same results. This is, perhaps, not
possible, given that 1.9 uses a dizzying array of methods to determine
the encoding of a String. One could, of course, make Kernel#u a no-op
for 1.9. Could any of the users of this library please provide me
with some input on this point?
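To make the question concrete, here is a rough sketch of one possible
shape, not a proposal I am committed to. It assumes a hypothetical
module (UTF8Sketch) with a hypothetical method (wrap); neither exists
in character-encodings today. On 1.9 it only tags the String as UTF-8,
which is close to the no-op mentioned above, and on 1.8 it is a
placeholder where a UTF-8-aware object would be returned.

```ruby
# -*- coding: utf-8 -*-
# Hypothetical sketch: UTF8Sketch and UTF8Sketch.wrap are illustrative
# names, not part of the character-encodings library.
module UTF8Sketch
  if "".respond_to?(:encoding)
    # Ruby 1.9 and later: Strings carry their own encoding, so it is
    # enough to tag a copy of the bytes as UTF-8.
    def self.wrap(string)
      string.dup.force_encoding("UTF-8")
    end
  else
    # Ruby 1.8: Strings are plain byte sequences. A real implementation
    # would return a UTF-8-aware object here; this placeholder simply
    # returns the String unchanged.
    def self.wrap(string)
      string
    end
  end
end

# The same line can appear in a file run under either version.
s = UTF8Sketch.wrap("\xC3\xA4bc")
```

Nothing is added to Kernel or String this way, at the cost of a
longer call than u"äbc".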
I’m looking forward to receiving your input!