The future of the character-encodings library

Nikolai Weibull

Hi!

As some of you know, the character-encodings library is a bit stale.
It currently can’t be used from Ruby 1.9 (you may ask yourself why you
would, I suppose) because of the Encoding namespace being taken, there
have been some compilation problems where gcc on Cygwin/MinGW doesn’t
support the visibility attribute, and the tests depend on an ancient
version of RSpec. I am in the process of fixing these wrongs, but I
need your help.

The big problem for me is figuring out how to namespace it. But
before anyone tries to come up with a solution, let me describe my
vision of this library’s future.

Character-encodings will be a library that allows you to deal with
UTF-8-encoded Strings in Ruby 1.8 and with collation, normalization,
Unicode-table lookup and other Unicode-specific tasks in Ruby 1.[89].
My original vision was that this library would support many more
encodings, but the internet has spoken and UTF-8 is the future. (I
also had a hope that Ruby programmers were going to begin namespacing
their projects a bit better, but Ruby programmers prefer libraries
called “Hpricot” over libraries called “Parsers::HTML”.) Ruby 1.9 adds
support for a range of encodings that I’m not at all interested in and
I think that this library needs to be more focused to have any sort of
future.

Therefore, I would like to rename the library and its namespaces to
reflect this change. The apt name “Unicode” is, sadly, already taken.
I was thinking of “Runicode”, but that’s perhaps a bit lame.

A second question is one of API design. How should you, from Ruby
1.8, be able to create a UTF-8-aware String? Currently you write
either u"äbc" or +"äbc". I don’t like this style anymore. I don’t
want to pollute Kernel or String unnecessarily. I would like to be
able to provide an API that would allow you to run the same .rb file
in both 1.8 and 1.9 and get the same results. This is, perhaps, not
possible, given that 1.9 uses a dizzying array of methods to determine
the encoding of a String. One could, of course, make Kernel#u a no-op
for 1.9. Could any of the users of this library please provide me
with some input on this point.
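
A minimal sketch of the near-no-op idea, for illustration only. The module name `Runicode` and the method `u` are hypothetical here and are not the library's actual API. It assumes that on 1.9 it is enough to tag the string as UTF-8, and it leaves the 1.8 branch as a placeholder where the library's own UTF-8-aware String would be created. Putting the helper in a module rather than in Kernel also avoids the namespace pollution mentioned above.

```ruby
# -*- coding: utf-8 -*-

# Hypothetical module and method names; not the library's actual API.
module Runicode
  # On 1.9+ a String already carries its encoding, so only tag a copy of
  # it as UTF-8. On 1.8 return the string as-is; a real implementation
  # would wrap it in the library's UTF-8-aware String class here.
  def self.u(str)
    if str.respond_to?(:force_encoding)
      str.dup.force_encoding("UTF-8")
    else
      str
    end
  end
end

s = Runicode.u("\xC3\xA4bc")
puts s.length  # 3 on 1.9 and later, where length counts characters
```

The same file loads on both versions because the encoding-specific call is only made when String responds to it.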

I’m looking forward to receiving your input!
 
Nikolai Weibull

Eric, could you please reply to all in the future? I have “skip” set
for this mailing list as, as you point out below, it’s rather high in
noise. It makes it rather hard to stitch things together when I can’t
easily reply to your reply.

> There don't appear to be many users of character-encodings:
>
> https://rubygems.org/gems/character-encodings

I don’t see how this is relevant, but thank you for pointing out my
failure in selling and maintaining my library.
 
Eric Hodel

> Eric, could you please reply to all in the future?

No. I don't know two of the email addresses in your To header so I
can't judge if my response is topical for them.

The third appears to be a mailing list to which I am not subscribed. I
don't wish to fend off possible "you must subscribe" bounces.

> I have “skip” set for this mailing list

I don't know what this means.

I think it means that you don't want to see messages from this mailing
list. If this is true why did you post to it?

> as, as you point out below, it’s rather high in noise.

I don't see where I made this assertion.

> It makes it rather hard to stitch things together when I can’t
> easily reply to your reply.

I don't see why I should be inconvenienced to make it easier for you to
see responses you do not want to see.

> I don’t see how this is relevant, but thank you for pointing out my
> failure in selling and maintaining my library.

I was attempting to suggest that since there aren't many downloads for
your gem maybe there's no need for you to continue to maintain it in its
current form (if at all).

Some of the functionality of your gem has been taken up by ruby 1.9.
Anyone seriously considering handling encodings other than US ASCII
should move to 1.9. I would rebuild character-encodings atop 1.9 if I
were the maintainer and had such a need.

Due to the low number of downloads you have an excellent opportunity to
throw out your existing API and rebuild your library to integrate well
with the encoding features of ruby 1.9.

I don't see why you would consider a low number of downloads to be any
failure on your part. I simply made a statement of fact. I have many,
many gems that nobody uses and I no longer maintain. It would be
ridiculous for me to attempt to attach any judgements to such a fact.
 
Nikolai Weibull

> No. I don't know two of the email addresses in your To header so I
> can't judge if my response is topical for them.

But I do and I made the judgment call for you.

> The third appears to be a mailing list to which I am not subscribed.
> I don't wish to fend off possible "you must subscribe" bounces.

That is a valid point. I should have cross-posted my request for help
instead.

> I think it means that you don't want to see messages from this mailing
> list.

Correct.

> If this is true why did you post to it?

Because I wanted this to reach as many (interested) people as
possible. If I’m going to make a big change here I want as many to
know about it as possible.

I know that people have used the library in the past, especially in
back-ends, which makes it a lot harder to know how many users I
actually have. I have, believe it or not, even been paid (minute
amounts) to work on this library. I figured that perhaps there were
some hidden users that I didn’t know about that were still using it
and I therefore posted to the most public Ruby forum that I know of.

> I don't see where I made this assertion.

You implicitly made (I thought at the time, see below) it by saying
that the library in question doesn’t have that many users and, as
such, my posting wasn’t relevant to the majority of the readers of
this list. This low level of relevancy is something that I have
judged to be the case for many topics on this list.

> > It makes it rather hard to stitch things together when I can’t
> > easily reply to your reply.
>
> I don't see why I should be inconvenienced to make it easier for you to
> see responses you do not want to see.

The inconvenience that you would have to endure by pressing Reply to
all and removing the char-encodings list from the Cc list must surely
not be as great as that which you have put me through by not including
me in the Cc list so that I would receive your response to the posting
that I made (that I, of course, do want).

Either way, this is a moot point, as I’ve now set noskip. (I was
hoping that either those that replied would include me or that the
mailing list software would be intelligent enough to not skip replies
to my postings. I was wrong.)

> I was attempting to suggest that since there aren't many downloads for
> your gem maybe there's no need for you to continue to maintain it in its
> current form (if at all).

Then, for my sake, please say so. A short – easily interpreted as
snide – remark like that can easily be misinterpreted.

> Some of the functionality of your gem has been taken up by ruby 1.9.
> Anyone seriously considering handling encodings other than US ASCII
> should move to 1.9. I would rebuild character-encodings atop 1.9 if I
> were the maintainer and had such a need.

To what need are you referring?

The whole point of the library was to provide UTF-8 support for 1.8.

I now want to shift focus to both providing support for UTF-8 for
those of us stuck with 1.8 (due to 1.9’s horrendous I/O and require
performance on Windows) and as an extension to 1.9’s built-in Unicode
support.

Looking at 1.9 it is now (because it sure wasn’t in 2006 when I began
developing this library) clear that Ruby won’t be supporting a lot of
features that would be desirable. You can, for example, not easily
perform collation, normalization, or character-class lookup. Even
such a thing as String#upcase doesn’t seem to be able to do the right
thing. I might be doing something wrong, but

# -*- coding: utf-8 -*-

puts "äbc".upcase

prints “äBC”, not “ÄBC”.
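
To illustrate what “the right thing” would involve, here is a rough sketch that upcases by code point instead of by byte. The method name `naive_upcase` is hypothetical and this is not how the library does it. It only covers ASCII and the Latin-1 Supplement letters, where the uppercase form happens to sit 0x20 below the lowercase one; a real implementation would consult the Unicode case-mapping tables.

```ruby
# -*- coding: utf-8 -*-

# Illustrative only: handles a-z and U+00E0..U+00FE (skipping the
# division sign U+00F7). Everything else is passed through unchanged.
def naive_upcase(str)
  codepoints = str.unpack("U*").map do |cp|
    if (0x61..0x7A).include?(cp) || ((0xE0..0xFE).include?(cp) && cp != 0xF7)
      cp - 0x20
    else
      cp
    end
  end
  codepoints.pack("U*")
end

puts naive_upcase("äbc")  # ÄBC
```

String#unpack and Array#pack with the "U*" directive work on UTF-8 code points in both 1.8 and 1.9, so the sketch does not depend on 1.9's encoding support.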
> Due to the low number of downloads you have an excellent opportunity to
> throw out your existing API and rebuild your library to integrate well
> with the encoding features of ruby 1.9.

I know that this type of behavior is popular in the Ruby community,
but I wanted to give my, albeit few, users a chance to have their say
on this matter.

> I don't see why you would consider a low number of downloads to be any
> failure on your part. I simply made a statement of fact.

There are many statements of fact that one can make that are often
best not made.

As I already noted above, you need to contextualize such a statement
so that it’s not open for interpretation. Had you written something
along the lines of “Judging from the statistics at rubygems.org,
perhaps you can get away with your proposed changes without too many
users becoming upset?”, I would have known what you were trying to get
at. As you wrote it, it only stands as a pointless remark.

> I have many, many gems that nobody uses and I no longer maintain.

I actually wish to continue maintaining this library and I actually do
have active users.

> It would be ridiculous for me to attempt to attach any judgements to
> such a fact.

Am I ridiculous for not sharing your level of detachment from your work?

I don’t know if you actually looked at the source code, but it’s
actually quite a few lines of (sometimes rather complex) code, and for
me to throw it away without at least considering its future utility is
not something that I could easily do.
 
