Investigating Ruby - key limitations ?

Gyoung-Yoon Noh · Nov 8, 2005

I've done a bit of work with Unicode and there are two primary
objections to the Unicode standard and representations by Japanese
developers and users (I am generalising to CJK below):

1. The politics. The Eastern scripts were treated rather badly initially
and were painfully under-represented in Unicode 1.0 and probably
Unicode 2.0. I don't think that CJK were adequately represented until
Unicode 3.0, to be honest.

2. The political impact on representation. The existing CJK encodings
were/are relatively efficient for storing CJK, although some forms
used shift-in/shift-out byte markers, increasing the total number of
symbols that could be represented in a relatively efficient storage
format, even if processing time was a little slower.

In contrast, UTF-8 and UTF-16 are relatively inefficient. IIRC, most
Japanese encodings will use between 1 and 2 bytes per glyph. UTF-8
will use between 3 and 4 bytes per glyph. UTF-16 uses either 2 or 4
bytes per glyph.

-austin
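
As a rough check on the byte counts quoted above, here is a minimal sketch that encodes one three-character Japanese word in several encodings and reports the sizes. It assumes a Ruby with `String#encode` (1.9 or later, so not the 1.8 interpreter this thread is about), and the sample word is just an illustration. For common kanji and kana it typically shows 2 bytes per character in Shift_JIS, EUC-JP and UTF-16, and 3 in UTF-8; the 4-byte forms only come up for characters outside the Basic Multilingual Plane.

```ruby
# Sample text: the three kanji of "nihongo" (Japanese language),
# written with \u escapes so the source file stays pure ASCII.
text = "\u65E5\u672C\u8A9E"

encodings = ["Shift_JIS", "EUC-JP", "UTF-8", "UTF-16LE"]

# Map each encoding name to the number of bytes the encoded text occupies.
sizes = encodings.map { |name| [name, text.encode(name).bytesize] }.to_h

sizes.each do |name, bytes|
  per_char = bytes.to_f / text.length
  puts format("%-10s %2d bytes (%.1f per character)", name, bytes, per_char)
end
```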

Good summarization. But I'd like to add some comments.

In most cases, the storage and text-processing inefficiency of Unicode
encodings is a minor issue compared to other crucial factors. Computers
are modern enough for that now.

As you also mentioned, what most CJK (especially Japanese) people
dislike first of all is the great stupid process of Han Unification[1]
by the Unicode Consortium. Another big obstacle is incompatibility with
legacy systems elsewhere. Indeed, ordinary users have no strong or
compelling reason to change their charset. But it can benefit some
developers by making 'I18N' more convenient.

Robert Klemme · Nov 8, 2005

Daniel said:
Ruby does have support for native threads in the form of Proc.new
However, it is recommended that you don't use them except in
situations where you're doing long running tasks in native code. This
is because Ruby's lightweight threads spawn *much* faster than the
native Proc threads.

Either this is a misunderstanding or I missed something fundamental. From
what I know

1. Proc.new has nothing to do with native threads.

2. Modern OS's can spawn new threads at a high rate, so IMHO the difference
between native thread creation and Ruby thread creation is too small to pay
off for the disadvantage of not using multiple CPUs. Here are some stats with
a current 1.4 Java VM:

max t11 - start time in thread: 32
avg t11 - start time in thread: 0.8901
max t2 - creation time : 32
avg t2 - creation time : 0.0187
max t3 - start time in main : 32
avg t3 - start time in main : 0.2452

t11 is the time it takes to reach the first line of code inside the new
thread.
t2 is the time it takes to create the thread instance (in the calling
thread)
t3 is the time it takes to create and start the thread (in the calling
thread)

Times are in milliseconds!
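
For anyone who wants comparable numbers for Ruby threads, a similar measurement can be sketched as below. This is not the Java program that produced the figures above; the sample count and the helper names (`now_ms`, `report`) are made up for the sketch. Note that Ruby's `Thread.new` creates and starts the thread in one step, so there is no separate "create only" figure like t2.

```ruby
# Monotonic clock reading in milliseconds (as a Float).
def now_ms
  Process.clock_gettime(Process::CLOCK_MONOTONIC, :float_millisecond)
end

SAMPLES = 200        # arbitrary sample count chosen for this sketch

create_times = []    # time spent inside Thread.new, seen from the caller
start_times  = []    # delay until the first line inside the new thread runs

SAMPLES.times do
  before  = now_ms
  thread  = Thread.new { now_ms }  # the block's value is the thread's own start time
  after   = now_ms
  started = thread.value           # waits for the thread and returns the block result
  create_times << (after - before)
  start_times  << (started - before)
end

# Print max and average for one series of measurements.
def report(label, values)
  puts format("max %-26s: %.4f", label, values.max)
  puts format("avg %-26s: %.4f", label, values.sum / values.size)
end

report("start time in thread", start_times)
report("create and start in main", create_times)
```

The absolute values depend heavily on the Ruby version, the OS and the machine, so they are only meaningful when compared on the same setup.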

Kind regards

robert

gregarican · Nov 8, 2005

Kev said:
FYI Komodo from activestate has recently added ruby support, the windows
beta version is available for download, I may look at it at some point

I am downloading the Windows trial version as I'm typing this. From
what I read it sounds like it would be a nice IDE. After years of
programming in different languages I still fall back to basic text
editors for most of my work. That's why the old IDE argument against
Ruby falls on deaf ears in my case. It's like a good literary author.
They could have a sharpened #2 pencil and a spiral notepad, an old
Smith Corona typewriter, an Apple Powerbook, or a secretary sitting
beside them taking dictation. The work ultimately will be the work. A
fancy IDE that performs autocompletions, suggestions, syntax
highlighting, etc. seems to be analogous to having all of these
writer's dictionaries, thesauri, etc. at their beck and call...

mortench · Nov 8, 2005

Thanks for all the replies. Since I can't possibly reply to all, I will
just reply here.

1. Since early Java VM's also used a simple interpreter I think a
comparison is perfectly ok despite some reservations in some replies.
It is correct that algorithms may be more important than speed of VM
but that is not the issue here. I understand that Ruby may be fast
enough for a lot of things but in order to be able to use it for
all/most things the overhead needs to go down. Java has developed a lot
and I find that carefully(!) crafted Java code can compare with C++ in
performance... Thanks for the tip about YARV, I will follow it (however
as I hinted in my blog, it may be a smarter decision of the Ruby
builders to base a new Ruby version on existing highly tuned VM's like
the Java Hotspot VM).

2. I am glad that multi-thread support is getting worked on. As I noted
in my blog there is of course also JRuby which has multi-thread support
because it uses the Java VM.

3. I think lack of first-class Unicode support, whatever the
reason, is a bad business decision. Because of that a lot of people
like me (with limited Ruby experience) will forever wonder if they will
run into unforeseen problems if they use Ruby.

4. I don't buy the argument often mentioned that some languages are so
compact/productive that they don't require IDE's. Better languages and
technologies raise the bar so I am sure that compact/productive
languages like Ruby will just mean that the requirements are raised
accordingly. Whatever the language we will end up with really big
projects anyway. Even for small projects, IDE's with background
compilation, completion, refactoring, debugging, profiling,
code-coverage, help ... are of great benefit. It may be right that
novice developers may be harmed by an IDE because a few uninformed ones
become clueless about the way things work, but this is not a serious
issue. Emacs may be useful for hackers and occasional programmers but I
can't understand why an EXPERIENCED developer/architect would want to
develop without a productivity-amplifying IDE.

- Morten Christensen

Austin Ziegler · Nov 8, 2005

Thanks for all the replies. Since I can't possibly reply to all, I
will just reply here.

1. Since early Java VM's also used a simple interpreter

This is not true. Java has *always* been statically compiled to
bytecode. Hotspot VMs are simply refinements to the existing bytecode
interpreter. It has never, however, been "simple." The rest of your
paragraph about the Java VM is a bit nonsensical because of this error.

Thanks for the tip about YARV, I will follow it (however
as I hinted in my blog, it may be a smarter decision of the Ruby
builders to base a new Ruby version on existing highly tuned VM's like
the Java Hotspot VM).

Except that the Java Hotspot VM is *not* highly tuned for anything that
doesn't work *exactly* *like* *Java*. A better target will be the .NET
3.0 platform which is getting dynamic language tuning. However, no
existing VM supports the features that Ruby requires, which is exactly
why YARV/Rite is being done. Before you make grand pronouncements, I
strongly recommend that you do your research, first.

2. I am glad that multi-thread support is getting worked on. As I
noted in my blog there is of course also JRuby which has multi-thread
support because it uses the Java VM.

Hopefully, though, it will not eliminate the green threads that Ruby has,
which are sufficient for most programs.

3. I think lack of first-class Unicode support, whatever the
reason, is a bad business decision. Because of that a lot of people
like me (with limited Ruby experience) will forever wonder if they
will run into unforeseen problems if they use Ruby.

This is simply your not understanding Unicode. There's no such thing as
"first-class Unicode" support. If you want UTF-8 strings, then say so.
(You'll be getting them, after a fashion, in Ruby 2.0, because Matz is
adding encoding-aware M17N strings.) But UTF-8 strings aren't
necessarily Unicode -- it's a string that has been encoded in a Unicode
format. "Unicode strings" to the ICU library are UTF-16 encoded. What
about UCS-4? "First-class Unicode support" is marketing nonsense
perpetuated by people who don't know any better.

4. I don't buy the argument often mentioned that some languages are so
compact/productive that they don't require IDE's.

You may not buy it, but it's largely true.

Better languages and technologies raise the bar so I am sure that
compact/productive languages like Ruby will just mean that the
requirements are raised accordingly. Whatever the language we will
end up with really big projects anyway.

Not if you're smart about it. One of the things that I'm doing at my job
is pushing for ever smaller libraries that can then be combined into
larger projects with better granularity. This matters because not only
are your pieces then easier to replace in bug-fixing, but you've reduced
code coupling and those individual pieces are themselves easier to
manage.

Even for small projects, IDE's with background compilation,
completion, refactoring, debugging, profiling, code-coverage, help ...
are of great benefit.

Nope. Background compilation doesn't help me (especially with Ruby, but
even with C/C++). Code completion helps sometimes, but I usually end up
browsing either the documentation (if it's not my code or open source)
or the source (if it's my code or open source) anyway. I haven't dealt
with a refactoring browser/IDE yet, so I can't speak toward that. I
have, however, done *dozens* of refactorings in both Ruby and C/C++ with
nothing more than vim. Profiling and code coverage is ... of limited use
in an IDE, I find. I'm actually finding that DevPartner Studio *gets in
my way*, whereas other tools that don't try to integrate themselves into
the IDE work better.

It may be right that novice developers may be harmed by an IDE because
a few uninformed ones become clueless about the way things work, but
this is not a serious issue.

This is not what I said. I said that IDEs are harmful in general. I'm
not talking about novice developers. I have yet to meet a developer
using an IDE who is as proficient as I am with vim. In fact, I have a
lot of folks who come by my desk ask me to *slow down* because they
can't see what I'm doing when I'm making changes and folding code faster
than they could possibly do in Visual Studio.

Emacs may be useful for hackers and occasional programmers but I can't
understand why an EXPERIENCED developer/architect would want to
develop without a productivity-amplifying IDE.

Because there's no such thing as a "productivity-amplifying IDE"? At
least, I have yet to meet one. I mean, with Java, it'd be hard for an
IDE *not* to enhance one's productivity, because you'd cut the amount of
typing in half by using auto-generation. But by and large, IDEs end up
getting in the way. I do 90% of my development -- even on Windows -- in
vim. I end up looking up most Win32 method calls on the msdn website
rather than in VS.NET.

You've obviously convinced yourself that Ruby isn't ready for prime
time. I think that simultaneously disappoints and pleases people here.
It disappoints because you've convinced yourself based on bad
information and assumptions, and that you're conveying that bad
information and assumptions to a wider audience who may be less informed
than you. It pleases because it means that more of the overall Ruby
consulting pie will be available to those who think that Ruby *is*
ready.

-austin

gwtmp01 · Nov 8, 2005

but that is not the issue here. I understand that Ruby may be fast
enough for a lot of things but in order to be able to use it for
all/most things the overhead needs to go down.

How should one interpret this statement? Clearly lots of people
are using Ruby for a wide variety of projects. It may be true
that for *your* application the run-time overhead is a problem
but it seems presumptuous to extend that claim to "all/most"
projects. In some cases, throwing hardware at the problem can
solve the overhead issues and can often be cheaper if the
development costs can be reduced by use of the "slower" language.

Gary Wright

mortench · Nov 8, 2005

Re: comments by Austin Ziegler

1. You confuse what I said about the VM with the job of the compiler.
The VM executes compiled Java code. Early Java VM's were based on a
simple interpreter of bytecode. Hence the comparison is in perfect
order. And Hotspot works fine on everything that can be compiled to
bytecode, like Jython or any of the several scripting languages for the
Java VM. The next version of Java after Mustang will even have special
optimized bytecode instructions for (scripting) languages.

3. "First class" is not marketing nonsense but means integrated,
built-in language support (for Unicode) at a level comparable to other
features. From what I have read Ruby does not have that. Maybe I am
wrong because I don't know Ruby that well?

4. I guess we will never agree on the IDE issue - but I will admit that
it MAY be a matter of taste. However, I hope you do understand that it
is an issue for some (and probably for most people, taking into account
how many professionals use IDEs).

Finally, don't jump to the conclusion that I don't like Ruby or do not
think it is appropriate. I have investigated Ruby because I found it
very interesting (mainly because of the metaprogramming capabilities
and ROR). However I have identified some current shortcomings, listed
here, that are serious. This does not mean I won't do development in
Ruby or recommend it (in fact I will do the opposite). Rather I will be
careful about what I do and what I recommend Ruby for. I wrote this
because I would welcome having any problems/errors in my analysis
pointed out, not to "talk Ruby down".

I thank you for the feedback.

Austin Ziegler · Nov 8, 2005

Re: comments by Austin Ziegler

1. You confuse what I said about the VM with the job of the compiler.
The VM executes compiled Java code. Early Java VM's were based on a
simple interpreter of bytecode. Hence the comparison is in perfect
order. And Hotspot works fine on everything that can be compiled to
bytecode, like Jython or any of the several scripting languages for the
Java VM. The next version of Java after Mustang will even have special
optimized bytecode instructions for (scripting) languages.

Um. (1) Early Java VMs were still bytecode based. Even "interpreted",
bytecodes are more optimizable -- in general -- than ASTs. The full
HotSpot VMs weren't around for a few more years, but early Java VMs were
still more than simple interpreters. (2) It is unlikely that the
bytecode instructions will actively help Ruby -- and if it's The Next
Version After, that's completely useless in terms of reality.

3. "First class" is not marketing nonsense but means integrated,
built-in language support (for Unicode) at a level comparable to other
features. From what I have read Ruby does not have that. Maybe I am
wrong because I don't know Ruby that well?

Okay, you're still not listening. I've done Unicode work. "First-class
Unicode Support" is marketing bullshit if you're not willing to call it
nonsense. What do *you* mean by "First-class Unicode Support"? That
description will be the basis of the feature.

Do you want UTF-8 strings? Ruby supports them. Do you want to
*manipulate* UTF-8 strings on a character level? Ruby *will* support
that. Even now, with Oniguruma (at least; the other regex engine may
support it), you can split by character using regex. Do you want to
convert between ISO-8859-1 and UTF-8? Ruby supports that, too.
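
The two operations mentioned here, splitting a UTF-8 string by character with a regex and converting ISO-8859-1 to UTF-8, can be sketched as follows. This uses the encoding API of Ruby 1.9 and later (`String#encode`, `String#force_encoding`), which is an assumption relative to this thread: on the 1.8 interpreter under discussion the same jobs would be done with `$KCODE`/the `/u` regex flag and `Iconv`. The sample strings are illustrative only.

```ruby
# A UTF-8 string containing two non-ASCII characters (i-diaeresis, e-acute).
utf8 = "na\u00EFve caf\u00E9"

# Split by character using a regex: on a UTF-8 string, /./ matches one
# character at a time, not one byte.
chars = utf8.scan(/./m)
puts "#{chars.length} characters in #{utf8.bytesize} bytes"

# Bytes as they might arrive from an ISO-8859-1 file: "caf" + 0xE9.
latin1_bytes = [0x63, 0x61, 0x66, 0xE9].pack("C*")

# Tag the bytes with the encoding they are really in, then transcode.
as_utf8 = latin1_bytes.force_encoding("ISO-8859-1").encode("UTF-8")
puts as_utf8 == "caf\u00E9"
```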

Just saying you want "First-class Unicode Support" is nonsense. Ruby's
strings *are not* any Unicode encoding. At the same time, they're not
ISO-8859-1 encoding, either. Ruby's strings are -- in many ways -- what
other languages might call a ByteVector. In Ruby 2.0, Matz has indicated
that Ruby's strings will carry around an encoding flag that explicitly
indicates how they should be interpreted. This is, in fact, far
*superior* to what Java and Python do -- which are limited to UTF-8
string representations (AFAIK). Nothing will prevent Ruby from carrying
around full UTF-16 if that's appropriate -- and converting between
UTF-16 and EUC-JP if that's what's needed.

"First-class Unicode Support" is and always will be nonsensical. Tell us
what features you want for Unicode handling in Ruby. Most of the time,
if you're not doing, say, per-character analysis, you'll never even care
that Ruby 1.8's String values are raw byte streams.
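
To make the "byte vector plus encoding flag" idea concrete, here is a minimal sketch of how it looks in the encoding-aware strings that later shipped with Ruby 1.9. That is an assumption from the point of view of this thread, where the design was still a plan for "Ruby 2.0"; the sample word is illustrative.

```ruby
# Three kanji, written with \u escapes; the literal is tagged as UTF-8.
word = "\u65E5\u672C\u8A9E"
puts word.encoding             # the encoding tag carried by the string
puts word.length               # counted in characters
puts word.bytesize             # counted in bytes

# The same bytes viewed as a plain byte vector (binary / ASCII-8BIT tag).
raw = word.b
puts raw.length                # now every byte counts as one "character"

# Transcoding changes the bytes and the tag, not the text itself.
euc        = word.encode("EUC-JP")
utf16      = euc.encode("UTF-16LE")    # EUC-JP to UTF-16, as mentioned above
round_trip = utf16.encode("UTF-8")
puts round_trip == word
```

Nothing in this sketch needs a single "internal" Unicode form: each string keeps whatever bytes it has, and the tag says how to read them.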

4. I guess we will never agree on the IDE issue - but I will admit
that it MAY be a matter of taste. However, I hope you do understand
that it is an issue for some (and probably for most people, taking into
account how many professionals use IDEs).

Sure, it's mostly an issue for PHBs, though. I, for one, am unconcerned
about IDE vendors or PHBs that depend on them. I will be trying
ActiveState Komodo, though.

IME, IDEs promote excessively large programs.

-austin

gabriele renzi · Nov 8, 2005

Austin Ziegler wrote:

Just saying you want "First-class Unicode Support" is nonsense. Ruby's
strings *are not* any Unicode encoding. At the same time, they're not
ISO-8859-1 encoding, either. Ruby's strings are -- in many ways -- what
other languages might call a ByteVector. In Ruby 2.0, Matz has indicated
that Ruby's strings will carry around an encoding flag that explicitly
indicates how they should be interpreted. This is, in fact, far
*superior* to what Java and Python do -- which are limited to UTF-8
string representations (AFAIK).

I don't recall how Java handles encodings (apart from the fact that it
should be using UTF-16) but I can assure you that Python is not limited to
UTF-8 strings; it has objects of class "unicode" which are like arrays of
codepoints and it ships transformation tables from one encoding to another.
Actually, ruby is borrowing something from it (the ugly encoding header

For the rest, I agree that it would be better for the OP to clearly
analyze what he needs since probably ruby+uconv could be enough.

Martin DeMello · Nov 9, 2005

Devin Mullins said:
Personally, I need basically three things in an IDE:
1. Indentation management. Auto-indents when you hit enter, and allows
you to indent/unindent blocks of text.
2. Syntax highlighting.
3. "Folder drawer." TextMate does this perfectly for me -- no bells, no
whistles, just an interactive view of my directory tree. This is
especially important for Rails, where you start off with a pretty fancy
tree from Day 0.

I'm pretty sure it's possible to get all that in vi (or, at least, vim),
but I'm way too lazy to set it up.

That's my top 3 too - I do it by using konqueror in tree mode and
associating everything with gvim. Works fairly decently, though
Textmate's project tree support looks a lot better.

martin

jussij · Nov 10, 2005

(Exuberant)? ctags might help with that. It's a fairly
general 'search for the file/line that defined this
method'-type thing. That's all I know about it.

The Zeus for Windows IDE uses the ctags output to drive
its class browsing, tag searching and intellisensing
features.

1. Indentation management.
2. Syntax highlighting.
3. "Folder drawer."
..
I'm pretty sure it's possible to get all that in vi (or,
at least, vim), but I'm way too lazy to set it up.

FWIW Zeus has all these features and more:

http://www.zeusedit.com/features.html

Note: Zeus is shareware (45 day trial).

Jussi Jumppanen
Author: Zeus for Windows IDE

John W. Kennedy · Nov 12, 2005

Austin said:
This is, in fact, far
*superior* to what Java and Python do -- which are limited to UTF-8
string representations (AFAIK).

Java has /always/ used UTF-16 internally, and currently has the ability
to read and write US-ASCII, ISO-8859-1, UTF-8, UTF-16BE (big-endian),
UTF-16LE (little-endian), and UTF-16 (byte-order marked) at a minimum,
plus whatever other encodings the implementor chooses to add. (Sun Java
for Windows includes a total of 148.)

Austin Ziegler · Nov 12, 2005

Java has /always/ used UTF-16 internally, and currently has the ability
to read and write US-ASCII, ISO-8859-1, UTF-8, UTF-16BE (big-endian),
UTF-16LE (little-endian), and UTF-16 (byte-order marked) at a minimum,
plus whatever other encodings the implementor chooses to add. (Sun Java
for Windows includes a total of 148.)

That's good to know.

-austin

markjreed · Nov 18, 2005

Ruby's strings are -- in many ways -- what other languages might call a ByteVector.

And that's the problem. A String should be composed of *characters*,
not bytes. I shouldn't need to care what bytes are used to store them
unless I'm dealing with a binary file format. The advantage of Unicode
is that it's a superset of almost all other encodings, with a globally
unique identifier number for each of the characters in its repertoire
(with room for almost 1.5 million of them).
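
The "one identifier per character" point can be illustrated with a short sketch: the same kanji has different byte sequences in each legacy encoding, but decodes to a single Unicode code point. This assumes the `String#encode` API of Ruby 1.9 and later, not the 1.8 strings being criticised here, and the variable names are made up for the example.

```ruby
# One kanji (the character for "day/sun"), given by its code point.
kanji = "\u65E5"

results = {}
%w[UTF-8 Shift_JIS EUC-JP UTF-16BE].each do |name|
  encoded = kanji.encode(name)
  hex = encoded.bytes.map { |b| format("%02X", b) }.join(" ")
  # Decode back through UTF-8 so that #ord reports the Unicode code point
  # rather than the encoding-specific code value.
  code_point = encoded.encode("UTF-8").ord
  results[name] = [hex, code_point]
  puts format("%-10s bytes: %-9s code point: U+%04X", name, hex, code_point)
end
```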

I understand the objections to Han unification, but compromises had to
be made to fit everything into a single codespace of a reasonable size.
The resulting need for metadata indicating language is really no
different than it is across the many languages which use the Roman
alphabet. And there are already extra Han Zi characters outside the
unified set in the BMP; I wouldn't be surprised to see more national
variants recognized that way over time.

As far as storage space goes, it's rapidly becoming a non-issue. Two bytes
per character just isn't that bad, and most Unicode apps use UTF-16 in
memory; UTF-16 is also the native on-disk representation in Mac OS X.
Heck, I wouldn't be surprised to see things starting to use UTF-32 in
the near future. There is a lot of UTF-8 online, and it is biased
toward Western characters, but that's not necessarily the worst choice
for source code in a programming language whose keywords are all
composed entirely of ASCII characters. But there's always SCSU or
BOCU-1 if you want more efficient encoding of Eastern languages.

This is, in fact, far *superior* to what Java and Python do -- which are limited to UTF-8
string representations (AFAIK).

Not true of Java or Python. Or Perl, for that matter; I don't know if
the Perl5 interpreter will read source code in anything other than
ASCII/Latin-1/UTF-8, but a Perl program can certainly read and write
strings in just about any encoding you can think of via the "use
encoding" pragma. Including Shift-JIS, EUC-JP, Big5, or anything else
... while the interface presented to the Perl programmer always uses
Unicode. That's what I think Ruby should do as well.

gwtmp01 · Nov 18, 2005

And that's the problem. A String should be composed of *characters*,
not bytes.

I'm no expert on character encodings but as far as I know, the concept
of "character" can be pretty complicated. I'd rather have a simple,
clean, obvious implementation of a byte vector and build other more
complicated concepts on top of that (i.e. other classes/modules).
Right now, Ruby happens to package up byte vector functionality in a
class called "String".

If you are simply saying that the choice of names was unfortunate,
well then OK. But if you are asking to insert all the complications of
encodings, glyphs, characters, code-points, multi-byte encoding, and so
on, into the same class that we use to manipulate byte vectors, then I
vote no.

As far as I understand, the trajectory on this issue in Ruby is a
small change to associate an "encoding" sigil with the byte vector.
More complicated facilities could then be constructed on top of this
foundation. That seems like a reasonable approach that doesn't really
change the current semantics of String.

Gary Wright

Curt Sampson · Nov 19, 2005

Personally, I need basically three things in an IDE:
1. Indentation management. Auto-indents when you hit enter, and allows you to
indent/unindent blocks of text.
2. Syntax highlighting.
3. "Folder drawer." TextMate does this perfectly for me -- no bells, no
whistles, just an interactive view of my directory tree.
...
I'm pretty sure it's possible to get all that in vi (or, at least, vim), but
I'm way too lazy to set it up.

1. Standard vi. ":set autoindent", ":set shiftwidth=4" (or whatever
you prefer), and the ">>" and "<<" commands, combined with appropriate
movement commands. (E.g., ">}" to shift from cursor position to the end
of the paragraph.)

2. One file to download, and ":syntax on". I'd let you know which file,
but I don't use it myself, so google is your friend.

3. You want the "bufexplorer" plugin:
http://lanzarotta.tripod.com/vim/plugin/6/bufexplorer.vim.zip

(And I thought that *I* was lazy!)

cjs

Yukihiro Matsumoto · Nov 19, 2005

Hi,

In message "Re: Investigating Ruby - key limitations ?"
on Sat, 19 Nov 2005 01:47:18 +0900, (e-mail address removed) writes:

|> This is, in fact, far *superior* to what Java and Python do -- which are limited to UTF-8
|> string representations (AFAIK).
|
|Not true of Java or Python. Or Perl, for that matter; I don't know if
|the Perl5 interpreter will read source code in anything other than
|ASCII/Latin-1/UTF-8, but a Perl program can certainly read and write
|strings in just about any encoding you can think of via the "use
|encoding" pragma. Including Shift-JIS, EUC-JP, Big5, or anything else
|... while the interface presented to the Perl programmer always uses
|Unicode.

As far as I know, physical string representation in Perl/Python/Java
is only UTF (8 or 16). As you've stated, they can read/write strings
in any encoding but they are converted to/from Unicode at the surface.
Correct me if I'm wrong.

matz.

Curt Sampson · Nov 20, 2005

As far as I know, physical string representation in Perl/Python/Java
is only UTF (8 or 16). As you've stated, they can read/write strings
in any encoding but they are converted to/from Unicode at the surface.
Correct me if I'm wrong.

For Java, at least, you are wrong. A Java String object is a sequence of
16-bit-wide characters; some library routines assume these to be encoded
in UTF-16, but you can process, e.g., Shift-JIS without any translation
by declaring all input and output encodings to be ISO-8859-1 (which is
"transparent" for 8-bit characters).
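
The pass-through trick described here is about Java, but the underlying idea (ISO-8859-1 maps every byte value to exactly one character, so decoding and re-encoding with it loses nothing) can be sketched in Ruby as well. This is only an analogy under stated assumptions: it uses Ruby 1.9+ encoding methods, and a UTF-16 string stands in for Java's sequence of 16-bit chars.

```ruby
# Shift_JIS bytes for three kanji, as they might sit in a file.
sjis_text  = "\u65E5\u672C\u8A9E".encode("Shift_JIS")
wire_bytes = sjis_text.b                      # raw bytes, binary tag

# "Read" them while declaring the input to be ISO-8859-1: each byte turns
# into one character in the range U+0000..U+00FF, held here as UTF-16.
carried = wire_bytes.dup.force_encoding("ISO-8859-1").encode("UTF-16LE")
puts carried.length                           # one 16-bit unit per original byte

# "Write" them back out as ISO-8859-1: the original bytes reappear unchanged.
written = carried.encode("ISO-8859-1").b
puts written == wire_bytes

# Re-tagging the output as Shift_JIS recovers the real text.
recovered = written.dup.force_encoding("Shift_JIS").encode("UTF-8")
puts recovered == "\u65E5\u672C\u8A9E"
```

While the data is in the `carried` form, character-level operations see bytes rather than kanji, which is the price of skipping the translation.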

cjs
