If you could add anything you want


John Gagon

Ed said:
Hi, John,

I'm not entirely sure what you mean by, "Guarantee certain metrics." Do
you mean, "Are the tool's metrics guaranteed to be correct?"

I believe so. I notice you include a lot of the standard metrics that
I use in the various analysis views. I'm assuming that once one
achieves a Fractal Class Composition score of 1.0, metrics such as
instability (afferent/efferent couplings) would be zero and cyclomatic
complexity would be at some optimum value. I'm guessing some might come
in as perfect while others are "fairly close" to an ideal value (like
distance and abstractness). In any case, I wonder whether some metric
limits are reached by achieving a score of 1.0, perhaps as a function
of the number of packages and classes.
Well, we use them in our work, and I know of two other shops that use
them; but would I sign an iron-clad, financially punitive contract
declaring that the tool is free from all bugs, and so that the metrics
are guaranteed to be correct for all inputs? Sadly, I would not. To
date, however, there have been few major complaints.

That one is always hard to call. The amount of money one is willing to
risk seems to me proportionate to perceived stability, so I would think
it would depend on the amount in the contract.
I get the feeling, however, that this is not what you meant.

Do you still get the feeling? (I lost track of what your pronoun 'this'
above refers to, other than my general question about guaranteeing
certain metrics.)
I would be delighted to receive your corrections. Engineering has
withered my English language skills to the point where they must cart
around their own bottled oxygen with them, and still they wheeze and
splutter at the slightest grammatical exertion; I would appreciate any
comments you have. It's rare indeed that anyone volunteers so
surprisingly important a service.

I have sent them to you personally in a separate email.
and I like

Excellent point. I should make the source examples available as
downloads. Examples, in fact, are probably not enough; so as a gesture
of appreciation for your offer above, I'll open-source a full
application with a fractal index of a perfect 1.0. Give me a couple of
weeks to cobble together a program description; I'll post notification here.

Yes, that would be *very* useful. Very good idea there. I'll search for
it periodically in the future. Feel free to CC my email if you would
like me to look at it too. ;-)
but I noticed you have it

I'm fortunately unafflicted by perfectionism: I hear it can be tiresome. :)

It sure can be. Well, a mild perfectionist, to be pedantic, is not
quite as bad as an absolute perfectionist, though, is it?
As with all metrics, and as I'm sure you're aware, these should be
viewed with healthy caution. I'm not sure how much value can be gleaned
from pouring code that was designed without the fractal class
composition in mind into the Fractality code analyser because there are
many different methodologies that people use to maximise the OOness of
their system.

I try to keep package, class and method sizes within a certain range,
and I try to organize dependencies. In the past, I've used more of a
band-aid approach: an open source tool called depfind, which searched
the code for dependencies and spat out megabytes of XML or HTML. Since
I was maintaining a mature codebase, this was more crucial, because
stability was a primary goal at that point. Every code change needed
impact analysis, and I would run the dependency checker from an Ant
script to find the current dependencies and count the number of other
classes affected by the change. This preventative approach reduces that
need quite a bit. I used to work on this at HP, before the offshoring
and workforce reductions that came with the incoming CEO who replaced
Carly.
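The impact analysis described above boils down to walking the dependency graph backwards from the changed class. A minimal sketch, assuming the class-to-class dependencies have already been extracted (for instance from the XML a tool such as depfind produces) into a map; `ImpactAnalysis` and `affectedBy` are hypothetical names, not part of any tool mentioned in this thread:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: which classes are affected, directly or transitively, by a change? */
final class ImpactAnalysis {
    private ImpactAnalysis() {}

    /**
     * @param uses    each class mapped to the classes it depends on (its efferent couplings)
     * @param changed the class about to be modified
     * @return every class that depends on {@code changed}, directly or indirectly
     */
    static Set<String> affectedBy(Map<String, Set<String>> uses, String changed) {
        // Invert the graph so that each class maps to the classes that use it.
        Map<String, Set<String>> usedBy = new HashMap<>();
        for (Map.Entry<String, Set<String>> entry : uses.entrySet()) {
            for (String target : entry.getValue()) {
                usedBy.computeIfAbsent(target, k -> new HashSet<>()).add(entry.getKey());
            }
        }
        // Breadth-first walk over the inverted edges, starting at the changed class.
        Set<String> affected = new TreeSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(changed);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            for (String dependent : usedBy.getOrDefault(current, Set.of())) {
                if (affected.add(dependent)) {
                    queue.add(dependent);
                }
            }
        }
        affected.remove(changed); // a dependency cycle can lead back to the changed class
        return affected;
    }
}
```

The size of the returned set is the "number of other classes affected by the change"; everything in it is a candidate for re-testing. (`Set.of` needs Java 9 or later.)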
A fractal index of 0.58 does indeed suggest that a module was not,
"Programmed to an interface repository," and did not, "Eliminate
descendant dependencies;" but if these concepts were not used in the
construction of that module, then we're viewing the code from an angle
not considered by the designer: it's then perhaps no surprise that it
looks a little askew, but that doesn't imply that the code is poorly
designed; it's just designed in a way unfamiliar to an unbending code
analyser.

The code started out cleaner but then became more of a rat's nest of
spaghetti, and even with just myself programming it, it grew out of
control, since I would work on it from weekend to weekend; it was my
own skunkworks/moonlighting project on the side.
If, however, a module is designed from scratch with the fractal class
composition in mind, and yet still scores badly in Fractality, then we
can ask some drilling questions.

Yes. I plan on refactoring to this standard, if only for future
maintenance. It will be a guiding principle for everyone else working
on my free, open version. I'm writing a tool which I will soon publish
on SourceForge and java.net, and later I will finish a commercial-grade
version with extra features. My tool is more related to prototyping and
quick model-driven development, similar to projects like Trails and Ruby
on Rails, but with one other design goal in mind besides "do not repeat
yourself". It's been a long journey, but I'm about 60% complete right
now. (I'm also working on a personal tracking tool like XPlanner, but
supporting more calendar and recurring functions.)

John Gagon
 

John Gagon

John said:
I believe so. I notice you include a lot of the standard metrics that
I use in the various analysis views. I'm assuming that once one
achieves a Fractal Class Composition score of 1.0, metrics such as
instability (afferent/efferent couplings) would be zero and cyclomatic
complexity would be at some optimum value. I'm guessing some might come
in as perfect while others are "fairly close" to an ideal value (like
distance and abstractness). In any case, I wonder whether some metric
limits are reached by achieving a score of 1.0, perhaps as a function
of the number of packages and classes.

(Of course, most metrics are type-level metrics, whereas Fractal Class
Composition is more of a package-level metric of its own. Maybe it's not
so relevant per se, but rather an additional, almost independent, measure.)

John Gagon
 

John Gagon

The one and only true Design by Contract, as defined by its inventor.

This should also imply *removing* all those unnecessary gotos from
the language (i.e. the over-abused and misused exceptions) and leaving
exceptions, well, for really exceptional conditions (in Eiffel that
happens only when some part breaks a contract).

I've been following closely stuff like JML, Nice, etc. and more
recently Contract4J.

But real DbC integrating nicely into my IDE, with the IDE reporting
possibly broken contracts in real time, would be gorgeous (this can be
seen in demos of Microsoft's Spec# ("specsharp") research language, and
it is *really* impressive).

Link to an overview here (works fine under OpenOffice):

http://research.microsoft.com/~leino/papers/SpecSharp-MPI-SS.ppt


For the moment I'm stuck with IntelliJ IDEA's @NotNull Java 1.5
annotation... It's already a good addition to the language (some
would say it's just a fix for a language defect ;)

http://www.jetbrains.com/idea/features/newfeatures.html#nullable

I can't wait to be able to specify real contracts on my abstract
data types!
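Plain Java has no contract syntax, but the idea can be approximated by hand: check preconditions explicitly (so they hold even in production), and state postconditions and class invariants with `assert`, which the JVM evaluates only when started with `-ea`. A sketch on a hypothetical `BoundedStack` type; this is not JML, Contract4J or Spec# syntax, and unlike those tools it gives no static checking in the IDE:

```java
import java.util.Arrays;
import java.util.Objects;

/** Hypothetical bounded stack with hand-written contracts. */
final class BoundedStack<T> {
    private final Object[] items;
    private int size;

    BoundedStack(int capacity) {
        if (capacity <= 0) {                                  // precondition
            throw new IllegalArgumentException("capacity must be > 0");
        }
        items = new Object[capacity];
        assert invariant();
    }

    void push(T item) {
        Objects.requireNonNull(item, "item");                 // precondition (cf. @NotNull)
        if (isFull()) {                                       // precondition
            throw new IllegalStateException("stack is full");
        }
        int oldSize = size;
        items[size++] = item;
        assert size == oldSize + 1 && peek() == item;         // postcondition
        assert invariant();
    }

    @SuppressWarnings("unchecked")
    T peek() {
        if (isEmpty()) {                                      // precondition
            throw new IllegalStateException("stack is empty");
        }
        return (T) items[size - 1];
    }

    T pop() {
        T top = peek();                                       // peek() checks the precondition
        items[--size] = null;
        assert invariant();
        return top;
    }

    boolean isEmpty() { return size == 0; }

    boolean isFull() { return size == items.length; }

    /** Class invariant: size stays in bounds and unused slots hold no stale references. */
    private boolean invariant() {
        return size >= 0 && size <= items.length
            && Arrays.stream(items, size, items.length).allMatch(Objects::isNull);
    }
}
```

Broken preconditions surface as exceptions, which matches the Eiffel view above that exceptions signal a broken contract.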

Of course YMMV,

Alex

I don't see where my message went, so I'll summarize this time. I do
like Nice a lot. I will look at Spec#; it looks good. Well-done
presentation.

John Gagon
 

Chris Uppal

John said:
If you could add anything you wanted to the java language, what would
it be?

Hmm... where does one start ?

I'd predict some would say the non-imperative stuff, i.e. closures, or the
Lisp-like ability to work almost purely functionally or to write macros.

Without trying to change Java into a better /kind/ of language, here are a few
things which (IMO) stay within the spirit of Java but would have saved me
time/effort in the past.

Java would have unsigned integers (but no automatic coercion between
signed/unsigned of the same width).
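To illustrate what the lack of unsigned types costs today: unsigned values have to be simulated by masking and, since Java SE 8, by explicit helpers such as `Integer.compareUnsigned` and `Integer.divideUnsigned`. The same 32 bits mean different numbers under the two readings, which is why silent coercion would be dangerous. `Unsigned` is just a hypothetical container for the examples:

```java
/** Unsigned arithmetic in a language with only signed integer types. */
final class Unsigned {
    private Unsigned() {}

    /** A byte read from a stream, seen as 0..255 rather than -128..127. */
    static int fromByte(byte b) {
        return b & 0xFF;                       // equivalent to Byte.toUnsignedInt(b)
    }

    /** The 32 bits of an int, seen as 0..2^32-1 (a long is needed to hold the result). */
    static long fromInt(int i) {
        return i & 0xFFFFFFFFL;                // equivalent to Integer.toUnsignedLong(i)
    }

    /** Unsigned comparison must be requested explicitly; a > b compares as signed. */
    static boolean unsignedGreater(int a, int b) {
        return Integer.compareUnsigned(a, b) > 0;
    }

    /** Likewise for division: plain i / 2 would treat i as signed. */
    static int unsignedHalf(int i) {
        return Integer.divideUnsigned(i, 2);
    }
}
```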

Right or left shifting by an impossible constant would provoke a compile-time
warning.
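Why such a warning would help: Java never rejects a shift distance, it masks it (the low 5 bits for `int`, the low 6 bits for `long`, per JLS §15.19), so shifting an `int` by 32 is a no-op rather than producing 0. javac accepts the constant case below, as far as I know without any warning by default. The `Shifts` class is a hypothetical illustration:

```java
/** Java masks shift distances instead of rejecting impossible ones. */
final class Shifts {
    private Shifts() {}

    // Compiles: the distance 32 is masked to 32 & 31 == 0, so the value is 1, not 0.
    static final int SURPRISE = 1 << 32;

    static int shiftInt(int value, int distance) {
        return value << distance;              // only distance & 31 is used
    }

    static long shiftLong(long value, int distance) {
        return value << distance;              // only distance & 63 is used
    }
}
```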

char would be 32-bit.
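Some context for the 32-bit proposal: Unicode code points run up to U+10FFFF, which needs 21 bits, so a 16-bit `char` cannot hold them all, and code points above U+FFFF are stored in a `String` as two `char`s (a surrogate pair). Since Java 5 the library works around this with `int` code points. A quick demonstration in a hypothetical `CharWidth` class, with U+1F600 picked arbitrarily as an example:

```java
/** Why a 16-bit char cannot hold every Unicode character. */
final class CharWidth {
    private CharWidth() {}

    /** U+1F600 lies outside the Basic Multilingual Plane, so it occupies two chars. */
    static String grinningFace() {
        return new String(Character.toChars(0x1F600));
    }

    /** Bits needed for the largest code point, Character.MAX_CODE_POINT (U+10FFFF). */
    static int bitsForLargestCodePoint() {
        return Integer.SIZE - Integer.numberOfLeadingZeros(Character.MAX_CODE_POINT);
    }
}
```

For the example string, `length()` reports 2 (chars) while `codePointCount` reports 1 (character).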

String would be an abstract type with (the option of) different concrete
subclasses.

Auto-boxing would provoke a compile-time warning.
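Two of the traps such a warning would catch, shown in a hypothetical `BoxingDemo` class: a hidden unboxing that throws `NullPointerException`, and `==` comparing `Integer` references rather than values. (The JLS only guarantees that boxing returns the same object for `int` values from -128 to 127; outside that range the result depends on the runtime's cache.)

```java
import java.util.Map;

/** Two auto-boxing traps that a compile-time warning would make visible. */
final class BoxingDemo {
    private BoxingDemo() {}

    /** Hidden unboxing: throws NullPointerException if the map has no entry for key. */
    static int countFor(Map<String, Integer> counts, String key) {
        return counts.get(key);                // Integer -> int unboxing happens here, silently
    }

    /** Compares Integer references, not values; equal values may be different objects. */
    static boolean sameObject(Integer a, Integer b) {
        return a == b;
    }
}
```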

The [] notation would be available whenever the object implements some
interface, Indexable perhaps. java.util.List would inherit that interface.
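A sketch of what such an interface might look like; `Indexable` and `FixedArray` are hypothetical names. Under the proposal the compiler would rewrite `a[i]` as `a.get(i)` and `a[i] = v` as `a.set(i, v)`; today callers have to write those calls out. The `set` return value mirrors `java.util.List.set`, which returns the element previously at that position:

```java
/** Hypothetical interface that would let the compiler accept [] on objects. */
interface Indexable<T> {
    T get(int index);                          // a[i]
    T set(int index, T value);                 // a[i] = value; returns the old element
}

/** A tiny fixed-size implementation. */
final class FixedArray<T> implements Indexable<T> {
    private final Object[] slots;

    FixedArray(int length) {
        slots = new Object[length];
    }

    @SuppressWarnings("unchecked")
    @Override
    public T get(int index) {
        return (T) slots[index];
    }

    @Override
    public T set(int index, T value) {
        T old = get(index);
        slots[index] = value;
        return old;
    }
}
```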

Operator overloading would be permitted in some disciplined manner. Again,
probably a small group of interfaces -- Field, MultiplicativeGroup,
AdditiveGroup, perhaps. Classes would be required to implement the whole set
of related operations, not just cherry-pick. Assignment operators like ++ and
*= would be translated by the compiler into the corresponding binary operation
(e.g. x = x + MyClass.unity() for ++), rather than being available for
roll-your-own overriding. The argument types and
returned value of overloaded operators would be required to follow the pattern
established by the existing operators (i.e. you can't redefine << to mean
System.out.print()).
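One way to read the "small group of interfaces" idea, sketched with hypothetical names (`AdditiveGroup`, `Complex`). Because a class must implement the whole interface, it cannot provide `+` without `-`; `minus` is derived from `plus` and `negate` so the two can't disagree. One deviation from the text: a Java interface cannot require a static method like `MyClass.unity()`, so the identity element is an instance method here:

```java
/**
 * Hypothetical operator interface: the compiler would map a + b to a.plus(b),
 * -a to a.negate(), and a -= b to a = a.minus(b).
 */
interface AdditiveGroup<T extends AdditiveGroup<T>> {
    T plus(T other);

    T negate();

    T zero();                                  // additive identity

    default T minus(T other) {                 // derived, so + and - stay consistent
        return plus(other.negate());
    }
}

/** Immutable complex numbers as an example implementation. */
final class Complex implements AdditiveGroup<Complex> {
    final double re;
    final double im;

    Complex(double re, double im) {
        this.re = re;
        this.im = im;
    }

    @Override public Complex plus(Complex other) { return new Complex(re + other.re, im + other.im); }

    @Override public Complex negate() { return new Complex(-re, -im); }

    @Override public Complex zero() { return new Complex(0.0, 0.0); }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Complex)) {
            return false;
        }
        Complex c = (Complex) o;
        return Double.compare(re, c.re) == 0 && Double.compare(im, c.im) == 0;
    }

    @Override
    public int hashCode() {
        return 31 * Double.hashCode(re) + Double.hashCode(im);
    }
}
```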

Objects would allow the clone() operation by default (which would probably be
renamed to copy(), leaving a protected clone() which was a JVM-implemented
shallow copy). The default implementation of copy() would call a protected
postCopy() method. The default implementation of postCopy() would be empty.
There would be a marker interface or annotation to forbid clone(). I.e.
classes would opt out of being copyable, not be forced to opt in.
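Most of the proposed copy protocol can be emulated today on top of `Object.clone()`, except the opt-out default: here a class still opts in by extending a base class. A sketch with hypothetical `Copyable` and `Polygon` classes, where `postCopy()` is the hook that deep-copies state that must not be shared:

```java
/** Hypothetical base class emulating the proposed copy()/postCopy() protocol. */
abstract class Copyable implements Cloneable {
    /** Public copy: JVM shallow copy, then a hook for fixing up mutable state. */
    public final Copyable copy() {
        try {
            Copyable c = (Copyable) super.clone();     // shallow, field-by-field copy
            c.postCopy();
            return c;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError("cannot happen: this class implements Cloneable", e);
        }
    }

    /** Default does nothing; subclasses deep-copy whatever must not be shared. */
    protected void postCopy() {}
}

final class Polygon extends Copyable {
    private int[] xs;

    Polygon(int... xs) {
        this.xs = xs.clone();
    }

    int x(int i) { return xs[i]; }

    void moveX(int i, int dx) { xs[i] += dx; }

    @Override
    protected void postCopy() {
        xs = xs.clone();                               // don't share the array with the original
    }
}
```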

Generics would vanish.

The definition of interfaces would be changed so that a method needed to
satisfy the contract implied by the interface need be no more visible than the
interface itself. (E.g. package-private methods would satisfy package-private
interfaces.)

There would be a means of telling the compiler: "yes I know I'm calling a
method that you don't know about, but /trust me/, it'll be there by the time
this code is executed". Perhaps that would be allowed wherever there's an
explicit handler for NoSuchMethodException. The same for fields.

There would be a method, java.lang.System.getPlatformVersion().

In Java, references to final fields initialised to a compile-time constant are
replaced by the constant itself. That's OK, but the generated classfiles would
retain a per-method reference to the field so that dependencies can be tracked.

There would be some way to define compile-time constants on the javac
command-line. (Since it's possible to do this anyway with only a little
hacking, there seems no valid justification for not allowing it in a
disciplined form.)

There would be a kind of Classloader which understood that you can put several
JARfiles in one directory. The application Classloader would be of this type.
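For comparison, a loader over a directory of JARs can be assembled from standard parts: list the `*.jar` files and hand them to a `URLClassLoader`. The sketch below (hypothetical `JarDirectoryLoader`) does that. Separately, since Java SE 6 the launcher itself accepts a class-path wildcard such as `-cp "lib/*"`, which covers the common case for the application class path:

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: a class loader that sees every *.jar file in one directory. */
final class JarDirectoryLoader {
    private JarDirectoryLoader() {}

    /** URLs of the directory's JAR files, sorted so the search order is predictable. */
    static List<URL> jarUrls(Path dir) throws IOException {
        List<Path> jars = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path jar : stream) {
                jars.add(jar);
            }
        }
        jars.sort(null);                               // natural Path ordering
        List<URL> urls = new ArrayList<>();
        for (Path jar : jars) {
            urls.add(jar.toUri().toURL());
        }
        return urls;
    }

    static URLClassLoader forDirectory(Path dir, ClassLoader parent) throws IOException {
        return new URLClassLoader(jarUrls(dir).toArray(new URL[0]), parent);
    }
}
```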

The people at Sun would be immersed upside-down in a huge vat of sex-crazed
cane toads until they agreed to change their bloody awful layout conventions.

That list is by no means complete, but I've grown bored typing it in.
Probably everyone's sick of reading by now too...

-- chris
 

Ed Kirwan

John said:
I believe so. I notice you include a lot of the standard metrics that
I use in the various analysis views. I'm assuming that once one
achieves a Fractal Class Composition score of 1.0, metrics such as
instability (afferent/efferent couplings) would be zero and cyclomatic
complexity would be at some optimum value. I'm guessing some might come
in as perfect while others are "fairly close" to an ideal value (like
distance and abstractness). In any case, I wonder whether some metric
limits are reached by achieving a score of 1.0, perhaps as a function
of the number of packages and classes.

I had the same question myself, which is actually the reason for
including our good friend Robert C. Martin's metrics in the analysis
tool. I was hoping that a system with a fractal index of 1.0 would show
a very low Distance metric. It's difficult to compare the two metrics,
of course, as the fractal index is system-wide whereas the Distance
metric is per-package (why doesn't Uncle Bob develop a system-wide
variant?); but in those applications I've seen with a fractal index of
1.0, I've not seen any packages with a Distance metric higher than 0.5.
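For reference, the Martin metrics being compared: instability I = Ce / (Ca + Ce), abstractness A = abstract types / total types, and the normalized distance from the main sequence D = |A + I - 1|, each ranging from 0 to 1. A sketch computing them from counts assumed to be already extracted; the `PackageMetrics` name, and returning 0 for a package with no couplings or no types, are my assumptions (tools differ on those edge cases):

```java
/** Robert C. Martin's package metrics, computed from raw counts. */
final class PackageMetrics {
    private PackageMetrics() {}

    /** I = Ce / (Ca + Ce): 0 = maximally stable, 1 = maximally unstable. */
    static double instability(int afferent, int efferent) {
        int total = afferent + efferent;
        return total == 0 ? 0.0 : (double) efferent / total;   // 0 for isolated packages: assumption
    }

    /** A = abstract types (interfaces and abstract classes) / all types in the package. */
    static double abstractness(int abstractTypes, int totalTypes) {
        return totalTypes == 0 ? 0.0 : (double) abstractTypes / totalTypes;
    }

    /** Normalized distance from the main sequence: D = |A + I - 1|. */
    static double distance(double abstractness, double instability) {
        return Math.abs(abstractness + instability - 1.0);
    }
}
```

For example, a package with Ca = 3, Ce = 1 and one abstract type out of four has I = 0.25, A = 0.25 and D = 0.5.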

Certainly, "Program to an interface repository, not an implementation
repository," should align well with the Distance metric, but the
correlation is still suspect; it's just too easy to get an accidentally
high Distance metric. (And I remember reading somewhere, sometime, that
someone else had made a slight alteration to the Distance metric ...
must check for that again.)

On the other hand, cyclomatic complexity is certainly well managed by,
"Eliminate descendant dependencies;" and indeed the only place where
cycles can occur is between two peer interface repositories, which are
in themselves quite rare (one interface repository is usually sufficient
to serve a package branch).
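The cycles in question here are dependency cycles between packages (McCabe's cyclomatic complexity, by contrast, counts independent paths through a single method). Checking a code base for such cycles is a standard depth-first search; a minimal sketch, assuming the package dependencies are available as a map, with `CycleFinder` a hypothetical name:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Finds one dependency cycle among packages, if there is any (depth-first search). */
final class CycleFinder {
    private static final int ON_PATH = 1;
    private static final int DONE = 2;

    private CycleFinder() {}

    /** @return a cycle such as [a, b, c, a], or an empty list if the graph is acyclic */
    static List<String> findCycle(Map<String, Set<String>> deps) {
        Map<String, Integer> state = new HashMap<>();          // absent = not yet visited
        List<String> path = new ArrayList<>();
        for (String start : new TreeSet<>(deps.keySet())) {    // sorted for repeatable output
            List<String> cycle = visit(start, deps, state, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        return List.of();
    }

    private static List<String> visit(String node, Map<String, Set<String>> deps,
                                      Map<String, Integer> state, List<String> path) {
        Integer s = state.get(node);
        if (s != null && s == DONE) {
            return List.of();
        }
        if (s != null && s == ON_PATH) {                       // back edge: node is on the path
            List<String> cycle = new ArrayList<>(path.subList(path.indexOf(node), path.size()));
            cycle.add(node);
            return cycle;
        }
        state.put(node, ON_PATH);
        path.add(node);
        for (String next : new TreeSet<>(deps.getOrDefault(node, Set.of()))) {
            List<String> cycle = visit(next, deps, state, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        path.remove(path.size() - 1);
        state.put(node, DONE);
        return List.of();
    }
}
```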
I have sent them to you personally in a separate email.

Received and thank you, sir!
 

Oliver Wong

[snipped some good suggestions]
char would be 32-bit.

Conceptually, char should not have a width or size at all. Every char
value should identify exactly one Unicode character. The underlying
implementation is free to use UTF-8, UTF-16, UTF-32 or any other encoding it
likes to convert from char to bytes, but from the programmer's perspective,
you can store any Unicode character into a single char (i.e. none of this
"surrogate pair" nonsense).

However, I'm not that familiar with the inner workings of the virtual
machine, so I don't know what kind of havoc a "variable-length primitive"
might cause.

[snipped some good suggestions]
Generics would vanish.

!!! I thought Java would be better with more generics, rather than less (or
none at all).

The definition of interfaces would be changed so that a method needed to
satisfy the contract implied by the interface need be no more visible than the
interface itself. (E.g. package-private methods would satisfy package-private
interfaces.)

I don't understand this one.
There would be a means of telling the compiler: "yes I know I'm calling a
method that you don't know about, but /trust me/, it'll be there by the time
this code is executed". Perhaps that would be allowed wherever there's an
explicit handler for NoSuchMethodException. The same for fields.

I've never seen a need for this. Can you elaborate?

[snipped some good suggestions]

- Oliver
 

Hendrik Maryns


Chris Uppal wrote:
Auto-boxing would provoke a compile-time warning.

Eclipse offers this possibility.
Generics would vanish.

Hm, no.

H.
--
Hendrik Maryns

==================
http://aouw.org
Ask smart questions, get good answers:
http://www.catb.org/~esr/faqs/smart-questions.html
 

Chris Uppal

Oliver Wong wrote:

[me:]
Conceptually, char should not have a width or size at all.

I don't see the need for that level of abstraction. Unicode is limited to < 24
bits by the UTF-16 hack. The Unicode consortium states that code points will
never be allocated outside the range representable in UTF-16. The equivalent
ISO work has limited itself (as I understand it) to 31 bits, but since they and
the Unicode people are committed to staying in lock-step, it's hard to see why
that is more than an academic point.

So the only reason I can think of for /not/ choosing 32 bits is that you might
suspect that now or in the future an implementation might want to restrict
itself to 24 bits per char. I don't find that too plausible myself, but...

Of course, Strings might well use UTF-8 or UTF-16 encoded binary as their
internal representation, or maybe UTF-32 for applications needing constant-time
access. (The requirement I perceive for flexibility in this matter is why I
would want to turn String into an abstract class.) But that's /Strings/; I
don't see a reason for /char/ to be anything other than an integer type with
known width.
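To make the constant-time point concrete: `java.lang.String` is final and stores UTF-16, so finding the n-th code point generally means scanning (`offsetByCodePoints` walks the string). A hypothetical fixed-width alternative, of the kind an abstract String might have as one of its concrete subclasses, simply stores one `int` per code point:

```java
/**
 * Hypothetical fixed-width text type: one int per code point (UTF-32 in memory),
 * so codePointAt(i) is O(1). java.lang.String is final, so this cannot subclass it.
 */
final class Utf32Text {
    private final int[] codePoints;

    Utf32Text(String s) {
        this.codePoints = s.codePoints().toArray();    // decodes surrogate pairs
    }

    /** Number of code points, not chars. */
    int length() {
        return codePoints.length;
    }

    /** Constant-time access by code point index. */
    int codePointAt(int index) {
        return codePoints[index];
    }

    @Override
    public String toString() {
        return new String(codePoints, 0, codePoints.length);
    }
}
```

The cost is memory: four bytes per character even for pure ASCII text, which is exactly why one would want the representation to be a per-string choice.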

!!! I thought Java would be better with more generics, rather than less
(or none at all).

Java might be better off with a proper implementation of generics (as opposed
to the mess we've had dumped on us). I don't, myself, see that there's much
to be gained by such a feature, and I don't really think that it's in the
"spirit of Java" -- so (at least for this discussion) I'd just drop them.

You may be thinking of a more C++-like feature which provides compile-time
metaprogramming. I'd certainly agree that a language which supports
metaprogramming is much to be preferred over one that does not (unless, of
course, the "metaprogramming" is something as gross as C++ templates). But
that wouldn't fit with my self-imposed restriction:
Without trying to change Java into a better /kind/ of language, [...]
The definition of interfaces would be changed so that a method needed to
satisfy the contract implied by the interface need be no more visible than the
interface itself. (E.g. package-private methods would satisfy package-private
interfaces.)

I don't understand this one.

E.g. I have a package which uses internal interfaces to give order to the
structure of the private code. I want one of the public classes in that
package to implement one or more of those internal interfaces. I am forced to
make the relevant methods public. I.e. I have to /publish/ them, and thus
commit to keeping them unchanged in future developments. Bad. An interface is
a /promise/, but you have to ask who the promise is made to. In this case I
want to be able to use promises internally but am impeded because the language
definition assumes that the only promises I will ever want to make are to
client code.
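The restriction in code: every abstract method of a Java interface is implicitly public, so an implementing class must declare it public too; declaring it package-private is a compile error. A sketch with hypothetical names, followed by one common workaround, which implements the internal interface on a separate object (here a method reference) instead of on the public class itself:

```java
/** Package-private interface, meant only to give structure to code inside the package. */
interface Step {
    void run();                    // implicitly public abstract
}

/** Imagine this class is part of the package's public API. */
final class Importer implements Step {
    private int runs;

    // Writing "void run()" here (package-private) does not compile: an implementation
    // may not have weaker access than the interface method, which is public.
    @Override
    public void run() {            // forced to be public, and therefore published
        runs++;
    }

    int runs() { return runs; }
}

/** Workaround: keep the method package-private and hand out the interface separately. */
final class Exporter {
    private int runs;

    void doRun() {                 // package-private: not part of the published API
        runs++;
    }

    Step asStep() {                // Step has a single abstract method, so this compiles
        return this::doRun;
    }

    int runs() { return runs; }
}
```

The workaround keeps the promise internal, at the price of an extra object and some indirection.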

I've never seen a need for this. Can you elaborate?

Maybe a simple example would help:

...
void
aMethod()
{
    double start = now();
    someLongishOperation();
    double end = now();
    System.out.println("It took " + (end - start) + " seconds");
}

/**
 * Return the time in seconds since an arbitrary (but fixed) start-time.
 * Resolution is dependent on the version of the Java platform.
 */
private double
now()
{
    try
    {
        // System.getPlatformVersion() is the method proposed above; it does not exist today.
        if (System.getPlatformVersion() >= 5)
            return System.nanoTime() / 1.0e9;
    }
    catch (NoSuchMethodException e)
    {
        // log it, or something
    }
    return System.currentTimeMillis() / 1.0e3;
}

That, or something like it, should compile on pre-1.5 platforms, but it won't
because the compiler is too damned fond of early binding. Note that it is the
/compiler/ that's doing this; the equivalent bytecode would run just fine
(legal /and/ safe) on any JVM.
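What can be done today is to move the binding from compile time to run time by hand, via reflection. A minimal sketch (the `LateBoundClock` name is hypothetical) that looks up `System.nanoTime()` by name, so the source never refers to it directly. The proposed `System.getPlatformVersion()` does not exist; the nearest real equivalent is the system property `java.specification.version` (and, since Java 9, `Runtime.version()`):

```java
import java.lang.reflect.Method;

/** Uses System.nanoTime() when the running JVM has it, else System.currentTimeMillis(). */
final class LateBoundClock {
    private static final Method NANO_TIME = lookUpNanoTime();

    private LateBoundClock() {}

    private static Method lookUpNanoTime() {
        try {
            return System.class.getMethod("nanoTime", new Class[0]);  // resolved at run time
        } catch (NoSuchMethodException e) {
            return null;                                // older platform: use the fallback
        }
    }

    static boolean hasNanoTime() {
        return NANO_TIME != null;
    }

    /** The real stand-in for the proposed getPlatformVersion(). */
    static String platformVersion() {
        return System.getProperty("java.specification.version");
    }

    /** Seconds since an arbitrary (but fixed) origin; resolution depends on the platform. */
    static double now() {
        if (NANO_TIME != null) {
            try {
                Long nanos = (Long) NANO_TIME.invoke(null, new Object[0]);
                return nanos.longValue() / 1.0e9;
            } catch (Exception e) {                     // IllegalAccessException etc.
                // fall through to the millisecond clock
            }
        }
        return System.currentTimeMillis() / 1.0e3;
    }
}
```

The price is that mistakes (a misspelled method name, a wrong return type) only show up at run time, which is the trade-off the "trust me" proposal makes explicit.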

-- chris
 

Oliver Wong

Chris Uppal said:
Oliver Wong wrote:

[me:]
Conceptually, char should not have a width or size at all.

I don't see the need for that level of abstraction. Unicode is limited to < 24
bits by the UTF-16 hack. The Unicode consortium states that code points will
never be allocated outside the range representable in UTF-16. The equivalent
ISO work has limited itself (as I understand it) to 31 bits, but since they and
the Unicode people are committed to staying in lock-step, it's hard to see why
that is more than an academic point.

So the only reason I can think of for /not/ choosing 32-bits is that you might
suspect that now or in the future an implementation might want to restrict
itself to 24 bits per char. I don't find that too plausible myself, but...

Of course, Strings might well use UTF-8 or UTF-16 encoded binary as their
internal representation, or maybe UTF-32 for applications needing constant-time
access. (The requirement I perceive for flexibility in this matter is why I
would want to turn String into an abstract class). But that's /Strings/, I
don't see a reason for /char/ to be anything other than an integer type with
known width.

I like the flexibility of adding new characters. If we define a size on
char, then we either have a finite number of characters we can define, or we
have something like surrogate pairs (or triplets, or quadruplets, etc.)
where you don't have a 1-to-1 correspondence between the "concept of a
character" and the "char data type in Java".

Potential sources for new characters (in approximate order of
probability):

* More domain-specific characters. E.g. musical notation for percussive
instruments, symbols for obscure operators in math, physics, etc.
* Integrating more popular, though "fictional" character sets, into
Unicode e.g. Klingon.
* Invention of a new language like Esperanto.
* New discovery by archeologists of ancient writing systems.
* Contact with alien civilizations that use a different character set.

[snipped more explanations from Chris where I asked for them]

Ah, makes sense. Thanks.

- Oliver
 

Roedy Green

* More domain-specific characters. E.g. musical notation for percussive
instruments, symbols for obscure operators in math, physics, etc.

One of the "character" sets I saw on a prototype IBM colour terminal
was "geography". They had a character shaped like the tip of the boot
of Italy. You could put these together with solid blobs to make up a
map that took far fewer bits than a full bit map. Memory was very
expensive back then and bandwidth was typically 9600 baud max with
many terminals sharing the "high speed" line.

Other possible places for character expansion:

1. airport symbol language expanding to a full international language
to be used on signs and emergency instructions.

2. ASL symbols for the deaf, showing gestures in symbolic form.

3. symbols for choreography.

4. weather symbols (might be in there already. I did not notice them).

5. ligatures and fancy forms needed for precise typesetting even if
they are inserted by rule.

6. Symbols for the visually impaired. Alphabets and symbols easy to
discriminate.

7. Symbols to record the oral-only niche languages rapidly
disappearing.
 

Chris Uppal

Oliver said:
I like the flexibility of adding new characters.

I presume you mean that you like the flexibility the ISO and Unicode consortium
have to add new characters, rather than that you would like to be free to define
your own (if you do mean the latter, then there is always the private-use area
to play in).

If we define a size
on char, then we either have a finite number of characters we can define,
or we have something like surrogate pairs (or triplets, or quadruplets,
etc.) where you don't have a 1-to-1 correspondence between the "concept
of a character" and the "char data type in Java".

Me, I prefer to be able to manipulate characters as integers. Which requires
(for sanity) knowing how wide the integer is. Unicode isn't going to add
characters which don't fit into UTF-16, so there's a definite limit to how wide
the integer needs to be. Even if they /did/ scrap UTF-16 (hardly likely when
it would break Windows, .NET, /and/ Java ;-) there is still an unimaginably huge
amount of space available in the 31 bits that ISO limits itself to. It would
need several thousand "alphabets" the size of the unified Han stuff to exhaust
that (and where are those writing systems lurking ?).

Potential sources for new characters (in approximate order of
probability):

* More domain-specific characters. E.g. musical notation for
percussive instruments, symbols for obscure operators in math, physics,
etc.

/Plenty/ of space is already available for that.

* Integrating more popular, though "fictional" character sets,
into Unicode e.g. Klingon.

Ugh! Bloody sci-fi soap opera. (I /do/ like SF, I just don't like Star
Trek -- in any of its manifestations.) IMO, adding that kind of thing (say
Tolkien's scripts) to Unicode would be a pathetic abuse of power.

And there's plenty of space anyway.

* Invention of a new language like Esperanto.

But would any sane new language use a writing system like Chinese ? And, if it
did, why would anyone want to take it seriously enough to add it to Unicode ?
Let's say I design a language which, by definition, uses /all/ the Unicode
glyphs, in pairs, to denote a fixed but large set of words. That /can't/ fit
into any possible Unicode-like scheme since it has been deliberately designed
to break any finite scheme. So why should the scheme be extended to support
it ?

* New discovery by archeologists of ancient writing systems.

Certainly possible, and I would even call it probable. But why should that
require more space than is already available ?

* Contact with alien civilization which use a different character set.

Since Unicode is designed around /human/ writing schemes, reflecting /human/
perceptual processes and /human/ cultural history(ies), I don't think it would
be legitimate (and almost certainly impossible) to use Unicode to represent
another species' communication systems. Far better to adopt /their/ version of
Unicode for representing their communications.

And anyway, I sort of doubt whether there are any alien civilisations -- the
universe is far too big.

-- chris
 

Kent Paul Dolan

Roedy said:
Other possible places for character expansion:
1. airport symbol language expanding to a full
international language to be used on signs and
emergency instructions.

Makes sense.
2. ASL symbols for the deaf, showing gestures in symbolic form.

This won't work in general. ASL signs are made by
moving in space, sometimes changing handshapes while
the hands move. Video recordings work better. There
_was_, by the way, a very comprehensive, very arcane
written notation for ASL created well over a decade
ago, but it never caught on with either the ASL
community or the research community studying
them.
3. symbols for choreography.

Such notations already exist (Labanotation, for one),
but IIUC they are pretty rich symbol sets, subject to
idiosyncratic extensions by each choreographer, and
might not map to "alphabets" well.
4. weather symbols (might be in there already. I
did not notice them).

That would be mostly doable, but weather symbols
tend to be laid out on a two dimensional surface,
not used in typesetting, so the utility of such an
"alphabet" would be limited. Also, some weather
symbols, like storm front "curves with triangular
teeth", are extended graphical objects, with no base
point from which to draw them with an alphabet.
5. ligatures and fancy forms needed for precise
typesetting even if they are inserted by rule.

I've seen at least some of those in there, but
others, that I expected to see, are instead done by
overstriking, and could usefully have unique
representations, which would usually be more
accurate, instead. Notice that ligatures of the
"ffl" type are really font choices, not alphabet
choices, and so perhaps not suitable for Unicode
codes, since "ffl" in one font might be a ligature,
while in another it would not. Similarly, an
ellipsis is a single character or three characters,
depending on font support, so it is a kind of
"ligature" too (and, is in Unicode already). I don't
think a consistent treatment here is possible, which
will give standards committees great sway to do
mischief if they attempt the deed anyway.
6. Symbols for the visually impaired. Alphabets
and symbols easy to discriminate.

Mostly this is just accomplished by use of large
type, since the symbols have to be comprehensible to
the population with which they interact. Also,
notice that Unicode _doesn't_ include fonts or font
styles, just alphabet generic glyph identifiers and
ideograph generic glyph identifiers. Thus, some
"visually impaired" equivalent of the optical
character recognition fonts (which were for scanners
of the day that were "visually impaired" compared to
today's) wouldn't need codespace in the Unicode
standard, they'd just be other fonts or font styles,
with glyphs identified by Unicode "codes" for the
generic glyphs of which they are instances.
7. Symbols to record the oral-only niche languages
rapidly disappearing.

A nice idea, but they're going away far too fast for
saving. The world has, IIRC, some 3,000 languages,
very few of which have long term viability, and many
of which are reduced to a handful of proficient
speakers today. There simply aren't enough linguists
and linguistically adept missionaries to save most
of them.

And, in the rush to save what can be saved, use of
the International Phonetic Alphabet (perhaps
extended) as the base script would sure be a lot
smarter than inventing a whole new alphabet per
language.

FWIW

xanthian.
 

Dale King

If you see ones that are missing here, you should point them out to the
Unicode Consortium. They already have a fairly complete set with many
obscure symbols.
One of the "character" sets I saw on a prototype IBM colour terminal
was "geography". They had a character shaped like the tip of the boot
of Italy. You could put these together with solid blobs to make up a
map that took far fewer bits than a full bit map. Memory was very
expensive back then and bandwidth was typically 9600 baud max with
many terminals sharing the "high speed" line.

That's not appropriate for Unicode. If you wanted something like that
for a specific project, there are always the private use areas of Unicode
that you can use for your own purposes.
Other possible places for character expansion:

1. airport symbol language expanding to a full international language
to be used on signs and emergency instructions.

I can't imagine any symbols here appropriate for Unicode in general.
Examples?
2. ASL symbols for the deaf, showing gestures in symbolic form.

3. symbols for choreography.

Not appropriate as these are motions, not symbols, but if there are
symbols that are commonly used they should be proposed.
4. weather symbols (might be in there already. I did not notice them).

There are a few in this code page:

http://www.unicode.org/charts/PDF/U2600.pdf
5. ligatures and fancy forms needed for precise typesetting even if
they are inserted by rule.

Many of these exist for common ones.
6. Symbols for the visually impaired. Alphabets and symbols easy to
discriminate.

Definitely not appropriate for Unicode. This is a presentation/font issue.
7. Symbols to record the oral-only niche languages rapidly
disappearing.

Isn't a symbolic alphabet for an oral-only language an oxymoron? ;-)

If the language doesn't currently have an alphabet and one is being
assigned, it would make a lot more sense to use existing alphabets than
creating brand new ones.
 

Patricia Shanahan

Roedy Green wrote:
....
7. Symbols to record the oral-only niche languages rapidly
disappearing.

Why not use the International Phonetic Alphabet, which is already
represented in Unicode?

Patricia
 

Oliver Wong

Dale King said:
If you see ones that are missing here you should point them out to the
Unicode Consortium. They already have a fairly complete set with many
obscure symbols.

Well, I took a look at http://www.unicode.org/charts/PDF/U1D100.pdf
("Western Musical Symbols"), and they don't seem to have a notation for
indicating that the drummer should ease off the hi-hat pedal for the next few
notes, and then dampen the sound by applying pressure again. The notation
looks something like:

<asciiArt>
|-- O ----> -- (o) -|
| |
</asciiArt>

And is drawn above the staff of five lines where the notes are usually
drawn. There are others missing as well (e.g. repeat the following section,
but apply this ending the first time, and that ending the second time;
repeat the previous four measures; the following 3 notes should be played in
the time of 2 notes; apply the wah-wah pedal when playing guitars; let the
strings of the guitar ring openly; etc.) I haven't suggested this to the
consortium because:

(1) I didn't realize you could (but I've since discovered
http://www.unicode.org/pending/proposals.html)
(2) I don't know the terminology or official names for these musical
symbols, being only an amateur musician. I figure there must be someone else
out there more qualified to make these submissions than me, but perhaps the
intersection of the set of all musicians and the set of all people who care
about Unicode is rather small.

I can't imagine any symbols here appropriate for Unicode in general.
Examples?

Well, they have some symbols which are, AFAIK, internationally
recognized. In http://www.unicode.org/charts/PDF/U2600.pdf there's the
recycling symbol, the biohazard symbol, and the poison symbol. Perhaps you
could have internationally recognized road signs as well (yield, stop, left
lane merge, etc.)
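For reference, the three symbols above can be looked up by code point. A minimal Java sketch, assuming Java 7 or later (for `Character.getName`); the code points are my own lookup in the Miscellaneous Symbols block, not something stated in the thread, and I'm assuming U+2620 SKULL AND CROSSBONES is what "the poison symbol" refers to:

```java
// Prints the code point, glyph, and official Unicode name of each symbol.
// Character.getName(int) requires Java 7 or later.
public class HazardSymbols {
    // Assumed mapping: U+2620 poison, U+2623 biohazard, U+267B recycling.
    static final int[] SYMBOLS = {0x2620, 0x2623, 0x267B};

    public static void main(String[] args) {
        for (int cp : SYMBOLS) {
            // toChars works for any code point; these three each fit in one char.
            String glyph = new String(Character.toChars(cp));
            // The glyph may show as '?' if the console encoding can't represent it.
            System.out.printf("U+%04X %s %s%n", cp, glyph, Character.getName(cp));
        }
    }
}
```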

- Oliver
 

Oliver Wong

Chris Uppal said:
I presume you mean that you like the flexibility the ISO and Unicode
consortium have to add new characters, rather than that you would like to
be free to define your own (if you do mean the latter then there is always
the private-use area to play in).

Yes, I meant that I like the idea that the consortium can add letters as
needed.
Me, I prefer to be able to manipulate characters as integers. Which
requires (for sanity) knowing how wide the integer is. Unicode isn't
going to add characters which don't fit into UTF-16, so there's a
definite limit to how wide the integer needs to be. Even if they /did/
scrap UTF-16 (hardly likely when it would break Windows, .NET, /and/
Java ;-) there is still an unimaginably huge amount of space available
in the 31 bits that ISO limits itself to. It would need several
thousand "alphabets" the size of the unified Han stuff to exhaust that
(and where are those writing systems lurking?).
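To make the width argument concrete in Java terms: a `char` is one 16-bit UTF-16 code unit, while a full code point needs an `int`, and UTF-16 caps code points at U+10FFFF. A small sketch (class and method names are mine, for illustration), using U+1D11E MUSICAL SYMBOL G CLEF as an example of a character outside the 16-bit range:

```java
// A Java char is one 16-bit UTF-16 code unit; a full Unicode code point
// needs an int, and never exceeds Character.MAX_CODE_POINT (0x10FFFF).
public class CodePointWidth {
    // Number of UTF-16 chars needed to store the given code point (1 or 2).
    static int charsNeeded(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        // U+1D11E MUSICAL SYMBOL G CLEF lies outside the Basic Multilingual Plane.
        int gClef = 0x1D11E;
        String s = new String(Character.toChars(gClef));

        System.out.println("chars needed: " + charsNeeded(gClef));             // 2
        System.out.println("String length: " + s.length());                    // 2 (a surrogate pair)
        System.out.println("code points: " + s.codePointCount(0, s.length())); // 1
        System.out.println("round trip: " + (s.codePointAt(0) == gClef));      // true
        System.out.printf("max code point: U+%X%n", Character.MAX_CODE_POINT); // U+10FFFF
        // 0x10FFFF needs only 21 bits, so a 32-bit int always has room to spare.
        int bits = 32 - Integer.numberOfLeadingZeros(Character.MAX_CODE_POINT);
        System.out.println("bits needed: " + bits);                            // 21
    }
}
```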

I don't think it makes sense to manipulate characters as integers,
just like it doesn't make sense to manipulate Strings which coincidentally
have length 1 as integers.

If you're doing some sort of ASCII manipulation stuff, then you're not
actually dealing with the characters themselves, but the byte-encoding of
those characters in the ASCII encoding system, for example. So you'd take
your characters, convert them to integers or bytes or whatever using an
ASCII encoder, and then manipulate those integers, then convert them back to
characters using an ASCII decoder, for example.
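The encode, manipulate, decode round trip described above might look like this in Java. A minimal sketch, assuming Java 7+ for `StandardCharsets`; the upper-casing by clearing bit 0x20 is a hypothetical example manipulation, chosen only because it is a classic ASCII byte trick:

```java
import java.nio.charset.StandardCharsets;

// Encode characters to ASCII bytes, manipulate the bytes as integers,
// then decode the bytes back into characters.
public class AsciiRoundTrip {
    // Hypothetical manipulation: ASCII upper-casing by clearing bit 0x20.
    static String upperAscii(String text) {
        // getBytes(Charset) replaces unmappable (non-ASCII) chars with '?'.
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] >= 'a' && bytes[i] <= 'z') {
                bytes[i] = (byte) (bytes[i] & ~0x20); // 'a' (0x61) -> 'A' (0x41)
            }
        }
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(upperAscii("hello, world")); // HELLO, WORLD
    }
}
```

The point of the design is that the integer arithmetic only ever touches bytes of a named encoding, never the characters themselves.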

At any rate, I don't think we should impose an upper limit on the number
of useful symbols or characters that we allow ourselves to define. It
reminds me of that "Nobody needs more than 640KB of RAM" (mis-)quote.
Potential sources for new characters (in approximate order of
probability): [snip]
* Contact with alien civilizations which use a different character
set.

Since Unicode is designed around /human/ writing schemes, reflecting
/human/ perceptual processes and /human/ cultural history(ies), I don't
think it would be legitimate (and almost certainly impossible) to use
Unicode to represent another species' communication systems. Far better
to adopt /their/ version of Unicode for representing their
communications.

Is this "humans-only" requirement actually documented anywhere? I mean,
if we found out that, for example, spiders encoded some communicative
information within the patterns of their webs and we managed to decode it,
would it be "against policy" to add symbols from this spider-language to
Unicode? Or would we say "well, now since we, as humans, have decoded it, it
becomes a human writing scheme, and so is apt to be used in Unicode"?

I don't know what assumptions Unicode makes, but it seems to me that if
it's possible to add characters to it to support alien languages, it
certainly would be worthwhile to do so upon encountering those languages.

I guess Unicode assumes that there exists a definite ordering of the
character streams (e.g. right to left, top to bottom). Or maybe it's not
Unicode which makes that assumption, but rather our Strings which do so. If
an alien civilization's natural language resembled Befunge, I'm not sure
how well our concepts of strings could cope, though we could certainly add
each symbol within that language to Unicode.

- Oliver
 

Bent C Dalager

Is this "humans-only" requirement actually documented anywhere?

It is probably an emergent property of the system.
I mean,
if we found out that, for example, spiders encoded some communicative
information within the patterns of their webs and we managed to decode it,
would it be "against policy" to add symbols from this spider-language to
Unicode? Or would we say "well, now since we, as humans, have decoded it, it
becomes a human writing scheme, and so is apt to be used in Unicode"?

You would get into trouble if it turns out that the exact stickiness
(however stickiness is measured) of the strands involved in the symbol
are vital to the correct interpretation of the message.

How do you represent stickiness in Unicode?

Or perhaps pheromones add vital information to the picture.
I don't know what assumptions Unicode makes, but it seems to me that if
it's possible to add characters to it to support alien languages, it
certainly would be worthwhile to do so upon encountering those languages.

Our ability to do so presumably depends upon the aliens having the
exact same concept as we do as to what a glyph is. The simplest
variation (depending on 3D glyphs perhaps, or animated ones)* is
likely to throw off Unicode.

* - For all I know, Unicode may support this, but you get the idea.

Cheers
Bent D
 

Dale King

Chris said:
I presume you mean that you like the flexibility the ISO and Unicode consortium
have to add new characters, rather than you would like to be free to define
your own (if you do mean the latter then there is always the private-use area to
play in).



Me, I prefer to be able to manipulate characters as integers.

I would prefer that you could *NOT* manipulate characters as integers. I
have no trouble with the ability to convert between char and int using
*explicit* casting. Any manipulation could be done as integers.
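For comparison, Java already converts char to int implicitly in several places, which is exactly what Dale would rule out; only the narrowing direction demands a cast, and even that is hidden by compound assignment. A small sketch (class and method names are mine, for illustration):

```java
// Contrasts Java's implicit char/int conversions with explicit casts.
public class CharIntCasts {
    // Implicit widening: a char can be returned as an int with no cast.
    static int codeOf(char c) {
        return c;
    }

    // char + int is evaluated as int, so narrowing back to char needs an explicit cast.
    static char shift(char c, int offset) {
        return (char) (c + offset);
    }

    public static void main(String[] args) {
        char c = 'a';
        int sum = c + 1; // compiles silently: the result is the int 98, not a char
        char d = 'a';
        d += 1;          // also compiles: compound assignment hides a narrowing cast
        System.out.println(codeOf(c) + " " + sum + " " + shift(c, 1) + " " + d); // 97 98 b b
    }
}
```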
Ugh! Bloody sci-fi soap opera. (I /do/ like SF, I just don't like Star
Trek -- in any of its manifestations). IMO, adding that kind of thing (say
Tolkien's scripts) to Unicode would be a pathetic abuse of power.

There have already been proposals for Klingon, Ferengi, and Tolkien
runes to be added. I have no problem with "fictional" character sets
being added as long as there is some realistic common usage. Because
there was little common usage for them, they were rejected. But
there is the ConScript Unicode Registry
(http://www.evertype.com/standards/csur) that has assigned them to the
Private Use Area, which is where they belong.
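Whether a code point sits in a Private Use Area is itself a standard Unicode property (general category Co), so Java can test it directly. A minimal sketch; the class and method names are mine, and I only rely on the PUA ranges defined by Unicode (the BMP area U+E000 to U+F8FF plus planes 15 and 16), not on any particular CSUR assignment:

```java
// Checks whether a code point falls in a Private Use Area, where registries
// such as the ConScript Unicode Registry make their unofficial assignments.
public class PrivateUse {
    static boolean isPrivateUse(int cp) {
        return Character.getType(cp) == Character.PRIVATE_USE;
    }

    public static void main(String[] args) {
        System.out.println(isPrivateUse(0xE000));  // true: first code point of the BMP PUA
        System.out.println(isPrivateUse(0xF8FF));  // true: last code point of the BMP PUA
        System.out.println(isPrivateUse('A'));     // false: an ordinary letter
        System.out.println(isPrivateUse(0xF0000)); // true: Supplementary Private Use Area-A
    }
}
```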

Esperanto is covered and it is best to reuse existing characters for new
languages.
 

Oliver Wong

Bent C Dalager said:
You would get into trouble if it turns out that the exact stickiness
(however stickiness is measured) of the strands involved in the symbol
are vital to the correct interpretation of the message.

How do you represent stickiness in Unicode?

Or perhaps pheromones add vital information to the picture.

I don't think Unicode says anything about the representation of such
characters. There's a platonic ideal representing the concept of the first
character in the lowercase alphabet, 'a'. Unicode assigns a number to that
character (\u0061), but it doesn't say anything about what that character
looks like, or how it should be drawn. There's another character, \u0430,
which is visually indistinguishable from \u0061 in all fonts I've seen, and
yet it's "obviously" a different Unicode character by virtue of having a
different number.
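The identity-by-number point is easy to check from Java: the two letters compare unequal even though they usually render the same. A small sketch (class and constant names are mine), assuming Java 7+ for `Character.getName`:

```java
// Two characters that look alike in most fonts but are different code points.
public class LookAlikes {
    static final char LATIN_A = '\u0061';    // LATIN SMALL LETTER A
    static final char CYRILLIC_A = '\u0430'; // CYRILLIC SMALL LETTER A

    public static void main(String[] args) {
        System.out.println(LATIN_A == CYRILLIC_A);                  // false
        System.out.println(Character.getName(LATIN_A));             // LATIN SMALL LETTER A
        System.out.println(Character.getName(CYRILLIC_A));          // CYRILLIC SMALL LETTER A
        System.out.println(Character.UnicodeBlock.of(CYRILLIC_A));  // CYRILLIC
        System.out.println("a".equals(String.valueOf(CYRILLIC_A))); // false
    }
}
```

Nothing in the code says anything about how either letter is drawn; that is left entirely to the font.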

So this spider-character-set would have different code points for each
character. It's up to the font designers to worry about how to represent
stickiness or pheromones in their fonts (if they choose to do so at all).

Note that Unicode text isn't necessarily displayed visually; it could
be rendered via speech readers or braille devices. It's almost a natural
fit to represent stickiness via a tactile output device like braille.
Similar devices could be constructed to accurately represent pheromones
olfactorily.
Our ability to do so presumably depends upon the aliens having the
exact same concept as we do as to what a glyph is. The simplest
variation (depending on 3D glyphs perhaps, or animated ones)* is
likely to throw off Unicode.

* - For all I know, Unicode may support this, but you get the idea.

Again, I believe Unicode isn't interested in the actual representation
of these characters.

- Oliver
 