Oh look, another language (ceylon)

Neal Becker · Nov 13, 2013

http://ceylon-lang.org/documentation/1.0/introduction/

Gregory Ewing · Nov 17, 2013

Neal said:
http://ceylon-lang.org/documentation/1.0/introduction/

The type system looks very interesting!

It's just a pity they based the syntax on C rather
than something more enlightened. (Why do people
keep doing that when they design languages?)

Chris Angelico · Nov 17, 2013

The type system looks very interesting!

It's just a pity they based the syntax on C rather
than something more enlightened. (Why do people
keep doing that when they design languages?)

Because in many ways it's an excellent syntactic structure, and - more
importantly - it's one that's familiar to a huge number of
programmers. That's pretty valuable.

ChrisA

jkn · Nov 17, 2013

Hi Stephen

[...]

It's just a pity they based the syntax on C rather than something more
enlightened. (Why do people keep doing that when they design languages?)


When the only tool you've used is a hammer, every tool you design ends up
looking like a hammer.

True, and yet... if I were to design a hammer, would you be justified in assuming that it is the only tool I know about?

J^n

Gregory Ewing · Nov 17, 2013

Mark said:
As a rule of thumb people don't like change? This obviously assumes
that language designers are people

That's probably true (on both counts).

I guess this means we need to encourage more
Pythoneers to become language designers!

Tim Daneliuk · Nov 17, 2013

That's probably true (on both counts).

I guess this means we need to encourage more
Pythoneers to become language designers!

Ahem, I already commented on this in some detail:

https://mail.python.org/pipermail/python-list/2004-September/241055.html

Rick Johnson · Nov 18, 2013

The type system looks very interesting!

Indeed.

I went to the site assuming this would be another language
that I would never like; however, after a few minutes
reading the tour, I could not stop!

I read through the entire tour with excitement, all the while
actually yelling "yes" and sometimes even "yes, yes, YES"!

But not only is the language interesting, the web site
itself is phenomenal! This is a fine example of twenty-first-century
design at work.

I've always found the Python web site to be a cluttered
mess, but ceylon-lang.org is just the opposite! A clean and
simple web site with integrated console fiddling --
heck, they even took the time to place a button near every
example!

Some of the aspects of Ceylon's syntax I find interesting are:

Instead of using single, double, and triple quotes to
represent basically the same literals, Ceylon decided to
implement each uniquely. Also, back-tick interpolation
and Unicode embedding are much more elegant!

The use of a post-fix question mark to denote a
declared Type that can optionally be null.

The ceylon designers ACTUALLY understand what the
word "variable" means!

Immutable attributes, yes, yes, YES!

The multiplication operator can ONLY be used on
numerics. Goodbye subtle bug!

Explicit "return" required in methods/functions!

No "default initialization to null"

No omitting braces in control structures
(Consistency is the key!!!)

The assert statement is much more useful than
Python's

The "tagging" of iterable types using regexp-inspired
syntax "*" and "+" is an interesting idea

Conditional logic is both concise and explicit using
"exists" and "nonempty" over the implicit "if value:"

Range objects are vastly superior to Python's lowly
range() func.

Comprehensions are ordered more logically than in
Python IMO, since I want to know where I'm looking
BEFORE I find out what the return value will be

Ceylon: [for (p in people) p.name]
Python: [p.name for p in people]
Ruby: people.collect{|p| p.name}

Ceylon: [for (i in 0..99) if (i%3==0) i]
Python: [i for i in range(100) if i%3==0]
Ruby: (0..99).select{|x| x%3==0}
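For anyone who wants to run the Python half of these comparisons, here is a minimal sketch. The `Person` class and sample names are hypothetical stand-ins, since the post never defines `people`. Mind the endpoints when translating: a Ceylon range such as `0..99` includes both ends, while Python's `range(100)` stops at 99.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str  # hypothetical record type standing in for "people"


people = [Person("Ann"), Person("Bob")]

# Python: [p.name for p in people]
names = [p.name for p in people]

# Python: [i for i in range(100) if i%3==0]; range(100) yields 0..99.
multiples = [i for i in range(100) if i % 3 == 0]

# The same filter written left to right, closer in shape to Ruby's select.
multiples_linear = list(filter(lambda i: i % 3 == 0, range(100)))

print(names)                          # ['Ann', 'Bob']
print(multiples[:4])                  # [0, 3, 6, 9]
print(multiples == multiples_linear)  # True
```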

Funny thing is, out of all three languages,
Ruby's syntax is linear and therefore
easiest to read. Ruby is the language I
WANT to love but can't, due to too many
inconsistencies. But this example shines!

It's just a pity they based the syntax on C rather
than something more enlightened. (Why do people
keep doing that when they design languages?)

What do you have in mind?

Please elaborate because we could use a good intelligent
conversation, instead of rampant troll posts.

Chris Angelico · Nov 18, 2013

That's probably true (on both counts).

I guess this means we need to encourage more
Pythoneers to become language designers!

Easy! Just make Python really bad in every way except syntax. Then
people will be constantly thinking "If only Python were more X and
less Y... great syntax but the language sucks in so many ways!" and
they'll borrow the syntax into their new languages.

If you're setting out to create a new language, you probably want it
to be "Foo, except X" for some Foo and X. So you'll keep everything
about Foo that doesn't conflict with your changes. I would expect to
see Python-like syntax in a language that's designed to be "Python,
except compilable to C for performance"... and whaddayaknow, Cython
fits that description. Thing is, Python is just so much better than
(C, C#, JavaScript, Java) that there's hardly as much impetus to
create a new language.

ChrisA

Gregory Ewing · Nov 18, 2013

Rick said:
The multiplication operator can ONLY be used on
numerics.

I'm not convinced about that part. I notice that
subtraction, multiplication and division are bundled
into a single interface Numeric, but there is a
separate one called Summable for addition --
apparently so that they could use + for string
concatenation.

This seems to be a case of one rule for the language
designers and a different one for everyone else.
If it's okay for '+' to be used on something that's
not a number, why not '*'?

Chris Angelico · Nov 18, 2013

I'm not convinced about that part. I notice that
subtraction, multiplication and division are bundled
into a single interface Numeric, but there is a
separate one called Summable for addition --
apparently so that they could use + for string
concatenation.

This seems to be a case of one rule for the language
designers and a different one for everyone else.
If it's okay for '+' to be used on something that's
not a number, why not '*'?

That's something Java did (using + for strings, but not supporting
operator overloading for custom classes, so you can't make your own
string-like or number-like class and use + with it), and IMO it's one
of the language's annoying flaws. Give people the power to use
whatever operator they choose in whatever way they choose, and accept
that occasionally you'll get less-than-stellar usage. It's a cost that
you pay happily when you let people name their own functions; why not
give the same freedom for operators?
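Python already takes the approach described here: any class can give `+` and `*` a meaning by defining `__add__` and `__mul__` (plus `__rmul__` for the reflected case). A minimal sketch with a hypothetical `Vec2` class; returning `NotImplemented` for unsupported operands lets Python raise its usual `TypeError`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vec2:
    """Hypothetical 2-D vector used only to illustrate operator overloading."""
    x: float
    y: float

    def __add__(self, other: "Vec2") -> "Vec2":
        if not isinstance(other, Vec2):
            return NotImplemented
        return Vec2(self.x + other.x, self.y + other.y)

    def __mul__(self, k: float) -> "Vec2":
        # Scale by a number; other operand types make Python raise TypeError.
        if not isinstance(k, (int, float)):
            return NotImplemented
        return Vec2(self.x * k, self.y * k)

    __rmul__ = __mul__  # so that 2 * v works as well as v * 2


v = Vec2(1, 2) + Vec2(3, 4)
print(v)         # Vec2(x=4, y=6)
print(2 * v)     # Vec2(x=8, y=12)
print("ab" * 3)  # ababab (built-in str already overloads *)
```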

ChrisA

wxjmfauth · Nov 18, 2013

character
Satisfied Interfaces: Comparable<Character>, Enumerable<Character>, Ordinal<Other>
A 32-bit Unicode character.
Satisfied Interfaces: Category, Cloneable<List<Element>>, Collection<Element>,
Comparable<String>, Correspondence<Integer,Element>, Iterable<Element,Null>,
List<Character>, Ranged<Integer,String>, Summable<String>

string
Satisfied Interfaces: Category, Cloneable<List<Element>>, Collection<Element>,
Comparable<String>, Correspondence<Integer,Element>, Iterable<Element,Null>,
List<Character>, Ranged<Integer,String>, Summable<String>
A string of characters. Each character in the string is a 32-bit Unicode
character. The internal UTF-16 encoding is hidden from clients.
A string is a Category of its Characters, and of its substrings:

Clean. Far, far away from a Unicode implementation which may need
18 more bytes (!) to encode a non-ASCII n-character string than an
ASCII n-character string.
(With performance, as expected, following broadly the same logic.)

jmf

Mark Lawrence · Nov 18, 2013

character
Satisfied Interfaces: Comparable<Character>, Enumerable<Character>, Ordinal<Other>
A 32-bit Unicode character.
Satisfied Interfaces: Category, Cloneable<List<Element>>, Collection<Element>,
Comparable<String>, Correspondence<Integer,Element>, Iterable<Element,Null>,
List<Character>, Ranged<Integer,String>, Summable<String>

string
Satisfied Interfaces: Category, Cloneable<List<Element>>, Collection<Element>,
Comparable<String>, Correspondence<Integer,Element>, Iterable<Element,Null>,
List<Character>, Ranged<Integer,String>, Summable<String>
A string of characters. Each character in the string is a 32-bit Unicode
character. The internal UTF-16 encoding is hidden from clients.
A string is a Category of its Characters, and of its substrings:

Clean. Far, far away from a Unicode implementation which may need
18 more bytes (!) to encode a non-ASCII n-character string than an
ASCII n-character string.
(With performance, as expected, following broadly the same logic.)


jmf

In [3]: sys.getsizeof(1)
Out[3]: 14

What a disaster, 13 bytes wasted storing 1. I'll just rush off to the
bug tracker and raise an issue to get the entire CPython core rewritten
before Armageddon strikes.

Chris Angelico · Nov 18, 2013

string
Satisfied Interfaces: Category, Cloneable<List<Element>>, Collection<Element>,
Comparable<String>, Correspondence<Integer,Element>, Iterable<Element,Null>,
List<Character>, Ranged<Integer,String>, Summable<String>
A string of characters. Each character in the string is a 32-bit Unicode
character. The internal UTF-16 encoding is hidden from clients.
A string is a Category of its Characters, and of its substrings:

I'm trying to figure this out. Reading the docs hasn't answered this.
If each character in a string is a 32-bit Unicode character, and (as
can be seen in the examples) string indexing and slicing are
supported, then does string indexing mean counting from the beginning
to see if there were any surrogate pairs?

ChrisA

Ian Kelly · Nov 18, 2013

I'm trying to figure this out. Reading the docs hasn't answered this.
If each character in a string is a 32-bit Unicode character, and (as
can be seen in the examples) string indexing and slicing are
supported, then does string indexing mean counting from the beginning
to see if there were any surrogate pairs?

The string reference says:

"""Since a String has an underlying UTF-16 encoding, certain operations are
expensive, requiring iteration of the characters of the string. In
particular, size requires iteration of the whole string, and get(), span(),
and segment() require iteration from the beginning of the string to the
given index."""

The get and span operations appear to be equivalent to indexing and slicing.
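To see why those operations are O(n), here is a small Python model of code-point indexing over UTF-16 code units. It only illustrates the behaviour the reference describes; it is not Ceylon's implementation, and `to_utf16_units` and `get` are hypothetical names. Each earlier surrogate pair occupies two 16-bit units, so the offset of the i-th character can only be found by walking from the start.

```python
def to_utf16_units(s: str) -> list[int]:
    """Encode s as a list of 16-bit UTF-16 code unit values."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[k:k + 2], "little") for k in range(0, len(data), 2)]


def get(units: list[int], index: int) -> str:
    """Hypothetical get(): find the index-th code point by walking, O(index)."""
    pos = 0
    for _ in range(index):
        # A high surrogate (0xD800-0xDBFF) begins a two-unit pair.
        pos += 2 if 0xD800 <= units[pos] <= 0xDBFF else 1
    unit = units[pos]
    if 0xD800 <= unit <= 0xDBFF:
        low = units[pos + 1]
        return chr(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
    return chr(unit)


units = to_utf16_units("a\U0001F600b")
print(len(units))     # 4 code units for 3 characters
print(get(units, 2))  # b (found only after stepping over the pair)
```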

Chris Angelico · Nov 18, 2013

The string reference says:

"""Since a String has an underlying UTF-16 encoding, certain operations are
expensive, requiring iteration of the characters of the string. In
particular, size requires iteration of the whole string, and get(), span(),
and segment() require iteration from the beginning of the string to the
given index."""

The get and span operations appear to be equivalent to indexing and slicing.

Right, that's what I was looking for and didn't find. (I was searching
the one-page reference manual rather than reading in detail.) So, yes,
they're O(n) operations. Thanks for hunting that down.

ChrisA

Steven D'Aprano · Nov 18, 2013

I'm trying to figure this out. Reading the docs hasn't answered this. If
each character in a string is a 32-bit Unicode character, and (as can be
seen in the examples) string indexing and slicing are supported, then
does string indexing mean counting from the beginning to see if there
were any surrogate pairs?

I can't figure out what that means, since it contradicts itself. First it
says *every* character is 32-bits (presumably UTF-32), then it says that
internally it uses UTF-16. At least one of these statements is wrong.
(They could both be wrong, but they can't both be right.)

Unless they have done something *really* clever, the language designers
lose a hundred million points for screwing up text strings. There is
*absolutely no excuse* for a new, modern language with no backwards
compatibility concerns to choose one of the three bad choices:

* choose UTF-16 or UTF-8, and have O(n) primitive string operations (like
Haskell and, apparently, Ceylon);

* or UTF-16 without support for the supplementary planes (which makes it
virtually UCS-2), like JavaScript;

* choose UTF-32, and use two or four times as much memory as needed.
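The memory side of these trade-offs is easy to check with Python's codecs. This sketch counts the bytes each encoding (without a BOM) needs for a few sample strings. A single supplementary-plane character such as U+10FFFF takes two UTF-16 code units, which is what forces either the O(n) walk or the UCS-2-style bugs.

```python
samples = {
    "ascii": "hello",
    "bmp": "h\u00e9llo \u20ac",  # 'héllo €', all inside the BMP
    "astral": "a\U0010FFFF",     # one supplementary-plane character
}

for label, text in samples.items():
    sizes = {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}
    print(label, "code points:", len(text), "bytes:", sizes)

# Every code point costs 4 bytes in UTF-32: for ASCII text that is
# 2x the UTF-16 size and 4x the UTF-8 size.
```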

Chris Angelico · Nov 18, 2013

Unless they have done something *really* clever, the language designers
lose a hundred million points for screwing up text strings. There is
*absolutely no excuse* for a new, modern language with no backwards
compatibility concerns to choose one of the three bad choices:

Yeah, but this compiles to JS, so it does have that backward compat
issue - unless it's going to represent a Ceylon string as something
other than a JS string (maybe an array of integers??), which would
probably cost even more.

You're absolutely right, except in the premise that Ceylon is a new
and unshackled language. At least this way, if anyone actually
implements Ceylon directly in the browser, it can use something
smarter as its backend, without impacting code in any way (other than
performance). I'd much rather they go for O(n) string primitives than
maintain the user-visible UTF-16 bug.

ChrisA

Steven D'Aprano · Nov 18, 2013

I can't figure out what that means, since it contradicts itself. First
it says *every* character is 32-bits (presumably UTF-32), then it says
that internally it uses UTF-16. At least one of these statements is
wrong. (They could both be wrong, but they can't both be right.)

Mystery solved: characters are only 32-bits in isolation, when plucked
out of a string.

http://ceylon-lang.org/documentation/tour/language-module/
#characters_and_character_strings

Ceylon strings are arrays of UTF-16 characters. However, the language
supports characters in the Supplementary Multilingual Plane by having
primitive string operations walk the string a code point at a time. When
you extract a character out of the string, Ceylon gives you four bytes.
Presumably, if you do something like this:

# Python syntax, not Ceylon
mystring = "a\U0010FFFF"
c = mystring[0]
d = mystring[1]

c will consist of bytes 0000 0061 and d will consist of the surrogate
pair DBFF DFFF (the UTF-16BE encoding of code point U+10FFFF, modulo
big-endian versus little-endian). Or possibly the UTF-32 encoding, 0010 FFFF.
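The surrogate pair quoted above can be checked by hand. This Python sketch (the helper name is hypothetical) applies the standard UTF-16 split: subtract 0x10000, put the top 10 bits in a high surrogate based at 0xD800 and the bottom 10 bits in a low surrogate based at 0xDC00, then compare with Python's own codecs.

```python
def utf16_surrogates(cp: int) -> tuple[int, int]:
    """Hypothetical helper: split a supplementary code point into a surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF are encoded as surrogate pairs")
    v = cp - 0x10000            # 20-bit offset into the supplementary planes
    high = 0xD800 + (v >> 10)   # top 10 bits
    low = 0xDC00 + (v & 0x3FF)  # bottom 10 bits
    return high, low


high, low = utf16_surrogates(0x10FFFF)
print(f"{high:04X} {low:04X}")  # DBFF DFFF

# Cross-check against Python's codecs (big-endian, no BOM).
print("\U0010FFFF".encode("utf-16-be").hex())  # dbffdfff
print("\U0010FFFF".encode("utf-32-be").hex())  # 0010ffff
print("a".encode("utf-32-be").hex())           # 00000061
```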

I suppose that's not terrible, except for the O(n) string operations
which is just dumb. Yes, it's better than buggy, broken strings. But
still dumb, because those aren't the only choices. For example, for the
sake of an extra two bytes at the start of each string, they could store
a flag and a length:

- one bit to flag whether the string contained any surrogate pairs or
not; if not, string ops could assume two-bytes per char and be O(1), if
the flag was set it could fall back to the slower technique;

- 15 bits for a length.

15 bits give you a maximum length of 32767. There are ways around that.
E.g. a length of 0 through 32766 means exactly what it says; a length of
32767 means that the next four bytes hold the full length, giving you a
maximum of 4294967295 characters per string. At two bytes per character,
that's an 8GB string. Surely big enough for anyone.
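A minimal Python sketch of such a header. The layout details are assumptions: a big-endian 16-bit word whose top bit is the surrogate flag and whose low 15 bits are the length, with 32767 acting as an escape. Reaching a maximum of 4294967295 requires the escape to be followed by a full 32-bit length field, so that is what the sketch uses. `encode_header` and `decode_header` are hypothetical names.

```python
import struct

ESCAPE = 0x7FFF  # all 15 length bits set: "the real length follows"


def encode_header(length: int, has_surrogates: bool) -> bytes:
    """Pack the flag bit and length into a big-endian header (hypothetical layout)."""
    if not 0 <= length <= 0xFFFFFFFF:
        raise ValueError("length out of range")
    flag = 0x8000 if has_surrogates else 0
    if length < ESCAPE:
        return struct.pack(">H", flag | length)           # 2-byte header
    return struct.pack(">HI", flag | ESCAPE, length)      # 2 + 4 bytes


def decode_header(buf: bytes) -> tuple[int, bool, int]:
    """Return (length, has_surrogates, header_size)."""
    (word,) = struct.unpack_from(">H", buf)
    has_surrogates = bool(word & 0x8000)
    length = word & 0x7FFF
    if length == ESCAPE:
        (length,) = struct.unpack_from(">I", buf, 2)
        return length, has_surrogates, 6
    return length, has_surrogates, 2


print(encode_header(5, False).hex())              # 0005
print(decode_header(encode_header(40000, True)))  # (40000, True, 6)
```

Either way the length is read in constant time, which is the point of the proposal.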

That gives you O(1) length for *any* string, and O(1) indexing operations
for those that are entirely in the BMP, which will be most strings for
most people. It's not 1970 anymore, it's time for strings to be treated
more seriously and not just as dumb arrays of char. Even back in the
1970s Pascal had a length byte. It astonishes me that hardly any low-
level language follows their lead.
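The payoff of the flag, again as a hypothetical Python sketch: strings with no surrogates are indexed directly in O(1), while flagged strings fall back to an O(n) path (here simply decoding the whole buffer, one of several possible fallbacks). `make_string` and `char_at` are illustrative names only.

```python
def make_string(s: str) -> tuple[list[int], bool]:
    """Store s as UTF-16 code units plus a 'contains surrogates' flag."""
    data = s.encode("utf-16-le")
    units = [int.from_bytes(data[k:k + 2], "little") for k in range(0, len(data), 2)]
    has_surrogates = any(0xD800 <= u <= 0xDFFF for u in units)
    return units, has_surrogates


def char_at(units: list[int], has_surrogates: bool, index: int) -> str:
    if not has_surrogates:
        # Every character is exactly one 16-bit unit: O(1) lookup.
        return chr(units[index])
    # Slow path: rebuild the text from the start, O(n) in the string length.
    data = b"".join(u.to_bytes(2, "little") for u in units)
    return data.decode("utf-16-le")[index]


bmp_units, bmp_flag = make_string("h\u00e9llo")
astral_units, astral_flag = make_string("a\U0001F600b")
print(bmp_flag, char_at(bmp_units, bmp_flag, 1))          # False é
print(astral_flag, char_at(astral_units, astral_flag, 2))  # True b
```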

Piet van Oostrum · Nov 18, 2013

Chris Angelico said:
Right, that's what I was looking for and didn't find. (I was searching
the one-page reference manual rather than reading in detail.) So, yes,
they're O(n) operations. Thanks for hunting that down.

ChrisA

It would be so much better to use the Flexible String Representation.
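The Flexible String Representation is CPython's PEP 393 layout: each string is stored with 1, 2 or 4 bytes per code point depending on its largest code point, so indexing stays O(1) without paying UTF-32 prices for ASCII text. jmf's complaint above concerns the fixed per-string overhead of this scheme, which is larger for non-ASCII strings. Below is a sketch of the width rule (`fsr_width` is a hypothetical name), with a `sys.getsizeof` check that is only meaningful on CPython 3.3 or later.

```python
import sys


def fsr_width(s: str) -> int:
    """Bytes per code point that CPython's PEP 393 layout uses for s."""
    highest = max(map(ord, s), default=0)
    if highest < 0x100:
        return 1  # ASCII or Latin-1
    if highest < 0x10000:
        return 2  # rest of the Basic Multilingual Plane
    return 4      # supplementary planes


for ch in ("a", "\u00e9", "\u20ac", "\U0010FFFF"):
    # Cost of one more character; an implementation detail of CPython.
    step = sys.getsizeof(ch * 10) - sys.getsizeof(ch * 9)
    print(hex(ord(ch)), fsr_width(ch), step)
```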

Steven D'Aprano · Nov 18, 2013

http://ceylon-lang.org/documentation/1.0/introduction/

I must say there are a few questionable design choices, in my opinion,
but I am absolutely in love with the following two features:

1) variables are constant by default;

2) the fat arrow operator.

By default, "variables" can only be assigned to once, and then not re-
bound:

String bye = "Adios"; //a value
bye = "Adeu"; //compile error

variable Integer count = 0; //a variable
count = 1; //allowed
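Python has nothing built in that matches this for local names: `typing.Final` marks a name as not to be reassigned, but only static checkers such as mypy enforce it. The closest run-time analogue is a frozen dataclass, sketched below with hypothetical `Greeting` and `Counter` classes; note that the error surfaces at run time, not at compile time as in Ceylon.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Greeting:
    bye: str  # hypothetical: assigned once, like a Ceylon value


@dataclass
class Counter:
    count: int = 0  # hypothetical: opted in to mutation, like `variable Integer count`


g = Greeting("Adios")
try:
    g.bye = "Adeu"  # rejected at run time with FrozenInstanceError
except FrozenInstanceError as exc:
    print("rejected:", exc)

c = Counter()
c.count = 1  # allowed
print(g.bye, c.count)  # Adios 1
```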

(I'm not sure how tedious typing "variable" will get, or whether it will
encourage a more functional-programming approach. But I think that's a
very exciting idea and kudos to the Ceylon developers for running with
it!)

Values can be recalculated every time they are used, sort of like mini-
functions, or thunks:

String name { return firstName + " " + lastName; }

Since this is so common in Ceylon, they have syntactic sugar for it, the
fat arrow:

String name => firstName + " " + lastName;
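Python's nearest equivalent is a read-only `property`, whose getter runs on every access. A minimal sketch with a hypothetical `Person` class:

```python
class Person:
    """Hypothetical class mirroring the Ceylon example."""

    def __init__(self, first_name: str, last_name: str) -> None:
        self.first_name = first_name
        self.last_name = last_name

    @property
    def name(self) -> str:
        # Recomputed on every access, like the Ceylon fat-arrow attribute.
        return self.first_name + " " + self.last_name


p = Person("Ada", "Lovelace")
print(p.name)  # Ada Lovelace
p.last_name = "King"
print(p.name)  # Ada King
```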

If Python steals this notation, we could finally bring an end to the
arguments about early binding and late binding of default arguments:

def my_function(a=[early, binding, happens, once],
b=>[late, binding, happens, every, time]
):
...

Want!

These two features alone may force me to give Ceylon a try.
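For comparison, this is how Python code gets late binding today: a default is evaluated once when `def` runs, so a mutable default is shared between calls, and the usual workaround is a sentinel checked inside the body. The `=>` default in the example above is not valid Python; the sketch uses only current syntax, and the function names are hypothetical.

```python
def early(items=[]):  # hypothetical: the default list is created once, at def time
    items.append(1)
    return items


_MISSING = object()  # private sentinel meaning "argument not supplied"


def late(items=_MISSING):  # hypothetical: a fresh list on every call that omits items
    if items is _MISSING:
        items = []
    items.append(1)
    return items


print(early(), early())  # [1, 1] [1, 1]  (the same list object both times)
print(late(), late())    # [1] [1]
```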
