Indentation and optional delimiters


bearophileHUGS

This is the best praise of semantic indentation I have read so far, by
Chris Okasaki:
http://okasaki.blogspot.com/2008/02/in-praise-of-mandatory-indentation-for.html

A quotation:
Imagine my surprise when I started teaching this language and found the students picking it up faster than any language I had ever taught before. As fond as I am of the language, I'm certainly under no illusions that it's the ultimate teaching language. After carefully watching the kinds of mistakes the students were and were not making, I gradually realized that the mandatory indentation was the key to why they were doing better.<

I have appreciated that article, and I have personally seen how fast
students learn Python basics compared to other languages, but I think
that it's way more than just indentation that makes the Python
language so quick to learn [see appendix].

I used to like indentation-based block delimiting years before finding
Python, and this article tells me that it may be a good thing for
other languages too, despite some disadvantages (though it's not very
likely that such languages will change, the D language for example). Some people
have actually tried it in other languages:
http://people.csail.mit.edu/mikelin/ocaml+twt/
So I may try to write something similar for another language too.

One of the most common complaints about it is this written by James on
that blog:
I prefer explicit delimiters because otherwise the line wrapping of code by various email programs, web mail, mailing list digesters, newsgroup readers, etc., often results in code that no longer works.<

A possible solution to this problem is "optional delimiters". What's
the path of least resistance to implement such "optional delimiters"?
To use comments, for example #} or #: or something similar.
If you use such pairs of symbols in a systematic way, you have cheap
"optional delimiters", for example:

def insort_right(a, x, lo=0, hi=None):
    if hi is None:
        hi = len(a)
    #}
    while lo < hi:
        mid = (lo + hi) // 2
        if x < a[mid]:
            hi = mid
        #}
        else:
            lo = mid+1
        #}
    #}
    a.insert(lo, x)
#}

It looks a bit ugly, but a script is able to take such code even
flattened:

def insort_right(a, x, lo=0, hi=None):
if hi is None:
hi = len(a)
#}
while lo < hi:
mid = (lo + hi) // 2
if x < a[mid]:
hi = mid
#}
else:
lo = mid+1
#}
#}
a.insert(lo, x)
#}

And build the original Python code (it's possible to do the opposite
too, but it requires a slightly more complex script). Such #} markers may even
become a "standard" for Python: a convention, not something enforced by the
compiler (what I'd like to see the Python compiler enforce is raising
a syntax error if a module mixes tabs and spaces). Then it would be
easy to find the re-indenter script when you find flattened code
in some email, web page, etc.
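
A minimal sketch of such a re-indenter (just an illustration of the
idea: it assumes the #} convention above and detects block openers with
a naive "line ends with a colon" test, so strings, comments and
multi-line expressions would fool it):

import sys

def reindent(flat_source, tab="    "):
    # Rebuild indentation from flattened code whose block ends are marked "#}".
    out = []
    level = 0
    for line in flat_source.splitlines():
        stripped = line.strip()
        if not stripped:
            out.append("")
        elif stripped == "#}":
            # A block just closed: the marker goes at the level of its opener.
            level = max(level - 1, 0)
            out.append(tab * level + stripped)
        elif stripped.endswith(":"):
            # Naive rule: a trailing colon opens a new block (def, if, else, ...).
            out.append(tab * level + stripped)
            level += 1
        else:
            out.append(tab * level + stripped)
    return "\n".join(out) + "\n"

flat = """\
def insort_right(a, x, lo=0, hi=None):
if hi is None:
hi = len(a)
#}
while lo < hi:
mid = (lo + hi) // 2
if x < a[mid]:
hi = mid
#}
else:
lo = mid+1
#}
#}
a.insert(lo, x)
#}
"""

sys.stdout.write(reindent(flat))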

-------------------------------

Appendix:
I believe there can exist languages that are even faster for novices to
learn than Python. Python 3.0 makes Python even more tidy, but Python itself
isn't the most semantically clear language possible. I have seen that
the widespread reference semantics in Python is one of the things
newbies need more time to learn and understand. So a language could be
invented (one that may be slower than Python, but many tricks and a JIT
may help to reduce this problem) where

a = [1, 2, 3]
b = a
Makes b a copy-on-write copy of a, that is without reference
semantics.
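
Something like this toy Python class can sketch the behaviour I mean
(CowList, clone() and the shared-box bookkeeping are invented here
purely for illustration; it is not how Python or any existing language
works):

import copy

class CowList(object):
    """Toy copy-on-write list: clones share storage until one of them mutates."""

    def __init__(self, data, _box=None):
        # _box holds the shared payload and a count of CowLists using it.
        self._box = _box if _box is not None else {"data": data, "refs": 1}

    def clone(self):
        # Cheap "copy": just point another CowList at the same box.
        self._box["refs"] += 1
        return CowList(None, _box=self._box)

    def _detach(self):
        # Before mutating, take a private copy if anyone else shares the box.
        if self._box["refs"] > 1:
            self._box["refs"] -= 1
            self._box = {"data": copy.deepcopy(self._box["data"]), "refs": 1}

    def append(self, item):
        self._detach()
        self._box["data"].append(item)

    def __getitem__(self, i):
        return self._box["data"][i]

    def __repr__(self):
        return "CowList(%r)" % (self._box["data"],)

a = CowList([1, 2, 3])
b = a.clone()    # in the hypothetical language this would just be "b = a"
b.append(4)      # only now does the data actually get duplicated
print(a)         # CowList([1, 2, 3])
print(b)         # CowList([1, 2, 3, 4])
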
Other things, like base-10 floating point numbers and the removal of
other complexity, allow the creation of a more newbie-friendly language. And
then I think I too can see myself using such a simple but
practically useful language for very quick scripts, where running
speed is less important, but where most possible bugs are avoided
because the language semantics is rich but very clean. Is someone
else interested in such a language?
Such a language may even be a bit better suited than Python for an (uncommon)
practice called "real time programming", where an artist writes code
that synthesizes sounds and music on the fly ;-)
 

castironpi

bearophileHUGS:
a = [1, 2, 3]
b = a
Makes b a copy-on-write copy of a, that is without reference
semantics.

Why not b = copyonwrite( a )?

Subclass the interpreter-- make your own session.
 

bearophileHUGS

castironpi:
Why not b = copyonwrite( a )?
Subclass the interpreter-- make your own session.

Your idea may work, but I am talking about a new language (with some
small differences, not a revolution). Making such language efficient
enough may require to add some complex tricks, copy-on-write is just
one of them, a JIT is probably useful, etc.

Thank you, bye,
bearophile
 

castironpi

bearophileHUGS:


Your idea may work, but I am talking about a new language (with some
small differences, not a revolution). Making such language efficient
enough may require to add some complex tricks, copy-on-write is just
one of them, a JIT is probably useful, etc.

Thank you, bye,
bearophile

It's Unpythonic to compile a machine instruction out of a script. But
maybe in the right situations, with the right constraints on a
function, certain chunks could be native, almost like a mini-
compilation. How much machine instruction do you want to support?
 

bearophileHUGS

castironpi:
It's Unpythonic to compile a machine instruction out of a script. But
maybe in the right situations, with the right constraints on a
function, certain chunks could be native, almost like a mini-
compilation. How much machine instruction do you want to support?

This language is meant for newbies, or for very quick scripts, or for
less bug-prone code, so optimizations are just a way to avoid such
programs running 5 times slower than Ruby ones ;-)

Bye,
bearophile
 

castironpi

bearophileHUGS:


This language is meant for newbies, or for very quick scripts, or for
less bug-prone code, so optimizations are just a way to avoid such
programs running 5 times slower than Ruby ones ;-)

Bye,
bearophile

My first thought is to accept ambiguities, and then disambiguate them
at first compile. Whether you want to record the disambiguations in
the script itself ("do not modify -here-"-style), or an annotation
file, could be optional, and could be both. Queueing an example... .
You could lose a bunch of the parentheses too, oy.

"It looks like you mean, 'if "jackson" exists in namesmap', but there
is also a 'namesmap' folder in the current working directory. Enter
(1) for dictionary, (2) for file system."

[snip]
if 'jackson' in namesmap:
->
if 'jackson' in namesmap: #namesmap.__getitem__
[snip]

automatically.

And while you're at it, get us Starcrafters a command-line interface.

build 3 new barracks at last click location
produce at capacity 30% marines, 20% siege tanks, 10% medics
attack hotspot 9 in attack formation d

def d( army, enemy, terrain ):. ha?
 

Steven D'Aprano

So it can be invented a language
(that may be slower than Python, but many tricks and a JIT may help to
reduce this problem) where

a = [1, 2, 3]
b = a
Makes b a copy-on-write copy of a, that is without reference semantics.

Usability for beginners is a good thing, but not at the expense of
teaching them the right way to do things. Insisting on explicit requests
before copying data is a *good* thing. If it's a gotcha for newbies,
that's just a sign that newbies don't know the Right Way from the Wrong
Way yet. The solution is to teach them, not to compromise on the Wrong
Way. I don't want to write code where the following is possible:

a = [gigabytes of data]
b = a
f(a) # fast, no copying takes place
g(b) # also fast, no copying takes place
.... more code here
.... and pages later
b.append(1)
.... suddenly my code hits an unexpected performance drop
.... as gigabytes of data get duplicated
 

wolfram.hinderer

bearophileHUGS:
A possible solution to this problem is "optional delimiters". What's
the path of least resistance to implement such "optional delimiters"?
To use comments, for example #} or #: or something similar. [...]
And build the original Python code (it's possible to do the opposite
too, but it requires a slightly more complex script).

Have a look at Tools/Scripts/pindent.py
 

bearophileHUGS

Steven D'Aprano:
Usability for beginners is a good thing, but not at the expense of
teaching them the right way to do things. Insisting on explicit requests
before copying data is a *good* thing. If it's a gotcha for newbies,
that's just a sign that newbies don't know the Right Way from the Wrong
Way yet. The solution is to teach them, not to compromise on the Wrong
Way. I don't want to write code where the following is possible:
...
... suddenly my code hits an unexpected performance drop
... as gigabytes of data get duplicated

I understand your point of view, and I tend to agree.
But let me express my other point of view. Computer languages are a
way to ask a machine to do some job. As time passes, computers become
faster, and people find that it becomes possible to create languages
that are higher level, that is often more distant from how the CPU
actually performs the job, allowing the human to express the job in a
way closer to how less trained humans talk to each other and perform
jobs. Probably many years ago a language like Python was too
costly in terms of CPU, making it of little use for most non-toy
purposes. But there's a need for higher level computer languages.
Today Ruby is a bit higher-level than Python (despite being rather
close). So my mostly alternative answers to your problem are:
1) The code goes slow if you try to perform that operation? It means
the JIT is "broken", and we have to find a smarter JIT (and the user
will look for a better language). A higher level language means that
the user is more free to ignore what's under the hood, the user just
cares that the machine will perform the job, regardless how, the user
focuses the mind on what job to do, the low level details regarding
how to do it are left to the machine. It's a job of the JIT writers to
allow the user to do such job anyway. So the JIT must be even smarter,
and for example it partitions the 1 GB of data in blocks, each one of
them managed with copy-on-write, so maybe it just copies a few megabytes
of memory. Such a language may need to be smart enough. Despite that, I
think today a lot of people who have a 3 GHz CPU may accept using
a language 5 times slower than Python, one that for example uses base-10
floating point numbers (they are different from Python Decimal
numbers). Almost every day on the Python newsgroup a newbie asks if
round() is broken after seeing this:
>>> round(1/3.0, 2)
0.33000000000000002
A higher level language (like Mathematica) must be designed to give
more numerically correct answers, even if it may require more CPU. But
such language isn't just for newbies: if I write a 10 lines program
that has to print 100 lines of numbers I want it to reduce my coding
time, avoiding me to think about base-2 floating point numbers. If the
language use a higher-level numbers by default I can ignore that
problem, and my coding becomes faster, and the bugs decrease. The same
happens with Python integers: they don't overflow, so I may ignore a lot
of details (like taking care of possible overflows) that I have to
think about when I use the C language. C is faster, but such speed
isn't necessary if I need to just print 100 lines of output with a 3
GHz PC. What I need in such situation is a language that allows me to
ignore how numbers are represented by the CPU, and prints the correct
numbers on the file. This is just a silly example, but it may show my
point of view (another example is below; a small Decimal comparison is
also shown right after point 3).
2) You don't process gigabytes of data with this language, it's
designed to solve smaller problems with smaller datasets. If you want
to solve very big problems you have to use a lower level language,
like Python, or C, or assembly. Computers allow us to solve bigger and
bigger problems, but today life is full of little problems too,
like processing a single 50-line text file.
3) You buy an even faster computer, where even copying 1 GB of data is
fast enough.
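
Here is the small Decimal comparison mentioned in point 1 (plain Python
using the standard decimal module; the exact float repr depends on the
Python version, newer releases print the short form):

from decimal import Decimal

# Binary floating point: 0.33 has no exact base-2 representation, so
# older Python versions repr it with the surprising trailing digits.
print(repr(round(1/3.0, 2)))      # 0.33000000000000002 on Python 2.5/2.6

# Base-10 arithmetic stores 0.33 exactly and shows 1/3 the way people expect.
print(Decimal("0.33"))            # 0.33
print(Decimal(1) / Decimal(3))    # 0.3333333333333333333333333333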


Wolfram:
Have a look at Tools/Scripts/pindent.py

Oh, that's it, almost. Thank you.
Bye,
bearophile

-----------------------

Appendix:

Another example, this is a little problem from this page:
http://www.faqs.org/docs/abs/HTML/writingscripts.html
Find the sum of all five-digit numbers (in the range 10000 - 99999) containing exactly two out of the following set of digits: { 4, 5, 6 }. These may repeat within the same number, and if so, they count once for each occurrence.<

I can solve it in 3.3 seconds on my old PC with Python like this:

print sum(n for n in xrange(10000, 100000) if len(set(str(n)) &
set("456")) == 2)

[Note: that's the second version of the code, the first version was
buggy because it contained:
.... & set([4, 5, 6])

So I have used the Python shell to see what set(str(12345)) & set([4, 5, 6])
was: the result was an empty set. So it's a type bug. A statically
typed language like D often can't catch such bugs anyway, because chars are
seen as numbers.]
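
A quick shell session shows the type mismatch (Python 2 reprs; sorted()
is only there to make the output order stable):

>>> set(str(12345)) & set([4, 5, 6])   # ints never equal one-char strings
set([])
>>> sorted(set(str(12345)) & set("456"))
['4', '5']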

In Python I can write a low-level-style code like this that requires
only 0.4 seconds with Psyco (it's backported from the D version,
because it has allowed me to think at lower-level. I was NOT able to
reach such low level and high speed writing a program just for Psyco):

def main():
    digits = [0] * 10
    tot = 0
    for n in xrange(10000, 100000):
        i = n
        digits[4] = 0
        digits[5] = 0
        digits[6] = 0
        digits[i % 10] = 1; i /= 10
        digits[i % 10] = 1; i /= 10
        digits[i % 10] = 1; i /= 10
        digits[i % 10] = 1; i /= 10
        digits[i % 10] = 1
        if (digits[4] + digits[5] + digits[6]) == 2:
            tot += n
    print tot
import psyco; psyco.bind(main)
main()


Or I can solve it in 0.07 seconds in D language (and about 0.05
seconds in very similar C code with -O3 -fomit-frame-pointer):

void main() {
    int tot, d, i;
    int[10] digits;
    for (uint n = 10_000; n < 100_000; n++) {
        digits[4] = 0;
        digits[5] = 0;
        digits[6] = 0;
        i = n;
        digits[i % 10] = 1; i /= 10;
        digits[i % 10] = 1; i /= 10;
        digits[i % 10] = 1; i /= 10;
        digits[i % 10] = 1; i /= 10;
        digits[i % 10] = 1;
        if ((digits[4] + digits[5] + digits[6]) == 2)
            tot += n;
    }
    printf("%d\n", tot);
}

Assembly may suggest a bit lower level ways to solve the same problem
(using an instruction to compute div and mod at the same time, that
can go in EAX and EDX?), etc.

But if I just need to solve that "little" problem once, I may want to
reduce the sum of programming time + running time, so in such a
situation the first Python version wins (despite the quickly fixed
bug). That's why today people often use Python instead of C for small
problems. Similar things can be said about a possible language that is
a little higher level than Python.

Bye,
bearophile
 

castironpi

bearophileHUGS:
But if I just need to solve that "little" problem once, I may want to
reduce the sum of programming time + running time, so in such a
situation the first Python version wins (despite the quickly fixed
bug). That's why today people often use Python instead of C for small
problems.

You're looking at a few variables.
1) Time to code as a function of person / personal characteristic and
program
2) Time to run as a function of machine and program
3) Bugs (distinct bugs) as a function of person / personal
characteristic and program
3a) Bug's obviousness upon running ... ( person, program ) -- the
program screwed up, but person can't tell 'til later -- ( for program
with exactly one bug, or func. of ( person, program, bug ) )
3b) Bug's time to fix ( person, program [, bug ] )
3c) Bug incidence -- count of bugs the first time through ( person,
program )

(3) assumes you have experts and you're measuring number of bugs &c.
compared to a bug-free ideal in a lab. If no one knows if a program
(say, if it's large) has bugs, high values for (3a) might be
important.
(1)-(3) define different solutions to the same problem as different
programs, i.e. the program states its precise implementation, but then
the only thing that can vary data point to data point is variable
names, i.e. how precise the statement, and only to a degree: you might
get bugs in a memory manager even if you reorder certain ("ideally
reorderable") sequences of statements; and you might get variations if
you use parallel arrays vs. structures vs. parallel variable names.
Otherwise, you can specify an objective, deterministic, not
necessarily binary, metric of similarity and identity of programs.
Otherwise yet, a program maps input to output (+ residue, the
difference in machine state start to completion), so use descriptive
statistics (mean, variance, quartiles, outliers, extrema) on the
answers. E.g., for (2), the fastest C program (sampled) (that maps I-
O) way surpasses the fastest Perl program (sampled), and it was
written by Steve C. Guru, and we couldn't find Steve Perl Guru; and
besides, the means across programs in C and Perl show no statistically
significant difference at the 96% confidence level. And besides,
there is no algorithm to generate even the fastest-running program (of
a problem/spec) for a machine in a language, much less (1) and (3)!
So you're looking at ( coder with coder trait or traitless, program
problem, program solution, language implementation, op'ing sys.,
hardware, initial state ) for variables in your answers. That's one of
the obstructions anyway to rigorous metrics of languages: you never
run the language. (Steve Traitless Coder-- v. interesting.-- given
nothing but the problem, the install and machine, and the internet--
throw doc. traits and internet connection speed in!-- how good is a
simple random sample?-- or Steve Self-Proclaimed Non-Zero Experience
and Familiarity Perl Coder, or Steve Self-Proclaimed Non-Trivial
Experience and Familiarity Perl Coder.)

And don't forget a bug identity metric too-- if two sprout up while
fixing one, is that one, two, or three? Do the answers to (1) and (2)
vary with count of bugs remaining? If a "program" maps input to
output, then Python has never been written.

That doesn't stop you from saying what you want though - what your
priorities are:
1) Time to code. Important.
2) Time to run. Unimportant.
3a) Bug obviousness. Important.
3b) Bug time to fix. Important.
3c) Bug incidence. Less important.

Ranked.
1) Time to code.
2) Bug obviousness. It's ok if Steve Proposed Language Guru rarely
codes ten lines without a bug, so long as he can always catch them
right away.
3) Bug time to fix.
4) Bug incidence.
unranked) Time to run.

Are you wanting an interpreter that runs an Amazon Cloud A.I. to catch
bugs? That's another $0.10, please, ma'am.
b.append(1)
... suddenly my code hits an unexpected performance drop

Expect it, or use a different data structure.
 

Steven D'Aprano

By the way bearophile... the readability of your posts will increase a
LOT if you break it up into paragraphs, rather than use one or two giant
run-on paragraphs.

My comments follow.



Steven D'Aprano:

I understand your point of view, and I tend to agree. But let me express
my other point of view. Computer languages are a way to ask a machine to
do some job. As time passes, computers become faster,

But never fast enough, because as they get faster, we demand more from
them.

and people find
that it becomes possible to create languages that are higher level, that
is often more distant from how the CPU actually performs the job,
allowing the human to express the job in a way closer to how less
trained humans talk to each other and perform jobs.

Yes, but in practice, there is always a gap between what we say and what
we mean. The discipline of having to write down precisely what we mean is
not something that will ever go away -- all we can do is use "bigger"
concepts, and thus change the places where we have to be precise.

e.g. the difference between writing

index = 0
while index < len(seq):
    do_something_with(seq[index])
    index += 1

and

for x in seq:
    do_something_with(x)


is that iterating over an object is, in some sense, a "bigger" concept
than merely indexing into an array. If seq happens to be an appropriately-
written tree structure, the same for-loop will work, while the while loop
probably won't.
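
A tiny illustration of that point (Tree is a made-up class here: it
defines __iter__ but no __len__ or __getitem__, so only the for-loop
version works on it):

class Tree(object):
    """Minimal binary tree that supports iteration but not indexing."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

    def __iter__(self):
        # In-order traversal: left subtree, this node, right subtree.
        if self.left is not None:
            for v in self.left:
                yield v
        yield self.value
        if self.right is not None:
            for v in self.right:
                yield v

seq = Tree(2, Tree(1), Tree(3))

for x in seq:      # works: iteration only needs __iter__
    print(x)       # prints 1, 2, 3

# len(seq) and seq[0] both raise TypeError, so the index-based while
# loop above cannot be used on this structure.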

Probably many years
ago a language like Python was too costly in terms of CPU, making
it of little use for most non-toy purposes. But there's a need for
higher level computer languages. Today Ruby is a bit higher-level than
Python (despite being rather close). So my mostly alternative answers to
your problem are: 1) The code goes slow if you try to perform that
operation? It means the JIT is "broken", and we have to find a smarter
JIT (and the user will look for a better language).
[...]

Of course I expect that languages will continue to get smarter, but there
will always be a gap between "Do What I Say" and "Do What I Mean".

It may also turn out that, in the future, I won't care about Python4000
copying ten gigabytes of data unexpectedly, because copying 10GB will be
a trivial operation. But I will care about it copying 100 petabytes of
data unexpectedly, and complain that Python4000 is slower than G.

The thing is, make-another-copy and make-another-reference are
semantically different things: they mean something different. Expecting
the compiler to tell whether I want "x = y" to make a copy or to make
another reference is never going to work, not without running "import
telepathy" first. All you can do is shift the Gotcha! moment around.

You should read this article:

http://www.joelonsoftware.com/articles/fog0000000319.html


It specifically talks about C, but it's relevant to Python, and all
hypothetical future languages. Think about string concatenation in Python.
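
For instance, the leak the article describes shows up in Python with
string concatenation (a sketch, not code from the article): both
functions below build the same string, but the first one re-copies
everything accumulated so far on every +=, while the second copies the
pieces just once.

# Roughly quadratic: strings are immutable, so each += builds a new
# string containing everything accumulated so far.
def build_slow(parts):
    s = ""
    for p in parts:
        s += p
    return s

# Roughly linear: collect the pieces and let join() copy them in one pass.
def build_fast(parts):
    return "".join(parts)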



A higher level
language means that the user is more free to ignore what's under the
hood, the user just cares that the machine will perform the job,
regardless how, the user focuses the mind on what job to do, the low
level details regarding how to do it are left to the machine.

More free, yes. Completely free, no.


Despite that, I think today a lot of people who have a 3 GHz CPU
may accept using a language 5 times slower than Python, one that for
example uses base-10 floating point numbers (they are different from
Python Decimal numbers). Almost every day on the Python newsgroup a
newbie asks if round() is broken after seeing this:
>>> round(1/3.0, 2)
0.33000000000000002
A higher level language (like Mathematica) must be designed to give more
numerically correct answers, even if it may require more CPU. But such
language isn't just for newbies: if I write a 10 lines program that has
to print 100 lines of numbers I want it to reduce my coding time,
avoiding me to think about base-2 floating point numbers.

Sure. But all you're doing is moving the Gotcha around. Now newbies will
start asking why (2**0.5)**2 doesn't give 2 exactly when (2*0.5)*2 does.
And if you fix that by creating a surd data type, at more performance
cost, you'll create a different Gotcha somewhere else.
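
Both behaviours are easy to check in a Python shell:

>>> (2 ** 0.5) ** 2    # sqrt(2) cannot be stored exactly as a binary float
2.0000000000000004
>>> (2 * 0.5) * 2      # 0.5 and 1.0 are exact binary floats
2.0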

If the
language use a higher-level numbers by default I can ignore that
problem,

But you can't. The problem only occurs somewhere else: Decimal is base
10, and there are base 10 numbers that can't be expressed exactly no
matter how many bits you use. They're different from the numbers you
can't express exactly in base 2 numbers, and different from the numbers
you can't express exactly as rationals, but they're there, waiting to
trip you up:
>>> from decimal import Decimal as d
>>> x = d(1) / d(3)
>>> x
Decimal("0.3333333333333333333333333333")
>>> assert x*3 == d(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError
 

castironpi

Steven D'Aprano:
Sure. But all you're doing is moving the Gotcha around. Now newbies will
start asking why (2**0.5)**2 doesn't give 2 exactly when (2*0.5)*2 does.
And if you fix that by creating a surd data type, at more performance
cost, you'll create a different Gotcha somewhere else.

"Gotcha"s is what I meant by "bugs". Something in the language
doesn't follow the writer's flow of thought. You can bring thoughts
to match the language with practice-- piano and chess spring to mind--
and you can bring the language to match thoughts. But whose, and what
are they? A good start is the writer's native spoken language (or
perhaps a hypothetical language for musicians, the libraries of which
resemble intervals and harmonies, and syntax rhythms), but context
(user history) is so richly populated, and computer languages so
young, that they don't really measure. No pun intended. It'd be like
(bear with me) dozens of different versions of a given import library,
each with subtleties that makes calling functions in it really fluid.

Three thought experiments: One, remove all the identifiers from a
program; make them numbers or unrelated names. You can have the
source to the libraries but their identifiers are gone too. And you
can have the specs for the architecture. So, after days of study, you
narrowed down what the file write statement is--- which got harder as
the language got higher-level. No you can't run it; no comments.
(What about strings and literals? Hmmm, hear me to the point first.)
However, a name is consistent from one usage to another: you can trace
them, but not infer anything from them.

For the second one, simulate many typos in a working program, and hand
it to a human to correct. (For a variation, the compiler can be
present and running is fine.) The typos include missing parentheses
(ouch), mispelled identifiers, misordered statements (!?), and
indentation mistakes ( / brace misplacements ).

Last, take a working program, and explain it only in human language to
another person who has no computer experience (none), but does accept
you want to do these things, and (due to some other facet of the
situation) wants to learn. He's asked later by another party about
details of it, in order to verify understanding.

Your success in these examples is a benchmark for computer performance
in understanding what a human wrote. It doesn't know what you mean,
but if any one of the three aspects the experiments illustrated is
missing, here's my point, the human doesn't either.

The more you have in common with the reader, the easier it is for him
to decipher your code. If he speaks the same language (human -and-
computer) as you, took the same class as you, works on the same
project as you at work, and has known you personally for a really long
time, maybe it would go pretty quick in the first two examples. With
a language in common, a knowledge of math and science, and a history
together, the third is quick too. But take a foreign stranger really
in to astrology, and the third is slow; take an author from a
different background as you, and the first two are slow too.

What does that say about the perfect computer language?

It can tolerate, or at least question you about, a few errors. ("Did
you mean...?") It can refer to idiosynchracies you've used elsewhere
in your code. It can refer to other mistakes you've made in (grin!)
other programs, and what their resolution was. You're plagued by a
certain gotcha more than other speakers, but your spelling is
terrific, so that's probably not the problem, which makes something
else more likely to be.

For the examples, they may not be so easy to find in Python,
considering the One Obvious Way principle, but they may abound in
another, and I don't just mean brace placement.

Lastly, to be specific: the language of course can't question you
about anything. But something about the perfect language, either
syntax or libraries, makes tools possible that can.
 

bearophileHUGS

Steven D'Aprano:
the readability of your posts will increase a LOT if you break it up into paragraphs,<

You are right, I'll try to do it (when I get into a flow state I write
quickly, but I tend to produce long paragraphs).

The thing is, make-another-copy and make-another-reference are semantically different things: they mean something different. Expecting the compiler to tell whether I want "x = y" to make a copy or to make another reference is never going to work,<

But the default behavior may become the "true" copy, that seems
simpler for a newbie to grasp. The language then may give a tool to
use references too (like passing arrays to functions in Pascal, you
can use "var" for pass-by-reference reference).

It specifically talks about C, but it's relevant to Python, and all hypothetical future languages. Think about string concatenation in Python.<

It's a nice article, and it says many things. (The D language manages
strings in a good enough way: they are represented under the hood as
stack-allocated structs of [pointer_begin, length] in UTF8/16/32 (C
strings are available too, static arrays of chars too, etc) that point
to a block on the garbage-collected heap. Such a representation may change into a
[pointer_begin, pointer_end] in the future, to speed up iterations.
But it still lacks a good way to do a reserve() like in the C++ STL
vector, which requires a third value in the struct).

I presume you have linked me that article because it essentially
explains the concept of "leaky abstractions", that is, even if your
system allows you to manage it through high-level abstractions, you
often have to know what's under the cover anyway, because the behavior
of such subsystems may have a large impact on the performance of the
high-level operations you try to perform.
If this is why you have pointed me that article, then I have the
following comments:

1) That's why when I have started learning Python I was asking here
about the computational complexity of various operations done by
Python, like the string methods. Later I have found that the best
strategy is to just perform thousands of little benchmarks. Even later
I have started to just read the C sources of Python :)

2) The language I was talking about isn't meant to replace C or Java,
it's meant for newbies, students, and to be used on small problems. If
the problem is small enough, you can survive even if you ignore the
subsystems of the system you are using. So if you have 10 small
strings that you want to join once in a Python program you can use the
"+" too, without wasting too much running time. If you want more speed
or you want to solve bigger problems you can use different languages.

3) Subsystems may have a different degree of subsystem insulation:

3a) One subsystem that exists when I run a Python program is its GC,
but in most situations I can just ignore it, it works in a transparent
way, the abstraction doesn't leak much. Only in rare situations I may
have to disable it during a very fast creation of a complex data
structure, so it doesn't slow down that too much. When I do even more
complex things, with complex classes that use __del__ I may have to
think about the GC even more, but in most small programs the Python GC
is transparent.

3b) On the other hand, when I use the D language in a simple way I can
ignore its GC, using D almost as Java. But often I have to use D at a
lower level (because otherwise I prefer to use Python), I have to
manually allocate and deallocate data from the C heap or from the GC
heap, and in such cases the situation becomes hairy quickly, because I
don't know how exactly the GC will interact with my manual memory
management (that Python disallows). So often in D the GC is an almost
mysterious subsystem, and a very leaking abstraction (if you do
complex windows programming in C++, with smart pointers, etc, you may
have similar problems, so it's not a problem limited to D).

They're different from the numbers you can't express exactly in base 2 numbers, and different from the numbers you can't express exactly as rationals, but they're there, waiting to trip you up:<

If you want to become a CS student, a programmer, you have to know
something about IEEE 754, or even you have to study "What Every
Computer Scientist Should Know About Floating-Point Arithmetic". But
for a person who doesn't want to spend so much time solving a small
and unimportant problem, and doesn't want to hire a programmer to
do that for him/her, a very high level language may
offer ways to do the same mathematical operations without too many
problems. If you use an old HP 48 calculator you have to be careful,
but often it gives you good answers :)

Thank you,
bearophile
 

castironpi

But the default behavior may become the "true" copy, that seems
simpler for a newbie to grasp. The language then may give a tool to
use references too (like passing arrays to functions in Pascal, you
can use "var" for pass-by-reference reference).

Do you want all the power? Do you want to take students in a specific
direction? If you're preparing them for the language of tomorrow, then
ask the people who foresaw C++ from C. And visionaries are good
resources-- the person who designed it might cut it too.

It sounds like you want to give them control of pretty much every
aspect if they choose, but can wave hands at a very high level too
("But -this- is important"-style).

But that's what Python does! Write your own structure in C, and the
program in Python. Perhaps for extra credit, you can have
students tweak a 10% running time by "upgrading" their structure to
a... ahem... lower language.

Last but not least, try lopping off a few parentheses:

if obj exists:
->
if obj.exists() if hasattr( obj, 'exists' )
if exists( obj ) if os.path.exists is imported, or callable( exists )
in general
->
RuntimeAmbiguityError if both are true.

What do you think?
 

Terry Reedy

| But the default behavior may become the "true" copy, that seems
| simpler for a newbie to grasp.

To me, it is the opposite. If I say
gvr = Guido_van_Rossum # or any natural language equivalent
do you really think a copy is made?

Copying is much more work than defining an alias or nickname.
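
The difference is easy to show in plain Python (nothing hypothetical
here):

a = [1, 2, 3]
b = a              # alias: b is just another name for the same list object
b.append(4)
print(a)           # [1, 2, 3, 4]  -- the change is visible through a

c = list(a)        # explicit copy: c is a new, independent list
c.append(5)
print(a)           # [1, 2, 3, 4]  -- a is unaffected
print(c)           # [1, 2, 3, 4, 5]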
 

castironpi

| But the default behavior may become the "true" copy, that seems
| simpler for a newbie to grasp.

To me, it is the opposite.  If I say
gvr = Guido_van_Rossum # or any natural language equivalent
do you really think a copy is made?

Copying is much more work than defining an alias or nickname.

It's interesting. If I say minigvr= Guido_van_Rossum except smaller,
my listener carries both the original model and the exception around
during the conversation. minigvr= type( 'MiniGvr',
( Guido_van_Rossum, ), dict( size= 0.5 ) )(), creates a MiniGvr class -
and- -instantiates- it, just with a different size. It depends-- if
you say, "what if mini-guido were to go to the store" it's very
different from saying, "what if mini-guidos were to go to the store"?
The first 'mini-guido' is an instance, and you're running the
hypothesis in a sandbox or playpen. The second is different. The
listener subclasses Guido_van_Rossum, but still in the sandbox. If
you said, 'a mini-guido went to the store yesterday', the listener
would run type( 'MiniGvr', ( Guido_van_Rossum, ), dict( size= 0.5 ) )
().goestostore( time= Time.Yesterday ).

What's more, you can the next day say, "Remember that miniguido that
went to the store," "Remember that miniguido I told you about
yesterday," or even, "Remember that miniguido that went to the store
that I told you about?" and your listener can say, "Yeah, that was two
days ago now." However, isinstance checks sometimes don't work:
"Remember that GvR that went to the store?" "No, you never told me
GvR went to the store.... wait, unless you mean the -Mini-GvR."

Your actual call is closer to this: memory.add( event=
Event.GoToStore( type( 'MiniGvr', ( Guido_van_Rossum, ), { 'size':
0.5 } )() ), time= Time.Yesterday ), but Go is also abstracted
( (destination= Store) ), Store is abstracted (SomeStore), (unless you
and the listener have a "the store" you always call in common), and he
may actually interpret "was at the store" instead of "went to",
depending on your interaction's particular idiolect; you might use
slightly different words to tell a co-worker the same story. Last but
not least, the data that filters down into the hardware of the brain
is partly non-propositional-- the listener gets a somewhat clear
picture in your preamble, which varies in clarity and what detail from
speaker to speaker and listener to listener pair.

If you want a computer language to model human thought, then is there
even such thing as subclassing? Otherwise, it's a toolbox, and pass-
by-reference is to a flathead screwdriver as pass-by-value is to a
Phillips. Which one do you want to carry, and which is the snap-on
extension? Doesn't that vary trade-to-trade, tradesman-to-tradesman,
and even site-to-site?
 

Steve Holden

> If you want a computer language to model human thought, then is there
even such thing as subclassing?

Kindly try to limit your ramblings to answerable questions. Without keen
insight into the function of the mind that is currently available to the
psychological and psychiatric professions, such philosophical
speculation is unlikely to do anyone any good, least of all you.

regards
Steve
 

Steve Holden

Steve said:
Kindly try to limit your ramblings to answerable questions. Without keen
insight into the function of the mind that is currently available to the
psychological and psychiatric professions, such philosophical
speculation is unlikely to do anyone any good, least of all you.

^available^unavailable^
 

castironpi

Steve said:
castironpi wrote:
[...]
 > If you want a computer language to model human thought, then is there
even such thing as subclassing?
Kindly try to limit your ramblings to answerable questions. Without keen
insight into the function of the mind that is currently available to the
psychological and psychiatric professions, such philosophical
speculation is unlikely to do anyone any good, least of all you.

^available^unavailable^

In the neuroscience book I read (_Neuroscience_), my favorite section
was entitled, "Mechanisms of Long-Term Synaptic Plasticity in the
Mammalian Nervous System." Maybe reptiles do, I guess. <subclasses
Food>... and Python -is- a reptile...
 
