jacob navia
http://apenwarr.ca/log/?m=201007#22
<quote>
Okay, one more example of C++ terribleness. This one is actually a
tricky one, so I can almost forgive the C++ guys for not thinking up the
"right" solution. But it came up again for me the other day, so I'll
rant about it too: dictionary item assignment.
What happens when you have, say, a std::map of std::string and you do
m[5] = "chicken"? Moreover, what happens if there is no m[5] and you do
std::string x = m[5]?
Answer: m[5] "autovivifies" a new, empty string and stores it in
location 5. Then it returns a reference to that location, which in the
first example, you reassign using std::string::operator=. In the second
example, the autovivified string is copied to x - and left happily
floating around, empty, in m[5].
Ha ha! In what universe are these semantics reasonable? In what rational
set of rules does the right-hand-side of an assignment statement get
modified by default? Maybe I'm crazy - no, that's not it - but when I
write m[5] and there's no m[5], I think there are only two things that
are okay to happen. Either m[5] returns NULL (a passive indicator that
there is no m[5], like you'd expect from C) or m[5] throws an exception
(an aggressive indicator that there is no m[5], like you'd see in python).
Ah, you say. But look! If that happened, then the first statement - the
one assigning to m[5] - wouldn't work! It would crash because you end up
assigning to NULL!
Yes. Yes it would. In C++ it would, because the people who designed C++
are idiots.
But in python, it works perfectly (even for user-defined types). How?
Simple. Python's parser has a little hack in it - which I'm sure must
hurt the python people down to the cores of their souls, so much do they
hate hacks - that makes m[5]= parse differently than just plain m[5].
The python parser converts o[x]=y directly into o.__setitem__(x,y).
Whereas o[x] without a trailing equal sign converts directly into
o.__getitem__(x). It's very sad that the parser has to do such utterly
different things with two identical-looking uses of the square bracket
operator. But the result is you get what you expect: __getitem__ throws
an exception if there's no m[5]. __setitem__ doesn't. __setitem__ puts
stuff into your object; it doesn't waste time pulling stuff out of your
object (unless that's a necessary internal detail for your data
structure implementation).
But even that isn't the worst thing. Here's what's worse: C++'s crazy
autovivification stuff makes it slower, because you have to construct an
object just so you can throw it away and reassign it. Ha ha! The crazy
language where supposedly performance is all-important actually assigns
to maps slower than python can! All in the name of having language
purity, so we don't have to have stupid parser hacks to make [] behave
two different ways!
....
"...Well," said the C++ people. "Well. We can't have that."
So here's what they invented. Instead of inventing a sensible new []=
operator, they went even more crazy. They redefined things such that, if
your optimizer is sufficiently smart, it can make all the extra crap go
away.
There's something in C++ called the "return value optimization."
Normally, if you do something like "MyObj x = f()", and f returns a
MyObj, then what would need to happen is that 'x' gets constructed using
the default constructor, then f() constructs a new object and returns
it, and then we call x.operator= to copy the object from f()'s return
value, then we destroy f()'s return value.
As you might imagine, when implementing the [] setter on a map, this
would be kind of inefficient.
But because the C++ people so desperately wanted this sort of thing to
be fast, they allowed the compiler to optimize out the creation of x and
the copy operation; instead, they just tell f() to construct its return
value right into x. If you think about it hard enough, you can see that,
assuming the stars all align perfectly, m[5] = "foo" can benefit from
this operation. Probably only if m.operator[] is inlined, but of course
it is - it's a template! Everything in a template is inlined! Ha ha!
So actually C++ maps are as fast as python maps, assuming your compiler
writers are amazingly great, and a) implement the (optional)
return-value optimization; b) inline the right stuff; and c) don't screw
up their overcomplicated optimizer so that it makes your code randomly
not work in other places.
Okay, cool, right? Isn't this a triumph of engineering - an amazingly
world class optimizer plus an amazingly supercomplex specification that
allows just the right combination of craziness to get what you want?
NO!
No it is not!
It is an absolute failure of engineering! Do you want to know what real
engineering is? It's this:
map_set(m, 5, "foo");
char *x = map_get(m, 5);
That plain C code runs exactly as fast as the above hyperoptimized
ultracomplex C++. *And* it returns NULL when m[5] doesn't exist, which
C++ fails to do.
In the heat of the moment, it's easy to lose sight of just how much of
C++ is absolutely senseless wankery.
And this, my friends, is the problem.
<end quote>
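The std::map behaviour described in the quote is easy to check. The following is only a minimal sketch: the names size_after_bracket_read and map_get are made up for illustration (map_get merely echoes the C function named in the quote) and belong to no library. It shows that reading a missing key through operator[] inserts an empty string, and that std::map::find gives a lookup that inserts nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: read a key through operator[] and report the map
// size afterwards. If the key was absent, operator[] inserts {key, ""}
// before returning a reference to it, so the size grows by one.
std::size_t size_after_bracket_read(std::map<int, std::string>& m, int key) {
    std::string x = m[key];  // the read itself modifies the map
    (void)x;                 // value unused; only the side effect matters here
    return m.size();
}

// Hypothetical C++ analogue of the quoted map_get(): look a key up without
// inserting it. Returns a pointer to the stored value, or nullptr when the
// key is absent.
const std::string* map_get(const std::map<int, std::string>& m, int key) {
    auto it = m.find(key);   // find() never inserts
    return it == m.end() ? nullptr : &it->second;
}
```

std::map also offers at(), which throws std::out_of_range for a missing key instead of inserting one, so both the "returns NULL" and the "throws an exception" behaviours asked for in the quote are available, just not through operator[].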
<publicity mode ON>
Note that lcc-win implements operator []= as a different operator than
the plain operator [].
<publicity mode OFF>
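For comparison, standard C++ has no separate []= operator, but the read/write split can be approximated by returning a proxy object from operator[]. What follows is a rough sketch of that general technique only. It is not how lcc-win implements its operator, and StrictMap and Slot are hypothetical names invented for the example.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical wrapper around std::map; not part of any library.
class StrictMap {
    std::map<int, std::string> data_;

    // Object returned by operator[]. How it is used afterwards decides
    // whether a "set" or a "get" takes place.
    class Slot {
        StrictMap& owner_;
        int key_;
    public:
        Slot(StrictMap& owner, int key) : owner_(owner), key_(key) {}

        // m[k] = v : plays the role of a "[]=" setter. No default-constructed
        // value is created first when the key is new.
        Slot& operator=(const std::string& value) {
            auto it = owner_.data_.find(key_);
            if (it == owner_.data_.end())
                owner_.data_.emplace(key_, value);
            else
                it->second = value;
            return *this;
        }

        // std::string x = m[k] : getter. at() throws std::out_of_range for
        // a missing key and never inserts.
        operator std::string() const { return owner_.data_.at(key_); }
    };

public:
    Slot operator[](int key) { return Slot(*this, key); }
    bool contains(int key) const { return data_.count(key) != 0; }
};
```

With this, m[5] = "chicken" stores the value, while std::string x = m[5] on a missing key throws and leaves the map unchanged. The usual caveat with proxies applies: auto x = m[5] yields a Slot rather than a std::string, and member calls such as m[5].size() do not compile, which is part of why a distinct setter operator is the cleaner design.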