Steven said:
Dietmar Kuehl wrote:
It's the ulterior motive that bothers me. To wit: built-in job security
for language lawyers.
I'm not aware of anybody in the committee who has such an "ulterior motive."
I certainly have not. As far as I can tell, everybody is doing their best
to maintain an efficient, flexible, and highly usable language. The fact
that the standard is not easy to follow derives from trying to avoid
complications by stating things only once and thereby avoiding
contradictions (which are still a problem). There is also the problem
that it is hard enough to push a change to the standard which enhances
the language definition. I'm almost 100% certain that it is much harder
still to move a change forward which supposedly only makes the standard
easier to read (for a certain fraction of the readers): guaranteeing that
a change really does what is intended is pretty hard, and we are constantly
busy fixing up oversights. I doubt that anybody is really willing to
accept the danger of breaking something just to make the text easier to
understand! In fact, we even know about certain areas where there are
problems with the current definition (not only with respect to
understanding them but also with respect to what they actually say), and
nobody dares touch them!
.... and job security does not really matter for the majority of the
committee anyway: essentially, the majority of the attendees have a more
than average grasp of software development and thus do not really run
the risk of staying unemployed for extended periods of time. And this
is not due to C++ alone: I know of several committee members who
earn their money programming primarily in languages other than C++
(e.g. C, Java, or C#).
One aspect of the language specification that surprised
me is the amount of forward referencing it does. I expected it to be more
linear, not introducing concepts until their prerequisites were formally
presented.
Although this approach is desirable, it is not always possible in the
first place. In addition, this educational goal also has severe
drawbacks: currently, the standard is organized according to different
topics rather than any order imposed by trying to avoid forward
references. This makes it easy to locate specific sections. Since the
standard is not even intended to be read front to back, this organization
is much more reasonable. Also note that the standard is not intended as a
tutorial. It is intended as part of a contract and anybody determined
enough can learn to use it. It doesn't even take that long to get used to
using the standard. I think it took me less than three months to work
efficiently with the standard (of course, I didn't read the standard full
time during these three months: I just used it to look up the details when
writing answers to questions asked in comp.lang.c++). If you think that
a deep knowledge of the C++ standard provides job security, I don't think
this is too much of an investment. Of course, I think there is more
knowledge involved in securing the jobs of the committee members...
I think you should stop whining about the standard and just accept that
it is the way it is. If you think you can do an excellent job in
"translating it to English", I'm quite confident that there is a market
for a book with the translation referencing the corresponding sections
of the standard. In any case, wasting your and other people's time with
your complaints is not going to change one iota of the standard!
Google for `Schroedinger ansatz'.
The German term "Ansatz" refers to the details of a solution to a problem,
very much like "algorithm" is a description of how to solve a problem.
That is, "Ansatz" is just as inappropriate as "algorithm" for the things
currently called "algorithms". I think "operation", "solver", etc., i.e.
something which states the problem rather than how to solve it, would be
the right approach.
I believe what you are saying is a variation point is a "branch point" the
compiler cannot analyse. My wording is probably sloppy, but I suspect the
concept is there.
No, I don't think so. Actually, yesterday I read the term "customization
point" for exactly the concept I called "variation point". I just recalled
the term wrongly: "customization point" is an appropriate name.
[static resolution of a member using 'mem_fun()' and relatives]
It's really a question of what the compiler can deterministically know at
compile time. If the parameter passed to mem_fun can be statically
determined by the context, the polymorphic aspects (vtbl indirection, etc)
can be eliminated.
Well, this is part of it. I had member function adapters which encoded
the member function in the type of the functor, e.g.:
template <typename T, typename S, T (S::*mem)()>
struct unary_member
{
    T operator()(S& s) const { return (s.*mem)(); }
};
When I played with stuff like this I verified that they are inlined
by contemporary compilers. Although 'mem_fun()' cannot possibly create
a corresponding object, const propagation in the compiler's optimizer
could in principle detect that the referenced member function never
really changes and possibly inline it. Of course, if the member
function is actually a virtual function, things become even more complex,
but if the static type of the objects does not change, the compiler could
in principle detect this as well.
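For illustration, here is a minimal sketch of how such an adapter might be
used; the type 'widget' and its member 'size()' are made up for the example.
Because the member pointer is part of the functor's type, it is a
compile-time constant the optimizer can see through:

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

template <typename T, typename S, T (S::*mem)()>
struct unary_member
{
    T operator()(S& s) const { return (s.*mem)(); }
};

struct widget
{
    int size() { return 42; }
};

int main()
{
    std::vector<widget> v(3);
    std::vector<int>    sizes(v.size());
    // the member pointer is baked into the type, so the call
    // through it can be resolved (and inlined) statically
    std::transform(v.begin(), v.end(), sizes.begin(),
                   unary_member<int, widget, &widget::size>());
    for (std::size_t i = 0; i != sizes.size(); ++i)
        std::cout << sizes[i] << '\n';
}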
If the bounds of the iterators can be determined at
compile time, it may be possible to "unroll the loop".
Loop unrolling can be done without knowing the bounds statically: in
general, you don't want to unroll the whole loop anyway, even if the
bounds are statically known. You want to unroll a bunch of iterations
and put them into a loop which steps with bigger strides.
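For example, a sketch of such partial unrolling (the factor of four and the
separate accumulators are arbitrary choices for the example, not anything
mandated):

#include <cstddef>
#include <vector>

int sum(std::vector<int> const& v)
{
    int s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    std::size_t i = 0, n = v.size();
    // the main loop steps with a bigger stride, four elements at a time
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    int s = s0 + s1 + s2 + s3;
    // mop up the remaining iterations; the bounds were never
    // needed statically
    for (; i != n; ++i)
        s += v[i];
    return s;
}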
What I meant is that I've done it starting with STL iterators. It's really
quite simple. There is one qualification, and I would have to check the
Java documentation to see exactly how Java handles this: rather than
checking next() == null, I used bool hasNext(). It's a straightforward
wrapper around operator++(int).
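Roughly like this sketch (the class name is made up; the corresponding
Java interface is java.util.Iterator with hasNext() and next()):

#include <iterator>

template <typename Iterator>
class java_style_iterator
{
public:
    java_style_iterator(Iterator begin, Iterator end)
        : cur_(begin), end_(end) {}
    // compare against the end instead of checking next() == null
    bool hasNext() const { return cur_ != end_; }
    typename std::iterator_traits<Iterator>::value_type next()
    {
        return *cur_++;   // the wrapper around operator++(int)
    }
private:
    Iterator cur_;
    Iterator end_;
};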
Like I said before: The goal of STL is to be fast, not to deliberately
kill the performance! Actually, one of STL's goals was to provide
generic algorithms applicable to all reasonable data structures which
are as fast as algorithms hand crafted specifically for the corresponding
data structures. Although this goal is not met in all cases, the deviation
is normally within a very small margin. There were also secondary goals,
like applicability to subsequences (if the sequence supports at least
forward iterators), which make operations like "hasNext()" unsuitable as
the basis.
From my reading of TC++PL, the best performance is often obtained
by using the member functions of a particular container template.
You got it wrong. The generic algorithms are not applicable in all cases
but in some of these cases the internal structure of a certain container
still allows an implementation with good performance. In cases like this,
the container provides a corresponding operation as a member. For
example, the generic algorithms for binary search ('lower_bound()',
'upper_bound()', and 'equal_range()') are efficient only with random
access iterators. However, 'set's and 'map's can provide efficient
searches [for the key type only, though] although they only support
bidirectional iteration. Thus, there are specialized versions as member
functions.
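A small example of the difference, assuming nothing beyond the standard
library: both calls compile, but only the member version can exploit the
tree structure of the container:

#include <algorithm>
#include <set>

int main()
{
    std::set<int> s;
    for (int i = 0; i != 1000; ++i)
        s.insert(i);

    // generic version: works on the bidirectional iterators but
    // has to step through elements one by one, i.e. O(n) steps
    std::set<int>::const_iterator g =
        std::lower_bound(s.begin(), s.end(), 500);

    // member version: walks down the internal tree, i.e. O(log n)
    std::set<int>::const_iterator m = s.lower_bound(500);

    (void)g; (void)m;
}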
But it
does seem reasonable that the implementation could detect the container
type, and choose the best algorithm for it.
The implementation of generic algorithms does not detect the container
types but the operations provided by the iterator. Thus, it can choose
to take advantage of better suited operations if the iterator is more
capable. However, in general the iterators do not have access to the
container but only to the container's elements. For example, a list
iterator only points to the current element, and that element does not
hold a pointer back to the list it is contained in.
It seems your last sentence is the real distinction you are trying to
make. If I have value semantics at the fundamental level, I can probably
find a way for the compiler to evaluate my code having all three
operations combined in one, and to omit unused operations from the object
code.
No. My point is that combining all three operations into one does not
leave the opportunity to omit unnecessary operations. Also, STL iterators
have the flexibility to use other operations if available, e.g. stepping
with 'it += n' (with n > 1).
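The standard 'std::advance()' is the classic example of this technique.
A minimal sketch of how an implementation might dispatch on the iterator
category (names prefixed 'my_' to avoid clashing with the standard ones;
n is assumed non-negative in the fall-back case):

#include <iterator>

template <typename Iterator, typename Distance>
void my_advance(Iterator& it, Distance n, std::input_iterator_tag)
{
    while (n--) ++it;      // fall-back: single steps
}

template <typename Iterator, typename Distance>
void my_advance(Iterator& it, Distance n,
                std::random_access_iterator_tag)
{
    it += n;               // better suited operation, one step
}

template <typename Iterator, typename Distance>
void my_advance(Iterator& it, Distance n)
{
    // the decision is based on the iterator's capabilities only;
    // the container is never consulted
    my_advance(it, n,
        typename std::iterator_traits<Iterator>::iterator_category());
}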
Please note what the first paragraph says about anonymous inner classes in
Java:
http://mindprod.com/jgloss/anonymousclasses.html
I know how anonymous inner classes work in Java. I think that they are
indeed quite similar to at least my wish for lambda expressions in C++:
essentially, I just want a more convenient approach to create functors
than having to write a separate function. Whether it should include
support for closures or not and how the syntax should look are actually
the only bigger issues.
I know these are not identical to lambda expressions, but they fall into
the same category.
.... and I think that they also help programmers who are not that
advanced.
I'm not really sure what it means to have lambda support
in the core language, but I believe this area of programming is not
something that would help any but advanced programmers working in certain
specialized areas.
I'll repeat a quote from a little bit further above here:
The foreach construct is something that would help a first-year
computer science major get his or her homework done more quickly and with
fewer errors.
Can you please identify which part of the two above code excerpts makes
one superior over the other? If there is nothing which makes the first
excerpt *much* easier for non-advanced programmers, we should go with
the more general approach. Personally, I only see one minor difference
between the two excerpts above: the variable has a name in the first one
while it is referenced by position in the second one. Of course, if you
want a name, you would be free to introduce one:
for_each(vec, { int i = _1; std::cout << i; });
For other types than integers you would use 'T const& i = _1;' or
something similar. In teaching a feature like this, it is not even
necessary to state that there is a lambda expression creating a functor.
On the other hand, even if this is mentioned, it is not harmful if the
reader has been exposed to the concept of a functor before.
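For what it's worth, something very close to this can already be
approximated in a library, e.g. with Boost.Lambda's placeholders (only the
range-taking 'for_each(vec, ...)' overload above is hypothetical; the
iterator-pair version below works today):

#include <algorithm>
#include <iostream>
#include <vector>
#include <boost/lambda/lambda.hpp>

int main()
{
    using namespace boost::lambda;
    std::vector<int> vec(10, 1);
    // the expression 'std::cout << _1' builds a functor object
    std::for_each(vec.begin(), vec.end(), std::cout << _1);
}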
I'd have to read the API docs, but many of Java's containers are really
interfaces which make no requirement of the form of the "backing data
structure".
You just have to understand that it does not matter whether the
containers are interfaces or not. What matters for algorithms is what
kind of access can be efficiently provided. In particular, you cannot do
both of the following efficiently (meaning in O(1)):
- insert elements at arbitrary positions
- provide random access
For example, a vector has chosen to provide random access while a list
has chosen to provide element insertion at arbitrary positions (actually,
there are more operations in each group but the above are sufficiently
characteristic). Thus, you would need a slow conversion from a list to
an array if all algorithms operate using random access. Or you could
decide that you cannot insert into the middle of a container efficiently.
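A small demonstration of the trade-off with the two standard containers:

#include <iterator>
#include <list>
#include <vector>

int main()
{
    std::vector<int> v(100);
    std::list<int>   l(100);

    int x = v[50];                    // random access: O(1)
    v.insert(v.begin() + 50, 42);     // insertion in the middle: O(n)

    std::list<int>::iterator it = l.begin();
    std::advance(it, 50);             // reaching the position: O(n)
    l.insert(it, 42);                 // the insertion itself: O(1)

    (void)x;
}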
I'll grant that you are at the mercy of the Java
implementation unless you can JNI your way around it. I'm not sure how
Java linked lists pose any significant disadvantage over their C++
counterpart.
You stated that in Java you would use an array in your algorithms
(quoting you from an earlier article in this thread):
With Java you can call toArray() on a collection, and get a random access
interface to iterate over with a for() loop using the same sequence of
checks you've stated above.
This would mean one of two things:
- Insertion into Java lists is slow
- Using algorithms on Java lists is slow
You cannot get both if you go through an array interface. I consider
this a huge disadvantage. There are, however, other alternatives like
going through virtual functions but these approaches have their own
problems, e.g. a loss in optimization opportunities.
Java Arrays are very friendly little animals:
http://java.sun.com/docs/books/jls/third_edition/html/arrays.html
Note that these are arrays of _values_ rather than references (unless
that's what you chose to put in them).
Java arrays have their own problems, too. Actually, the designers of
Java got the conversion rules wrong for them: arrays should allow no
conversion because otherwise you could easily get violations of the
Liskov Substitution Principle. Immutable arrays could be covariant but
Java has no concept of immutability on the type level.
The closest thing C++ has is Josuttis's boost::array<>, which is a bit more
cumbersome to work with, and is not dynamic.
'std::vector' is the right match for Java arrays. The Java arrays are
not statically sized like C++ built-in arrays or 'std::tr1::array's.
There could possibly be a class between 'std::vector' and 'std::tr1::array',
i.e. a class which is sized at run-time but once the size is set does not
allow size changes. This would be a nearly exact match for Java arrays.
The only additional "capability" would be the conversion to arrays of
bases which is, however, broken in the first place.
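A minimal sketch of such an in-between class (the name 'fixed_vector' is
made up; copying is simply disabled here to keep the sketch short):

#include <cstddef>

template <typename T>
class fixed_vector
{
public:
    // the size is chosen at run-time ...
    explicit fixed_vector(std::size_t n) : data_(new T[n]), size_(n) {}
    ~fixed_vector() { delete[] data_; }

    // ... but cannot be changed afterwards: no push_back(), resize(), etc.
    std::size_t size() const { return size_; }
    T&       operator[](std::size_t i)       { return data_[i]; }
    T const& operator[](std::size_t i) const { return data_[i]; }

private:
    fixed_vector(fixed_vector const&);            // not copyable in this sketch
    fixed_vector& operator=(fixed_vector const&); // (would need a deep copy)
    T*          data_;
    std::size_t size_;
};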
I'm confident there are advantages to using the existing STL algorithms
(what's the plural for ansatz?)
"Ansätze" (I don't know whether the third letter is shown on your
computer the way it should be because it is outside the conventional
Usenet encoding of ASCII; it should be an "a" with an "umlaut", i.e.
a horizontal pair of dots).
over the foreach construct. This may be
the best counterargument to introducing the foreach construct.
The best counterargument for introduction of a specialized solution is
that a more general solution exists.
It might
encourage people to write suboptimal code. OTOH, it might also encourage
people to write C++ code in the first place.
If you explain why my version of the "for_each()" construct is vastly
inferior to yours, I can see reason to accept this argument. However,
I don't see it to be inferior. Actually, I see it as a vastly superior
construct because it removes syntax oddities in other places, too,
making the code easier for new users as well.
Bear in mind that Java is organized differently than C++'s STL, so this
comparison is only of limited worth:
I wasn't aware of a class providing algorithms on abstract containers.
Of course, it is the wrong approach to require a container in the first
place. However, the really interesting analysis is whether it provides
the algorithms efficiently or only with relatively minor losses compared
to hand-crafted versions for specific containers.
As I understand what you were saying, there are two approaches: trust the
programmer to tell you his or her template implements a certain concept,
or verify it explicitly.
No, it is more the other way around: the presence of required features
is, or at least can be, checked with both approaches. The difference is
whether
the compiler decides whether certain requirements are met on the basis
that the corresponding declarations are present or whether it has to be
explicitly allowed to make use of the corresponding features. The issue
is essentially this: assume some generic implementation uses a member
function "foo()" with no arguments. Now, a user tries to use a class
with this generic implementation which indeed has a function "foo()"
with no arguments. The question is essentially: can the compiler assume
that this function "foo()" meets the requirement without the user
explicitly stating so? After all, it could be by mere accident that
there is a function "foo()", but this function could do something rather
different from what is expected by the generic implementation. For more
details, read the corresponding papers in the mailings.
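A sketch of the accident in question, with made-up class names: both
classes declare a 'foo()' with the right signature, but only one of them
means what the generic code expects:

#include <iostream>

// generic implementation; requirement: foo() "performs the work"
template <typename T>
void run(T& t)
{
    t.foo();
}

struct worker
{
    void foo() { std::cout << "doing the work\n"; }   // intended match
};

struct bar_stool
{
    void foo() { std::cout << "seating a patron\n"; } // same name by accident
};

int main()
{
    worker    w;
    bar_stool b;
    run(w);   // fine
    run(b);   // compiles just as happily: should the compiler assume it is OK?
}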
One suggestion I made a while back is what I call
a massless interface. It's just a formal declaration of what is to be
provided, and does not become part of the executable code. The problem
with that is there may be circumstances when only part of the
meta-interface is needed by the class instantiated from the template.
Concepts are basically that. However, the problems are rather different
than what you assumed.
I was talking about a first level working understanding, not a more
complete and comprehensive one.
Right. You need to understand concepts to understand STL - at the first
level. If you don't understand concepts you have no chance whatsoever to
ever understand STL.
I will observe that this discussion helped
solidify my understanding that iterators are not "part of" the container,
but are instead instances of types bound to the containers through
typedefs.
.... or, to put it differently, you still haven't understood what iterators
in STL are about. Maybe you should start picking up a book introducing
STL rather than trying to make sense of what is written in the header
files. You won't learn how STL works from the header files.
Ah! But what do you really mean by sequence in this case? They are
ordered mappings over an unordered collection.
I never stated differently. You did, however! Iteration in STL defines
a sequence. The elements are ordered according to the position in which
they appear in the iteration, i.e. the iteration uses a sequence. Hm,
sounds like a tautology. You should, BTW, forget about collections when
you think about STL! Sequences are not directly related to collections,
i.e. you [still] don't need a collection or a container to have a
sequence. If you have a collection or a container it is likely that you
have multiple associated sequences. For example, the STL containers are
all associated with two sequences, one moving from front to back the
other in the reverse direction.
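For example, with a standard container the two sequences are exposed via
begin()/end() and rbegin()/rend():

#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v;
    for (int i = 0; i != 5; ++i)
        v.push_back(i);

    // one container, two associated sequences:
    for (std::vector<int>::iterator i = v.begin(); i != v.end(); ++i)
        std::cout << *i << ' ';                 // 0 1 2 3 4
    std::cout << '\n';
    for (std::vector<int>::reverse_iterator i = v.rbegin(); i != v.rend(); ++i)
        std::cout << *i << ' ';                 // 4 3 2 1 0
    std::cout << '\n';
}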
That's my point. The actual collection is not a sequence.
The issue was whether STL algorithms operate on sequences. Nothing
more and nothing less. I never claimed that the containers or collections
are sequences. I actually claimed that you don't need containers in the
first place and that STL operates on sequences. Actually, I think you
should do yourself a favor and stop confusing yourself!
But it needs to be comprehensible to be usable.
Right. Please point out why
for (int i: vec) std::cout << i;
is more comprehensible than
for_each(vec, std::cout << _1);
For the beginner in C++ that will not be the case.
If your only argument is that the beginner in C++ cannot understand it
while everybody else can, you are just setting up a strawman.
The foreach construct is applicable to every 3D
program I have ever written.
So is the for_each() construct I have given. In fact, the only difference
is a syntactical one: you can view a block as nothing other than a functor.
In your proposal, you put the functor after the closing parenthesis; in
my proposal it is within. Semantically, your functor has no arguments but
accesses its context entirely through the closure, while my functor has one
argument and does not necessarily need a closure at all. Big deal, isn't
it? The real difference is, however, that you have a small, tiny feature
which cannot be used anywhere else while I got a huge feature which can
be used in many different places! For example, my feature also covers
the functor passed to 'find_if()' while yours does not.
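To illustrate with something that exists today: with Boost.Lambda the very
same placeholder expressions serve both algorithms, which is exactly the
generality a special-case foreach cannot offer:

#include <algorithm>
#include <iostream>
#include <vector>
#include <boost/lambda/lambda.hpp>

int main()
{
    using namespace boost::lambda;
    std::vector<int> v(10, 1);
    v[7] = 3;

    // the same lambda facility covers for_each() ...
    std::for_each(v.begin(), v.end(), std::cout << _1 << ' ');
    std::cout << '\n';

    // ... and the predicate passed to find_if()
    std::vector<int>::iterator it =
        std::find_if(v.begin(), v.end(), _1 == 3);
    if (it != v.end())
        std::cout << "found at position " << it - v.begin() << '\n';
}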
I might be able to fold all that into
superior lambda expressions, but only after I learn to use them, and I
have too much other stuff to learn before I try.
Sorry for you. Why walk if you also have the option to crawl? You got so
much other stuff to learn, too...
I'm not sure what core language changes are involved in adding lambda
support, but the fact that there are changes to the core language required
to make it fully functional is the only argument I can see for wanting it
in C++0X. If it were just a library, I'd say let it be provided by and
for the people who understand it and want it.
If it could be implemented just as a library, I would not bother to ask
for it! Why would I? I can implement libraries myself which is a much
safer approach than waiting for compiler implementers to implement some
feature. However, since it requires core features to do it right, I
don't want too many competing features (like special case solutions) in
the language. Especially if the special case can easily be addressed
using the general solution!