To bean or not to bean


Phlip

Steven said:
IIRC, that was the context not the necessary behavior. And I was really
intending the official <cassert> which is said to be documented in the C
standard documentation. I don't have that, but K&R tell me it aborts the
program. They don't tell me I can change that behavior.

Absolutely, Steve. Any macro (or little function) you write, which has a
name or intent anything like "assert", must abort your program. It's the
rules!
 

Chris F Clark

Steve said:
Take careful note that I am not actually advocating abolishing the CPP, nor am
I advocating the abolishment of #include. I simply want superior means
of doing some of the things the CPP is currently used for.

Then the title is wrong. I could care less about #include. The quote
from BS isn't about #include either. The part of cpp he doesn't like
is the textual manipulation stuff. That is the part that is
*NECESSARY* and powerful.

At best #include is a small implementation detail for doing imports.
Sure it is ugly in its implementation, but it harks back to the fairly
early days of C. I believe there are even C++ compilers that don't
use/need it. The current standard certainly implies that with the new
syntax of #include for the language specified libraries. Those
#include statements don't need to be implemented as files.

Of course, for backward compatibility, there will always be a way to
implement them using files--too many C++ compilers have and always
will have implemented them that way. And, backward compatibility is
truly important in C++--and actually in almost every programming
language.

If you want to break backward compatibility, it is better to design a
"new" language, i.e. Java, C#, D, than to try to change an old
language. Users are going to migrate to newer languages anyway, and
you are better off making your new language be one of the alternatives
they can try, than trying to get all the compiler vendors for your
current language to come on board with your backward incompatible
version.

The exception being when the language is very young. C++ history
proves that. They managed to change some fairly key things in a
non-backward compatible way between Cfront 1.x and Cfront 2.x and we
users migrated. (It took some CPP macros, so that we could have both
1.x and 2.x compatible code, but that was okay.) However, some of the
changes from Cfront 2.x to the approved standard are not yet
incorporated into the most widely used compilers. C++ is too old
*NOW*. Any other backward incompatible changes will be mostly like
the changes that went into FORTRAN 90--I bet there were a lot more
FORTRAN 77 compilers than there ever will be FORTRAN 90 ones. I doubt
that 10 years from now I will be writing any C++, maybe not in 5.
On the other hand, I've written C++ code for 15 years already, so it
won't be like I didn't use it for a long time.

-Chris
 

Chris F Clark

I said:
And what is the beauty of those macros (besides their portability)
is that our primary source code (the uses of those macros) is easy to
read. The syntax of the assert (and other macros) is quite simple and
obvious. That is only doable because they are macros. If they
weren't macros, some of the behind-the-scenes stuff, like determining
the current object, would have to be present in the source code of each
use.

Steve Hatton replied:
I don't follow here. How can a macro determine the current object without
some kind of intentional intervention? I'm not saying it can't be done.
Perhaps you are talking about something similar to Qt's moc?

It takes some "cooperation". In fact, recently we have been trying to
templatize some of the cooperative parts. Here is a post I wrote on
that particular topic.

------------------------------------------------------------------------

In our C++ project we have some internal bug reporting macros that we
use to get useful information when the program does something
unexpected. Essentially at the point of the error, we invoke an
internal interactive debugger that knows the [important] classes
within our system and allows us to walk around the objects that exist
at the time of the fault. It mostly works fairly well. The catch
is that we have to hand-implement some of the code for each of the
classes (thus the "important class" caveat), which suggests that what
we really need is a template based approach, since it would be much
nicer to have the system automatically generate an implementation for
all classes, eliminating the important/unimportant distinction.

In addition, our current implementation does not work in static member
functions, which is a secondary problem (but a key one if we go to a
template based approach). Currently, we can sidestep this problem
because we have no important classes that also have static member
functions that use the bug reporting macro. (Actually, this topic
came up because I just made a class "important" that had a static
member function that used the bug report macro, and I had to remove
the static member function from the class and make it a global
function to work around the limitation of our current scheme.)

Essentially, we want an ErrorHere function that works anywhere in our
code. If it is in a non-static member function of the class, it calls
the ErrorHereInClass virtual member function of the class which allows
the developer to see the data of the object being operated upon.
Anywhere else, we want the code to default to "assert(false)"-like
behaviour. Our current implementation pretty much achieves that. In
objects that we care about, we have a few things that we add to the
class source code and then have an appropriate ErrorHereInClass
function that we can specialize. In classes where we haven't done that,
we get the default implementation, which essentially prints out an
appropriate error.

Most of the functionality is implemented via an "ErrorHere" header
file. The ErrorHere header file defines a couple of classes and some
macros (to sugar things by hiding some of the internal mechanism).
The key macros appear to the user as a set of functions that work like
"assert(false)", i.e. you call one of them when the code is in trouble
and it figures out how best to report the location of the problem to
the user.

Here, tersely, is what the ErrorHere header looks like:

#include <iostream>
using std::cout; using std::endl;

class ErrorHereDeveloper; // controls some internal features

class ErrorHereClass {
public:
    ErrorHereClass( ErrorHereDeveloper *dev ) :
        errorHereDeveloper( dev )
    {}
    ~ErrorHereClass();
    virtual void ErrorHereInClass( const char *textToReport ) // overridden
    { cout << textToReport << endl; }
    void ErrorHereReport( const char *textToReport )
    { ErrorHereInClass( textToReport ); }
protected: // data is here for this class and derived classes to use
    ErrorHereDeveloper *errorHereDeveloper;
};

// default version that returns the above "base" class
ErrorHereClass ErrorHereClassFactory( ErrorHereDeveloper *dev )
{
    ErrorHereClass mine( dev );
    return mine;
}

// use this call when any developer can handle this bug
extern void ErrorHere( const char *textToReport, ... );
extern class ErrorHereDeveloper *anyone;
#define ErrorHere ErrorHereClassFactory( anyone ).ErrorHereReport

// I use this call when I want users to report the bug only to me
extern void ErrorHereForChris( const char *textToReport, ... );
extern class ErrorHereDeveloper *chris;
#define ErrorHereForChris ErrorHereClassFactory( chris ).ErrorHereReport

. . .

As you can see, a "call" to ErrorHere in the pressence of just this
header file,calls the "global" ErrorHereClassFactory which returns a
class where the ErrorHereReport function takes and simply prints the
string (using a virtual function). Of course, our real implementation
does something more complex, but this is the essential "base"
functionality.

Now, let's look at a customized class, say "Square":

class ErrorHereClassSquare;

class Square {
public:
    ErrorHereClassSquare ErrorHereClassFactory( ErrorHereDeveloper *dev );
    virtual void ErrorHereInClass( const char *textToReport )
    { cout << "Square[" << length << ", " << width << "]: "
           << textToReport << endl; }
    // rest of Square
    . . .
private:
    int length, width; // (ok, maybe I meant rectangle!)
};

class ErrorHereClassSquare : public ErrorHereClass {
public:
    ErrorHereClassSquare( Square *square, ErrorHereDeveloper *dev )
        : ErrorHereClass( dev ),
          mySquare( square )
    {}
    ~ErrorHereClassSquare();
    virtual void ErrorHereInClass( const char *textToReport ) // override
    { if ( mySquare ) mySquare->ErrorHereInClass( textToReport );
      else cout << textToReport << endl; }
private:
    Square *mySquare;
};

// defined once ErrorHereClassSquare is a complete type
inline ErrorHereClassSquare Square::ErrorHereClassFactory( ErrorHereDeveloper *dev )
{
    ErrorHereClassSquare mine( this, dev );
    return mine;
}

So, as you can see, the special class ErrorHereClassSquare and the per-
class function ErrorHereClassFactory allow us to customize the code
for our classes. However, this code is "boilerplate" and it would be
nice if we could somehow use a template to create it. I think I
understand how to write the template that will create the
ErrorHereClass<Square>. That seems relatively straightforward.
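For illustration, here is a minimal sketch of the template I have in
mind (the names ErrorHereClassFor and Square2 are invented for this
example, not taken from our tree, and it assumes the ErrorHereClass
header shown above):

template <class T>
class ErrorHereClassFor : public ErrorHereClass {
public:
    ErrorHereClassFor( T *object, ErrorHereDeveloper *dev )
        : ErrorHereClass( dev ), myObject( object )
    {}
    virtual void ErrorHereInClass( const char *textToReport )
    {
        // delegate to the object's own reporting function when we have one
        if ( myObject ) myObject->ErrorHereInClass( textToReport );
        else ErrorHereClass::ErrorHereInClass( textToReport );
    }
private:
    T *myObject;
};

// each "important" class then only supplies the per-class factory
class Square2 {                       // hypothetical stand-in for Square
public:
    ErrorHereClassFor<Square2> ErrorHereClassFactory( ErrorHereDeveloper *dev );
    void ErrorHereInClass( const char *textToReport )
    { cout << "Square2: " << textToReport << endl; }
};

inline ErrorHereClassFor<Square2>
Square2::ErrorHereClassFactory( ErrorHereDeveloper *dev )
{ return ErrorHereClassFor<Square2>( this, dev ); }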

However, it would also be nice, if we could somehow get the
functionality to work in static member functions, i.e. to have
something that would "revert" to the default implementation if the
"this" pointer wasn't available. I have no idea how to make a
(function?) template that tests its current context and determines if
the function it is being called within has a "this" pointer or not.
My fear is that, since we make ErrorHere calls throughout our code
(including in static member functions and non-member functions), if we
attempt ErrorHere in a static member function with a template
solution in place, this will result in a call to the template
class constructor for the corresponding ErrorHere<Class> without
having a this pointer to pass to the class. At least that's what
happens when I define a per-class ErrorHereClassFactory method in a class
that makes ErrorHere calls from static member functions.

I apologize if this is trivial to accomplish, but it is just too
subtle for me and my meager template programming ability. In the end
I would like something that works within the limits of both Visual C++
6.x and g++ 3.2.3.

-Chris

*****************************************************************************
Chris Clark Internet : (e-mail address removed)
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)
------------------------------------------------------------------------------
 

Chris F Clark

Steve said:
In the sense of JUnit, I have used that approach. In that case you create a
harness, and some use cases to run against your code, testing that
preconditions produce correct post conditions. That typically stands
outside the actual program code.

That's the difference with assertions. They are inside the actual
program code. That's right. We ship our production code with the
assertions turned on. (Actually, the system has levels, so the
production code has the key assertions turned on, but the ones which
have n**2 and worse performance are turned off, as they are only for
extreme debugging cases.)

We want the assertions turned on in the production code, because our
clients/users are building chips that will be stamped into millions
(perhaps billions) of pieces of silicon that will sell for hundreds
of $ each. If our software does the wrong thing, and that causes them
to misdesign the chip, the cost is not even something one wants to
think about. I think you will find that the safety critical people
(people who make life sustaining software, banks, rocket control
systems) all take a similar point of view. We want to be able to
build tools that we can make certain are reliable and we are willing
to put great effort into assuring that those tools are reliable.
Running with assertions turned on in production code is a small price
for us to pay in that regard.

The point is that we know that in 600K lines of code there are
likely to still be bugs. There is no way we could test all the
combinations of interactions. More importantly, we are adding new
features all the time. It is simply impossible to get each of those
new features to work in every case with all the current features for
all the subtle cases. Therefore, we have assertions that tell us when
something is broken, and they invoke a user and developer friendly
debugger.

We can't depend on simple per unit testing, because most of the
interesting invariants are not isolated to one piece of code. It's
not like we have simple loops on arrays and we want to make sure we
don't step off the end (well, we have those too). We have a model
that has subtle semantics. That is what we want to make sure we have
right.
It probably also depends on the nature of the product whether such things
are generally useful. You seem to have something akin to a huge Karnaugh
map. Probably much more suited to such structured evaluation than the
kinds of systems I've worked on. [more below]

Nothing at all like that. We have a graphics front end of about 100K
lines, a Verilog compiler of another 100K lines, an internal model
(the nets and gates, etc.) of another 150K lines, an interpreter of
still another 100K lines, and numerous other "parts" that are much
smaller. However, all of these pieces are pretty much standard code
in their domain.

The key point is that they are all non-trivial parts and they all
interact. So, if I am fixing something in the compiler, I want to
know if I screw up and produce a model that doesn't make sense for
some inputs, because if that happens, the code which runs in the
interpreter is likely to perform some sort of nonsensical
calculation and return "5" to the user for some calculation that was
only 2 bits wide.
Also bear in mind that I do use exceptions in similar ways.

That's good. However, with a code base like ours it is hard to make
systemic changes if we haven't packaged the code into easily
identifiable pieces. Assertions are a nice tool for packaging up one
piece. If you want, consider the following:

#define assert(x) if (! x) throw(problemHere)

Now, all our assertions have just become exception based. Yes, one
probably wants something more sophisticated, but it illustrates the
point. Assertions are a way of indicating at the source level what we
expect to be true at run-time. The implementation of how the
assertion works is not important for that to be true.

And, that brings us back to the text processing power of CPP. The
point of CPP is to take something that looks like one kind of C++
statement and turn it into something much more sophisticated. And, to
do so in a systematic way throughout a large project.

I don't want to run an editor macro over the entire source tree just
because I have found a slight improvement in how we can debug
something. Because, if the code has been written by hand (e.g. "if
(!x) throw(problemHere);") there is a good change that some copy of
that code is spaced differently or somehow not "right" for my editor
macro. (Gee, look line wrapping made it happen right here in my email
sample.) As a result, the code will be broken and I won't even know,
because I can't look reliably at all the code. However, a CPP macro
will reliably replace all the instances, and if it is broken I'll get
a compiler syntax error on the resulting code because it won't be
valid C++.

Now, are there ways to make CPP do that job better? Probably yes. For
example, if you wanted to propose adding a scoping feature that limited
the scope of macros, so that they would have scope just like other
constructs, I would be supportive, because if it were
available in the compilers I used it would make my life better. And
if it weren't, well I'd be no worse off. (Well, assuming that you
remembered the backward compatibility rule, that existing code not
using the feature should not be broken by the feature.)
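(For illustration only: today about the closest one gets to macro scope
is defining and #undef-ing by hand. LOCAL_ASSERT below is a hypothetical
name standing in for whatever the real macro would be.)

#include <stdexcept>

// the "scope" of this hypothetical macro starts at the #define ...
#define LOCAL_ASSERT(x) if (!(x)) throw std::runtime_error("failed: " #x)

int scale( int value, int factor )
{
    LOCAL_ASSERT( factor != 0 );   // usable only between define and undef
    return value * factor;
}

// ... and ends at the #undef; a real scoping feature would tie this to a
// namespace or block instead of relying on programmer discipline.
#undef LOCAL_ASSERT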

-Chris

*****************************************************************************
Chris Clark Internet : (e-mail address removed)
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)
------------------------------------------------------------------------------
 

Phlip

Chris said:
We want the assertions turned on in the production code, because our
clients/users are building chips that will be stamped into millions
(perhaps billions) of pieces of silicon that will sell for hundreds
of $ each. If our software does the wrong thing, and that causes them
to misdesign the chip, the cost is not even something one wants to
think about. I think you will find that the safety critical people
(people who make life sustaining software, banks, rocket control
systems) all take a similar point of view. We want to be able to
build tools that we can make certain are reliable and we are willing
to put great effort into assuring that those tools are reliable.
Running with assertions turned on in production code is a small price
for us to pay in that regard.

Point. In a softer language that "takes care of assertions for us", control
over those assertions is much harder to exert.

The Ariane 5 rocket, less than a minute off the launch pad, slammed its
engine nozzles to one side, stalled, broke up, and was destroyed. It
slammed them over because the language, Ada, took care of an assertion for
the programmers when the programmers had ignored it. The controller that
raised the exception should not have thrown it. Instead it threw, and the
flight control system mistook the exception output for a command.

(There are many, many other reasons why bits of the Ariane mission
could have worked better. That phase of flight did not require the
controller involved, and it could have been switched off.)

Returning control to programmers (and enabling programmers, within a
process, to use that control) can be better than a language that takes
care of such things for us.
The point is that we know that in 600K lines of code there are
likely to still be bugs. There is no way we could test all the
combinations of interactions. More importantly, we are adding new
features all the time. It is simply impossible to get each of those
new features to work in every case with all the current features for
all the subtle cases. Therefore, we have assertions that tell us when
something is broken, and they invoke a user and developer friendly
debugger.
Righteous.

That's good. However, with a code base like ours it is hard to make
systemic changes if we haven't packaged the code into easily
identifiable pieces. Assertions are a nice tool for packaging up one
piece. If you want, consider the following:

#define assert(x) if (! x) throw(problemHere)

Ahem.

#define assert_(x) if (! (x)) throw(problemHere(#x, __FILE__, __LINE__))

;-)

I could continue to add a Debug-mode version of that assertion which halts
the program with a breakpoint on the failing line.
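A sketch of what that Debug-mode flavor might look like (SIGTRAP is a
POSIX assumption, and problemHere here is just a stand-in type, so treat
this as illustration rather than portable code):

#include <csignal>

struct problemHere {                       // stand-in exception type
    problemHere( const char *, const char *, int ) {}
};

#ifdef NDEBUG
// Release builds: report the failure by throwing, as above.
#define assert_(x) if (! (x)) throw problemHere(#x, __FILE__, __LINE__)
#else
// Debug builds: stop in the debugger right on the failing line.
// (MSVC users would call __debugbreak() instead of raising SIGTRAP.)
#define assert_(x) if (! (x)) std::raise(SIGTRAP)
#endif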

(And re-writing a Standard C++ Library thing is against my religion.)
 

Chris F Clark

I said:
.... If you want, consider the following:

#define assert(x) if (! x) throw(problemHere)

Phlip corrected:
Ahem.

#define assert_(x) if (! (x)) throw(problemHere(#x, __FILE__, __LINE__))

;-)

I could continue to add a Debug-mode version of that assertion which halts
the program with a breakpoint on the failing line.

I presume the correction is due to my overly simplistic assertion. In
this case, I was trying to point out that one could use exceptions to
implement assertions and trying to use the simplest code possible.
However, Phlip's point is accurate in that one probably wants
something more sophisticated than my strawman example. As I'm sure
Phlip knows (and this is just for other readers who aren't as C++
savvy), if one reads a typical assertion implementation, one
learns that CPP is just one of the tools in the bag of tricks for
building a sophisticated assertion facility. One generally calls an
"assertion_failure" routine with parameters like #x, __FILE__,
etc. and that routine then decides how the assertion is to be
reported, e.g. by throwing an exception.
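A tiny sketch of that division of labour (the names ASSERT and
assertion_failure are hypothetical, not taken from any particular
library):

#include <cstdio>
#include <cstdlib>

// the routine decides how the failure is reported; it could just as well
// throw an exception or drop into an interactive debugger
inline void assertion_failure( const char *expr, const char *file, int line )
{
    std::fprintf( stderr, "%s:%d: assertion failed: %s\n", file, line, expr );
    std::abort();
}

// the macro's job is only to capture the source-level information
#define ASSERT(x) ((x) ? (void)0 : assertion_failure(#x, __FILE__, __LINE__))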

And, in fact, Phlip's correction nicely illustrates the broader
point. If I had coded the first exception-throwing assertion macro in
a real app, and a coworker had seen it, they could have fixed it just
as Phlip did. As a result, all of my assertions in the code would
now work better!

That would not have happened if I had written the code inline in all
the places I might have desired the check. They might have found one
instance. If they were motivated and had the time, they might have
done a search for other places, but they might easily have missed some
place where my coding was slightly different and it wasn't obvious
that this was an exception for assertion error checking.

Thanks Phlip!
-Chris

*****************************************************************************
Chris Clark Internet : (e-mail address removed)
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)
------------------------------------------------------------------------------
 

Steven T. Hatton

Chris said:
That's the difference with assertions. They are inside the actual
program code. That's right. We ship our production code with the
assertions turned on. (Actually, the system has levels, so the
production code has the key assertions turned on, but the ones which
have n**2 and worse performance are turned off, as they are only for
extreme debugging cases.)

To some extent we are discussing two different issues. One is assertions as
a software engineering practice, the other is how they are implemented.
I've taken no position on the former matter other than to say that
particular approaches did not appeal to me, and I have not found things
called assertions useful. On the latter point I have pointed out that
TC++PL(SE) suggests a native language alternative to the use of macros for
assertions.
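(Paraphrasing from memory, that alternative is roughly a function
template of the following shape; the exact details in TC++PL may
differ, so treat this as a sketch:)

#include <iostream>

struct Bad_arg { };                     // example exception type

template<class X, class A>
inline void Assert( A assertion )
{
    if ( !assertion ) throw X();        // the condition is an ordinary expression
}

int main()
{
    int i = 7, limit = 5;
    try {
        Assert<Bad_arg>( i < limit );   // throws Bad_arg when the check fails
    } catch ( Bad_arg & ) {
        std::cout << "check failed\n";
    }
}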
We want the assertions turned on in the production code, because our
clients/users are building chips that will be stamped into millions
(perhaps billions) of pieces of silicon that will sell for hundreds
of $ each. If our software does the wrong thing, and that causes them
to misdesign the chip, the cost is not even something one wants to
think about. I think you will find that the safety critical people
(people who make life sustaining software, banks, rocket control
systems) all take a similar point of view. We want to be able to
build tools that we can make certain are reliable and we are willing
to put great effort into assuring that those tools are reliable.
Running with assertions turned on in production code is a small price
for us to pay in that regard.

I'm beginning to understand that this is an issue of vocabulary. What you
are calling assertions are sometimes called consistency checks.
Stroustrup's presentation of assertions is in conjunction with the concept
of invariants. Invariants trace back to the original theories of database
management and ACID transactions. So it can probably be shown that the use
of assertions is logically similar, or identical, to the approaches used in
DBMS's. I haven't given this much thought, and the terminology describing
ACID transactions is not currently part of my daily vocabulary, so I
suspect there is much room for refinement of this notion.
The point is that we know that in 600K lines of code there are
likely to still be bugs. There is no way we could test all the
combinations of interactions. More importantly, we are adding new
features all the time. It is simply impossible to get each of those
new features to work in every case with all the current features for
all the subtle cases.

What I was saying about removing things had to do with removing them when
you know you have it right. If I have a bug in my software that I'm not
seeing right off, I will put in checks that test values in the areas where
I believe the problem exists. Often, when I isolate and correct the
problem, I can study that piece of code closely enough to convince myself
that similar problems are not present, e.g., suppose I have an index that's
going out of bounds on me. I put in some checks to detect where it's going
out of bounds. When I find that I wrote i < x when I should have written
i < f(x), and i and f(x) are clearly defined, the error-trapping code,
which often functions in a way similar to assertions, becomes superfluous.

At that point, leaving it in would result in nothing but code bloat that
obscures the logic of the program. There are often boundary points between
modules in compartmentalized code at which the kinds of checks made by
assertions can be performed to verify that the data entering, and/or
leaving, the component is valid. I've often called these sanity checks.

My first use of C++ exceptions was for just such a situation. I have a
multidimensional grid consisting of an arbitrary number of conceptual
arrays, each having an arbitrary number of indices. The values are defined
at runtime. I also have function objects which generate addresses into
this grid. It can either be addressed using an n-tuple of indices, or it
can be addressed using a single index. The actual data is stored linearly
in one valarray. The rest is just a conceptual indexing mechanism I
devised to study the algorithms involved in this kind of situation.

Now, this is a bit trickier than simply checking that I wrote i < x rather
than i <= x for my array bounds. In that situation I put in a try/catch
that tests the addresses generated by the addressor objects. Within 2
minutes of introducing the checks, I was finding problems I had not
foreseen. I could have called my checks assertions, but I wasn't thinking
along those lines.
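Roughly the kind of check I mean, with names and layout invented for this
example rather than taken from my actual code:

#include <cstddef>
#include <stdexcept>
#include <vector>

// hypothetical addressor: maps an n-tuple of indices onto one linear offset,
// throwing when any component is out of range
class Addressor {
public:
    explicit Addressor( const std::vector<std::size_t> &extents )
        : extents_( extents ) {}

    std::size_t linear( const std::vector<std::size_t> &tuple ) const
    {
        if ( tuple.size() != extents_.size() )
            throw std::out_of_range( "wrong number of indices" );
        std::size_t offset = 0;
        for ( std::size_t d = 0; d < tuple.size(); ++d ) {
            if ( tuple[d] >= extents_[d] )
                throw std::out_of_range( "index out of bounds" );
            offset = offset * extents_[d] + tuple[d];   // row-major layout
        }
        return offset;
    }
private:
    std::vector<std::size_t> extents_;
};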
Therefore, we have assertions that tell us when
something is broken, and they invoke a user and developer friendly
debugger.

This makes perfect sense to me. I will point out that soft-coded assertion
switches might provide more flexibility than traditional macro-based
assertions. And, certainly, we could devise a way of combining these ideas.
Assertions are a way of indicating at the source level what we
expect to be true at run-time. The implementation of how the
assertion works is not important for that to be true.

I agree. This is just a question of semantics. What I was calling
assertions at the time you entered this discussion were specifically the
ones in <cassert>. I will grant that I also said I didn't find
Stroustrup's alternative of using templates rather than macros overly
appealing. I may reconsider that in light of this discussion.
And, that brings us back to the text processing power of CPP. The
point of CPP is to take something that looks like one kind of C++
statement and turn it into something much more sophisticated. And, to
do so in a systematic way through-out a large project.

But a macro has what amounts to a signature just like a function, or a
template. I have yet to be convinced that the same systematic replacement
can't be done from within the language by changing definitions. To some
extent, assertion macros seem to be providing a bit of compile-time
introspection for you.

In Java, since every object has a /class/ member describing the class of
that object, there is no need to use such a mechanism. No, I am not
advocating a UBC for C++, I'm already convinced it's not a good idea. But,
a Sub-Universal Base Class, might be worth considering.
However, a CPP macro
will reliably replace all the instances, and if it is broken I'll get
a compiler syntax error on the resulting code because it won't be
valid C++.

In this case, the real argument for retaining the CPP would be that it is
available on all conforming implementations. Certainly you can accomplish
all of this from outside the language. It's done all the time. Even
though Trolltech uses macros to identify their extensions to C++, they go
beyond simple macro replacement to provide the meta-object support. (Just
to preempt the cries of foul: the result of running moc over the code is
again standard C++.)
Now, are there ways to make CPP do that job better, probably yes. For
example, if you want to propose adding a scoping feature that when
used limited the scope of macros, so that they would have scope, just
like other constructs, I would be supportive, because if it were
available in the compilers I used it would make my life better.

To some extent, it would no longer be the CPP, it would be the C++PP. If
the C++ preprocessor were to significantly diverge from the C preprocessor,
it would really throw a wrench into the C compatibility gears. I suspect
if the result of my starting this thread is the introduction of a C++PP, I
will be receiving death-threats from the FSF.
And
if it weren't, well I'd be no worse off. (Well, assuming that you
remembered the backward compatibility rule, that existing code not
using the feature should not be broken by the feature.)

There's also the deprecation rule.
 

Steven T. Hatton

Chris said:
Then the title is wrong.

I have to disagree. I borrowed the form from Edsger W. Dijkstra's title for
a reason. The goto was never abolished, but it is rarely used in most
code.

hattons@ljosalfr:/usr/src/linux/kernel/
Wed Sep 01 18:15:45:> grep goto *.c | wc -l
290

I could care less about #include. The quote
from BS isn't about #include either. The part of cpp he doesn't like
is the textual manipulation stuff. That is the part that is
*NECESSARY* and powerful.

I understand that his focus is on the use of defines; nonetheless, I believe
I am, to a large extent, addressing the same problem. I don't want
unnecessary stuff introduced into my translation unit that can result in
unexpected behavior, or hidden dependencies. In addition, and this may go
beyond what Stroustrup is concerned with, I find the use of #include to
introduce resources messy, redundant, inefficient, often confusing, and
significantly annoying.

I see little gained by isolating a given #define to a namespace, if it can
still sneak into my translation unit in a #include. Sure, it narrows down
the possibilities, but the problem remains. When I import an identifier
into my file, that's _all_ I want. Sure, Koenig lookup might technically
bring in a bit more, but I don't believe that introduces anything like the
problem I am trying to solve.
At best #include is a small implementation detail for doing imports.
Sure it is ugly in its implementation, but it harks back to the fairly
early days of C. I believe there are even C++ compilers that don't
use/need it.

If you can provide an example, please share it with us. I really, really,
want to explore this area.
The current standard certainly implies that with the new
syntax of #include for the language specified libraries. Those
#include statements don't need to be implemented as files.

But that is not what I'm talking about. The Standard still specifies that
the entire contents of the header is included in the translation unit. I
believe it would be trickier in C++ than in Java to accomplish what I'm
describing, but I believe it is doable. The problem with C++ is that more
of the supporting infrastructure needs to be compiled per instance, than
does with Java. That means I cannot simply import a template in the way
Java imports a class. The template will (officially) have to be compiled.
(I will note that gcc 3.4.x is now advertising precompiled headers.) But,
even in Java, there is often a need to compile a class before it can be
used where it is imported, so this is simply a matter of degree.
Of course, for backward compatibility, there will always be a way to
implement them using files--too many C++ compilers have and always
will have implemented them that way. And, backward compatibility is
truly important in C++--and actually in almost every programming
language.

I've seen that issue dealt with from both approaches. Certainly, it is
easier to break version 1.x when going to 2.x, than it is to break 3.x when
going to 4.x. Both Java and Qt were willing to forego backward
compatibility for the sake of progress. There are ways of grandfathering
in older software by lugging around antiquated libraries and compilers, and
creating glue-code to bridge the differences.
If you want to break backward compatibility, it is better to design a
"new" language, i.e. Java, C#, D, than to try to change an old
language. Users are going to migrate to newer languages anyway, and
you are better off making your new language be one of the alternatives
they can try, than trying to get all the compiler vendors for your
current language to come on board with your backward incompatible
version.

I agree, for the most part. However, backward compatibility can sometimes be
achieved as an either/or option. Either you use the new feature, or the old
one for a given translation unit, or a given program.
C++ is too old
*NOW*. Any other backward incompatible changes will be mostly like
the changes that went into FORTRAN 90--I bet there were a lot more
FORTRAN 77 compilers than there ever will be FORTRAN 90 ones. I doubt
that in 10 years from now I will be writing any C++, maybe not in 5.
On the other hand, I've written C++ code for 15 years already, so it
won't be like I didn't use it for a long time.

Whatever happened to Claudio Puviani?
 

Steven T. Hatton

Kai-Uwe Bux said:
a) This is rhetoric. The information in "Information technology is about
information." is the information, my program deals with. The information
in "Good solid easily obtainable information is vital to design, to
implementation, to trouble shooting and to security." is information about
the structure of my code. The second statement may be true, but it does
not follow from the first. You are using one term to reference two
different entities.

It's not contradictory, it is simply recursive. I am applying the same
argument a mechanical engineer would use to explain his use of computers to
do his job. The programs and tools a software engineer uses are the
programming language, the compiler, the IDE, the libraries, etc.
b) The information about my code is already available. After all, it must
be sufficient for the compiler to generate object code. However, I agree
that in C++ the information may be scattered around and is not as local as
the human mind (or some IDE) would like it to be.

And I believe this is a significant problem that can and should be
addressed.
c) I still see that there are trade offs. If you increase what can be
known at coding time at the cost of what can be done, then there are trade
offs. Since, apparently, I do not face the difficulties you are dealing
with, I would rather keep the power of what can be done.

But what power do you lose by being able to import a resource with a using
declaration, or to bring in (perhaps implicitly) an entire namespace with a
using directive? That really is what I am suggesting, more than
anything. WRT exceptions, the fact of the matter is that placing certain
requirements (restrictions) on their use improves their usability. If that
behavior can be configured at compile time in such a way that the
restrictions are not enforced, you have lost virtually nothing. The only
issue becomes the requirement that you do specify the alternative behavior.
Most compilers would probably provide a means of modifying the option
through a commandline switch, so compiling code that doesn't use the
feature would require no more than adding one command to your autoconf.ac,
or environment variables.
I will drop exceptions. Obviously talking about them just gives you an
opportunity not to address the issue of enforcing policies versus
providing mechanisms.

It's your issue, not mine. The only reason I mentioned exceptions is that it
is the only place where I am advocating placing restrictions on what you
can do by default in C++. There are many restrictions placed on your use
of code. That's what type checking is all about. If you are suggesting
that I want to see changes in the way C++ is used in general, yes, that is
correct.
Maybe, if cpp was even more powerful and more convenient, a superior
library management could be implemented using macros.

I actually have a wishlist item in the KDevelop bug database that suggests
attempting such a thing using #pragma and a strategy of filename, resource
name consonance.
 

Steven T. Hatton

Sam said:
Strangely enough I've been using non-threaded applications that
do more than one thing at a time and are reasonably sophisticated.

Where sophisticated means things like: makes multiple concurrent TCP
connections, encodes and decodes audio sending and receiving it over
UDP, displays GUI and so on - without a thread in sight (well one
thread for the pedants).

OK. Let me be clear as to what I really meant by threading. I really
intended concurrent programming with resource locking and synchronization.
Technically, threading means tracing the same executable image with
multiple instruction pointers. Its main advantages are the multiple use
of the same executable image by different 'processes', and the reduced need
for context switching between processes.
I even write them occasionally.

Threading throws away decades of work in creating systems with useful
protected memory spaces for processes. And lets the average programmer
meet all the problems and reinvent (poorly) all the solutions all over
again. Rather than using the solution implemented by the (hopefully
much more experienced and competent in the domain) OS authors.

What you seem to be suggesting is that threads are used in situations where
multiple processes would be better. Am I to also understand that all
concurrency is bad?
Of course there's that vanishingly small percentage of problems that
are best solved with threads, but chances are you, me, and the
next guy aren't working on one of them.

Please clarify what you mean by 'thread'. I suspect we aren't talking about
the same thing.
 

Sam Holden

OK. Let me be clear as to what I really meant by threading. I really
intended concurrent programming with resource locking and synchronization.

But you only need resource locking and synchronization because of the
existence of the threads. The code I was thinking about above contains a
fifo queue onto which one "section" of code pushes "events" it has
received via a TCP connection. Another "section" of code pops those
"events" in order to deal with them.

This could use threads - that would be one way to implement it. One
thread would do a blocking read on the network socket and when an event
was received would push it on the queue. Another thread would do a
"blocking pop" (or some equvalient thing) on the queue and deal with the
events. That would require resource locking and synchronization - in
this case the queue is the resource. It would also mean that all the
global data in the program is accessible by both threads. Obviously the
queue needs to be, but due to using threads all the rest of it is too.
Any global data could change state at any time - of course the
programmers will hopefully know which parts of the global state they are
allowed to touch and which need locking and synchronisation. But why
throw away decades of work on systems which can enforce those
restrictions?
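To make the cost concrete, here is a rough sketch of the locking the
threaded version needs just for that queue (written with the modern
std::mutex primitives for brevity; the code I am describing predates them):

#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>

// the shared resource: every push and pop must take the lock, and the
// consumer blocks until the producer signals that an event is available
class EventQueue {
public:
    void push( const std::string &event )
    {
        {
            std::lock_guard<std::mutex> lock( mutex_ );
            events_.push( event );
        }
        available_.notify_one();
    }

    std::string blocking_pop()
    {
        std::unique_lock<std::mutex> lock( mutex_ );
        available_.wait( lock, [this]{ return !events_.empty(); } );
        std::string event = events_.front();
        events_.pop();
        return event;
    }

private:
    std::mutex mutex_;
    std::condition_variable available_;
    std::queue<std::string> events_;
};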

In this case my code doesn't use threads, it uses asynchronous events -
when data is available on the socket it gets read and an event pushed on
the queue. To pop from the queue, a callback is provided which is called
when something becomes available and so on. The "main loop" of the
system just multiplexes between the various "sections" according to the
events (button clicks, network sockets changing state, etc).

The disadvantage of this is that on a multiprocessor machine the program
only uses one processor, with threads it could use all of the processors
(well as many as it had threads).

The advantage of this is that no locking is required (which makes things
faster), the programmer doesn't have to worry about "atomic operations"
but can write code knowing that no variable will change value all of a
sudden - no "action at a distance".

Technically, threading means tracing the same executable image with
multiple instruction pointers. Its main advantages are the multiple use
of the same executable image by different 'processes', and the reduced need
for context switching between processes.


What you seem to be suggesting is that threads are used in situations where
multiple processes would be better. Am I to also understand that all
concurrency is bad?

Using threads is a *massive* trade off. You are trading off decades of
research and experience into concurrency and system design in order to
get whatever it is threads are giving you (a performance increase?)

The operating system was hopefully designed and implemented by people
who knew what they were doing (which of course may not be the case...)
and who, hopefully, didn't try to reinvent too many of the things like
semaphores.

Threads share memory, the problem is they share all memory (well not
stack and registers, but all "heap" memory if you will). So when you use
threads you have just lost the entire protected memory concept of modern
(and not modern, for that matter) operating systems. You have lost the
resource management handling of the operating system, so you need
to do your own synchronisation.

An obvious alternative is multiple processes and explicitly shared
memory which only contains the data which needs to be shared. That way
the OS can be used to protect the data that doesn't need to be shared,
instead of just trusting no other thread executes buggy code.

Concurrency doesn't require threads. Threads are one way of achieving
it. How is multiple processes communicating via some form of IPC not
concurrency?

Threads are not all bad, they are the best solution to some problems.
But that set of problems is vanishingly small compared to the
problems threads are usually used to solve...
Please clarify what you mean by 'thread'. I suspect we aren't talking about
the same thing.

A thread is a concurrently executing entity which has its own program
counter, registers, stack, and so on but shares main memory with other
threads. That's not a great definition, but hopefully will confirm or
counter your suspicion.
 

Steven T. Hatton

Sam said:
But you only need resource locking and synchronization because of the
existence of the threads. The code I was thinking about above contains a
fifo queue onto which one "section" of code pushes "events" it has
received via a TCP connection. Another "section" of code pops those
"events" in order to deal with them.

Hey, here's an idea: order these events by some kind of priority. Things
that don't need as much attention could be placed lower on the queue. You
could call it a priority queue. (Yes, I'm yanking your chain.) ;)
This could use threads - that would be one way to implement it. One
thread would do a blocking read on the network socket and when an event
was received would push it on the queue. Another thread would do a
"blocking pop" (or some equvalient thing) on the queue and deal with the
events. That would require resource locking and synchronization - in
this case the queue is the resource.

You could also have a queue of threads. That's what thread pooling does for
you.
It would also mean that all the
global data in the program is accessible by both threads.

I tend to use mutable global data about 1% as many times as I use a goto, so
this seems like a non-issue for me. Now, synchronizing shared resource
access /is/ an issue that must be addressed. Java provides the means to
accomplish this, out of the box, in a fairly intuitive way. Some people
think C++ should likewise provide that support. I originally was inclined
to agree. I have now come to the conclusion that it's probably better to
facilitate their use by providing what C++ already does provide, and leave
it to third party providers to produce the thread libraries. And there are
plenty.
Obviously the
queue needs to be, but due to using threads all the rest of it is too.
Any global data could change state at any time - of course the
programmers will hopefully know which parts of the global state they are
allowed to touch and which need locking and synchronisation. But why
throw away decades of work on systems which can enforce those
restrictions?

If I'm writing a server, an OS, a desktop manager, or similar, I'm pretty
much required to support concurrency. And people have attempted all of the
above with Java. The most successful of these has been creating the
server. I don't believe C++'s lack of native support for threading is a
major deficiency. It's nice to have in Java, and the fact that it is
native to the language means that I don't need to study the idiosyncrasies
of a particular thread library when I work with a different library. The
downside, of course, is that I can't reasonably implement my own, perhaps
superior, thread support.
In this case my code doesn't use threads, it uses asynchronous events -
when data is available on the socket it gets read and an event pushed on
the queue. To pop from the queue, a callback is provided which is called
when something becomes available and so on. The "main loop" of the
system just multiplexes between the various "sections" according to the
events (button clicks, network sockets changing state, etc).

That's basically polling, if I understand you correctly. It works, Emacs
has been plodding along for decades that way, and there are certainly
advantages to designing your program that way.
The disadvantage of this is that on a multiprocessor machine the program
only uses one processor, with threads it could use all of the processors
(well as many as it had threads).

The advantage of this is that no locking is required (which makes things
faster), the programmer doesn't have to worry about "atomic operations"
but can write code knowing that no variable will change value all of a
sudden - no "action at a distance".

But that is partly due to the fact that your problem domain does not require
you to provide inherently asynchronous services. It may be the case that
threading is overused by developers. I can't really say. I do know I
was in a situation where the lead engineer was creating his own thread
pooling rather than using that which was provided by the server we were
developing on. I found that to be a poor choice. But the fact that every
request for service was handled by a different thread was an intuitively
obvious approach to the design.
Using threads is a *massive* trade off. You are trading off decades of
research and experience into concurrency and system design in order to
get whatever it is threads are giving you (a performance increase?)

The operating system was hopefully designed and implemented by people
who knew what they were doing (which of course may not be the case...)
and who, hopefully, didn't try to reinvent to many of the things like
semaphores.

But these are services the OS provides the application programmer. There are
times when it makes sense to use threads, and times when it doesn't.
Sometimes the use of threads is dictated by the development environment.
Application servers tend to naturally favor the use of threading, and that
is where I have seen them used, and used well, in Java.
Threads share memory, the problem is they share all memory (well not
stack and registers, but all "heap" memory if you will).

That isn't correct. There are such things as thread-local resources which
are not shared between threads.

Concurrancy doesn't require threads. Threads are one way of achieving
it. How is multiple processes communicating via some form of IPC not
concurrency?

Threads are not all bad, they are the best solution to some problems.
But that set of problems is vanishingly small compared to the
problems threads are usually used to solve...

I can't comment on that because I have not seen them misused.
A thread is a concurrently executing entity which has its own program
counter, registers, stack, and so on but shares main memory with other
threads. That's not a great definition, but hopefully will confirm or
counter your suspicion.

All but the point about 'main memory' are fairly consistent with my
understanding of the term. Threads have some registers of their own, and
some are shared by all threads in a program. The memory shared by threads
is the executable image, and global data. But I don't use global data! The
reuse of the executable image is an advantage even on non SMP systems, as
is the reduced context switching.
 

Steven T. Hatton

Steven said:
Sam Holden wrote:

That isn't correct. There are such things as thread-local resources
which are not shared between threads.

http://doc.trolltech.com/3.3/qthreadstorage.html

The QThreadStorage class provides per-thread data storage.

QThreadStorage is a template class that provides per-thread data storage.

Note that due to compiler limitations, QThreadStorage can only store
pointers.

The setLocalData() function stores a single thread-specific value for the
calling thread. The data can be accessed later using the localData()
functions. QThreadStorage takes ownership of the data (which must be
created on the heap with new) and deletes it when the thread exits (either
normally or via termination).

The hasLocalData() function allows the programmer to determine if data has
previously been set using the setLocalData() function. This is useful for
lazy initialization.

For example, the following code uses QThreadStorage to store a single cache
for each thread that calls the cacheObject() and removeFromCache()
functions. The cache is automatically deleted when the calling thread exits
(either normally or via termination).

QThreadStorage<QCache<SomeClass> *> caches;

void cacheObject( const QString &key, SomeClass *object )
{
    if ( ! caches.hasLocalData() )
        caches.setLocalData( new QCache<SomeClass> );

    caches.localData()->insert( key, object );
}

void removeFromCache( const QString &key )
{
    if ( ! caches.hasLocalData() )
        return; // nothing to do

    caches.localData()->remove( key );
}
 

Steven T. Hatton

Chris said:
That would not have happened if I had written the code inline in all
the places I might have desired the check.

Why can't you use declarations and definitions to achieve the same thing?
Other than __LINE__ and __FILE__, the use of the CPP seems superfluous. I
do wish it were possible to pass strings and/or char*s as template
arguments. That would increase their usability as a substitute for
assertions.
They might have found one
instance. If they were motivated and had the time, they might have
done a search for other places, but they might easily have missed some
place where my coding was slightly different and it wasn't obvious
that this was an exception for assertion error checking.

What happens when your macro is designed in such a way that it expects a
particular kind of behavior on the part of your code, and the modified
macro doesn't understand that?
 

Sam Holden

http://doc.trolltech.com/3.3/qthreadstorage.html

The QThreadStorage class provides per-thread data storage.

No. It provides virtual per-thread data storage. The actual data is
accessible to all threads - all they need is a pointer to its location
(unless of course a pointer into the stack was used).

It provides a convenient way of having the correct pointer provided to
the correct thread. But in the presence of buggy code, there is no
guarantee that the data won't be scribbled on by another thread.

In comparison with multiple processes, any data which is not in
explicitly shared memory *can not* be scribbled on by another buggy
process (well not without help from a buggy kernel).

Threads are fine and dandy if there are no bugs in the code. Then
again an operating system with no memory protection, that allows any
process to write to any memory location is fine and dandy if there are
no bugs (or malicious intent) in the processes. Threads protect you
from the malicious intent part (all the code is yours), but bugs are
harder to avoid.

Threads cost a lot, and deliver little. Sometimes that little is of such
great importance that the cost is worth it, but often not.

[Snip QT docs]

In a language, such as Java, it would be possible for the virtual machine
to provide such thread local storage, but C++ and current threading
models aren't such a beast.
 

Steven T. Hatton

Sam said:
No. It provides virtual per-thread data storage. The actual data is
accessible to all threads - all they need is a pointer to its location
(unless of course a pointer into the stack was used).

Actually, they can blow past the stack boundaries in some cases.
It provides a convenient way of having the correct pointer provided to
the correct thread. But in the presence of buggy code, there is no
guarantee that the data won't be scribbled on by another thread.

I don't claim to be an expert on p-threads, but from reading the
specification you really have to try to access the TLS of another thread.
The only way I can see of doing such a thing is to blow the bounds of an
array, or access a dangling pointer. Neither of these errors is unique to
multi-threading. The only difference between doing this per thread, or per
process, is that accessing out of process memory will get you a big fat
fine from the O/S. If you access a neighboring thread's data, that just
results in corrupted data. There's nothing stopping you from doing that
within a heavyweight process, as long as you don't try to access something
outside of the process's memory space.
In comparison with multiple processes, any data which is not in
explicitly shared memory *can not* be scribbled on by another buggy
process (well not without help from a buggy kernel).

Hmmmm. I thought real men don't need protection from themselves. Just ask
Phlip. ;)
Threads cost a lot, and deliver little. Sometimes that little is of such
great importance that the cost is worth it, but often not.

It really depends on the context. Servers are natural candidates for the
use of threading.
[Snip QT docs]

In a language, such as Java, it would be possible for the virtual machine
to provide such thread local storage,

Recall that this fork started with this exchange:
Phlip said:
Steven said:
And [from Java] you get threading, Unicode, effortless portability, incredibly smooth
refactoring, high-level abstraction with the tools to support it, great,

Threading is good??
[end excerpt]
but C++ and current threading models aren't such a beast.

Threads and the Single UNIX(R) Specification, Version 2 (Copyright 1997 The
Open Group)

"Typically, the value associated with a given key for a given thread is a
pointer to memory dynamically allocated for the exclusive use of the given
thread (for example, per-thread stack and stack pointer). The scenario for
establishment and use of thread-specific data can be described as follows.
A module that needs to maintain static data on a per-thread basis creates a
new thread-specific data key as a part of its initialization routine. At
initialization, all threads of the process are assigned null values for the
new key. Then, upon each thread's first invocation of the module (which can
be determined by checking for a null key value), the module dynamically
allocates memory for the exclusive use of the calling thread, and stores a
pointer to the memory as the calling thread's value of the new key. Upon
later invocations of the same module by the same thread, the module can
access the thread's data through the new key (that is, the thread's value
for the key). Other modules can independently create other thread-specific
data keys for other per-thread data for their own use."

You might argue that this is 'on the stack', but that will be true of any
pointers you use to access heap data. As long as your pointers are
properly managed, you should never have a problem. And as I've already
pointed out, that is not unique to threading.

If you are suggesting that the use of threads requires that you know what
you are doing, and that you do it right, I agree. If you are suggesting
it is more difficult to learn than core C++, or at least that it is necessary
to learn something difficult in addition to core C++, yup, I agree again.
 

Richard Herring

Steven T. Hatton said:
Why can't you use declarations and definitions to achieve the same thing?

I think you're about to answer your own question...
Other than __LINE__ and __FILE__, the use of the CPP seems superfluous. I
do wish it were possible to pass strings and/or char*s as template
arguments.

That's exactly the point about stringizing and token-pasting that was
raised and answered about 50 posts back in the thread.
That would increase their usability as a substitute for
assertions.

But you still wouldn't be able to pass _expressions_, which is what the
use of macros allows. And, thanks to the # operator, you can pass
something which is simultaneously an expression and a string.
What happens when your macro is designed in such a way that it expects a
particular kind of behavior on the part of your code, and the modified
macro doesn't understand that?

What happens when any piece of code, macro or not, is designed under a
misapprehension?
 
K

Kai-Uwe Bux

Steven said:
It's not contradictory, it is simply recursive.

It is neither, the technical term for the way you used the word
"information" in your argument is, I think, "equivocation". But this does
not really matter since the point that you argue for has some merit
regardless of whether the argument was sound: I will just grant you that
"good solid easily obtainable information" about what is going on in my
code is a nice thing to have. And I also agree that C++ allows one to write
code where this sort of meta-information is incredibly hard to obtain.
I am applying the same
argument a mechanical engineer would use to explain his use of computers to
do his job. The programs and tools a software engineer uses are the
programming language, the compiler, the IDE, the libraries, etc.

I do not understand this analogy at all. But then again, we are not in
disagreement about the importance of understanding your code and how it
interacts with components beyond your control (like libraries).

And I believe this is a significant problem that can and should be
addressed.

There are mitigating strategies that I incorporate in my coding styles. A
problem is, of course, that I have to rely on libraries that might not
follow my coding style. Nonetheless, I feel that a lot can be done without
changing the language.

But what power do you lose by being able to import a resource with a using
declaration, or to bring in -perhaps implicitly- an entire namespace with a
using directive? That really is what I am suggesting, more than anything.

I started to comment on this, and I realized that I was about to write
nonsense. So I realized that I do not understand your proposal. Could you
point me to a post where you have given more details about what these using
directives should do from a user's point of view; obviously I missed some
postings of yours. Before I have a firm grasp of the proposed addition, I
cannot estimate what would have to be sacrificed (if anything) to make it
work -- after all this mechanism has to be integrated with the rest of C++.

WRT exceptions, the fact of the matter is that placing certain
requirements (restrictions) on their use improves their usability. If
that behavior can be configured at compile time in such a way that the
restrictions are not enforced, you have lost virtually nothing. The only
issue becomes the requirement that you do specify the alternative
behavior. Most compilers would probably provide a means of modifying the
option through a commandline switch, so compiling code that doesn't use
the feature would require no more than adding one command to your
autoconf.ac, or environment variables.


It's your issue, not mine. The only reason I mentioned exceptions is that
it is the only place where I am advocating placing restrictions on what you
can do by default in C++. There are many restrictions placed on your use
of code. That's what type checking is all about. If you are suggesting
that I want to see changes in the way C++ is used in general, yes, that is
correct.

Do you suggest changes to the standard? I am not concerned with any effort
of yours to change "the way C++ is used in general" (a cultural change),
because it would be up to me to follow the crowd or not. Changes to the
standard are what I am concerned about. If you do not talk about those,
then I apologize for the misunderstanding.

I actually have a wishlist item in the KDevelop bug database that suggests
attempting such a thing using #pragma and a strategy of filename-to-resource-name
consonance.

Sounds interesting and cool.


Best

Kai-Uwe Bux
 
S

Steven T. Hatton

Richard said:
In message <[email protected]>, Steven T. Hatton

But you still wouldn't be able to pass _expressions_, which is what the
use of macros allows. And, thanks to the # operator, you can pass
something which is simultaneously an expression and a string.

I don't know what the reason for templates not taking char* constants is, so
I can't really comment on what you might be able to do with them if the
capability to pass them were there.
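
For what it's worth, a sketch of what was already allowed: a char* non-type
template parameter can be bound to a named array with external linkage, just
not to a string literal (Tagged and greeting are invented names for the
example):

#include <iostream>

template <const char* Name>
struct Tagged {
    const char* name() const { return Name; }
};

// A named array with external linkage may be used as the template argument;
// a string literal such as "hello" may not.
extern const char greeting[] = "hello";

int main()
{
    Tagged<greeting> t;
    std::cout << t.name() << '\n';   // prints "hello"
    // Tagged<"hello"> t2;           // ill-formed
    return 0;
}
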
What happens when any piece of code, macro or not, is designed under a
misapprehension?

Regular code won't do things like

#define twice(x) ((x)+(x))
int n = 1;
int sum;
sum = twice(++n); // expands to ((++n)+(++n)): undefined behavior

It looks modestly handy to be able to pass an expression and have it
'stringized'. I tried for a long time to do something like that in Java to
no avail. Then I found out how to leverage introspection to accomplish
what I wanted.
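
Something along those lines is what the # operator buys in C++; this CHECK
macro is only an illustrative stand-in for the idea, not a replacement for
<cassert>:

#include <iostream>
#include <cstdlib>

// The expression is evaluated as code *and* reported as text via #expr.
#define CHECK(expr) \
    do { \
        if (!(expr)) { \
            std::cerr << __FILE__ << ':' << __LINE__ \
                      << ": check failed: " << #expr << '\n'; \
            std::abort(); \
        } \
    } while (0)

int main()
{
    int n = 3;
    CHECK(n > 0);    // passes silently
    CHECK(n == 4);   // prints "...: check failed: n == 4" and aborts
    return 0;
}
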
 
S

Steven T. Hatton

Kai-Uwe Bux said:
Steven T. Hatton wrote:


It is neither, the technical term for the way you used the word
"information" in your argument is, I think, "equivocation".

What I mean by recursive is that I'm talking about applying the tools
created by information technology to the tools that create these tools.
I do not understand this analogy at all.

A mechanical engineer uses computers to explore his designs, to access
information, to organize his resources, etc. That's non-recursive in the
sense that a mechanical engineer is not creating IT tools. When I, as a
software engineer, apply information technology to my work, it is recursive.
Self-referential, if you will.
There are mitigating strategies that I incorporate in my coding styles. A
problem is, of course, that I have to rely on libraries that might not
follow my coding style. Nonetheless, I feel that a lot can be done without
changing the language.

I agree. But I still find #inclusion redundant, primitive, inelegant,
potentially confusing, and lots of other bad things.
I started to comment on this, and I realized that I was about to write
nonsense. So I realized that I do not understand your proposal. Could you
point me to a post where you have given more details about what these
using directives should do from a users point of view; obviously I missed
some postings of yours. Before I have a firm grasp of the proposed
addition, I cannot estimate what would have to be sacrificed (if anything)
to make it work -- after all this mechanism has to be integrated with the
rest of C++.

This is from a previous post. It isn't as concise as I would like, but I
need to work on clarifying in my own mind exactly what I'm suggesting
before I try to formalize it for others. All of what follows might be
replaced by the requirement that 'Given a fully qualified identifier in a
source file, the implementation shall locate the declaration (and
definition if needed), and make it available so that the source file can be
successfully compiled.' How it does this? I don't care!

//-----------------------------------------------------------------------
All of this is correct.  But I'm not sure that's the most problematic aspect
of the CPP.  Though the CPP and its associated preprocessor directives do
constitute a very simple language (nowhere near the power of sed or awk),
it obscures the concept of translation unit by supporting nested #includes.
When a person is trying to learn C++, the additional complexity can
obfuscate C++ mistakes.  It's hard to determine if certain errors are CPP
or C++ related.

IMO, the CPP (#include) is a workaround to compensate for C++'s failure to
specify a mapping between identifiers used within a translation unit and
the declarations and definitions they refer to.

As an example let's consider the source in the examples from
_Accelerated_C++:_Practical_Programming_by_Example_ by Andrew Koenig and
Barbara E. Moo:
 
http://acceleratedcpp.com/

// unix-source/chapter03/avg.cc
#include <iomanip>
#ifndef __GNUC__
#include <ios>
#endif
#include <iostream>
#include <string>

using std::cin;                  using std::setprecision;
using std::cout;                 using std::string;
using std::endl;                 using std::streamsize;

I chose to use this as an example because it's done right (with the
exception that the code should have been in a namespace). All identifiers
from the Standard Library are introduced into the translation unit through
using declarations.  Logically, the using declaration provides enough
information to deterministically map between an identifier and the
declaration it represents in the Standard Library.  The #include CPP
directives are necessary because ISO/IEC 14882 doesn't require the
implementation to resolve these mappings.  I believe - and have suggested
on comp.std.c++ - that it should be the job of the implementation to
resolve these mappings.

Now a tricky thing that comes into play is the relationship between
declaration and definition.  I have to admit that falls into the category
of religious faith for me.  Under most circumstances, it simply works; when
it doesn't, I play with the environment variables and linker options until
something good happens.

I believe what is happening is this: When I compile a program with
declarations in the header files I've #included somewhere in the whole
mess, the compiler can do everything that doesn't require allocating memory
without knowing the definitions associated with the declarations.
(By compiler I mean the entire CPP, lexer, parser, compiler, and linker
system.) When it comes time to use the definition which is contained in a
source file, the source file has to be available to the compiler either
directly, or through access to an object file produced by compiling the
defining source file.

For example, if I try to compile a program with all declarations in header
files which are #included in appropriate places in the source, but neglect
to name one of the defining source files on the command line that initiates
the compilation, the program will "compile" but fail to link.  This results
in a somewhat obscure message about an undefined reference to something
named in the source.  I believe that providing the object file resulting
from compiling the defining source, rather than that defining source
itself, will solve this problem.
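
A minimal illustration of that failure mode (file and function names are
invented for the example; the exact linker message wording varies):

// greet.h -- declaration only
void greet();

// greet.cc -- the definition
#include <iostream>
#include "greet.h"
void greet() { std::cout << "hello\n"; }

// main.cc -- uses only the declaration
#include "greet.h"
int main() { greet(); return 0; }

// g++ main.cc greet.cc   -> compiles and links
// g++ main.cc            -> compiles, then fails to link:
//                           "undefined reference to `greet()'"
// g++ main.cc greet.o    -> also links, given greet.cc was compiled earlier
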

The counterpart to this in Java is accomplished using the following:

* import statement

* package name

* directory structure in identifier semantics

* classpath

* javap

* commandline switches to specify source locations



Mapping this to C++ seems to go as follows:

* import statement

This is pretty much the same as a combination of a using declaration and
a #include.  A Java import statement looks like this:

import org.w3c.dom.Document;

In C++ that translates into something like:

#include <org/w3c/dom/Document.hh>
using org::w3c::dom::Document;

* package name

This is roughly analogous to the C++ namespace, and is intended to support
the same concept of component that C++ namespaces are intended to support.
In Java there is a direct mapping between file names and package names.
For example, if your source files are rooted at /java/source/
(c:\java\source) and you have a package named org.w3c.dom, the name of the file
containing the source for org.w3c.dom.Document will
be /java/source/org/w3c/dom/Document.java. Using good organizational
practices, a programmer will have his compiled files placed in another,
congruent, directory structure, e.g., /java/classes/ is the root of the
class file hierarchy, and the class file produced by
compiling /java/source/org/w3c/dom/Document.java will reside
in /java/classes/org/w3c/dom/Document.class.  This is analogous to placing
C++ library files in /usr/local/lib/org/w3c/dom
and /usr/local/include/org/w3c/dom.  

* directory structure in identifier semantics

In Java the location of the root of the class file hierarchy is communicated
to the Java compiler and JVM using the $CLASSPATH variable.  In C++ (g++)
the same is accomplished using various variables such as $INCLUDE_PATH
(IIRC) $LIBRARY_PATH $LD_LIBRARY_PATH and -L -I -l switches on the
compiler.

Once Java knows where the root of the class file hierarchy is, it can find
individual class files based on the fully qualified identifier name.  For
example:

import org.w3c.dom.Document;

means go find $CLASSPATH/org/w3c/dom/Document.class

The C++ Standard does not specify any mapping between file names and
identifiers.  In particular, it does not specify a mapping between
namespaces and directories.  Nor does it specify a mapping between class
names and file names.

* classpath

As discussed above the $CLASSPATH is used to locate the roots of directory
hierarchies containing the compiled Java 'object' files.  To the compiler,
this functions similarly to the use of $LIBRARY_PATH for g++.  It also
provides the service that -I <path/to/include> serves in g++.

* javap

The way the include path functionality of C++ is supported in Java is
through the use of the same mechanism that enables javap to provide the
interface for a given Java class.

For example:

Thu Aug 19 09:40:27:> javap org.w3c.dom.Document
Compiled from "Document.java"
interface org.w3c.dom.Document extends org.w3c.dom.Node{
    public abstract org.w3c.dom.DOMImplementation getImplementation();
   ...  
    public abstract org.w3c.dom.Attr createAttribute(java.lang.String);
       throws org/w3c/dom/DOMException
....
}

What javap tells me about a Java class is very similar to what I would want
a header file to tell me about a C++ class.

* commandline switches to specify source locations
This was tacked on for completeness.  Basically, it means I can tell javac
what classpath and source path to use when compiling.  If a class isn't
defined in the source files provided, then it must be available in compiled
form in the class path.

One final feature of Java which makes life much easier is the use of .jar
files.  A C++ analog would be to create a tar file containing object files
and their associated header files that compilers and linkers could use by
having them specified on the commandline or in an environment variable.


I know there are C++ programmers reading this and thinking it is blasphemous
to even compare Java to C++.  My response is that Java was built using C++
as a model.  The mechanisms described above are, for the most part, simply
a means of accomplishing the same thing that the developers of Java had
been doing by hand with C and C++ for years.  There is nothing internal to
the Java syntax other than the mapping between identifier names and file
names that this mechanism relies on.  This system works well. The world will
be a better place when there is such a thing as a C++ .car file analogous
to a Java .jar file.  Granted, these will not be binary compatible from
platform to platform, but in many ways that doesn't matter.
//-----------------------------------------------------------------------
Do you suggest changes to the standard? I am not concerned with any effort
of yours to change "the way C++ is used in general" (a cultural change),
because it would be up to me to follow the crowd or not. Changes to the
standard are what I am concerned about. If you do not talk about those,
then I apologize for the misunderstanding.

Yes, I am suggesting changes to the standard be considered. I've already
suggested the exception mechanism be changed.

Sounds interesting and cool.

I recently found there is more infrastructure in the code base for KDevelop
which might facilitate this than I previously thought. This in particular:
http://www.kdevelop.org/HEAD/doc/api/html/classAST.html

I have the sense that Roberto Raggi's AST might provide a good foundation
for an entire C++ compiler. In comparison to what I saw in the gcc source
code, Roberto's seems a lot cleaner.
 
