Questions about "mismatch"


gwowen

Why was end2 omitted?
That is the core of the discussion.

Even if you do not use the faster version, a difference test is just an
integer comparison

It's only an *iterator* comparison, which isn't necessarily the same
thing. Your library caches indices by design, so this iterator
comparison is always cheap -- so every iterator is a potential random-
access-iterator. But, you have to pay the price of invalidating all
iterators even for structures where it is not absolutely necessary
(deletion from a linked list or a map, say).

It's a trade-off. Personally, I'd've liked defined behaviour for
those containers where the check can be shown to be cheap (e.g.
anything for which std::distance is O(1)), and undefined for
everything else, but I can see why such asymmetry was not thought good
by the standard writers.
 

jacob navia

On 22/12/10 12:33, Joshua Maurice wrote:
For pointers or std::vector::iterator as commonly implemented in non-
debug mode, I fail to see how this is the case. You effectively need
to do the test:
((aEnd - aStart) < (bEnd - bStart))

I don't see how you can get around at least two subtractions or their
equivalents, and that means hitting the ALU. It definitely does not
mean a simple single test and branch processor instruction.

However, I suppose that if you're doing your silly system where you
keep indexes along with the iterators, then it might resolve down to a
single test and branch. However, designing iterators like that
would add so much overhead to their normal usage that it would
almost certainly have a noticeable negative effect on your
performance.

In my implementation of red-black trees, for instance, I use
(conceptually) a flexible array for storage. This has the added
advantage that I get a good implementation of iterators essentially
for free, since I can go from the start of the array to the end
without much trouble, instead of doing quite expensive tree
operations to determine which element comes next.

Then, storing an index into this (hidden) array allows me at the same
time to improve iteration and to make calculating pointer distances
trivial.

But it is silly of course.

:)

And to your point: yes, you have to do two subtractions, which on
modern machines is neither complicated nor time-consuming, mind you.
Obviously it is more costly than NOT doing it at all. My point
is that the cost is so small it can't even be measured.

When I spoke of an integer comparison I was speaking about

i1 != end1 && i2 != end2

in the program proposed by Mr gwowen above in this thread. That would
be just two integer comparisons, assuming an iterator fits in a
pointer and a pointer fits in a register.

jacob
 

jacob navia

On 22/12/10 13:19, SG wrote:
Why was end2 omitted?
That is the core of the discussion.
[snip]

Admittedly, the precondition about the second sequence's length could
be more explicit in the C++ standard. But it's rather self-
explanatory, if you apply common sense.

Not only that. There is no DEFINED outcome for the error, i.e. a crash
is accepted as a "fact of life", or memory corruption or whatever.
Anyhow, I couldn't care less about std::mismatch omitting an end2
parameter. As others have pointed out already, some vendors
(Microsoft, GNU) offer an "STL debug mode".

This is another can of worms. If there is a "debug" mode and in
production (that's where it is important to have checks) there
is none, you are making maintenance a very complicated business.

Three years after the original programmer left, the software
crashes *sometimes*, in mysterious fashion. Depending on the
usage pattern, some customers get crashes and others do not,
and it is difficult to pinpoint the problem or attach it to
any particular piece of code.

After a month of debugging, the C++ expert traces the
problem to an off-by-one calculation, where mismatch
sometimes accessed invalid memory, but not always. And it
couldn't be caught by setting the debug flag to one, because
it would only happen under the tighter conditions of optimized
production code. The debug version always works.

Sound familiar?
So, you get to choose
between checking iterators/algorithms and fast iterators/algorithms.

Why can't we have MINIMAL checking even in production systems?

What is this philosophy of saving a few integer comparisons and
risking months of debugging?
I'm working in the high performance computing domain and I appreciate
how the C++ language/library design turned out.


OK, go ahead then.
 

jacob navia

On 22/12/10 13:38, gwowen wrote:
It's only an *iterator* comparison, which isn't necessarily the same
thing. Your library caches indices by design, so this iterator
comparison is always cheap -- so every iterator is a potential random-
access-iterator. But, you have to pay the price of invalidating all
iterators even for structures where it is not absolutely necessary
(deletion from a linked list or a map, say).

That design decision was motivated by the problem of multithreading.
True, in a hash table you can delete an element and it doesn't
really affect the others. But since all my containers cache the
"count" field that tracks how many items are stored in the container,
a deletion implies updating that field.

Now, if you are unlucky, another thread iterating the map could
do the iteration by querying the size of the map at the start,
then iterating using that cached value. That code would be wrong:
the size of the container changed, so the cached value would be
stale. The correct way would be to query the container at each
iteration step for the number of items it stores, but that would
be grossly inefficient in 99% of situations.

It is better to have simple rules, because software is already
complicated enough as it is.
It's a trade-off.

Engineering *is* a trade-off in everything. Hitting the right
trade-off is our job as software developers.
Personally, I'd've liked defined behaviour for
those containers where the check can be shown to be cheap (e.g.
anything for which std::distance is O(1)), and undefined for
everything else, but I can see why such asymmetry was not thought good
by the standard writers.

The solution I am trying to implement consists in making that
query always cheap in all supported containers. That is why my
implementation of red-black trees uses an underlying array.
(Actually it is a bit more complicated than that, but conceptually
it is an array.)
 

Ian Collins

On 22/12/10 13:19, SG wrote:
Why was end2 omitted?
That is the core of the discussion.
[snip]

Admittedly, the precondition about the second sequence's length could
be more explicit in the C++ standard. But it's rather self-
explanatory, if you apply common sense.

Not only that. There is no DEFINED outcome for the error, i.e. a crash
is accepted as a "fact of life", or memory corruption or whatever.

Does the C or C++ standard define what happens if you call strlen with
something other than a null terminated array of char?
 

jacob navia

On 22/12/10 20:01, Ian Collins wrote:
On 22/12/10 13:19, SG wrote:
On 22 Dec., 10:57, jacob navia wrote:

Why was end2 omitted?
That is the core of the discussion.

[snip]

Admittedly, the precondition about the second sequence's length could
be more explicit in the C++ standard. But it's rather self-
explanatory, if you apply common sense.

Not only that. There is no DEFINED outcome for the error, i.e. a crash
is accepted as a "fact of life", or memory corruption or whatever.

Does the C or C++ standard define what happens if you call strlen with
something other than a null terminated array of char?

Yes, and that was one of the main reasons I proposed counted strings
as a new string type; you remember those discussions in comp.lang.c?

But the answer was that C is so fucked up and there are so many things
wrong with it that it was not worth the trouble.

Hence my surprise that in C++ things aren't basically different.

The advantage of a DEFINED outcome for errors is that any error will
at least always provoke the same behavior. Leaving errors as
"undefined behavior" makes it impossible to know what will happen:
at best a crash, at worst memory corruption.

But OK. Let's leave it at that.
 

Ian Collins

On 22/12/10 20:01, Ian Collins wrote:
On 22/12/10 13:19, SG wrote:
On 22 Dec., 10:57, jacob navia wrote:

Why was end2 omitted?
That is the core of the discussion.


[snip]

Admittedly, the precondition about the second sequence's length could
be more explicit in the C++ standard. But it's rather self-
explanatory, if you apply common sense.


Not only that. There is no DEFINED outcome for the error, i.e. a crash
is accepted as a "fact of life", or memory corruption or whatever.

Does the C or C++ standard define what happens if you call strlen with
something other than a null terminated array of char?

Yes, and that was one of the main reasons I proposed counted strings
as a new string type, you remember those discussions in comp.lang.c?

Ah, so you are seeing my responses. Yes I do remember those, most
entertaining!
But the answer was that C is so fucked up and there are so many things
wrong with it that it was not worth the trouble.

I'm glad you accept that some aspects of C (unfortunately inherited by
C++) are beyond redemption. But I still think you are missing the point
that C++ allows you to add extra overloads to library functions, so
providing the minimum set is a good thing.

namespace std
{
    template <typename InputIterator1, typename InputIterator2>
    pair<InputIterator1, InputIterator2>
    mismatch( InputIterator1 beg, InputIterator1 end,
              InputIterator2 cbeg, InputIterator2 cend )
    {
        // Do error checking here.

        return mismatch( beg, end, cbeg );
    }
}
 

jacob navia

On 22/12/10 21:28, Paavo Helde wrote:
My gut feeling is that this is just to mirror the std::copy interface.
However, given the more symmetric nature of std::mismatch(), a more
symmetric interface would not have been inappropriate, indeed.


Maybe it would be if it were not about such a remote corner of STL. I
have never seen anybody mentioning std::mismatch() in this group before.

Well, it is about software design. The omission of end2 is the *result*
of a certain software design, which you explain quite well here:
In C++, the program is responsible for ensuring that *all* parameters to
the standard library functions are valid, not only the third parameter of
std::mismatch(). For example, the first range for std::mismatch() must
also be valid: one may not pass a start iterator from one container and
an end iterator from another. However, the STL does not guarantee any
protection against such errors; this is just UB.

Exactly. The programmer is supposed to never make any mistake, and any
mistake, even a completely trivial one, immediately produces a
catastrophe. Yes, there are debug versions, but they are optional;
the library itself doesn't even mention the errors, much less specify
what happens when an error occurs.

This means that a crash (at best) or memory corruption are considered
a "fact of life" just to avoid a negligible performance overhead.
Similarly, such an
error could be avoided by changing the interface to the function, for
example passing a start iterator and a count instead of the iterator
pair, or by passing a reference to the container plus start and end
indices. I do not see much difference to the missing end2 parameter case,
why are you so obsessed with this specific design decision?

Because I was writing that particular function for my container library.
I thought it would be a good idea to be compatible with C++, so I asked
here. I was startled that in C++ you pass a start point for an
iteration without passing the length or a way for the library to check
its arguments. Then I was appalled that there wasn't any error analysis
AT ALL. Nothing. It is just assumed that the programmer never makes
mistakes and that any mistake is fatal as a matter of course.

Incredible but true.
If you want to make a proposal of changing the standard and for example
add an overload with end2, there are ways to do that.

I know, but after the negative reaction here I think it is not
really a good idea. People here seem to believe that fast execution
speed is the ONLY criterion for good software. Error analysis, software
robustness and security are criteria that do not seem to interest
people, or at best they are an afterthought.

It is not just "mismatch". It is an example of a certain way of
designing software; that is why the discussion is so interesting.

We have witnessed a quantum leap in hardware performance; machines today
are 20-30 times FASTER than just 10-15 years ago, but we go on as if
programs needed only one criterion: raw performance. Then we end up
with monstrosities that nobody understands any more, that sometimes
crash for unknown reasons, with no means of finding out what happened.

Some programmer in a hurry left a "mismatch" call with an off-by-one
bug and left the company. That part of the code wasn't used very often,
and the wrong call produced only some missing data in the pipeline that
nobody noticed. The program would just crash sometimes, but nobody can
tell you why.

Does that sound like a situation you have seen?

Error analysis is the first part of software maintenance. It AVOIDS
nightmare situations like that. Let's face it: there are good
programmers and bad programmers. Good programmers are good most of
the time; bad programmers are bad most of the time. But even exceptional
programmers make mistakes. Mistakes are an INHERENT part of programming,
and well-engineered libraries take this fact into account!

Again, I would like to stress that I am presenting my personal point of
view, and I am not implying that people who do not agree with me are
bad programmers, etc.

Thanks for your contribution.

jacob
 

jacob navia

On 22/12/10 22:47, Leigh Johnston wrote:
You are forgetting that a pointer can also be an iterator; are you
suggesting that pointers should be removed from the C++ language to
prevent potential UB?

This is ridiculous. If somebody says that a buffer overflow could
be avoided by checking the length of the second container, he is
"proposing that pointers should be removed from the language"?

These are just unfair discussion tactics, designed to scare people
into silence.
What silliness!

Yes, but the silly person is not me. I never said that I wanted
to ban pointers :)
Part of C++'s success as a
language is that it is mostly compatible with C meaning it is fairly
low-level. You pay for what you use and you don't pay for what you don't
use (checked iterators). If you don't like this go and suck on Java. :)

I am not a language evangelist; I do not despise Java programmers or
the Java language. I am just surprised that there is absolutely NO
ERROR ANALYSIS here.

Good programmers are good most of the time, but even good programmers
DO MAKE MISTAKES. Mistakes are a FACT OF PROGRAMMING that you just
can't ignore, and specifying what happens when a mistake occurs (ERROR
ANALYSIS) is a very important part of software construction, i.e.
FAILURE MODES.

Precisely, I do not want to pay for what I did not write: software
maintenance is simplified when errors always provoke the same
consequences. With your way of constructing software, errors can
just overwrite memory (luckily "mismatch" doesn't write anything)
or read nonsense data from uninitialized memory or whatever; it
is a maintenance NIGHTMARE.
 

Ian Collins

ORLY?

"The behavior of a C++ program is undefined if it adds declarations or
definitions to namespace std or to a
namespace within namespace std unless otherwise specified. A program may
add a template specialization
for any standard library template to namespace std only if the
declaration depends on a user-defined type
of external linkage and the specialization meets the standard library
requirements for the original template
and is not explicitly prohibited."

Fair point, but the bigger issue still remains: you can provide your
own overload. Even if it weren't naughty, putting it in std is probably
one way to cause confusion. I guess it's ironic that one way to remove
possible UB is to add some more!
 

Marc

I am not a language evangelist; I do not despise Java programmers or
the Java language. I am just surprised that there is absolutely NO
ERROR ANALYSIS here.

C++ is like cake mix you buy at the grocery store: you have to add eggs,
oil and water and then bake it, assemble and frost it before you get to
eat any cake. Cake mix is more than just raw materials though, and comes
with instructions also (very important to read and understand the
instructions). Premade cakes are something different for a different
target market. To offer cake mix rather than finished cake was a
conscious decision made by the C++ library designers. Maybe you are the
baker and will offer a C++ all wrapped up for those who don't like to
bake and would rather just eat cake. Some people like the cake mix so
they can add their own special touches to it: half the oil for low cal
cake - no problem, a touch of rum - no problem, chocolate surprise
inside - no problem. You can't start with finished cake and go back to
cake mix, but you can go from cake mix to finished cake and then some.
 

gwowen

Mistakes are a FACT OF PROGRAMMING that you just can't
ignore, and specifying what happens when a mistake occurs (ERROR
ANALYSIS) is a very important part of software construction, i.e.
FAILURE MODES.

There's a false distinction here - nobody is suggesting that
programmers should not do error checking and analysis - the only
debate is really "how much error checking should be done by the
standard library, and how much should be left to the programmer."
Putting a lot in the library makes programmers' jobs easier - they
don't have to worry about knowing as many preconditions for
mismatch(), say. Omitting the checks from the library increases the
chance of undefined behaviour in poorly written programs, but can gain
other advantages (fewer checks and branches in tight loops, less
frequently invalidated iterators, less memory overhead for small
containers). And of course, it's easier for a programmer to add checks
to unchecked routines than to remove them from checked routines,
especially with function overloading.

There is no right answer to the question "Which is the right thing to
do?". It's a judgement call. The C standard definitely favours the
no-checks approach, and C++ has followed it to a greater or lesser
extent.

There's a third way of course, exemplified by std::vector::at() vs
std::vector::operator[]().
 
