Use != rather than < in for loops?

  • Thread starter lovecreatesbea...

Daniel T.

Julián Albo said:
Are you saying that C++ is only intended to machines with ascii compatible
charsets?

I'm saying that the is... functions in <cctype> are intended to be used
only on characters that can be represented using 7 bits. i.e., if
assert( (ch & 0x7F) == ch || ch == EOF ) fails, then the result returned
by ispunct and family on 'ch' is undefined. As such a cast to unsigned
char is meaningless. Again, I could be wrong on this, but that is my
understanding. If you know of an authoritative source that says the cast
is necessary, I'd love to see it.

There is a separate set of is... functions defined in <locale> that can
be used for characters that are represented using more than 7 bits, and
a cast is never needed for any of those.
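
As a minimal sketch of the <locale> alternative mentioned above: the standard header <locale> declares a two-argument std::ispunct template that takes the character type directly, so no conversion to unsigned char is involved. The helper name count_punct_in and the default of std::locale::classic() are assumptions made for illustration, not something from this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <locale>
#include <string>

// Hypothetical helper: counts the characters of s that the given locale
// classifies as punctuation. Defaults to the classic "C" locale.
inline std::size_t count_punct_in(const std::string& s,
                                  const std::locale& loc = std::locale::classic()) {
    std::size_t count = 0;
    for (char c : s) {
        // Two-argument overload from <locale>; it takes a char, not an int.
        if (std::ispunct(c, loc)) {
            ++count;
        }
    }
    assert(count <= s.size());  // sanity check: cannot exceed the length
    return count;
}
```

Calling count_punct_in("a,b.c!") with the default locale should count the comma, the full stop and the exclamation mark. Which characters count as punctuation in other locales depends on the implementation.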
 
Julián Albo

Daniel said:
I'm saying that the is... functions in <cctype> are intended to be used
only on characters that can be represented using 7 bits. i.e., if

Do you know of an authoritative source that says that? All I have read is
that the int argument must be EOF or the value of an unsigned char.
 
Daniel T.

Julián Albo said:
Do you know of an authoritative source that says that?

I don't have much in the way of authoritative sources. I was hoping
someone who had the standard could tell us one way or the other.

I did find, "In the C locale or in a locale where character type
information is not defined, characters are classified according to the
rules of the ASCII 7-bit coded character set." (Found at
http://www.mksxserver.com/docs/man3/ispunct.3.asp,
http://www-zeuthen.desy.de/apewww/APE/software/nlibc/html/ctype_8h.html,
http://ou800doc.caldera.com/en/man/html.3C/ctype.3C.html et al.)

I have to admit, I know very little about the locale system and fully
expect that those who ask about using ispunct here have not called
setlocale() (a function which is not even mentioned in TC++PL AFAICT.)

TC++PL does make a comment that passing a signed char into the cctype
functions can be problematic but it gives no details.

According to
http://www.dinkumware.com/manuals/?manual=compleat&page=ctype.html the
only standard (i.e., not implementation-defined) characters that ispunct
will return true for are within the ASCII character set. Passing a
non-7-bit-ASCII character to ispunct seems to be, at best,
implementation-specific.
 
Clark S. Cox III

Daniel said:
I don't have much in the way of authoritative sources. I was hoping
someone who had the standard could tell us one way or the other.

I did find, "In the C locale or in a locale where character type
information is not defined, characters are classified according to the
rules of the ASCII 7-bit coded character set." (Found at
http://www.mksxserver.com/docs/man3/ispunct.3.asp,
http://www-zeuthen.desy.de/apewww/APE/software/nlibc/html/ctype_8h.html,
http://ou800doc.caldera.com/en/man/html.3C/ctype.3C.html et al.)

Those are manpages, specific to particular platforms.
I have to admit, I know very little about the locale system and fully
expect that those who ask about using ispunct here have not called
setlocale() (a function which is not even mentioned in TC++PL AFAICT.)

TC++PL does make a comment that passing a signed char into the cctype
functions can be problematic but it gives no details.

According to
http://www.dinkumware.com/manuals/?manual=compleat&page=ctype.html the
only standard (i.e., not implementation-defined) characters that ispunct
will return true for are within the ASCII character set. Passing a
non-7-bit-ASCII character to ispunct seems to be, at best,
implementation-specific.

Yes, those characters are all in the ASCII standard, but that doesn't
mean that they have to be represented by the equivalent ASCII values.
There is nothing preventing a conforming C++ implementation on a platform
that uses EBCDIC, for instance (which is an 8-bit encoding).
 
Andrew Koenig

I suppose it really depends on what it is used for - but I still think <
is less error-prone than a != (as any value can be != - while you're
limiting a range using <) .

No -- it's more error-prone, not less, because it makes it easier for errors
to escape undetected.
What "correct" result are you after?
The last point in the loop?
.. Surely the data affected by the code inside the loop proves if it is
"correct" or not - and not the result of an incrementing index?
If you wanted the last point using "i != N" then you're after N... Why not
just USE N?

Here is a concrete example of what I'm talking about.

Suppose I want to write a loop that calls f(n) for every value of (integer)
n such that n>=0 and n<N.

Then I might write a loop with an invariant such as

I have called f(n) for every value of n such that n >= 0 and n < i

and initially establish the invariant by writing

i = 0;

This initialization of i establishes the invariant because there are no
values of n that are both >= 0 and < i.

The loop itself might look like this

while (i != N) { /* or i < N */
    // increase i while maintaining the invariant
    f(i);
    ++i;
}

Now, suppose we want to prove that this loop does what we intend.
Regardless of whether the loop condition is i != N or i < N, we have to
prove that the loop terminates.

If the condition is i != N, we can't prove that the loop terminates unless
we prove that N >= 0, a fact that you are claiming argues that i < N is
superior.

However, if the condition is i < N, that complicates the second part of the
proof. Because in that case, when the loop does terminate, we do not know
that i == n, which means that instead of proving the original requirement:

we have called f(n) for every value of (integer) n such that n >= 0 and
n < N

we have instead proved

we have called f(n) for every value of (integer) n such that n >= 0 and
n < i, where i is an unknown value >= n

In other words, in exchange for simplifying the proof of termination, we
have complicated the proof of correctness.

I would much rather simplify the proof of correctness, because if for some
reason the program doesn't terminate, we'll surely know about it. In other
words, I'd rather have a program that goes into an infinite loop than one
that quietly produces the wrong answer.
.. I really don't like that.
Reminds me of all these buffer over-run errors where something unexpected
means CONSTANT bug-fixing.

I have no idea what you're talking about here.
It also reminds me of a current position I have at work, using an un-named
application (I don't think it's my place to name-names) which produced an
endless loop because of some incorrect data.. It took them 3 days to track
this down on a production environment (the data was incorrect for 2 weeks
over the xmas and only produced errors after people started again: adding
to the difficulty).

I actually support the previous posters' paragraph which states "limiting
value N", and "if it is incorrect to execute the loop body if.. i is
greater than N".

I just thought I'd post my thoughts on this, because I'd prefer my code to
do what I program - rather than produce a program which may do 'other
things'.

Exactly. And to be confident that your code does what you program, you
should make it as easy as possible to prove that it does what you intend.
 
Julián Albo

Daniel said:
I did find, "In the C locale or in a locale where character type
information is not defined, characters are classified according to the
rules of the ASCII 7-bit coded character set." (Found at
http://www.mksxserver.com/docs/man3/ispunct.3.asp,
http://www-zeuthen.desy.de/apewww/APE/software/nlibc/html/ctype_8h.html,
http://ou800doc.caldera.com/en/man/html.3C/ctype.3C.html et al.)

Those look like the documentation of an implementation, not a standard
description. And it is highly likely that all those implementations are
intended for machines with an ASCII-compatible character set.

But even in that case, the locale might not be 'C' and might not behave
like 'C' in this regard.
I have to admit, I know very little about the locale system and fully
expect that those who ask about using ispunct here have not called
setlocale()

Very nice of you to ignore all the people in the world who use languages
with more characters than the ASCII set.
TC++PL does make a comment that passing a signed char into the cctype
functions can be problematic but it gives no details.

A good indication that you should not do it without a good reason, IMO.
Passing a non 7-bit ASCII character to ispunct seems to be, at best,
implementation specific.

But when it is passed as unsigned char, the result will always be the one
intended by a user who knows his locale. When it is passed as signed char,
maybe not. That is an unnecessary risk for no benefit other than avoiding
a few extra characters in the source.
 
Daniel T.

Julián Albo said:
Those look like the documentation of an implementation, not a standard
description.

Which is why I have been asking someone to look it up in the standard.
TC++PL says that implementation documentation is the authoritative
source.
But even in that case, the locale might not be 'C' and might not behave
like 'C' in this regard.

Odd, the documentation I can find for setlocale says that the default
locale is "C", "A minimal environment for C-language translation".
Very nice of you to ignore all the people in the world who use languages
with more characters than the ASCII set.

Please, no need to be combative. If one wants to handle languages with
more characters than the ascii set, one is probably better off using
some form of unicode which means all of the functions in cctype are
useless.

I've said more than once that I'm not sure of the correct answer here.
I'm asking what it is. You seem to be sure of the correct answer, but
you haven't produced any authoritative source to back your answer up.
A good indication that you should not do it without a good reason, IMO.

But I'd like to understand the details. Do you know of any source that
explains them? TC++PL says, "discussion of [local conversion between
codesets] is beyond the scope of this book. Please consult your
implementation's documentation." Well, I did that above for several
implementations. Is this something that is purely implementation defined?
 
Julián Albo

Daniel said:
Which is why I have been asking someone to look it up in the standard.
TC++PL says that implementation documentation is the authoritative
source.

The standard says nothing about the workings of the C-inherited functions,
other than some minor variations and a reference to the appropriate C
standard document.
Please, no need to be combative. If one wants to handle languages with
more characters than the ascii set, one is probably better off using
some form of unicode which means all of the functions in cctype are
useless.

Today that is probably a better approach, but there are still a lot of
programs that use 8-bit national charsets. In many cases a program can be
quickly adapted with a change of locale, and rewriting it with Unicode
support is not an option. And even with UTF-8 encoded Unicode, some locales
can correctly support several of the is... functions.
But I'd like to understand the details. Do you know of any source that
explains them?

I think the basic point is clear: ispunct and family take an int and
expect it to contain either EOF or the int value of an unsigned char;
passing a plain char does not guarantee this. You can check the man page
of some Unix system; they are usually less ASCII-centric than other
sources. See for example:

http://www.die.net/doc/linux/man/man3/ispunct.3.html

Or refer to the C standard documents. I don't have them at hand.

And, as others have pointed out, the fact that some documents talk about
characters present in the ASCII charset does not necessarily mean that the
ASCII encoding is used.
TC++PL says, "discussion of [local conversion between
codesets] is beyond the scope of this book. Please consult your
implementation's documentation." Well, I did that above for several
implementations. Is this something that is purely implementation defined?

I think this quote refers to more elaborate things than character
classification functions. Chapter and section?
 
Daniel T.

Julián Albo said:
Daniel T. wrote:

I think the basic point is clear: ispunct and family take an int and
expect it to contain either EOF or the int value of an unsigned char;
passing a plain char does not guarantee this.

Unfortunately, I don't have the C standard either.
TC++PL says, "discussion of [local conversion between
codesets] is beyond the scope of this book. Please consult your
implementation's documentation." Well, I did that above for several
implementations. Is this something that is purely implementation defined?

I think this quote refers to more elaborated things than character
classification functions. Chapter and point?

Section 21.7, the last paragraph.

It seems, though, that you are right: if the programmer has any reason
to expect that the 8th bit will be on for any character, he should cast
to unsigned char before passing the char to the cctype functions.
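
A minimal sketch of that cast, assuming a small wrapper is acceptable. The name is_punct_safe is hypothetical and not part of any library. The <cctype> functions take an int whose value must be EOF or representable as unsigned char, and on platforms where plain char is signed a character with the high bit set would otherwise arrive as a negative int.

```cpp
#include <cassert>
#include <cctype>
#include <climits>

// Hypothetical wrapper (not from the thread): converts a plain char to
// unsigned char before it is widened to the int that std::ispunct expects.
inline bool is_punct_safe(char c) {
    const int as_int = static_cast<unsigned char>(c);
    // The value is now inside the range the <cctype> functions accept.
    assert(as_int >= 0 && as_int <= UCHAR_MAX);
    return std::ispunct(as_int) != 0;
}
```

With this wrapper a call such as is_punct_safe(static_cast<char>(0xE1)) is well defined. Whether it returns true depends on the current locale.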
 
Julián Albo

Daniel said:
TC++PL says, "discussion of [local conversion between
codesets] is beyond the scope of this book. Please consult your
implementation's documentation." Well, I did that above for several
implementations. Is this something that is purely implementation
defined?
I think this quote refers to more elaborated things than character
classification functions. Chapter and point?
Section 21.7, the last paragraph.

This paragraph looks confusing to me (I have an edition in Spanish, but I
think the translation is very good, and the paragraph you quoted confirms
it). Maybe it means "consult (the C++ standard or more specialized books
and) your implementation's documentation".
 
Daniel T.

Julián Albo said:
Daniel said:
TC++PL says, "discussion of [local conversion between
codesets] is beyond the scope of this book. Please consult
your implementation's documentation." Well, I did that above
for several implementations. Is this something that is purely
implementation defined?

I think this quote refers to more elaborated things than
character classification functions. Chapter and point?

Section 21.7, the last paragraph.

This paragraph looks confusing to me (I have an edition in Spanish,
but I think the translation is very good, and the paragraph you
quoted confirms it). Maybe it means "consult (the C++ standard or
more specialized books and) your implementation's documentation".

Hmm... The English version (3rd ed) says:

...a locale can also be used explicitly to control the appearance
of monetary units, dates, etc., on input and output and conversion
between codesets. However, discussion of that is beyond the scope
of this book. Please consult your implementation's documentation.

Oh well.
 
Tr0n

Oh c*** - I'm talking to Andrew Koenig!?
:)
Thanks for discussing this though Andrew - I haven't had a discussion
that added spice to my blood in a couple of years!

Andrew said:
No -- it's more error-prone, not less, because it makes it easier for errors
to escape undetected.

You still won't give an example of these "errors".
While my example is not only already provided - you can also have
something like this:

int N=3;    // This is the index limit we have
int incr=2; // This is the loop increment
int n=0;    // This is the initial index

while (n != N) {
    cout << f(n) << endl;
    n+=incr;
}

Could crash when going to index 4 - which is out-of-bounds.. Or still
loop until n goes out of the integer limit.
Here is a concrete example of what I'm talking about.

Suppose I want to write a loop that calls f(n) for every value of (integer)
n such that n>=0 and n<N.

Right, from here I'll add some notes just to make sure I get the right
idea..

... 0 <= n < N
Then I might write a loop with an invariant such as

I have called f(n) for every value of n such that n >= 0 and n < i

and initially establish the invariant by writing

i = 0;

This initialization of i establishes the invariant because there are no
values of n that are both >= 0 and < i.

The loop itself might look like this

while (i != N) { /* or i < N */
    // increase i while maintaining the invariant
    f(i);
    ++i;
}

Now, suppose we want to prove that this loop does what we intend.

.... and that intent is to step through each index (n) in the range:
0 <= n < N while:
0 <= n < i must also be held true
Regardless of whether the loop condition is i != N or i < N, we have to
prove that the loop terminates.

If the condition is i != N, we can't prove that the loop terminates unless
we prove that N >= 0, a fact that you are claiming argues that i < N is
superior.

However, if the condition is i < N, that complicates the second part of the
proof. Because in that case, when the loop does terminate, we do not know
that i == n, which means that instead of proving the original requirement:

You mean i == N? (I take it as you do..)
Since n < i as part of your above proof.

I think you are also missing a third statement from your proof.. That
you will call all values of i in the range:
0 <= i < N,
or increment i by the smallest integer value.
we have called f(n) for every value of (integer) n such that n >= 0 and
n < N

Which can also be stated as:
You call f(n) for every value of n where 0 <= n < i while i != N
(conceivably allowing f(N) to be called if i goes above N)
we have instead proved

we have called f(n) for every value of (integer) n such that n >= 0 and
n < i, where i is an unknown value >= n

I take it again you meant i as an unknown value >= N .
Which contradicts itself as the value of N may have possibly been passed
to f(n) - which it is unable to do.

This is better stated:
You call f(n) for every value of n where 0 <= n < i while i < N

I'll state this here;
In both cases, we assume that i is increased by the smallest integer
value, in which case the condition before invalidating the i < N
condition is where i = N-1 .

If you at all change the incrementing of the loop - or N - you can end
up in dangerous territory.
In other words, in exchange for simplifying the proof of termination, we
have complicated the proof of correctness.

I would much rather simplify the proof of correctness, because if for some
reason the program doesn't terminate, we'll surely know about it. In other
words, I'd rather have a program that goes into an infinite loop than one
that quietly produces the wrong answer.

Again, I see no proof that either way can produce an incorrect answer.
If an 'incorrect' answer is produced, surely it is the code inside the
loop which would be at fault.
(I've put 'incorrect' as it is technically an 'undesired' or
'unexpected' result which is produced)
I have no idea what you're talking about here.

Loading an input line (say from a file) into a data array (let's say
char[N]) while the length of the data is more than N.
(rather simplistic version)

Also errors where a program is 'expecting' something to be correct.
When have humans ever been perfect, and 'correct' all the time? And why
should our programs expect this?
Exactly. And to be confident that your code does what you program, you
should make it as easy as possible to prove that it does what you intend.

I try to program so that my code WILL do what I intend, but from the
above analysis - you are coming from the reverse angle - your code is
proven after the fact.

Thanks again for the discussion, it's really been an eye-opener.

--
(e-mail address removed)

A pumpkin warrior, brave and good
The last survivor from the wood
So go now swiftly climb the stair
And cut a lock of witch’s hair.
 
Andrew Koenig

Oh c*** - I'm talking to Andrew Koenig!?

Um, yes.
Thanks for discussing this though Andrew - I haven't had a discussion
that added spice to my blood in a couple of years!

You're quite welcome.
You still won't give an example of these "errors".
While my example is not only already provided - you can also have
something like this:

int N=3;    // This is the index limit we have
int incr=2; // This is the loop increment
int n=0;    // This is the initial index

while (n != N) {
    cout << f(n) << endl;
    n+=incr;
}

Could crash when going to index 4 - which is out-of-bounds.. Or still
loop until n goes out of the integer limit.

Yes it could. However, this is something of an unusual example, as I think
you will see if you try to figure out the invariant of this loop.

It would be something like "we have called f(n) for values of n with the
following properties..." and because n is being incremented by a variable
(which, for all we know might change during the loop as a side effect of
calling f), those properties might be a bit of a pain to define exactly.
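
To see what the quoted loop does without letting it run away, here is a rough sketch that only records which values of n the body would be entered with, under either condition. The names visited and Cond and the max_steps cap are assumptions added for illustration; the cap exists only so that the != variant stops.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Which loop condition to simulate.
enum class Cond { NotEqual, LessThan };

// Returns the values of n for which the loop body would run, starting at 0
// and stepping by incr, giving up after max_steps iterations.
inline std::vector<int> visited(int N, int incr, Cond cond,
                                std::size_t max_steps) {
    assert(incr > 0);  // this sketch only covers a positive step
    std::vector<int> out;
    int n = 0;
    while (out.size() < max_steps &&
           (cond == Cond::NotEqual ? n != N : n < N)) {
        out.push_back(n);
        n += incr;
    }
    return out;
}
```

With N = 3 and incr = 2, the < form visits 0 and 2 and then stops, while the != form steps from 2 to 4 and keeps going until the cap is reached. Neither form reaches every n in 0 <= n < N here: n = 1 is skipped either way.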
Right, from here I'll add some notes just to make sure I get the right
idea..

.. 0 <= n < N
Yes.


... and that intent is to step through each index (n) in the range:
0 <= n < N while:
0 <= n < i must also be held true

I prefer not to think of loops in terms of concepts such as "step through".
Instead, I like to think of them this way:

establish an invariant
while (a condition is true) {
    change the values of variables while maintaining the truth of the
    invariant
}

Then, after the loop terminates, I know that the invariant is still true,
and that the condition is now false.

In the example above, the invariant is

f(n) has been called for every integer n with 0 <= n < i

and the condition is i != N.
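
The same reasoning can be written down with assertions. This is only a sketch: Recorder is a hypothetical stand-in for f that remembers its arguments, and call_for_range is a name invented here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for f: records each argument it receives.
struct Recorder {
    std::vector<int> calls;
    void operator()(int n) { calls.push_back(n); }
};

// Calls f(n) for every integer n with 0 <= n < N and returns the final i.
// Precondition: N >= 0 (needed for termination when the test is !=).
template <typename F>
int call_for_range(int N, F& f) {
    assert(N >= 0);
    int i = 0;          // invariant holds trivially: no n with 0 <= n < 0
    while (i != N) {
        f(i);
        ++i;            // invariant restored: f called for all 0 <= n < i
    }
    assert(i == N);     // the loop condition is false, so i == N exactly
    return i;
}
```

Because the loop exits only when i != N is false, the final assertion needs no further argument. With i < N as the condition, the corresponding assertion would be i >= N, and ruling out i > N would take an extra step.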
You mean i == N? (I take it as you do..)
Yes.

I think you are also missing a third statement from your proof.. That
you will call all values of i in the range:
0 <= i < N,
or increment i by the smallest integer value.

I don't understand this statement. I agree that it is necessary to prove
that we will call f(i) for all 0<= i < N, but that proof is a trivial
consequence of two claims:

1) The loop invariant is that we have called f(n) for all 0 <= n < i
2) After the loop terminates, i == N

The point is that the loop invariant is still true after the loop
terminates, so if i == N, we can substitute N for i in (1) above and see
immediately that we have called f(n) for all 0 <= n < N. If the loop
condition were i < N instead of i != N, we would still have to prove that i
> N was impossible.

Which can also be stated as:
You call f(n) for every value of n where 0 <= n < i while i != N
(conceivably allowing f(N) to be called if i goes above N)

Sorry -- when I said "we will call f(i) for all 0 <= i < N", I meant "for
all 0 <= i < N and no other values of i".
I take it again you meant i as an unknown value >= N .
Yes.

Which contradicts itself as the value of N may have possibly been passed
to f(n) - which it is unable to do.

Sorry; which *what* is unable to do?
This is better stated:
You call f(n) for every value of n where 0 <= n < i while i < N

I'll state this here;
In both cases, we assume that i is increased by the smallest integer
value, in which case the condition before invalidating the i < N
condition is where i = N-1 .

If you at all change the incrementing of the loop - or N - you can end
up in dangerous territory.

Indeed. But I think that's slightly off the point--at least the point I'm
trying to make.

The point I'm trying to make is that if your condition is i != N, you know
that after the loop terminates, i == N -- and you can use that fact in
reasoning about the program. If your condition is i < N, you know only that
i >= N, which is a weaker condition. So you have to go through additional
steps to show that even though i > N might be possible after the loop has
terminated, it cannot have happened while the loop was executing.
Again, I see no proof that either way can produce an incorrect answer.
If an 'incorrect' answer is produced, surely it is the code inside the
loop which would be at fault.
(I've put 'incorrect' as it is technically an 'undesired' or
'unexpected' result which is produced)

I completely agree that it is possible to prove that either form of the loop
is correct. I'm just saying that the proof is easier if the condition is i
!= N than if it is i < N.
I have no idea what you're talking about here.

Loading an input line (say from a file) into a data array (let's say
char[N]) while the length of the data is more than N.
(rather simplistic version)

If we're going to talk about pragmatics here, then we get into matters of
taste and experience. My experience is that it is common for people to
write i <= N when they really meant i < N, and that the problems that result
from such errors are exceedingly difficult to find. My taste is therefore
that it would be better to write i != N, yielding a program that works or
fails catastrophically, instead of possibly writing i <= N instead of i < N,
yielding a program that might fail undetectably.
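
A small illustration of the quiet failure described above, under the assumption that the container happens to have one more element than the loop was meant to read. The function names are made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums the first n elements of v; the intended loop, written with !=.
inline int sum_first(const std::vector<int>& v, std::size_t n) {
    assert(n <= v.size());
    int total = 0;
    for (std::size_t i = 0; i != n; ++i) {
        total += v[i];
    }
    return total;
}

// The same loop with the condition mistyped as <= instead of <.
// It reads one element too many and still returns normally.
inline int sum_first_off_by_one(const std::vector<int>& v, std::size_t n) {
    assert(n < v.size());  // keeps this demonstration inside the vector
    int total = 0;
    for (std::size_t i = 0; i <= n; ++i) {
        total += v[i];
    }
    return total;
}
```

For a vector holding 1, 2, 3, 4 and n = 3, the first function yields 6 and the second yields 10, with nothing at run time to flag the difference.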

But that's a matter of taste, about which reasonable people can disagree.
Also errors where a program is 'expecting' something to be correct.
When have humans ever been perfect, and 'correct' all the time? And why
should our programs expect this?

Programs don't expect anything; programmers do.
I try to program so that my code WILL do what I intend, but from the
above analysis - you are coming from the reverse angle - your code is
proven after the fact.

Proving code after the fact is extremely difficult, unless the code was
constructed in the first place to be easy to prove.
Thanks again for the discussion, it's really been an eye-opener.

You're quite welcome.
 
