Requesting advice how to clean up C code for validating string represents integer

Roland Pibinger · Feb 13, 2007

Roland Pibinger wrote, On 13/02/07 15:45:

The OP wanted to be more specific in error reporting hence my suggesting
ways of analysing this further.

You have already trapped the case when endptr==chars above, so you know
that endptr!=chars if you reach here so I would consider the above test
to be a sign of the coder having not understood what s/he was writing.

.... or who wants to make explicit which condition is tested instead of
using a 'catch-all' else block. Since errno, endptr, chars and *endptr
are used in the if statements it's not so easy to correspond those
comparisons to the relevant parts of the strtol specification.

It is guaranteed not to happen!

I'll replace the line with assert(0).

Best regards,
Roland Pibinger

Flash Gordon · Feb 13, 2007

Roland Pibinger wrote, On 13/02/07 19:04:

... or who wants to make explicit which condition is tested instead of
using a 'catch-all' else block.

Then why isn't it
else if (endptr != chars && '\0' != *endptr && errno != ERANGE)

> Since errno, endptr, chars and *endptr
are used in the if statements it's not so easy to correspond those
comparisons to the relevant parts of the strtol specification.

It is very easy. It is even easier to see that you have a redundant
if because you have already checked for the opposite condition and your
final if only muddies the waters.

I can see no good reason to test for a COND and !COND in a simple if
chain such as this.

I'll replace the line with assert(0).

Slightly better would be to get rid of the last if above and just use an
else and then put an appropriate assert in the else clause. However, if
you are going to assert anything at all there is still the question why
you don't assert everything.
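
A minimal sketch of that suggestion, with the redundant test dropped: the
last branch is a plain else, and an assert documents the condition that
must hold there. The function name classify_long and the parse_result
codes are hypothetical, introduced only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical result codes, for illustration only. */
enum parse_result { PARSE_OK, PARSE_RANGE, PARSE_NONE, PARSE_TRAILING };

/* Classify the outcome of strtol on s; *out is set only on PARSE_OK. */
enum parse_result classify_long(const char *s, long *out)
{
    char *endptr;
    long value;

    errno = 0;
    value = strtol(s, &endptr, 0);

    if (ERANGE == errno) {
        return PARSE_RANGE;          /* value does not fit in a long */
    } else if (endptr == s) {
        return PARSE_NONE;           /* no conversion was performed */
    } else if ('\0' == *endptr) {
        *out = value;
        return PARSE_OK;             /* whole string was consumed */
    } else {
        /* Only remaining case; the assert documents it, no re-test needed. */
        assert(endptr != s && '\0' != *endptr);
        return PARSE_TRAILING;       /* characters left after the number */
    }
}
```

With this shape, classify_long("42x", &n) lands in the else branch without
testing endptr against the start of the string a second time.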

robert maas, see http://tinyurl.com/uh3t · Feb 14, 2007

From: Flash Gordon said:
So it still was not needed in the program you posted.

That depends on how you think of the program. If it had been
intended as a standalone program to distribute to others, then the
sleep could be regarded as a "dunzell" (StarTrek TOS jargon), i.e.
a part that serves no useful function. However in fact it was just
a test rig to develop modules which would later be installed
primarily in a CGI environment (where the toplevel stdin test loop
would not be present at all). As a test rig, where I might at any
time add new buggy code that might produce infinite spew, whereby
I'd need protection from modem-buffer disaster, it was quite
appropriate for the sleep to be in the toplevel loop at all times.
What was posted was just the current version of that test rig at
the moment I posted. But in fact that sleep would be present in
*any* version of that test rig at any time after I encountered the
modem-buffer disaster and consequently took precautions against it
ever happening again in any version of that test rig or any other
test rig descended from it.

If anyone happens to like my program enough to copy it and use it
themselves, but doesn't like the sleep in it, feel free to remove
it, but then don't complain to me if you subsequently try to modify
the program in other ways and introduce a bug and fill up your
modem buffers or even worse fill up all free swap space on your PC
and crash the OS and can't re-boot. (YMMV)

That is because it is the wrong approach
1) read the documentation to see what the correct way to do it is
2) write the code
3) test it

That's not good development technique. Documentation often is
misunderstood. If your approach is followed, your program might
have a subtle bug where you're not getting the value you thought
you're getting but you have the test written backwards or otherwise
wrong and for the cases you tested your multiple mistakes are
covering for each other making the program "work" despite being
totally wrongly written.

It's best to read the documentation (as I did, but didn't include in
the steps of actual program development, sorry if you assumed
contrary to fact), and then install both the call to whatever
library routine *and* a printf of the return value, then look at
the output to see if it conforms to how you read the documentation
to mean, and if so then proceed to write the test on that basis.
But if the return value doesn't agree with what you thought the
documentation said, you need to consider various alternatives:
- You aren't calling the correct function because you loaded the
wrong library.
- You are calling the correct function in the wrong way (as
happened to me the first time I tried strtoll, see other thread).
- You misunderstood the documentation.

Once you are sure the function returns the value you expect in all
test cases that cover in-range and out-of-range cases as well as
carefully constructed right-at-edge-of-range cases, if any of that
makes sense for the given function, *then* it's time to write the
test to distinguish between the various classes of results as you
now *correctly* understand them based on agreement between your
reading of documentation and your live tests.

So in this case, calling fgets, I needed to test all these cases:
- Empty input: NonNull return value, Buffer contains EOL NUL
- Normal input: NonNull return value, Buffer contains chars EOL NUL
- Input that overruns buffer: NonNull return value, Buffer contains chars NUL
- Abort via end-of-stream generated via ctrl-D: NULL return value.
- Abort via ctrl-C: Program aborts to shell all by itself.
- Abort via escape (a.k.a. altmode): Goes in as garbage screwup character, avoid.
One case (buffer overrun) I really needed to see for myself,
because the documentation didn't make it clear whether fgets would
omit the NUL so it could fill the entire buffer with data to cram
it all in and not lose that last character, or truncate the data
one shorter to guarantee a NUL was there. In fact the latter
occurs. But I was prepared to force a NUL there, overwriting the
last byte, if fgets had done the first instead. One thing I *did*
have to do is check whether the last character before the NUL was
EOL or not, and clobber it to NUL (shortening string by one
character per c's NUL-terminated-byte-array convention for
emulating "strings") only if it was EOL, so the string seen by the
rest of the program would consistently *not* have the EOL
character.
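
A rough sketch of the fgets handling described above, assuming a
hypothetical helper read_line and hypothetical status codes. It relies on
the documented fgets behaviour of storing at most size-1 characters and
always terminating the buffer with NUL.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical status codes, for illustration only. */
enum line_status { LINE_OK, LINE_TRUNCATED, LINE_END };

/* Read one line from fp into buf (size bytes) and strip a trailing '\n'. */
enum line_status read_line(FILE *fp, char *buf, size_t size)
{
    size_t len;

    if (fgets(buf, (int)size, fp) == NULL) {
        return LINE_END;             /* end of stream or read error */
    }
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';         /* drop the EOL character */
        return LINE_OK;
    }
    /* No EOL stored: either the stream ended without a final newline,
       or the line was longer than the buffer. */
    return feof(fp) ? LINE_OK : LINE_TRUNCATED;
}
```

After LINE_TRUNCATED the rest of the long line is still waiting in the
stream, so the next call returns its tail.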

Now not all those cases are actually necessary for the end purpose
of this program, developing a module intended for CGI usage, but
it's nice to know how the basic terminal-input routine of my
stdin test rig performs for *all* inputs before making extensive
use of it for *anything*. I don't want confusion later where I
don't know whether strange results are due to bug in test rig or
bug in the actual module I'm trying to develop.

Wrong. C does not allow *any* sleeping. ...

Let me re-phrase that: The sleep function provided by the library
whose header is unistd.h doesn't allow any sleep times except
integers. Now are you happy?

robert maas, see http://tinyurl.com/uh3t · Feb 14, 2007

From: CBFalconer said:
What for? You have a macro called EOF available. Use it.

Even if I were to take advice, I'd *still* as a first step put in a
printf to tell me the value of EOF and also the returned value so I
can see if they really are the same when I press ctrl-D.

But: <http://www.gnu.org/software/libc/manual/html_node/EOF-and-Errors.html>
Many of the functions described in this chapter return the value of
the macro EOF to indicate unsuccessful completion of the operation.
Since EOF is used to report both end of file and random errors, it's
often better to use the feof function to check explicitly for end of
file and ferror to check for errors.

It doesn't sound like comparing the return value with EOF is a good
way to diagnose what really happened. If I ever decide this needs
fixing, I'll fix it by checking both feof and ferror, not by
comparing with EOF. (Or compare with EOF as a first pass, then if
that matches go ahead and check both feof and ferror to see which
sub-case applies.)

robert maas, see http://tinyurl.com/uh3t · Feb 14, 2007

From: (e-mail address removed) (Roland Pibinger)

The linked code does not reflect the current C Standard:
"If the correct value is outside the range of representable values,
LONG_MIN, LONG_MAX ... is returned ... and the value of the macro
ERANGE is stored in errno."

Hmm, indeed I seem to recall seeing that in the specs as I was
researching this before deciding to use strtoll instead. I'll have
to take a look at that someday when I have time.

robert maas, see http://tinyurl.com/uh3t · Feb 14, 2007

From: (e-mail address removed) (Roland Pibinger)

IMO, the last part of the function should look like the following:
errno = 0;
long_var = strtol(chars, &endptr, 0);
if (ERANGE == errno) {
    printf("Number out of range.\n");
} else if (endptr==chars) {
    printf("No number or not parsable number given.\n");
} else if ('\0' == *endptr) {
    printf("Looks good? N=%ld\n", long_var);
} else if (endptr != chars) {
    printf("After number, extra characters on input line.\n");
} else {
    printf("Unknown error, should never happen.\n");
}

Before I started this task, I made a design decision that
whitespace before and/or after the number is fine, but any other
stray character not part of the [optionalSign] oneOrMoreDigits is
an error. Your advice is inconsistent with the part of the decision
whereby trailing whitespace is fine.

Part of my decision was that whitespace allowance should be
symmetric. It should be allowed before iff allowed after. strtol is
asymmetric in this respect, allowing whitespace before (and
rejecting stray non-white text before), but failing to distinguish
between trailing whitespace (OK) and trailing junk (Not OK), either
rejecting both (if caller checks to make sure the final pointer
matches end of string), or accepting both (if caller doesn't make
that check).

There's so much that strtol fails to check the way I want, that
it's best to just not use it at all for preliminary syntax
checking, so I ended up writing my own code, which first version
was ugly, but second version is pretty clean, making liberal use of
strspn and strcspn, which I didn't know about until after I had
already written that ugly first version (and translated it to
equally ugly c++), and then gone ahead to write clean lisp and java
versions, and then also gone ahead to write regex stuff for perl
and PHP, and finally I came back to look at the ugly C to see if I
might make it less ugly.

Your advice to use strtol to do the preliminary syntax check wasn't
good, but in an indirect way it helped, because searching for
documentation for strtol accidentally turned up the documentation for
strtoll and for strspn and strcspn.
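
For readers following along, here is one possible shape of such a
strspn-based syntax check. It is only a sketch, not the code from the
cookbook: the function name is_decimal_integer and the set of characters
treated as whitespace are assumptions.

```c
#include <string.h>

/* Return 1 if s is [white]* [sign]? digit+ [white]*, else 0.
   The set of accepted whitespace characters is an assumption. */
int is_decimal_integer(const char *s)
{
    static const char white[]  = " \t\r\n";
    static const char digits[] = "0123456789";
    size_t ndigits;

    s += strspn(s, white);               /* skip leading whitespace */
    if (*s == '+' || *s == '-') {
        s++;                             /* optional sign */
    }
    ndigits = strspn(s, digits);         /* length of the digit run */
    if (ndigits == 0) {
        return 0;                        /* no digits where expected */
    }
    s += ndigits;
    s += strspn(s, white);               /* skip trailing whitespace */
    return *s == '\0';                   /* anything left over is junk */
}
```

Whitespace is handled the same way on both sides of the number, which is
the symmetry asked for above. Range checking is left to a later strtol or
strtoll call.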

CBFalconer · Feb 14, 2007

robert maas said:
Even if I were to take advice, I'd *still* as a first step put in a
printf to tell me the value of EOF and also the returned value so I
can see if they really are the same when I press ctrl-D.

You NEVER need to know the value of EOF. You simply need to know
that it is negative, and outside the range of char, especially
unsigned char. This is why you usually receive chars in an int.

But: <http://www.gnu.org/software/libc/manual/html_node/EOF-and-Errors.html>

Many of the functions described in this chapter return the value of
the macro EOF to indicate unsuccessful completion of the operation.
Since EOF is used to report both end of file and random errors, it's
often better to use the feof function to check explicitly for end of
file and ferror to check for errors.

WRONG. Those functions are to distinguish between error and
physical EOF when some input routine actually returns EOF. By the
time feof has shown up it is too late to control use of the input
data. C is unlike Pascal in this respect.
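
A small sketch of that order of operations, assuming a hypothetical helper
count_chars: the loop is controlled by the value getc returns (held in an
int), and ferror is consulted only after EOF has actually come back.

```c
#include <stdio.h>

/* Count the characters remaining on fp. Returns the count, or -1 if
   reading stopped because of an error rather than end of file. */
long count_chars(FILE *fp)
{
    long n = 0;
    int c;                       /* int, not char, so EOF stays distinct */

    while ((c = getc(fp)) != EOF) {
        n++;
    }
    /* getc has returned EOF; only now ask which condition caused it. */
    if (ferror(fp)) {
        return -1;
    }
    return n;                    /* otherwise end of file was reached */
}
```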

BTW, please do not strip attributions for material you quote.

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
<http://www.securityfocus.com/columnists/423>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discover of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews

CBFalconer · Feb 14, 2007

robert maas said:
.... snip ...

Part of my decision was that whitespace allowance should be
symmetric. It should be allowed before iff allowed after. strtol
is asymmetric in this respect, allowing whitespace before (and
rejecting stray non-white text before), but failing to distinguish
between trailing whitespace (OK) and trailing junk (Not OK),
either rejecting both (if caller checks to make sure the final
pointer matches end of string), or accepting both (if caller
doesn't make that check).

Not so. The returned value of endptr simply allows the user to
make that decision for himself.

Richard Heathfield · Feb 14, 2007

robert maas, see http://tinyurl.com/uh3t said:

From: Flash Gordon

[...] C does not allow *any* sleeping. ...

Let me re-phrase that: The sleep function provided by the library
whose header is unistd.h doesn't allow any sleep times except
integers. Now are you happy?

"If something isn't in the a standard library for C, then it doesn't
exist for the purpose of this project." - robert maas, in the article
starting this thread.

<unistd.h> is not a standard header, and none of the functions for which
it is required to be included are in the standard library. Therefore,
by your own argument, the sleep function you are talking about does not
exist.

Flash Gordon · Feb 14, 2007

robert maas, see http://tinyurl.com/uh3t wrote, On 14/02/07 01:41:

That depends on how you think of the program. If it had been

<snip>

I think of programs as presented. As presented there was no reason for
the sleep.

If anyone happens to like my program enough to copy it and use it
themselves, but doesn't like the sleep in it, feel free to remove
it, but then don't complain to me if you subsequently try to modify
the program in other ways and introduce a bug and fill up your
modem buffers or even worse fill up all free swap space on your PC
and crash the OS and can't re-boot. (YMMV)

None of those would give me a problem. Even if it was possible for one
of those to give me a problem I would not need the sleep function.

You might want to find out how to use a debugger on your system, then
you can step through the code when you are not sure about it as part of
your testing.

That's not good development technique.

True, I should have included some earlier steps such as analysing the
requirements & designing the software.

> Documentation often is
misunderstood.

My experience is that the above applies to people who think that
experimenting with a function is a good way to find out about it. It
does not in my experience apply to those who believe the best way to
find out is to read the documentation.

> If your approach is followed, your program might
have a subtle bug where you're not getting the value you thought
you're getting but you have the test written backwards or otherwise
wrong and for the cases you tested your multiple mistakes are
covering for each other making the program "work" despite being
totally wrongly written.

That is what testing is for. You feed in as much data (in the loosest
sense) as practical carefully crafted to do your damnedest to break the
code and thus find what is wrong with it.

You said in your post that the way to do it was basically to experiment
with the function.

It's best to read the documentation (as I did, but didn't include in
the steps of actual program development, sorry if you assumed
contrary to fact),

I can only go on what you actually post.

> and then install both the call to whatever
library routine *and* a printf of the return value, then look at
the output to see if it conforms to how you read the documentation
to mean, and if so then proceed to write the test on that basis.
But if the return value doesn't agree with what you thought the
documentation said, you need to consider various alternatives:
- You aren't calling the correct function because you loaded the
wrong library.
- You are calling the correct function in the wrong way (as
happened to me the first time I tried strtoll, see other thread).
- You misunderstood the documentation.

Testing your program will find all of these. Well, it will if you test
it properly.

Once you are sure the function returns the value you expect in all
test cases that cover in-range out-of-range cases as well as
carefully constructed right-at-edge-of-range cases, if any of that
makes sense for the given function, *then* it's time to write the
test to distinguish between the various classes of results as you
now *correctly* understand them based on agreement between your
reading of documentation and your live tests.

So in this case, calling fgets, I needed to test all these cases:
- Empty input: NonNull return value, Buffer contains EOL NUL
- Normal input: NonNull return value, Buffer contains chars EOL NUL
- Input that overruns buffer: NonNull return value, Buffer contains chars NUL
- Abort via end-of-stream generated via ctrl-D: NULL return value.
- Abort via ctrl-C: Program aborts to shell all by itself.
- Abort via escape (a.k.a. altmode): Goes in as garbage screwup character, avoid.

All that would have been in the test set for testing your program so
having read the documentation and written the relevant module you would
test and see that it worked as expected, including the program
gracefully handling "garbage" input.

One case (buffer overrun) I really needed to see for myself,
because the documentation didn't make it clear whether fgets would
omit the NUL so it could fill the entire buffer with data to cram
it all in and not lose that last character, or truncate the data
one shorter to guarantee a NUL was there. In fact the latter
occurs.

If you cannot understand the documentation you have available that is
the time to ask those with more experience/knowledge. Had you at that
point posted here saying that it was not clear from the documentation
you have then someone here would clarify it for you.

> But I was prepared to force a NUL there, overwriting the
last byte, if fgets had done the first instead. One thing I *did*
have to do is check whether the last character before the NUL was
EOL or not, and clobber it to NUL (shortening string by one
character per c's NUL-terminated-byte-array convention for
emulating "strings") only if it was EOL, so the string seen by the
rest of the program would consistently *not* have the EOL
character.

So, as you cannot tell that from your documentation how do you know that
behaviour is not specific to your implementation and might not change
when a patch is installed on the machine later today?

Now not all those cases are actually necessary for the end purpose
of this program, developing a module intended for CGI usage, but
it's nice to know how the basic terminal-input routine of my
stdin test rig performs for *all* inputs before making extensive
use of it for *anything*. I don't want confusion later where I
don't know whether strange results are due to bug in test rig or
bug in the actual module I'm trying to develop.

So you test your test rig once you have written it.

Let me re-phrase that: The sleep function provided by the library
whose header is unistd.h doesn't allow any sleep times except
integers. Now are you happy?

Yes.

Understanding what is part of C and what is not is important so that you
can isolate the system specifics and know what will have to be changed
to run the program on some other system.

Roland Pibinger · Feb 14, 2007

From: (e-mail address removed) (Roland Pibinger)
IMO, the last part of the function should look like the following:
errno = 0;
long_var = strtol(chars, &endptr, 0);
if (ERANGE == errno) {
    printf("Number out of range.\n");
} else if (endptr==chars) {
    printf("No number or not parsable number given.\n");
} else if ('\0' == *endptr) {
    printf("Looks good? N=%ld\n", long_var);
} else if (endptr != chars) {
    printf("After number, extra characters on input line.\n");
} else {
    printf("Unknown error, should never happen.\n");
}

Before I started this task, I made a design decision that
whitespace before and/or after the number is fine, but any other
stray character not part of the [optionalSign] oneOrMoreDigits is
an error. Your advice is inconsistent with the part of the decision
whereby trailing whitespace is fine.

Ok, in your original code you did not distinguish between (allowed)
trailing whitespace and (not allowed) extra characters:

} else if ('\0' != *endptr) {
printf("After number, extra characters on input line.\n");

Part of my decision was that whitespace allowance should be
symmetric. It should be allowed before iff allowed after. strtol is
asymmetric in this respect, allowing whitespace before (and
rejecting stray non-white text before), but failing to distinguish
between trailing whitespace (OK) and trailing junk (Not OK), either
rejecting both (if caller checks to make sure the final pointer
matches end of string), or accepting both (if caller doesn't make
that check).

Here is a 'symmetric' version that allows for leading and trailing
whitespace but not for 'stray non-white text':

errno = 0;
long_var = strtol(chars, &endptr, 0);
if (ERANGE == errno) {
    printf("Number out of range.\n");
} else if (endptr == chars) {
    printf("Not a (parsable) number given.\n");
} else {
    /* skip trailing whitespace; the cast avoids undefined behaviour
       when plain char is signed and *endptr is negative */
    while (isspace((unsigned char)*endptr)) {
        ++endptr;
    }
    if ('\0' == *endptr) {
        printf("Looks good? N=%ld\n", long_var);
    } else {
        printf("After number, invalid extra characters on input line.\n");
    }
}

I hope that this is now a 100% solution. I agree that strtol is a good
example of how not to design a function interface.

Best regards,
Roland Pibinger

robert maas, see http://tinyurl.com/uh3t · Feb 14, 2007

From: (e-mail address removed) (Roland Pibinger)

Before I started this task, I made a design decision that
whitespace before and/or after the number is fine, but any other
stray character not part of the [optionalSign] oneOrMoreDigits is
an error. Your advice is inconsistent with the part of the decision
whereby trailing whitespace is fine.

Ok, in your original code you did not distinguish between (allowed)
trailing whitespace and (not allowed) extra characters:

I don't believe you've even looked at my original code.
Do you remember seeing this function definition?
/* Given a string (nul-term), and index where digits ended,
   scan to very end making sure no junk, return code:
   garafnum = garbage after number */
enum errcode strchkint4(char* str, int* pix) {
    char ch;
    while (1) {
        ch = str[*pix];
        if ((0 == ch) || ('\n' == ch)) {
            /* printf("At ix=%d, ch=%c, nul/eol reached.\n", *pix, ch); */
            return(0);
        }
        else if (' ' == ch) {
            /* printf("At ix=%d, ch=%c, skip white.\n", *pix, ch); */
            (*pix)++;
        }
        else {
            /* printf("At ix=%d, ch=%c, junk.\n", *pix, ch); */
            return(garafnum);
        }
    }
}
If you don't remember seeing that code, then you haven't looked at
the original code I wrote for the C implementation of this task,
because *that* is the relevant original code.

} else if ('\0' != *endptr) {
printf("After number, extra characters on input line.\n");

You're totally confused. That's not my original code at all.
Here's the chronology:
-1- Original code, such as the piece I posted above.
-2- Translation of original code to C++, which can be found here:
<http://www.rawbw.com/~rem/HelloPlus/CookBook/h4s.html#h4intcpp>
-3- Complete re-write in Common Lisp.
-4- Translation of lisp version to java.
-5- Complete re-write in perl.
-6- Translation of perl version to PHP.
-7- Getting advice to try strtol.
-8- Researching strtol, discovering strtoll which is better.
-9- Trying strtoll in test rig, having trouble.
-A- Getting advice about why strtoll didn't work for me.
-B- Fixing test rig to use strtoll correctly, but being dissatisfied
because it fails to distinguish between trailing whitespace and
trailing junk.
-C- Discovering strspn and strcspn.
-D- Translating lisp/java version to c using strspn and strcspn,
using strtoll only after the syntax check has already been
completed.
-E- Your confusion between the first version -1- using while loop
and something somewhere from -9- to the end using strtol[l].

I agree that strtol is a good example of how not to design a
function interface.

At least we're in agreement about that one thing!

There's still the policy decision whether to show absolute
beginners how to write their own code, such as scanning for the
first character that matches or doesn't match a bag of some type,
using position-if and position-if-not in Common Lisp or strspn and
strcspn in C, or just call a magic genie which does almost what you
want but screws up in one aspect requiring a post-call fixup to
make the result 100% correct. At the moment, I prefer the scanning
method in all languages except perl and PHP, because it's
symmetric, and easily translatable between four languages rather
than special to just one add-on library of one language. In perl and
PHP I'm presently using regular expressions, a sort of "magic
genie" but without the design flaw that strtol[l] have, because (1)
they are nicely integrated into the language, no hassle to use
them, and (2) they are in fact advertised as a primary reason to
use those languages so I might as well show off such usage when I'm
comparing how to do the same task in all six languages.

On the other hand, that's slightly moot for this specific purpose,
which was merely to extract a numeric value from a HTML FORM field
string in the safest way possible, so that the numeric value could
then be used in the actual sample code fragment, which I haven't
started writing yet.

If anyone is curious about the overall project (multi-language
"cookbook" in form of matrix per one or two datatypes that each
operation/function deals with), I've finished all the built-in c
and c++ operators, and their Common Lisp equivalents, and now I'm
doing the c libraries, starting with ctype.h where I'm about
halfway finished. See toplevel "cookbook" file:
<http://www.rawbw.com/~rem/HelloPlus/CookBook/CookTop.html>
click on chapter 3 skeleton in progress.

Flash Gordon · Feb 14, 2007

robert maas, see http://tinyurl.com/uh3t wrote, On 14/02/07 20:23:

At least we're in agreement about that one thing!

There's still the policy decision whether to show absolute
beginners how to write their own code, such as scanning for the
first character that matches or doesn't match a bag of some type,
using position-if and position-if-not in Common Lisp or strspn and
strcspn in C, or just call a magic genie which does almost what you
want but screws up in one aspect requiring a post-call fixup to
make the result 100% correct.

From your perspective it might "screw up" one aspect, but that is
because you are assuming the string is meant to have only one data item.
strtol and friends are designed on the basis that you might want to pass
the rest of the string to something else, so they tell you where to
start. In your case that is looking to see if the remainder is white
space or not, but sometimes people might be doing other things.

> At the moment, I prefer the scanning
method in all languages except perl and PHP, because it's
symmetric, and easily translatable between four languages rather
than special to just one add-on library of one language. In perl and
PHP I'm presently using regular expressions, a sort of "magic
genie" but without the design flaw that strtol[l] have, because (1)
they are nicely integrated into the language, no hassle to use
them, and (2) they are in fact advertised as a primary reason to
use those languages so I might as well show off such usage when I'm
comparing how to do the same task in all six languages.

On the other hand, that's slightly moot for this specific purpose,
which was merely to extract a numeric value from a HTML FORM field
string in the safest way possible, so that the numeric value could
then be used in the actual sample code fragment, which I haven't
started writing yet.

Personally I would still go with strtol[l] and then check whether the
trailing data is white space or not.

If anyone is curious about the overall project (multi-language
"cookbook" in form of matrix per one or two datatypes that each
operation/function deals with), I've finished all the built-in c
and c++ operators, and their Common Lisp equivalents, and now I'm
doing the c libraries, starting with ctype.h where I'm about
halfway finished. See toplevel "cookbook" file:
<http://www.rawbw.com/~rem/HelloPlus/CookBook/CookTop.html>
click on chapter 3 skeleton in progress.

Looking at some of the earlier stuff you have work to do there as well.
The hello world programs in C are using implicit int for main which is
not allowed in the latest standard, the web one fails to include stdio.h
which is required (unless you want to do the work of providing your own
prototype), and one of them is a deliberately obfuscated program which
relies on ASCII which the C standard does not guarantee.

This from your "CookBook" is wrong for C95 and earlier, and since you
use implicit int all over the place you are not using C99:

| In c, each function definition is supposed to be before the first time
| it is called. That's because the compiler works forward through the
| file checking each fuction-call to make sure the function is defined,
| and generates an error message immediately when it sees a attempt to
| call a function that isn't defined.

So is this prototype you show "int g2(int n1,n2);"

There are several other errors.

I suggest you need to learn C properly before writing any kind of
"CookBook" that includes C in the languages it uses.

Keith Thompson · Feb 15, 2007

Flash Gordon said:
robert maas, see http://tinyurl.com/uh3t wrote, On 14/02/07 20:23: [snip]
This from your "CookBook" is wrong for C95 and earlier, and since you
use implicit int all over the place you are not using C99:

| In c, each function definition is supposed to be before the first time
| it is called. That's because the compiler works forward through the
| file checking each fuction-call to make sure the function is defined,
| and generates an error message immediately when it sees a attempt to
| call a function that isn't defined.

So is this prototype you show "int g2(int n1,n2);"

There are several other errors.

Including the use of "defined" rather than "declared". A function
call requires a declaration for the called function; it doesn't
require a definition. (That's in C99; C90 allows calls without
declarations, but providing declarations, preferably prototypes, is
still an excellent idea.)
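
To make the distinction concrete, here is a small sketch. The name g2 and
its parameter names come from the quoted prototype; the function bodies and
the caller twice_sum are made up for illustration.

```c
/* Prototype (a declaration): each parameter needs its own type,
   so "int g2(int n1,n2);" has to be written like this. */
int g2(int n1, int n2);

/* The call compiles because g2 has been declared above,
   even though its definition only appears further down. */
int twice_sum(int a, int b)
{
    return 2 * g2(a, b);
}

/* Definition of g2, placed after its first use. */
int g2(int n1, int n2)
{
    return n1 + n2;
}
```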

robert maas, see http://tinyurl.com/uh3t · Feb 15, 2007

From: Flash Gordon said:
you are assuming the string is meant to have only one data item.

Yes, that's the situation here, when validating the contents of a
single HTML-FORM text field, which is supposed to contain exactly
the representation of one integer using decimal notation,
optionally with whitespace around it either/both way(s).

strtol and friends are designed on the basis that you might want
to pass the rest of the string to something else, so they tell
you where to start.

So basically you make sure you've gobbled everything preceding the
item of interest, except whitespace, then you call the function,
which skips the leading whitespace and gobbles the item of
interest, leaving any trailing whitespace and any items of later
interest. So whitespace is treated in an asymmetrical manner, and
at the very end of a chain of [white]* [item]! parsing you have a
single [white]* [null] parser just to verify somebody didn't leave
more useful items that haven't been gobbled?

I'll have to remember that paradigm if and when I ever ask a user
to type in more than one item on a single line, such as if I ever
write a CGI-accessible Sudoku solver where a whole row is entered
in a single text field.

Thanks for explaining that other input paradigm, sorta like scanf
but more robust.
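
The chained paradigm described above might be sketched as follows. This is only an illustration under stated assumptions: parse_row, its parameters, and its -1 error convention are hypothetical names invented here, not code from the thread. Each strtol call skips leading whitespace and reports where it stopped; a final whitespace-then-null check catches leftover junk.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse up to max whitespace-separated decimal
   integers from s into out. Returns the count, or -1 on junk,
   overflow, or too many items. */
int parse_row(const char *s, long *out, int max)
{
    int count = 0;

    for (;;) {
        char *end;
        long value;

        errno = 0;
        value = strtol(s, &end, 10);  /* skips leading whitespace itself */
        if (end == s)
            break;                    /* no further number starts here */
        if (errno == ERANGE)
            return -1;                /* value does not fit in a long */
        if (count == max)
            return -1;                /* more items than the caller allows */
        out[count++] = value;
        s = end;                      /* continue where strtol stopped */
    }

    /* Final [white]* [null] check: only whitespace may remain. */
    while (isspace((unsigned char)*s))
        s++;
    return *s == '\0' ? count : -1;
}
```

For the input " 3 1  4 " with room for nine values this returns 3; for "3 x 4" it returns -1 because the x survives the final check.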

In your case that is looking to see if the remainder is white
space or not, but sometime people might be doing other things.

Yes. If I wanted to fit my single-item syntax-check into that
multi-item-chain paradigm, I'd have to do it like you suggested in
an earlier message. But unfortunately when it says "no number
present" it really means "no number *immediately* present at start
of line, ignoring optional whitespace". So to satisfy my spec, that
would have to be sub-cased, where if it hits the no-number
condition I'll have to scan for a digit anyway to separate the
sub-cases of junk-before-number and truly-no-number-anywhere.

For now I still like the strspn and strcspn version best for the
current application. But thanks for the explanation of the other
paradigm that I might use for another application someday.
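
For comparison, a syntax-only check built on strspn and strcspn could look roughly like this. It is a sketch, not the code posted earlier in the thread: classify_field and the FIELD_* names are hypothetical, and it separates the junk-before-number and no-number-anywhere sub-cases mentioned above. It does not check range; that would still need strtol or similar.

```c
#include <string.h>

/* Hypothetical result codes for a single-integer text field. */
enum field_status {
    FIELD_OK, FIELD_EMPTY, FIELD_NO_NUMBER,
    FIELD_JUNK_BEFORE, FIELD_JUNK_AFTER
};

enum field_status classify_field(const char *s)
{
    static const char white[]  = " \t\r\n";
    static const char digits[] = "0123456789";
    const char *p = s + strspn(s, white);   /* skip leading whitespace */
    const char *q = p;
    size_t ndigits;

    if (*p == '\0')
        return FIELD_EMPTY;                 /* nothing but whitespace */

    if (*q == '+' || *q == '-')
        q++;                                /* optional sign */
    ndigits = strspn(q, digits);
    if (ndigits == 0) {
        /* No number right here: is there a digit anywhere later? */
        return p[strcspn(p, digits)] != '\0' ? FIELD_JUNK_BEFORE
                                             : FIELD_NO_NUMBER;
    }

    q += ndigits;
    q += strspn(q, white);                  /* skip trailing whitespace */
    return *q == '\0' ? FIELD_OK : FIELD_JUNK_AFTER;
}
```

Here "  42 " classifies as FIELD_OK, "x12" as FIELD_JUNK_BEFORE, "12x" as FIELD_JUNK_AFTER, and "abc" as FIELD_NO_NUMBER.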

Looking at some of the earlier stuff you have work to do there as
well. The hello world programs in C are using implicit int for
main which is not allowed in the latest standard, the web one
fails to include stdio.h which is required (unless you want to do
the work of providing your own prototype),

Let me use -Wall to fix all that ... h.c h1.c h2.c done

In cgis.c (needed for h3.c and beyond), there's a line of code that
shifts the existing value to the left 4 bits and then adds in the
four new bits obtained from the hexadecimal character in the string
it's walking. The line of code looks like this:
c = c<<4 + h;
but the gnu c compiler complains:
cgis.c:118: warning: suggest parentheses around + or - inside shift
Given that there is clearly extra spacing around the =, while the
<< is compact, it's quite clear the intention of the author was:
c = (c<<4) + h;
so it's stupid for the compiler to suggest making it instead:
c = c<<(4 + h);
Should I leave it as-is, or put parens around the shift to avoid
the stupid mis-leading warning?? (Your personal opinion, what you'd
do in my circumstance, writing code examples to share with others,
but in this case simply using somebody else's module to which I
already had to fix a bug before it'd compile.)

Fixed h3.c, all done. Thanks for the heads-up. All my programs worked
fine as they were, but they are supposed to be examples for novices
to copy and try and emulate etc. so they faltered in that respect.
Take another look now if you have time.

and one of them is a deliberately obfusticated program which
relies on ASCII which the C standard does not guarantee.

Which one specifically? Cite a line of code that relies on ASCII
and I'll get the idea which section of it to study?

... you use implicit int all over the place ...

The only place I used implicit int was in return value for main,
which has now been fixed in all cgi-bin/*.c files unless I screwed
up somewhere.

So is this prototype you show "int g2(int n1,n2);"

I don't see anything wrong with that prototype. Do I need to
declare n1 and n2 separately, like this?
int g2(int n1, int n2);
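
On the question just asked: yes, in a prototype each parameter needs its own type. The "int n1,n2" shorthand works only in ordinary variable declarations. A minimal sketch follows; the body given to g2 is hypothetical, added only so the example compiles and can be called.

```c
/* int g2(int n1,n2);    rejected: n2 has no type in a parameter list */

/* Each parameter carries its own type. */
int g2(int n1, int n2);

/* Hypothetical body, here only so the prototype has a matching
   definition; the real g2 may do something else entirely. */
int g2(int n1, int n2)
{
    return n1 + n2;
}
```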

There are several other errors.

Feel free to find a couple totally different errors and tell me
about them, then I'll fix them and anything else they remind me of.

I suggest you need to learn C properly before writing any kind of
"CookBook" that includes C in the languages it uses.

I already took three semester-length C classes. That's all that are
offered at De Anza College. What do you suggest for further
correction of anything I happened to get wrong after three
semesters of formal study plus various Web-based exploration
looking for specific info such as strtoll and strcspn?

Random832 · Feb 15, 2007

2007-02-13 said:
Not a bug. It's just that the part of the program to detect EOF wasn't yet
written, and that's the very part I was trying to develop.
Step 1: Put in a printf to see what value comes back when I press ctrl-D.
Step 2: Write code to detect that value and break out of loop.
Step 3: Test that to see whether it works.
Step 4: Remove the printf.
Unfortunately step 1 blew me out for ten minutes or so without the sleep.

Why was there a loop at all in step 1?
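
For reference, the end state of steps 2 to 4 might look something like the sketch below. It is an assumption-laden illustration: count_chars is a hypothetical helper, not the program being discussed. It reads with getc into an int, leaves the loop on EOF, and then uses ferror to tell a read error from a genuine end of file.

```c
#include <stdio.h>

/* Hypothetical helper: read fp to the end and return the number of
   characters read, or -1 if the stream reported a read error. */
long count_chars(FILE *fp)
{
    long n = 0;
    int ch;   /* int, not char, so EOF stays distinguishable */

    while ((ch = getc(fp)) != EOF)   /* detect EOF and leave the loop */
        n++;

    /* EOF is returned for both end of file and errors. */
    return ferror(fp) ? -1 : n;
}
```

Calling count_chars(stdin) and pressing ctrl-D at the start of a line on a typical Unix terminal ends the loop instead of spinning.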

Flash Gordon · Feb 15, 2007

robert maas, see http://tinyurl.com/uh3t wrote, On 15/02/07 00:24:

Let me use -Wall to fix all that ... h.c h1.c h2.c done

You should use "-ansi -pedantic" as well, together with possibly -W.

In cgis.c (needed for h3.c and beyond), there's a line of code that
shifts the existing value to the left 4 bits and then adds in the
four new bits obtained from the hexadecimal character in the string
it's walking. The line of code looks like this:
c = c<<4 + h;
but the gnu c compiler complains:
cgis.c:118: warning: suggest parentheses around + or - inside shift
Given that there is clearly extra spacing around the =, while the
<< is compact, it's quite clear the intention of the author was:
c = (c<<4) + h;
so it's stupid for the compiler to suggest making it instead:
c = c<<(4 + h);

You consider a compiler to be stupid for following the language
specification? C, like most computing languages, does not use white
space to group expressions. I seem to recall you also cover Perl in your
"CookBook" and based on this one assumption I would say you don't know
Perl or C.
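
A small sketch of the precedence point, with hypothetical function names chosen for illustration. In C the additive operators bind more tightly than the shift operators, so the line as written in cgis.c groups like the first function below; the warning describes how the expression is already parsed and is not a proposal to change its meaning.

```c
/* Equivalent to the original "c<<4 + h": + has higher precedence
   than <<, so the compiler reads it as c << (4 + h). */
unsigned shift_by_sum(unsigned c, unsigned h)
{
    return c << (4 + h);
}

/* What the surrounding description says was intended: shift the old
   value left four bits, then add the new hex digit. */
unsigned append_hex_digit(unsigned c, unsigned h)
{
    return (c << 4) + h;
}
```

With c = 0x1 and h = 0xA, append_hex_digit gives 0x1A while shift_by_sum gives 0x4000, so the parentheses are needed for correctness and not just to quiet the warning.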

Did you actually even go to the effort of trying code before putting it
up on your web site? I think not.

I already took three semester-length C classes. That's all that are
offered at De Anza College.

I'm sorry, but based on your current knowledge, either you failed
those courses or they appear to be almost worthless.

What do you suggest for further
correction of anything I happened to get wrong after three
semesters of formal study plus various Web-based exploration
looking for specific info such as strtoll and strcspn?

Well, in general doing web-based stuff is a bad idea unless you have a
*very* good reason to trust the specific ones you are using. Your
"CookBook" currently seems to be a prime example of why you should *not*
trust web resources.

Do the world a favour and take down your "CookBook" since you are a long
way from having enough knowledge to write it for even one language, let
alone 6.

I suggest you start looking at the comp.lang.c FAQ (Google will find it)
and buy a copy of K&R2 (the full details are in the bibliography of the
FAQ). Work through *all* the exercises in K&R2 starting with the
assumption that you do not know C since you really do not know it.

Richard Bos · Feb 15, 2007

But: <http://www.gnu.org/software/libc/manual/html_node/EOF-and-Errors.html>
Many of the functions described in this chapter return the value of
the macro EOF to indicate unsuccessful completion of the operation.
Since EOF is used to report both end of file and random errors, it's
often better to use the feof function to check explicitly for end of
file and ferror to check for errors.

GNU is wrong on ISO C and does not care. Film at eleven.

Richard

CBFalconer · Feb 15, 2007

Richard said:
GNU is wrong on ISO C and does not care. Film at eleven.

In what way?

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
<http://www.securityfocus.com/columnists/423>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discoverer of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews

robert maas, see http://tinyurl.com/uh3t · Feb 16, 2007

From: Keith Thompson said:
Including the use of "defined" rather than "declared".

OK, there was one section of CookTop.html that was sloppy in the
jargon. I think I've tentatively fixed it. It's rather awkward at
present, but at least it doesn't confuse the two terms. Here's the
(backwards) diff:
% diff CookTop.html*
1787,1788c1787
< checking each fuction-call to make sure the function is declared (i.e.
< at least a prototype showing return type and formal parameters), and
---
> checking each fuction-call to make sure the function is defined, and
1790,1793c1789,1791
< function that isn't declared. It can't guess that you're calling a function
< you will be defining later in the file. Most of the time you actually define
< each function before using it. But if you really must call a
< fuction before you defie it, for example if you have two functions that
---
> function that isn't defined. It can't guess that you're calling a function
> you will be defining later in the file. But if you really must call a
> fuction before you define it, for example if you have two functions that
1795,1796c1793
< You write just a declaration for any function that needs to be called before
< it's defined. You write the type of return value,
---
> You write a function-definition template. You write the type of return value,
1811c1808
< to try to keep the declaration matching the actual function definition
---
> to try to keep the template matching the actual function definition

Thanks for the "heads-up".
