Requesting advice how to clean up C code for validating string represents integer

Keith Thompson · Feb 19, 2007

Yevgen Muntyan said:
Keith said:

Yevgen Muntyan said:

robert maas, see http://tinyurl.com/uh3t wrote:
From: "Racaille" <[email protected]>
I would just use sscanf() and be done:
[snip]
Start at the top of:
<http://www.rawbw.com/~rem/HelloPlus/CookBook/Matrix.html>
What's "GNU-c" and why are some functions "GNU-c" while others
are "c"? Then some people read it and think they use GNU-c,
as some people use C/C++...

GNU C is the C-like language accepted by gcc. It's very similar to
standard C, but has a number of extensions, some of which violate the
C standard.

You sure you are talking about the same thing as that silly "GNU-c"
on the website?

I can only address what "GNU C" *really* means. I can't guess what
the author of the web page meant by it. We'll both just have to wait
for him to respond (or to correct the web page).

Yevgen Muntyan · Feb 19, 2007

CBFalconer said:
Keith said:

Yevgen Muntyan said:

robert maas, see http://tinyurl.com/uh3t wrote:

[snip]
Start at the top of:
<http://www.rawbw.com/~rem/HelloPlus/CookBook/Matrix.html>
What's "GNU-c" and why are some functions "GNU-c" while others
are "c"? Then some people read it and think they use GNU-c,
as some people use C/C++...

GNU C is the C-like language accepted by gcc. It's very similar
to standard C, but has a number of extensions, some of which
violate the C standard.

Note that gcc is a compiler, not a complete implementation; the
C runtime library is provided separately, and varies from one
platform to another. There is a GNU C library (separate from,
but usable with, gcc).

But I have no idea why the referenced web page applies the
"GNU-c" tag to the isblank() and strtoll() functions, both of
which are standard C (but both are new in C99).

I believe the referenced page is authored by Maas. If Yevgen scans
old posts in this newsgroup for Maas's posts (google is handy for
this), and the myriad corrections required, he may be able to
evaluate the accuracy of that page, at least as far as it applies
to C.

You broke my plan, guys. I figured it'd be better to ask instead of
stating that something is nonsense; this thread shows that the former
is more likely to work.

If this guy knows google-fu well enough, he can even get readers,
and those readers might even read what he wrote out there. And
then be afraid to use scary GNU-c thing. Or something. Been there
myself (as a reader, that is).

Silly code pieces are not as bad as things like "GNU-c" or what he
says about character type. The former won't be used by anyone, the
latter will be memorized, unintentionally. Anyway, too tired to speak
English. And was tired when posted, silly idea.

Yevgen

Richard Bos · Feb 19, 2007

Malcolm McLean said:
There is a case for using terms in the way the ANSI committee do in the C
standard.

However the committee is not important enough to be allowed to define how we
use basic programming terms in the English language, even in a C context. If
you are using a term that has other meanings, like "object", in the sense
defined by the standard, then really you ought to qualify "as defined by the
standard".

TTBOMK, the word "object" had a couple of meanings in programming, one
of them being the one used in the ISO(! ANSI is old hat and foreign) C
Standard, long before this new-fangled fad called "object oriented
programming" introduced a new one. So really, it's the OOPsers who need
to conform, not the rest of us.

Richard

Chris Dollin · Feb 19, 2007

robert said:
No wonder everyone is confused.

/I'm/ not confused. Not about this.

In the statement:
x = 5*w;
x is not a value of any kind, it's a **name** of a **variable**
which probably has a value but the value most certainly isn't x.

Duh. Yes. I didn't say otherwise.

The /evaluation/ of the expression `x` will yield an lvalue,
which we can usefully think of as "the address of x".

If w has the value 3, then after executing that statement x will have
the value 15. 15 isn't an lvalue, is it? But you say it is!!

No, I don't. Why do you think I do?

That's bullshit.

I admire your ability to produce a cogent and informed argument.

'a' is the *name* of a *place* where data can be
stored and later retrieved. Depending on where a place is specified
in a statement, either the retrieval-from-place or storage-into-place
operation may occur. Some expressions denote a place, such as 'a'
in the above example, or chs[2] in the following example:
char chs[30];
chs[2] = 'a' + 5;
printf("The third character is '%c'.\n", chs[2]);
Some expressions don't denote a place, such as "'a' + 5" in the
above example. Such expressions can be used only to produce a value
*from* the expression, not to store a value gotten from elsewhere
*into* the expression.

Expressions that "denote a place" are those that can be evaluated
for their lvalues. Those that don't, can't.
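A runnable version of the chs[] example, as a minimal sketch (the function name third_char is hypothetical). chs[2] and *(chs + 2) denote the same place, so a store through one is visible through the other, while 'a' + 5 only yields a value and cannot be assigned to or have its address taken.

```c
#include <stdio.h>

/* chs[2] and *(chs + 2) are two expressions that denote the same
   place.  'a' + 5 denotes only a value: writing ('a' + 5) = 'x' or
   &('a' + 5) would be rejected by the compiler. */
static char third_char(void)
{
    char chs[30];

    chs[2] = 'a' + 5;   /* 'f' on ASCII systems; C does not require
                           the letters to be contiguous */
    printf("The third character is '%c'.\n", chs[2]);
    *(chs + 2) = 'z';   /* same place, reached by a different expression */
    return chs[2];
}
```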

I fail to see how it makes any sense at all.

Consider something like

fun f( x ) = ... x := x + 1 ...
... f( 1 ) ...

in a language with pass-by-binding (the formal argument is
bound to the (l)value of the actual argument). The body of
`f` can dink around with `x`, even if the actual argument
is a literal, and without affecting /all/ the places where
the value of `1` is required.

Your FORTRAN example demonstrates why it can be better for
each (l)evaluation of `1` to yield a /new/ lvalue, not the same
one every time.

That's a completely garbled way of thinking of it.

You are mistaken.

To store a value
into a place you need to know what function to call to effect the
storage.

(`affect`, not `effect`)

No; in fact you don't. You can have a uniform way of /updating/
the store, and different ways of computing the lvalue. Since
the update is "done" by an operation which is roughly

update( store, lvalue, rvalue )

there's plenty of room to capture any interesting details.
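A toy C analogy of the update( store, lvalue, rvalue ) idea, assuming the machine's memory plays the role of the store and a pointer stands in for the lvalue. The names update and demo_update are hypothetical; this is only an illustration, not how Pop11 or any compiler implements assignment. The update step never changes; only the way the lvalue is computed does.

```c
struct point { int x, y; };

/* The single, uniform update operation: the store is implicit (memory),
   the lvalue is a pointer, the rvalue is the value to store. */
static void update(int *lvalue, int rvalue)
{
    *lvalue = rvalue;
}

static int demo_update(void)
{
    int n = 0;
    int arr[3] = {0, 0, 0};
    struct point pt = {0, 0};
    int i = 2;

    update(&n, 1);        /* lvalue of a plain variable */
    update(&arr[i], 2);   /* lvalue computed by indexing */
    update(&pt.y, 3);     /* lvalue computed by member selection */
    return n + arr[2] + pt.y;
}
```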

Common Lisp clarifies the whole idea best with SETF.

That's a matter of opinion: whether it's a "clarification" to
make assignment depend on that systematic use of macros is ...
a choice.

(fx:snip)

It's not enough to say
where to make the change, you must say how specifically to make the
change there, whether to modify the left side or the right side.

Yes. Which doesn't say anything against the lvalue/rvalue description.

(I'm using c notation here to make it easier for you to understand):

Why do you think using C will do that? It's not as though I'm
unfamiliar with Lisp, after all.

Now (what you can do *only* in Common Lisp), you define a SETF
method so that the code:

You can do something very similar using Pop11's updaters. They
don't use macros to do it: they use functions plus one assignment
rewrite rule. Because they don't depend on the /macro-time/
value of the identifiers in function position, they will also
work on procedure /arguments/ as well as /globals/.

I thought Dylan had something similar to SETF as well.

In summary, there's no value (usefulness) to an "lvalue" as you
explain it.

Then I've done a worse job of explanation than I'd wish.

What is misnomered an "lvalue" is really c's version of a setf method,
which unfortunately can't be extended by users as it can in lisp.

I think you're confusing the `lvalue` I was explaining with the
term as it's used in C, and also trying to view the non-Lisp
world exclusively through Lisp lenses.

Chris Dollin · Feb 19, 2007

Ben said:
No, I think it is reasonable (though informal) definition of why the
term came into common use. If you want a very formal analysis you
must turn to denotational semantics, originally developed by
Christopher Strachey and Dana Scott. I think Chris Dollin referred
to Strachey if not to the topic of denotational semantics.

Indeed so. /That/ topic would take me much further outside my
CLC-comfort-zone.

robert maas, see http://tinyurl.com/uh3t · Feb 19, 2007

From: Keith Thompson said:
C99 7.19.6.7p3:
The sscanf function returns the value of the macro EOF if an input
failure occurs before any conversion. Otherwise, the sscanf
function returns the number of input items assigned, which can be
fewer than provided for, or even zero, in the event of an early
matching failure.

Does the C99 standard say anywhere whether a value is or is not
assigned in case of overflow? That is, does it say either of these:
- In case of overflow, assignment must be suppressed.
- In case of overflow, a truncated value must be assigned.
If it says neither (modulo wording of course, anything close would
suffice to define which behaviour is required), then I agree with:

Unfortunately, sscanf() with "%d" invokes undefined behavior if the
number can't be represented as an int (i.e., on overflow).

Accordingly I think my policy of first using strspn and strcspn to
efficiently validate the *syntax* of the integer, and then
using strtoll to convert to long long and tell if overflow occurred
there, and then check against MIN/MAX values to determine whether
that long long can be safely cast to the narrower type wanted by
the application, is the "right" algorithm for my purpose. It's a
little more work, but it completely eliminates undefined behaviour,
and it gives diagnostics with maximum discriminatory power. Yeah,
sure, I could be sloppy and just say "syntax error" or "overflow",
or even allow silent overflow, and let the user figure out what's
wrong with what he put in the Web form, but I don't want that to be
my standard for user friendliness.
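A minimal sketch of that policy, assuming the target type is int and that the whole field must be an optional sign followed by decimal digits, with no surrounding whitespace. parse_int and the parse_status codes are hypothetical names, not from any library; this version checks the syntax with strspn alone (strcspn would serve if you wanted to locate a particular delimiter).

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result codes, one per kind of diagnostic. */
enum parse_status { PARSE_OK, PARSE_EMPTY, PARSE_SYNTAX, PARSE_RANGE };

/* Step 1: check the syntax (optional sign, then one or more decimal
   digits, nothing else) with strspn.  Step 2: convert with strtoll,
   which reports overflow via errno rather than invoking undefined
   behavior.  Step 3: range-check before narrowing to int. */
static enum parse_status parse_int(const char *s, int *out)
{
    const char *digits = s;
    size_t ndigits;
    long long value;

    if (*s == '\0')
        return PARSE_EMPTY;
    if (*digits == '+' || *digits == '-')
        digits++;                       /* skip an optional sign */
    ndigits = strspn(digits, "0123456789");
    if (ndigits == 0 || digits[ndigits] != '\0')
        return PARSE_SYNTAX;            /* no digits, or trailing junk */

    errno = 0;
    value = strtoll(s, NULL, 10);
    if (errno == ERANGE)
        return PARSE_RANGE;             /* outside long long's range */
    if (value < INT_MIN || value > INT_MAX)
        return PARSE_RANGE;             /* fits long long but not int */

    *out = (int)value;
    return PARSE_OK;
}
```

Because each failure has its own status, a caller can tell a Web-form user whether the field was empty, malformed, or out of range.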

Thanks for your fine argument re ambiguity (undefined behaviour) in the
C99 standard, hence one more reason not to use sscanf all by itself
for validating Web-form contents. Personally, I don't consider use
of undefined behaviour to be good programming practice, so I surely
don't want to show such use in my examples for readers of my
"cookbook". (If anyone catches a place where I've been sloppy in
this respect, I hope you alert me about it!)

robert maas, see http://tinyurl.com/uh3t · Feb 19, 2007

From: Yevgen Muntyan said:
What's "GNU-c" and why are some functions "GNU-c" while others
are "c"? Then some people read it and think they use GNU-c,
as some people use C/C++...

I'm trying to warn the reader about some functions which aren't
defined in the official standard, but *are* available in GNU C
which is perhaps the most popular implementation (of compiler and
loader-commands to use corresponding libraries), so the reader
won't be totally confused if he writes code relying on such a
library function but it's not available in the implementation he's
using. If you can suggest a better way to note this distinction
between functions guaranteed to be available in all conforming
implementations and those "extras" provided by major vendors,
please do. I don't like the idea of just having a caveat at the top
of the document saying "some of the c functions described below
aren't available in all implementations of c". I'd rather
individually flag those very few which might not be available.

Now if the particular function is in the C99 spec, that would be
better to mention, but again I'd need some way of annotating those
functions which aren't in older C but are in C99 without confusing
the reader.

As to C/C++: Because I'm carefully annotating which library needs
to be included for *every* library function without exception, the
C and C++ entries for those will always be done separately, because
C++ has renamed all the C libraries to get the C++ compatible
version thereof. So when I am finished with these parts of the
matrix chapter of my "cookbook", there will typically be three LI
elements, the original C library, the C-style C++ converted
library, and the true flavor of C++ way of doing the same kind of
operation (in addition to the equivalents in lisp/java/perl/PHP of
course). I condense the c/c++ part (and java too if applicable)
only where no library is required so you can just write the code
without checking if you loaded the library first. That case applies
of course only to built-in operators operating on primitive types
or structs etc., not to any library calls.

By the way, I hope my readers will look up the complete definition
of the function, using Google for example, any place my quickie
definition isn't quite complete in all details. At least I've
provided the name of the function, which is of use in a Google
search, and the syntax of the call (parameters required, and
general nature of return value expected) which in conjunction with
the formal definition found elsewhere should be enough for the
reader/programmer to write code using the function I've described.
I'm thinking of including in the "suggestions for use of this
document" a more specific recommendation of search terms to use
when looking for complete specifications, someday ...

robert maas, see http://tinyurl.com/uh3t · Feb 19, 2007

From: Keith Thompson said:
This is comp.lang.c, where we discuss the C programming language,
which is defined by the ISO C standard, which has a perfectly
good technical definition for the word "object".

Is it anything like last year's new technical definition of "planet"
by the IAU (International Astronom... Union)? :-(

I'm not saying that you shouldn't discuss "objects" in the OOP
sense, or in the sense of physical things, or whatever.

OK, then given your insistence, I'd be glad to try very hard to
always qualify my use of the word "object" with one of:
- In the original lisp sense, any distinctly allocated block of
memory with a single handle on it, which would include c structs
allocated with malloc or calloc.
- In a generalization of that sense which also includes
static/stack allocated structs and arrays in c.
- In the OOP sense, encapsulation of original lisp/struct sense
together with instance methods tied to the class of such
instances (not separate copy within each individual instance as
some java textbooks mis-state).

I'm just saying that *in this newsgroup*, the word has the
overriding meaning of a "region of data storage in the execution
environment, the contents of which can represent values".

What is the definition of "region" used there? Does it refer *only*
to contiguous blocks of memory which are guaranteed to be
contiguous per the spec, or does it also include blocks of memory
which just happen to be contiguous per one implementation but not
necessarily another? For example, if the declarations are:
short int a[9];
long int b[5];
short int c[7];
if a particular compiler optimizes space by moving the long int
array ahead of either of the short int arrays to reduce amount of
padding needed to respect long boundaries, so that
a[7] a[8] c[0] c[1] c[2]
form a contiguous block of memory, is that considered a "region"
hence an "object"??

I suspect that ISO hasn't defined the meaning of "region" any more
than the IAU defined the meaning of "clearing the neighborhood".
But I'll leave that conclusion as tentative, pending your reply.

Also, I'm not clear on the intended meaning of "contents" and
"values". Is it possible for an object to have only one content
which represents only one value, or must "contents" and "values" be
used strictly in a plural sense? So for example:
short int d[1];
That has only one content, which can represent only one value,
right? Or is "values" plural because you can re-assign that single
array-cell to have different values at different times?

robert maas, see http://tinyurl.com/uh3t · Feb 19, 2007

From: Keith Thompson said:
GNU C is the C-like language accepted by gcc. It's very similar
to standard C, but has a number of extensions, some of which
violate the C standard.

Do you happen to know if there's a complete list of such violations
online in accessible format (plain text or simple HTML)? I'd like
to consult such a list as I develop my cookbook/matrix.

Note that gcc is a compiler, not a complete implementation; the C
runtime library is provided separately, and varies from one
platform to another. There is a GNU C library (separate from, but
usable with, gcc).

Yes, I understand that. Presumably if you compile a c program with
gcc, and specify the generic names of the headers for the various
libraries, for example:
#include <stdlib.h>
rather than specifying the actual path to the header to library, for example:
#include "/usr/local/bin/ansi/c/stdlib.h"
and you don't use a switch such as -ansi that forces gcc to use the
ansi instead of gnu version, then gcc will automatically arrange
that you get the GNU C version of each library rather than the ANSI
version. (Correct me if I'm wrong on this point!)

When I have line that looks like this:
<li>GNU-c (#include <stdlib.h&gt -- <em>mumble(x,y)</em></li>
I mean to imply that the function mumble is defined in the GNU C
version of the stdlib library but not in the corresponding ANSI
version of that same-name library. If I make a mistake in such
annotation, feel free to correct me.

I'm thinking of changing my notation. Instead of saying "GNU-c" as
the language as I do there, instead just say "c", but have a
footnote that explains the situation regarding GNU vs. ANSI. I'll
be thinking more about it later today and maybe start updating the
file as soon as I have decided how exactly to do it.

I'd especially like to note differences between original/universal
c and C99, and differences between C99 and GNU, but still just
label it all 'c' with footnote, maybe, still thinking...

But I have no idea why the referenced web page applies the
"GNU-c" tag to the isblank() and strtoll() functions, both of
which are standard C (but both are new in C99).

I had the impression they were not in original c but are in GNU c,
so that was the distinction I was making. But if they're in C99 as
you claim, then I'd rather change that to show they're in C99
(instead of GNU c) but not in original c.

Is the C99 standard online in searchable/HTML format, for free, so
that I could consult it to verify fine points like this instead of
just taking your word for it? And are you referring to ANSI C99 or
ISO C99 anyway??

<http://en.wikipedia.org/wiki/C_(programming_language)#C99>
publication of ISO 9899:1999 in 1999. This standard is commonly
referred to as "C99." It was adopted as an ANSI standard in March
2000.
It's not clear whether it's both an ISO standard and ANSI
standard, from the same document, or an ISO document but only an
ANSI standard.

GCC, despite its extensive C99 support, is still not a completely
compliant implementation; several key features are missing or don't
work correctly.[2]
<http://gcc.gnu.org/c99status.html>
Is that where I should be checking for any differences between C99
standard and GNU C actuality?

Also, where in your opinion is the best online reference for
pre-C99 versions, especially the original K&R C, which presumably
*every* implementation has had plenty of time to get right already?
(I'm mostly interested in the standard libraries, functions therein.)

CBFalconer · Feb 19, 2007

robert maas said:
Does the C99 standard say anywhere whether a value is or is not
assigned in case of overflow? That is, does it say either of these:

No, it says the behaviour is undefined. Anything may happen,
including launching WWIII. I believe C99 adds the possibility of
causing a signal. If you look at the input parsers I have
published here you will see means of detecting incipient overflow
and returning an appropriate error.
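The sketch below is not CBFalconer's published code, only an illustration of the same idea under the assumption that the target type is unsigned int: test whether the next multiply-and-add would exceed UINT_MAX before performing it. parse_unsigned is a hypothetical name.

```c
#include <ctype.h>
#include <limits.h>

/* Returns 1 and stores the value on success; returns 0 on empty input,
   non-digit characters, or overflow.  The test
   value > (UINT_MAX - digit) / 10 is equivalent to
   value * 10 + digit > UINT_MAX, but never computes the oversized result. */
static int parse_unsigned(const char *s, unsigned int *out)
{
    unsigned int value = 0;

    if (!isdigit((unsigned char)*s))
        return 0;                       /* empty, or no leading digit */
    for (; isdigit((unsigned char)*s); s++) {
        unsigned int digit = (unsigned int)(*s - '0');
        if (value > (UINT_MAX - digit) / 10)
            return 0;                   /* next step would overflow */
        value = value * 10 + digit;
    }
    if (*s != '\0')
        return 0;                       /* trailing non-digit characters */
    *out = value;
    return 1;
}
```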

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
<http://www.securityfocus.com/columnists/423>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discoverer of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews

CBFalconer · Feb 19, 2007

robert maas said:
Do you happen to know if there's a complete list of such violations
online in accessible format (plain text or simple HTML)? I'd like
to consult such a list as I develop my cookbook/matrix.

No. The standard says that anything that is not defined in the
standard causes undefined behaviour. You can (and should) read it
for yourself. Search for N869 or N1124.

Keith Thompson · Feb 19, 2007

Chris Dollin said:
Expressions that "denote a place" are those that can be evaluated
for their lvalues. Those that don't, can't.

[...]

Right, using the old definition of "lvalue", not the one in the C
standard (as you know).

In the C standard, an "lvalue" is not the result of evaluating an
expression; instead, certain expressions are themselves lvalues. I
suspect that if the C committee had stayed with the older meaning of
the term, they could have avoided some serious problems.

The C90 definition of an "lvalue" was (C90 6.2.2.1):

An _lvalue_ is an expression (with an object type or an incomplete
type other than void) that designates an object.

Consider:

int x; /* line 1 */
int *ptr = NULL; /* line 2 */
ptr = &x; /* line 3 */

Before line 3 is executed, the expression *ptr does not designate an
object, so by a literal reading of the definition, *ptr is not an
lvalue, but it becomes one after line 3 is executed. This clearly was
not the intent, since the lvalue-ness of an expression, in many cases,
needs to be determined at compilation time. *ptr should be an lvalue
regardless of the current value of ptr; attempting to evaluate it *as
an lvalue* invokes undefined behavior if it doesn't *currently*
designate an object.
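The three lines above as a compilable sketch (the wrapper function ptr_demo is hypothetical). *ptr is an lvalue wherever it appears, but which object it designates depends on ptr's run-time value; the sketch stores through *ptr only after line 3, since evaluating it while ptr is null would be undefined.

```c
#include <stddef.h>

static int ptr_demo(void)
{
    int x = 0;          /* line 1 */
    int *ptr = NULL;    /* line 2: *ptr designates no object yet */
    ptr = &x;           /* line 3: now *ptr designates x */
    *ptr = 42;          /* store into the object *ptr designates */
    return x;
}
```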

So the C99 committee attempted to solve this problem, but created a
bigger one. C99 6.3.2.1p1:

An _lvalue_ is an expression with an object type or an incomplete
type other than void; if an lvalue does not designate an object
when it is evaluated, the behavior is undefined.

So the lvalue-ness of an expression no longer depends on the current
value of the expression or any subexpression (solving the problem with
the C90 definition) -- *but* the definition no longer says that it
designates an object, which is the whole idea. By a literal reading
of the C99 definition, 42 is an lvalue (it's an expression of an
object type, namely int). Again, this clearly is not the intent.

Stating the actual intent in standardese is difficult, but not
impossible. An improvement would be to revert to the C90 definition
and add the word "potentially", with a footnote to explain what that
means:

An _lvalue_ is an expression (with an object type or an incomplete
type other than void) that potentially (footnote) designates an
object.

(footnote) An expression potentially designates an object either
if it actually does so, or if it would do so given appropriate
values for its subexpressions. For example, if ptr is an object
pointer, *ptr potentially designates an object (though it doesn't
actually designate an object unless ptr has an appropriate value).

That's off the top of my head; I'm sure it could be worded better.

Perhaps if the standard said, instead of an expression *being* an
lvalue, that its lvalue can be evaluated, this problem wouldn't have
occurred. We'd still need rules about which expressions can be
evaluated for their lvalues, and wording about when such an evaluation
invokes undefined behavior. And if such a change were made now, all
the references to "lvalue" in the standard would have to be modified
to reflect the new (old) meaning.

I suspect we're just stuck with the current meaning of lvalue (and we
have to read what the definition *should* say rather than what it
*does* say).

Keith Thompson · Feb 19, 2007

Does the C99 standard say anywhere whether a value is or is not
assigned in case of overflow? That is, does it say either of these:
- In case of overflow, assignment must be suppressed.
- In case of overflow, a truncated value must be assigned.
If it says neither (modulo wording of course, anything close would
suffice to define which behaviour is required), then I agree with:

As I already told you, it invokes undefined behavior. That means the
standard imposes no requirements.

C99 7.19.6.2p10:

If this object does not have an appropriate type, or if the result
of the conversion cannot be represented in the object, the
behavior is undefined.

I suggest you get your own copy of
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf
so you can look these things up yourself rather than depending on the
rest of us to do it for you.
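For comparison, a sketch of the same conversion done with strtol, whose behavior on overflow is defined (C99 7.20.1.4): it returns LONG_MAX or LONG_MIN and stores ERANGE in errno. to_long is a hypothetical helper name.

```c
#include <errno.h>
#include <stdlib.h>

/* Convert all of s to a long.  Returns 1 on success, 0 if no digits
   were found, if characters follow the number, or if the value is out
   of range (strtol then returns LONG_MAX or LONG_MIN). */
static int to_long(const char *s, long *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(s, &end, 10);
    if (end == s)
        return 0;       /* nothing was converted */
    if (*end != '\0')
        return 0;       /* trailing characters after the number */
    if (errno == ERANGE)
        return 0;       /* overflow or underflow, reported not undefined */
    *out = value;
    return 1;
}
```

Note that strtol skips leading whitespace, so " 42" is accepted; add a separate syntax check first if that matters.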

Mark McIntyre · Feb 19, 2007

Is it anything like last year's new technical definition of "planet"
by the IAU (International Astronom... Union)? :-(

It defines it as "a region of data storage... the contents of which
can represent values". This seems an entirely reasonable definition to
me. As someone has already said, the word has a wide variety of exact
meanings in many walks of life, so being precise is /not/
inappropriate.

OK, then given your insistence, I'd be glad to try very hard to
always qualify my use of the word "object" with one of:
- In the original lisp sense, any distinctly allocated block of
memory with a single handle on it, which would include c structs
allocated with malloc or calloc.

This is pretty close to the C definition, if you think it through. I
don't think Lisp required the block of memory to be complex.

What is the definition of "region" used there?

The standard itself doesn't define region. You would have to check
back in ISO/IEC 2382-1:1993 "Information Technology - Vocabulary
Part 1: Fundamental Terms" to see what ISO defined it as.

if a particular compiler optimizes space by moving the long int
array ahead of either of the short int arrays to reduce amount of
padding needed to respect long boundaries, so that
a[7] a[8] c[0] c[1] c[2]
form a contiguous block of memory, is that considered a "region"
hence an "object"??

There's nothing which requires these to be contiguous, so I can't see
how they can be considered either an object or a single region.

I suspect that ISO hasn't defined the meaning of "region" any more
than the IAU defined the meaning of "clearing the neighborhood".

FWIW, the IAU had no need to define that since it can be inferred from
an amazing property known as "common sense".

Also, I'm not clear on the intended meaning of "contents" and
"values".

Egregious.
--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan

Keith Thompson · Feb 19, 2007

From: Keith Thompson <[email protected]> [...]

I'm just saying that *in this newsgroup*, the word has the
overriding meaning of a "region of data storage in the execution
environment, the contents of which can represent values".

What is the definition of "region" used there? Does it refer *only*
to contiguous blocks of memory which are guaranteed to be
contiguous per the spec, or does it also include blocks of memory
which just happen to be contiguous per one implementation but not
necessarily another? For example, if the declarations are:
short int a[9];
long int b[5];
short int c[7];
if a particular compiler optimizes space by moving the long int
array ahead of either of the short int arrays to reduce amount of
padding needed to respect long boundaries, so that
a[7] a[8] c[0] c[1] c[2]
form a contiguous block of memory, is that considered a "region"
hence an "object"??

The C standard does not define the word "region". It does have a
normative reference to "ISO/IEC 2382-1:1993, Information technology --
Vocabulary -- Part 1: Fundamental terms". I don't know whether that
document defines "region" or not; if not, it should be understood to
have its usual English meaning.

The C standard is not a mathematically perfect formal definition. You
have to use some common sense in reading it.

As it happens, it's possible for two or more declared objects to be
adjacent in memory, and it's possible for a program to detect portably
whether they are or not. If two or more objects happen to be adjacent
in memory, I suppose the union of their memory regions could be
considered to be a single memory region, and therefore an object.
This is a mildly interesting technical point, but it's of no
particular use as far as I can see; for any program that tries to make
use of this, there are far better and more portable ways to do it.
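A sketch of the portable adjacency test, assuming two file-scope arrays named a and c as in the question (arrays_adjacent is a hypothetical name). C99 6.5.9p6 makes == between a one-past-the-end pointer of one array and a pointer to the start of another array well defined: they compare equal only if the second array immediately follows the first in the address space. Whether that happens is up to the implementation, and some optimizing compilers have been reported to fold such comparisons to false even when the arrays are adjacent, so treat the answer as informational only.

```c
static short a[9];
static short c[7];

/* 1 if one array happens to follow the other directly in memory.
   &a[9] is the one-past-the-end pointer of a, which may be formed and
   compared with == but must not be dereferenced. */
static int arrays_adjacent(void)
{
    return &a[9] == &c[0] || &c[7] == &a[0];
}
```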

Keith Thompson · Feb 19, 2007

Do you happen to know if there's a complete list of such violations
online in accessible format (plain text or simple HTML)? I'd like
to consult such a list as I develop my cookbook/matrix.

I don't know. gcc comes with extensive documentation, including a
section on gcc extensions. Its behavior on encountering a use of
such an extension depends on the command-line options. If it issues a
diagnostic (even just a warning) for anything that's a syntax error or
constraint violation in ISO C, that's probably enough for conformance.

Any gcc-specific questions not answered by the documentation should be
directed to gnu.gcc.help.

Yes, I understand that. Presumably if you compile a c program with
gcc, and specify the generic names of the headers for the various
libraries, for example:
#include <stdlib.h>
rather than specifying the actual path to the header to library, for example:
#include "/usr/local/bin/ansi/c/stdlib.h"
and you don't use a switch such as -ansi that forces gcc to use the
ansi instead of gnu version, then gcc will automatically arrange
that you get the GNU C version of each library rather than the ANSI
version. (Correct me if I'm wrong on this point!)

That's a gcc implementation detail, not a C language issue. <OT>The
gcc installation process creates modified versions of some of the
header files that already exist on the OS; the details are
off-topic.</OT>

And what exactly do you mean by "ANSI"? The current official C
standard, C99, was issued by ISO (and later adopted by ANSI). The
previous standard, which is still in wide use, is C90, also issued by
ISO (and adopted by ANSI). I suggest avoiding the use of "ANSI" as an
adjective; many people still use "ANSI C" to refer to the language
defined by the ANSI C89 and ISO C90 standard documents, but strictly
speaking that usage is incorrect. If you instead refer to "C90" or
"C99", you avoid the ambiguity.

When I have line that looks like this:
<li>GNU-c (#include <stdlib.h&gt -- <em>mumble(x,y)</em></li>
I mean to imply that the function mumble is defined in the GNU C
version of the stdlib library but not in the corresponding ANSI
version of that same-name library. If I make a mistake in such
annotation, feel free to correct me.

On the web page I saw, there were two functions marked "GNU-c", both
of them incorrectly. Both functions are defined by C99, but not by
C90.
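A small sketch showing that both functions are ordinary C99 facilities declared in standard headers (isblank in <ctype.h>, strtoll in <stdlib.h>). The __STDC_VERSION__ check rejects pre-C99 compilers up front; skip_blanks and big_number are hypothetical names.

```c
#include <ctype.h>
#include <stdlib.h>

#if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 199901L
#error "this sketch needs C99 or later: isblank() and strtoll() are new in C99"
#endif

/* Return a pointer past any leading blanks.  In the "C" locale,
   isblank() is true only for space and horizontal tab, so unlike
   isspace() it stops at a newline. */
static const char *skip_blanks(const char *s)
{
    while (isblank((unsigned char)*s))
        s++;
    return s;
}

/* long long has at least 64 bits in C99, so strtoll can convert
   values beyond the 32-bit range. */
static long long big_number(const char *s)
{
    return strtoll(skip_blanks(s), NULL, 10);
}
```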

<OT>
The phrase "the GNU C version of the stdlib library" doesn't make much
sense, unless you're referring to glibc. I use gcc on Linux, where
the C runtime library is glibc. I also use gcc on Solaris, where the
C runtime library is the one provided by Solaris. gcc is a compiler,
not a complete implementation.

I'm thinking of changing my notation. Instead of saying "GNU-c" as
the language as I do there, instead just say "c", but have a
footnote that explains the situation regarding GNU vs. ANSI. I'll
be thinking more about it later today and maybe start updating the
file as soon as I have decided how exactly to do it.
[...]

I had the impression they were not in original c but are in GNU c,
so that was the distinction I was making. But if they're in C99 as
you claim, then I'd rather change that to show they're in C99
(instead of GNU c) but not in original c.

Be careful with the term "original c" (or, preferably, "original C").
Versions of C existed long before the first ANSI standard.

Is the C99 standard online in searchable/HTML format, for free, so
that I could consult it to verify fine points like this instead of
just taking your word for it? And are you referring to ANSI C99 or
ISO C99 anyway??

n1124.pdf, referenced above, is the C99 standard with two Technical
Corrigenda merged into it. Any post-C99 changes are marked with
change bars.

<http://en.wikipedia.org/wiki/C_(programming_language)#C99>
publication of ISO 9899:1999 in 1999. This standard is commonly
referred to as "C99." It was adopted as an ANSI standard in March
2000.
It's not clear whether it's both an ISO standard and an ANSI
standard, from the same document, or an ISO document but only an
ANSI standard.

I have no idea what you're asking.

[snip]

Yevgen Muntyan · Feb 20, 2007

robert said:
I'm trying to warn the reader about some functions which aren't
defined in the official standard, but *are* available in GNU C
which is perhaps the most popular implementation (of compiler and
loader-commands to use corresponding libraries), so the reader
won't be totally confused if he writes code relying on such a
library function but it's not available in the implementation he's
using.

Imagine your reader is a Windows, *BSD, Mac OS X, or Solaris user.
Does he care about popular implementations? Just don't mention
GNU-only functions at all. The good news is that you won't have to;
there are not many GNU-only, BSD-only, or whatever-only functions
of general use.

If you can suggest a better way to note this distinction
between functions guaranteed to be available in all conforming
implementations and those "extras" provided by major vendors,
please do. I don't like the idea of just having a caveat at the top
of the document saying "some of the c functions described below
aren't available in all implementations of c". I'd rather
individually flag those very few which might not be available.

Now if the particular function is in the C99 spec, that would be
better to mention, but again I'd need some way of annotating those
functions which aren't in older C but are in C99 without confusing
the reader.

Better not even try. Given that you don't know what's C99 and what's
the GNU implementation of the C library, imagine what your reader will
know after reading your stuff. Just stick to C99.

[snip]

By the way, I hope my readers will look up the complete definition
of the function, using Google for example, any place my quickie
definition isn't quite complete in all details.

Is this how *you* get the documentation? You should consider something
better, like C library manuals, man pages, or the C standard. Man pages
are pretty good; they tell you which standards a given function
conforms to. Then you can check that information if you like. You can
also use code samples from man pages. Whatever is there is likely
to be of high value, higher than random stuff from Google.

At some point I did that. I had no idea what documentation to use,
how to use it, or where to get it, so I googled. I got some
wrong ideas stuck deep in my mind then, and it's pretty hard to get
rid of them. One of them is what is standard and what
is not. Please don't "help" other people like this. Don't write
documentation about stuff you don't know. Oh well.

Yevgen

Flash Gordon · Feb 20, 2007

robert maas, see http://tinyurl.com/uh3t wrote, On 19/02/07 19:40:

Yes, I understand that.

You seem not to.

Presumably if you compile a c program with
gcc, and specify the generic names of the headers for the various
libraries, for example:
#include <stdlib.h>
rather than specifying the actual path to the library's header, for example:
#include "/usr/local/bin/ansi/c/stdlib.h"
and you don't use a switch such as -ansi that forces gcc to use the
ansi instead of gnu version, then gcc will automatically arrange
that you get the GNU C version of each library rather than the ANSI
version. (Correct me if I'm wrong on this point!)

<snip>

What headers you include has no effect on what libraries you link to.
The headers you include are determined by the source code + compiler
options. The libraries you link to are determined by what options you
pass to the linker. If I use GCC on a machine where there is no GNU C
library then it does not use the GNU C library because there is not one
on the machine. It uses the C library that actually is there, such as
the MS one on Windows, the AIX one on AIX etc. If I use it on Linux then
it uses the GNU one because that is the only one installed. The compiler
option just affects the visibility of the extensions in the headers, not
what you link to.
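Flash's point can be seen from inside a program. A switch such as -ansi only changes which macros the compiler predefines. Headers then consult those macros to decide what to declare; glibc's headers, for example, hide GNU and POSIX extensions when `__STRICT_ANSI__` is defined. None of this affects which library gets linked. The sketch below uses two hypothetical helpers. `__STDC_VERSION__` is standard. `__STRICT_ANSI__` is GCC-specific, and Clang also defines it.

```c
/* Hypothetical helpers that report how the compiler was invoked, using only
   predefined macros. They say nothing about which C library the program
   will be linked against; that is decided by the linker options. */

/* __STDC_VERSION__ is standard: 199409L (C95), 199901L (C99), and so on.
   A plain C89/C90 compiler does not define it, so report 0 in that case. */
static long std_version(void)
{
#if defined(__STDC_VERSION__)
    return __STDC_VERSION__;
#else
    return 0L;
#endif
}

/* __STRICT_ANSI__ is GCC-specific: it is defined for -ansi and for
   -std=c89, -std=c99, etc., but not for the default gnu dialects. */
static const char *dialect(void)
{
#if defined(__STRICT_ANSI__)
    return "strict ISO mode";
#elif defined(__GNUC__)
    return "GNU dialect";
#else
    return "unknown compiler";
#endif
}
```

Building the same file with `gcc -ansi` and with plain `gcc` should change what `dialect()` and `std_version()` report, while the program is linked against the same C library either way.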

Keith Thompson · Feb 20, 2007

CBFalconer said:
No, it says the behaviour is undefined. Anything may happen,
including launching WWIII. I believe C99 adds the possibility of
causing a signal. If you look at the input parsers I have
published here you will see means of detecting incipient overflow
and returning an appropriate error.

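You're thinking of the behavior of converting an out-of-range value to
a signed integer type (e.g., (int)LONG_MAX where long is wider than
int). In C90, that yields an implementation-defined result; C99 added
the possibility of raising an implementation-defined signal. Overflow
of arithmetic operations on signed integers (e.g., INT_MAX + 1) is
undefined behavior in both.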

Overflow in sscanf() for any numeric type invokes UB (which, of
course, includes the possibility of raising a signal).
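sscanf("%d") gives no safe way to detect overflow. The usual portable alternative for the thread's original problem is strtol(). Its overflow behavior is defined: it returns LONG_MAX or LONG_MIN and sets errno to ERANGE. Below is a minimal sketch that assumes the whole string must be the number. `parse_int` is a hypothetical name, and this is not code posted in the thread.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 and stores the value in *out if s holds
   exactly one decimal int (optional leading whitespace and sign), or 0 if
   s has no digits, has trailing junk, or is out of range for int. */
static int parse_int(const char *s, int *out)
{
    char *end;
    long v;

    errno = 0;                    /* strtol() only sets errno on error */
    v = strtol(s, &end, 10);
    if (end == s)                 /* no digits at all */
        return 0;
    if (*end != '\0')             /* trailing characters after the number */
        return 0;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;                 /* does not fit in long, or in int */
    *out = (int)v;
    return 1;
}
```

The extra INT_MIN/INT_MAX check covers platforms where long is wider than int. Rejecting anything after the digits is a design choice. A line read with fgets() would need its trailing newline removed first, or the `*end` test relaxed to allow trailing whitespace.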

Flash Gordon · Feb 20, 2007

Yevgen Muntyan wrote, On 20/02/07 03:54:

Imagine your reader is a Windows, *BSD, Mac OS X, or Solaris user.
Does he care about popular implementations? Just don't mention
GNU-only functions at all. The good news is that you won't have to;
there are not many GNU-only, BSD-only, or whatever-only functions
of general use.

As a strong supporter of sticking to standard C where possible, I
strongly disagree. There are a vast number of system-specific functions
which are extremely useful; it's just that they are not topical here.

Better not even try. Given that you don't know what's C99 and what's
the GNU implementation of the C library, imagine what your reader will
know after reading your stuff. Just stick to C99.

Very poor advice. You will leave all the poor users of Visual Studio
Express wondering why they don't have snprintf (only _snprintf, IIRC,
which has significant differences), etc.

[snip]

By the way, I hope my readers will look up the complete definition
of the function, using Google for example, any place my quickie
definition isn't quite complete in all details.


Is this how *you* get the documentation? You should consider something
better, like C library manuals, man pages, or the C standard. Man pages
are pretty good; they tell you which standards a given function
conforms to.

When did you last look at the man pages on SCO, AIX, IRIX, etc.? I'm
sure some of them do what you say, but you do not know that all man
pages do, nor that the versions installed on the OP's system do. The
bibliography of the comp.lang.c FAQ references some good books, and the
comp.lang.c FAQ itself is good.

Then you can check that information if you like. You can
also use code samples from man pages.

You have to check the copyright before copying and publishing code.

Whatever is there is likely
to be of high value, higher than random stuff from Google.

Random stuff from Google is not good for those who don't know the
subject already, I agree.

At some point I did that. I had no idea what documentation to use,
how to use it, or where to get it, so I googled. I got some
wrong ideas stuck deep in my mind then, and it's pretty hard to get
rid of them. One of them is what is standard and what
is not. Please don't "help" other people like this. Don't write
documentation about stuff you don't know. Oh well.

Indeed. Robert has already written documentation about what he does not
know.
