My gets_ws function

A

Andrew Poelstra

int gets_ws (char *buff, int maxlen, int sc, FILE *fh);

This function reads up to maxlen characters from fh, stopping when it
encounters whitespace, sc, or EOF. If EOF is encountered, the function
returns EOF.

If buff is NULL, read characters are discarded. Otherwise, they are
put into buff. The function returns the number of characters read or
0 in the case of a NULL buffer.

I use it in my XML parser to read a tag name:
gets_ws (buff, max, '>', fh);
and if it reads max characters (by returning max), I simply delete
the remaining characters:
gets_ws (NULL, 0, '>', fh);

Here is the code:

gets_ws.c:
/* Reads from stream up until first whitespace or sc character. *
* If given a non-NULL buffer, it will fill that. Otherwise, it *
* will simply go through the characters and throw them away. *
* Returns number of characters read or EOF on failure. */
static int gets_ws (char *buff, size_t maxlen, int sc, FILE *fh)
{
int i = 0;
int ch = fgetc (fh);

assert (fh != NULL);
if (buff == NULL)
{
while (!isspace (ch) && ch != sc)
ch = fgetc (fh);

ungetc (ch);
return 0;
}

while (i < (maxlen - 1) && !isspace (ch) && ch != sc)
{
if (ch == EOF)
return EOF;

buff = ch;
++i;

ch = fgetc (fh);
}
ungetc (ch);

return i;
}

Comments or suggestions are welcome.
 
M

Michael Brennan

Andrew Poelstra said:
int gets_ws (char *buff, int maxlen, int sc, FILE *fh);

This function reads up to maxlen characters from fh, stopping when it
encounters whitespace, sc, or EOF. If EOF is encountered, the function
returns EOF.

If buff is NULL, read characters are discarded. Otherwise, they are
put into buff. The function returns the number of characters read or
0 in the case of a NULL buffer.

if (buff == NULL)
{
while (!isspace (ch) && ch != sc)
ch = fgetc (fh);

ungetc (ch);
return 0;
}
<snip>

This "discard loop" should also have a check for EOF, shouldn't it?

/Michael Brennan
 
R

Richard Heathfield

Andrew Poelstra said:
int gets_ws (char *buff, int maxlen, int sc, FILE *fh);

Let's have a size_t for maxlen, shall we? We don't want to go /there/ again!
gets_ws.c:
/* Reads from stream up until first whitespace or sc character. *
* If given a non-NULL buffer, it will fill that. Otherwise, it *
* will simply go through the characters and throw them away. *
* Returns number of characters read or EOF on failure. */
static int gets_ws (char *buff, size_t maxlen, int sc, FILE *fh)
{
int i = 0;
int ch = fgetc (fh);

There is a case for using getc rather than fgetc here. Because getc is given
explicit licence by the Standard to evaluate its argument more than once
when implemented as a macro (a licence that fgetc is *not* given), it may
be possible for the implementor to make getc a bit quicker. Probably not
much in it, but it's something to bear in mind.
assert (fh != NULL);
if (buff == NULL)
{
while (!isspace (ch) && ch != sc)

Someone else has already noted the lack of a check for EOF here.
ch = fgetc (fh);

ungetc (ch);
return 0;
}

Instead of ungetc-ing ch and returning 0 here, you could simply enclose the
following while loop in an 'else' block.
while (i < (maxlen - 1) && !isspace (ch) && ch != sc)

What if maxlen is 0?
{
if (ch == EOF)
return EOF;

buff = ch;
++i;


buff[i++] = ch; is semantically equivalent to the above two lines, and is
sufficiently simple and idiomatic that it won't confuse anyone maintaining
the code.

<snip>
 
A

Andrew Poelstra

Andrew Poelstra said:


Let's have a size_t for maxlen, shall we? We don't want to go /there/ again!

I did have a size_t, but then I was returning -1 in one version of the
function and didn't want my lint to nab me for comparing a size_t and
int, or something to that effect. (In retrospect, that may have been a
case where an explicit cast was merited.)

I'll fix it.
There is a case for using getc rather than fgetc here. Because getc is given
explicit licence by the Standard to evaluate its argument more than once
when implemented as a macro (a licence that fgetc is *not* given), it may
be possible for the implementor to make getc a bit quicker. Probably not
much in it, but it's something to bear in mind.

Intuition suggests that getc works from stdin in the same way that gets
does (and puts, putc work on stdout). It could potentially be confusing
for a maintenance programmer, IMHO.
Someone else has already noted the lack of a check for EOF here.

And /that/ is why I posted this to clc. Since for all my purposes, I
have personally validated the input, I never would have spotted that
in testing.
Instead of ungetc-ing ch and returning 0 here, you could simply enclose the
following while loop in an 'else' block.

Yes, that would be more clear, wouldn't it.
What if maxlen is 0?

i < -1 will fail, as i is initially 0. I did check that condition.
However, it will then set to buff[0] to '\0', even though I'm told
that buff has 0, not 1 bytes. I believe another assert() is in order
here.
{
if (ch == EOF)
return EOF;

buff = ch;
++i;


buff[i++] = ch; is semantically equivalent to the above two lines, and is
sufficiently simple and idiomatic that it won't confuse anyone maintaining
the code.


Okey doke. Thanks!
 
A

Andrew Poelstra

I took Michael Brennan and Richard Heathfield's suggestions, and
here's my new code (requires assert.h and stddef.h):

gets_ws.c:
/* Reads from stream up until first whitespace or sc character. *
* If given a non-NULL buffer, it will fill that. Otherwise, it *
* will simply go through the characters and throw them away. *
* Returns number of characters read or EOF on failure. */
static int gets_ws (char *buff, size_t maxlen, int sc, FILE *fh)
{
int i = 0;
int ch = fgetc (fh);
assert (fh != NULL);

if (buff == NULL)
while (!isspace (ch) && ch != sc)
{
ch = fgetc (fh);
if (ch == EOF)
return EOF;
}
else
while (i < (maxlen - 1) && !isspace (ch) && ch != sc)
{
if (ch == EOF)
return EOF;

buff[i++] = ch;
ch = fgetc (fh);
}

ungetc (ch);
return i;
}

This works fine in the case that maxlen is 0 (which is what it
should be if buff is NULL), and returns 0 as originally specified.
 
A

Andrew Poelstra

I took Michael Brennan and Richard Heathfield's suggestions, and
here's my new code (requires assert.h and stddef.h):

gets_ws.c:
/* Reads from stream up until first whitespace or sc character. *
* If given a non-NULL buffer, it will fill that. Otherwise, it *
* will simply go through the characters and throw them away. *
* Returns number of characters read or EOF on failure. */
static int gets_ws (char *buff, size_t maxlen, int sc, FILE *fh)
{
int i = 0;
int ch = fgetc (fh);
assert (fh != NULL);

if (buff == NULL)
while (!isspace (ch) && ch != sc)
{
ch = fgetc (fh);
if (ch == EOF)
return EOF;
}
else
while (i < (maxlen - 1) && !isspace (ch) && ch != sc)
{
if (ch == EOF)
return EOF;

buff[i++] = ch;
ch = fgetc (fh);
}

ungetc (ch);
return i;
}

This works fine in the case that maxlen is 0 (which is what it
should be if buff is NULL), and returns 0 as originally specified.

I just realized that as a standalone module, I need to remove the
static at the beginning. I also realized that I /didn't/ specify
that the function returns 0 when given a NULL buffer. I meant to,
though. ;-)
 
R

Richard Heathfield

Andrew Poelstra said:
I did have a size_t, but then I was returning -1 in one version of the
function and didn't want my lint to nab me for comparing a size_t and
int, or something to that effect. (In retrospect, that may have been a
case where an explicit cast was merited.)

The cast would be a nuisance, would it not? Better simply to separate data
processing from error handling. I recommend that you pass in a size_t *,
which may be NULL if the user isn't interested in the number of bytes read.
That leaves you free to return an int to indicate success or failure (and
even the nature of the failure).

Intuition suggests that getc works from stdin in the same way that gets
does (and puts, putc work on stdout).

Then intuition is wrong. It's getchar that works from stdin.
It could potentially be confusing for a maintenance programmer, IMHO.

Then fire him, and get someone who knows C. :)

<snip>
 
A

Andrew Poelstra

Andrew Poelstra said:


The cast would be a nuisance, would it not? Better simply to separate data
processing from error handling. I recommend that you pass in a size_t *,
which may be NULL if the user isn't interested in the number of bytes read.
That leaves you free to return an int to indicate success or failure (and
even the nature of the failure).

That makes sense, but if the user is passing the address of a variable,
that variable would have to be a size_t. Whereas passing by value, the
passed variable could be any integer type (provided it isn't negative).

Of course, I can't think of very many cases where I'd care at all how
many bytes were actually read, so perhaps requiring extra work is
okay.

Design, design...
 
R

Richard Heathfield

Andrew Poelstra said:
On 2006-07-11, Richard Heathfield <[email protected]> wrote:

That makes sense, but if the user is passing the address of a variable,
that variable would have to be a size_t. Whereas passing by value, the
passed variable could be any integer type (provided it isn't negative).

(a) All arguments to functions are passed by value, whether or not they have
pointer type.
(b) Passing this information /to/ the function is pointless, since the whole
point is that the function tells /you/ how many bytes were read. Are you
confusing it with maxlen?
Of course, I can't think of very many cases where I'd care at all how
many bytes were actually read, so perhaps requiring extra work is
okay.

A free strlen is never to be sneezed at.
Design, design...

Think twice; design once.
 
G

Guest

Andrew said:
I'll fix it.


i < -1 will fail, as i is initially 0. I did check that condition.

If you change maxlen to size_t, and SIZE_MAX > INT_MAX, (maxlen - 1)
will be a large positive number if maxlen is 0.
 
M

Michael Brennan

and ctype.h

while (i < (maxlen - 1) && !isspace (ch) && ch != sc)
{
if (ch == EOF)
return EOF;

buff[i++] = ch;
ch = fgetc (fh);
}

Still the problem remains if maxlen is 0 and buff is not NULL.
As someone already noted, now maxlen is a size_t, which is unsigned
and (maxlen - 1) will not be -1 but a large positive number, and
you will most likely overflow your buff.

I suppose you're subtracting maxlen by one to make place for the
terminating '\0' character, but your function does not place any
at the end of the string. If you don't want the '\0' you can
skip the subtracting and use all the maxlen chars in the string.

ungetc() takes two arguments, character and stream,
and also requires you to include stdio.h, unless you wrote your
own ungetc().

By the way, is that legal? To write a function with the same name
as a standard function?

<snip>

/Michael Brennan
 
A

Andrew Poelstra

Andrew Poelstra said:


(a) All arguments to functions are passed by value, whether or not they have
pointer type.
(b) Passing this information /to/ the function is pointless, since the whole
point is that the function tells /you/ how many bytes were read. Are you
confusing it with maxlen?

a) I know that; however passing a pointer by value and a variable by
value have two very different purposes, and I'm not sure that we
have decent terminology to distinguish between the two.

b) Yes, I was. I figured it out when I went to edit the code.
 
R

Richard Heathfield

Andrew Poelstra said:

passing a pointer by value and a variable by value have two
very different purposes

They do? Do tell. >:-D <---- hint: this evil grin is not for nought!
 
A

Andrew Poelstra

and ctype.h

while (i < (maxlen - 1) && !isspace (ch) && ch != sc)
{
if (ch == EOF)
return EOF;

buff[i++] = ch;
ch = fgetc (fh);
}

Still the problem remains if maxlen is 0 and buff is not NULL.
As someone already noted, now maxlen is a size_t, which is unsigned
and (maxlen - 1) will not be -1 but a large positive number, and
you will most likely overflow your buff.

I fixed it: it now compares (i + 1) and maxlen, which works out even
if maxlen is (size_t) -1.
I suppose you're subtracting maxlen by one to make place for the
terminating '\0' character, but your function does not place any
at the end of the string. If you don't want the '\0' you can
skip the subtracting and use all the maxlen chars in the string.

I added the '\0' at the end of the block where buff != NULL (in the
fixed code). I remembered to add braces as well.
ungetc() takes two arguments, character and stream,
and also requires you to include stdio.h, unless you wrote your
own ungetc().

No, I forgot the fh.
By the way, is that legal? To write a function with the same name
as a standard function?

I think that if you don't #include the appropriate header you can,
but I'm very, very unsure about that.
 
A

Andrew Poelstra

Andrew Poelstra said:



They do? Do tell. >:-D <---- hint: this evil grin is not for nought!

Passing a variable by value is useful for giving additional information
to the called function.

Passing a pointer by value is used for:
1) Passing large structures and arrays of information to a function.
2) Giving the function another memory location to place a "return value".
3) Giving the function the address of another function that it can call.

If you're concerned that I'll fall into the
void myfunc (char *p) { p++; }
bug, you don't need to be, but I'm not sure that that's the only reason
for your evil grin.
 
R

Richard Heathfield

Andrew Poelstra said:
Passing a variable by value is useful for giving additional information
to the called function.

Passing a /value/ is useful for giving additional information to the called
function. You can't pass a variable to a function. C is "pass by value",
not "pass variable by value".
Passing a pointer by value is used for:
1) Passing large structures and arrays of information to a function.

You can't pass a large structure to a function. You can, however, pass a
large structure's /value/ to a function. Or, as you obviously are aware,
you can pass the value of a pointer that is pointing at such a structure.

You certainly can't pass an array to a function. But you can pass the value
of a pointer that points at its first element.

But you see, in each case, this simply gives additional information to the
called function, so it's no different to your first case.
2) Giving the function another memory location to place a "return
value". 3) Giving the function the address of another function that it
can call.

In other words, passing additional information to a function by providing it
with a value of a given type.

The point I'm trying to make here is that there is no distinction between
"pass a pointer by value" and "pass a variable by value", except insofar as
the first is meaningful but the second is not. The *only* thing you can
pass to a function via the parameter mechanism is a value, which is the
result of an expression. The meaning of that value depends partly on its
type and partly on the semantics of your program.
If you're concerned that I'll fall into the
void myfunc (char *p) { p++; }
bug, you don't need to be,

I'm not concerned that *you* will fall into that trap. I'm concerned that
you may cause others to fall into it (or at least fail to climb out of it),
by perpetuating flawed concepts such as "pass a variable by value".
 
B

Bas Wassink

I think that if you don't #include the appropriate header you can, but
I'm very, very unsure about that.

I wondered about that some time ago too, and found this in n1124:

7.26 Future library directions
1 The following names are grouped under individual headers for
convenience. All external names described below are reserved
no matter what headers are included by the program.

What follows is a list of all standard headers, so I think you
can't redefine any standard functions, even if their respective
headers aren't included.

Bas Wassink.
 
A

Andrew Poelstra

I wondered about that some time ago too, and found this in n1124:

7.26 Future library directions
1 The following names are grouped under individual headers for
convenience. All external names described below are reserved
no matter what headers are included by the program.

What follows is a list of all standard headers, so I think you
can't redefine any standard functions, even if their respective
headers aren't included.

I know that you can't (re)define mem[a-z]*, str[a-z]*, is[a-z]*,
and to[a-z]*, because they're in the implementation's namespace.
When you think about it, all the standard functions are probably*
also in the implementation's namespace, and therefore may not be
redefined.

* Your quote of the Standard makes this `probably' a `certainly'.
 
G

Guest

Bas said:
I wondered about that some time ago too, and found this in n1124:

7.26 Future library directions
1 The following names are grouped under individual headers for
convenience. All external names described below are reserved
no matter what headers are included by the program.

This applies to only non-standard but reserved identifiers, and not to
redefining standard functions.
What follows is a list of all standard headers, so I think you
can't redefine any standard functions, even if their respective
headers aren't included.

You are correct here, though, except when you declare them static. The
relevant text is in 7.1.3:
"All identiï¬ers with external linkage in any of the following
subclauses (including the future library directions) are always
reserved for use as identiï¬ers with external linkage.157)"
"No other identiï¬ers are reserved. If the program declares or
deï¬nes an identiï¬er in a context in which it is reserved (other
than as allowed by 7.1.4), or deï¬nes a reserved identiï¬er as a
macro name, the behavior is undeï¬ned."
 
D

Dietmar Schindler

Richard said:
The point I'm trying to make here is that there is no distinction between
"pass a pointer by value" and "pass a variable by value", except insofar as
the first is meaningful but the second is not.

The first is no more or less meaningful than the second. You correctly
stated "The *only* thing you can pass to a function via the parameter
mechanism is a value...", so the "value" part is dispensable, but not
meaningless. The "pass a pointer" and "pass a variable" parts, on the
other hand, contain information about what is passed, as there can other
things than pointers as well as other things than variables be passed.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,777
Messages
2,569,604
Members
45,216
Latest member
topweb3twitterchannels

Latest Threads

Top