How to force fscanf to find only data on a single input line?


David Mathog

Apologies if this is in the FAQ. I looked, but didn't find it.

In a particular program the input read from a file is supposed to be:

+ 100 200 name1
- 101 201 name2

It is parsed by reading the + character, and then sending the
remainder into fscanf() like

count = fscanf(fp, "%d %d %s", &first_int, &second_int, string);

This works fine unless the input is bogus. In particular, if
"name1" is left off, fscanf happily reads past the EOL of the
first line and comes back with "-" from the second line
stored in the string. Effectively it sees the bogus line as:

+ 100 200 - 101 201 name2

since it makes no distinction between EOL and other white space.
So count is 3 but the wrong characters are stored in string.

What I want is for count to be 2 and string's contents to be
undefined. Is there some magic format specifier that tells fscanf()
not to go past the EOL when looking for data? Sure, it can be done by
reading a whole line into a buffer, and then using sscanf() on that. It
just seems that there should be a way to make fscanf() "line aware".

Possible?

Thanks,

David Mathog
 

Malcolm McLean

David Mathog said:
Apologies if this is in the FAQ. I looked, but didn't find it.

In a particular program the input read from a file is supposed to be:

+ 100 200 name1
- 101 201 name2

It is parsed by reading the + character, and then sending the
remainder into fscanf() like

count = fscanf(fp, "%d %d %s", &first_int, &second_int, string);

This works fine unless the input is bogus. In particular, if
"name1" is left off, fscanf happily reads past the EOL of the
first line and comes back with "-" from the second line
stored in the string. Effectively it sees the bogus line as:

+ 100 200 - 101 201 name2

since it makes no distinction between EOL and other white space.
So count is 3 but the wrong characters are stored in string.

What I want is for count to be 2 and string's contents to be
undefined. Is there some magic format specifier that tells fscanf()
not to go past the EOL when looking for data? Sure, it can be done by
reading a whole line into a buffer, and then using sscanf() on that. It
just seems that there should be a way to make fscanf() "line aware".
Use fgets() or Chuck Falconer's ggets() (Google his name and ggets to find
it) to read in a line, and then parse it with sscanf().

The fact that newline is treated as whitespace is a recognised design flaw in
fscanf().
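
A minimal sketch of the fgets() plus sscanf() approach suggested above. The names `struct record` and `read_record`, and both size limits, are assumptions made for illustration and are not taken from the original program. Because sscanf() only sees one line at a time, a missing name cannot be filled in from the following line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LINE_MAX_LEN 256   /* assumed upper bound on line length */
#define NAME_MAX_LEN 64    /* assumed upper bound on the name field */

/* Hypothetical record type for one "+ 100 200 name1" line. */
struct record {
    char sign;
    int first_int;
    int second_int;
    char name[NAME_MAX_LEN];
};

/*
 * Reads one line from fp and parses it with sscanf().
 * Returns the number of fields converted (0 to 4), or EOF at end of input.
 * Lines longer than LINE_MAX_LEN - 1 characters are split by fgets();
 * this sketch does not handle that case.
 */
int read_record(FILE *fp, struct record *rec)
{
    char line[LINE_MAX_LEN];
    int count;

    assert(fp != NULL && rec != NULL);
    if (fgets(line, sizeof line, fp) == NULL)
        return EOF;

    rec->name[0] = '\0';
    /* 63 is NAME_MAX_LEN - 1, leaving room for the terminator. */
    count = sscanf(line, " %c %d %d %63s",
                   &rec->sign, &rec->first_int, &rec->second_int, rec->name);

    /* sscanf() returns EOF for a blank line; report that as 0 fields. */
    return count == EOF ? 0 : count;
}
```

With this layout a complete line yields 4 and a line with the name left off yields 3, so the caller can reject the short line without disturbing the next one.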
 

Richard Heathfield

Malcolm McLean said:

Use fgets() or Chuck Falconer's ggets() (Google his name and ggets to
find it) to read in a line, and then parse it with sscanf().

Chuck's ggets function suffers from at least two problems, one being
that every call creates a new buffer that must be managed, and another
being the absence of any way to specify an upper limit on memory
consumption.
The fact that newline is treated as whitespace is a recognised design
flaw in fscanf().

Recognised by whom?
 

Malcolm McLean

Richard Heathfield said:
Malcolm McLean said:



Chuck's ggets function suffers from at least two problems, one being
that every call creates a new buffer that must be managed, and another
being the absence of any way to specify an upper limit on memory
consumption.
It's a big improvement on fgets(). No one's going to try to crash David
Mathog's program by feeding it a 4-billion-character line, now, are they?
Recognised by whom?
Am I the only one who has realised this? I don't think so; it has been
discussed before, though I'm afraid I couldn't reference the threads.
 

CBFalconer

Malcolm said:
.... snip ...

The fact that newline is treated as whitespace is a recognised
design flaw in fscanf().

It is a recognized feature, which may be helpful or a flaw,
depending on the usage desired.
 

CBFalconer

You can find it on my page. See sig.
It's a big improvement on fgets(). No one's going to try to crash
David Mathog's program by feeding it a 4-billion-character line,
now, are they?
.... snip ...

Am I the only one who has realised this? I don't think so; it has
been discussed before, though I'm afraid I couldn't reference the
threads.

It's been out there and used for about 5 years now, and nobody
worried about the possible infinite string until now.
 

Richard Heathfield

Malcolm McLean said:
It's a big improvement on fgets().

I'm not convinced of that. Convince me.
No one's going to try to crash
David Mathog's program by feeding it a 4-billion-character line, now,
are they?

I have no idea what David Mathog's threat model is. I do know, however,
that he will find buffer management under ggets either inconvenient,
inefficient, or both.
Am I the only one who has realised this?

I don't know. You're the one who says it's a recognised design flaw, so
it's up to you to come up with some recognisers.
 

Richard Heathfield

CBFalconer said:

[ggets has] been out there and used for about 5 years now, and nobody
worried about the possible infinite string until now.

Not so. Pat Foley raised the issue, here in comp.lang.c, on 25 June
2002. He was the first, as far as I can make out, but he is certainly
not the last.
 

Malcolm McLean

Richard Heathfield said:
Malcolm McLean said:


I don't know. You're the one who says it's a recognised design flaw, so
it's up to you to come up with some recognisers.
We used to have regular discussions about how to use the fscanf() format
string to do amazing things with the function. If I remember rightly these
were in the days of Dan Pop (anyone know what became of him after he left
CERN? He is sorely missed.) One thing that came out of this was that the
treatment of a newline as matching whitespace meant that there was no nice
way of doing line-based formatting.
 

CBFalconer

Richard said:
CBFalconer said:

[ggets has] been out there and used for about 5 years now, and nobody
worried about the possible infinite string until now.

Not so. Pat Foley raised the issue, here in comp.lang.c, on 25 June
2002. He was the first, as far as I can make out, but he is certainly
not the last.

Well, I certainly never saw it, and I have given my reasons for
rejecting any change to the function's header.
 

Richard Heathfield

Malcolm McLean said:
We used to have regular discussions about how to use the fscanf()
format string to do amazing things with the function. If I remember
rightly these were in the days of Dan Pop (anyone know what became of
him after he left CERN? He is sorely missed.) One thing that came out
of this was that the treatment of a newline as matching whitespace
meant that there was no nice way of doing line-based formatting.

Never forget the (non-rhyming, non-scanning) fscanf limerick:

The ability to process information
That is spread arbitrarily
Over a number of lines
Might reasonably be seen
As a feature instead of a flaw.
 

Richard Heathfield

CBFalconer said:
Richard said:
CBFalconer said:

[ggets has] been out there and used for about 5 years now, and
nobody worried about the possible infinite string until now.

Not so. Pat Foley raised the issue, here in comp.lang.c, on 25 June
2002. He was the first, as far as I can make out, but he is certainly
not the last.

Well, I certainly never saw it,

Oh, I see. There must be two CBFalconers then, since CBFalconer did in
fact post a prompt reply to Pat Foley.
and I have given my reasons for
rejecting any change to the function's header.

That's fine - but it makes your function less useful than it could be.
For example, it oughtn't to be used in environments that are open to
accidental or malicious data abuse, or in low memory situations
(because of its leak-encouraging design).
 

David Mathog

pete said:
If the case is that it is acceptable
to truncate any lines longer than LENGTH number of characters,
then you can make fscanf() "line aware" this way:

http://www.mindspring.com/~pfilandr/C/fscanf_input/fscanf_input.c
This example includes the line:

rc = fscanf(stdin, "%" xstr(LENGTH) "[^\n]%*[^\n]", array);
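
For readers without access to the linked file, here is a hedged sketch of how a format like the one above can be assembled. The `str`/`xstr` pair is the usual two-step stringification idiom and is an assumption about what the linked code defines; `read_line_fscanf` and the value of LENGTH are likewise illustrative. The first directive stores up to LENGTH characters that are not newlines, the second discards the rest of an over-long line, and the trailing newline is left for the caller to consume.

```c
#include <assert.h>
#include <stdio.h>

#define LENGTH 8            /* assumed maximum number of characters kept */
#define str(x) #x
#define xstr(x) str(x)      /* expands LENGTH first, then stringifies: "8" */

/*
 * Hypothetical wrapper around the truncating fscanf() format.
 * Returns 1 if a non-empty line was stored, 0 for an empty line,
 * or EOF when no input is left.
 */
int read_line_fscanf(FILE *fp, char array[LENGTH + 1])
{
    int rc;

    assert(fp != NULL && array != NULL);
    rc = fscanf(fp, "%" xstr(LENGTH) "[^\n]%*[^\n]", array);
    if (rc == 0)
        array[0] = '\0';    /* %[^\n] cannot match an empty line */
    if (rc != EOF)
        (void)fgetc(fp);    /* consume the '\n' (or hit end of file) */
    return rc;
}
```

Note that the return value only says whether a line was stored; it does not report whether truncation happened.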

I don't see that as being an improvement over using fgetc() and storing
the characters one by one into array, checking for \n and LENGTH as it
goes. If the data is to be read into a buffer, then sscanf() can be
employed instead of fscanf(), and the problem goes away.
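
The fgetc() alternative described in the paragraph above might look like the following sketch. `read_line_truncated` is a hypothetical name, and silently dropping the excess characters of a long line is an assumption about the desired behaviour.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: copies one line from fp into buf, keeping at most
 * size - 1 characters and discarding the rest of an over-long line.
 * The '\n' is consumed but never stored.
 * Returns the number of characters stored, or EOF if no input is left.
 */
int read_line_truncated(FILE *fp, char *buf, size_t size)
{
    size_t n = 0;
    int c;

    assert(fp != NULL && buf != NULL && size > 1);
    while ((c = fgetc(fp)) != EOF && c != '\n') {
        if (n + 1 < size)
            buf[n++] = (char)c;   /* room left: keep the character */
        /* otherwise the line is too long and the character is dropped */
    }
    buf[n] = '\0';

    if (c == EOF && n == 0)
        return EOF;
    return (int)n;
}
```

The resulting buffer can then be handed to sscanf(), which cannot read past the end of the line because the next line is not in the buffer.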

I already looked at the [] notation as a possible solution for this
but couldn't figure out how to force it into shape. For instance:

rc = fscanf(fp,"%d[ \t]%d[ \t]%s[\n]",&int1,&int2,string);

and the input is (missing name1 at the end of the first line):

+ 100 200 \n- 300 400 name2\n

and fscanf is called after the "+" is read, then string will be
"\n-300 400 name2", which is not at all the desired result.

Seems like to solve this cleanly one would need to amend the spec to either:

1. Add a new format specifier which tells fscanf to STOP at the first \n.
2. Or more generally, %[\n.:] - terminate input at any of the specified
characters. I believe the %[] syntax would generate an error now, so
extending that way should not break any current code, but you folks are
the experts.

Anyway, I guess the answer to my question is that there is no simple way
to make fscanf() treat an EOL as an input terminator. It seems slightly
bizarre to me that fscanf() has no concept of "end of input", other than
EOF!

Regards,

David Mathog
 

Kenneth Brody

Flash said:
Al Balmer wrote, On 28/08/07 23:50:

Even outside discussion of C I would consider newline to be whitespace.
See for example
http://www.google.co.uk/search?hl=en&q=define:+whitespace&btnG=Search&meta=

Even Whitespace considers a newline to be whitespace:

http://compsoc.dur.ac.uk/whitespace/

 

David Mathog

Kenneth said:
Even Whitespace considers a newline to be whitespace:

http://compsoc.dur.ac.uk/whitespace/

The problem is not so much that fscanf() normally considers EOL to be
whitespace, but rather that fscanf()'s only concept of
"end of input" within the scope of an fscanf() call is either when
it sees an EOF or "all parts of the format string have been used up".
Using the [] method in the format string one can make EOL whitespace
or not (effectively), but it doesn't resolve the primary issue. As
I posted elsewhere in this thread, a more general "end of input"
specifier would allow much better control of parsing, for instance,
letting a colon, dash, or other normal character indicate the end of a
region of data.
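
As a hedged illustration of the [] method mentioned above, the sketch below uses scansets that exclude '\n' so that, for the specific case of a missing name, fscanf() stops at the end of the line and returns 2. `parse_rest_of_line` and NAME_MAX_LEN are illustrative names. It assumes both integers are present, because %d itself still skips newlines, so this does not make fscanf() line-aware in general.

```c
#include <assert.h>
#include <stdio.h>

#define NAME_MAX_LEN 64   /* assumed upper bound on the name field */

/*
 * Parses " 100 200 name1" after the leading '+' or '-' has been read.
 * Returns 3 when a name follows on the same line, 2 when it is missing.
 *
 * %*[ \t]     skips blanks and tabs only, and fails on '\n'
 * %63[^ \t\n] reads the name but refuses to start at a newline
 *             (63 is NAME_MAX_LEN - 1)
 */
int parse_rest_of_line(FILE *fp, int *first_int, int *second_int,
                       char name[NAME_MAX_LEN])
{
    assert(fp != NULL && first_int != NULL && second_int != NULL);
    assert(name != NULL);
    name[0] = '\0';
    return fscanf(fp, "%d %d%*[ \t]%63[^ \t\n]",
                  first_int, second_int, name);
}
```

After a short line the unread '\n' stays in the stream, so reading the next sign character with a format such as " %c" skips it.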

Sadly a lot of real world data is organized in lines of text which are
terminated by an EOL. Since there's no way to tell fscanf() that the
EOL character (or any other character) is an input terminator, there's
no simple way to handle improperly formatted data using only fscanf().
It can certainly be done other ways, just not solely with this function.

Regards,

David Mathog
 

CBFalconer

Richard said:
CBFalconer said:
Richard said:
<snip>

[ggets has] been out there and used for about 5 years now, and
nobody worried about the possible infinite string until now.

Not so. Pat Foley raised the issue, here in comp.lang.c, on 25
June 2002. He was the first, as far as I can make out, but he is
certainly not the last.

Well, I certainly never saw it,

Oh, I see. There must be two CBFalconers then, since CBFalconer
did in fact post a prompt reply to Pat Foley.

Well, maybe I should modify my answer to 'I don't remember'. This
also indicates how seriously I took any such objection at the time.
That's fine - but it makes your function less useful than it
could be. For example, it oughtn't to be used in environments
that are open to accidental or malicious data abuse, or in low
memory situations (because of its leak-encouraging design).

That's ridiculous. Similarly, you can say anything that uses
malloc to collect and store information is dangerous. Systems have
better methods of limiting overuse, such as memory maxima. Nor
should any recursive code be let out into the wild, since overuse
can crash. Ptui.

After all, it is just one more choice. You can use gets, ggets,
fgets, getline (I think that is your routine's name), getc, fscanf,
etc. as you wish. Scratch gets from that list. You pays your money
and takes your choice. Or write your own.
 

Keith Thompson

CBFalconer said:
Richard Heathfield wrote: [...]
That's fine - but it makes your function less useful than it
could be. For example, it oughtn't to be used in environments
that are open to accidental or malicious data abuse, or in low
memory situations (because of its leak-encouraging design).

That's ridiculous. Similarly, you can say anything that uses
malloc to collect and store information is dangerous. Systems have
better methods of limiting overuse, such as memory maxima. Nor
should any recursive code be let out into the wild, since overuse
can crash. Ptui.

A program can use malloc reasonably safely as long as the program can
control how much memory is allocated. Similarly for recursion, if the
program can control the depth of recursion.

gets() is dangerous because its misbehavior (buffer overflow) can be
triggered by factors that the program cannot control, namely the
contents of stdin.

ggets() is less dangerous, but nevertheless its misbehavior
(attempting to allocate more memory than it should) can likewise be
triggered by the contents of stdin. Once my program calls ggets(), it
has *no control* over how much memory may be allocated.

If you consider that to be an acceptable price to pay for the relative
simplicity of ggets(), that's your call, but it's something that
anyone thinking about using ggets() should consider.
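
To make the trade-off concrete, here is a sketch of a growing line reader that accepts a caller-supplied limit. It is not ggets() and does not reproduce its interface; `read_line_capped`, its return codes, and the initial capacity of 16 are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: reads one line of up to max_len characters into a
 * malloc'd buffer that grows as needed.
 * Returns 0 on success (*out must be freed by the caller),
 *         EOF if no input is left,
 *         1 if the line exceeded max_len (the rest is discarded),
 *         2 if memory allocation failed.
 * On any non-zero return *out is NULL.
 */
int read_line_capped(FILE *fp, char **out, size_t max_len)
{
    size_t cap = 16, len = 0;
    char *buf, *tmp;
    int c;

    assert(fp != NULL && out != NULL);
    *out = NULL;
    buf = malloc(cap);
    if (buf == NULL)
        return 2;

    while ((c = fgetc(fp)) != EOF && c != '\n') {
        if (len == max_len) {            /* caller's limit reached */
            while ((c = fgetc(fp)) != EOF && c != '\n')
                ;                        /* discard the remainder */
            free(buf);
            return 1;
        }
        if (len + 1 == cap) {            /* keep room for the terminator */
            cap *= 2;
            tmp = realloc(buf, cap);
            if (tmp == NULL) {
                free(buf);
                return 2;
            }
            buf = tmp;
        }
        buf[len++] = (char)c;
    }

    if (c == EOF && len == 0) {
        free(buf);
        return EOF;
    }
    buf[len] = '\0';
    *out = buf;
    return 0;
}
```

The extra parameter is the whole point: the program, not the contents of the input stream, decides how much memory a single line may claim.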

[...]
 
