fscanf problem

Thomas Sourmail

Hi,

I hope I am missing something simple, but here is my problem:

I need my program to check the last column of a file, as in:

a b c d target ref
0 0 0 0 1 a
1 0 0 0 1.5 b
2 0 0 0 2 c
0 0 6 0 2 g
0 0 0 4 1.5 h
0 0 0 8 2 i
3 0 0 0 1 j
1 0 0 0 1.5 k

To do this, I read up to column 5, then apply the following:

if (fscanf(ifp, "%f", &fref) == 1) {
    last_column_is_number = 1;
} else if (fscanf(ifp, "%s", ref) != 1) {
    error = 1;
}

which works fine in most cases I've tried.
It is just by chance that I built the above example file and realised
that, when it comes to 'i' or 'n', fscanf simply skips to the next
data point, reading, in the example above, '3'.

I've moved 'i' around and the error remains; 'ii' is also ignored, but
anything longer works fine.

Confused !
Any help greatly appreciated,

Thomas.
 
Chris Torek

... I read up to column 5, then apply the following

if (fscanf(ifp, "%f", &fref) == 1) {
    last_column_is_number = 1;
} else if (fscanf(ifp, "%s", ref) != 1) {
    error = 1;
}

which works fine in most cases I've tried.
It is just by chance that I built the above example file and realised
that, when it comes to 'i' or 'n', fscanf simply skips to the next
data point, reading, in the example above, '3'.

I will note here that 'i' is the first (case-insensitive) letter
of "Inf", and 'n' is the first letter of "NaN". I suspect this is
significant.

It is not completely clear to me whether you mean "the first scanf
call fails, returning 0, so that the second scanf call succeeds --
returning 1 -- but stores "3" in the ref[] array". If so, this
may be working the way ANSI/ISO C dictates. (I have never been quite
happy with the ISO rules for the scanf engine, and I know my stdio
does not behave according to the Standard -- input like "1.23e+whoops"
scans as 1.23, leaving "e+whoops" in the stream, while the Standard
says that at least the 'e' and '+' are eaten, and perhaps the 'w'.
I am not sure what is supposed to happen to the 1.23.)

The best approach is almost certainly the same one that is so often
best when dealing with either data files or interaction with users:
read complete lines, one at a time, and *then* pick them apart in
whatever way you like, possibly including sscanf(). Here the %n
directive may come in handy.
 
Eric Sosman

Thomas said:
To do this, I read up to column 5, then apply the following:

if (fscanf(ifp, "%f", &fref) == 1) {
    last_column_is_number = 1;
} else if (fscanf(ifp, "%s", ref) != 1) {
    error = 1;
}

Perhaps the problem has something to do with how
you "read up to column 5?" Please trim your code a
little less severely, and post a short, complete, and
compilable demonstration of the problem.
 
Thomas Sourmail

Eric said:
Perhaps the problem has something to do with how
you "read up to column 5?" Please trim your code a
little less severely, and post a short, complete, and
compilable demonstration of the problem.

Here is a compilable demo of the problem, with the input file below:

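#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *ifp;
    int i, j;
    float dummy = 0.0f;
    char ref[10] = "", filename[10] = "test.csv", mode[10] = "r";

    ifp = fopen(filename, mode);
    if (ifp == NULL) {
        perror(filename);
        return EXIT_FAILURE;
    }
    for (i = 0; i < 26; ++i) {
        /* read the first five columns as numbers */
        for (j = 0; j < 5; ++j) {
            if (fscanf(ifp, "%f", &dummy) == EOF) {
                fclose(ifp);      /* end of file: stop cleanly */
                return 0;
            }
            printf("%1.1f ", dummy);
        }
        /* try the last column as a number first, then as a string */
        if (fscanf(ifp, "%f", &dummy) == 0) {
            printf(" Failed reading as number ");
            printf(" dummy is now %f ", dummy);
            fscanf(ifp, "%9s", ref);  /* at most 9 chars + '\0' fit in ref */
            printf("%s\n", ref);
        } else {
            printf("Read as number !\n");
            printf("%f\n", dummy);
        }
    }
    fclose(ifp);
    return 0;
}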


The file test.csv:

0 0 0 0 1 a
1 0 0 0 1.5 b
0 0 0 8 2 i
0 0 0 0 1 j
0 2 0 0 1.5 m
0 4 0 0 2 nn
0 0 3 0 1.5 o
0 0 0 4 1.5 NaN
0 0 0 8 2 r
0 0 0 0 1 Inf
1 0 0 0 1.5 t


And the output on my machine:

0.0 0.0 0.0 0.0 1.0 Failed reading as number dummy is now 1.000000 a
1.0 0.0 0.0 0.0 1.5 Failed reading as number dummy is now 1.500000 b
0.0 0.0 0.0 8.0 2.0 Failed reading as number dummy is now 2.000000 0
0.0 0.0 0.0 1.0 1.0 Failed reading as number dummy is now 1.000000 j
0.0 2.0 0.0 0.0 1.5 Failed reading as number dummy is now 1.500000 m
0.0 4.0 0.0 0.0 2.0 Failed reading as number dummy is now 2.000000 0
0.0 3.0 0.0 1.5 1.5 Failed reading as number dummy is now 1.500000 o
0.0 0.0 0.0 4.0 1.5 Read as number !
nan
0.0 0.0 0.0 8.0 2.0 Failed reading as number dummy is now 2.000000 r
0.0 0.0 0.0 0.0 1.0 Read as number !
inf
1.0 0.0 0.0 0.0 1.5 Failed reading as number dummy is now 1.500000 t



When I run this, there is no problem with NaN or Inf (fscanf converts
them to float correctly), but on 'i' and 'nn' the problem remains.
It seems that, instead of leaving the character in the input stream as
described in http://www.eskimo.com/~scs/C-faq/q12.19.html
fscanf jumps to the next one, but only on these particular strings
(i, ii, n, nn).
Interestingly, with 'iii' or 'nnn', the following %s reads 'i' and 'n'
respectively, and similarly if you increase the number of 'i's and 'n's.

I am sure there are workarounds, but I am quite curious about what is
happening.

Just in case, I'm on RH9, gcc-3.2.2-5, glibc-2.3.2-27.9.7

Thomas.
 
Al Bowers

Your loops do not appear correct, or they may be getting
things out of sync. I would consider removing the loops and
using the assignment-suppression (%*) and scanset (%[...]) features
of fscanf. This might give you better control when troubleshooting.

Example:

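#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *ifp;
    int i, linenr;
    float dummy;
    char ref[10], filename[10] = "test.csv", mode[10] = "r";

    if ((ifp = fopen(filename, mode)) == NULL)
    {
        perror("Failed to open file test.csv");
        exit(EXIT_FAILURE);
    }
    /* One fscanf() per line: skip four numbers (%*f), skip the
       separator before the fifth one (assumes it starts with a digit),
       read the fifth into dummy, skip blanks (%*[ ]), then read the last
       column as a string of at most 9 chars so it fits in ref[10].
       The last column is never given to %f, so "i" or "nn" cannot be
       taken for the start of "inf" or "nan". */
    for (linenr = 1, *ref = '\0';
         (i = fscanf(ifp, "%*f%*f%*f%*f%*[^1234567890\r\n]"
                          "%f%*[ ]%9[^ \r\n]", &dummy, ref)) != EOF;
         linenr++, *ref = '\0')
    {
        if (i == 2)
            printf("line #: %d dummy = %.2f ref = \"%s\"\n",
                   linenr, dummy, ref);
        else if (i == 1 && *ref == '\0')
        {
            printf("line #: %d dummy = %.2f there is no ref\n"
                   "File Format failure: Exiting...\n",
                   linenr, dummy);
            break;
        }
        else
        {
            printf("line #: %d No data read\n"
                   "File Format failure. Exiting...\n", linenr);
            break;
        }
    }
    fclose(ifp);
    return 0;
}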
 
Eric Sosman

I think Chris Torek's answer is the right one. When the
first non-white character encountered by "%f" is an 'i' or an
'n', it could be the beginning of "inf" or "nan". So fscanf()
reads the next character to try to match the remainder of the
"inf" or "nan", and if the next character is a newline the
match fails. However, the initial 'i' or 'n' has already
been read and accepted; here's what 7.19.6.2/9 has to say:

[...] An input item is defined as the longest sequence
of input characters [...] which is, or is a prefix of,
a matching input sequence. [...]

'i' and 'n' are prefixes of "inf" and "nan", so they are matched
and consumed by "%f". When the '\n' comes along the match fails,
but only the '\n' remains unconsumed: fscanf() can only push
back one character, and can't "rewind" the input to an arbitrary
position.

That explains what happens with "i\n" and "n\n", but it
doesn't explain the behavior on the "nn\n" line. I'd expect
the "%f" to consume the first 'n' as a prefix of "nan", then
choke on the second 'n' and push it back as a non-matching
character. Then your second attempt with "%s" should have
found the second 'n' again, followed by a newline, and should
have stored the one-character string "n" in `ref'. But it
looks like the second 'n' didn't get pushed back after the
matching failure, which may mean there's a bug in the fscanf()
implementation. (Or, of course, it may mean I've misread
what's supposed to happen; the possible forms of "nan" seem
to be pretty close to infinite ...)

For what it's worth, I tried your program on another
implementation and found what I think is a different incorrect
behavior: Both the "i" and the "nn" were read as strings by
the "%s" conversion. Thus, at least one of the implementations
is wrong -- and according to my (non-authoritative) reading of
the Standard, both are wrong!

So, what to do about your problem? Again, I think Chris'
suggestion is best: Don't use fscanf() to read lines of input.
Instead, use fgets() to read a line at a time and then use
other means -- possibly including sscanf() -- to pick them
apart. fscanf() doesn't always stop at a newline when you'd
want it to, but sscanf() absolutely *will* stop at a '\0',
and you won't "lose synchronization" with the input file.
 
Thomas Sourmail

Yes, that seems to be the right explanation. Strangely, the first
fscanf(ifp,"%f",&dummy) always seems to consume one more character than
should be necessary to tell whether the entry is "inf" or some other
word starting with 'i'.

For example, with these entries in the last column, the second
fscanf(ifp,"%s",ref) stores:
'inter' -> 'er'
'itter' -> 'ter'
'natto' -> 'to'
'nttto' -> 'tto'

Anyway, thanks a lot for all your help. I was using fscanf this way
because, in my real problem, the number of columns is not fixed, and
sscanf does not 'move along the line' if I simply repeat it as I did
above; I guess I'll have to try vsscanf.

Thomas.
 
Chris Torek

Anyway, thanks a lot for all your help, I meant to use this method
because, in my real problem, the number of columns is not fixed, and
sscanf does not 'move along the line' if I simply repeat it as I've done
above ...

This is why I suggested that the "%n" conversion might also be
helpful.

Another method, of course, is to use strtod() and other "lower-level"
functions to take apart input lines.
 
