Reading a data file

W. eWatson

[...] Here's what I have:
while(fgets(str,70,p)!=NULL){
n=sscanf(str,"%5f", &value);
printf("value = %5.1f %d\n", value,n);
^
one decimal place
}

value is float. str is char str[70];

Here's what the program produces:

value = 0.00 0
^^
two decimal places

The code you've shown isn't the code you're running.
Correct, but I found the mistake.
 
Ike Naar

The columns are fixed, so I would expect
20.3
1.45
to appear as they are.

In the sample input file that you posted earlier ("test.dat") the
columns were not fixed.
Interesting about %5f. It appears that a specifier with a decimal, e.g.,
%10.3, is only available when a printf (or similar cmd) is used.
Correct.

I decided to catch the output of sscanf. Here's what I have:
while(fgets(str,70,p)!=NULL){
n=sscanf(str,"%5f", &value);
printf("value = %5.1f %d\n", value,n);
}
value is float. str is char str[70];
Here's what the program produces:

Unfortunately you haven't shown what the input looks like.
value = 0.00 0
[...]
value = 0.00 -1

About 39 lines!!!
n goes negative?

7.21.6.7 The sscanf function
[...]
3 The sscanf function returns the value of the macro EOF if an input
failure occurs before the first conversion (if any) has completed.
Otherwise, the sscanf function returns the number of input items
assigned, which can be fewer than provided for, or even zero,
in the event of an early matching failure.

Apparently the value of the macro EOF equals -1 in your situation
(which is the usual value for EOF). With your given sscanf call,

n=sscanf(str,"%5f", &value);

If the input string str represents a floating-point number
such as, say, "123.4", 123.4 is assigned to value and sscanf
returns 1, indicating it has successfully assigned 1 item.

If the input string str looks like, say, "" (the empty string),
or " " (whitespace only), an input failure occurs before the
first conversion has completed (the end of the string is reached
while skipping initial whitespace), and sscanf returns EOF.

If the input string str looks like, say, "xyz", a matching failure
occurs during the first conversion ("xyz" does not represent a
floating-point number), and sscanf returns 0 since no input items
were assigned.
 
W. eWatson

I was right when I mentioned the above post as "Maybe the culprit is
that I'm using a data file that was produced in Win7? End of line
incompatibility?"

I created a new dat file with vi, and the results were as expected.
Output was correct.

I'm using gcc under MinGW, and it's been 10 years since I used vi. My
wife is familiar with it, and gave me a little help. I'm now brushing up
on it.


$ cat tst_array.dat
123.1 42.1 1.23
321.0 2.44 8,9

value = 123.10 1
value = 321.00 1

Now my problem is to read an entire line. The current code is:
while(fgets(str,70,p)!=NULL){
n=sscanf(str,"%5f", &value);
printf("value = %5.2f %d\n", value,n);
}

The real data I ultimately need is from a fortran name list. Here's a
bit of it. For array xa:
8927.0000 , 8415.4004 , 8037.0000 , 7579.0000 , 7133.7998 ,
6680.7998 , 6229.2002 , 5784.6001 , 5291.7998 , 4819.7998 , 4328.3999 ,
3854.3999 , 3361.0000 , 2840.6001 , 2332.0000 , 1814.0000 , 1290.0000 ,

For array xb:
Just like the above but with different numbers. There are about 80
numbers in each column of the arrays.


Columns are fixed, and occasionally one encounters no data in the last
column. Actually, it's always the first row.

Off to pondering that issue.
 
Eric Sosman

I was right when I mentioned the above post as "Maybe the culprit is
that I'm using a data file that was produced in Win7? End of line
incompatibility?"

Possible, but not likely. Feed a line ending with \r\n to a
POSIX text stream, and it will understand the \r as a data character,
not as part of the line-ending protocol. But the "%f" specifier
skips white space, and \r is a white space character -- it should
behave just like a trailing blank for the purposes of your code.
I strongly suspect something else was going on.
Now my problem is to read an entire line. [...]
The real data I ultimately need is from a fortran name list. Here's a
bit of it. For array xa:
8927.0000 , 8415.4004 , 8037.0000 , 7579.0000 , 7133.7998 ,
6680.7998 , 6229.2002 , 5784.6001 , 5291.7998 , 4819.7998 , 4328.3999 ,
3854.3999 , 3361.0000 , 2840.6001 , 2332.0000 , 1814.0000 , 1290.0000 ,

For array xb:
Just like the above but with different numbers. There are about 80
numbers in each column of the arrays.

Columns are fixed, and occasionally one encounters no data in the last
column. Actually, it's always the first row.

Off to pondering that issue.

I'll point you toward James Kuyper's suggestion: Use the
strtod() function (or strtof(), if you prefer). It will skip
leading white space, convert a number, and tell you where it
stopped. You can restart from that spot, check for and/or
ignore the comma, and repeat until you've eaten the whole line.

I'll also put in a plug for the strtok() function. It's
got its drawbacks, but takes more flak than it deserves -- and
for the input you have, it'll work just fine. The outline:
Read a whole line, use strtok() to divide it into stretches that
contain no spaces, commas, \t, \r, or \n, and use strtod() to
convert what you find in those stretches:

char buffer[WHATEVER];
// read a line into buffer[]
for (char *p = buffer;
     (p = strtok(p, " ,\t\r\n")) != NULL;
     p = NULL)
{
    // p points to a stretch of "important" characters
    char *q;
    double value = strtod(p, &q);
    if (*q == '\0') {   // test the character q points at, not q itself
        // converted the whole stretch; use value
    } else {
        // conversion stopped early; bad input
    }
}
 
W. eWatson

Possible, but not likely. Feed a line ending with \r\n to a
POSIX text stream, and it will understand the \r as a data character,
not as part of the line-ending protocol. But the "%f" specifier
skips white space, and \r is a white space character -- it should
behave just like a trailing blank for the purposes of your code.
I strongly suspect something else was going on.
.... You might be right. I wrote some of the code using jEdit in
Win 7. I then carry it over to gcc on Linux and compile it. Apparently,
the compiler isn't bothered by the Win CR.
I'll point you toward James Kuyper's suggestion: Use the
strtod() function (or strtof(), if you prefer). It will skip
leading white space, convert a number, and tell you where it
stopped. You can restart from that spot, check for and/or
ignore the comma, and repeat until you've eaten the whole line.

I'll also put in a plug for the strtok() function. It's
got its drawbacks, but takes more flak than it deserves -- and
for the input you have, it'll work just fine. The outline:
Read a whole line, use strtok() to divide it into stretches that
contain no spaces, commas, \t, \r, or \n, and use strtod() to
convert what you find in those stretches:

char buffer[WHATEVER];
// read a line into buffer[]
for (char *p = buffer;
     (p = strtok(p, " ,\t\r\n")) != NULL;
     p = NULL)
{
    // p points to a stretch of "important" characters
    char *q;
    double value = strtod(p, &q);
    if (*q == '\0') {   // test the character q points at, not q itself
        // converted the whole stretch; use value
    } else {
        // conversion stopped early; bad input
    }
}
I printed James's comments from above. As it turns out my old C book
has an example of using strtok.
 
James Kuyper

I was right when I mentioned the above post as "Maybe the culprit is
that I'm using a data file that was produced in Win7? End of line
incompatibility?"

I created a new dat file with vi, and the results were as expected.
Output was correct.

I'm using gcc under MinGW, and it's been 10 years since I used vi. My
wife is familiar with it, and gave me a little help. I'm now brushing up
on it.


$ cat tst_array.dat
123.1 42.1 1.23
321.0 2.44 8,9

value = 123.10 1
value = 321.00 1

Now my problem is to read an entire line. The current code is:
while(fgets(str,70,p)!=NULL){
n=sscanf(str,"%5f", &value);
printf("value = %5.2f %d\n", value,n);
}

The real data I ultimately need is from a fortran name list. Here's a
bit of it. For array xa:
8927.0000 , 8415.4004 , 8037.0000 , 7579.0000 , 7133.7998 ,
6680.7998 , 6229.2002 , 5784.6001 , 5291.7998 , 4819.7998 , 4328.3999 ,
3854.3999 , 3361.0000 , 2840.6001 , 2332.0000 , 1814.0000 , 1290.0000 ,

For array xb:
Just like the above but with different numbers. There are about 80
numbers in each column of the arrays.


Columns are fixed, and occasionally one encounters no data in the last
column. Actually, it's always the first row.

From what you've said, I don't know how your program should figure out
when it's reached the end of array xa. The following suggestion may need
modification, based upon the answer to that question.

In order to use sscanf() for this purpose, you need a format string
that includes the white space after each number and the following
comma: "%f ,".

scanf() directives normally take the form of conversion specifications
that start with a % character, but every character in a format string
that is not part of a conversion specification counts as one of two
possible kinds of directives. If it is white space character, the
directive "is executed by reading input up to the first non-white-space
character (which remains unread), or until no more characters can be
read. The directive never fails." That is what the space character in
"%f ," is for. All other characters that are not part of a conversion
specification must match the input file exactly, or there is a
conversion failure - that is what the comma is for.

Depending upon what you're doing with this data, it may be inappropriate
to have sscanf() fail just because something other than a comma appears
in that location. In that case, read the character using a %c
conversion specifier, and figure out what you want your code to do if
it's not a comma.
 
W. eWatson

I merged your code into mine. It appears I muffed something in line 31.

gcc NL_pxm-array.c
_pxm-array.c: In function 'main':
_pxm-array.c:31:5: error: 'for' loop initial declarations are only
allowed in C99 mode
_pxm-array.c:31:5: note: use option -std=c99 or -std=gnu99 to compile
your code
_pxm-array.c:32:13: warning: assignment makes pointer from integer
without a cast [enabled by default]

line 31 is: for (char *p = buffer;

#include<stdio.h>
#include<stdlib.h>

int
main(void)
{
float pxm[2][80];
int i,j,k,n;
float value;
char str[70];
FILE *p;

if((p=fopen("pxm_array-test.dat","r"))==NULL){
printf("\nUnable to open file pxm_array-test.dat");
exit(1);
}
/*
for (j = 0; j < 80; j++) {
for (i = 0; i < 2; i++){
pxm[i][j] = i + j;
printf("%5.1f ", pxm[i][j]);
}
printf("\n");
}
*/
#define delims
#define WHATEVER 80
char buffer[WHATEVER];
while(fgets(buffer,70,p)!=NULL){
/* read a line into buffer[] */
for (char *p = buffer; <- line 31
(p = strtok(p, " ,\t\r\n")) != NULL;
p = NULL)
{
/* p points to a stretch of "important" characters */
char *q;
double value = strtod(p, &q);
if (*q == '\0') {
/* converted the whole stretch; use value */
} else {
/* conversion stopped early; bad input */
}
}

}
fclose(p);
exit(0);
}
 
James Kuyper

I merged your code into mine. It appears I muffed something in line 31.

gcc NL_pxm-array.c
_pxm-array.c: In function 'main':
_pxm-array.c:31:5: error: 'for' loop initial declarations are only
allowed in C99 mode
_pxm-array.c:31:5: note: use option -std=c99 or -std=gnu99 to compile
your code

So - follow the instructions. Add the option -std=c99 to your compiler
command line.
_pxm-array.c:32:13: warning: assignment makes pointer from integer
without a cast [enabled by default]

This implies that the compiler thinks that the call to strtok() returns
an integer, which is not the case. Why would it think that? Because
strtok() is declared in <string.h>, and your code doesn't include that
header. In C90, if you used an undeclared identifier as if it were the
name of a function, it got implicitly declared as a function returning
'int'. C99 has more reasonable behavior: it's a constraint violation to
attempt calling an undeclared function.
 
W. eWatson

I merged your code into mine. It appears I muffed something in line 31.

gcc NL_pxm-array.c
_pxm-array.c: In function 'main':
_pxm-array.c:31:5: error: 'for' loop initial declarations are only
allowed in C99 mode
_pxm-array.c:31:5: note: use option -std=c99 or -std=gnu99 to compile
your code

So - follow the instructions. Add the option -std=c99 to your compiler
command line.
_pxm-array.c:32:13: warning: assignment makes pointer from integer
without a cast [enabled by default]

This implies that the compiler thinks that the call to strtok() returns
an integer, which is not the case. Why would it think that? Because
strtok() is declared in <string.h>, and your code doesn't include that
header. In C90, if you used an undeclared identifier as if it were the
name of a function, it got implicitly declared as a function returning
'int'. C99 has more reasonable behavior: it's a constraint violation to
attempt calling an undeclared function.
#include<stdio.h>
#include<stdlib.h> ...
for (char *p = buffer; <- line 31
(p = strtok(p, " ,\t\r\n")) != NULL;
Ah, missed the "instructions". C90, C99? What are they? Modes?? I used
c99 and added #include <string.h>, and it compiled successfully.

It could not open pxm_array-test.dat, but it looks like it's clearly in
the same folder.

Unable to open file pxm_array-test.dat
Wayne@solarblast1 /home/wayne/MeteorProject/SampleCode
$ cat pxm_data-test.dat
8927.0000 , 8415.4004 , 8037.0000 , 7579.0000 ,
7133.7998 ,
6680.7998 , 6229.2002 , 5784.6001 , 5291.7998 ,
4819.7998 , 4328.3999 ,
3854.3999 , 3361.0000 , 2840.6001 , 2332.0000 ,
1814.0000 , 1290.0000 ,
741.00000 , 213.20000 , -340.39999 , -931.40002 ,
-1494.8000 , -2079.6001 ,
-2669.6001 , -3256.3999 , -3868.0000 , -4513.2002 ,
-5128.7998 , -5783.6001 ,


Wayne@solarblast1 /home/wayne/MeteorProject/SampleCode
$ a

Unable to open file pxm_array-test.dat
Wayne@solarblast1 /home/wayne/MeteorProject/SampleCode
$ ls -l pxm_data-test.dat
-rw-r--r-- 1 Wayne Administrators 477 Jul 20 15:28 pxm_data-test.dat
 
W. eWatson

From what you've said, I don't know how your program should figure out
when it's reached the end of array xa. The following suggestion may need
modification, based upon the answer to that question.

In order to use sscanf() for this purpose, you need a format string
that includes the white space after each number and the following
comma: "%f ,".

scanf() directives normally take the form of conversion specifications
that start with a % character, but every character in a format string
that is not part of a conversion specification counts as one of two
possible kinds of directives. If it is white space character, the
directive "is executed by reading input up to the first non-white-space
character (which remains unread), or until no more characters can be
read. The directive never fails." That is what the space character in
"%f ," is for. All other characters that are not part of a conversion
specification must match the input file exactly, or there is a
conversion failure - that is what the comma is for.

Depending upon what you're doing with this data, it may be inappropriate
to have sscanf() fail just because something other than a comma appears
in that location. In that case, read the character using a %c
conversion specifier, and figure out what you want your code to do if
it's not a comma.
As it turns out, the comma is the end of the array, but the very next
line contains the name of another variable. I should be able to detect
that.

&INSTRUMENT
BN= 2*6.1999998 ,
FL= 2*3.2000000 ,
PXM=
8927.0000 , 8415.4004 , 8037.0000 , 7579.0000 , 7133.7998 ,
6680.7998 , 6229.2002 , 5784.6001 , 5291.7998 , 4819.7998 , 4328.3999 ,
3854.3999 , 3361.0000 , 2840.6001 , 2332.0000 , 1814.0000 , 1290.0000 ,
741.00000 , 213.20000 , -340.39999 , -931.40002 , -1494.8000 ,
...
-21194.600 , -22386.801 , -23614.600 , 41*0.0000000 , <- note 41
PXQ= -1800.0000 , -2500.0000 ,
PYM=
-11341.000 , -11482.400 , -11592.600 , -11735.400 , -11875.400 ,
-12014.400 , -12146.000 , -12274.800 , -12433.400 , -12576.400 ,
-12733.800 ,
....

In this case the PXM array is followed by PXQ and PYM. Note the
41*0.000. That tells the namelist there are 41 zero elements next. One
of the good things about the data lines is there are only 99 lines
total. That makes it easy for me to manually modify the arrays. That is,
I could take out the dangling comma, or change the zero notation to
simplify matters.

PXM is a 80x2 array, and, as luck would have it, the array is divided
into two array columns at zero. That is, 1 to 39 contains data for the
first column, and 40 to 80 the second column.

This namelist data is something of a standard for what I'm doing. It has
become a test for the program that reads it, progB (.f90). The reason
I'm juggling data around is that in the future, another program, progA
(written in an unusual language), will generate it. It will need to be
modified to produce a namelist for B. Its NL needs to be checked out
against whether it can generate a proper NL, the standard. Each program
is about 2000 lines of code. Don't know if that's helpful, but anyway
this is not a small effort.
 
W. eWatson

Strange. I changed the name of the file, and the program ran. Now I need
to sit back and think how I'm going to set this up for pgrmA that I
mentioned a post or two above.

Thanks to all for the help.
 
W. eWatson

Strange. I changed the name of the file, and the program ran. Now I need
to sit back and think how I'm going to set this up for pgrmA that I
mentioned a post or two above.

Thanks to all for the help.
BTW, I asked my colleague who wrote "the program A", what language he
used. c++ with Open Computer Vision, OCV, libraries. Program A
interfaces with cameras.
 
Ike Naar

It could not open pxm_array-test.dat, but it looks like it's clearly in
the same folder.
Unable to open file pxm_array-test.dat
Wayne@solarblast1 /home/wayne/MeteorProject/SampleCode
$ cat pxm_data-test.dat

pxm_data-test.dat

vs.

pxm_array-test.dat
 
James Kuyper

I was in a bit of hurry when I wrote that response - I should have
explained what the problem is.

In C90, declarations are allowed only at file scope, or at the start of
a block. When Bjarne Stroustrup designed C++, he thought it would be a
good idea to allow declarations in a wider variety of places. One of
those places is in the first part of a for() statement. The C committee
agreed that this was a good idea, and put it into C99. I agree with
them, but you'll find other people who don't. Some are even stricter
than the C90 standard - they won't declare variables in inner blocks of
a function, only in the outermost block.
_pxm-array.c:32:13: warning: assignment makes pointer from integer
without a cast [enabled by default]

This implies that the compiler thinks that the call to strtok() returns
an integer, which is not the case. Why would it think that? Because
strtok() is declared in <string.h>, and your code doesn't include that
header. In C90, if you used an undeclared identifier as if it were the
name of a function, it got implicitly declared as a function returning
'int'. C99 has more reasonable behavior: it's a constraint violation to
attempt calling an undeclared function.
#include<stdio.h>
#include<stdlib.h> ...
for (char *p = buffer; <- line 31
(p = strtok(p, " ,\t\r\n")) != NULL;
Ah, missed the "instructions". C90, C99? What are they? Modes?? I used
c99 and added #include <string.h>, and it compiled successfully.

The first widely used version of C was the one described by Kernighan
and Ritchie in "The C Programming Language", and that version is called
K&R C. It was not, however, a single version, but a different version
for each compiler. The first standard for the C programming language was
an ANSI (US) standard that was approved in 1989; the language defined by
that standard is often called C89. Essentially the same document was
approved as an ISO (international) standard in 1990 - the only changes
were the addition of three sections at the beginning of the document to
conform to ISO requirements. That language is often called C90. People
were tired of having to write different code for different compilers, so
the C90 standard was widely and rather quickly adopted. For almost every
platform for which any kind of compiler is available, there's one that
will compile a variant of C, and on most of those platforms, there's a
compiler that can be put into a mode that conforms to C90.

A major revision of the standard occurred in 1999, which was fully
implemented only by a small number of compilers, but parts of C99 are
widely supported. Another update occurred in 2011, but it has not had
time to be widely adopted yet. The languages described by those versions
of the standard are usually called C99 and C11, respectively. I
personally prefer to call it C2011, to avoid Y2K issues, but I seem to
be the only one.

In its default mode, gcc compiles a language called GnuC, a language
closely related to C, but having many non-conforming extensions to it.
It's possible for an extension to be fully conforming, but many of
GnuC's extensions are not. The -ansi option is equivalent to -std=c99.
The combination -std=c90 -pedantic puts gcc into a mode where it fully
conforms to C90. With -std=c99, it conforms pretty well, but not
completely, with C99. The option for C2011 is -std=c1x, because it wasn't
clear at the time that they added that option exactly when the new
standard would be approved.
 
James Kuyper

As it turns out, the comma is the end of the array, but the very next
line contains the name of another variable. I should be able to detect
that.

&INSTRUMENT
BN= 2*6.1999998 ,
FL= 2*3.2000000 ,
PXM=
8927.0000 , 8415.4004 , 8037.0000 , 7579.0000 , 7133.7998 ,
6680.7998 , 6229.2002 , 5784.6001 , 5291.7998 , 4819.7998 , 4328.3999 ,
3854.3999 , 3361.0000 , 2840.6001 , 2332.0000 , 1814.0000 , 1290.0000 ,
741.00000 , 213.20000 , -340.39999 , -931.40002 , -1494.8000 ,
...
-21194.600 , -22386.801 , -23614.600 , 41*0.0000000 , <- note 41
PXQ= -1800.0000 , -2500.0000 ,
PYM=

OK - this is good - every number is followed by a comma, so you don't
need to write special case code to handle the last number. Terminate the
loop when sscanf() returns 0, and try to parse the buffer as the start
of new array.
 
Keith Thompson

In its default mode, gcc compiles a language called GnuC, a language
closely related to C, but having many non-conforming extensions to it.
It's possible for an extension to be fully conforming, but many of
GnuC's extensions are not. The -ansi option is equivalent to -std=c99.
The combination -std=c90 -pedantic puts gcc into a mode where it fully
conforms to C90. With -std=c99, it conforms pretty well, but not
completely, with C99. The option for C2011 is -std=c1x, because it wasn't
clear at the time that they added that option exactly when the new
standard would be approved.

Correction: -ansi is equivalent to -std=c90 (that was probably just
a typo).

In recent versions of gcc, -ansi, -std=c89, and -std=c90 are all
equivalent. The name "-ansi" is strictly incorrect, since ANSI
(the American National Standards Institute) currently recognizes
the 2011 ISO C standard and no earlier ones, but the name has stuck
around for historical reasons. Some older versions of gcc do not
recognize -std=c90, but they do accept -ansi and -std=c89.

Newer versions do recognize -std=c11.

The default, with no -std=... option, is equivalent to -std=gnu90,
which specifies C90 plus some GNU extensions, some of which conflict
with the C90 standard. There are also -std=gnu99 and -std=gnu11
options. Eventually the default behavior will probably change to
one of those, once C99 or C11 support is complete.

If you use one of the -std=c?? options *without* -pedantic, the compiler
doesn't (attempt to) fully conform to the specified standard; it quietly
accepts some constructs for which the standard requires diagnostics.
 
