Reading a data file

Discussion in 'C Programming' started by W. eWatson, Jul 19, 2013.

  1. What book are you using?
     
    Keith Thompson, Jul 20, 2013
    #21

  2. W. eWatson

    W. eWatson Guest

    Correct, but I found the mistake.
     
    W. eWatson, Jul 20, 2013
    #22

  3. W. eWatson

    W. eWatson Guest

    Comprehensive C.
     
    W. eWatson, Jul 20, 2013
    #23
  4. W. eWatson

    Ike Naar Guest

    In the sample input file that you posted earlier ("test.dat") the
    columns were not fixed.
    Unfortunately you haven't shown what the input looks like.
    7.21.6.7 The sscanf function
    [...]
    3 The sscanf function returns the value of the macro EOF if an input
    failure occurs before the first conversion (if any) has completed.
    Otherwise, the sscanf function returns the number of input items
    assigned, which can be fewer than provided for, or even zero,
    in the event of an early matching failure.

    Apparently the value of the macro EOF equals -1 in your situation
    (which is the usual value for EOF). With your given sscanf call,

    n=sscanf(str,"%5f", &value);

    if the input string str represents a floating-point number
    such as, say, "123.4", 123.4 is assigned to value and sscanf
    returns 1, indicating it has successfully assigned 1 item.

    if the input string str, looks like, say, "" (the empty string),
    or " " (whitespace only), an input failure occurs before the
    first conversion has completed (the end of the string is reached
    while skipping initial whitespace), and sscanf returns EOF.

    if the input string str looks like, say, "xyz", an input failure
    occurs during the first conversion ("xyz" does not represent a
    floating-point number), and sscanf returns 0 since no input items
    were assigned.
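
    The three cases above can be checked with a small sketch. The helper
    name scan_one_float is hypothetical (it is not part of the original
    program); it only wraps the sscanf call quoted above and hands back
    its return value.

```c
#include <stdio.h>

/* Hypothetical helper: run the "%5f" conversion from the thread on str
 * and return whatever sscanf reports (1, 0, or EOF). */
int scan_one_float(const char *str, float *value)
{
    return sscanf(str, "%5f", value);
}

/* Hypothetical helper: print the outcome in words. */
void report_scan(const char *str)
{
    float value = 0.0f;
    int n = scan_one_float(str, &value);

    if (n == 1)
        printf("\"%s\": assigned %5.2f\n", str, value);
    else if (n == EOF)
        printf("\"%s\": input failure before any conversion (EOF)\n", str);
    else
        printf("\"%s\": matching failure, nothing assigned (0)\n", str);
}
```

    Calling report_scan on "123.4", "", " " and "xyz" should take the
    first, second, second and third branch respectively.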
     
    Ike Naar, Jul 20, 2013
    #24
  5. W. eWatson

    W. eWatson Guest

    I was right when I suggested in an earlier post, "Maybe the culprit is
    that I'm using a data file that was produced in Win7? End of line
    incompatibility?"

    I created a new dat file with vi, and the results were as expected.
    Output was correct.

    I'm using gcc under MinGW, and it's been 10 years since I used vi. My
    wife is familiar with it, and gave me a little help. I'm now brushing up
    on it.


    $ cat tst_array.dat
    123.1 42.1 1.23
    321.0 2.44 8,9

    value = 123.10 1
    value = 321.00 1

    Now my problem is to read an entire line. The current code is:
    while(fgets(str,70,p)!=NULL){
    n=sscanf(str,"%5f", &value);
    printf("value = %5.2f %d\n", value,n);
    }

    The real data I ultimately need is from a fortran name list. Here's a
    bit of it. For array xa:
    8927.0000 , 8415.4004 , 8037.0000 , 7579.0000 , 7133.7998 ,
    6680.7998 , 6229.2002 , 5784.6001 , 5291.7998 , 4819.7998 , 4328.3999 ,
    3854.3999 , 3361.0000 , 2840.6001 , 2332.0000 , 1814.0000 , 1290.0000 ,

    For array xb:
    Just like the above but with different numbers. There are about 80
    numbers in each column of the arrays.


    Columns are fixed, and occasionally one encounters no data in the last
    column. Actually, it's always the first row.

    Off to pondering that issue.
     
    W. eWatson, Jul 20, 2013
    #25
  6. W. eWatson

    Eric Sosman Guest

    Possible, but not likely. Feed a line ending with \r\n to a
    POSIX text stream, and it will understand the \r as a data character,
    not as part of the line-ending protocol. But the "%f" specifier
    skips white space, and \r is a white space character -- it should
    behave just like a trailing blank for the purposes of your code.
    I strongly suspect something else was going on.
    I'll point you toward James Kuyper's suggestion: Use the
    strtod() function (or strtof(), if you prefer). It will skip
    leading white space, convert a number, and tell you where it
    stopped. You can restart from that spot, check for and/or
    ignore the comma, and repeat until you've eaten the whole line.
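
    A minimal sketch of that strtod() loop, assuming numbers separated by
    white space and an optional comma as in the sample data. The function
    name parse_line_strtod and its parameters are made up for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: convert up to max numbers from line into out[].
 * Stops at the end of the line or at the first text strtod() cannot
 * convert. Returns how many values were stored. */
size_t parse_line_strtod(const char *line, double *out, size_t max)
{
    size_t count = 0;
    const char *p = line;

    while (count < max) {
        char *end;
        double v = strtod(p, &end);   /* skips leading white space itself */

        if (end == p)                 /* nothing converted: stop */
            break;
        out[count++] = v;

        p = end;                      /* restart where strtod() stopped */
        while (isspace((unsigned char)*p))
            p++;
        if (*p == ',')                /* eat the separating comma, if any */
            p++;
    }
    return count;
}
```

    Because \r and \n are white space, a trailing Windows line ending is
    skipped like any other blank and simply ends the loop.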

    I'll also put in a plug for the strtok() function. It's
    got its drawbacks, but takes more flak than it deserves -- and
    for the input you have, it'll work just fine. The outline:
    Read a whole line, use strtok() to divide it into stretches that
    contain no spaces, commas, \t, \r, or \n, and use strtod() to
    convert what you find in those stretches:

    char buffer[WHATEVER];
    // read a line into buffer[]
    for (char *p = buffer;
         (p = strtok(p, " ,\t\r\n")) != NULL;
         p = NULL)
    {
        // p points to a stretch of "important" characters
        char *q;
        double value = strtod(p, &q);
        if (*q == '\0') {
            // converted the whole stretch; use value
        } else {
            // conversion stopped early; bad input
        }
    }
     
    Eric Sosman, Jul 20, 2013
    #26
  7. W. eWatson

    W. eWatson Guest

    .... You might be right. I've been writing the code with jEdit in
    Win 7. I then carry it over to gcc on Linux, and compile it. Apparently,
    the compiler isn't bothered by the Win CR.
    I printed James's comments from above. As it turns out, my old C book
    has an example of using strtok.
     
    W. eWatson, Jul 20, 2013
    #27
  8. W. eWatson

    James Kuyper Guest

    From what you've said, I don't know how your program should figure out
    when it's reached the end of array xa. The following suggestion may need
    modification, based upon the answer to that question.

    In order to use sscanf() for this purpose, you need a format string
    that includes the white space after each number and the following
    comma: "%f ,".

    scanf() directives normally take the form of conversion specifications
    that start with a % character, but every character in a format string
    that is not part of a conversion specification counts as one of two
    possible kinds of directives. If it is white space character, the
    directive "is executed by reading input up to the first non-white-space
    character (which remains unread), or until no more characters can be
    read. The directive never fails." That is what the space character in
    "%f ," is for. All other characters that are not part of a conversion
    specification must match the input file exactly, or there is a
    conversion failure - that is what the comma is for.

    Depending upon what you're doing with this data, it may be inappropriate
    to have sscanf() fail just because something other than a comma appears
    in that location. In that case, read the character using a %c
    conversion specifier, and figure out what you want your code to do if
    it's not a comma.
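
    One way the "%f ," format could be used to walk along a line is
    sketched below. The %n directive, which records how many characters
    have been consumed so far, is an addition not mentioned above, and the
    function name parse_line_sscanf is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: store up to max floats from line into out[].
 * Only numbers that are followed by a comma are counted; the loop stops
 * at the first failure. Returns how many values were counted. */
size_t parse_line_sscanf(const char *line, float *out, size_t max)
{
    size_t count = 0;

    while (count < max) {
        int used = 0;

        /* "%f" converts a number, " " skips white space, "," must match
         * literally, and "%n" is reached only if the comma matched. */
        if (sscanf(line, "%f ,%n", &out[count], &used) != 1 || used == 0)
            break;
        count++;
        line += used;   /* continue just past the comma */
    }
    return count;
}
```

    A line that starts with a name such as PXQ= makes the first "%f" fail,
    so the function returns 0 for it.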
     
    James Kuyper, Jul 20, 2013
    #28
  9. Keith Thompson, Jul 20, 2013
    #29
  10. W. eWatson

    W. eWatson Guest

    I merged your code into mine. It appears I muffed something in line 31.

    gcc NL_pxm-array.c
    _pxm-array.c: In function 'main':
    _pxm-array.c:31:5: error: 'for' loop initial declarations are only
    allowed in C99 mode
    _pxm-array.c:31:5: note: use option -std=c99 or -std=gnu99 to compile
    your code
    _pxm-array.c:32:13: warning: assignment makes pointer from integer
    without a cast [enabled by default]

    line 31 is: for (char *p = buffer;

    #include<stdio.h>
    #include<stdlib.h>

    int
    main(void)
    {
        float pxm[2][80];
        int i,j,k,n;
        float value;
        char str[70];
        FILE *p;

        if((p=fopen("pxm_array-test.dat","r"))==NULL){
            printf("\nUnable to open file pxm_array-test.dat");
            exit(1);
        }
        /*
        for (j = 0; j < 80; j++) {
            for (i = 0; i < 2; i++){
                pxm[i][j] = i + j;
                printf("%5.1f ", pxm[i][j]);
            }
            printf("\n");
        }
        */
    #define delims
    #define WHATEVER 80
        char buffer[WHATEVER];
        while(fgets(buffer,70,p)!=NULL){
            /* read a line into buffer[] */
            for (char *p = buffer;   <- line 31
                 (p = strtok(p, " ,\t\r\n")) != NULL;
                 p = NULL)
            {
                /* p points to a stretch of "important" characters */
                char *q;
                double value = strtod(p, &q);
                if (*q == '\0') {
                    /* converted the whole stretch; use value */
                } else {
                    /* conversion stopped early; bad input */
                }
            }

        }
        fclose(p);
        exit(0);
    }
     
    W. eWatson, Jul 21, 2013
    #30
  11. W. eWatson

    James Kuyper Guest

    So - follow the instructions. Add the option -std=c99 to your compiler
    command line.
    This implies that the compiler thinks that the call to strtok() returns
    an integer, which is not the case. Why would it think that? Because
    strtok() is declared in <string.h>, and your code doesn't include that
    header. In C90, if you used an undeclared identifier as if it were the
    name of a function, it gets implicitly declared as a function returning
    'int'. C99 has more reasonable behavior: it's a constraint violation to
    attempt calling an undeclared function.
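
    As a sketch of both fixes together, the fragment below includes
    <string.h> so that strtok() is properly declared, and it declares the
    loop variable inside the for statement, which needs -std=c99 or later.
    The function name count_numbers is made up for illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>   /* declares strtok(); without it C90 assumes int */

/* Hypothetical helper: split buffer in place and count the stretches
 * that strtod() converts completely. Note that strtok() modifies buffer. */
size_t count_numbers(char *buffer)
{
    static const char delims[] = " ,\t\r\n";
    size_t count = 0;

    for (char *tok = strtok(buffer, delims);   /* C99 declaration */
         tok != NULL;
         tok = strtok(NULL, delims))
    {
        char *end;

        (void)strtod(tok, &end);
        if (end != tok && *end == '\0')
            count++;                 /* whole stretch was a number */
    }
    return count;
}
```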
     
    James Kuyper, Jul 21, 2013
    #31
  12. W. eWatson

    W. eWatson Guest

    Ah, missed the "instructions". C90, C99? What are they? Modes?? I used
    c99 and added #include <string.h>, and it compiled successfully.

    It could not open pxm_array-test.dat, but it looks like it's clearly in
    the same folder.

    Unable to open file pxm_array-test.dat
    Wayne@solarblast1 /home/wayne/MeteorProject/SampleCode
    $ cat pxm_data-test.dat
    8927.0000 , 8415.4004 , 8037.0000 , 7579.0000 ,
    7133.7998 ,
    6680.7998 , 6229.2002 , 5784.6001 , 5291.7998 ,
    4819.7998 , 4328.3999 ,
    3854.3999 , 3361.0000 , 2840.6001 , 2332.0000 ,
    1814.0000 , 1290.0000 ,
    741.00000 , 213.20000 , -340.39999 , -931.40002 ,
    -1494.8000 , -2079.6001 ,
    -2669.6001 , -3256.3999 , -3868.0000 , -4513.2002 ,
    -5128.7998 , -5783.6001 ,


    Wayne@solarblast1 /home/wayne/MeteorProject/SampleCode
    $ a

    Unable to open file pxm_array-test.dat
    Wayne@solarblast1 /home/wayne/MeteorProject/SampleCode
    $ ls -l pxm_data-test.dat
    -rw-r--r-- 1 Wayne Administrators 477 Jul 20 15:28 pxm_data-test.dat
     
    W. eWatson, Jul 21, 2013
    #32
  13. W. eWatson

    W. eWatson Guest

    As it turns out, the comma is the end of the array, but the very next
    line contains the name of another variable. I should be able to detect
    that.

    &INSTRUMENT
    BN= 2*6.1999998 ,
    FL= 2*3.2000000 ,
    PXM=
    8927.0000 , 8415.4004 , 8037.0000 , 7579.0000 , 7133.7998 ,
    6680.7998 , 6229.2002 , 5784.6001 , 5291.7998 , 4819.7998 , 4328.3999 ,
    3854.3999 , 3361.0000 , 2840.6001 , 2332.0000 , 1814.0000 , 1290.0000 ,
    741.00000 , 213.20000 , -340.39999 , -931.40002 , -1494.8000 ,
    ...
    -21194.600 , -22386.801 , -23614.600 , 41*0.0000000 , <- note 41
    PXQ= -1800.0000 , -2500.0000 ,
    PYM=
    -11341.000 , -11482.400 , -11592.600 , -11735.400 , -11875.400 ,
    -12014.400 , -12146.000 , -12274.800 , -12433.400 , -12576.400 ,
    -12733.800 ,
    ....

    In this case the PXM array is followed by PXQ and PYM. Note the
    41*0.000. That tells the namelist there are 41 zero elements next. One
    of the good things about the data lines is there are only 99 lines
    total. That makes it easy for me to manually modify the arrays. That is,
    I could take out the dangling comma, or change the zero notation to
    simplify matters.
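
    Rather than editing the file by hand, the repeat notation could be
    expanded in code. The sketch below assumes each token has already been
    split off (for example with strtok) and has the form "value" or
    "count*value"; the function name expand_token is hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: expand one namelist token into out[].
 * "41*0.0000000" yields 41 copies of 0.0; "-23614.600" yields one value.
 * Returns the number of values written (at most max), or 0 if the token
 * is not a number. */
size_t expand_token(const char *tok, double *out, size_t max)
{
    char *end;
    long repeat = strtol(tok, &end, 10);
    double value;
    size_t n, i;

    if (end != tok && *end == '*' && repeat > 0) {
        const char *num = end + 1;      /* text after the '*' */

        value = strtod(num, &end);
        if (end == num || *end != '\0')
            return 0;
        n = (size_t)repeat;
    } else {
        value = strtod(tok, &end);      /* plain number, no repeat count */
        if (end == tok || *end != '\0')
            return 0;
        n = 1;
    }

    if (n > max)
        n = max;                        /* never write past the buffer */
    for (i = 0; i < n; i++)
        out[i] = value;
    return n;
}
```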

    PXM is a 80x2 array, and, as luck would have it, the array is divided
    into two array columns at zero. That is, 1 to 39 contains data for the
    first column, and 40 to 80 the second column.

    This namelist data is something of a standard for what I'm doing. It has
    become a test for the program that reads it, progB (.f90). The reason
    I'm juggling data around is that in the future, another program, progA
    (written in an unusual language), will generate it. It will need to be
    modified to produce a namelist for B. Its NL needs to be checked
    against the standard to see whether it can generate a proper NL. Each
    program is about 2000 lines of code. Don't know if that's helpful, but
    anyway this is not a small effort.
     
    W. eWatson, Jul 21, 2013
    #33
  14. W. eWatson

    W. eWatson Guest

    Strange. I changed the name of the file, and the program ran. Now I need
    to sit back and think how I'm going to set this up for progA that I
    mentioned a post or two above.

    Thanks to all for the help.
     
    W. eWatson, Jul 21, 2013
    #34
  15. W. eWatson

    W. eWatson Guest

    BTW, I asked my colleague who wrote "the program A", what language he
    used. c++ with Open Computer Vision, OCV, libraries. Program A
    interfaces with cameras.
     
    W. eWatson, Jul 21, 2013
    #35
  16. W. eWatson

    Ike Naar Guest

    pxm_data-test.dat

    vs.

    pxm_array-test.dat
     
    Ike Naar, Jul 21, 2013
    #36
  17. W. eWatson

    James Kuyper Guest

    I was in a bit of hurry when I wrote that response - I should have
    explained what the problem is.

    In C90, declarations are allowed only at file scope, or at the start of
    a block. When Bjarne Stroustrup designed C++, he thought it would be a
    good idea to allow declarations in a wider variety of places. One of
    those places is in the first part of a for() statement. The C committee
    agreed that this was a good idea, and put it into C99. I agree with
    them, but you'll find other people who don't. Some are even stricter
    than the C90 standard - they won't declare variables in inner blocks of
    a function, only in the outermost block.
    The first widely used version of C was the one described by Kernighan
    and Ritchie in "The C Programming Language", and that version is called
    K&R C. It was not, however, a single version, but a different version
    for each compiler. The first standard for the C programming language was
    an ANSI (US) standard that was approved in 1989; the language defined by
    that standard is often called C89. Essentially the same document was
    approved as an ISO (international) standard in 1990 - the only changes
    were the addition of three sections at the beginning of the document to
    conform to ISO requirements. That language is often called C90. People
    were tired of having to write different code for different compilers, so
    the C90 standard was widely and rather quickly adopted. For almost every
    platform for which any kind of compiler is available, there's one that
    will compile a variant of C, and on most of those platforms, there's a
    compiler that can be put into a mode that conforms to C90.

    A major revision of the standard occurred in 1999, which was fully
    implemented only by a small number of compilers, but parts of C99 are
    widely supported. Another update occurred in 2011, but it has not had
    time to be widely adopted yet. The languages described by those versions
    of the standard are usually called C99 and C11, respectively. I
    personally prefer to call it C2011, to avoid Y2K issues, but I seem to
    be the only one.

    In its default mode, gcc compiles a language called GnuC, a language
    closely related to C, but having many non-conforming extensions to it.
    It's possible for an extension to be fully conforming, but many of
    GnuC's extensions are not. The -ansi option is equivalent to -std=c99.
    The combination -std=c90 -pedantic puts gcc into a mode where it fully
    conforms to C90. With -std=c99, it conforms pretty well, but not
    completely, with C99. The option for C2011 is -std=c1x, because it wasn't
    clear at the time that they added that option exactly when the new
    standard would be approved.
     
    James Kuyper, Jul 21, 2013
    #37
  18. W. eWatson

    James Kuyper Guest

    OK - this is good - every number is followed by a comma, so you don't
    need to write special case code to handle the last number. Terminate the
    loop when sscanf() returns 0, and try to parse the buffer as the start
    of new array.
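
    Recognizing the start of a new array might look like the sketch below.
    It assumes a header is a name made of letters, digits and underscores
    followed by '=', as in the PXM=, PXQ= and PYM= lines posted earlier.
    The function name parse_array_name and the 32-byte name buffer are
    assumptions for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: if line starts with "NAME=", copy NAME into name
 * (which must hold at least 32 bytes) and return the number of characters
 * consumed, so the caller can parse any numbers after the '='.
 * Returns 0 if the line is not a header. */
int parse_array_name(const char *line, char *name)
{
    int used = 0;

    /* " " skips leading white space, "%31[...]" reads the name,
     * "=" must match literally, and "%n" is stored only if it did. */
    if (sscanf(line, " %31[A-Za-z0-9_] =%n", name, &used) == 1 && used > 0)
        return used;
    return 0;
}
```

    A data line such as "8927.0000 , 8415.4004 ," is rejected because the
    '.' after the leading digits does not match the '='.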
     
    James Kuyper, Jul 21, 2013
    #38
  19. Correction: -ansi is equivalent to -std=c90 (that was probably just
    a typo).

    In recent versions of gcc, -ansi, -std=c89, and -std=c90 are all
    equivalent. The name "-ansi" is strictly incorrect, since ANSI
    (the American National Standards Institute) currently recognizes
    the 2011 ISO C standard and no earlier ones, but the name has stuck
    around for historical reasons. Some older versions of gcc do not
    recognize -std=c90, but they do accept -ansi and -std=c89.

    Newer versions do recognize -std=c11.

    The default, with no -std=... option, is equivalent to -std=gnu90,
    which specifies C90 plus some GNU extensions, some of which conflict
    with the C90 standard. There are also -std=gnu99 and -std=gnu11
    options. Eventually the default behavior will probably change to
    one of those, once C99 or C11 support is complete.

    If you use one of the -std=c?? options *without* -pedantic, the compiler
    doesn't (attempt to) fully conform to the specified standard; it quietly
    accepts some constructs for which the standard requires diagnostics.
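
    To see which standard a given -std=... option actually selects, the
    predefined macro __STDC_VERSION__ can be inspected. It is 199901L for
    C99 and 201112L for C11, and a strict C90 compiler does not define it
    at all. The function name c_standard_name below is made up for this
    sketch.

```c
/* Hypothetical helper: name the C standard this file was compiled as,
 * based on the predefined macro __STDC_VERSION__. */
const char *c_standard_name(void)
{
#if !defined(__STDC_VERSION__)
    return "C90 (no __STDC_VERSION__)";
#elif __STDC_VERSION__ >= 201112L
    return "C11 or later";
#elif __STDC_VERSION__ >= 199901L
    return "C99";
#else
    return "C90 with Amendment 1";
#endif
}
```

    Printing the result from a program built with -std=c90, -std=c99 and
    -std=c11 in turn shows the effect of each option.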
     
    Keith Thompson, Jul 21, 2013
    #39
  20. W. eWatson

    James Kuyper Guest

    Yes - and it's one I remember (apparently incorrectly) correcting.
     
    James Kuyper, Jul 21, 2013
    #40
