parsing from file

Discussion in 'C Programming' started by Darius Fatakia, May 16, 2004.

  1. Hello,

    I have a file that I have opened for reading and this file contains lines
    with several different types of constraint information.
    For example, here are a few lines:
    length(0) = 10 Duration of task 0 is 10.

    needs(16,1) Operation 16 uses resource 1.

    before(49,9) Operation 49 must be before operation 9.

    release(17) = 0 Operation 17 can start at or after time 0.

    due(0) = 149 Operation 0 must be done no later than time 149.

    The part before the parentheses is the constraint_type (a string) and then i
    have either one or 2 parameters (both integers) inside the parentheses, and
    then possibly (for due, release, and length) an integer value.

    I am wondering what the best way to parse this input would be, given that I
    don't know what type of constraint I will encounter when I read in the line.
    Thanks!

    ~Darius
     
    Darius Fatakia, May 16, 2004
    #1

  2. Darius Fatakia

    CBFalconer Guest

    Darius Fatakia wrote:
    >
    > I have a file that I have opened for reading and this file
    > contains lines with several different types of constraint
    > information. For example, here are a few lines:
    >
    > length(0) = 10 Duration of task 0 is 10.
    > needs(16,1) Operation 16 uses resource 1.
    > before(49,9) Operation 49 must be before operation 9.
    > release(17) = 0 Operation 17 can start at or after time 0.
    > due(0) = 149 Operation 0 must be done no later than time 149.
    >
    > The part before the parentheses is the constraint_type (a string)
    > and then i have either one or 2 parameters (both integers) inside
    > the parentheses, and then possibly (for due, release, and length)
    > an integer value.
    >
    > I am wondering what the best way to parse this input would be,
    > given that I don't know what type of constraint I will encounter
    > when I read in the line.


    If you can change the file format, it would be simplified by a
    single format, such as:

    <constraint> '(' <integer> [',' <integer>] ')'

    Then you could read the initial string up to the '(', check it
    against a list of valid values, and either flush the line with an
    error message or read the appropriate parameters. The '=' chars
    in your list seem totally unnecessary, and the simple
    parenthesis-delimited parameters make it easy to flush the
    (assumed) comment portion of the line.

    Then you would have:

    length(0,10)
    release(17,0)
    due(0,149)

    At any rate, I would build anything around getc() and a few tests.
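
    The getc()-based approach described above could be sketched roughly as
    follows. This is only an illustration under stated assumptions: it handles
    the unified name(int[,int]) format proposed here, accepts only non-negative
    integers, and the names struct constraint, is_known, read_int, flush_line
    and read_constraint are made up for this sketch.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one parsed line; the names are illustrative. */
struct constraint {
    char name[16];
    int  nargs;      /* 1 or 2 */
    int  arg[2];
};

/* Returns 1 if name is one of the constraint types listed in the thread. */
int is_known(const char *name)
{
    static const char *const valid[] = {
        "length", "needs", "before", "release", "due"
    };
    size_t i;

    for (i = 0; i < sizeof valid / sizeof valid[0]; i++)
        if (strcmp(name, valid[i]) == 0)
            return 1;
    return 0;
}

/* Read a non-negative decimal integer with getc(); returns 1 on success. */
int read_int(FILE *fp, int *out)
{
    int ch, value = 0, digits = 0;

    while ((ch = getc(fp)) != EOF && isdigit(ch)) {
        value = value * 10 + (ch - '0');
        digits++;
    }
    if (ch != EOF)
        ungetc(ch, fp);          /* put the delimiter back */
    *out = value;
    return digits > 0;
}

/* Discard the rest of the current line (the assumed comment portion). */
void flush_line(FILE *fp)
{
    int ch;

    while ((ch = getc(fp)) != EOF && ch != '\n')
        ;
}

/*
 * Parse one line of the form name(int) or name(int,int).
 * Returns 1 on success, 0 on a malformed or unknown line, EOF at end of input.
 */
int read_constraint(FILE *fp, struct constraint *c)
{
    int ch, ok;
    size_t n = 0;

    /* skip leading whitespace, including blank lines */
    while ((ch = getc(fp)) != EOF && isspace(ch))
        ;
    if (ch == EOF)
        return EOF;

    /* collect the name up to '(' (over-long names are truncated) */
    while (ch != EOF && ch != '(' && ch != '\n') {
        if (n + 1 < sizeof c->name)
            c->name[n++] = (char)ch;
        ch = getc(fp);
    }
    c->name[n] = '\0';
    if (ch != '(')
        return 0;                /* line ended before any '(' */
    if (!is_known(c->name)) {
        flush_line(fp);
        return 0;
    }

    ok = read_int(fp, &c->arg[0]);
    c->nargs = 1;
    ch = getc(fp);
    if (ok && ch == ',') {
        ok = read_int(fp, &c->arg[1]);
        c->nargs = 2;
        ch = getc(fp);
    }
    ok = ok && ch == ')';
    if (ch != '\n' && ch != EOF)
        flush_line(fp);          /* drop the comment text */
    return ok;
}
```

    A caller would loop on read_constraint() until it returns EOF, printing an
    error message for each 0 result. Because every line is consumed to its end
    on both success and failure, one bad line does not desynchronize the rest.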

    --
    "I'm a war president. I make decisions here in the Oval Office
    in foreign policy matters with war on my mind." - Bush.
    "Churchill and Bush can both be considered wartime leaders, just
    as Secretariat and Mr Ed were both horses." - James Rhodes.
     
    CBFalconer, May 17, 2004
    #2

  3. Darius Fatakia wrote:
    > Hello,
    >
    > I have a file that I have opened for reading and this file contains lines
    > with several different types of constraint information.
    > For example, here are a few lines:
    > length(0) = 10 Duration of task 0 is 10.
    >
    > needs(16,1) Operation 16 uses resource 1.
    >
    > before(49,9) Operation 49 must be before operation 9.
    >
    > release(17) = 0 Operation 17 can start at or after time 0.
    >
    > due(0) = 149 Operation 0 must be done no later than time 149.
    >
    > The part before the parentheses is the constraint_type (a string) and then i
    > have either one or 2 parameters (both integers) inside the parentheses, and
    > then possibly (for due, release, and length) an integer value.
    >
    > I am wondering what the best way to parse this input would be, given that I
    > don't know what type of constraint I will encounter when I read in the line.
    > Thanks!
    >
    > ~Darius
    >
    >


    Here is my recommendation:

    1. Read the entire line into a buffer.
    2. Extract the constraint type.
    3. Execute a function for the constraint type. Pass the string
    and optionally the position (after the parenthesis). This
    function will take care of parsing the rest of the parameters
    for the constraint type.

    Since "switch" statements don't work with strings, I recommend
    using a table of <constraint_name, function_pointer>.
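
    The table of <constraint_name, function_pointer> pairs could look roughly
    like the sketch below. The handler signature, the names parse_length,
    parse_needs and dispatch_line, and the return codes are assumptions made
    for illustration; only two of the five constraint types are filled in.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical handler signature: receives the text just after '('. */
typedef int (*constraint_fn)(const char *args);

/* Illustrative handlers; each returns 1 if its parameters parsed. */
int parse_length(const char *args)
{
    int op, value;

    if (sscanf(args, "%d ) = %d", &op, &value) != 2)
        return 0;
    printf("length: op=%d value=%d\n", op, value);
    return 1;
}

int parse_needs(const char *args)
{
    int op, resource;

    if (sscanf(args, "%d , %d )", &op, &resource) != 2)
        return 0;
    printf("needs: op=%d resource=%d\n", op, resource);
    return 1;
}

struct entry {
    const char   *name;
    constraint_fn handler;
};

const struct entry table[] = {
    { "length", parse_length },
    { "needs",  parse_needs  },
    /* before, release and due would be added the same way */
};

/* Returns 1 on success, 0 for a malformed line, -1 for an unknown type. */
int dispatch_line(const char *line)
{
    const char *paren = strchr(line, '(');
    size_t i, len;

    if (paren == NULL)
        return 0;
    len = (size_t)(paren - line);
    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        /* compare lengths too, so "need" does not match "needs" */
        if (strlen(table[i].name) == len &&
            strncmp(line, table[i].name, len) == 0)
            return table[i].handler(paren + 1);
    }
    return -1;
}
```

    Adding a new constraint type then means writing one handler and adding one
    row to the table; the lookup loop itself does not change.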


    --
    Thomas Matthews

    C++ newsgroup welcome message:
    http://www.slack.net/~shiva/welcome.txt
    C++ Faq: http://www.parashift.com/c -faq-lite
    C Faq: http://www.eskimo.com/~scs/c-faq/top.html
    alt.comp.lang.learn.c-c++ faq:
    http://www.raos.demon.uk/acllc-c /faq.html
    Other sites:
    http://www.josuttis.com -- C++ STL Library book
    http://www.sgi.com/tech/stl -- Standard Template Library
     
    Thomas Matthews, May 17, 2004
    #3
  4. "Darius Fatakia" <> wrote in message
    news:c88pk5$7h9$...
    > Hello,


    Hi,

    >
    > I have a file that I have opened for reading and this file contains lines
    > with several different types of constraint information.
    > For example, here are a few lines:
    > length(0) = 10 Duration of task 0 is 10.
    >
    > needs(16,1) Operation 16 uses resource 1.
    >
    > before(49,9) Operation 49 must be before operation 9.
    >
    > release(17) = 0 Operation 17 can start at or after time 0.
    >
    > due(0) = 149 Operation 0 must be done no later than time 149.
    >
    > The part before the parentheses is the constraint_type (a string) and then i
    > have either one or 2 parameters (both integers) inside the parentheses, and
    > then possibly (for due, release, and length) an integer value.
    >
    > I am wondering what the best way to parse this input would be, given that I
    > don't know what type of constraint I will encounter when I read in the line.
    > Thanks!


    If your lines strictly follow a format such as
    constraint_type_name(a<opt>,b</opt>) <opt>= c</opt> (the opt tags marking
    optional parts of the line), I would process the file line by line (with
    fgets()) and use fscanf() with the corresponding format specifier, the
    latter being chosen according to whether the ',' and/or '=' characters
    have been found with the strchr() function.
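
    A rough sketch of that first idea follows, applying sscanf() to the line
    already read by fgets(). The function name parse_with_sscanf, the 32-byte
    name buffer and the has_b/has_c flags are assumptions for illustration.
    Unlike a bare strchr() on the whole line, it only accepts a ',' inside the
    parentheses and an '=' directly after ')', so punctuation in the trailing
    comment text is ignored.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse "name(a)", "name(a,b)", "name(a) = c" or "name(a,b) = c".
 * Returns the number of integers stored (1 to 3), or 0 on failure.
 * has_b and has_c report which optional parts were present.
 */
int parse_with_sscanf(const char *line, char name[32],
                      int *a, int *b, int *c, int *has_b, int *has_c)
{
    const char *open = strchr(line, '(');
    const char *close = open ? strchr(open, ')') : NULL;
    const char *comma, *p;
    int n;

    *has_b = *has_c = 0;
    if (close == NULL)
        return 0;

    /* a ',' only counts inside the parentheses */
    comma = strchr(open, ',');
    *has_b = (comma != NULL && comma < close);

    /* an '=' only counts as the first non-blank character after ')' */
    for (p = close + 1; *p == ' ' || *p == '\t'; p++)
        ;
    *has_c = (*p == '=');

    /* pick the format string that matches the parts that were found */
    if (*has_b && *has_c) {
        n = sscanf(line, " %31[^( ] ( %d , %d ) = %d", name, a, b, c);
        return n == 4 ? 3 : 0;
    }
    if (*has_b) {
        n = sscanf(line, " %31[^( ] ( %d , %d", name, a, b);
        return n == 3 ? 2 : 0;
    }
    if (*has_c) {
        n = sscanf(line, " %31[^( ] ( %d ) = %d", name, a, c);
        return n == 3 ? 2 : 0;
    }
    n = sscanf(line, " %31[^( ] ( %d", name, a);
    return n == 2 ? 1 : 0;
}
```

    The %31[^( ] conversion reads the name up to the '(' or a space and never
    stores more than 31 characters, which matches the 32-byte buffer.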

    Another way is to use strchr() and strtol(). e.g:

    /* Rough example, not modularized and with minimal error checking, but
       it is able to parse according to your specs */

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <limits.h>

    #define LINE_SIZE 256   /* long enough for a line plus its comment text */

    int main(int argc, char *argv[])
    {
        FILE *fp;
        char linebuffer[LINE_SIZE];
        char constraint_type[LINE_SIZE];
        char *p_left, *p_right, *comma, *equal;

        if (argc < 2)
        {
            fprintf(stderr, "Usage : %s <file_to_parse>\n", argv[0]);
            return EXIT_FAILURE;
        }

        fp = fopen(argv[1], "r");
        if (!fp)
        {
            fprintf(stderr, "Unable to open : %s\n", argv[1]);
            return EXIT_FAILURE;
        }

        while (fgets(linebuffer, sizeof linebuffer, fp))
        {
            /* Using INT_MIN as dummy value for "not present" */
            int a, b = INT_MIN, c = INT_MIN;
            size_t name_len;

            p_left = strchr(linebuffer, '(');
            p_right = p_left ? strchr(p_left, ')') : NULL;
            if (!p_right) continue;   /* blank or malformed line */

            /* The constraint type is everything before the '(' */
            name_len = (size_t)(p_left - linebuffer);
            memcpy(constraint_type, linebuffer, name_len);
            constraint_type[name_len] = '\0';

            a = (int)strtol(p_left + 1, NULL, 10);

            /* Only a ',' inside the parentheses is a separator */
            comma = strchr(p_left, ',');
            if (comma && comma < p_right)
                b = (int)strtol(comma + 1, NULL, 10);

            /* Only an '=' right after the ')' is an assignment */
            equal = p_right + 1;
            while (*equal == ' ' || *equal == '\t') equal++;
            if (*equal == '=')
                c = (int)strtol(equal + 1, NULL, 10);

            if (b != INT_MIN)
                printf("%s => parameters: %d,%d", constraint_type, a, b);
            else
                printf("%s => parameter: %d", constraint_type, a);

            if (c != INT_MIN)
                printf(" ; assignement: %d", c);
            putchar('\n');
        }

        fclose(fp);
        return EXIT_SUCCESS;
    }

    Given this text file:
    length(0) = 10
    needs(16,1)
    before(49,9)
    release(17) = 0
    due(0) = 149

    The program outputs:
    length => parameter: 0 ; assignement: 10
    needs => parameters: 16,1
    before => parameters: 49,9
    release => parameter: 17 ; assignement: 0
    due => parameter: 0 ; assignement: 149


    Regis
     
    Régis Troadec, May 17, 2004
    #4
  5. > If your lines strictly follow a format such as
    > constraint_type_name(a<opt>,b</opt>) <opt>= c</opt> (the opt tags meaning
    > optional parts of the line) I would process the file line after line (with
    > fgets()) and use fscanf() with the corresponding format specifier, this

    [...]
    ^^^^^^
    I meant sscanf()
     
    Régis Troadec, May 17, 2004
    #5
  6. Darius Fatakia

    Karthik Guest

    Darius Fatakia wrote:
    > Hello,
    >
    > I have a file that I have opened for reading and this file contains lines
    > with several different types of constraint information.
    > For example, here are a few lines:
    > length(0) = 10 Duration of task 0 is 10.
    >
    > needs(16,1) Operation 16 uses resource 1.
    >
    > before(49,9) Operation 49 must be before operation 9.
    >
    > release(17) = 0 Operation 17 can start at or after time 0.
    >
    > due(0) = 149 Operation 0 must be done no later than time 149.
    >
    > The part before the parentheses is the constraint_type (a string) and then i
    > have either one or 2 parameters (both integers) inside the parentheses, and
    > then possibly (for due, release, and length) an integer value.
    >
    > I am wondering what the best way to parse this input would be, given that I
    > don't know what type of constraint I will encounter when I read in the line.
    > Thanks!
    >
    > ~Darius
    >
    >

    A rule of thumb for dealing with files is as follows:

    1. Copy all file contents to memory.
    2. Close the file.
    3. Process the file contents from the data saved in step 1.

    This would give a big performance boost.

    For example:


    while (!feof(fp) ) {
    fscanf( fp, "%s", buff);
    }


    --
    Karthik.
    Humans please 'removeme_' for my real email.
     
    Karthik, May 17, 2004
    #6
  7. "Darius Fatakia" <> wrote in message news:c88pk5$7h9$...
    > Hello,
    >
    > I have a file that I have opened for reading and this file contains lines
    > with several different types of constraint information.
    > For example, here are a few lines:
    > length(0) = 10 Duration of task 0 is 10.
    >
    > needs(16,1) Operation 16 uses resource 1.
    >
    > before(49,9) Operation 49 must be before operation 9.
    >
    > release(17) = 0 Operation 17 can start at or after time 0.
    >
    > due(0) = 149 Operation 0 must be done no later than time 149.
    >
    > The part before the parentheses is the constraint_type (a string) and then i
    > have either one or 2 parameters (both integers) inside the parentheses, and
    > then possibly (for due, release, and length) an integer value.
    >
    > I am wondering what the best way to parse this input would be, given that I
    > don't know what type of constraint I will encounter when I read in the line.
    > Thanks!
    >
    > ~Darius
    >
    >


    The format of the file needs to be pretty uniform in order to use
    the following method:

    F:\Vijay\C> type scanf.c
    #include <stdio.h>
    #include <stdlib.h>

    int
    main ( void )
    {
        int i, j, k, l, n;

        n = scanf ( "length(%d) = %d duration of task %d is %d", &i, &j, &k, &l );
        if ( n == 4 )
            printf ( "n = %d\ni = %d\nj = %d\nk = %d\nl = %d\n", n, i, j, k, l );
        return EXIT_SUCCESS;
    }

    F:\Vijay\C> gcc scanf.c
    F:\Vijay\C> a.exe
    length(0) = 10 duration of task 0 is 10
    n = 4
    i = 0
    j = 10
    k = 0
    l = 10

    Z.
     
    Vijay Kumar R Zanvar, May 17, 2004
    #7
  8. Darius Fatakia

    Ralmin Guest

    "Karthik" <> wrote:
    > A thumb rule to deal with files is as follows -
    >
    > Copy all file contents to memory.
    > Close the file
    > Process the file contents from data saved in Step 1.


    I would only suggest that approach if the algorithm requires moving back and
    forth across the whole file's data. Even in that case, for particularly
    large files where that approach is not viable, you may be better off using
    fseek() or something.

    > This would give a big performance boost.


    I don't see how it does give a big performance boost. It might make your
    program require much more memory than is necessary.

    > For eg-
    >
    > while (!feof(fp) ) {
    > fscanf( fp, "%s", buff);
    > }


    This is a terrible example. Seeing while(!feof(fp)) should flag problems
    immediately. A while loop should depend on the success or failure of the
    actual file reading function, not the secondary feof test. The problem with
    testing feof first is that it often causes off-by-one errors in the number
    of times the loop runs.

    scanf or fscanf with a plain "%s" is just as bad as the gets function: it has
    no way to prevent writing outside the bounds of the buffer given. You must
    always specify a maximum field width with the %s specifier. In addition,
    your loop never checks the returned value of fscanf, and it just keeps
    overwriting the same buffer with each (whitespace-delimited) string read,
    without separating those out into memory properly.

    In this case I'd parse one line at a time:

    while(fgets(buff, sizeof buff, fp))
    {
        /* work on the current line in buff */
    }
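
    Putting those points together, a minimal sketch might look like the
    following. The function name first_words and the buffer sizes are
    assumptions for illustration. The loop condition is the fgets() call
    itself, the %s conversion carries a field width one less than the
    destination size, and the return value of sscanf() is checked before the
    result is used.

```c
#include <stdio.h>

/*
 * Read fp line by line and write the first word of each non-blank line
 * to out. Returns the number of lines that yielded a word.
 */
int first_words(FILE *fp, FILE *out)
{
    char buff[256];
    char word[32];
    int count = 0;

    /* the loop depends on the read itself, not on feof() */
    while (fgets(buff, sizeof buff, fp) != NULL) {
        /* %31s stores at most 31 characters plus the terminator */
        if (sscanf(buff, "%31s", word) == 1) {
            fprintf(out, "%s\n", word);
            count++;
        }
    }
    return count;
}
```

    Note that a line longer than 255 characters would be delivered by fgets()
    in several pieces, so this sketch would treat each piece as its own line.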

    --
    Simon.
     
    Ralmin, May 17, 2004
    #8
  9. Karthik wrote:

    > Darius Fatakia wrote:
    >

    [snip]

    > A thumb rule to deal with files is as follows -
    >
    > Copy all file contents to memory.
    > Close the file
    > Process the file contents from data saved in Step 1.
    >
    > This would give a big performance boost.
    >
    > For eg-
    >
    >
    > while (!feof(fp) ) {
    > fscanf( fp, "%s", buff);
    > }


    Yes, reading everything at once can improve performance, but
    many applications cannot fit an entire data file into
    memory. A trade-off is to read the data file in
    large "chunks", where a chunk is sufficiently large
    to reduce the I/O overhead time (such as starting
    and stopping a hard drive). Small buffer sizes may
    not provide any performance benefits due to buffering
    by the operating system and perhaps by the I/O device.
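
    Reading in chunks could be sketched with fread() as below. The 64 KiB
    chunk size and the function name count_lines_chunked are assumptions for
    illustration, and the work done per chunk (counting lines) only stands in
    for real processing. A real parser would also have to handle a line that
    straddles two chunks.

```c
#include <stdio.h>

#define CHUNK_SIZE 65536   /* assumed size; tune it by measurement */

/* Count the lines in fp by reading fixed-size chunks with fread(). */
long count_lines_chunked(FILE *fp)
{
    static char chunk[CHUNK_SIZE];
    size_t got, i;
    long lines = 0;
    int last = '\n';

    /* fread() returns the number of bytes read; 0 means EOF or error */
    while ((got = fread(chunk, 1, sizeof chunk, fp)) > 0) {
        for (i = 0; i < got; i++)
            if (chunk[i] == '\n')
                lines++;
        last = chunk[got - 1];
    }
    if (last != '\n')
        lines++;           /* final line without a trailing newline */
    return lines;
}
```

    Memory use stays fixed at one chunk regardless of file size, which is the
    trade-off described above.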

    --
    Thomas Matthews

     
    Thomas Matthews, May 17, 2004
    #9
