Splitting text files?

Discussion in 'C Programming' started by MM, Jul 8, 2003.

  1. MM

    MM Guest

    Hi

    I have never written any C programs before, but it seems that I need to do
    so now. Hope some of you out there can spend a few minutes and help me by
    writing a simple example of something fairly similar to what I need. I
    really think it is a simple matter if you know C programming, but to me it
    is not easy at all. An example from some "professional" C programmer will
    probably give me all I need to complete it into exactly what I need.

    Basically I need it to, in a specific way, split large text files containing
    experimental data (stored in a known "form", see example below) into some
    smaller files. I will later use MATLAB to handle the smaller files.
    Theoretically I could use MATLAB to do it all (split the data file as well),
    but when I tried this it took WAY too long (not an option, since I will
    use this in another system). MATLAB is not really optimized to read/write
    large text files (unless the files are structured in certain ways...). And yes,
    I need to do it all in C (not C++, VB, Fortran, Perl...).

    Below is an example of the structure of the type of text file I will need to
    split. Suppose the file name of this file is "simdata.txt". Opening this file
    for reading is probably one of the first things to do.

    First there are some header lines. The header ends when the word "\Data:"
    (without quotes) is found. All header lines are to be saved into a new file,
    say "header.dat".

    Once "\Data:" has been found, the next step is to find the first row that
    starts with the word "Time". It probably follows on the very next row
    (after "\Data:"), but one cannot be absolutely sure of this. "Time" can,
    however, be assumed to be the first word in its row. The first data block
    starts at that row (including that row!) and ends where the next block is
    identified in the same way. Each data block is to be saved to its own
    file, say "data1.dat", "data2.dat", and "data3.dat". We can assume there
    are three blocks.

    Hope this information is sufficient and that someone can help me with this.
    I really need it, and cannot do much more without it.

    Best regards,

    MM

    ########################################
    ########### Example of file to split ###########
    ########################################

    header line 1
    header line 2
    header line 3
    .......
    .......
    .......
    header line (last one)
    \Data:
    Time parameter2 parameter3 parameter4 ...
    ....... This is data block 1
    ....... This is data block 1
    ....... This is data block 1
    ....... This is data block 1
    ....... This is data block 1
    ....... This is data block 1
    ....... This is data block 1
    ....... This is data block 1
    Time parameter5 parameter6 parameter7 ...
    ....... This is data block 2
    ....... This is data block 2
    ....... This is data block 2
    ....... This is data block 2
    ....... This is data block 2
    ....... This is data block 2
    ....... This is data block 2
    ....... This is data block 2
    Time parameter8 parameter9 parameter10 ...
    ....... This is data block 3
    ....... This is data block 3
    ....... This is data block 3
    ....... This is data block 3
    ....... This is data block 3
    ....... This is data block 3
    ....... This is data block 3
    ....... This is data block 3

    ########################################
    ############# End of example #############
    ########################################
    MM, Jul 8, 2003
    #1

  2. On Tue, 8 Jul 2003 15:55:05 +0200, MM wrote:
    > "Tom St Denis" <> wrote in message
    > news:_FzOa.83651$...
    > > MM wrote:
    > > > Hi

    > >
    > > How is summer school going?
    > >
    > > Fail much?
    > >
    > > Tom
    > >

    > So, what's wrong with you? Tired of your tedious job? I'm not, which
    > is why I take on (for me) challenging tasks in my job.


    Like asking a newsgroup to solve your problem?

    Oh, and top posting is severely frowned upon.

    --
    main(int c,char*k,char*s){c>0?main(0,"adceoX$_k6][^hn","-7\
    0#05&'40$.6'+).3+1%30"),puts(""):*s?c=!c?-*s:(putchar(45),c
    ),putchar(main(c,k+=*s-c*-1,s+1)):(s=0);return!s?10:10+*k;}
    Pieter Droogendijk, Jul 8, 2003
    #2

  3. MM

    David Rubin Guest

    MM wrote:

    The following is untested...

    [snip - split this]

    #include <stdio.h>
    #include <string.h>

    int
    main(void)
    {
    FILE *fp;
    char fname[4+2+4+1]; /* dataNN.txt */
    char buf[256]; /* max line length is 255 characters */
    int i = 0;

    /* find start of data segment */
    while(fgets(buf, sizeof buf, stdio) != 0){
    if(strcmp("\\Data:", buf) == 0)
    break;
    }

    while(fgets(buf, sizeof buf, stdio) != 0){
    /* lines starting with '#' are skipped as comments */
    /* blank lines are also skipped */
    if(buf[0] == '#' || buf[0] == '\n')
    continue;

    /* write each block to a separate file */
    if(strncmp("Time", buf, 4) == 0){
    if(i > 0)
    fclose(fp);
    sprintf(fname, "data%02d.txt", ++i);
    if((fp=fopen(fname, "w")) == 0){
    perror(fname);
    exit(EXIT_FAILURE);
    }
    }
    fputs(buf, fp);
    }
    fclose(fp);
    return 0;
    }

    HTH,

    /david

    --
    Andre, a simple peasant, had only one thing on his mind as he crept
    along the East wall: 'Andre, creep... Andre, creep... Andre, creep.'
    -- unknown
    David Rubin, Jul 8, 2003
    #3
  4. MM wrote:
    > Hi
    >
    > I have never written any C programs before, but it seems that I need to do
    > so now. Hope some of you out there can spend a few minutes and help me by
    > writing a simple example of something fairly similar to what I need. I
    > really think it is a simple matter if you know C programming, but to me it
    > is not easy at all. An example from some "professional" C programmer will
    > probably give me all I need to complete it into exactly what I need.
    >
    > Basically I need it to, in a specific way, split large text files containing
    > experimental data (stored in a known "form", see example below) into some
    > smaller files. I will later use MATLAB to handle the smaller files.
    > Theoretically I could use MATLAB to do it all (split the data file as well),
    > but when I tried this it took WAY too long (not an option, since I will
    > use this in another system). MATLAB is not really optimized to read/write
    > large text files (unless the files are structured in certain ways...). And yes,
    > I need to do it all in C (not C++, VB, Fortran, Perl...).
    >

    Don't pay too much attention to Tom St Denis; he has a pretty big mouth.

    As others have pointed out, bottom-posting is the rule in c.l.c, and so
    is not doing people's work for them. On the other hand, here's a handful
    of advice:

    - it might be presumptuous to take on a C project without having a
    few basic notions of the language. If you are as serious as you claim
    about your job and taking on challenging tasks, do get Kernighan &
    Ritchie 2nd ed. to learn about the language. I would even think that
    when you are through with the book, you should be well able to solve your
    little problem by yourself.
    - nonetheless, if you want to skip the concepts part and start
    fighting with your little program, you should definitely explore the
    functions fopen, fgets, strcmp, fputs, fclose. Have a look at, say, the
    ggets library, if only to get an idea of the common issues involved with
    I/O in C.

    --
    Bertrand Mollinier Toublet
    "Reality exists" - Richard Heathfield, 1 July 2003
    Bertrand Mollinier Toublet, Jul 8, 2003
    #4
  5. MM

    MM Guest

    Ok, I get it. But, the alternative for me would be to say "Now, I cannot do
    this - it will have to wait until after summer". Of course there are people
    in my company that could help me with this, but since it is summer and
    pretty much everyone is on holidays, then I have to try to find other ways
    to solve the problems I encounter. I thought one way was to ask people who
    really know C programming. Maybe I was wrong... But I still hope that there
    ARE people who can understand what I need and are willing to help me.

    "Pieter Droogendijk" <> wrote in message
    news:...
    > Like asking a newsgroup to solve your problem?
    >
    > Oh, and top posting is severely frowned upon.
    >
    > --
    > main(int c,char*k,char*s){c>0?main(0,"adceoX$_k6][^hn","-7\
    > 0#05&'40$.6'+).3+1%30"),puts(""):*s?c=!c?-*s:(putchar(45),c
    > ),putchar(main(c,k+=*s-c*-1,s+1)):(s=0);return!s?10:10+*k;}
    MM, Jul 8, 2003
    #5
  6. MM

    MM Guest

    Many thanks to both David for his code (I will have a look at it and see if
    I can get it all to work) and Bertrand (yes, I will get to learn much more
    of C, starting right away) for his advice.

    If I had had a lot of time I would not have asked the NG for all this.
    Instead I would have begun trying to write the program all from the
    beginning myself, and only asking the NG for specific parts. But I really
    don't have the time now.

    By the way, what is "bottom-posting"?

    MM
    MM, Jul 8, 2003
    #6
  7. Evil top-posted text.

    On Tue, 8 Jul 2003 17:04:56 +0200, MM wrote:
    > Many thanks to both David for his code (I will have a look at it and
    > see if I can get it all to work) and Bertrand (yes, I will get to
    > learn much more of C, starting right away) for his advice.


    Good Non-top-posted text.

    > If I had had a lot of time I would not have asked the NG for all
    > this. Instead I would have begun trying to write the program all from
    > the beginning myself, and only asking the NG for specific parts. But I
    > really don't have the time now.
    >
    > By the way, what is "bottom-posting"?
    >
    > MM


    Bottom posting (as in opposite of top-posting) is replying to a post
    where your own comments appear BELOW some amount of quoted text. Like
    this.

    --
    main(int c,char*k,char*s){c>0?main(0,"adceoX$_k6][^hn","-7\
    0#05&'40$.6'+).3+1%30"),puts(""):*s?c=!c?-*s:(putchar(45),c
    ),putchar(main(c,k+=*s-c*-1,s+1)):(s=0);return!s?10:10+*k;}
    Pieter Droogendijk, Jul 8, 2003
    #7
  8. This is top-posting (my reply above yours), frowned upon in c.l.c.

    MM wrote:
    > Many thanks to both David for his code (I will have a look at it and see if
    > I can get it all to work) and Bertrand (yes, I will get to learn much more
    > of C, starting right away) for his advice.
    >
    > If I had had a lot of time I would not have asked the NG for all this.
    > Instead I would have begun trying to write the program all from the
    > beginning myself, and only asking the NG for specific parts. But I really
    > don't have the time now.
    >
    > By the way, what is "bottom-posting"?
    >

    This is bottom-posting (my reply below yours), de facto standard in c.l.c.


    --
    Bertrand Mollinier Toublet
    "Reality exists" - Richard Heathfield, 1 July 2003
    Bertrand Mollinier Toublet, Jul 8, 2003
    #8
  9. MM

    Mike Wahler Guest

    MM <> wrote in message
    news:WUAOa.828$...
    > Ok, I get it. But, the alternative for me would be to say "Now, I cannot do
    > this - it will have to wait until after summer". Of course there are people
    > in my company that could help me with this, but since it is summer and
    > pretty much everyone is on holidays, then I have to try to find other ways
    > to solve the problems I encounter. I thought one way was to ask people who
    > really know C programming. Maybe I was wrong... But I still hope that there
    > ARE people who can understand what I need and are willing to help me.


    Again, please don't top post.

    Then please note that most folks don't consider
    'helping' and 'doing it for you' to be the same
    thing.

    Post the code of your best attempt, and then I
    suspect you'll get plenty of assistance.

    -Mike
    Mike Wahler, Jul 8, 2003
    #9
  10. MM

    David Rubin Guest

    David Rubin wrote:
    >
    > MM wrote:
    >
    > The following is untested...
    >
    > [snip - split this]
    >
    > #include <stdio.h>


    #include <stdlib.h>

    > #include <string.h>


    > int
    > main(void)
    > {
    > FILE *fp;
    > char fname[4+2+4+1]; /* dataNN.txt */
    > char buf[256]; /* max line length is 255 characters */
    > int i = 0;


    > /* find start of data segment */
    > while(fgets(buf, sizeof buf, stdio) != 0){


    while(fgets(buf, sizeof buf, stdin) != 0){

    > if(strcmp("\\Data:", buf) == 0)


    if(strncmp("\\Data:", buf, 6) == 0)

    > break;
    > }
    >
    > while(fgets(buf, sizeof buf, stdio) != 0){


    while(fgets(buf, sizeof buf, stdin) != 0){

    /david

    --
    Andre, a simple peasant, had only one thing on his mind as he crept
    along the East wall: 'Andre, creep... Andre, creep... Andre, creep.'
    -- unknown
    David Rubin, Jul 8, 2003
    #10
  11. MM

    Joe Wright Guest

    MM wrote:
    >
    > Ok, I get it. But, the alternative for me would be to say "Now, I cannot do
    > this - it will have to wait until after summer". Of course there are people
    > in my company that could help me with this, but since it is summer and
    > pretty much everyone is on holidays, then I have to try to find other ways
    > to solve the problems I encounter. I thought one way was to ask people who
    > really know C programming. Maybe I was wrong... But I still hope that there
    > ARE people who can understand what I need and are willing to help me.
    >
    > "Pieter Droogendijk" <> wrote in message
    > news:...
    > > Like asking a newsgroup to solve your problem?
    > >
    > > Oh, and top posting is severely frowned upon.
    > >

    No MM, I suppose you still don't get it. Not only did you top post over
    the message asking you not to, you still expect someone here to do the
    job for you. As you mention above, you only came here because you
    couldn't get anyone in your company to do it for you until after summer.

    This sounds like a job for "Consultant Dude" and that you get to pay
    for.
    --
    Joe Wright
    "Everything should be made as simple as possible, but not simpler."
    --- Albert Einstein ---
    Joe Wright, Jul 8, 2003
    #11
  12. MM

    MM Guest

    Ok, I've learned a lot, both from all the criticism given, and from the nice
    code by David Rubin (many thanks again, David!).

    I've looked at the code, understood it, and adjusted it a little (for
    example to create the header file and to read data from an input file
    instead of from "stdin"), and now I have three questions:

    1) How do I change the code so that I use "input arguments" to specify the
    file names (the name of the input file and maybe also of the output files)?
    For example, if I compile the code and the application then gets the
    name "splitdata", then I want to be able to call my application with
    something like this:
    splitdata datafile.txt header.dat dblock.dat
    The last two arguments are not very important to be able to specify, but it
    would of course be nice.
    In the code as I have it now, the name of the input file is specified in
    line 13 with
    char tname[] = "Example.txt";
    So, I want to skip this "hard coded" name specification. Also the length of
    the input file name is unknown.

    2) I cannot figure out why in line 12 I have to specify the length of the
    char array (is it such?) 'fname', since if I don't, then output data block
    files later than number 9 will not be written correctly or not written at
    all. Not very important for me, but I'm interested.

    3) In line 70 I want to include the number of data blocks found, i.e. the
    value of the counter 'i' after "NumDataBlocks=". How do I do this, "append"
    a string with an integer?

    Many thanks in advance!

    MM

    =================================================================
    === Code, including line numbers (the code without line numbers is included
    below this one) ===
    =================================================================

    1: #include <stdio.h>
    2: #include <stdlib.h>
    3: #include <string.h>
    4:
    5: #define DATASTART "\\Data:"
    6: #define BLOCKSTART "Time"
    7:
    8: int main()
    9: {
    10: FILE *fh, *fp, *fq;
    11: char hname[] = "header.dat"; /* name of header file to write */
    12: char fname[6+2+4+1]; /* dblockNN.dat */
    13: char tname[] = "Example.txt"; /* name of input file to split */
    14: char buf[1001]; /* max line length is 1000 characters */
    15: int i = 0;
    16:
    17: /* open input file for reading */
    18: if((fq=fopen(tname, "r")) == 0) {
    19: perror(tname);
    20: exit(EXIT_FAILURE);
    21: }
    22:
    23: /* open header output file */
    24: if((fh=fopen(hname, "w")) == 0) {
    25: perror(hname);
    26: exit(EXIT_FAILURE);
    27: }
    28:
    29: /* print data to header file */
    30: /* if start of data segment is found then close header file */
    31: // while(fgets(buf, sizeof buf, stdin) != 0) {
    32: while(fgets(buf, sizeof buf, fq) != 0) {
    33: // if(strncmp("\\Data:", buf, 6) == 0) {
    34: if(strncmp(DATASTART, buf, 6) == 0) {
    35: fclose(fh);
    36: break;
    37: }
    38: fputs(buf, fh);
    39: }
    40:
    41: // while(fgets(buf, sizeof buf, stdin) != 0) {
    42: while(fgets(buf, sizeof buf, fq) != 0) {
    43: /* lines starting with '#' are skipped as comments */
    44: /* blank lines are also skipped */
    45: /*
    46: if(buf[0] == '#' || buf[0] == '\n')
    47: continue;
    48: */
    49:
    50: /* write each block to a separate file */
    51: // if(strncmp("Time", buf, 4) == 0) {
    52: if(strncmp(BLOCKSTART, buf, 4) == 0) {
    53:
    54: if(i > 0)
    55: fclose(fp);
    56: sprintf(fname, "dblock%02d.dat", ++i);
    57: if((fp=fopen(fname, "w")) == 0) {
    58: perror(fname);
    59: exit(EXIT_FAILURE);
    60: }
    61: }
    62: fputs(buf, fp);
    63: }
    64: /* open header output file again */
    65: if((fh=fopen(hname, "a")) == 0) {
    66: perror(hname);
    67: exit(EXIT_FAILURE);
    68: }
    69: /* print the number of data blocks found last in the header file */
    70: fputs("NumDataBlocks=", fh);
    71: fclose(fh);
    72:
    73: /* close the other files */
    74: fclose(fp);
    75: fclose(fq);
    76: return 0;
    77: }

    =========================
    === Code without line numbers ===
    =========================

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define DATASTART "\\Data:"
    #define BLOCKSTART "Time"

    int main()
    {
    FILE *fh, *fp, *fq;
    char hname[] = "header.dat"; /* name of header file to write */
    char fname[6+2+4+1]; /* dblockNN.dat */
    char tname[] = "Example.txt"; /* name of input file to split */
    char buf[1001]; /* max line length is 1000 characters */
    int i = 0;

    /* open input file for reading */
    if((fq=fopen(tname, "r")) == 0) {
    perror(tname);
    exit(EXIT_FAILURE);
    }

    /* open header output file */
    if((fh=fopen(hname, "w")) == 0) {
    perror(hname);
    exit(EXIT_FAILURE);
    }

    /* print data to header file */
    /* if start of data segment is found then close header file */
    // while(fgets(buf, sizeof buf, stdin) != 0) {
    while(fgets(buf, sizeof buf, fq) != 0) {
    // if(strncmp("\\Data:", buf, 6) == 0) {
    if(strncmp(DATASTART, buf, 6) == 0) {
    fclose(fh);
    break;
    }
    fputs(buf, fh);
    }

    // while(fgets(buf, sizeof buf, stdin) != 0) {
    while(fgets(buf, sizeof buf, fq) != 0) {
    /* lines starting with '#' are skipped as comments */
    /* blank lines are also skipped */
    /*
    if(buf[0] == '#' || buf[0] == '\n')
    continue;
    */

    /* write each block to a separate file */
    // if(strncmp("Time", buf, 4) == 0) {
    if(strncmp(BLOCKSTART, buf, 4) == 0) {

    if(i > 0)
    fclose(fp);
    sprintf(fname, "dblock%02d.dat", ++i);
    if((fp=fopen(fname, "w")) == 0) {
    perror(fname);
    exit(EXIT_FAILURE);
    }
    }
    fputs(buf, fp);
    }
    /* open header output file again */
    if((fh=fopen(hname, "a")) == 0) {
    perror(hname);
    exit(EXIT_FAILURE);
    }
    /* print the number of data blocks found last in the header file */
    fputs("NumDataBlocks=", fh);
    fclose(fh);

    /* close the other files */
    fclose(fp);
    fclose(fq);
    return 0;
    }
    MM, Jul 9, 2003
    #12
