Splitting text files?


MM

Hi

I have never written any C programs before, but it seems that I need to do
so now. Hope some of you out there can spend a few minutes and help me by
writing a simple example of something fairly similar to what I need. I
really think it is a simple matter if you know C programming, but to me it
is not easy at all. An example from some "professional" C programmer will
probably give me all I need to complete it into exactly what I need.

Basically I need it to, in a specific way, split large text files containing
experimental data (stored in a known "form", see example below) into some
smaller files. The smaller files I will later use MATLAB to handle.
Theoretically I could use MATLAB to do it all (split the data file as well),
but when trying this it took WAY too long (not possible, since I will
use this in another system). MATLAB is not really optimized to read/write
large text files (if the files are not structured in some ways...). And yes,
I need to do it all in C (not C++, VB, Fortran, Perl...).

Below is an example of the structure of the type of text file I will need to
split. Suppose the file name of this file is "simdata.txt". Opening this file
for reading is probably one of the first things to do.

First there are some header lines. The header ends when the word "\Data:"
(without quotes) is found. All header lines are to be saved into a new file,
say "header.dat".

Once "\Data:" has been found, the next step is to find the first "Time".
It probably follows on the next row (after "\Data:"), but one cannot be
absolutely sure of this. However, "Time" can be assumed to be the first
word in its row. The row where "Time" is found starts the first data block
(that row included!). The block ends when the next block is identified in
the same way. Each data block is to be saved as an individual file, say
"data1.dat", "data2.dat", and "data3.dat". We can assume there are three
blocks.

Hope this information is sufficient and that someone can help me with this.
I really need it, and cannot do much more without it.

Best regards,

MM

########################################
########### Example of file to split ###########
########################################

header line 1
header line 2
header line 3
.......
.......
.......
header line (last one)
\Data:
Time parameter2 parameter3 parameter4 ...
....... This is data block 1
....... This is data block 1
....... This is data block 1
....... This is data block 1
....... This is data block 1
....... This is data block 1
....... This is data block 1
....... This is data block 1
Time parameter5 parameter6 parameter7 ...
....... This is data block 2
....... This is data block 2
....... This is data block 2
....... This is data block 2
....... This is data block 2
....... This is data block 2
....... This is data block 2
....... This is data block 2
Time parameter8 parameter9 parameter10 ...
....... This is data block 3
....... This is data block 3
....... This is data block 3
....... This is data block 3
....... This is data block 3
....... This is data block 3
....... This is data block 3
....... This is data block 3

########################################
############# End of example #############
########################################
 

Pieter Droogendijk

So, what's wrong with you? Tired of your tedious job? I'm not, which
is why I take on (for me) challenging tasks in my job.

Like asking a newsgroup to solve your problem?

Oh, and top posting is severely frowned upon.
 

David Rubin

MM wrote:

The following is untested...

[snip - split this]

#include <stdio.h>
#include <string.h>

int
main(void)
{
    FILE *fp;
    char fname[4+2+4+1]; /* dataNN.txt */
    char buf[256]; /* max line length is 255 characters */
    int i = 0;

    /* find start of data segment */
    while(fgets(buf, sizeof buf, stdio) != 0){
        if(strcmp("\\Data:", buf) == 0)
            break;
    }

    while(fgets(buf, sizeof buf, stdio) != 0){
        /* lines starting with '#' are skipped as comments */
        /* blank lines are also skipped */
        if(buf[0] == '#' || buf[0] == '\n')
            continue;

        /* write each block to a separate file */
        if(strncmp("Time", buf, 4) == 0){
            if(i > 0)
                fclose(fp);
            sprintf(fname, "data%02d.txt", ++i);
            if((fp=fopen(fname, "w")) == 0){
                perror(fname);
                exit(EXIT_FAILURE);
            }
        }
        fputs(buf, fp);
    }
    fclose(fp);
    return 0;
}

HTH,

/david
 

Bertrand Mollinier Toublet

MM said:
Hi

I have never written any C programs before, but it seems that I need to do
so now. Hope some of you out there can spend a few minutes and help me by
writing a simple example of something fairly similar to what I need. I
really think it is a simple matter if you know C programming, but to me it
is not easy at all. An example from some "professional" C programmer will
probably give me all I need to complete it into exactly what I need.

Basically I need it to, in a specific way, split large text files containing
experimental data (stored in a known "form", see example below) into some
smaller files. The smaller files I will later use MATLAB to handle.
Theoretically I could use MATLAB to do it all (split the data file as well),
but when trying this it took WAY too long (not possible, since I will
use this in another system). MATLAB is not really optimized to read/write
large text files (if the files are not structured in some ways...). And yes,
I need to do it all in C (not C++, VB, Fortran, Perl...).

Don't pay too much attention to Tom StDenis; he has a pretty wide mouth.

As others have pointed out, bottom-posting is the rule in c.l.c, and so
is not doing people's work for them. On the other hand, here's a handful
of advice:

- it might be presumptuous to take on a C project without having a
few basic notions of the language. If you are as serious as you claim
about your job and taking on challenging tasks, do get Kernighan &
Ritchie 2nd ed. to learn about the language. I would even think that
when you are through with the book, you should be well able to solve your
little problem by yourself.
- nonetheless, if you want to skip the concepts part and start
fighting with your little program, you should definitely explore the
functions fopen, fgets, strcmp, fputs, fclose. Have a look at, say, the
ggets library, if only to get an idea of the common issues involved with
I/O in C.
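
For readers who want to see how those functions fit together, here is a minimal sketch of the read-compare-write loop built from fgets, strncmp and fputs. The function name copy_header is hypothetical, and the sketch assumes every line fits in the 256-byte buffer; the caller is expected to open both streams with fopen and close them with fclose.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy lines from `in` to `out` until a line that starts with `marker`
 * is read. The marker line itself is consumed but not copied.
 * Returns the number of lines copied, or -1 if the marker was never
 * found or a write failed.
 * Assumption: no input line is longer than 255 characters. */
long copy_header(FILE *in, FILE *out, const char *marker)
{
    char buf[256];
    long copied = 0;
    size_t mlen = strlen(marker);

    assert(in != NULL && out != NULL);
    while (fgets(buf, sizeof buf, in) != NULL) {
        /* fgets keeps the trailing '\n', so compare only the prefix */
        if (strncmp(buf, marker, mlen) == 0)
            return copied;
        if (fputs(buf, out) == EOF)
            return -1;
        copied++;
    }
    return -1; /* end of input reached without seeing the marker */
}
```

After a successful call, the input stream is positioned on the line after the marker, so a second loop can continue reading the data blocks from the same stream.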
 

MM

Ok, I get it. But, the alternative for me would be to say "Now, I cannot do
this - it will have to wait until after summer". Of course there are people
in my company that could help me with this, but since it is summer and
pretty much everyone is on holidays, then I have to try to find other ways
to solve the problems I encounter. I thought one way was to ask people who
really know C programming. Maybe I was wrong... But I still hope that there
ARE people who can understand what I need and are willing to help me.
 

MM

Many thanks to both David for his code (I will have a look at it and see if
I can get it all to work) and Bertrand (yes, I will get to learn much more
of C, starting right away) for his advice.

If I had had a lot of time I would not have asked the NG for all this.
Instead I would have begun trying to write the program all from the
beginning myself, and only asking the NG for specific parts. But I really
don't have the time now.

By the way, what is "bottom-posting"?

MM
 

Pieter Droogendijk

Evil top-posted text.

Many thanks to both David for his code (I will have a look at it and
see if I can get it all to work) and Bertrand (yes, I will get to
learn much more of C, starting right away) for his advice.

Good Non-top-posted text.
If I had had a lot of time I would not have asked the NG for all
this. Instead I would have begun trying to write the program all from
the beginning myself, and only asking the NG for specific parts. But I
really don't have the time now.

By the way, what is "bottom-posting"?

MM

Bottom posting (as in the opposite of top-posting) is replying to a post
so that your own comments appear BELOW some amount of quoted text. Like
this.
 

Bertrand Mollinier Toublet

This is top-posting (my reply above yours), frowned upon in c.l.c.
Many thanks to both David for his code (I will have a look at it and see if
I can get it all to work) and Bertrand (yes, I will get to learn much more
of C, starting right away) for his advice.

If I had had a lot of time I would not have asked the NG for all this.
Instead I would have begun trying to write the program all from the
beginning myself, and only asking the NG for specific parts. But I really
don't have the time now.

By the way, what is "bottom-posting"?
This is bottom-posting (my reply below yours), de facto standard in c.l.c.
 

Mike Wahler

MM said:
Ok, I get it. But, the alternative for me would be to say "Now, I cannot do
this - it will have to wait until after summer". Of course there are people
in my company that could help me with this, but since it is summer and
pretty much everyone is on holidays, then I have to try to find other ways
to solve the problems I encounter. I thought one way was to ask people who
really know C programming. Maybe I was wrong... But I still hope that there
ARE people who can understand what I need and are willing to help me.

Again, please don't top post.

Then please note that most folks don't consider
'helping' and 'doing it for you' to be the same
thing.

Post the code of your best attempt, and then I
suspect you'll get plenty of assistance.

-Mike
 

David Rubin

Corrections to the untested code I posted above:

After
#include <stdio.h>
add
#include <stdlib.h>

Replace both occurrences of
while(fgets(buf, sizeof buf, stdio) != 0){
with
while(fgets(buf, sizeof buf, stdin) != 0){

Replace
if(strcmp("\\Data:", buf) == 0)
with
if(strncmp("\\Data:", buf, 6) == 0)

/david
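
The strcmp-to-strncmp change matters because fgets stores the trailing newline in the buffer, so a whole-string comparison against "\\Data:" never matches a line read from a file. The prefix test fixes that, but it also accepts lines such as "\Data:extra". As a stricter alternative, here is a small sketch that compares the line with its line ending ignored; line_equals is a hypothetical helper, not something from the code above.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if `line` (as filled in by fgets, line ending included)
 * is exactly `marker` once a trailing "\n" or "\r\n" is ignored,
 * and 0 otherwise. */
int line_equals(const char *line, const char *marker)
{
    size_t len;

    assert(line != NULL && marker != NULL);
    len = strcspn(line, "\r\n"); /* length up to the line ending */
    return len == strlen(marker) && strncmp(line, marker, len) == 0;
}
```

Whether the strict or the prefix comparison is the right one depends on how regular the data files really are; the prefix test is the more forgiving of the two.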
 

Joe Wright

MM said:
Ok, I get it. But, the alternative for me would be to say "Now, I cannot do
this - it will have to wait until after summer". Of course there are people
in my company that could help me with this, but since it is summer and
pretty much everyone is on holidays, then I have to try to find other ways
to solve the problems I encounter. I thought one way was to ask people who
really know C programming. Maybe I was wrong... But I still hope that there
ARE people who can understand what I need and are willing to help me.

No, MM, I suppose you still don't get it. Not only did you top post over
the message asking you not to, you still expect someone here to do the
job for you. As you mention above, you only came here because you
couldn't get anyone in your company to do it for you until after summer.

This sounds like a job for "Consultant Dude" and that you get to pay
for.
 

MM

Ok, I've learned a lot, both from all the criticism given, and from the nice
code by David Rubin (many thanks again, David!).

I've looked at the code, understood it, and adjusted it a little (for
example to create the header file and to read data from an input file
instead of from "stdin") and now I have three questions:

1) How do I change the code so that I can use "input arguments" to specify
the file names (the name of the input file and maybe also of the output
files)? For example, if I compile the code and the application gets the
name "splitdata", then I want to be able to call it with something like
this:
splitdata datafile.txt header.dat dblock.dat
Being able to specify the last two arguments is not very important, but it
would of course be nice.
In the code as I have it now, the name of the input file is specified in
line 13 with
char tname[] = "Example.txt";
So, I want to get rid of this "hard coded" name. Also, the length of
the input file name is unknown.

2) I cannot figure out why in line 12 I have to specify the length of the
char array (is it such?) 'fname', since if I don't, then output data block
files later than number 9 will not be written correctly or not written at
all. Not very important for me, but I'm interested.

3) In line 70 I want to include the number of data blocks found, i.e. the
value of the counter 'i' after "NumDataBlocks=". How do I do this, "append"
a string with an integer?
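
On questions 2 and 3, a hedged sketch may help. An array declared without an initializer needs an explicit size, and sprintf does not check that size: "dblock%02d.dat" needs 13 bytes including the terminating '\0' for block numbers up to 99, and since %02d is only a minimum width, block 100 would need 14. The sketch below uses snprintf (C99) so that an oversized name is reported instead of overflowing the buffer, and fprintf to write text followed by an integer. The names make_block_name and write_block_count are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Builds "dblockNN.dat" in `dst`, which holds `size` bytes.
 * Returns 0 on success, -1 if the name (plus '\0') would not fit. */
int make_block_name(char *dst, size_t size, int n)
{
    int len;

    assert(dst != NULL && size > 0);
    len = snprintf(dst, size, "dblock%02d.dat", n);
    return (len >= 0 && (size_t)len < size) ? 0 : -1;
}

/* Writes "NumDataBlocks=<count>" and a newline to an open stream.
 * fprintf formats the integer, so no string "appending" is needed.
 * Returns 0 on success, -1 on a write error. */
int write_block_count(FILE *fh, int count)
{
    assert(fh != NULL);
    return fprintf(fh, "NumDataBlocks=%d\n", count) < 0 ? -1 : 0;
}
```

In the posted program, the fputs call in line 70 could then become a single fprintf call that receives the counter i, and the sprintf call in line 56 could pass sizeof fname to snprintf.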

Many thanks in advance!

MM

=================================================================
=== Code, including line numbers (the code without line numbers is included
below this one) ===
=================================================================

1: #include <stdio.h>
2: #include <stdlib.h>
3: #include <string.h>
4:
5: #define DATASTART "\\Data:"
6: #define BLOCKSTART "Time"
7:
8: int main()
9: {
10: FILE *fh, *fp, *fq;
11: char hname[] = "header.dat"; /* name of header file to write */
12: char fname[6+2+4+1]; /* dblockNN.dat */
13: char tname[] = "Example.txt"; /* name of input file to split */
14: char buf[1001]; /* max line length is 1000 characters */
15: int i = 0;
16:
17: /* open input file for reading */
18: if((fq=fopen(tname, "r")) == 0) {
19: perror(tname);
20: exit(EXIT_FAILURE);
21: }
22:
23: /* open header output file */
24: if((fh=fopen(hname, "w")) == 0) {
25: perror(hname);
26: exit(EXIT_FAILURE);
27: }
28:
29: /* print data to header file */
30: /* if start of data segment is found then close header file */
31: // while(fgets(buf, sizeof buf, stdin) != 0) {
32: while(fgets(buf, sizeof buf, fq) != 0) {
33: // if(strncmp("\\Data:", buf, 6) == 0) {
34: if(strncmp(DATASTART, buf, 6) == 0) {
35: fclose(fh);
36: break;
37: }
38: fputs(buf, fh);
39: }
40:
41: // while(fgets(buf, sizeof buf, stdin) != 0) {
42: while(fgets(buf, sizeof buf, fq) != 0) {
43: /* lines starting with '#' are skipped as comments */
44: /* blank lines are also skipped */
45: /*
46: if(buf[0] == '#' || buf[0] == '\n')
47: continue;
48: */
49:
50: /* write each block to a separate file */
51: // if(strncmp("Time", buf, 4) == 0) {
52: if(strncmp(BLOCKSTART, buf, 4) == 0) {
53:
54: if(i > 0)
55: fclose(fp);
56: sprintf(fname, "dblock%02d.dat", ++i);
57: if((fp=fopen(fname, "w")) == 0) {
58: perror(fname);
59: exit(EXIT_FAILURE);
60: }
61: }
62: fputs(buf, fp);
63: }
64: /* open header output file again */
65: if((fh=fopen(hname, "a")) == 0) {
66: perror(hname);
67: exit(EXIT_FAILURE);
68: }
69: /* print the number of data blocks found last in the header file */
70: fputs("NumDataBlocks=", fh);
71: fclose(fh);
72:
73: /* close the other files */
74: fclose(fp);
75: fclose(fq);
76: return 0;
77: }

=========================
=== Code without line numbers ===
=========================

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DATASTART "\\Data:"
#define BLOCKSTART "Time"

int main()
{
    FILE *fh, *fp, *fq;
    char hname[] = "header.dat"; /* name of header file to write */
    char fname[6+2+4+1]; /* dblockNN.dat */
    char tname[] = "Example.txt"; /* name of input file to split */
    char buf[1001]; /* max line length is 1000 characters */
    int i = 0;

    /* open input file for reading */
    if((fq=fopen(tname, "r")) == 0) {
        perror(tname);
        exit(EXIT_FAILURE);
    }

    /* open header output file */
    if((fh=fopen(hname, "w")) == 0) {
        perror(hname);
        exit(EXIT_FAILURE);
    }

    /* print data to header file */
    /* if start of data segment is found then close header file */
    // while(fgets(buf, sizeof buf, stdin) != 0) {
    while(fgets(buf, sizeof buf, fq) != 0) {
        // if(strncmp("\\Data:", buf, 6) == 0) {
        if(strncmp(DATASTART, buf, 6) == 0) {
            fclose(fh);
            break;
        }
        fputs(buf, fh);
    }

    // while(fgets(buf, sizeof buf, stdin) != 0) {
    while(fgets(buf, sizeof buf, fq) != 0) {
        /* lines starting with '#' are skipped as comments */
        /* blank lines are also skipped */
        /*
        if(buf[0] == '#' || buf[0] == '\n')
        continue;
        */

        /* write each block to a separate file */
        // if(strncmp("Time", buf, 4) == 0) {
        if(strncmp(BLOCKSTART, buf, 4) == 0) {

            if(i > 0)
                fclose(fp);
            sprintf(fname, "dblock%02d.dat", ++i);
            if((fp=fopen(fname, "w")) == 0) {
                perror(fname);
                exit(EXIT_FAILURE);
            }
        }
        fputs(buf, fp);
    }
    /* open header output file again */
    if((fh=fopen(hname, "a")) == 0) {
        perror(hname);
        exit(EXIT_FAILURE);
    }
    /* print the number of data blocks found last in the header file */
    fputs("NumDataBlocks=", fh);
    fclose(fh);

    /* close the other files */
    fclose(fp);
    fclose(fq);
    return 0;
}
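
On question 1, the usual route is to declare main with the argc and argv parameters and read the file names from argv. What follows is only a sketch of that idea: struct names and parse_names are hypothetical, and the fallback names "header.dat" and "dblock.dat" are taken from the example command line in the post, not from any fixed convention.

```c
#include <assert.h>
#include <stddef.h>

/* File names chosen on the command line (hypothetical helper type). */
struct names {
    const char *input;  /* required: file to split         */
    const char *header; /* optional: header output file    */
    const char *block;  /* optional: data block file name  */
};

/* Fills *out from the arguments passed to main.
 * argv[0] is the program name, so the first real argument is argv[1].
 * Returns 0 on success, -1 if the input file name is missing. */
int parse_names(int argc, char *argv[], struct names *out)
{
    assert(out != NULL);
    if (argc < 2)
        return -1;
    out->input = argv[1];
    out->header = (argc > 2) ? argv[2] : "header.dat";
    out->block = (argc > 3) ? argv[3] : "dblock.dat";
    return 0;
}
```

With main declared as int main(int argc, char *argv[]), the program would call parse_names(argc, argv, &n), print a usage message to stderr if it returns -1, and pass n.input to fopen in place of tname. The length of the input name does not need to be known in advance, because the argv strings already exist in memory and only pointers to them are stored.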
 
