parsing from file


Darius Fatakia

Hello,

I have a file that I have opened for reading and this file contains lines
with several different types of constraint information.
For example, here are a few lines:
length(0) = 10 Duration of task 0 is 10.

needs(16,1) Operation 16 uses resource 1.

before(49,9) Operation 49 must be before operation 9.

release(17) = 0 Operation 17 can start at or after time 0.

due(0) = 149 Operation 0 must be done no later than time 149.

The part before the parentheses is the constraint_type (a string); then I
have either one or two parameters (both integers) inside the parentheses, and
then possibly (for due, release, and length) an integer value.

I am wondering what the best way to parse this input would be, given that I
don't know what type of constraint I will encounter when I read in the line.
Thanks!

~Darius
 

CBFalconer

Darius said:
I have a file that I have opened for reading and this file
contains lines with several different types of constraint
information. For example, here are a few lines:

length(0) = 10 Duration of task 0 is 10.
needs(16,1) Operation 16 uses resource 1.
before(49,9) Operation 49 must be before operation 9.
release(17) = 0 Operation 17 can start at or after time 0.
due(0) = 149 Operation 0 must be done no later than time 149.

The part before the parentheses is the constraint_type (a string)
and then I have either one or two parameters (both integers) inside
the parentheses, and then possibly (for due, release, and length)
an integer value.

I am wondering what the best way to parse this input would be,
given that I don't know what type of constraint I will encounter
when I read in the line.

If you can change the file format, parsing would be simpler with a
single format, such as:

<constraint> '(' <integer> [',' <integer>] ')'

Then you could read the initial string up to the '(', check it
against a list of valid values, and either flush the line with an
error message or read the appropriate parameters. The '=' chars
in your list seem totally unnecessary, and parameters delimited
by simple parentheses make it easy to flush the (assumed) comment
portion of the line.

Then you would have:

length(0,10)
release(17,0)
due(0,149)

At any rate, I would build the whole thing around getc() and a few tests.
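As a rough sketch of that getc()-based approach, assuming the simplified one-format file suggested above (name, '(', one or two non-negative integers, ')', with the rest of the line treated as a comment). The struct constraint type and the read_constraint(), read_int() and flush_line() functions are hypothetical names, not from any library, and read_int() does no overflow checking.

```c
#include <stdio.h>
#include <string.h>
#include <ctype.h>

#define NAME_MAX_LEN 31

/* Hypothetical record for one line in the simplified format. */
struct constraint {
    char name[NAME_MAX_LEN + 1];
    int nparams;                      /* 1 or 2 */
    int param[2];
};

/* Discards characters up to and including the next '\n'. */
static void flush_line(FILE *fp)
{
    int ch;
    while ((ch = getc(fp)) != EOF && ch != '\n')
        continue;
}

/* Reads an unsigned decimal integer; returns 1 if at least one digit
   was seen. The character after the number is pushed back. */
static int read_int(FILE *fp, int *out)
{
    int ch, value = 0, digits = 0;

    while ((ch = getc(fp)) != EOF && isdigit(ch)) {
        value = value * 10 + (ch - '0');  /* no overflow check */
        digits++;
    }
    if (ch != EOF)
        ungetc(ch, fp);
    *out = value;
    return digits > 0;
}

/* Returns 1 for a parsed line, 0 for a malformed line (skipped),
   or EOF at end of input. */
int read_constraint(FILE *fp, struct constraint *c)
{
    int ch;
    size_t len = 0;

    memset(c, 0, sizeof *c);          /* start from a clean record */

    /* Skip leading white space, including blank lines. */
    while ((ch = getc(fp)) != EOF && isspace(ch))
        continue;
    if (ch == EOF)
        return EOF;

    /* The constraint name runs up to '(' (truncated if too long). */
    while (ch != EOF && ch != '(' && ch != '\n') {
        if (len < NAME_MAX_LEN)
            c->name[len++] = (char)ch;
        ch = getc(fp);
    }
    if (ch != '(') {                  /* no parameter list: reject */
        if (ch != '\n')
            flush_line(fp);
        return 0;
    }

    if (!read_int(fp, &c->param[0])) {
        flush_line(fp);
        return 0;
    }
    c->nparams = 1;

    ch = getc(fp);
    if (ch == ',') {                  /* optional second parameter */
        if (!read_int(fp, &c->param[1])) {
            flush_line(fp);
            return 0;
        }
        c->nparams = 2;
        ch = getc(fp);
    }
    if (ch != ')') {
        if (ch != '\n')
            flush_line(fp);
        return 0;
    }

    flush_line(fp);                   /* rest of the line is a comment */
    return 1;
}
```

A caller would loop while read_constraint() does not return EOF, check c.name against its list of valid constraints when the result is 1, and print an error message when it is 0.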
 

Thomas Matthews

Darius said:
[snip]

Here is my recommendation:

1. Read the entire line into a buffer.
2. Extract the constraint type.
3. Call a function for that constraint type, passing the string and
   optionally the position just after the opening parenthesis. This
   function takes care of parsing the rest of the parameters for
   the constraint type.

Since "switch" statements don't work with strings, I recommend
using a table of <constraint_name, function_pointer>.
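A minimal sketch of those three steps, assuming the original file format and a line already read into a buffer (step 1). The struct parsed type, the handler functions and dispatch_line() are hypothetical; for brevity the handlers only use sscanf() to pull out the integers, and one handler is shared by constraints that have the same shape.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical result record filled in by the handlers. */
struct parsed {
    int params[2];
    int nparams;
    int value;
    int has_value;
};

/* A handler gets the text just after '(' and returns 1 on success. */
typedef int (*constraint_handler)(const char *args, struct parsed *out);

/* Shape "name(a) = value": used for length, release and due. */
static int handle_param_value(const char *args, struct parsed *out)
{
    out->nparams = 1;
    out->has_value = 1;
    return sscanf(args, "%d ) = %d", &out->params[0], &out->value) == 2;
}

/* Shape "name(a,b)": used for needs and before. */
static int handle_two_params(const char *args, struct parsed *out)
{
    out->nparams = 2;
    out->has_value = 0;
    return sscanf(args, "%d , %d", &out->params[0], &out->params[1]) == 2;
}

/* The <constraint_name, function_pointer> table. */
static const struct {
    const char *name;
    constraint_handler handler;
} handlers[] = {
    { "length",  handle_param_value },
    { "release", handle_param_value },
    { "due",     handle_param_value },
    { "needs",   handle_two_params },
    { "before",  handle_two_params },
};

/* Step 2 and 3: look up the name before '(' and call its handler.
   Returns the handler's result, or -1 for a missing '(' or unknown name. */
int dispatch_line(const char *line, struct parsed *out)
{
    const char *paren = strchr(line, '(');
    size_t i, len;

    if (paren == NULL)
        return -1;
    len = (size_t)(paren - line);
    for (i = 0; i < sizeof handlers / sizeof handlers[0]; i++) {
        if (strlen(handlers[i].name) == len &&
            strncmp(line, handlers[i].name, len) == 0)
            return handlers[i].handler(paren + 1, out);
    }
    return -1;
}
```

Adding a new constraint type then means writing one handler (or reusing one) and adding one table entry; the lookup is a linear search, which is fine for a handful of names.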


--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.raos.demon.uk/acllc-c++/faq.html
Other sites:
http://www.josuttis.com -- C++ STL Library book
http://www.sgi.com/tech/stl -- Standard Template Library
 

Régis Troadec

Darius Fatakia said:
Hello,
Hi,


[snip]

If your lines strictly follow a format such as
constraint_type_name(a<opt>,b</opt>) <opt>= c</opt> (the opt tags marking
optional parts of the line), I would process the file line by line (with
fgets()) and use fscanf() with the corresponding format specifier, choosing
that format according to whether strchr() finds the ',' and/or '='
characters.
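A sketch of that first suggestion, applying sscanf() to a line already read with fgets() (as corrected later in the thread). The struct constraint_line type and parse_constraint_line() are hypothetical; rather than assembling a format string at run time, this version picks one of four fixed formats depending on whether a ',' appears inside the parentheses and whether an '=' follows the ')'. It assumes constraint names are shorter than 32 characters.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical result of parsing "name(a[,b]) [= c]". */
struct constraint_line {
    char name[32];
    int a, b, c;
    int has_b, has_c;                 /* which optional parts were found */
};

/* Returns 1 if line matches the expected shape, 0 otherwise. */
int parse_constraint_line(const char *line, struct constraint_line *out)
{
    const char *close = strchr(line, ')');
    const char *p;
    int ok;

    memset(out, 0, sizeof *out);
    if (close == NULL)
        return 0;

    /* A ',' only counts inside the parentheses ... */
    out->has_b = memchr(line, ',', (size_t)(close - line)) != NULL;

    /* ... and '=' only as the first non-blank character after ')'. */
    for (p = close + 1; *p == ' ' || *p == '\t'; p++)
        continue;
    out->has_c = (*p == '=');

    /* Pick the sscanf() format matching the parts that are present;
       %31[^(] reads the name, at most 31 characters. */
    if (out->has_b && out->has_c)
        ok = sscanf(line, "%31[^(](%d ,%d ) =%d",
                    out->name, &out->a, &out->b, &out->c) == 4;
    else if (out->has_b)
        ok = sscanf(line, "%31[^(](%d ,%d",
                    out->name, &out->a, &out->b) == 3;
    else if (out->has_c)
        ok = sscanf(line, "%31[^(](%d ) =%d",
                    out->name, &out->a, &out->c) == 3;
    else
        ok = sscanf(line, "%31[^(](%d", out->name, &out->a) == 2;
    return ok;
}
```

Restricting the ',' and '=' checks to their expected positions keeps punctuation in the trailing comment from changing which format is chosen.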

Another way is to use strchr() and strtol(), e.g.:

/* Ugly example, not modularized, not fully robust, but it's able to parse
   according to your specs */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>

int main(int argc, char *argv[])
{
    FILE *fp;

    /* Large enough for a constraint plus its trailing comment; a longer
       line would be split into pieces by fgets() */
    char linebuffer[256];
    char constraint_type[256];

    char *p_left, *p_right, *comma, *equal;

    if (argc < 2)
    {
        fprintf(stderr, "Usage : %s <file_to_parse>\n", argv[0]);
        return EXIT_FAILURE;
    }

    fp = fopen(argv[1], "r");

    if (fp)
    {
        while (fgets(linebuffer, sizeof linebuffer, fp))
        {
            int a, b, c;

            /* Using INT_MIN as dummy value */
            a = b = c = INT_MIN;

            p_left = strchr(linebuffer, '(');
            p_right = p_left ? strchr(p_left, ')') : NULL;

            /* Skip blank lines and lines without a "name(...)" part,
               so a stale constraint_type is never printed */
            if (p_left == NULL || p_right == NULL) continue;

            memset(constraint_type, 0, sizeof constraint_type);
            strncpy(constraint_type, linebuffer, p_left - linebuffer);
            a = (int)strtol(p_left + 1, NULL, 10);

            /* A ',' only counts inside the parentheses... */
            comma = strchr(p_left, ',');
            if (comma && comma < p_right)
                b = (int)strtol(comma + 1, NULL, 10);

            /* ...and an '=' only after the closing parenthesis */
            equal = strchr(p_right, '=');
            if (equal)
                c = (int)strtol(equal + 1, NULL, 10);

            if (c != INT_MIN)
            {
                if (b != INT_MIN)
                    printf("%s => parameters: %d,%d ; assignment: %d\n",
                           constraint_type, a, b, c);
                else
                    printf("%s => parameter: %d ; assignment: %d\n",
                           constraint_type, a, c);
            }
            else
            {
                if (b != INT_MIN)
                    printf("%s => parameters: %d,%d\n",
                           constraint_type, a, b);
                else
                    printf("%s => parameter: %d\n",
                           constraint_type, a);
            }
        }

        fclose(fp);
    }
    else
    {
        fprintf(stderr, "Unable to open : %s\n", argv[1]);
        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;
}

Given this text file:
length(0) = 10
needs(16,1)
before(49,9)
release(17) = 0
due(0) = 149

The program outputs:
length => parameter: 0 ; assignment: 10
needs => parameters: 16,1
before => parameters: 49,9
release => parameter: 17 ; assignment: 0
due => parameter: 0 ; assignment: 149


Regis
 

Régis Troadec

If your lines strictly follow a format such as
constraint_type_name(a<opt>,b</opt>) <opt>= c</opt> (the opt tags meaning
optional parts of the line) I would process the file line after line (with
fgets()) and use fscanf() with the corresponding format specifier, this
[...]

I meant sscanf() there, not fscanf().
 

Karthik

Darius said:
[snip]

A rule of thumb for dealing with files is as follows:

1. Copy all file contents to memory.
2. Close the file.
3. Process the file contents from the data saved in step 1.

This would give a big performance boost.

For example:


while (!feof(fp) ) {
    fscanf( fp, "%s", buff);
}
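If you do want to load a whole file into memory first, a sketch that tests the return value of fread() instead of looping on feof() could look like this. read_whole_file() is a hypothetical helper; the 4096-byte starting size and the doubling growth strategy are arbitrary choices.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads all remaining data from fp into a NUL-terminated heap buffer and
   stores its size in *length (if length is not NULL).
   Returns NULL on allocation failure or read error; the caller frees. */
char *read_whole_file(FILE *fp, size_t *length)
{
    size_t cap = 4096, len = 0, got;
    char *buf = malloc(cap);
    char *tmp;

    if (buf == NULL)
        return NULL;

    /* Loop on fread()'s result, always leaving one byte for the '\0'. */
    while ((got = fread(buf + len, 1, cap - len - 1, fp)) > 0) {
        len += got;
        if (len == cap - 1) {             /* full: double the buffer */
            tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(fp)) {                      /* fread() stopped on an error */
        free(buf);
        return NULL;
    }

    buf[len] = '\0';
    if (length != NULL)
        *length = len;
    return buf;
}
```

Because the buffer is NUL-terminated, the contents can then be walked with strchr() or sscanf(), at the cost of holding the entire file in memory.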
 

Vijay Kumar R Zanvar

Darius Fatakia said:
[snip]

The format of the file needs to be pretty uniform in order to use
the following method:

F:\Vijay\C> type scanf.c
#include <stdio.h>
#include <stdlib.h>

int
main ( void )
{
    int i, j, k, l, n;

    n = scanf ( "length(%d) = %d duration of task %d is %d", &i, &j, &k, &l );
    if ( n == 4 )
        printf ( "n = %d\ni = %d\nj = %d\nk = %d\nl = %d\n", n, i, j, k, l );
    return EXIT_SUCCESS;
}

F:\Vijay\C> gcc scanf.c
F:\Vijay\C> a.exe
length(0) = 10 duration of task 0 is 10
n = 4
i = 0
j = 10
k = 0
l = 10

Z.
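The same idea can be applied per line, to text already read with fgets(), by trying one literal pattern per constraint type until one matches. The pattern table and match_line() below are hypothetical and cover only the five sample constraints; the trailing %n conversion stores how many characters were consumed, and it is only reached (and its argument only written) when everything before it matched.

```c
#include <stdio.h>

/* Hypothetical table: one literal sscanf() pattern per constraint type,
   each with two %d conversions followed by %n. */
static const char *const patterns[] = {
    "length(%d) = %d%n",
    "release(%d) = %d%n",
    "due(%d) = %d%n",
    "needs(%d,%d)%n",
    "before(%d,%d)%n",
};

/* Returns the index of the first pattern that matches the start of line,
   storing its two integers in *x and *y, or -1 if none matches. */
int match_line(const char *line, int *x, int *y)
{
    size_t i;

    for (i = 0; i < sizeof patterns / sizeof patterns[0]; i++) {
        int used = -1;

        /* used stays -1 unless the whole pattern up to %n matched;
           the return value is not needed because %n is not counted. */
        (void)sscanf(line, patterns[i], x, y, &used);
        if (used >= 0)
            return (int)i;
    }
    return -1;
}
```

Text after the matched pattern, such as the trailing comment, is simply ignored.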
 

Ralmin

Karthik said:
A thumb rule to deal with files is as follows -

Copy all file contents to memory.
Close the file
Process the file contents from data saved in Step 1.

I would only suggest that approach if the algorithm requires moving back and
forth across the whole file's data. Even in that case, for particularly
large files where that approach is not viable, you may be better off using
fseek() or a similar approach.
This would give a big performance boost.

I don't see how it does give a big performance boost. It might make your
program require much more memory than is necessary.
For eg-

while (!feof(fp) ) {
fscanf( fp, "%s", buff);
}

This is a terrible example. Seeing while(!feof(fp)) should flag problems
immediately. A while loop should depend on the success or failure of the
actual file reading function, not on a secondary feof() test. feof() only
becomes true after a read has already failed, so this pattern often causes
off-by-one errors in the number of times the loop runs.

scanf() or fscanf() with a plain "%s" is just as bad as the gets() function:
it has no way to prevent writing past the end of the buffer it is given. You
must always specify a maximum field width with the %s specifier (for example
"%99s" for a 100-byte buffer). In addition, your loop never checks the value
returned by fscanf(), and it just keeps overwriting the same buffer with each
(whitespace-delimited) string read, instead of storing them separately in
memory.

In this case I'd parse one line at a time:

while(fgets(buff, sizeof buff, fp))
{
    /* work on the current line in buff */
}
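Filling in that loop a little: when a line does not fit in the buffer, fgets() stores only part of it, without the '\n', and the next call returns the remainder as if it were a new line. A sketch of a hypothetical read_line() helper that strips the newline and reports (and skips) over-long lines instead of silently splitting them:

```c
#include <stdio.h>
#include <string.h>

/* Reads one line from fp into buf (size bytes), without its '\n'.
   Returns 1 for a line, 0 at end of file, or -1 if the line did not
   fit in buf; the rest of an over-long line is read and discarded. */
int read_line(FILE *fp, char *buf, int size)
{
    char *nl;
    int ch;

    if (fgets(buf, size, fp) == NULL)
        return 0;

    nl = strchr(buf, '\n');
    if (nl != NULL) {
        *nl = '\0';                   /* normal case: strip the newline */
        return 1;
    }

    /* No '\n' stored: peek at the next character to find out why. */
    ch = getc(fp);
    if (ch == EOF || ch == '\n')
        return 1;                     /* last line without '\n', or exact fit */

    while ((ch = getc(fp)) != EOF && ch != '\n')
        continue;                     /* skip the rest of the long line */
    return -1;
}
```

The loop above would then become while ((r = read_line(fp, buff, sizeof buff)) != 0), with r < 0 reported as an error for that line.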
 

Thomas Matthews

Karthik said:
Darius Fatakia wrote:
[snip]

A thumb rule to deal with files is as follows -

Copy all file contents to memory.
Close the file
Process the file contents from data saved in Step 1.

This would give a big performance boost.

For eg-


while (!feof(fp) ) {
fscanf( fp, "%s", buff);
}

Yes, this would improve performance, but
many applications cannot fit an entire data file into
memory. A trade-off is to read the data file in
large "chunks", where a chunk is large enough
to reduce the I/O overhead (such as starting
and stopping a hard drive). Small buffer sizes may
not provide any performance benefit because of buffering
by the operating system and perhaps by the I/O device.
