newbie-question: container for a "string" of variable length


merman

Hi,

The problem:

One function in my program reads line by line from a file (with fgets).
Then I chomp (like Perl) every newline at the end of line:

line[strlen(line)-1] = '\0';

The result is a "string" of variable length (it depends on the file size).

Is there a data-structure which dynamically grows for saving this
"string" of variable length? How can I solve my problem?

Please keep the solution simple. I'm a newbie;-).

Thanks for help.

o-o

Thomas
 

Martin Ambuhl

merman said:
Hi,

The problem:

One function in my program reads line by line from a file (with fgets).
Then I chomp (like Perl) every newline at the end of line:

line[strlen(line)-1] = '\0';

The above is dangerous. Try something like
{
char *nl;
if ((nl = strchr(line,'\n'))) *nl = 0;
}
The result is a "string" of variable length (it depends on the file size).

Is there a data-structure which dynamically grows for saving this
"string" of variable length? How can I solve my problem?

malloc and strcpy (or strncpy) are your friends.
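A minimal sketch of the malloc-and-strcpy approach, assuming the line has already been read into a fixed-size buffer (for example with fgets) and its newline removed. The helper name save_line is hypothetical. It allocates exactly strlen(line) + 1 bytes, so each saved copy is only as large as the line it holds.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a freshly malloc'd copy of `line`,
   or NULL if memory runs out. The caller must free() the result. */
char *save_line(const char *line)
{
    char *copy = malloc(strlen(line) + 1);  /* +1 for the '\0' */

    if (copy != NULL)
        strcpy(copy, line);                 /* copies the '\0' too */
    return copy;
}
```

The caller frees each copy when finished with it; keeping the returned pointers in an array or a list is what lets the collection of lines grow.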
 

pete

merman said:
Hi,

The problem:

One function in my program reads line by line from
a file (with fgets).
Then I chomp (like Perl) every newline at the end of line:

line[strlen(line)-1] = '\0';

The result is a "string" of variable length
(it depends on the file size).

Is there a data-structure which dynamically grows for saving this
"string" of variable length? How can I solve my problem?

Please keep the solution simple. I'm a newbie;-).

It seems like a job for a linked list.
I don't know how simple that is for you.
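A sketch of how the linked-list idea could look: each line gets its own malloc'd node, and the list grows by one node per line read. The type struct line_node and the functions append_line and free_lines are hypothetical names, and the sketch assumes each line has already been read and had its newline removed.

```c
#include <stdlib.h>
#include <string.h>

/* One node per saved line (hypothetical type). */
struct line_node {
    char *text;                /* malloc'd copy of the line */
    struct line_node *next;
};

/* Append a copy of `text` after *tail (NULL for an empty list).
   Returns the new node, or NULL if memory runs out; the list is
   left unchanged on failure. */
struct line_node *append_line(struct line_node **head,
                              struct line_node **tail,
                              const char *text)
{
    struct line_node *node = malloc(sizeof *node);

    if (node == NULL)
        return NULL;
    node->text = malloc(strlen(text) + 1);
    if (node->text == NULL) {
        free(node);
        return NULL;
    }
    strcpy(node->text, text);
    node->next = NULL;

    if (*tail == NULL)
        *head = node;          /* first node in the list */
    else
        (*tail)->next = node;  /* link after the current last node */
    *tail = node;
    return node;
}

/* Release every node together with its text. */
void free_lines(struct line_node *head)
{
    while (head != NULL) {
        struct line_node *next = head->next;
        free(head->text);
        free(head);
        head = next;
    }
}
```

Start with head and tail both NULL and call append_line(&head, &tail, line) after each successful read; keeping a tail pointer makes each append constant-time instead of walking the whole list.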

It makes things simpler if you have a hard coded line length limit.
I use these functions to get the next nonblank line from a text file:

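#include <stdio.h>
#include <ctype.h>

#define LINE_LEN 65
#define str(s) # s
#define xstr(s) str(s)   /* expands LINE_LEN before stringizing it */

int blank(char *line);   /* prototype: blank() is called before its definition */

/* Read the next line that is not all whitespace into line[],
   keeping at most LINE_LEN characters and discarding the rest. */
int nonblank_line(FILE *fd, char *line)
{
    int rc;

    do {
        /* "%65[^\n]" stores up to 65 non-newline characters;
           "%*[^\n]" skips any excess without storing it */
        rc = fscanf(fd, "%" xstr(LINE_LEN) "[^\n]%*[^\n]", line);
        if (!feof(fd)) {
            getc(fd);    /* consume the '\n' */
        }
    } while (rc == 0 || (rc == 1 && blank(line)));
    return rc;
}

int blank(char *line)
{
    /* cast: isspace() needs a value representable as unsigned char */
    while (isspace((unsigned char)*line)) {
        ++line;
    }
    return *line == '\0';
}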

line should be declared this way in the calling function:
char line[LINE_LEN + 1];
nonblank_line has two possible return values, EOF and 1.
 

Nils O. Selåsdal

Martin said:
merman said:
Hi,

The problem:

One function in my program reads line by line from a file (with
fgets). Then I chomp (like Perl) every newline at the end of line:

line[strlen(line)-1] = '\0';


The above is dangerous. Try something like
{
char *nl;
if ((nl = strchr(line,'\n'))) *nl = 0;
}
Why would it be dangerous?
At any rate, man fgets
"fgets() reads in at most one less than size characters ..."
....
"A '\0' is stored after the last character in the buffer."
 

Al Bowers

merman said:
Hi,

The problem:

One function in my program reads line by line from a file (with fgets).
Then I chomp (like Perl) every newline at the end of line:

line[strlen(line)-1] = '\0';

This would be bad if function strlen returned 0.
Use function strrchr.

#include <string.h>
char *s1;
if((s1 = strrchr(line,'\n'))!= NULL) *s1 = '\0';

The result is a "string" of variable length (it depends on the file size).

Is there a data-structure which dynamically grows for saving this
"string" of variable length? How can I solve my problem?

Please keep the solution simple. I'm a newbie;-).

Design a function that uses realloc to dynamically allocate the
storage you need.

A simple definition and usage are listed below.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define BLOCK 16

/* Read one line of any length from fp, growing the result with
   realloc by BLOCK bytes per fgets call. Returns a malloc'd string
   without the '\n', or NULL if nothing could be read or memory ran
   out. The caller must free() the result. */
char *fgetline(FILE *fp)
{
    char *s, *tmp, buf[BLOCK];
    size_t count;

    for (count = 0, s = NULL; fgets(buf, sizeof buf, fp) != NULL; count++)
    {
        /* count+1 chunks of at most BLOCK-1 chars, plus '\0', fit */
        if ((tmp = realloc(s, (count + 1) * BLOCK)) == NULL)
        {
            free(s);
            return NULL;
        }
        s = tmp;
        if (count == 0) *s = '\0';   /* first chunk: start from "" */
        strcat(s, buf);
        if ((tmp = strrchr(s, '\n')) != NULL)
        {
            *tmp = '\0';             /* whole line read: drop '\n', stop */
            break;
        }
    }
    return s;
}

int main(void)
{
    char *mystring;

    printf("Enter a long sentence: ");
    fflush(stdout);
    mystring = fgetline(stdin);
    if (mystring) printf("mystring = \"%s\"\n", mystring);
    else puts("Failure with function fgetline");
    free(mystring);
    printf("\nLet's try another. Enter another sentence: ");
    fflush(stdout);
    mystring = fgetline(stdin);
    if (mystring) printf("mystring = \"%s\"\n", mystring);
    else puts("Failure with function fgetline");
    free(mystring);
    return 0;
}
 

Paul Hsieh

merman said:
The problem:

One function in my program reads line by line from a file (with fgets).
Then I chomp (like Perl) every newline at the end of line:

line[strlen(line)-1] = '\0';

fgets() will not store a '\n' if the buffer is filled to the limit or
the file ends without a terminating '\n'. This would be kind of
disappointing if strlen(line) were equal to 0, since the assignment
would then write outside the buffer.
The result is a "string" of variable length (it depends on the file size).

Is there a data-structure which dynamically grows for saving this
"string" of variable length? How can I solve my problem?

The C language by itself is kind of useless for the behavior you
want. This is a very frequently asked question here, but of course
it's not addressed by the FAQ for this group.

There are two main solutions that I can recommend. If you are
concerned solely with the problem of variable length input, then I
have written an article on the subject here:

http://www.pobox.com/~qed/userInput.html

If you have more general variable length string needs then you can use
my "Better String Library" solution to deal with them:

http://bstring.sf.net/

The basic idea is that C by itself *always* requires that you know the
length of the input before you read. But you can read input in fixed
length sections, so you can use a strategy of allocating increasing
amounts of memory interleaved with fetching blocks of input in an
iterative manner. I personally endorse exponentially increasing
successive block sizes (as can be seen in both solutions above) for
speed and heap-pressure reasons.
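A minimal sketch of the growth strategy described above, not taken from either of the linked solutions: read one character at a time with getc and double the buffer whenever it fills. The name read_line and the starting capacity of 16 bytes are assumptions. It returns NULL at end of input (nothing read) or on allocation failure; the newline is consumed but not stored.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical reader: returns a malloc'd line without its '\n',
   or NULL at end of input (nothing read) or if memory runs out.
   A read error also ends the line; callers can check ferror(fp). */
char *read_line(FILE *fp)
{
    size_t cap = 16, len = 0;      /* initial capacity is arbitrary */
    char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return NULL;
    while ((c = getc(fp)) != EOF && c != '\n') {
        if (len + 1 == cap) {      /* keep room for the final '\0' */
            char *tmp = realloc(buf, cap * 2);   /* grow geometrically */
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0) {    /* no data left: end of input */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

Doubling means a line of n characters costs on the order of log2(n) reallocations, rather than one per 15-character fgets chunk as with a fixed 16-byte growth step.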
 

Keith Thompson

Nils O. Selåsdal said:
Martin said:
merman said:
Hi,

The problem:

One function in my program reads line by line from a file (with
fgets). Then I chomp (like Perl) every newline at the end of line:

line[strlen(line)-1] = '\0';


The above is dangerous. Try something like
{
char *nl;
if ((nl = strchr(line,'\n'))) *nl = 0;
}
Why would it be dangerous?

It's dangerous because if the input line is too long for the provided
buffer, fgets() will give you a partial line whose last character is
not a newline. Setting that character to '\0' can write over
significant data.
At any rate, man fgets
"fgets() reads in at most one less than size characters ..."
...
"A '\0' is stored after the last character in the buffer."

Yes, the line is guaranteed to be terminated by a '\0' character. The
point of the assignment is to replace the '\n' character preceding the
'\0' with a '\0', making the string 1 character shorter. The danger
is that the character preceding the '\0' may not be a '\n'.
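A common idiom, not used in the thread, that avoids this danger is strcspn: it returns the index of the first '\n', or the string's length when there is none, so the single assignment below never removes a character that isn't a newline. The wrapper name chomp is hypothetical.

```c
#include <string.h>

/* Remove a trailing newline, if present. Safe for "" and for
   strings without '\n': strcspn then returns strlen(line), and
   the existing '\0' is simply overwritten with '\0'. */
void chomp(char *line)
{
    line[strcspn(line, "\n")] = '\0';
}
```

Unlike the strchr version, this does not tell you whether a newline was found, so on its own it cannot detect a truncated line.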
 

Eric Sosman

Nils said:
Martin said:
merman wrote:

Hi,

The problem:

One function in my program reads line by line from a file (with
fgets). Then I chomp (like Perl) every newline at the end of line:

line[strlen(line)-1] = '\0';


The above is dangerous. Try something like
{
char *nl;
if ((nl = strchr(line,'\n'))) *nl = 0;
}

Why would it be dangerous?

Because if `line' is too short, fgets() will stop
storing characters in it before it gets to the '\n':

char line[10];
fgets (line, sizeof line, stream);

... and the input is "supercalifragilisticexpialidocious\n".
Chopping the final stored character without first making
sure it's actually the '\n' gives you "supercal," wipes out
the following 'i' irretrievably, and leaves you with no
clue that the line isn't finished yet.

Some implementations may permit the very last line in
a file to omit its terminating '\n' altogether. This can
make trouble for the blind chop even if `line' is big enough:

char line[10737];
fgets (line, sizeof line, stream);

... and the input is "supercalifragilisticexpialidocious"
without a newline and followed by end-of-input. In this
case, you'll get "supercalifragilisticexpialidociou" and
lose the final 's'.
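Both cases above come down to checking for the newline instead of chopping blindly, and treating its absence as a sign that the line is not finished (or that input ended without one). A minimal sketch; the name get_line_part and its return codes are assumptions, not a standard interface.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper around fgets. Returns:
    1  a complete line was read and its '\n' removed,
    0  a partial line: the buffer filled up, or input ended
       without a final '\n'; call again for the rest (if any),
   -1  nothing was read (end of input or read error). */
int get_line_part(char *line, int size, FILE *stream)
{
    char *nl;

    if (fgets(line, size, stream) == NULL)
        return -1;
    nl = strchr(line, '\n');
    if (nl != NULL) {
        *nl = '\0';      /* strip only a newline that is really there */
        return 1;
    }
    return 0;
}
```

With char line[10] and the input "supercalifragilisticexpialidocious\n", successive calls return 0 ("supercali"), 0, 0, and finally 1 ("docious"), so no character is lost and the caller can tell where the line ends.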
 

merman

Hi,

thanks for all the help. Wow - so many opinions ;-).

I think learning C takes a lot of time.

Best regards

o-o

Thomas
 

Martin Ambuhl

Nils said:
Martin said:
merman said:
Hi,

The problem:

One function in my program reads line by line from a file (with
fgets). Then I chomp (like Perl) every newline at the end of line:

line[strlen(line)-1] = '\0';



The above is dangerous. Try something like
{
char *nl;
if ((nl = strchr(line,'\n'))) *nl = 0;
}

Why would it be dangerous?
At any rate, man fgets
"fgets() reads in at most one less than size characters ..."
...
"A '\0' is stored after the last character in the buffer."

Because you have no guarantee of a '\n' in the string, so setting the
last character of the string to 0 may not do what you want.
 

CBFalconer

Martin said:
merman said:
One function in my program reads line by line from a file (with
fgets). Then I chomp (like Perl) every newline at the end of line:

line[strlen(line)-1] = '\0';

The above is dangerous. Try something like
{
char *nl;
if ((nl = strchr(line,'\n'))) *nl = 0;
}
The result is a "string" of variable length (it depends on
the file size).

Is there a data-structure which dynamically grows for saving
this "string" of variable length? How can I solve my problem?

malloc and strcpy (or strncpy) are your friends.

Or use the techniques in ggets, avoiding the data copying. See:

<http://cbfalconer.home.att.net/download/ggets.zip>
 

CBFalconer

Paul said:
.... snip ...

The basic idea is that C by itself *always* requires that you know
the length of the input before you read. But you can read input
in fixed length sections, so you can use a strategy of allocating
increasing amounts of memory interleaved with fetching blocks of
input in an iterative manner. I personally endorse exponentially
increasing successive block sizes (as can be seen in both solutions
above) for speed and heap-pressure reasons.

No, it doesn't require prior knowledge of the input length. C input
is in the form of streams, so you can use getc (and putc) and never
need to know the length of the input stream. Since getc is often
implemented as a macro, doing so may not even cost any efficiency.

One large advantage of doing so is that, combined with ungetc, you
have the consistent option of 1 char read-ahead, which in turn
solves many parsing problems.
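To illustrate the one-character read-ahead, here is a sketch that reads an unsigned decimal number with getc and uses ungetc to push back the first character that is not a digit, so the next read still sees it. The name read_number is hypothetical, and overflow is deliberately not checked.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical: skip leading blanks, then read decimal digits.
   Returns 1 and stores the value in *out if at least one digit
   was read, 0 otherwise. The character that ended the number is
   pushed back with ungetc, so it stays available to the caller.
   Overflow is not detected. */
int read_number(FILE *fp, unsigned long *out)
{
    int c;
    unsigned long value = 0;
    int digits = 0;

    while ((c = getc(fp)) == ' ' || c == '\t')
        ;                                 /* skip blanks */
    while (c != EOF && isdigit((unsigned char)c)) {
        value = value * 10 + (unsigned long)(c - '0');
        digits++;
        c = getc(fp);
    }
    if (c != EOF)
        ungetc(c, fp);                    /* one char of read-ahead */
    if (digits > 0)
        *out = value;
    return digits > 0;
}
```

After reading "42" from "42,7" the comma is still in the stream, so the caller can inspect it with getc before reading the next number. The C standard guarantees only one character of pushback with ungetc, which is exactly the read-ahead described above.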

ISO Standard Pascal programmers have known this forever. Users of
C'ified variations of Pascal, such as Borland and Turbo, do not.
Yet properly used C can provide some of the advantages of Pascal.
 
