Problem with realloc (I think)

D

DSF

Hello all,

The following code is driving me crazy. I'm sure the problem with
it must be obvious but I can't see it.
Basically, this code searches any given file for and ASCII 'words'
it can find and stores each word in a dynamically allocated char array
pointed to by a dynamically allocated array of pointers to char*. The
program will crash while freeing the char array.

A few notes.

I have not been able to make it crash using a file that produces a
small number of 'words' (< 70,000).

Setting ALLOCSIZE to larger than the total number of 'words' found
(so the realloc section is never called) seems to prevent the crash.

If I use a file that produces a very large number of 'words'
(>300,000), realloc will fail upon trying to allocate approximately
1.2M of memory. However, realloc succeeds if I replace "allocsize *
sizeof(char *)" with a much larger number, such as 10 million.

Changing ALLOCSIZE will change the array index number where the crash
ocurrs.


Thanks for any help,
DSF

The source:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>


#define BUFFERSIZE 128
#define ALLOCSIZE 100

#define MINWORDLEN 3
#define MAXWORDLEN 20


int main(void)
{
FILE *iptr;
int i, c;
int allocsize = ALLOCSIZE;
int minwordlen = MINWORDLEN;
int maxwordlen = MAXWORDLEN;
int wordcount;
char buffer[BUFFERSIZE];
char **words, **temp;

if(maxwordlen > BUFFERSIZE)
maxwordlen = BUFFERSIZE - 1;

if((iptr = fopen("filename", "rb")) == NULL)
return 1;

words = malloc(allocsize * sizeof(char *));
if(words == NULL)
{
fclose(iptr);
return 3;
}

wordcount = 0;
i = fgetc(iptr);
while(i != EOF)
{
if(isalpha(i))
{
/* get 'word' */
c = 0;
do
{
buffer[c] = (char)tolower(i);
i = fgetc(iptr);
c++;
if(c >= BUFFERSIZE)
break;
} while(isalpha(i) && i != EOF);
buffer[c] = 0;
/*allocate / store current 'word' */
if(strlen(buffer) >= minwordlen && strlen(buffer) <= maxwordlen)
{
words[wordcount] = malloc(strlen(buffer) + 1);
if(words[wordcount] == NULL)
{
for(i = 0; i < wordcount; i++)
free(words);
free(words);
fclose(iptr);
return 3;
}
strcpy(words[wordcount], buffer);
wordcount++;
/* realloc section */
if(wordcount >= allocsize)
{
allocsize += ALLOCSIZE;
temp = realloc(words, allocsize * sizeof(char *));
if(temp == NULL)
{
for(i = 0; i < wordcount; i++)
free(words);
free(words);
fclose(iptr);
return 3;
}
words = temp;
}
/* end realloc section */
}
}
else
{
do
{
i = fgetc(iptr);
} while(!isalpha(i) && i != EOF);
}
}
fclose(iptr);

/* printf added to test code to see which free() index crashes */
for(i = 0; i < wordcount; i++)
{
printf("i: %05d\t%s\n", i, words);
free(words);
}
free(words);

return 0;
}
 
E

Eric Sosman

DSF said:
Hello all,

The following code is driving me crazy. I'm sure the problem with
it must be obvious but I can't see it.
Basically, this code searches any given file for and ASCII 'words'
it can find and stores each word in a dynamically allocated char array
pointed to by a dynamically allocated array of pointers to char*. The
program will crash while freeing the char array.

A few notes.

I have not been able to make it crash using a file that produces a
small number of 'words' (< 70,000).

Setting ALLOCSIZE to larger than the total number of 'words' found
(so the realloc section is never called) seems to prevent the crash.

If I use a file that produces a very large number of 'words'
(>300,000), realloc will fail upon trying to allocate approximately
1.2M of memory. However, realloc succeeds if I replace "allocsize *
sizeof(char *)" with a much larger number, such as 10 million.

Changing ALLOCSIZE will change the array index number where the crash
ocurrs.


Thanks for any help,

I'm failing to see the smoking gun, but there's a whiff
of cordite in the air. I'll comment on a few of the more
sulfurous spots.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>


#define BUFFERSIZE 128
#define ALLOCSIZE 100

#define MINWORDLEN 3
#define MAXWORDLEN 20


int main(void)
{
FILE *iptr;
int i, c;
int allocsize = ALLOCSIZE;
int minwordlen = MINWORDLEN;
int maxwordlen = MAXWORDLEN;
int wordcount;

Some of these (particularly `allocsize' and `wordcount')
would be better as `size_t' variables than as `int's. Also,
I'm nervous about the dual use of `i': Sometimes it stores
the `int' result of fgetc(), but sometimes it's used as a
loop counter that marches up to `wordcount' and ought to be
a `size_t' instead. This probably isn't the problem (unless
you're using a system with a small `int'), but it makes me
nervous anyhow.
char buffer[BUFFERSIZE];
char **words, **temp;

if(maxwordlen > BUFFERSIZE)
maxwordlen = BUFFERSIZE - 1;

Looks like an off-by-one error here: The `>' should
probably be `>='. I don't think that's the problem, though.
if((iptr = fopen("filename", "rb")) == NULL)
return 1;

Since you're reading text, it seems peculiar to use
"rb" instead of "r". But perhaps you know best.

Here and below, a completion status of 1 or 3 may be
meaningful on the system you happen to be using, but is
not guaranteed to make sense everywhere. The portable
completion statuses are 0, EXIT_SUCCESS, and EXIT_FAILURE.
The first two indicate successful completion (perhaps
different "flavors" of success, perhaps not), and the third
indicates unsuccessful completion.
words = malloc(allocsize * sizeof(char *));
if(words == NULL)
{
fclose(iptr);
return 3;
}

wordcount = 0;
i = fgetc(iptr);
while(i != EOF)
{
if(isalpha(i))
{
/* get 'word' */
c = 0;
do
{
buffer[c] = (char)tolower(i);

The cast is unnecessary, but harmless.
i = fgetc(iptr);
c++;
if(c >= BUFFERSIZE)
break;
} while(isalpha(i) && i != EOF);
buffer[c] = 0;

If the loop exits because of the `break', this line stores
to `buffer[BUFFERSIZE]', an array element that does not exist.
The effect is undefined, but on many machines the outcome will
be that a variable adjacent to `buffer' (quite likely `wordcount')
will be clobbered. If your input file contains any "words" of
BUFFERSIZE or more characters, this could well be the problem.
/*allocate / store current 'word' */
if(strlen(buffer) >= minwordlen && strlen(buffer) <= maxwordlen)
{
words[wordcount] = malloc(strlen(buffer) + 1);

You could use `c' instead of `strlen(buffer)' in these
three places.
if(words[wordcount] == NULL)
{
for(i = 0; i < wordcount; i++)
free(words);
free(words);
fclose(iptr);
return 3;
}
strcpy(words[wordcount], buffer);
wordcount++;
/* realloc section */
if(wordcount >= allocsize)
{
allocsize += ALLOCSIZE;
temp = realloc(words, allocsize * sizeof(char *));
if(temp == NULL)
{
for(i = 0; i < wordcount; i++)
free(words);
free(words);
fclose(iptr);
return 3;
}
words = temp;
}
/* end realloc section */
}
}
else
{
do
{
i = fgetc(iptr);
} while(!isalpha(i) && i != EOF);
}
}
fclose(iptr);

/* printf added to test code to see which free() index crashes */
for(i = 0; i < wordcount; i++)
{
printf("i: %05d\t%s\n", i, words);
free(words);
}
free(words);

return 0;
}


Unfortunately, I can't escape the feeling that I, too,
am failing to see the obvious problem ... Still, when one
is confronted with a mystery, it is often a good idea to
clean up *everything*, even the things that "obviously" have
nothing to do with the difficulty. Even if the exercise does
nothing more than eliminate red herrings, it can help.
 
D

DSF

On Tue, 23 Dec 2008 23:50:17 -0500, Eric Sosman

{snipped}

Thanks for answering so quickly.
I'm failing to see the smoking gun, but there's a whiff
of cordite in the air. I'll comment on a few of the more
sulfurous spots.


Some of these (particularly `allocsize' and `wordcount')
would be better as `size_t' variables than as `int's. Also,
I'm nervous about the dual use of `i': Sometimes it stores
the `int' result of fgetc(), but sometimes it's used as a
loop counter that marches up to `wordcount' and ought to be
a `size_t' instead. This probably isn't the problem (unless
you're using a system with a small `int'), but it makes me
nervous anyhow.

Ahh, the tangled web of portability! In this case, int is 32 bits
which is more than large enough for my purposes. size_t in my case
only changes that int to an unsigned, making little difference.
This code is designed for a *very* peculiar application and I doubt
I'll be using it anywhere else, but thanks. I'll try to keep the
above in mind for other applications. Variables are cheap! (Usually.)
char buffer[BUFFERSIZE];
char **words, **temp;

if(maxwordlen > BUFFERSIZE)
maxwordlen = BUFFERSIZE - 1;

Looks like an off-by-one error here: The `>' should
probably be `>='. I don't think that's the problem, though.

You're right, in both cases.
Since you're reading text, it seems peculiar to use
"rb" instead of "r". But perhaps you know best.

Reading text, but not necessarily from text files. This works best
for my system and what I'm doing.
Here and below, a completion status of 1 or 3 may be
meaningful on the system you happen to be using, but is
not guaranteed to make sense everywhere. The portable
completion statuses are 0, EXIT_SUCCESS, and EXIT_FAILURE.
The first two indicate successful completion (perhaps
different "flavors" of success, perhaps not), and the third
indicates unsuccessful completion.

This code in it's original form is a function. min & maxwordlen are
parameters passed to it, hence the check above. This code is a
simplification (but with the exact same problem). The return codes
are meaningful to the rest of the program.
words = malloc(allocsize * sizeof(char *));
if(words == NULL)
{
fclose(iptr);
return 3;
}

wordcount = 0;
i = fgetc(iptr);
while(i != EOF)
{
if(isalpha(i))
{
/* get 'word' */
c = 0;
do
{
buffer[c] = (char)tolower(i);

The cast is unnecessary, but harmless.

Stops the "Conversion may lose significant digits" warning from my
compiler.
i = fgetc(iptr);
c++;
if(c >= BUFFERSIZE)
break;
} while(isalpha(i) && i != EOF);
buffer[c] = 0;

If the loop exits because of the `break', this line stores
to `buffer[BUFFERSIZE]', an array element that does not exist.
The effect is undefined, but on many machines the outcome will
be that a variable adjacent to `buffer' (quite likely `wordcount')
will be clobbered. If your input file contains any "words" of
BUFFERSIZE or more characters, this could well be the problem.

Thanks for spotting this, but unfortunately it's not the problem.
/*allocate / store current 'word' */
if(strlen(buffer) >= minwordlen && strlen(buffer) <= maxwordlen)
{
words[wordcount] = malloc(strlen(buffer) + 1);

You could use `c' instead of `strlen(buffer)' in these
three places.
Thanks.
if(words[wordcount] == NULL)
{
for(i = 0; i < wordcount; i++)
free(words);
free(words);
fclose(iptr);
return 3;
}
strcpy(words[wordcount], buffer);
wordcount++;
/* realloc section */
if(wordcount >= allocsize)
{
allocsize += ALLOCSIZE;
temp = realloc(words, allocsize * sizeof(char *));
if(temp == NULL)
{
for(i = 0; i < wordcount; i++)
free(words);
free(words);
fclose(iptr);
return 3;
}
words = temp;
}
/* end realloc section */
}
}
else
{
do
{
i = fgetc(iptr);
} while(!isalpha(i) && i != EOF);
}
}
fclose(iptr);

/* printf added to test code to see which free() index crashes */
for(i = 0; i < wordcount; i++)
{
printf("i: %05d\t%s\n", i, words);
free(words);
}
free(words);

return 0;
}


Unfortunately, I can't escape the feeling that I, too,
am failing to see the obvious problem ... Still, when one
is confronted with a mystery, it is often a good idea to
clean up *everything*, even the things that "obviously" have
nothing to do with the difficulty. Even if the exercise does
nothing more than eliminate red herrings, it can help.


DSF
 
T

Tim Rentsch

DSF said:
On Tue, 23 Dec 2008 23:50:17 -0500, Eric Sosman
Looks like an off-by-one error here: The `>' should
probably be `>='. I don't think that's the problem, though.

You're right, in both cases.
[snip]
i = fgetc(iptr);
c++;
if(c >= BUFFERSIZE)
break;
} while(isalpha(i) && i != EOF);
buffer[c] = 0;

If the loop exits because of the `break', this line stores
to `buffer[BUFFERSIZE]', an array element that does not exist.
The effect is undefined, but on many machines the outcome will
be that a variable adjacent to `buffer' (quite likely `wordcount')
will be clobbered. If your input file contains any "words" of
BUFFERSIZE or more characters, this could well be the problem.

Thanks for spotting this, but unfortunately it's not the problem.

Just curious - did you ever find out what is?
 
Q

qarnos

Just curious - did you ever find out what is?

I wonder if it was this:

do
{
buffer[c] = (char)tolower(i);
i = fgetc(iptr);
c++;
if(c >= BUFFERSIZE)
break;
} while(isalpha(i) && i != EOF);
buffer[c] = 0;

If (c >= BUFFERSIZE) then he will be writing a NULL at one past the
buffer. And the next variable after the declaration of 'buffer' is
'words'.
 
D

DSF

On Wed, 7 Jan 2009 17:22:30 -0800 (PST), qarnos

Sorry this answer is a bit late, but I haven't been perusing this
group in a while.
I wonder if it was this:

do
{
buffer[c] = (char)tolower(i);
i = fgetc(iptr);
c++;
if(c >= BUFFERSIZE)
break;
} while(isalpha(i) && i != EOF);
buffer[c] = 0;

If (c >= BUFFERSIZE) then he will be writing a NULL at one past the
buffer. And the next variable after the declaration of 'buffer' is
'words'.
This was a potential problem, but not THE problem!



They say it's a poor programmer who blames his compiler, but in this
case, I'm justified...I think. Two tests I've done indicate the
compiler has a problem with realloc.

*** Warning! Platform specific information follows! *** :)

The first test:
I'm programming under Windows and I tried replacing malloc and
realloc with the API equivalents. The program runs fine.

The second test:
I finally got a variant of GCC installed and compiled the original
code as posted here with it. It also runs with no errors.

To be fair, the compiler I'm using is Borland C++ 5.x (circa 1996)
so I'm not too surprised the memory allocation routines might have a
problem with XP. But it's served me well for the small command line
utilities I write. Come to think of it, the last program I wrote that
needed multiple realloc calls uses the Windows API memory routines.
Maybe I've encountered this before and forgot about it? Hmm...

DSF
 
N

Nate Eldredge

DSF said:
They say it's a poor programmer who blames his compiler, but in this
case, I'm justified...I think. Two tests I've done indicate the
compiler has a problem with realloc.

*** Warning! Platform specific information follows! *** :)

The first test:
I'm programming under Windows and I tried replacing malloc and
realloc with the API equivalents. The program runs fine.

The second test:
I finally got a variant of GCC installed and compiled the original
code as posted here with it. It also runs with no errors.

To be fair, the compiler I'm using is Borland C++ 5.x (circa 1996)
so I'm not too surprised the memory allocation routines might have a
problem with XP. But it's served me well for the small command line
utilities I write. Come to think of it, the last program I wrote that
needed multiple realloc calls uses the Windows API memory routines.
Maybe I've encountered this before and forgot about it? Hmm...

If changing your malloc implementation makes your code work, my
suspicion is that you're doing something illegal that the new
implementation happens to let you get away with. A very common one is
writing past the end of a malloc'ed array; often this will make the
program crash at some later time, if you overwrite some of malloc's
internal data. But depending on the implementation, maybe that space
isn't used for malloc's data, or malloc doesn't come back to look at
it. Either way the bug remains. And you've already had at least one
similar bug...

You might see if your malloc implementation has some debugging options,
which might make it easier to find which part of your code is offending.
 
T

Tim Rentsch

DSF said:
On Wed, 7 Jan 2009 17:22:30 -0800 (PST), qarnos

Sorry this answer is a bit late, but I haven't been perusing this
group in a while.
I wonder if it was this:

do
{
buffer[c] = (char)tolower(i);
i = fgetc(iptr);
c++;
if(c >= BUFFERSIZE)
break;
} while(isalpha(i) && i != EOF);
buffer[c] = 0;

If (c >= BUFFERSIZE) then he will be writing a NULL at one past the
buffer. And the next variable after the declaration of 'buffer' is
'words'.
This was a potential problem, but not THE problem!



They say it's a poor programmer who blames his compiler, but in this
case, I'm justified...I think. Two tests I've done indicate the
compiler has a problem with realloc.

*** Warning! Platform specific information follows! *** :)

The first test:
I'm programming under Windows and I tried replacing malloc and
realloc with the API equivalents. The program runs fine.

The second test:
I finally got a variant of GCC installed and compiled the original
code as posted here with it. It also runs with no errors.

To be fair, the compiler I'm using is Borland C++ 5.x (circa 1996)
so I'm not too surprised the memory allocation routines might have a
problem with XP. But it's served me well for the small command line
utilities I write. Come to think of it, the last program I wrote that
needed multiple realloc calls uses the Windows API memory routines.
Maybe I've encountered this before and forgot about it? Hmm...

What happens if you fix the buffer overflow problem and run the
program in the original (Borland) environment? I would think
that test is one you'd want to do, especially using the input
file that made the original program fail.
 
D

DSF

I wonder if it was this:

do
{
buffer[c] = (char)tolower(i);
i = fgetc(iptr);
c++;
if(c >= BUFFERSIZE)
break;
} while(isalpha(i) && i != EOF);
buffer[c] = 0;

If (c >= BUFFERSIZE) then he will be writing a NULL at one past the
buffer. And the next variable after the declaration of 'buffer' is
'words'.
This was a potential problem, but not THE problem!

What happens if you fix the buffer overflow problem and run the
program in the original (Borland) environment? I would think
that test is one you'd want to do, especially using the input
file that made the original program fail.

I should 've been clearer. When I said (above) "not THE problem," I
meant that I had already corrected this and it made no difference in
the program's actions.

I fixed this back on 12-23-08 when Eric Sosman brought it to my
attention in the first reply to my original post.

DSF
 
D

DSF

If changing your malloc implementation makes your code work, my
suspicion is that you're doing something illegal that the new
implementation happens to let you get away with. A very common one is
writing past the end of a malloc'ed array; often this will make the
program crash at some later time, if you overwrite some of malloc's
internal data. But depending on the implementation, maybe that space
isn't used for malloc's data, or malloc doesn't come back to look at
it. Either way the bug remains. And you've already had at least one
similar bug...

You might see if your malloc implementation has some debugging options,
which might make it easier to find which part of your code is offending.

Borland C has an add-on called Codeguard that monitors numerous
aspects of a program's execution, including memory overruns with
malloc'd arrays. It reports no problems.

Any other ideas? All welcome.

DSF
 
N

Nate Eldredge

DSF said:
Borland C has an add-on called Codeguard that monitors numerous
aspects of a program's execution, including memory overruns with
malloc'd arrays. It reports no problems.

Any other ideas? All welcome.

Oh. If Borland C++ 5.x is a 16 bit compiler, this is a likely source of
your problems. Your `int's will overflow once they pass 32767, and
depending on your memory model you won't be able to allocate more than
either 64K or 640K of memory. This would certainly make your program
fail on large inputs. The reason it appears to succeed on very large
inputs might be related to integers wrapping around and becoming small
again; the allocation might succeed, but it would be for a different
size than you expect, and the rest of the program will probably go
horribly awry.

You can fix the wraparound by changing your `int's to `long int's, which
should at least cause it to fail reliably. But to have access to as
much memory as you need, you'll need to get a more modern 32-bit
compiler.
 
C

CBFalconer

Nate said:
.... snip ...

Oh. If Borland C++ 5.x is a 16 bit compiler, this is a likely
source of your problems. Your `int's will overflow once they
pass 32767, and depending on your memory model you won't be able
to allocate more than either 64K or 640K of memory. This would
certainly make your program fail on large inputs. The reason it
....

The parameter to malloc is a size_t type, not an int. So the size
of int should not affect it.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,768
Messages
2,569,574
Members
45,048
Latest member
verona

Latest Threads

Top