split a string, segfaults after 1mb

D

dicaprio martens

Hello,

I am trying to split `get tokens' from a string. However it
keeps segfaulting on me after processing prox. 1mb of plain
text data. According to the debugger it segfaults on the
free() call.

I examend the memory free() try's to give back, but it's clearly
allocated by a malloc call. Could it be, that calling free() so
many times is not a good idea ? Or.. ?

What would be the right way of doing stuff like this ?

I have listed the funtion that splits the string up into tokens
below. If anyone could comment what I am missing ?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXTOKENS 128
#define MAXLINE 1024

/* split string, return token array */
char **split(char *string, char *delim);

int main(void) {
char *delim = ".,:;`'\"+-_(){}[]<>*&^%$#@!?~/|\\= \t\r\n1234567890";
char **tokens = NULL;
char line[MAXLINE];
int i = 0;

while(fgets(line, MAXLINE, stdin) != NULL) {
if(strlen(line) < 3)
continue;

tokens = split(line, delim);
for(i = 0; tokens != NULL; i++)
printf("%02d: %s\n", i, tokens);

for(i = 0; tokens != NULL; i++)
free(tokens);
}

return 0;
}

/* split string, return token array */
char **split(char *string, char *delim) {
char **tokens = NULL;
char *working = NULL;
char *token = NULL;
int idx = 0;

tokens = malloc(sizeof(char *) * MAXTOKENS);
working = malloc((strlen(string) + 1) * sizeof(char));
if(tokens == NULL || working == NULL)
return NULL;

strcpy(working, string);
for(idx = 0; idx < MAXTOKENS; idx++)
tokens[idx] = NULL;

token = strtok(working, delim);
idx = 0;

while((idx < MAXTOKENS) && (token != NULL)) {
tokens[idx] = malloc(sizeof(char) * strlen(token));
if(tokens[idx] != NULL) {
strcpy(tokens[idx], token);
idx++;
token = strtok(NULL, delim);
}
}

free(working);
return tokens;
}
 
D

David Resnick

dicaprio said:
Hello,

I am trying to split `get tokens' from a string. However it
keeps segfaulting on me after processing prox. 1mb of plain
text data. According to the debugger it segfaults on the
free() call.

I examend the memory free() try's to give back, but it's clearly
allocated by a malloc call. Could it be, that calling free() so
many times is not a good idea ? Or.. ?

What would be the right way of doing stuff like this ?

I have listed the funtion that splits the string up into tokens
below. If anyone could comment what I am missing ?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXTOKENS 128
#define MAXLINE 1024

/* split string, return token array */
char **split(char *string, char *delim);

int main(void) {
char *delim = ".,:;`'\"+-_(){}[]<>*&^%$#@!?~/|\\= \t\r\n1234567890";
char **tokens = NULL;
char line[MAXLINE];
int i = 0;

while(fgets(line, MAXLINE, stdin) != NULL) {
if(strlen(line) < 3)
continue;

tokens = split(line, delim);

split can return NULL, which will result in a NULL pointer
dereference below. Not really the problem, but...
for(i = 0; tokens != NULL; i++)
printf("%02d: %s\n", i, tokens);

for(i = 0; tokens != NULL; i++)
free(tokens);
}

return 0;
}

/* split string, return token array */
char **split(char *string, char *delim) {
char **tokens = NULL;
char *working = NULL;
char *token = NULL;
int idx = 0;

tokens = malloc(sizeof(char *) * MAXTOKENS);
working = malloc((strlen(string) + 1) * sizeof(char));
if(tokens == NULL || working == NULL)
return NULL;

strcpy(working, string);
for(idx = 0; idx < MAXTOKENS; idx++)
tokens[idx] = NULL;

token = strtok(working, delim);
idx = 0;

while((idx < MAXTOKENS) && (token != NULL)) {
tokens[idx] = malloc(sizeof(char) * strlen(token));
if(tokens[idx] != NULL) {
strcpy(tokens[idx], token);
idx++;
token = strtok(NULL, delim);
}
}


Two issues above. One is that you need to allocate an extra byte for
the terminating '\0' character, i.e. malloc strlen(token) + 1. sizeof
char is by definition 1, so you need not have it. That results in a 1
byte overwrite in strcpy, probably the source of your problem

A second is that your array is no longer NULL terminated if it contains
MAXTOKENS elements, and you read off into space. You need to allocate
MAXTOKENS+1 elements, put a NULL in them all (or put the NULL after the
last one you actually use) and only fill in up to MAXTOKENS

n.b. I was lazy today, rather than read your code I ran valgrind on it,
pointed out both of the above...
free(working);
return tokens;
}

-David
 
L

Lawrence Kirby

Hello,

I am trying to split `get tokens' from a string. However it
keeps segfaulting on me after processing prox. 1mb of plain
text data. According to the debugger it segfaults on the
free() call.

I examend the memory free() try's to give back, but it's clearly
allocated by a malloc call. Could it be, that calling free() so
many times is not a good idea ? Or.. ?

If execution fails in a free() call it typically means that either you
passed an invalid pointer to free() (such as one not created by malloc()
etc. or one already freed), or your program has messed up elsewhere and
corruped malloc's workspace. That is commonly done by writing beyond the
end of objected created by malloc().
What would be the right way of doing stuff like this ?

I have listed the funtion that splits the string up into tokens below.
If anyone could comment what I am missing ?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXTOKENS 128
#define MAXLINE 1024

/* split string, return token array */ char **split(char *string, char
*delim);

int main(void) {
char *delim = ".,:;`'\"+-_(){}[]<>*&^%$#@!?~/|\\= \t\r\n1234567890";
char **tokens = NULL;
char line[MAXLINE];
int i = 0;

while(fgets(line, MAXLINE, stdin) != NULL) {
if(strlen(line) < 3)
continue;

tokens = split(line, delim);
for(i = 0; tokens != NULL; i++)
printf("%02d: %s\n", i, tokens);

for(i = 0; tokens != NULL; i++)
free(tokens);
}
}
return 0;
}
}
/* split string, return token array */ char **split(char *string, char
*delim) {
char **tokens = NULL;
char *working = NULL;
char *token = NULL;
int idx = 0;

tokens = malloc(sizeof(char *) * MAXTOKENS); working =
malloc((strlen(string) + 1) * sizeof(char)); if(tokens == NULL ||
working == NULL)
return NULL;


You have a memory leak here if only one of tokens and working is null.

Note that sizeof(char) is 1 but definition, always.
strcpy(working, string);
for(idx = 0; idx < MAXTOKENS; idx++)
tokens[idx] = NULL;

token = strtok(working, delim);
idx = 0;

while((idx < MAXTOKENS) && (token != NULL)) {
tokens[idx] = malloc(sizeof(char) * strlen(token));
if(tokens[idx] != NULL) {
strcpy(tokens[idx], token);

You write beyond the end of the array pointed to by tokens[idx] because
you failed to allocate space for the null character at the end. You did
this correctly further up.

Also consider that you're doing an awful lot of allocating here. The
strings already exist in working, there's no real need to copy them
elsewhere.
idx++;
token = strtok(NULL, delim);
}
}

free(working);
return tokens;
}

Lawrence
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top