to free() or not to free() in lex/yacc

B

Berk Birand

Hi,

I am working on a school project where we use lex/yacc to write a compiler
for a fictional (Java-like) language. I have handled all the details about
the yacc and lex files, but I still have a question regarding the dynamic
memory allocation for strings. When the lex file encounters a variable
name, I want it to pass this to yacc through yylval, and then
retrieve it to add to a syntax tree. For this purpose, I wrote the
following code in the lex file (third section), that will save the
variable in yylval:

void copy_name(char** dst, char* yy) {
int len;

//free(*dst);

// allocate memory for the new string
len = strlen(yy);
*dst = (char*) malloc(len * sizeof(char) + 1);
// copy the string
strcpy(*dst, yy);
}

It's basically a wrapper around strcpy(), which also allocates some memory
through malloc().

In the lex file, I have something like this
{LETTER}({LETTER}|{DIGIT}|"_")* {copy_name(&(yylval.Name) , yytext);
return(NAME);}

Now my question is whether I should keep the call to free in copy_name or
not. I would think that I need to do that, otherwise new memory would be
allocated each time a new variable is found, and I'd be facing memory
leaks. Yet when I uncomment that line, I start to get segfaults, for
reasons that I can't understand.

Can anybody with more expertise with lex/yacc help me out with this
problem?

Thank you,
Berk Birand
 
B

Berk Birand

Why not *dst = malloc(len + 1); ?

sizeof(char) is 1 by definition.

Yeah I guess I can do that. I figured it would be more explicit this way,
but if it's defined as 1 then it's just extra complication.

Where does yylval.Name come form, it it isn't allocated by malloc, all
bets are off. You either have to allocate all of the memory for your
string with malloc, or devise another scheme for copying them.

That's the problem. yylval is an internal struct variable to yacc. In my
yacc file, I set it's type as follows:

%union {
int Value;
char Operator[3];
char* Name;
NodeType *nPtr;
};

Now Name is a character pointer. Lex doesn't mess with the allocation of
it, but just declares yylval to be a struct.The point of
this copy_name function is to do that allocation. So I suppose we can
assume that the code that I wrote will be the one that will allocate it.
Would it the make sense to use free?

Thanks for you answer,
bb
 
I

Ian Collins

Berk said:
Hi,

I am working on a school project where we use lex/yacc to write a compiler
for a fictional (Java-like) language. I have handled all the details about
the yacc and lex files, but I still have a question regarding the dynamic
memory allocation for strings. When the lex file encounters a variable
name, I want it to pass this to yacc through yylval, and then
retrieve it to add to a syntax tree. For this purpose, I wrote the
following code in the lex file (third section), that will save the
variable in yylval:

void copy_name(char** dst, char* yy) {
int len;

//free(*dst);

// allocate memory for the new string
len = strlen(yy);
*dst = (char*) malloc(len * sizeof(char) + 1);

Why not *dst = malloc(len + 1); ?

sizeof(char) is 1 by definition.
It's basically a wrapper around strcpy(), which also allocates some memory
through malloc().

In the lex file, I have something like this
{LETTER}({LETTER}|{DIGIT}|"_")* {copy_name(&(yylval.Name) , yytext);
return(NAME);}

Now my question is whether I should keep the call to free in copy_name or
not. I would think that I need to do that, otherwise new memory would be
allocated each time a new variable is found, and I'd be facing memory
leaks. Yet when I uncomment that line, I start to get segfaults, for
reasons that I can't understand.
Where does yylval.Name come form, it it isn't allocated by malloc, all
bets are off. You either have to allocate all of the memory for your
string with malloc, or devise another scheme for copying them.
 
I

Ian Collins

Berk said:
Where does yylval.Name come form, it it isn't allocated by malloc, all
bets are off. You either have to allocate all of the memory for your
string with malloc, or devise another scheme for copying them.


That's the problem. yylval is an internal struct variable to yacc. In my
yacc file, I set it's type as follows:

%union {
int Value;
char Operator[3];
char* Name;
NodeType *nPtr;
};

Now Name is a character pointer. Lex doesn't mess with the allocation of
it, but just declares yylval to be a struct.The point of
this copy_name function is to do that allocation. So I suppose we can
assume that the code that I wrote will be the one that will allocate it.
Would it the make sense to use free?
Not unless something has been assigned to it. If adding the free causes
a crash, odds are the pointer is unassigned and contains a random value
that chokes free. So if the assignment is made once, by you and never
again, drop the free. If it gets reassigned (by you) then you have a
problem as there is no way of telling if the value is valid or not.
 
C

CBFalconer

Berk said:
Hi,

I am working on a school project where we use lex/yacc to write a compiler
for a fictional (Java-like) language. I have handled all the details about
the yacc and lex files, but I still have a question regarding the dynamic
memory allocation for strings. When the lex file encounters a variable
name, I want it to pass this to yacc through yylval, and then
retrieve it to add to a syntax tree. For this purpose, I wrote the
following code in the lex file (third section), that will save the
variable in yylval:

void copy_name(char** dst, char* yy) {
int len;

//free(*dst);

// allocate memory for the new string
len = strlen(yy);
*dst = (char*) malloc(len * sizeof(char) + 1);
// copy the string
strcpy(*dst, yy);
}

sizeof (char) is 1 by definition, and can be omitted. Avoid //
comments in Usenet, they are foulable by line wraps. Check the
result of malloc. #include <stdlib.h>. A simpler version is:

char *copy_name(char *yy) {
char *dst;

if (dst = malloc(1 + strlen(yy))) strcpy(dst, yy);
return dst;
}

which still needs the #include.

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
<http://www.securityfocus.com/columnists/423>
<http://www.aaxnet.com/editor/edit043.html>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discover of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews
 
S

Stephen Sprunk

CBFalconer said:
A simpler version is:

char *copy_name(char *yy) {
char *dst;

if (dst = malloc(1 + strlen(yy))) strcpy(dst, yy);
return dst;
}

<OT>
An even simpler version, though less portable, is:

#define copy_name(x) strdup(x)

Of course, that assumes strdup() is available, but it's common enough you
can usually depend on it (and, as shown above, it's easy to implement if
not).
</OT>

S
 
C

Chris Torek

I am working on a school project where we use lex/yacc to write a compiler
for a fictional (Java-like) language. I have handled all the details about
the yacc and lex files, but I still have a question regarding the dynamic
memory allocation for strings. ...

This is not the right group for lex and yacc discussion (comp.compilers
is generally the right place), but: dealing sensibly with symbols
and strings is probably the hardest part of working with what these
tools give you. There are a number of approaches, with different
tradeoffs. (I have used several myself, even going so far as to
use more than one in a single lex-and-yacc system.)

One thing to keep in mind is that yacc's LALR(1) parser will do at
most one token of "lookahead", so depending on many things, you
can sometimes get away with a known but fixed set of buffers. If
you do full-blown dynamic allocation it is hard to avoid leaks when
dealing with (e.g.) syntax errors.

For more information, try comp.compilers. :)
 
J

jacob navia

Berk said:
Hi,

I am working on a school project where we use lex/yacc to write a compiler
for a fictional (Java-like) language. I have handled all the details about
the yacc and lex files, but I still have a question regarding the dynamic
memory allocation for strings. When the lex file encounters a variable
name, I want it to pass this to yacc through yylval, and then
retrieve it to add to a syntax tree. For this purpose, I wrote the
following code in the lex file (third section), that will save the
variable in yylval:

void copy_name(char** dst, char* yy) {
int len;

//free(*dst);

// allocate memory for the new string
len = strlen(yy);
*dst = (char*) malloc(len * sizeof(char) + 1);
// copy the string
strcpy(*dst, yy);
}

It's basically a wrapper around strcpy(), which also allocates some memory
through malloc().

In the lex file, I have something like this
{LETTER}({LETTER}|{DIGIT}|"_")* {copy_name(&(yylval.Name) , yytext);
return(NAME);}

Now my question is whether I should keep the call to free in copy_name or
not. I would think that I need to do that, otherwise new memory would be
allocated each time a new variable is found, and I'd be facing memory
leaks. Yet when I uncomment that line, I start to get segfaults, for
reasons that I can't understand.

Can anybody with more expertise with lex/yacc help me out with this
problem?

Thank you,
Berk Birand

There are several solutions to this. Ordering from the easiest to the more
complex ones:

1) Use a garbage collector for C. Some compilers have it as standard in
their distributions (lcc-win32 for example). If not, google for
"Boehm's garbage collector". The advantage is that you only allocate
memory, never freeing it. Problems gone.
2) If that doesn't help, look again at your problem. Why do you want to
free the space allocated for the names? Maybe it is a much better
strategy to just keep allocating memory and free it all automatically
when your compiler exits. I have used this strategy in many parsers
or compilers that I use. The total amount of memory is small, and it
is used everywhere later in the program, so it is just a waste of
time to micro-manage each small piece of storage.
3) If that doesn't help, at the start of your program allocate a big
chunk of memory for names. When you are done with all names, free
all the memory for names in a single call to free. You specify a
buffer of say, 256K. You allocate names in this buffer. When you are
done with the parsing of names, you free the buffer, not caring about
the individual names.
If you run out of space, you can allocate those buffers in a linked
list.

have fun

jacob
 
B

Berk Birand

<OT>
An even simpler version, though less portable, is:

#define copy_name(x) strdup(x)

I never actually thought of using strdup for this problem. Thanks for
pointing it out. It actually does exactly the same thing as my
copy_name(), except for the free() call. If I end up not using the free(),
I can directly call strdup.

Thanks for the tip.
 
B

Berk Birand

For more information, try comp.compilers. :)

I looked around for some groups that might be oriented towards lex and
yacc, but couldn't find any. Thanks for pointing me to the compilers one,
never thought of that. I will also look into the fixed buffer solution, as
it seems to be an easy way to handle this problem.

Thanks,
Berk
 
B

Berk Birand

On Mon, 16 Apr 2007 11:08:56 +0200, jacob navia wrote:

re are several solutions to this. Ordering from the easiest to the more
complex ones:

1) Use a garbage collector for C.

I actually didn't know there were garbage collection libraries for C.
Thanks a lot for pointing this out. I will definitely keep this in mind
for future projects, as it seems like it can be a life saver. For this
project though, I think such a solutions might be a little too big.

2) If that doesn't help, look again at your problem. Why do you want to
free the space allocated for the names? Maybe it is a much better
strategy to just keep allocating memory and free it all automatically
when your compiler exits. I have used this strategy in many parsers
or compilers that I use. The total amount of memory is small, and it
is used everywhere later in the program, so it is just a waste of
time to micro-manage each small piece of storage.

I suppose this is what will happen if I don't put in that call for free.
It will just allocate some more memory each time a new variable is found,
and keep a copy of the previous one in the memory. This is definitely a
very viable solution (after all, it is what I have now). Especially since
this is an academic project, the programs that we'll be compiling won't be
longer than 30-40 lines, with a dozen variable declarations at best. The
one reason I would want to eschew this approach is so that I can learn a
more rigorous answer. I'm sure this is not what's being done in gcc, and
since the point of the project is to learn, might as well do that.

3) If that doesn't help, at the start of your program allocate a big
chunk of memory for names. When you are done with all names, free
all the memory for names in a single call to free. You specify a
buffer of say, 256K. You allocate names in this buffer. When you are
done with the parsing of names, you free the buffer, not caring about
the individual names.
If you run out of space, you can allocate those buffers in a linked
list.

This last one also seems very possible. I figured I could declare the name
attribute of yylval to be a character array of size 256 (I don't think I
will have longer variables). Then I could simply use strcpy to copy the
string found by lex to that array, and later read from it.

I am nevertheless going to ask this same question in comp.compilers to
see whether they have any other ideas. It seems like bullets 2) and 3) of
your answer will be what I implement.

Thanks for your answers,
Berk
 
C

CBFalconer

Berk said:
I never actually thought of using strdup for this problem. Thanks
for pointing it out. It actually does exactly the same thing as
my copy_name(), except for the free() call. If I end up not using
the free(), I can directly call strdup.

What free()? Neither the original, nor my replacement, used it.

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
<http://www.securityfocus.com/columnists/423>
<http://www.aaxnet.com/editor/edit043.html>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discover of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews
 
R

Richard

CBFalconer said:
sizeof (char) is 1 by definition, and can be omitted. Avoid //
comments in Usenet, they are foulable by line wraps. Check the
result of malloc. #include <stdlib.h>. A simpler version is:

char *copy_name(char *yy) {
char *dst;

if (dst = malloc(1 + strlen(yy))) strcpy(dst, yy);

This indentation is strongly frowned upon on any codebase I have worked
on where debuggers might be used to locate problems or walk programmers
through the code. It is too easy to miss the result of the
comparison. Much nicer to read and easier to debug is:

if (dst = malloc(1 + strlen(yy))){
strcpy(dst, yy);
}
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads

Lex / Yacc problem 1
Lex/Yacc and multiple input files 2
yacc trouble 2
a yacc problem, help! 1
yacc 4
Does ANSI C allow free() to always be empty? 38
Calling free() causes weird crash. 2
lex and yacc 3

Members online

Forum statistics

Threads
473,769
Messages
2,569,582
Members
45,065
Latest member
OrderGreenAcreCBD

Latest Threads

Top