Code Review? memory management in C

P

Pedro Graca

I have a file with different ways to write numbers

---- 8< (cut) --------
0: zero, zilch,, nada, ,,,, empty , void, oh
1: one
7: seven
2: two, too
---- >8 --------------

I wanted to read that file and put it into dynamic memory, like

char *** synonym;
/* assume everything is already malloc'd */
synonym[0][0] = "zero";
synonym[0][1] = "zilch";
/* ... */
synonym[4][0] = NULL;
/* ... */
synonym[7][0] = "seven";
synonym[7][1] = NULL;
synonym[8] = NULL;

After a few days (mostly nights) of struggling with it, the part of the
program that does that is done.

I welcome the approaching weekend to give it a rest before coding the
real reason for the program :)


Is it ok if I post 200 lines of newbie code, spread over three files,
for a code review on either alt.comp.lang.learn.c-c++ or comp.lang.c?
 
M

Mike Wahler

Pedro Graca said:
I have a file with different ways to write numbers

---- 8< (cut) --------
0: zero, zilch,, nada, ,,,, empty , void, oh
1: one
7: seven
2: two, too
---- >8 --------------

I wanted to read that file and put it into dynamic memory, like

char *** synonym;
/* assume everything is already malloc'd */
synonym[0][0] = "zero";
synonym[0][1] = "zilch";
/* ... */
synonym[4][0] = NULL;
/* ... */
synonym[7][0] = "seven";
synonym[7][1] = NULL;
synonym[8] = NULL;

After a few days (mostly nights) of struggling with it, the part of the
program that does that is done.

I welcome the approaching weekend to give it a rest before coding the
real reason for the program :)


Is it ok if I post 200 lines of newbie code, spread over three files,
for a code review on either alt.comp.lang.learn.c-c++ or comp.lang.c?

IMO that is a but much for a newsgroup post. I suggest
you put it on a web site, then post the URL here with
your comments/questions. Then those who are not interested
don't need to spend the time downloading, and those who are,
can.

-Mike
 
P

Pedro Graca

Mike said:
IMO that is a but much for a newsgroup post. I suggest
you put it on a web site, then post the URL here with
your comments/questions. Then those who are not interested
don't need to spend the time downloading, and those who are,
can.

Ok. I posted the code to
<http://hexkid.blogspot.com/2006/02/memory-management-in-c.html>

My real question is if the program does not have memory leaks or some
other bad things with the allocation calls.

Other than that, any other hints/tips on how to do things better would
be very much appreciated.


PS. I had to tweak the code a bit (the '<'s and the '&'s) to have it
display properly. Hope it doesn't become an uncompilable mess for people
who copy/paste it to their computers.
 
M

Michael Mair

F'up2 clc

Pedro said:
Ok. I posted the code to
<http://hexkid.blogspot.com/2006/02/memory-management-in-c.html>

My real question is if the program does not have memory leaks or some
other bad things with the allocation calls.

Other than that, any other hints/tips on how to do things better would
be very much appreciated.

I will comment on your main() function, so I quote that part here:
/* main.c
*
* Pedro Graca, 2006-02-09
* This code is in the public domain
* */

#include <stdio.h>
#include <stdlib.h>
#include "selfref.h"

int main(int argc, char * argv[])

char ** argv
is clearer.
{
char *** synonyms = NULL;
FILE * fp;
char line[MAX_LINE_LENGTH + 1];

if (argc < 2) {
fprintf(stderr, "Usage: %s FILENAME\n", argv[0]);

Note: argv[0] can be NULL. Make this
argv[0] ? argv[0] : said:
exit(EXIT_FAILURE);
}

Here, you have finished validating the input. Now, the real
thing starts. Put it into a function (with const char *
parameter) and call this function.
fp = fopen(argv[1], "r");
if (!fp) {
fprintf(stderr, "Unable to open file.\n");
exit(EXIT_FAILURE);
}
while (fgets(line, MAX_LINE_LENGTH, fp)) {
if (parse_line(line, &synonyms))
fprintf(stderr, "Oops\n");
}
fclose(fp);

Note: If the file is completely empty, synonyms may still
be NULL here.
The same for errors in parse_line().

Obviously (by looking below, too), parse_line() does 2 things:
- If synonyms==NULL: Allocate synonyms (and maybe set
synonyms[0]=NULL); set some internal "line number" counter to 0.
- Always: replace synonyms[line number] by a pointer to the
parsed stuff, increase line number, set synonyms[line number]=NULL

This is one thing too much.
Break this down into
1) initialisation,
2) parsing, and
3) adding another item to the synonyms array.
{ /* validate program */
int nidx=0, syn_idx;

while (synonyms[nidx]) {

If synonyms is NULL, this will go terribly wrong.
syn_idx = 0;
printf("%d: ", nidx);
while (synonyms[nidx][syn_idx]) {
printf("%s", synonyms[nidx][syn_idx]);
++syn_idx;
if (synonyms[nidx][syn_idx])
printf(", ");
}
++nidx;
printf("\n");
}
}

Make this a function of its own.

{ /* free memory */
int nidx=0, syn_idx;

while (synonyms[nidx]) {
syn_idx = 0;
while (synonyms[nidx][syn_idx]) {
free(synonyms[nidx][syn_idx]);
++syn_idx;
}
free(synonyms[nidx][syn_idx]);
free(synonyms[nidx]);
++nidx;
}
free(synonyms[nidx]);

You forget to free(synonyms);

Rule of thumb: The role that allocates the stuff, frees the
stuff. Provide a dedicated allocation handling and an
accompanying deallocation function.
#ifdef _WIN32
printf("Press <ENTER>\n");
getchar();
#endif

Fair enough.
return 0;
}

All in all, rather nice.
Work at the design and break things down into easily testable
functions. Step 2: Have a test for these functions.
PS. I had to tweak the code a bit (the '<'s and the '&'s) to have it
display properly. Hope it doesn't become an uncompilable mess for people
who copy/paste it to their computers.

Provide a "nice to look at" as well as a download version.

Note: Consider using ggets()/fggets() for reading the lines.
These are non-standard but there is at least one public domain
version (by Chuck Falconer).
Advantage: No line length restriction, automatic allocation of
the memory for the line (which means no unnecessary copying in
your case).

I have looked only shortly at the actual selfref.c and
parse_line(). Do not try to save lines by putting if and
the statement following if (condition) in one line.
This easily can obscure things. Especially if one loses
indentation copying over your source from the browser.
The same holds for loops etc, too.


Cheers
Michael
 
F

Flash Gordon

Pedro Graca wrote:

Ok. I posted the code to
<http://hexkid.blogspot.com/2006/02/memory-management-in-c.html>

My real question is if the program does not have memory leaks or some
other bad things with the allocation calls.

Other than that, any other hints/tips on how to do things better would
be very much appreciated.

OK, I'll go through it and paste in the bits that I have comments on
together with the comments.
PS. I had to tweak the code a bit (the '<'s and the '&'s) to have it
display properly. Hope it doesn't become an uncompilable mess for people
who copy/paste it to their computers.

Yes, you do have to escape certain characters, but people get the
correct thing if they cut the text from the page and paste it to an
editor, so that is fine.

Not for code snippets and comments on them:

| if (argc < 2) {
| fprintf(stderr, "Usage: %s FILENAME\n", argv[0]);
| exit(EXIT_FAILURE);
| }

argc could be 0 and argv[0] a null pointer, so you should check for and
handle that.

| char line[MAX_LINE_LENGTH + 1];
| ...
| while (fgets(line, MAX_LINE_LENGTH, fp)) {
| if (parse_line(line, &synonyms)) fprintf(stderr, "Oops\n");
| }

fgets expects the size of the buffer rather than the size of the buffer
- 1, so a simple:
while (fgets(line, sizeof line, fp)) {

You definitely have memory leaks. You have a couple of bits of code of
the form:
| words_list = malloc(sizeof *words_list); /* TODO: error checking */
| words = 0;
|
| ... (stuff that does not involve either word_list or words) ...
|
| while (buf[bufi]) {
|
| ... (stuff that does not involve either word_list or words) ...
|
| words_list[words] = word; /* words will still be 0 here */
| ++words;

So the first time through this while loop you overwrite the pointer
generated by the first malloc.

You need more error checking on your passing. I tried it on an invalid
file (one of the source files in fact!) and it gave no output at all.

As you note, you need error checking on the allocations.

I spotted the following:
| word = malloc(w_len + 1); /* TODO: error checking */
| strncpy(word, &(buf[w_start]), w_len);
| word[w_len] = 0;
There is no point using strncpy IMHO, you might as well use memcpy.

I've not checked the code thoroughly so there may well be other things.
 
P

pete

Michael said:
F'up2 clc

Pedro Graca wrote:
int main(int argc, char * argv[])

char ** argv
is clearer.

int main(int argc, char *argv[])
.... is the form that I use,
because I copied it straight from the standard.
if (argc < 2) {
fprintf(stderr, "Usage: %s FILENAME\n", argv[0]);

Note: argv[0] can be NULL. Make this
argv[0] ? argv[0] : "<program>"
or similar.

I use puts instead of fprintf for that.

#define ARGV_0 "car"
#define OUTPUT ARGV_0 ".txt"

if (argc > 1) {

} else {
puts("Usage:\n >" ARGV_0 " <PART_FILE.txt> ...\n\n"
}
 
K

Keith Thompson

pete said:
Michael said:
F'up2 clc

Pedro Graca wrote:
int main(int argc, char * argv[])

char ** argv
is clearer.

int main(int argc, char *argv[])
... is the form that I use,
because I copied it straight from the standard.

Yes, the standard uses "char *argv[]", which is by definition
equivalent to "char **argv".

The [] form is intended to document the fact that the pointer argument
is a pointer to an array rather than to a single object.
Unfortunately, there's no enforcement of this -- I've never heard of a
compiler that even issues a warning based on this. (I'll note that
the language also supports comments, which can be used for just this
kind of thing.)

Personally, I prefer to use "char **argv" while keeping in mind C's
(admittedly overly complicated) rules about arrays and pointers. In
my opinion, allowing "char *argv[]" to be a synonym for "char **argv",
but only in the context of a parameter declaration, causes more
confusion than it cures. But that is, as I said, just my opinion, and
plenty of smart people unaccountably disagree with me. :cool:}
 
C

CBFalconer

Michael said:
.... snip ...

Note: Consider using ggets()/fggets() for reading the lines.
These are non-standard but there is at least one public domain
version (by Chuck Falconer).
Advantage: No line length restriction, automatic allocation of
the memory for the line (which means no unnecessary copying in
your case).

Available at <http://cbfalconer.home.att.net/download/ggets.zip>

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>
 
P

pete

Keith said:
pete said:
Michael said:
F'up2 clc

Pedro Graca wrote:
int main(int argc, char * argv[])

char ** argv
is clearer.

int main(int argc, char *argv[])
... is the form that I use,
because I copied it straight from the standard.

Yes, the standard uses "char *argv[]", which is by definition
equivalent to "char **argv".

The [] form is intended to document the fact that the pointer argument
is a pointer to an array rather than to a single object.
Unfortunately, there's no enforcement of this -- I've never heard of a
compiler that even issues a warning based on this. (I'll note that
the language also supports comments, which can be used for just this
kind of thing.)

Personally, I prefer to use "char **argv" while keeping in mind C's
(admittedly overly complicated) rules about arrays and pointers. In
my opinion, allowing "char *argv[]" to be a synonym for "char **argv",
but only in the context of a parameter declaration, causes more
confusion than it cures. But that is, as I said, just my opinion, and
plenty of smart people unaccountably disagree with me. :cool:}

The only reason that I write main
with an array looking parameter
is because it's shown that way in the standard.
I don't write any other functions that way.

I also use lower case str() and xstr() macros,
instead of upper case,
just because that's the way they are in the standard also.
 
T

Thad Smith

Pedro said:
> I have a file with different ways to write numbers
>
> ---- 8< (cut) --------
> 0: zero, zilch,, nada, ,,,, empty , void, oh
> 1: one
> 7: seven
> 2: two, too
> ---- >8 --------------

and said:
Ok. I posted the code to
<http://hexkid.blogspot.com/2006/02/memory-management-in-c.html>

My real question is if the program does not have memory leaks or some
other bad things with the allocation calls.

Yes, I think there is a memory leak if you have two lines with the same
number:
5, five
5, cinco

The code does too many reallocs for my taste, but should function OK.

I appreciate your attempt to document the parameters for the functions.
I suggest you document the return values as well.

The pointer to data allocated for the first line will be overwritten by
the second.
 
P

Pedro Graca

Pedro said:
Ok. I posted the code to
<http://hexkid.blogspot.com/2006/02/memory-management-in-c.html>

My real question is if the program does not have memory leaks or some
other bad things with the allocation calls.


Thank you all for your comments. I'm going to implement your suggestions
and will post the result.

For now I replaced all malloc(), realloc(), and free() calls for my own
versions. The fact that the program works as expected on two different
systems/compilers does not mean it is correct. I had a *very* large
number of leaks. Thanks again for pointing me to them.

Hopefully, someday, pmg_memory_used() will report 0 at the end of the
program whatever the input was :)



/* dynmem.c
*
* Pedro Graca, 2006-02-10
* This code is in the public domain.
* */

#include <stdio.h>
#include <stdlib.h>
#include "dynmem.h"

/* This is just an indication and does not pretend to be
* very thrustworthy. In some cases (programs with bugs)
* it can be negative. That is why it isn't a size_t;
* and the code has casts involving sizes and `bytes_in_use`
* */
static int bytes_in_use = 0;

void * pmg_memory_malloc(size_t size, const char * msg) {
void * p = malloc(size);
if (p) {
bytes_in_use += (int)size; /* YES! I know what I'm doing. */
if (msg) {
printf("DBG: +%d bytes allocated (%s).\n", (int)size, msg);
}
}
return p;
}

void * pmg_memory_realloc(void * p,
size_t new_size,
size_t old_size,
const char * msg)
{
void * q;
q = realloc(p, new_size);
if (q) {
bytes_in_use += (int)new_size - (int)old_size; /* YES! I know what I'm doing. */
if (msg) {
printf("DBG: (+%d -%d) %d bytes allocated (%s).\n",
(int)new_size,
(int)old_size,
(int)new_size - (int)old_size, msg);
}
}
return q;
}

void pmg_memory_free(void * p, size_t size, const char * msg) {
bytes_in_use -= (int)size; /* YES! I know what I'm doing. */
if (msg) {
printf("DBG: -%d bytes allocated (%s).\n", (int)size, msg);
}
free(p);
}

int pmg_memory_used(void) {
return bytes_in_use;
}
 
E

Emmanuel Delahaye

Pedro Graca a écrit :
My real question is if the program does not have memory leaks or some

Well...

FRMWRK.DBG_SYSALLOC=1
SYSALLOC Overload (85 rec)
SYSALLOC Successful initialization: 85 records available
0: zero, zilch, nada, empty, void, oh
1: one
2: two, too
3:
4:
5:
6:
7: seven
Press <ENTER>

SYSALLOC min=4294967295 max=4294967295 delta=0
SYSALLOC Err: Not-matched list:
SYSALLOC Bloc 003D26B0 (36 bytes) realloc'ed at line 59 of 'selfref.c'
not freed

SYSALLOC Bloc 003D25D8 (4 bytes) malloc'ed at line 56 of 'selfref.c' not
freed
SYSALLOC Bloc 003D2608 (4 bytes) malloc'ed at line 56 of 'selfref.c' not
freed
SYSALLOC Bloc 003D2628 (4 bytes) malloc'ed at line 56 of 'selfref.c' not
freed
SYSALLOC Bloc 003D26A0 (4 bytes) malloc'ed at line 56 of 'selfref.c' not
freed
SYSALLOC Released Memory
FRMWRK.Termine

Press ENTER to continue.
 
F

Flash Gordon

Pedro said:
Thank you all for your comments. I'm going to implement your suggestions
and will post the result.

For now I replaced all malloc(), realloc(), and free() calls for my own
versions. The fact that the program works as expected on two different
systems/compilers does not mean it is correct. I had a *very* large
number of leaks. Thanks again for pointing me to them.

The wrapper functions are a very good idea.

<OT>
If you use Linux you might want to investigate valgrind, a very useful
tool for checking for memory leaks amongst other things. However, this
is not the place to discus valgrind.
Hopefully, someday, pmg_memory_used() will report 0 at the end of the
program whatever the input was :)

Working through the code with pen and paper can be a very good way to
track such things.
/* dynmem.c
*
* Pedro Graca, 2006-02-10
* This code is in the public domain.
* */

#include <stdio.h>
#include <stdlib.h>
#include "dynmem.h"

/* This is just an indication and does not pretend to be
* very thrustworthy. In some cases (programs with bugs)
* it can be negative. That is why it isn't a size_t;
* and the code has casts involving sizes and `bytes_in_use`
* */
static int bytes_in_use = 0;

void * pmg_memory_malloc(size_t size, const char * msg) {
void * p = malloc(size);
if (p) {
bytes_in_use += (int)size; /* YES! I know what I'm doing. */

You don't need the cast. Also, since this is a debugging aid, I would
suggest testing size against INT_MAX from limits.h or some arbitrary
lower value and reporting a problem if a ridiculously large allocation
is made?
if (msg) {
printf("DBG: +%d bytes allocated (%s).\n", (int)size, msg);

It would be a good idea to use fprintf and send the message to stderr.
This will probably allow you to use some implementation specific trick
to send debugging output and errors to a file whilst normal output goes
to the screen. For example, on Linux I would run the program like this:

../prog test_file 2> err.txt

BTW, although the earlier cast was not required, this one was. Although
casting to unsigned long and using a format specifier of %ul would be
better. C99 has an even better method!

You might want an else here and report the malloc failure.
return p;
}

void * pmg_memory_realloc(void * p,
size_t new_size,
size_t old_size,
const char * msg)

Later on, you might want to be clever and store the sizes somewhere and
then look them up here rather than relying on passing in the correct old
size. One of the other regulars has written a hashing library which
might be useful as a way of storing allocation sizes against pointers
and then looking them up efficiently.
{
void * q;
q = realloc(p, new_size);
if (q) {
bytes_in_use += (int)new_size - (int)old_size; /* YES! I know what I'm doing. */

This time you did need to cast since you don't want to use unsigned
arithmetic :)
if (msg) {
printf("DBG: (+%d -%d) %d bytes allocated (%s).\n",
(int)new_size,
(int)old_size,
(int)new_size - (int)old_size, msg);
}
}
return q;

Also, same comments as for your malloc wrapper.
}

void pmg_memory_free(void * p, size_t size, const char * msg) {
bytes_in_use -= (int)size; /* YES! I know what I'm doing. */

Here you did not need the cast.

Also, if you followed my suggesting of storing pointers and sizes this
would allos you to test whether it was a valid pointer you were trying
to free.
if (msg) {
printf("DBG: -%d bytes allocated (%s).\n", (int)size, msg);
}
free(p);
}

int pmg_memory_used(void) {
return bytes_in_use;
}

A good first pass at wrapping malloc and friends.
 
P

Pedro Graca

Pedro said:
Thank you all for your comments. I'm going to implement your suggestions
and will post the result.

As promised, new revamped code integrating most of your
suggestions is now posted to
<http://hexkid.blogspot.com/2006/02/memory-management-revisited.html>

Or you can download the source (except CBFalconer's ggets) from
selfref.zip <http://www.yourfilelink.com/get.php?fid=26471>
selfref.tar.gz <http://www.yourfilelink.com/get.php?fid=26475>
selfref.tar.bz2 <http://www.yourfilelink.com/get.php?fid=26477>

If you want to have a look at it, maybe you can find further
suggestions and comments about the code.
I don't have any specific questions about the code, but any
tip, comment, suggestion, ... is very much appreciated.

I want to thank you again for the previous comments.



* Michael Mair
Note: argv[0] can be NULL.

Hmmm ... I have to remember this.

Isn't argc guaranteed to be at least 1?
Is it possible to reference argv[1] and invoke UB?

#include <stddef.h> /* NULL */
int main(int argc, char ** argv) {
/* will the following line work even when argv[0] is NULL? */

if (argv[1] == NULL) { /* UB if argc == 0 */
/* no parameters passed to program */
}
return 0;
}


* Thad Smith
I appreciate your attempt to document the parameters for the functions.
I suggest you document the return values as well.

I know they are very important and I should have thought about
documenting them before posting the code.


* Flash Gordon
The wrapper functions are a very good idea.

<OT>
If you use Linux you might want to investigate valgrind, a very useful
tool for checking for memory leaks amongst other things. However, this
is not the place to discus valgrind.

Installed! Reading the man page ...
My latest version does not report any leaks.
Later on, you might want to be clever and store the sizes somewhere and
then look them up here rather than relying on passing in the correct old
size. One of the other regulars has written a hashing library which
might be useful as a way of storing allocation sizes against pointers
and then looking them up efficiently.

I did that -- I like to think I'm clever :)

The way it is now, it's not efficient and it's locked to a certain
number of pointers. I guess I could revamp the code to use allocated
memory instead, but that would just (possibly) create problems for
the allocation/deallocation within 'dynmem.c'. I feel I need some
more experience before trying for resource (and speed) efficiency
at this level.
 
K

Keith Thompson

Pedro Graca said:
* Michael Mair
Note: argv[0] can be NULL.

Hmmm ... I have to remember this.

Isn't argc guaranteed to be at least 1?

No. The standard says "The value of argc shall be nonnegative."
Is it possible to reference argv[1] and invoke UB?

Sure. If argc is 0, referencing argv[1] invokes UB. If argc is 1,
*dereferencing* argv[1] invokes UB (because argv[argc] is guaranteed
to be a null pointer).
 
S

Sjouke Burry

Keith said:
* Michael Mair
Note: argv[0] can be NULL.

Hmmm ... I have to remember this.

Isn't argc guaranteed to be at least 1?


No. The standard says "The value of argc shall be nonnegative."

Is it possible to reference argv[1] and invoke UB?


Sure. If argc is 0, referencing argv[1] invokes UB. If argc is 1,
*dereferencing* argv[1] invokes UB (because argv[argc] is guaranteed
to be a null pointer).
Was not the next argument beyond argc count to point
to an empty string??? Or so I heard...
 
K

Keith Thompson

Sjouke Burry said:
Keith said:
* Michael Mair

Note: argv[0] can be NULL.

Hmmm ... I have to remember this.

Isn't argc guaranteed to be at least 1?
No. The standard says "The value of argc shall be nonnegative."
Is it possible to reference argv[1] and invoke UB?
Sure. If argc is 0, referencing argv[1] invokes UB. If argc is 1,
*dereferencing* argv[1] invokes UB (because argv[argc] is guaranteed
to be a null pointer).
Was not the next argument beyond argc count to point
to an empty string??? Or so I heard...

Nope.

C99 5.1.2.2.1 Program startup:

If they are declared, the parameters to the main function shall
obey the following constraints:

-- The value of argc shall be nonnegative.

-- argv[argc] shall be a null pointer.

-- If the value of argc is greater than zero, the array members
argv[0] through argv[argc-1] inclusive shall contain pointers
to strings, which are given implementation-defined values
by the host environment prior to program startup. The intent
is to supply to the program information determined prior to
program startup from elsewhere in the hosted environment. If
the host environment is not capable of supplying strings with
letters in both uppercase and lowercase, the implementation
shall ensure that the strings are received in lowercase.

-- If the value of argc is greater than zero, the string pointed
to by argv[0] represents the program name; argv[0][0] shall be
the null character if the program name is not available from
the host environment. If the value of argc is greater than
one, the strings pointed to by argv[1] through argv[argc-1]
represent the program parameters.

-- The parameters argc and argv and the strings pointed to by
the argv array shall be modifiable by the program, and retain
their last-stored values between program startup and program
termination.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top