Want to optimize this procedure, any advice?

M

Michael B.

I'm still learning C so I've written a simple app which lets you make a
contact list (stored as a linked list of structs), write it to a file, and
read it back. It works fine, but I notice in my loading procedure there's
a point or two where I'll malloc a struct which I then don't need, and
dispose of it; I was wondering if there was some way I could optimize this
kludge out of existence:

void loadContacts () { int fd, bytesread; if ( (fd = open("contacts.data", O_RDONLY, 0) ) != -1 ) { struct contact *prev, *curr = (struct contact *)malloc(sizeof(struct contact)); if ( curr ) { if ( ( bytesread = read(fd, curr, sizeof(struct contact))) == sizeof(struct contact) ) { first.next = curr; // First is a global variable which is the first struct in the linked list } else { free (curr); // Can I avoid having to do this? } do { prev = curr; curr = (struct contact *)malloc(sizeof(struct contact)); prev->next = curr; } while ( ( bytesread = read(fd, curr, sizeof(struct contact))) == sizeof(struct contact) ); if ( bytesread < sizeof(struct contact) ) { prev->next = NULL; free(curr); // Likewise, this? } } else { perror("malloc.\n"); exit(1); } close (fd); }}
 
B

Ben Pfaff

Michael B. said:
void loadContacts () { int fd, bytesread; if ( (fd = open("contacts.data", O_RDONLY, 0) ) != -1 ) { struct contact *prev, *curr = (struct contact *)malloc(sizeof(struct contact)); if ( curr ) { if ( ( bytesread = read(fd, curr, sizeof(struct contact))) == sizeof(struct contact) ) { first.next = curr; // First is a global variable which is the first struct in the linked list } else { free (curr); // Can I avoid having to do this? } do { prev = curr; curr = (struct contact *)malloc(sizeof(struct contact)); prev->next = curr; } while ( ( bytesread = read(fd, curr, sizeof(struct contact))) == sizeof(struct contact) ); if ( bytesread < sizeof(struct contact) ) { prev->next = NULL; free(curr); // Likewise, this? } } else { perror("malloc.\n"); exit(1); } close (fd); }}

I recommend that you learn to use line breaks before worrying
about optimization.
 
M

Michael B.

I recommend that you learn to use line breaks before worrying
about optimization.

Oh my. It looks properly formatted in Pan which I used to post it.
I'm not quite sure what's wrong but it definitely shouldn't have been all
on one line!
 
M

Michael Baehr

Here's an attempt to repost using GNUs. If this doesn't work, I
don't know what will.

void loadContacts () {

int fd, bytesread;

if ( (fd = open("contacts.data", O_RDONLY, 0) ) != -1 ) {

struct contact *prev, *curr = (struct contact *)malloc(sizeof(struct contact));

if ( curr ) {
if ( ( bytesread = read(fd, curr, sizeof(struct contact))) == sizeof(struct contact) ) {
first.next = curr; // First is a global struct which is the first entry in the linked list
} else {
free (curr); // Is there a way to do away with this?
}
do {
prev = curr;
curr = (struct contact *)malloc(sizeof(struct contact));
prev->next = curr;
} while ( ( bytesread = read(fd, curr, sizeof(struct contact))) == sizeof(struct contact) );
if ( bytesread < sizeof(struct contact) ) {
prev->next = NULL;

free(curr); // Likewise, this?
}

} else {
perror("malloc.\n");
exit(1);

}

close (fd);
}
}
 
J

Jens.Toerring

Michael B. said:
I'm still learning C so I've written a simple app which lets you make a
contact list (stored as a linked list of structs), write it to a file, and
read it back. It works fine, but I notice in my loading procedure there's
a point or two where I'll malloc a struct which I then don't need, and
dispose of it; I was wondering if there was some way I could optimize this
kludge out of existence:
void loadContacts () {
int fd, bytesread;

if ( (fd = open("contacts.data", O_RDONLY, 0) ) != -1 ) {

open() isn't a Standard C function, which makes this slightly off-
topic in clc. Why resort to platform specific functions when there
is a portable function like fopen()?
struct contact *prev, *curr =
(struct contact *)malloc(sizeof(struct contact));

Possibly not a problem, but anyway: You don't have to cast the return
value of malloc() and you probably shouldn't, because it doesn't help
in any way but will keep the compiler from complaining if you forgot
to include <stdlib.h>. It's also usually recommended here to write it
instead as

struct contact *prev, *curr = malloc( sizeof *curr );

Here it doesn't make too much of a difference, but should you ever
change the type of the pointer you only have to change it in a
single place and not all over the place.
if ( curr ) {
if ( ( bytesread = read(fd, curr, sizeof(struct contact)))
== sizeof(struct contact) ) {

Another non-standard function - what's wrong about fread()? More
important is that reading structures as a binary blob will stop to
work when you try to use the data on a different system or, possibly,
if you compile the program with a different compiler or maybe even
just a different version of the same compiler. Try it out: use the
same data file e.g. on an Intel-PC and a machine with a different
architecture (say an IRIX or DEC/Compac or Apple machine) and see
what happens...
first.next = curr; /* First is a global variable which is
the first struct in the linked list
} else {
free (curr); /* Can I avoid having to do this? */

No. If you want to get rid of memory you allocated you have to call
free(). And unless you know in advance if you're going to be able to
read what you want you need some memory just for trying - otherwise
where else would you write the date in case of success? Of course,
you could use a temporary, automatic structure which you copy on
success into memory you then allocate, but I doubt very much that
this would be faster at all. And even if it would it would be in
the region of a few nano-seconds.

If you really want to speed things up a bit a better approach might
be be to make a rough estimate of how many structures you're going
to need, allocate that much memory and us it. If it's not enough
use realloc() to get twice as much memory and repeat if necessary.
At the end call realloc() again to reduce the amount of memory to
what you really did use. That way you would have only a small
number of malloc()/realloc() calls instead of one malloc() call
for each of the structures. But I am quite sure that for the kind
of program you seem to be writing you will never make up the time
you need to implement this by the time yould win in execution speed,
even if you're going to run this program 500 times a day as long as
you live;-)
}
do {
prev = curr;

Now it's getting real dangerous. If you just free()ed 'curr' you
now hold a pointer to something you don't own anymore. In this case
assigning to 'prev' doesn't make sense and then later dereferencing
'prev' would be a bad mistake. So you have to bail out if you had to
free() 'curr' in order to never get here.
curr = (struct contact *)malloc(sizeof(struct contact));

Better make this

curr = malloc( sizeof *cur );

And you forgot to check if malloc() did succeed.
prev->next = curr;
} while ( ( bytesread = read(fd, curr, sizeof(struct contact)))
== sizeof(struct contact) );

Wouldn't it be much easier to read if you use

} while ( ( bytesread = read( fd, curr, sizeof *curr ) )
== sizeof *curr );

Bonus point: If you change the "struct contact" bit it will still work
without any changes.
if ( bytesread < sizeof(struct contact) ) {
prev->next = NULL;
free(curr); /* Likewise, this? */

No. You allocated the memory, so you must call free() when you don't
need it anymore.
}
} else {
perror("malloc.\n");
exit(1);

Make this

exit( EXIT_FAILURE );

so you don't have to worry if 1 does not mean failure on the next
machine you're going to run this program on.
}
close (fd);
}
}

Finally, next time you post some code make sure that a) there are
correct linefeeds, b) remove all tab characters, c) use '/* */'
instead of '//' comments (then other people can still copy and past
the code even if your lines get too long) and d) keep your lines
below 80 (but better less) characters, thank you.

Regards, Jens
--
_ _____ _____
| ||_ _||_ _| (e-mail address removed)-berlin.de
_ | | | | | |
| |_| | | | | | http://www.physik.fu-berlin.de/~toerring
\___/ens|_|homs|_|oerring
 
B

Ben Pfaff

Michael Baehr said:
void loadContacts () {

int fd, bytesread;

if ( (fd = open("contacts.data", O_RDONLY, 0) ) != -1 ) {

This is not a standard C function. It is a POSIX extension. If
you want to talk about POSIX, comp.unix.programmer would be a
better place. comp.lang.c is for discussion of programming in
standard C.
struct contact *prev, *curr = (struct contact *)malloc(sizeof(struct contact));

I don't recommend casting the return value of malloc():

* The cast is not required in ANSI C.

* Casting its return value can mask a failure to #include
<stdlib.h>, which leads to undefined behavior.

* If you cast to the wrong type by accident, odd failures can
result.

When calling malloc(), I recommend using the sizeof operator on
the object you are allocating, not on the type. For instance,
*don't* write this:

int *x = malloc (sizeof (int) * 128); /* Don't do this! */

Instead, write it this way:

int *x = malloc (sizeof *x * 128);

There's a few reasons to do it this way:

* If you ever change the type that `x' points to, it's not
necessary to change the malloc() call as well.

This is more of a problem in a large program, but it's still
convenient in a small one.

* Taking the size of an object makes writing the statement
less error-prone. You can verify that the sizeof syntax is
correct without having to look at the declaration.
if ( curr ) {
if ( ( bytesread = read(fd, curr, sizeof(struct contact))) == sizeof(struct contact) ) {

Ditto for read() as for open(). You're not using it properly
anyhow--for reliability you handle signal interruptions.
Actually that goes for open() too but even more so for read().
first.next = curr; // First is a global struct which is the first entry in the linked list
} else {
free (curr); // Is there a way to do away with this?

It is a waste of time to try to get rid of the free(). Chances
are that the disk read and the system call (or your system's
equivalent) to request the disk read take much, much more time
than a free().

A much better way to optimize the code would be to use fopen()
and fread() from the standard C library. Those functions will
buffer the I/O, resulting in fewer system calls, which will quite
possibly speed up your program.
 
M

Michael B.

On Fri, 14 Nov 2003 00:48:12 +0000, Jens.Toerring wrote:

<snip>

Thanks for your advice and also for your clarification on what
belongs in comp.lang.c.

It's not September, but this is my first time posting in this NG.

Have a nice day :)
 
K

Kelsey Bjarnason

I'm still learning C so I've written a simple app which lets you make a
contact list (stored as a linked list of structs), write it to a file, and
read it back. It works fine, but I notice in my loading procedure there's
a point or two where I'll malloc a struct which I then don't need, and
dispose of it; I was wondering if there was some way I could optimize this
kludge out of existence:

void loadContacts () {

int fd, bytesread;

if ( (fd = open("contacts.data", O_RDONLY, 0) ) != -1 ) {

open isn't C. Try fopen.
struct contact *prev, *curr = (struct contact *)malloc(sizeof(struct contact));

Don't cast the return of malloc. Also, you're allocating a single
structure; why not skip allocating entirely and just use struct contact
curr; ?
if ( curr ) {
if ( ( bytesread = read(fd, curr, sizeof(struct contact))) ==

read isn't C.
sizeof(struct contact) ) {
first.next = curr; // First is a global variable which is the first
struct in the linked list
} else {
free (curr); // Can I avoid having to do this?

You probably should. Look what you've done:

next = curr;
free( curr );

What does next point to now? Right - a block of memory that's been freed.
Almost certainly a bug.

<end critique; there may be more issues.>
 
E

Ekkehard Morgenstern

Hi Michael,

Michael B. said:
I was wondering if there was some way I could optimize this

The read() call definitely will take longer than a malloc() or free() call.

So, the only way to save time is to read the file in larger chunks, either
by using buffered I/O ( fopen() / fread() ) which reads a bigger portion of
the file, or by reading the entire file and then extracting the data. After
you've done this you can optimize further by allocating larger blocks of
memory (for holding more structures than just one).

I hope that helps.

Regards,
Ekkehard Morgenstern.
 
M

Mark

Michael Baehr said:
void loadContacts () {
<snip>

As with anything, there are tradeoffs. You should always follow what
I consider to be the golden rules of programming:

1. Make it work.
2. Make it good.
3. Make it fast (optional).

These steps are intended to be completed in order. You have completed
#1, but you have not yet completed #2. You could attempt #3, but
honestly, there is not much point.

In comp.lang.c, #2 means "Make it portable and clean". All the world
is not a unix. Even though quality of implementation varies from
platform to platform, the good folks that wrote your C standard
library are generally very competent and have gone the extra mile to
get maximum performance. (Also it is worth noting that you should
take advantage of a profiler, if available, to achieve #3, as people
often misdiagnose where the bottlenecks in a program actually are at)

It is because of these things that I recommend for you to use standard
C library calls rather than POSIX calls, which are a level too low for
what you are doing. For example, fread() will generally do buffering,
while read() will not. "Well, I'll do my own buffering", you may say.
But which buffer size is the best tradeoff for your implementation?
Most likely the guys that wrote your C library are going to have a
better answer than you.

So, in no particular order, here are my recommendations for your code:

1. Use fopen() (C library) rather than open() (POSIX). Use fread()
rather than read(). Use fclose() rather than close().

2. The pointer to the first element of a list is generally referred
to as the 'head'. You may very well confuse people by calling it
'first', because they will wonder "did he have a reason for not
calling it 'head'?"

3. There are generally very few circumstances where use of a do loop
is the best thing. When you find them, they really tend to jump out
at you. I don't believe that this is one of those circumstances. Use
for here.


You should clean your program up and repost it here for criticism
before you attempt to optimize it. Make it as clean as possible.


Regards,

Mark Haigh
(e-mail address removed)
 
A

Al Bowers

Michael said:
I'm still learning C so I've written a simple app which lets you make a
contact list (stored as a linked list of structs), write it to a file, and
read it back. It works fine, but I notice in my loading procedure there's
a point or two where I'll malloc a struct which I then don't need, and
dispose of it; I was wondering if there was some way I could optimize this
kludge out of existence:

void loadContacts () {

int fd, bytesread;

if ( (fd = open("contacts.data", O_RDONLY, 0) ) != -1 ) {

struct contact *prev, *curr = (struct contact *)malloc(sizeof(struct contact));

if ( curr ) {
if ( ( bytesread = read(fd, curr, sizeof(struct contact))) == sizeof(struct contact) ) {
first.next = curr; // First is a global variable which is the first struct in the linked list
} else {
free (curr); // Can I avoid having to do this?
}
do {
prev = curr;
curr = (struct contact *)malloc(sizeof(struct contact));
prev->next = curr;
} while ( ( bytesread = read(fd, curr, sizeof(struct contact))) == sizeof(struct contact) );
if ( bytesread < sizeof(struct contact) ) {
prev->next = NULL;

free(curr); // Likewise, this?
}

} else {
perror("malloc.\n");
exit(1);

}

close (fd);
}
}

Creating functions for doubly linked lists, for reading and writing
them to a file, is a big chore for someone not experience in the
language. Looking at your posted function, I see that you must have
some understanding. However, the function design and logic flow
is prone for error. You need to evaluate your design and start
using ways to make a function do a specific task, and be useful
in communicating information with other functions. You can do this
by designing your function with parameters and a return value.

You need to redo the function and use the Standard C functions
fopen, fread, and fwrite in place of functions open, read and write.

Here is an example of how you might design various functions that
work together to reach your goal. Doing it this way makes managing
each function, and debugging, much easier.

This example creates a linked list, writes and reads as a binary file.
A good exercise would be to change the functions to read and write
the data in a text file. Then, if you are going to have a large
amount of data, you can look into ways of optimizing, such as
reading, writing, and allocating in chunks.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

typedef struct CONTACT
{
char name[40];
char addr[40];
struct CONTACT *previous;
struct CONTACT *next;
} CONTACT;


CONTACT *CreateNODE(const char *name, const char *addr);
void PrintLIST(CONTACT *p);
void InsertLink(CONTACT **root, CONTACT *src);
void PrintNODE(CONTACT *p);
void FreeLIST(CONTACT **root);
int ListToFile(CONTACT *list, const char *filename);
CONTACT *FileToList(const char *filename);

int main(void)
{
CONTACT *Listhead = NULL, *tmp;

if((tmp = CreateNODE("George Washington","White House")) == NULL)
exit(EXIT_FAILURE);
InsertLink(&Listhead,tmp);
if((tmp = CreateNODE("Bill Clinton","Jail House")) == NULL)
exit(EXIT_FAILURE);
InsertLink(&Listhead,tmp);
if((tmp = CreateNODE("George Bush","Hot House")) == NULL)
exit(EXIT_FAILURE);
InsertLink(&Listhead,tmp);
PrintLIST(Listhead);
if(ListToFile(Listhead, "test.txt"))
puts("\nWrote the file to disk\n");
FreeLIST(&Listhead);
if((Listhead = FileToList("test.txt")) != NULL)
puts("\nRead the File into List\n");
PrintLIST(Listhead);
FreeLIST(&Listhead);
return 0;
}

CONTACT *CreateNODE(const char *name, const char *addr)
{
CONTACT *p;

if((p = malloc(sizeof *p)) == NULL) return NULL;
strncpy(p->name,name,sizeof p->name);
p->name[39] = '\0';
strncpy(p->addr,addr,sizeof p->addr);
p->addr[39] = '\0';
p->next = p->previous = NULL;
return p;
}


void PrintNODE(CONTACT *p)
{
printf("Name: %s\nAddress: %s\n\n",p->name,p->addr);
return;
}

void InsertLink(CONTACT **root, CONTACT *src)
{
CONTACT *tmp = *root;

if(tmp != NULL)
{
for( ; tmp->next; tmp = tmp->next);
src->previous = tmp;
tmp->next = src;
}
else *root = src;
return;
}

void PrintLIST(CONTACT *root)
{
size_t i = 1;
if(root)
{
for( ; root; root = root->next)
{
printf("Link: %u\n",i++);
PrintNODE(root);
}
}
return;
}

void FreeLIST(CONTACT **root)
{
CONTACT *tmp;

for( ; *root; *root = tmp)
{
tmp = (*root)->next;
free(*root);
}
}

int ListToFile(CONTACT *list, const char *filename)
{
FILE *fp;

if(list == NULL) return 0;
if((fp = fopen(filename,"wb")) == NULL) return 0;
for( ; list; list = list->next)
{
if(1 != fwrite(list,sizeof *list,1,fp))
{
remove(filename);
return 0;
}
}
fclose(fp);
return 1;
}

CONTACT *FileToList(const char *filename)
{
CONTACT *root = NULL, tmp , *ptmp;
FILE *fp;
size_t i;

if((fp = fopen(filename, "rb")) == NULL) return NULL;
while((i = fread(&tmp,sizeof tmp, 1, fp)) == 1)
{
if((ptmp = CreateNODE(tmp.name,tmp.addr)) == NULL)
{
FreeLIST(&root);
return NULL;
}
InsertLink(&root,ptmp);
}
if(feof(fp) == 0)
{
FreeLIST(&root);
return NULL;
}
fclose(fp);
return root;
}
 
C

CBFalconer

Ben said:
I recommend that you learn to use line breaks before worrying
about optimization.

The original appeared here with line breaks, but with excessive (8
char) indentation and lines somewhat over 80 chars. Sample below:
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,743
Messages
2,569,478
Members
44,898
Latest member
BlairH7607

Latest Threads

Top