K&R2, exercise 6.4

B

Barry Schwarz

ok, I changed the name to string_copy

Avoid creating function names beginning with str.

snip
getword() will prompt the user and return a useful value.

The getword you provided contained no prompt.

snip
yes, right, anything NULL needs a new node as the word ha snot seen
before.





I though that code will make this:

N1 --> N2 --> N3 ---> ....

It does build a linked list like that. But it always returns the
address of N1 which is what you store in arr_psnodes. At the end each
element of arr_psnodes will contain the address of the first struct
which is not what you want.
what is the meaning of N2a and N2b, it is not a doubly-linked list. I
thought you mean this:

Since you call the same function individually for each new node and
recursively to determine if word already exists, I used this notation
to indicate which invocation of the function the comment applied to.
N1 means the comment applies to the initial call for the first node.
N2a means the comment applies to the initial call for the second node.
N2b means the recursive call for the second node. You should read the
N1 comments first, then the N2a ones, etc.
N2
/ \
/ \
N2a N2b


I simply have a single linked list:


N1 --> N2 --> N3 --> N4 --> ....






I don't know. SInce mt program is quite raw and my understanding of linked
lists and pointers is quite weak, so I was just avoiding any garbage value
by using "else if".

This has nothing to do with linked lists and pointers. Your previous
if checked !match. This else if checked match. If the first if was
true, the else prevents the second if from being evaluated. If the
first if is false, then match must be true and you don't need to test
it.
so, should I return p->next ?

That only solves the problem for the second struct. The problem will
reappear for the third.

One way to solve the problem is to not fill in arr_psnodes until you
are done. Then simply walk through the list saving addresses in
arr_psnodes until p->next is NULL>
that is not what I want. I want to return the address of latest node
created or address of the last node, if the word is already in the list.







I use strcpy from library . Library does not do any check for me?

strdup calls malloc. malloc can fail and strdup checks before calling
strcpy. However, strdup returns the malloc value whether it is valid
or NULL. You will then use this value in both your sort and print
routines which will lead to undefined behavior if they don't check.
Rather than check in multiple places, it is better if the routine that
calls strdup checks immediately and takes appropriate action when
strdup fails.
oh.. I am getting frustrated now.. :(





okay, at least one thing in my program is good but getword() is just
a modification of K&R2's getword, it is not my creation :(



can you provide some hints on coding this program ?

or

you think a doubly-linked list is a good idea ?

No one mentioned doubly linked lists which seems like overkill for
this problem to me.


Remove del for email
 
B

Barry Schwarz

The only way to know that the program works or not is by printing the
accumulated words and their counts.

Have you tried stepping through the code with a debugger?
and of course, as usual with my programs, it compiles fine and Segfaults
on run time. The 1st run Segfaults because I used a dot in input (
comp.lang ) and 2nd run Segfaults on exit (Ctrl-D).

[arnuld@raj C]$ gcc -ansi -pedantic -Wall -Wextra 6-4.c
6-4.c:123: warning: unused parameter 'pp'
[arnuld@raj C]$ ./a.out
Richard Heathfiled
on comp.lang
Segmentation fault
[arnuld@raj C]$ ./a.out
heathfield
okay for C
Segmentation fault





/* K&R2, exercise 6.4
*
* write a program that prints the distinct words in its input sorted into
* decreasing order of frequency of occurence. Precede each word by its count.
*
*
* The basic 2 step design-idea:
*
* (1) we are going to create a linked list and store words in it.
* Each structure will have a count associated with the word it carries. When
* we will see a word which is already in the list then we will increase
the count * by 1. That way we will have a linked-list of unique
structures. *
* (2) we will save a pointer to each element of the linked
list ( which * are structures) in an array of pointers and will sort
these pointers according * to the number of occurences of each word. This
will make program efficient * as we will not be sorting the structures
in linked-list but the pointers * to those structures. *
*
* version 1.1
*
*/




#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>


enum MAXSIZE { ARRSIZE = 1000, WORDSIZE = 30 };

int getword( char * , int );
struct snode *add_to_snode( struct snode *, char * );
void print_snode( struct snode ** );
char *copy_str( char * );


struct snode
{
char *word;
int count;
struct snode *next;
};



int main( void )
{
char word[WORDSIZE];
struct snode *root_node;
struct snode *arr_psnodes[ARRSIZE];
struct snode **psnode, **psnode_last;

psnode = arr_psnodes;
psnode_last = psnode + (ARRSIZE - 1);

root_node = NULL;


while( getword( word, WORDSIZE ) && (psnode != psnode_last) )
{
if( isalpha(word[0]) )
{
root_node = add_to_snode( root_node, word );
*(psnode++) = root_node;
}
}

Didn't you forget a sort?

At this point arr_psnodes[0] points to your first struct. All the
other used elements point to your second struct.
print_snode( arr_psnodes );

return 0;
}




struct snode *add_to_snode( struct snode *p, char *w )
{
int match;

if( p == NULL )
{
p = malloc( sizeof(struct snode));

if( !p )
{
fprintf( stderr, "out of memory\n" );
exit( EXIT_FAILURE );
}

p->word = copy_str( w );
p->count = 1;
p->next = NULL;
}
else if ( !(match = strcmp( p->word, w )) )
{
p->count++;

return p;
}
else if( match )
{
p->next = add_to_snode( p->next, w );
}

return p->next;
}


void print_snode( struct snode **pp )

You pass arr_psnodes as the argument when you call this function. It
is an automatic array so it is not initialized. You fill in some
number (but not all) of the elements with addresses. As soon as you
reach the end of the addresses you provided, the code invokes
undefined behavior by attempting to evaluate *pp whose value is
indeterminate.
{
while( pp )

pp will never be NULL. It starts out pointing to arr_psnodes[0] and
is incremented forever.
{
printf("%s --> %d\n", (*pp)->word, (*pp)->count);
++pp;
}
}

void sort_nodes( struct snode **pp )
{
/* yet to be done */
}



char *copy_str( char *pc )
{
char *dup;

dup = malloc( strlen(pc) + 1 );

if( dup )
{
strcpy( dup, pc );
}

return dup;
}




/* A program that takes a single word as input. It will discard
* the whole input if it contains anything other than the 26 alphabets
* of English. If the input word contains more than 30 letters then only
* the extra letters will be discarded . For general purpose usage of
* English it does not make any sense to use a word larger than this size.
* Nearly every general purpose word can be expressed in a word with less
* than or equal to 30 letters.
*
*/
int getword( char *word, int max_length ) {
int c;
char *w = word;


while( isspace( c = getchar() ) )
{
;
}

while( --max_length )
{
if( isalpha( c ) )
{
*w++ = c;
}
else if( isspace( c ) || c == EOF )
{
*w = '\0';
break;
}
else
{
return 0;
}

c = getchar();
}


*w = '\0';


return word[0];
}


Remove del for email
 
A

arnuld

This function name belongs to the implementation but that's also just
another nit for the time being.

ok, I changed the name to string_copy

This works but you could accomplish the same thing by simply keeping a
count of the number of elements of arr_psnodes that are in use.

yes, I could have used array-indexing too but I am quite weak at
understanding pointers, so I used them to get an idea about them.


Nowhere do you prompt the user for input.

getword() will prompt the user and return a useful value.


if( isalpha(word[0]) )
{
root_node = add_to_snode( root_node, word ); *(psnode++) = root_node;
This does not do what you want. add_to_snode always adds new nodes at
the end but returns the address of the first node.

I am from C++ background and I never ever used Arrays in C++, neither
any linked list. At present I don't understand much how these linked lists
work. I am just confused about them but I try to learn them. I have read
the section 6.5 8 times but still don't get the basic idea of why used
a linked list and how they work.



For the first node (N1), this is true immediately.

On initial entry for the second node (N2a), this is false.

On the recursive entry for the second node (N2b), this is true.


yes, right, anything NULL needs a new node as the word ha snot seen
before.

(N1) You build struct_1 with "a", 1, and NULL.

(N2b) You build struct_2 with "b", 1, and NULL.


I though that code will make this:

N1 --> N2 --> N3 ---> ....


what is the meaning of N2a and N2b, it is not a doubly-linked list. I
thought you mean this:

N2
/ \
/ \
N2a N2b


I simply have a single linked list:


N1 --> N2 --> N3 --> N4 --> ....



Is there any possibility the if could evaluate to false? If it is
always true, a simple else would work just as well.


I don't know. SInce mt program is quite raw and my understanding of linked
lists and pointers is quite weak, so I was just avoiding any garbage value
by using "else if".


(N2a) So you call this function recursively with NULL and "b".

When N2b returns here, you change struct_1 to point to struct_2 (instead
of NULL) but p still points to struct_1.


so, should I return p->next ?

(N1) And you return the address of struct_1 which main will store in
root_node.

(N2b) And you return the address of struct_2 to N2a.

(N2a) And you return the address of struct_1 which main will store again
in root_node.


that is not what I want. I want to return the address of latest node
created or address of the last node, if the word is already in the list.




The routine that calls strdup does not check if it succeeded. You
either need to check here or everywhere


I use strcpy from library . Library does not do any check for me?


Actually, the extra letters are not discarded. They are left in the
input buffer and will be processed on the next call to getword as if
they were the start of a new word.

oh.. I am getting frustrated now.. :(


But bullet-proofing can come later and this is not an unreasonable
assumption to start with. By isolating getword in a separate function
you have made it easier to fix this problem when you get around to it.


okay, at least one thing in my program is good but getword() is just
a modification of K&R2's getword, it is not my creation :(



can you provide some hints on coding this program ?

or

you think a doubly-linked list is a good idea ?
 
A

arnuld

.....SNIP...
This does not do what you want. add_to_snode always adds new nodes at
the end but returns the address of the first node.

okay I have edited it a little bit.

.....SNIP...


The only way to know that the program works or not is by printing the
accumulated words and their counts.

and of course, as usual with my programs, it compiles fine and Segfaults
on run time. The 1st run Segfaults because I used a dot in input (
comp.lang ) and 2nd run Segfaults on exit (Ctrl-D).

[arnuld@raj C]$ gcc -ansi -pedantic -Wall -Wextra 6-4.c
6-4.c:123: warning: unused parameter 'pp'
[arnuld@raj C]$ ./a.out
Richard Heathfiled
on comp.lang
Segmentation fault
[arnuld@raj C]$ ./a.out
heathfield
okay for C
Segmentation fault





/* K&R2, exercise 6.4
*
* write a program that prints the distinct words in its input sorted into
* decreasing order of frequency of occurence. Precede each word by its count.
*
*
* The basic 2 step design-idea:
*
* (1) we are going to create a linked list and store words in it.
* Each structure will have a count associated with the word it carries. When
* we will see a word which is already in the list then we will increase
the count * by 1. That way we will have a linked-list of unique
structures. *
* (2) we will save a pointer to each element of the linked
list ( which * are structures) in an array of pointers and will sort
these pointers according * to the number of occurences of each word. This
will make program efficient * as we will not be sorting the structures
in linked-list but the pointers * to those structures. *
*
* version 1.1
*
*/




#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>


enum MAXSIZE { ARRSIZE = 1000, WORDSIZE = 30 };

int getword( char * , int );
struct snode *add_to_snode( struct snode *, char * );
void print_snode( struct snode ** );
char *copy_str( char * );


struct snode
{
char *word;
int count;
struct snode *next;
};



int main( void )
{
char word[WORDSIZE];
struct snode *root_node;
struct snode *arr_psnodes[ARRSIZE];
struct snode **psnode, **psnode_last;

psnode = arr_psnodes;
psnode_last = psnode + (ARRSIZE - 1);

root_node = NULL;


while( getword( word, WORDSIZE ) && (psnode != psnode_last) )
{
if( isalpha(word[0]) )
{
root_node = add_to_snode( root_node, word );
*(psnode++) = root_node;
}
}

print_snode( arr_psnodes );

return 0;
}




struct snode *add_to_snode( struct snode *p, char *w )
{
int match;

if( p == NULL )
{
p = malloc( sizeof(struct snode));

if( !p )
{
fprintf( stderr, "out of memory\n" );
exit( EXIT_FAILURE );
}

p->word = copy_str( w );
p->count = 1;
p->next = NULL;
}
else if ( !(match = strcmp( p->word, w )) )
{
p->count++;

return p;
}
else if( match )
{
p->next = add_to_snode( p->next, w );
}

return p->next;
}


void print_snode( struct snode **pp )
{
while( pp )
{
printf("%s --> %d\n", (*pp)->word, (*pp)->count);
++pp;
}
}

void sort_nodes( struct snode **pp )
{
/* yet to be done */
}



char *copy_str( char *pc )
{
char *dup;

dup = malloc( strlen(pc) + 1 );

if( dup )
{
strcpy( dup, pc );
}

return dup;
}




/* A program that takes a single word as input. It will discard
* the whole input if it contains anything other than the 26 alphabets
* of English. If the input word contains more than 30 letters then only
* the extra letters will be discarded . For general purpose usage of
* English it does not make any sense to use a word larger than this size.
* Nearly every general purpose word can be expressed in a word with less
* than or equal to 30 letters.
*
*/
int getword( char *word, int max_length ) {
int c;
char *w = word;


while( isspace( c = getchar() ) )
{
;
}

while( --max_length )
{
if( isalpha( c ) )
{
*w++ = c;
}
else if( isspace( c ) || c == EOF )
{
*w = '\0';
break;
}
else
{
return 0;
}

c = getchar();
}


*w = '\0';


return word[0];
}
 
A

arnuld

okay I have edited it a little bit.
.....SNIP...

Here is next version:






/* K&R2, exercise 6.4
*
* write a program that prints the distinct words in its input sorted into
* decreasing order of frequency of occurence. Precede each word by its count.
*
*
* The basic 2 step design-idea:
*
* (1) we are going to create a linked list and store words in it.
* Each structure will have a count associated with the word it carries. When
* we will see a word which is already in the list then we will increase
the count * by 1. That way we will have a linked-list of unique
structures. *
* (2) we will save a pointer to each element of the linked
list ( which * are structures) in an array of pointers and will sort
these pointers according * to the number of occurences of each word. This
will make program efficient * as we will not be sorting the structures
in linked-list but the pointers * to those structures. *
*
* version 1.2
*
*/




#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>


enum MAXSIZE { ARRSIZE = 1000, WORDSIZE = 30 };

int getword( char * , int );
struct snode *add_to_snode( struct snode *, char * );
void print_snode( struct snode ** );
char *copy_str( char * );


struct snode
{
char *word;
int count;
struct snode *next;
};



int main( void )
{
char word[WORDSIZE];
struct snode *root_node;
struct snode *arr_psnodes[ARRSIZE];
struct snode **psnode, **psnode_last;

psnode = arr_psnodes;
psnode_last = psnode + (ARRSIZE - 1);

root_node = NULL;


while( getword( word, WORDSIZE ) && (psnode != psnode_last) )
{
root_node = add_to_snode( root_node, word );
*(psnode++) = root_node;
}


print_snode( arr_psnodes );

return 0;
}




struct snode *add_to_snode( struct snode *p, char *w )
{
if( p == NULL )
{
p = malloc( sizeof(struct snode));

if( !p )
{
fprintf( stderr, "out of memory\n" );
exit( EXIT_FAILURE );
}

p->word = copy_str( w );
p->count = 1;
p->next = NULL;
}
else if ( !(strcmp( p->word, w )) )
{
p->count++;
}
else
{
p->next = add_to_snode( p->next, w );
return p->next;
}

return p;

}


void print_snode( struct snode **pp )
{
while( pp )
{
printf("%s --> %d\n", (*pp)->word, (*pp)->count);
++pp;
}
}

void sort_nodes( struct snode **pp )
{
/* yet to be done */
}



char *copy_str( char *pc )
{
char *dup;

dup = malloc( strlen(pc) + 1 );

if( dup )
{
strcpy( dup, pc );
}

return dup;
}




/* A program that takes a single word as input. It will discard
* the whole input if it contains anything other than the 26 alphabets
* of English. If the input word contains more than 30 letters then only *
the extra letters will be discarded . For general purpose usage of *
English it does not make any sense to use a word larger than this size. *
Nearly every general purpose word can be expressed in a word with less *
than or equal to 30 letters.
*
*/
int getword( char *word, int max_length ) {
int c;
char *w = word;


while( isspace( c = getchar() ) )
{
;
}

while( --max_length )
{
if( isalpha( c ) )
{
*w++ = c;
}
else if( isspace( c ) || c == EOF )
{
*w = '\0';
break;
}
else
{
return 0;
}

c = getchar();
}


*w = '\0';


return word[0];
}

=============== OUTPUT ========================
[arnuld@raj C]$ gcc -ansi -pedantic -Wall -Wextra 6-4.c
6-4.c:119: warning: unused parameter 'pp'
[arnuld@raj C]$ ./a.out
like this and that
and this and this
like --> 1
this --> 1
and --> 1
that --> 1
and --> 1
this --> 1
and --> 1
this --> 1
Segmentation fault
[arnuld@raj C]$
 
A

arnuld

strdup calls malloc. malloc can fail and strdup checks before calling
strcpy. However, strdup returns the malloc value whether it is valid
or NULL. You will then use this value in both your sort and print
routines which will lead to undefined behavior if they don't check.
Rather than check in multiple places, it is better if the routine that
calls strdup checks immediately and takes appropriate action when
strdup fails.


how about this ( I have changed the function name :):


char *copy_str( char *pc )
{
char *dup;

dup = malloc( strlen(pc) + 1 );

if( !dup )
{
fprintf(stderr, "can not copy word. out of memory\n");
exit( EXIT_FAILURE );
}


strcpy( dup, pc );

return dup;
}
 
B

Barry Schwarz

how about this ( I have changed the function name :):


char *copy_str( char *pc )
{
char *dup;

dup = malloc( strlen(pc) + 1 );

if( !dup )
{
fprintf(stderr, "can not copy word. out of memory\n");

My only additional suggestion would be to include the string pc points
to in this error message so the user has some idea of where the
program failed.
exit( EXIT_FAILURE );
}


strcpy( dup, pc );

return dup;
}


Remove del for email
 
A

arnuld

My only additional suggestion would be to include the string pc points
to in this error message so the user has some idea of where the
program failed.

okay, nice idea :)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Members online

No members online now.

Forum statistics

Threads
474,434
Messages
2,571,690
Members
48,796
Latest member
Greg L.

Latest Threads

Top