doubly-linked list & sorting

Discussion in 'C Programming' started by arnuld, May 1, 2008.

  1. arnuld

    arnuld Guest

    This program follows from the section 6.5 of K&R2 where authors created a
    doubly-linked list using a binary-tree based approach. The only thing I
    have rewritten myself is the getword function. I am using it because there
    are many concepts involved in this program that I want to understand like:


    1.) degree of efficiency of this program as compared to sort using hash-table
    based approach.

    2.) Keeping the program well structured so that anyone can have a look and
    in some amount of time he can draw a map of the design and the
    key-concepts involved in a program.

    3.) Any other advice you think is important to be given to a beginning C
    programmer :)




    /* K&R2, exercise 6.4
    *
    * write a program that prints the distinct words in its input sorted
    * alphabetically. Precede each word by its count.
    *
    * ( exercise follows the section on linked lists, so I think it is the general
    * idea that I should use linked list to solve the problem )
    *
    */




    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <ctype.h>

    enum MAXSIZE { ARRSIZE = 1000, WORDSIZE = 30 };


    int getword( char * , int );


    struct tnode *add_to_tree( struct tnode *, char * );
    void print_tree( struct tnode * );
    struct tnode *tree_alloc( void );
    char *strdup( char * );


    int main( void )
    {
    struct tnode *root_node;
    char word[ARRSIZE];

    root_node = NULL;

    while( getword( word, WORDSIZE ) )
    {
    root_node = add_to_tree( root_node, word );
    }

    print_tree( root_node );


    return 0;
    }


    struct tnode {
    char *word;
    int count;
    struct tnode *left;
    struct tnode *right;
    };



    /* A program that takes a single word as input. If the input word
    * contains more than 30 letters then only the extra letters will be
    * discarded . For general purpose usage of English it does not make any
    * sense to use a word larger than this size. Nearly every general purpose
    * word can be expressed in a word with less than or equal to 30 letters.
    *
    * I am only considering pure words like "usenet". Words containing anything
    * other than the 26 alphabets of English will simply be discarded e.g. like
    * "usen@t" or "usenet3".
    *
    */
    int getword( char *word, int max_length )
    {
    int c;
    char *w = word;


    while( isspace( c = getchar() ) )
    {
    ;
    }

    while( --max_length )
    {
    if( isalpha( c ) )
    {
    *w++ = c;
    }
    else if( isspace( c ) || c == EOF )
    {
    *w = '\0';
    break;
    }
    else
    {
    return 0;
    }

    c = getchar();
    }

    /* I can simply ignore the if condition and directly write the '\0'
    onto the last element because in worst case it will only rewrite
    the '\n' that is put in there by else if clause.

    or in else if clause, I could replace break with return word[0].

    I thought these 2 ideas will be either inefficient or
    a bad programming practice, so I did not do it.
    */
    if( *w != '\0' )
    {
    *w = '\0';
    }



    return word[0];
    }





    /* adds the word onto the tree */
    struct tnode *add_to_tree( struct tnode *tree, char *pc )
    {
    int match;

    if( tree == NULL )
    {
    tree = tree_alloc();
    tree->word = strdup( pc );
    tree->count = 1;
    tree->left = tree->right = NULL;
    }
    else if( 0 == (match = strcmp( pc, tree->word )) )
    {
    ++tree->count;
    }
    else if( match > 0 )
    {
    tree->right = add_to_tree( tree->right, pc );
    }
    else if( match < 0 )
    {
    tree->left = add_to_tree( tree->left, pc );
    }

    return tree;

    }



    /* prints the tree */
    void print_tree( struct tnode *tree )
    {
    if ( tree )
    {
    print_tree(tree->left);
    printf("%30s -- %4d\n", tree->word, tree->count);
    print_tree(tree->right);
    }

    }


    /* allocate memory for a node */
    struct tnode *tree_alloc( void )
    {
    return malloc( sizeof( struct tnode ) );
    }



    char *strdup( char *pc )
    {
    char *pc2;

    pc2 = malloc( strlen(pc) + 1 );

    if( pc )
    {
    strcpy( pc2, pc );
    }

    return pc2;
    }

    ============= OUTPUT ===============
    [arnuld@raj C]$ gcc -ansi -pedantic -Wall -Wextra test.c
    [arnuld@raj C]$ ./a.out
    like this usenet and that of usenet is this
    and -- 1
    is -- 1
    like -- 1
    of -- 1
    that -- 1
    this -- 2
    usenet -- 2
    [arnuld@raj C]$





    --
    http://lispmachine.wordpress.com/
    my email ID is @ the above address
    arnuld, May 1, 2008
    #1
    1. Advertising

  2. On Thu, 01 May 2008 16:19:47 +0500, arnuld <> wrote:

    >This program follows from the section 6.5 of K&R2 where authors created a
    >doubly-linked list using a binary-tree based approach. The only thing I
    >have rewritten myself is the getword function. I am using it because there
    >are many concepts involved in this program that I want to understand like:
    >
    >
    >1.) degree of efficiency of this program as compared to sort using hash-table
    >based approach.
    >
    >2.) Keeping the program well structured so that anyone can have a look and
    > in some amount of time he can draw a map of the design and the
    >key-concepts involved in a program.
    >
    >3.) Any other advice you think is important to be given to a beginning C
    >programmer :)
    >
    >
    >
    >
    >/* K&R2, exercise 6.4
    > *
    > * write a program that prints the distinct words in its input sorted
    > * alphabetically. Precede each word by its count.
    > *
    > * ( exercise follows the section on linked lists, so I think it is the general
    > * idea that I should use linked list to solve the problem )
    > *
    > */
    >
    >
    >
    >
    >#include <stdio.h>
    >#include <stdlib.h>
    >#include <string.h>
    >#include <ctype.h>
    >
    >enum MAXSIZE { ARRSIZE = 1000, WORDSIZE = 30 };
    >
    >
    >int getword( char * , int );
    >
    >
    >struct tnode *add_to_tree( struct tnode *, char * );
    >void print_tree( struct tnode * );
    >struct tnode *tree_alloc( void );
    >char *strdup( char * );
    >
    >
    >int main( void )
    >{
    > struct tnode *root_node;
    > char word[ARRSIZE];


    Here you define an array of 1000 char.

    >
    > root_node = NULL;
    >
    > while( getword( word, WORDSIZE ) )


    Here you tell getword to only use the first 30 char in the 1000
    element array. At this point your concerns about efficiency are
    misplaced. You have been messing with this enum for at least a
    half-dozen threads and you still don't get it right.

    > {
    > root_node = add_to_tree( root_node, word );
    > }
    >
    > print_tree( root_node );
    >
    >
    > return 0;
    >}
    >
    >
    >struct tnode {
    > char *word;
    > int count;
    > struct tnode *left;
    > struct tnode *right;
    >};


    How does your code compile if this is not before all the references to
    the struct? I don't see an incomplete or tentative declaration for
    this struct up near the top.

    >
    >
    >
    >/* A program that takes a single word as input. If the input word
    > * contains more than 30 letters then only the extra letters will be
    > * discarded . For general purpose usage of English it does not make any
    > * sense to use a word larger than this size. Nearly every general purpose
    > * word can be expressed in a word with less than or equal to 30 letters.
    > *
    > * I am only considering pure words like "usenet". Words containing anything
    > * other than the 26 alphabets of English will simply be discarded e.g. like
    > * "usen@t" or "usenet3".
    > *
    > */
    >int getword( char *word, int max_length )
    >{
    > int c;
    > char *w = word;
    >
    >
    > while( isspace( c = getchar() ) )
    > {
    > ;
    > }
    >
    > while( --max_length )
    > {
    > if( isalpha( c ) )
    > {
    > *w++ = c;
    > }
    > else if( isspace( c ) || c == EOF )
    > {
    > *w = '\0';
    > break;
    > }
    > else
    > {
    > return 0;
    > }
    >
    > c = getchar();
    > }
    >
    > /* I can simply ignore the if condition and directly write the '\0'
    > onto the last element because in worst case it will only rewrite
    > the '\n' that is put in there by else if clause.
    >
    > or in else if clause, I could replace break with return word[0].
    >
    > I thought these 2 ideas will be either inefficient or
    > a bad programming practice, so I did not do it.
    > */
    > if( *w != '\0' )
    > {
    > *w = '\0';
    > }
    >
    >
    >
    > return word[0];


    What is the significance of returning the first character in the word?

    >}
    >
    >
    >
    >
    >
    >/* adds the word onto the tree */
    >struct tnode *add_to_tree( struct tnode *tree, char *pc )
    >{
    > int match;
    >
    > if( tree == NULL )
    > {
    > tree = tree_alloc();
    > tree->word = strdup( pc );
    > tree->count = 1;
    > tree->left = tree->right = NULL;
    > }
    > else if( 0 == (match = strcmp( pc, tree->word )) )
    > {
    > ++tree->count;
    > }
    > else if( match > 0 )
    > {
    > tree->right = add_to_tree( tree->right, pc );
    > }
    > else if( match < 0 )
    > {
    > tree->left = add_to_tree( tree->left, pc );
    > }
    >
    > return tree;
    >
    >}
    >
    >
    >
    >/* prints the tree */
    >void print_tree( struct tnode *tree )
    >{
    > if ( tree )
    > {
    > print_tree(tree->left);
    > printf("%30s -- %4d\n", tree->word, tree->count);
    > print_tree(tree->right);
    > }
    >
    >}
    >
    >
    >/* allocate memory for a node */
    >struct tnode *tree_alloc( void )
    >{
    > return malloc( sizeof( struct tnode ) );
    >}
    >
    >
    >
    >char *strdup( char *pc )
    >{
    > char *pc2;
    >
    > pc2 = malloc( strlen(pc) + 1 );
    >
    > if( pc )


    I'm pretty sure you meant pc2 here.

    > {
    > strcpy( pc2, pc );
    > }
    >
    > return pc2;
    >}
    >
    >============= OUTPUT ===============
    >[arnuld@raj C]$ gcc -ansi -pedantic -Wall -Wextra test.c
    >[arnuld@raj C]$ ./a.out
    >like this usenet and that of usenet is this
    > and -- 1
    > is -- 1
    > like -- 1
    > of -- 1
    > that -- 1
    > this -- 2
    > usenet -- 2
    >[arnuld@raj C]$



    Remove del for email
    Barry Schwarz, May 2, 2008
    #2
    1. Advertising

  3. arnuld

    arnuld Guest

    I am pretty much confused with the way C solves problems, not to
    mention the operation of pointers to arrays of pointers and pointers
    to pointers to structures. I just messed up my posts. Here is the
    program in full:


    /* K&R2, example from section 6.5, page 139
    *
    * Self-Refrential Structures
    *
    */



    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <ctype.h>


    enum MAXSIZE { MAXWORD = 100, BUFFSIZE = 500 };

    struct tnode
    {
    char *word;
    int count;
    struct tnode *left;
    struct tnode *right;
    };



    struct tnode *talloc( void );
    char *strdup( char * );


    struct tnode *add_node( struct tnode *, char * );
    void print_tree( struct tnode * );
    int getword( char *, int );
    void ungetch( int );

    char buff[BUFFSIZE];
    int bufp = 0;

    int main( void )
    {
    struct tnode *root;
    char word[MAXWORD];

    root = NULL;

    while( getword( word, MAXWORD ) != EOF )
    {
    if( isalpha( word[0] ) )
    {
    root = add_node( root, word );
    }
    }

    print_tree( root );


    return 0;
    }



    struct tnode *add_node( struct tnode *p, char *w )
    {
    int cond;

    if ( !p )
    {
    p = talloc();
    p->word = strdup( w );
    p->count = 1;
    p->left = p->right = NULL;
    }
    else if( !(cond = strcmp( w, p->word )) )
    {
    p->count++;
    }
    else if( cond < 0 )
    {
    p->left = add_node( p->left, w );
    }
    else if( cond > 0 )
    {
    p->right = add_node( p->right, w );
    }

    return p;
    }



    /* prints the tree in order */
    void print_tree( struct tnode *p )
    {
    if( p )
    {
    print_tree( p->left );
    printf( "%4d %s\n", p->count, p->word );
    print_tree( p->right);
    }
    }


    /* aloocate memory for new node */
    struct tnode *talloc( void )
    {
    return malloc( sizeof( struct tnode ) );
    }

    /* copies the string into some safe place*/
    char *strdup( char *s )
    {
    char *p;

    p = malloc( sizeof( strlen(s) + 1 ) );

    if( p )
    {
    strcpy( p, s );
    }

    return p;
    }



    /* get next word from input */
    int getword( char *word, int max )
    {
    int c;
    char *w;

    w = word;

    while( isspace( c = getchar() ))
    {
    ;
    }

    if( c != EOF )
    {
    *w++ = c;
    }

    if( !isalpha( c ))
    {
    *w = '\0';
    return c;
    }


    for( ; --max > 0; ++w )
    {
    if( !isalnum( *w = getchar() ) )
    {
    ungetch( *w );
    break;
    }
    }

    *w = '\0';

    return word[0];
    }


    void ungetch( int c )
    {
    if( bufp >= BUFFSIZE )
    {
    printf("ungetch: too many characters\n");
    exit(EXIT_FAILURE);
    }
    else
    {
    buff[bufp++] = c;
    }
    }
    arnuld, May 2, 2008
    #3
  4. arnuld

    user923005 Guest

    On May 2, 3:14 am, arnuld <> wrote:
    > I am pretty much confused with the way C solves problems, not to
    > mention the operation of pointers to arrays of pointers and pointers
    > to pointers to structures. I just messed up my posts. Here is the
    > program in full:
    >
    > /* K&R2, example from section 6.5, page 139
    >  *
    >  * Self-Refrential Structures
    >  *
    >  */
    >
    > #include <stdio.h>
    > #include <stdlib.h>
    > #include <string.h>
    > #include <ctype.h>
    >
    > enum MAXSIZE { MAXWORD = 100, BUFFSIZE = 500 };
    >
    > struct tnode
    > {
    >   char *word;
    >   int count;
    >   struct tnode *left;
    >   struct tnode *right;
    >
    > };
    >
    > struct tnode *talloc( void );
    > char *strdup( char * );
    >
    > struct tnode *add_node( struct tnode *, char * );
    > void print_tree( struct tnode * );
    > int getword( char *, int );
    > void ungetch( int  );
    >
    > char buff[BUFFSIZE];
    > int bufp = 0;
    >
    > int main( void )
    > {
    >   struct tnode *root;
    >   char word[MAXWORD];
    >
    >   root = NULL;
    >
    >   while( getword( word, MAXWORD ) != EOF )
    >     {
    >       if( isalpha( word[0] ) )
    >         {
    >           root = add_node( root, word );
    >         }
    >     }
    >
    >   print_tree( root );
    >
    >   return 0;
    >
    > }
    >
    > struct tnode *add_node( struct tnode *p, char *w )
    > {
    >   int cond;
    >
    >   if ( !p )
    >     {
    >       p = talloc();
    >       p->word = strdup( w );
    >       p->count = 1;
    >       p->left = p->right = NULL;
    >     }
    >   else if( !(cond = strcmp( w, p->word )) )
    >     {
    >       p->count++;
    >     }
    >   else if( cond < 0 )
    >     {
    >       p->left  = add_node( p->left, w );
    >     }
    >   else if( cond > 0 )
    >     {
    >       p->right = add_node( p->right, w );
    >     }
    >
    >   return p;
    >
    > }
    >
    > /* prints the tree in order */
    > void print_tree( struct tnode *p )
    > {
    >   if( p )
    >     {
    >       print_tree( p->left );
    >       printf( "%4d %s\n", p->count, p->word );
    >       print_tree( p->right);
    >     }
    >
    > }
    >
    > /* aloocate memory for new node */
    > struct tnode *talloc( void )
    > {
    >   return malloc( sizeof( struct tnode ) );
    >
    > }
    >
    > /* copies the string into some safe place*/
    > char *strdup( char *s )
    > {
    >   char *p;
    >
    >   p = malloc( sizeof( strlen(s) + 1 ) );
    >
    >   if( p )
    >     {
    >       strcpy( p, s );
    >     }
    >
    >   return p;
    >
    > }
    >
    > /* get next word from input */
    > int getword( char *word, int max )
    > {
    >   int c;
    >   char *w;
    >
    >   w = word;
    >
    >   while( isspace( c = getchar() ))
    >     {
    >       ;
    >     }
    >
    >   if( c != EOF )
    >     {
    >       *w++ = c;
    >     }
    >
    >   if( !isalpha( c ))
    >     {
    >       *w = '\0';
    >       return c;
    >     }
    >
    >   for( ; --max > 0; ++w )
    >     {
    >       if( !isalnum( *w = getchar() ) )
    >         {
    >           ungetch( *w );
    >           break;
    >         }
    >     }
    >
    >   *w = '\0';
    >
    >   return word[0];
    >
    > }
    >
    > void ungetch( int c )
    > {
    >   if( bufp >= BUFFSIZE )
    >     {
    >       printf("ungetch: too many characters\n");
    >       exit(EXIT_FAILURE);
    >     }
    >   else
    >     {
    >       buff[bufp++] = c;
    >     }
    >
    >
    >
    > }



    There are lots of problems with this code. What you are doing to add
    a node is not right. I did add problem diagnosis and also changed the
    name of strdup() which is definitely not allowed.

    /* K&R2, example from section 6.5, page 139
    *
    * Self-Refrential Structures
    *
    */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <ctype.h>

    enum MAXSIZE {
    MAXWORD = 100, BUFFSIZE = 500
    };

    struct tnode {
    char *word;
    int count;
    struct tnode *left;
    struct tnode *right;
    };

    struct tnode *talloc(void);
    char *dupstr(char *);
    struct tnode *add_node(struct tnode *, char *);
    void print_tree(struct tnode *);
    int getword(char *, int);
    void ungetch(int);

    char buff[BUFFSIZE]; /* You never do anything with buff. */
    int bufp = 0;

    int main(void)
    {
    struct tnode *root;
    char word[MAXWORD];
    root = NULL;
    while (getword(word, MAXWORD) != EOF) {
    if (isalpha(word[0])) {
    root = add_node(root, word);
    }
    }
    print_tree(root);
    return 0;
    }

    struct tnode *add_node(struct tnode * p, char *w)
    {
    int cond;
    if (!p) {
    p = talloc();
    if (p != NULL) {
    p->word = dupstr(w);
    /* Always check for failed allocations and do _something_
    to handle it */
    if (p->word == NULL) {
    printf("ERROR! Out of memory at %s:%d\n", __FILE__,
    __LINE__);
    exit(EXIT_FAILURE);
    }
    p->count = 1;
    p->left = p->right = NULL;
    } else {
    /* Always check for failed allocations and do _something_
    to handle it */
    printf("ERROR! Out of memory at %s:%d\n", __FILE__,
    __LINE__);
    exit(EXIT_FAILURE);
    }
    } else if (!(cond = strcmp(w, p->word))) {
    p->count++;
    } else if (cond < 0) {
    p->left = add_node(p->left, w);
    } else if (cond > 0) {
    p->right = add_node(p->right, w);
    }
    return p;
    }

    /* prints the tree in order */
    void print_tree(struct tnode * p)
    {
    if (p) {
    print_tree(p->left);
    printf("%4d %s\n", p->count, p->word);
    print_tree(p->right);
    }
    }

    /* allocate memory for new node */
    struct tnode *talloc(void)
    { /* Since it is a one liner and called in one place, I see no need
    for this function */
    return malloc(sizeof(struct tnode));
    }

    /* copies the string into some safe place */
    /* strdup() is in the implementation's namespace */
    char *dupstr(char *s)
    {
    char *p;
    p = malloc(sizeof(strlen(s) + 1));
    if (p) {
    strcpy(p, s);
    }
    return p;
    }

    /* get next word from input */
    int getword(char *word, int max)
    {
    int c;
    char *w;
    w = word;
    while (isspace(c = getchar())) {
    ;
    }
    if (c != EOF) {
    *w++ = c;
    }
    if (!isalpha(c)) {
    *w = '\0';
    return c;
    }
    for (; --max > 0; ++w) {
    if (!isalnum(*w = getchar())) {
    ungetch(*w);
    break;
    }
    }
    *w = '\0';
    return word[0];
    }

    void ungetch(int c)
    {
    if (bufp >= BUFFSIZE) {
    printf("ungetch: too many characters\n");
    exit(EXIT_FAILURE);
    } else {
    buff[bufp++] = c;
    }
    }
    /*
    You never release memory.
    You will need to dispose of every object that you allocated.
    */
    user923005, May 2, 2008
    #4
  5. arnuld

    user923005 Guest

    Sample for Arnuld to compare:
    Here are the exercises from Chapter 6 (except the hash table stuff
    which seems out of place to me) as purloined from Richard Heathfield's
    K&R2 solution site, and then made to work with arbitrary files, and
    then the missing exercise was added and then I made some more icky
    hacks and called it good. That didn't make it good, but that's what I
    called it.

    /*
    Chapter 6. Structures Stuff.
    Write a program that prints out the distinct words in its
    input sorted into decreasing order of frequency of occurrence.
    Precede each word by its count.
    Authors: Bryan Williams, Ben Pfaff, and Richard Heathfield
    wrote it, and then Dann Corbit poked it with a stick.
    */

    #include "ch6.h"

    static void my_putch(char **, int *, int);
    int my_getch(FILE *);
    static void my_ungetch(int);
    static int skipws(FILE *);
    static int getstelem(char **, int *, int, FILE *);

    static int gComparisonLength = 6;
    /*
    Default: input is to file stdin, output to stdout.
    Plan: read the words into a tree, keeping a count of how many we
    have,
    allocate an array big enough to hold Treecount (WORD *)'s
    walk the tree to populate the array.
    qsort the array, based on size.
    print the array
    qsort the array, based on word text.
    print similarities in the array data (based on matching
    <gComparisonLength> initial characters)
    free the array
    free the tree
    free tibet (optional)
    free international shipping!
    */

    FILE *theFileIn(char *fname)
    {
    FILE *f = stdin;
    FILE *ftry = NULL;
    if (fname && fname[0]) {
    ftry = fopen(fname, "r");
    if (ftry == NULL) {
    char Emess[512];
    sprintf(Emess, "Failure to open %s for input.\n", fname);
    perror(Emess);
    puts("Error opening input file. Input stream set to
    standard input.");
    }
    }
    if (ftry) {
    f = ftry;
    }
    return f;
    }

    FILE *theFileOut(char *fname)
    {
    FILE *f = stdin;
    FILE *ftry = NULL;
    if (fname && fname[0]) {
    ftry = fopen(fname, "a");
    if (ftry == NULL) {
    char Emess[512];
    sprintf(Emess, "Failure to open %s for output.\n", fname);
    perror(Emess);
    puts("Error opening output file. Output stream set to
    standard output.");
    }
    }
    if (ftry) {
    f = ftry;
    }
    return f;
    }

    #define NONALPHA "1234567890 \v\f\n\t\r+=-*/\\,.;:'#~?<>|{}[]`!\"£$
    %^&()"

    int main0(int argc, char **argv)
    {
    int Status = SUCCESS;
    WORD *Words = NULL;
    size_t Treecount = 0;
    WORD **WordArray = NULL;
    FILE *fin = stdin;
    FILE *fout = stdout;
    if (argc > 1)
    fin = theFileIn(argv[1]);
    if (argc > 2)
    fout = theFileOut(argv[2]);
    if (argc > 3)
    gComparisonLength = atoi(argv[3]);
    gComparisonLength = gComparisonLength > 15? 15: gComparisonLength;
    gComparisonLength = gComparisonLength < 1? 1: gComparisonLength;

    /* Read the words into a tree */

    if (SUCCESS == Status) {
    Status = ReadInputToTree(&Words, &Treecount, fin);
    }
    /* Sanity check for no sensible input */
    if (SUCCESS == Status) {
    if (0 == Treecount) {
    Status = NO_WORDS_ON_INPUT;
    }
    }
    /* allocate a sufficiently large array */
    if (SUCCESS == Status) {
    WordArray = malloc(Treecount * sizeof *WordArray);
    if (NULL == WordArray) {
    Status = CANNOT_MALLOC_WORDARRAY;
    }
    }
    /* Walk the tree into the array */
    if (SUCCESS == Status) {
    Status = WalkTree(WordArray, Words);
    }
    /* qsort the array */
    if (SUCCESS == Status) {
    qsort(WordArray, Treecount, sizeof *WordArray, CompareCounts);
    }
    /* walk down the WordArray outputting the values */
    if (SUCCESS == Status) {
    Status = OutputWords(fout, Treecount, WordArray);
    }
    /* qsort the array */
    if (SUCCESS == Status) {
    qsort(WordArray, Treecount, sizeof *WordArray, CompareWords);
    }
    /* walk down the WordArray outputting the values */
    if (SUCCESS == Status) {
    Status = OutputStemmedWords(fout, Treecount, WordArray);
    }

    /* free the word array */
    if (NULL != WordArray) {
    free(WordArray);
    WordArray = NULL;
    }
    /* and free the tree memory */
    if (NULL != Words) {
    FreeTree(Words);
    Words = NULL;
    }
    /* Error report and we are finshed */
    if (SUCCESS != Status) {
    fprintf(stderr, "Program failed with code %d\n", Status);
    }
    return (SUCCESS == Status ? EXIT_SUCCESS : EXIT_FAILURE);
    }

    void FreeTree(WORD * W)
    {
    if (NULL != W) {
    if (NULL != W->Word) {
    free(W->Word);
    W->Word = NULL;
    }
    if (NULL != W->Left) {
    FreeTree(W->Left);
    W->Left = NULL;
    }
    if (NULL != W->Right) {
    FreeTree(W->Right);
    W->Right = NULL;
    }
    }
    }

    int AddToTree(WORD ** DestTree, size_t * Treecount, char
    *Word)
    {
    int Status = SUCCESS;
    int CompResult = 0;
    /* safety check */
    assert(NULL != DestTree);
    assert(NULL != Treecount);
    assert(NULL != Word);
    /* ok, either *DestTree is NULL or it isn't (deep huh?) */
    if (NULL == *DestTree) { /* this is the place to add it then */
    *DestTree = malloc(sizeof **DestTree);
    if (NULL == *DestTree) {
    /* horrible - we're out of memory */
    Status = NO_MEMORY_FOR_WORDNODE;
    } else {
    (*DestTree)->Left = NULL;
    (*DestTree)->Right = NULL;
    (*DestTree)->Count = 1;
    (*DestTree)->Word = dupstr(Word);
    if (NULL == (*DestTree)->Word) {
    /* even more horrible - we've run out of memory in the
    middle */
    Status = NO_MEMORY_FOR_WORD;
    free(*DestTree);
    *DestTree = NULL;
    } else {
    /* everything was successful, add one to the tree
    nodes count */
    ++*Treecount;
    }
    }
    } else { /* we need to make a decision */
    CompResult = strcmp(Word, (*DestTree)->Word);
    if (0 < CompResult) {
    Status = AddToTree(&(*DestTree)->Left, Treecount, Word);
    } else if (0 > CompResult) {
    Status = AddToTree(&(*DestTree)->Left, Treecount, Word);
    } else {
    /* add one to the count - this is the same node */
    ++(*DestTree)->Count;
    }
    } /* end of else we need to make a
    decision */
    return Status;
    }

    int ReadInputToTree(WORD ** DestTree, size_t * Treecount,
    FILE * Input)
    {
    int Status = SUCCESS;
    char Buf[8192] =
    {0};
    char *Word = NULL;
    /* safety check */
    assert(NULL != DestTree);
    assert(NULL != Treecount);
    assert(NULL != Input);
    /* for every line */
    while (NULL != fgets(Buf, sizeof Buf, Input)) {
    /* strtok the input to get only alpha character words */
    Word = strtok(Buf, NONALPHA);
    while (SUCCESS == Status && NULL != Word) {
    /* deal with this word by adding it to the tree */
    Status = AddToTree(DestTree, Treecount, Word);
    /* next word */
    if (SUCCESS == Status) {
    Word = strtok(NULL, NONALPHA);
    }
    }
    }
    return Status;
    }

    int WalkTree(WORD ** DestArray, WORD * Word)
    {
    int Status = SUCCESS;
    static WORD **Write = NULL;
    /* safety check */
    assert(NULL != Word);
    /* store the starting point if this is the first call */
    if (NULL != DestArray) {
    Write = DestArray;
    }
    /* Now add this node and it's kids */
    if (NULL != Word) {
    *Write = Word;
    ++Write;
    if (NULL != Word->Left) {
    Status = WalkTree(NULL, Word->Left);
    }
    if (NULL != Word->Right) {
    Status = WalkTree(NULL, Word->Right);
    }
    }
    return Status;
    }

    /*
    CompareCounts is called by qsort. This means that it gets pointers
    to the
    data items being compared. In this case the data items are pointers
    too.
    */
    int CompareCounts(const void *vWord1, const void *vWord2)
    {
    int Result = 0;
    WORD *const * Word1 = vWord1;
    WORD *const * Word2 = vWord2;
    assert(NULL != vWord1);
    assert(NULL != vWord2);
    /* ensure the result is either 1, 0 or -1 */
    if ((*Word1)->Count < (*Word2)->Count) {
    Result = 1;
    } else if ((*Word1)->Count > (*Word2)->Count) {
    Result = -1;
    } else {
    Result = 0;
    }
    return Result;
    }

    int CompareWords(const void *vWord1, const void *vWord2)
    {
    WORD *const * Word1 = vWord1;
    WORD *const * Word2 = vWord2;
    assert(NULL != vWord1);
    assert(NULL != vWord2);
    /* ensure the result is either 1, 0 or -1 */
    return i_strcmp((*Word1)->Word, (*Word2)->Word);
    }

    int CompareWordStems(const void *vWord1, const void
    *vWord2)
    {
    WORD *const * Word1 = vWord1;
    WORD *const * Word2 = vWord2;
    assert(NULL != vWord1);
    assert(NULL != vWord2);
    /* ensure the result is either 1, 0 or -1 */
    return i_strncmp((*Word1)->Word, (*Word2)->Word, gComparisonLength);
    }



    int OutputWords(FILE * Dest, size_t Count, WORD **
    WordArray)
    {
    int Status = SUCCESS;
    size_t Pos = 0;
    /* safety check */
    assert(NULL != Dest);
    assert(NULL != WordArray);
    /* Print a header */
    fprintf(Dest, "Total Words : %lu\n", (unsigned long) Count);
    /* Print the words in descending order */
    while (SUCCESS == Status && Pos < Count) {
    fprintf(Dest, "%10lu %s\n", (unsigned long) WordArray[Pos]-
    >Count, WordArray[Pos]->Word);

    ++Pos;
    }
    return Status;
    }

    int OutputStemmedWords(FILE * Dest, size_t Count, WORD **
    WordArray)
    {
    int Status = SUCCESS;
    size_t Pos;
    /* safety check */
    assert(NULL != Dest);
    assert(NULL != WordArray);
    /* Print a header */
    fprintf(Dest, "Total Words : %lu\n", (unsigned long) Count);
    /* Print the words in descending order */
    for (Pos = 1; Pos < Count; Pos++) {
    if (CompareWordStems(&WordArray[Pos-1], &WordArray[Pos])==0 &&
    CompareWords(&WordArray[Pos-1], &WordArray[Pos])!=0)
    fprintf(Dest, "%s is similar to %s\n", WordArray[Pos-1]-
    >Word, WordArray[Pos]->Word);

    }
    return Status;
    }

    /* Write a cross-referencer program that prints a list of all words in
    a
    * document, and, for each word, a list of the line numbers on which
    it
    * occurs. Remove noise words like "the", "and," and so on.
    */
    /* no such thing as strdup, so let's write one
    * supplementary question: why did I call this function dupstr,
    * rather than strdup?
    *
    */
    char *dupstr(char *s)
    {
    char *p = NULL;
    if (s != NULL) {
    p = malloc(strlen(s) + 1);
    if (p) {
    strcpy(p, s);
    }
    }
    return p;
    }

    /* case-insensitive string comparison */
    int i_strcmp(const char *s, const char *t)
    {
    int diff = 0;
    char cs = 0;
    char ct = 0;
    while (diff == 0 && *s != '\0' && *t != '\0') {
    cs = tolower((unsigned char) *s);
    ct = tolower((unsigned char) *t);
    if (cs < ct) {
    diff = -1;
    } else if (cs > ct) {
    diff = 1;
    }
    ++s;
    ++t;
    }
    if (diff == 0 && *s != *t) {
    /* the shorter string comes lexicographically sooner */
    if (*s == '\0') {
    diff = -1;
    } else {
    diff = 1;
    }
    }
    return diff;
    }
    /* case-insensitive string comparison */
    int i_strncmp(const char *s, const char *t, int n)
    {
    int diff = 0;
    int counter = 0;
    char cs = 0;
    char ct = 0;
    while (diff == 0 && *s != '\0' && *t != '\0' && counter++ < n) {
    cs = tolower((unsigned char) *s);
    ct = tolower((unsigned char) *t);
    if (cs < ct) {
    diff = -1;
    } else if (cs > ct) {
    diff = 1;
    }
    ++s;
    ++t;
    }
    if (diff == 0 && *s != *t) {
    /* the shorter string comes lexicographically sooner */
    if (*s == '\0') {
    diff = -1;
    } else {
    diff = 1;
    }
    }
    return diff;
    }


    void printlist(struct linelist * list, FILE *fout)
    {
    if (list != NULL) {
    printlist(list->next, fout);
    fprintf(fout,"%6d ", list->line);
    }
    }

    void printtree(struct wordtree * node, FILE *fout)
    {
    if (node != NULL) {
    printtree(node->left, fout);
    fprintf(fout,"%18s ", node->word);
    printlist(node->firstline, fout);
    fprintf(fout,"\n");
    printtree(node->right, fout);
    }
    }

    struct linelist *addlink(int line)
    {
    struct linelist *new = malloc(sizeof *new);
    if (new != NULL) {
    new->line = line;
    new->next = NULL;
    }
    return new;
    }

    void deletelist(struct linelist * listnode)
    {
    if (listnode != NULL) {
    deletelist(listnode->next);
    free(listnode);
    }
    }

    void deleteword(struct wordtree ** node)
    {
    struct wordtree *temp = NULL;
    if (node != NULL) {
    if (*node != '\0') {
    if ((*node)->right != NULL) {
    temp = *node;
    deleteword(&temp->right);
    }
    if ((*node)->left != NULL) {
    temp = *node;
    deleteword(&temp->left);
    }
    if ((*node)->word != NULL) {
    free((*node)->word);
    }
    if ((*node)->firstline != NULL) {
    deletelist((*node)->firstline);
    }
    free(*node);
    *node = NULL;
    }
    }
    }

    struct wordtree *addword(struct wordtree ** node, char *word, int
    line)
    {
    struct wordtree *wordloc = NULL;
    struct linelist *newline = NULL;
    struct wordtree *temp = NULL;
    int diff = 0;
    if (node != NULL && word != NULL) {
    if (NULL == *node) {
    *node = malloc(sizeof **node);
    if (NULL != *node) {
    (*node)->left = NULL;
    (*node)->right = NULL;
    (*node)->word = dupstr(word);
    if ((*node)->word != NULL) {
    (*node)->firstline = addlink(line);
    if ((*node)->firstline != NULL) {
    wordloc = *node;
    }
    }
    }
    } else {
    diff = i_strcmp((*node)->word, word);
    if (0 == diff) {
    /* we have seen this word before! add this line number
    to the
    * front of the line number list. Adding to the end
    would
    * keep them in the right order, but would take
    longer. By
    * continually adding them to the front, we take less
    time,
    * but we pay for it at the end by having to go to the
    end of
    * the list and working backwards. Recursion makes
    this less
    * painful than it might have been. */
    newline = addlink(line);
    if (newline != NULL) {
    wordloc = *node;
    newline->next = (*node)->firstline;
    (*node)->firstline = newline;
    }
    } else if (0 < diff) {
    temp = *node;
    wordloc = addword(&temp->left, word, line);
    } else {
    temp = *node;
    wordloc = addword(&temp->right, word, line);
    }
    }
    }
    if (wordloc == NULL) {
    deleteword(node);
    }
    return wordloc;
    }


    /* We can't use strchr because it's not yet been discussed, so we'll
    * write our own instead.
    */
    char *char_in_string(char *s, int c)
    {
    char *p = NULL;
    /* if there's no data, we'll stop */
    if (s != NULL) {
    if (c != '\0') {
    while (*s != '\0' && *s != c) {
    ++s;
    }
    if (*s == c) {
    p = s;
    }
    }
    }
    return p;
    }

    /* We can't use strtok because it hasn't been discussed in the text
    * yet, so we'll write our own.
    * To minimise hassle at the user end, let's modify the user's pointer
    * to s, so that we can just call this thing in a simple loop.
    */
    char *tokenise(char **s, char *delims)
    {
    char *p = NULL;
    char *q = NULL;
    if (s != NULL && *s != '\0' && delims != NULL) {
    /* pass over leading delimiters */
    while (NULL != char_in_string(delims, **s)) {
    ++*s;
    }
    if (**s != '\0') {
    q = *s + 1;
    p = *s;
    while (*q != '\0' && NULL == char_in_string(delims, *q)) {
    ++q;
    }
    *s = q + (*q != '\0');
    *q = '\0';
    }
    }
    return p;
    }

    /* return zero if this word is not a noise word,
    * or non-zero if it is a noise word
    */
    int NoiseWord(char *s)
    {
    int found = 0;
    int giveup = 0;
    static const char *const list[] = {
    "$",
    "0",
    "1",
    "2",
    "3",
    "4",
    "5",
    "6",
    "7",
    "8",
    "9",
    "a",
    "a's",
    "able",
    "about",
    "above",
    "abroad",
    "according",
    "accordingly",
    "across",
    "actually",
    "adj",
    "after",
    "afterwards",
    "again",
    "against",
    "ago",
    "ahead",
    "ain't",
    "all",
    "allow",
    "allows",
    "almost",
    "alone",
    "along",
    "alongside",
    "already",
    "also",
    "although",
    "always",
    "am",
    "amid",
    "amidst",
    "among",
    "amongst",
    "amoungst",
    "amount",
    "an",
    "and",
    "another",
    "any",
    "anybody",
    "anyhow",
    "anyone",
    "anything",
    "anyway",
    "anyways",
    "anywhere",
    "apart",
    "appear",
    "appreciate",
    "appropriate",
    "are",
    "aren't",
    "around",
    "as",
    "aside",
    "ask",
    "asking",
    "associated",
    "at",
    "available",
    "away",
    "awfully",
    "b",
    "back",
    "backward",
    "backwards",
    "based",
    "be",
    "became",
    "because",
    "become",
    "becomes",
    "becoming",
    "been",
    "before",
    "beforehand",
    "begin",
    "behind",
    "being",
    "believe",
    "below",
    "beside",
    "besides",
    "best",
    "better",
    "between",
    "beyond",
    "bill",
    "both",
    "bottom",
    "brief",
    "bring",
    "but",
    "by",
    "c",
    "c'mon",
    "c's",
    "call",
    "came",
    "can",
    "can't",
    "cannot",
    "cant",
    "caption",
    "cause",
    "causes",
    "certain",
    "certainly",
    "changes",
    "clearly",
    "co",
    "co.",
    "com",
    "come",
    "comes",
    "computer",
    "con",
    "concerning",
    "consequently",
    "consider",
    "considering",
    "contain",
    "containing",
    "contains",
    "corresponding",
    "could",
    "couldn't",
    "couldnt",
    "course",
    "cry",
    "currently",
    "d",
    "dare",
    "daren't",
    "de",
    "definitely",
    "describe",
    "described",
    "despite",
    "detail",
    "did",
    "didn't",
    "different",
    "directly",
    "do",
    "does",
    "doesn't",
    "doing",
    "don't",
    "done",
    "down",
    "downwards",
    "due",
    "during",
    "e",
    "each",
    "edu",
    "eg",
    "eight",
    "eighty",
    "either",
    "eleven",
    "else",
    "elsewhere",
    "empty",
    "en",
    "end",
    "ending",
    "enough",
    "entirely",
    "especially",
    "et",
    "etc",
    "even",
    "ever",
    "evermore",
    "every",
    "everybody",
    "everyone",
    "everything",
    "everywhere",
    "ex",
    "exactly",
    "example",
    "except",
    "f",
    "fairly",
    "far",
    "farther",
    "few",
    "fewer",
    "fifteen",
    "fifth",
    "fify",
    "fill",
    "find",
    "fire",
    "first",
    "five",
    "followed",
    "following",
    "follows",
    "for",
    "forever",
    "former",
    "formerly",
    "forth",
    "forty",
    "forward",
    "found",
    "four",
    "from",
    "front",
    "full",
    "further",
    "furthermore",
    "g",
    "get",
    "gets",
    "getting",
    "give",
    "given",
    "gives",
    "go",
    "goes",
    "going",
    "gone",
    "got",
    "gotten",
    "greetings",
    "h",
    "had",
    "hadn't",
    "half",
    "happens",
    "hardly",
    "has",
    "hasn't",
    "hasnt",
    "have",
    "haven't",
    "having",
    "he",
    "he'd",
    "he'll",
    "he's",
    "hello",
    "help",
    "hence",
    "her",
    "here",
    "here's",
    "hereafter",
    "hereby",
    "herein",
    "hereupon",
    "hers",
    "herself",
    "hi",
    "him",
    "himself",
    "his",
    "hither",
    "hopefully",
    "how",
    "howbeit",
    "however",
    "href",
    "http",
    "hundred",
    "i",
    "i'd",
    "i'll",
    "i'm",
    "i've",
    "ie",
    "if",
    "ignored",
    "immediate",
    "in",
    "inasmuch",
    "inc",
    "inc.",
    "including",
    "indeed",
    "indicate",
    "indicated",
    "indicates",
    "inner",
    "inside",
    "insofar",
    "instead",
    "interest",
    "into",
    "inward",
    "is",
    "isn't",
    "it",
    "it'd",
    "it'll",
    "it's",
    "its",
    "itself",
    "j",
    "just",
    "k",
    "kb",
    "keep",
    "keeps",
    "kept",
    "know",
    "known",
    "knows",
    "l",
    "la",
    "last",
    "lately",
    "later",
    "latter",
    "latterly",
    "least",
    "less",
    "lest",
    "let",
    "let's",
    "like",
    "liked",
    "likely",
    "likewise",
    "little",
    "look",
    "looking",
    "looks",
    "low",
    "lower",
    "ltd",
    "m",
    "made",
    "mailto",
    "mainly",
    "make",
    "makes",
    "making",
    "many",
    "may",
    "maybe",
    "mayn't",
    "mb",
    "me",
    "mean",
    "means",
    "meantime",
    "meanwhile",
    "merely",
    "might",
    "mightn't",
    "mill",
    "mine",
    "minus",
    "miss",
    "more",
    "moreover",
    "most",
    "mostly",
    "move",
    "mr",
    "mrs",
    "much",
    "must",
    "mustn't",
    "my",
    "myself",
    "n",
    "name",
    "namely",
    "nd",
    "near",
    "nearly",
    "necessary",
    "need",
    "needn't",
    "needs",
    "neither",
    "never",
    "neverf",
    "neverless",
    "nevertheless",
    "new",
    "next",
    "nice",
    "nine",
    "ninety",
    "no",
    "no-one",
    "nobody",
    "non",
    "none",
    "nonetheless",
    "noone",
    "nor",
    "normally",
    "not",
    "nothing",
    "notwithstanding",
    "novel",
    "now",
    "nowhere",
    "o",
    "obviously",
    "of",
    "off",
    "often",
    "oh",
    "ok",
    "okay",
    "old",
    "on",
    "once",
    "one",
    "one's",
    "ones",
    "only",
    "onto",
    "opposite",
    "or",
    "org",
    "other",
    "others",
    "otherwise",
    "ought",
    "oughtn't",
    "our",
    "ours",
    "ourselves",
    "out",
    "outside",
    "over",
    "overall",
    "own",
    "p",
    "part",
    "particular",
    "particularly",
    "past",
    "per",
    "perhaps",
    "piece",
    "placed",
    "please",
    "plus",
    "possible",
    "presumably",
    "probably",
    "provided",
    "provides",
    "put",
    "q",
    "que",
    "quite",
    "qv",
    "r",
    "rather",
    "rd",
    "re",
    "really",
    "reasonably",
    "recent",
    "recently",
    "regarding",
    "regardless",
    "regards",
    "relatively",
    "respectively",
    "right",
    "round",
    "s",
    "said",
    "same",
    "saw",
    "say",
    "saying",
    "says",
    "second",
    "secondly",
    "see",
    "seeing",
    "seem",
    "seemed",
    "seeming",
    "seems",
    "seen",
    "self",
    "selves",
    "sensible",
    "sent",
    "serious",
    "seriously",
    "seven",
    "several",
    "shall",
    "shan't",
    "she",
    "she'd",
    "she'll",
    "she's",
    "should",
    "shouldn't",
    "show",
    "side",
    "since",
    "sincere",
    "single",
    "six",
    "sixty",
    "so",
    "some",
    "somebody",
    "someday",
    "somehow",
    "someone",
    "something",
    "sometime",
    "sometimes",
    "somewhat",
    "somewhere",
    "soon",
    "sorry",
    "specified",
    "specify",
    "specifying",
    "still",
    "stuff",
    "sub",
    "such",
    "sup",
    "sure",
    "system",
    "t",
    "t's",
    "take",
    "taken",
    "taking",
    "tell",
    "ten",
    "tends",
    "th",
    "than",
    "thank",
    "thanks",
    "thanx",
    "that",
    "that'll",
    "that's",
    "that've",
    "thats",
    "the",
    "their",
    "theirs",
    "them",
    "themselves",
    "then",
    "thence",
    "there",
    "there'd",
    "there'll",
    "there're",
    "there's",
    "there've",
    "thereafter",
    "thereby",
    "therefore",
    "therein",
    "theres",
    "thereupon",
    "these",
    "they",
    "they'd",
    "they'll",
    "they're",
    "they've",
    "thick",
    "thin",
    "thing",
    "things",
    "think",
    "third",
    "thirty",
    "this",
    "thorough",
    "thoroughly",
    "those",
    "though",
    "three",
    "through",
    "throughout",
    "thru",
    "thus",
    "till",
    "to",
    "together",
    "too",
    "took",
    "top",
    "toward",
    "towards",
    "tried",
    "tries",
    "truly",
    "try",
    "trying",
    "twelve",
    "twenty",
    "twice",
    "two",
    "u",
    "un",
    "und",
    "under",
    "underneath",
    "undoing",
    "unfortunately",
    "unless",
    "unlike",
    "unlikely",
    "until",
    "unto",
    "up",
    "upon",
    "upwards",
    "us",
    "use",
    "used",
    "useful",
    "uses",
    "using",
    "usual",
    "usually",
    "v",
    "value",
    "various",
    "ve",
    "versus",
    "very",
    "via",
    "viz",
    "vs",
    "w",
    "want",
    "wants",
    "was",
    "wasn't",
    "way",
    "we",
    "we'd",
    "we'll",
    "we're",
    "we've",
    "welcome",
    "well",
    "went",
    "were",
    "weren't",
    "what",
    "what'll",
    "what's",
    "what've",
    "whatever",
    "when",
    "whence",
    "whenever",
    "where",
    "where's",
    "whereafter",
    "whereas",
    "whereby",
    "wherein",
    "whereupon",
    "wherever",
    "whether",
    "which",
    "whichever",
    "while",
    "whilst",
    "whither",
    "who",
    "who'd",
    "who'll",
    "who's",
    "whoever",
    "whole",
    "whom",
    "whomever",
    "whose",
    "why",
    "will",
    "willing",
    "wish",
    "with",
    "within",
    "without",
    "won't",
    "wonder",
    "would",
    "wouldn't",
    "www",
    "x",
    "y",
    "yes",
    "yet",
    "you",
    "you'd",
    "you'll",
    "you're",
    "you've",
    "your",
    "yours",
    "yourself",
    "yourselves",
    "z",
    "zero"
    };
    int top = sizeof list / sizeof list[0] - 1;
    int bottom = 0;
    int guess = top / 2;
    int diff = 0;
    if (s != NULL) {
    while (!found && !giveup) {
    diff = i_strcmp(list[guess], s);
    if (0 == diff) {
    found = 1;
    } else if (0 < diff) {
    top = guess - 1;
    } else {
    bottom = guess + 1;
    }
    if (top < bottom) {
    giveup = 1;
    } else {
    guess = (top + bottom) / 2;
    }
    }
    }
    return found;
    }

    /*
    * Argh! We can't use fgets()! It's not discussed until page 164.
    * Oh well... time to roll our own again...
    */
    char *GetLine(char *s, int n, FILE * fp)
    {
    int c = 0;
    int done = 0;
    char *p = s;
    while (!done && --n > 0 && (c = getc(fp)) != EOF) {
    if ((*p++ = c) == '\n') {
    done = 1;
    }
    }
    *p = '\0';
    if (EOF == c && p == s) {
    p = NULL;
    } else {
    p = s;
    }
    return p;
    }

    /*
    * Ideally, we'd use a clever GetLine function which expanded its
    * buffer dynamically to cope with large lines. Since we can't use
    * realloc, and because other solutions would require quite hefty
    * engineering, we'll adopt a simple solution - a big buffer.
    *
    * Note: making the buffer static will help matters on some
    * primitive systems which don't reserve much storage for
    * automatic variables, and shouldn't break anything anywhere.
    *
    */
    #define MAXLINE 8192
    int main1(int argc, char **argv)
    {
    static char buffer[MAXLINE] =
    {0};
    char *s = NULL;
    char *word = NULL;
    int line = 0;
    int giveup = 0;
    struct wordtree *tree = NULL;
    char *delims = " \t\n\r\a\f\v!\"%^&*()_=+{}[]\
    \|/,.<>:;#~?";
    FILE *fin = stdin;
    FILE *fout = stdout;
    if (argc > 1)
    fin = theFileIn(argv[1]);
    if (argc > 2)
    fout = theFileOut(argv[2]);

    while (!giveup && GetLine(buffer, sizeof buffer, fin) != NULL) {
    ++line;
    s = buffer;
    while (!giveup && (word = tokenise(&s, delims)) != NULL) {
    if (!NoiseWord(word)) {
    if (NULL == addword(&tree, word, line)) {
    fprintf(fout,"Error adding data into memory.
    Giving up.\n");
    giveup = 1;
    }
    }
    }
    }
    if (!giveup) {
    fprintf(fout,"%18s Line Numbers\n", "Word");
    printtree(tree, fout);
    }
    deleteword(&tree);
    return 0;
    }

    /* K&R 6-1: "Our version of getword() does not properly handle
    underscores, string constants, or preprocessor control lines.
    Write a better version."
    This is intended to be a solution to K&R 6-1 in "category 0" as
    defined by the official rules given on Richard Heathfield's "The C
    Programming Language Answers To Exercises" page, found at
    http://users.powernet.co.uk/eton/kandr2/index.html.
    For more information on the language for which this is a lexical
    analyzer, please see the comment preceding getword() below.
    Note that there is a small modification to my_ungetch() as defined
    by
    K&R. Hopefully this lies within the rules. */
    /* knr61.c - answer to K&R2 exercise 6-1.
    Copyright (C) 2000 Ben Pfaff <>.
    This program is free software; you can redistribute it and/or
    modify it under the terms of the GNU General Public License as
    published by the Free Software Foundation; either version 2 of the
    License, or (at your option) any later version.
    This program is distributed in the hope that it will be useful, but
    WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
    General Public License for more details.
    You should have received a copy of the GNU General Public License
    along with this program; if not, write to the Free Software
    Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
    02111-1307, USA. */
    /* Tokens. Other non-whitespace characters self-represent themselves
    as tokens. */

    /* Main program for testing. */
    int main2(int argc, char **argv)
    {
    FILE *fin = stdin;
    FILE *fout = stdout;
    if (argc > 1)
    fin = theFileIn(argv[1]);
    if (argc > 2)
    fout = theFileOut(argv[2]);
    my_ungetch('\n');
    for (;;) {
    char word[64];
    enum token token;
    /* Get token. */
    token = getword(word, sizeof word, fin);
    /* Print token type. */
    switch (token) {
    case TOK_ID:
    fprintf(fout,"id");
    break;
    case TOK_STRING:
    fprintf(fout,"string");
    break;
    case TOK_CHAR:
    fprintf(fout,"char");
    break;
    case TOK_EOF:
    fprintf(fout,"eof\n");
    return 0;
    default:
    fprintf(fout,"other");
    word[0] = token;
    word[1] = '\0';
    break;
    }
    /* Print token value more or less unambiguously. */
    {
    const unsigned char *s;
    fprintf(fout,"\t'");
    for (s = word; *s != '\0'; s++)
    if (isprint(*s) && *s != '\'')
    fputc(*s, fout);
    else if (*s == '\'')
    fprintf(fout,"\\'");
    else
    /* Potentially wrong. */
    fprintf(fout,"\\x%02x", *s);
    fprintf(fout,"'\n");
    }
    }
    }

    /* Parses C-like tokens from fin:
    - Parses C identifiers and string and character constants.
    - Other characters, such as operators, punctuation, and digits
    not part of identifiers are considered as tokens in
    themselves.
    - Skip comments and preprocessor control lines.
    Does not handle trigraphs, line continuation with \, or numerous
    other special C features.
    Returns a token type. This is either one of TOK_* above, or a
    single
    character in the range 0...UCHAR_MAX.
    If TOK_ID, TOK_STRING, or TOK_CHAR is returned, WORD[] is filled
    with the identifier or string value, truncated at LIM - 1
    characters and terminated with '\0'.
    For other returned token types, WORD[] is indeterminate. */

    enum token getword(char *word, int lim, FILE *fin)
    {
    int beg_line,
    c;
    for (;;) {
    beg_line = skipws(fin);
    c = my_getch(fin);
    if (!beg_line || c != '#')
    break;
    /* Skip preprocessor directive. */
    do {
    c = my_getch(fin);
    if (c == EOF)
    return TOK_EOF;
    }
    while (c != '\n');
    my_ungetch('\n');
    }
    if (c == EOF)
    return TOK_EOF;
    else if (c == '_' || isalpha((unsigned char) c)) {
    do {
    my_putch(&word, &lim, c);
    c = my_getch(fin);
    }
    while (isalnum((unsigned char) c) || c == '_');
    my_ungetch(c);
    return TOK_ID;
    } else if (c == '\'' || c == '"') {
    int quote = c;
    word[0] = '\0';
    while (getstelem(&word, &lim, quote, fin));
    return quote == '\'' ? TOK_CHAR : TOK_STRING;
    } else
    return (unsigned char) c;
    }

    /* Skips whitespace and comments read from fin.
    Returns nonzero if a newline was encountered, indicating that we're
    at the beginning of a line. */

    static int skipws(FILE *fin)
    {
    /* Either 0, if we're not inside a comment, or CLS_IN_CMT, if we
    are
    * inside a comment. */
    enum class in_comment = 0;
    /* Have we encountered a newline outside a comment? */
    int beg_line = 0;
    for (;;) {
    int c; /* Input character. */
    enum class class; /* Classification of `c'. */
    /* Get an input character and determine its classification. */
    c = my_getch(fin);
    switch (c) {
    case '\n':
    if (!in_comment)
    beg_line = 1;
    /* Fall through. */
    case ' ':
    case '\f':
    case '\r':
    case '\t':
    case '\v':
    class = CLS_WS;
    break;
    case '/':
    /* Outside a comment, slash-star begins a comment. */
    if (!in_comment) {
    c = my_getch(fin);
    if (c == '*')
    class = CLS_BEG_CMT;
    else {
    my_ungetch(c);
    c = '/';
    class = CLS_OTHER;
    }
    } else
    class = CLS_OTHER;
    break;
    case '*':
    /* Inside a comment, star-slash ends the comment. */
    if (in_comment) {
    c = my_getch(fin);
    if (c == '/')
    class = CLS_END_CMT;
    else {
    my_ungetch(c);
    class = CLS_OTHER;
    }
    } else
    class = CLS_OTHER;
    break;
    default:
    /* Other characters. */
    if (c == EOF)
    return 0;
    class = CLS_OTHER;
    }
    /* Handle character `c' according to its classification and
    whether
    * we're inside a comment. */
    switch (class |in_comment) {
    case CLS_WS:
    case CLS_WS | CLS_IN_CMT:
    case CLS_OTHER | CLS_IN_CMT:
    break;
    case CLS_BEG_CMT:
    in_comment = CLS_IN_CMT;
    break;
    case CLS_OTHER:
    my_ungetch(c);
    return beg_line;
    case CLS_END_CMT | CLS_IN_CMT:
    in_comment = 0;
    break;
    case CLS_BEG_CMT | CLS_IN_CMT:
    case CLS_END_CMT:
    default:
    fprintf(stderr,"can't happen\n");
    break;
    }
    }
    }

    /* Get a character inside a quoted string or character constant.
    QUOTE is ' for a character constant or " for a quoted string.
    *WORDP points to a string being constructed that has *LIMP bytes
    available. */
    static int getstelem(char **wordp, int *limp, int quote, FILE
    *fin)
    {
    int c;
    /* Handle end-of-quote and EOF. */
    c = my_getch(fin);
    if (c == quote || c == EOF)
    return 0;
    /* Handle ordinary string characters. */
    if (c != '\\') {
    my_putch(wordp, limp, c);
    return 1;
    }
    /* We're in a \ escape sequence. Get the second character. */
    c = my_getch(fin);
    if (c == EOF)
    return 0;
    /* Handle simple single-character escapes. */
    {
    static const char escapes[] =
    {"''??\"\"\\\\a\ab\bf\fn\nr\rt\tv\v"};
    const char *cp = strchr(escapes, c);
    if (cp != NULL) {
    my_putch(wordp, limp, cp[1]);
    return 1;
    }
    }
    /* Handle hexadecimal and octal escapes. This also handles invalid
    * escapes by default, doing nothing useful with them. That's okay
    * because invalid escapes generate undefined behavior. */
    {
    unsigned char v = 0;
    if (c == 'x' || c == 'X')
    for (;;) {
    static const char hexits[] = "0123456789abcdef";
    const char *p;
    c = my_getch(fin);
    p = strchr(hexits, tolower((unsigned char) c));
    if (p == NULL)
    break;
    v = v * 16 + (p - hexits);
    }
    else {
    int i;
    for (i = 0; i < 3; i++) {
    v = v * 8 + (c - '0');
    c = my_getch(fin);
    if (c < '0' || c > '7')
    break;
    }
    }
    my_putch(wordp, limp, v);
    my_ungetch(c);
    }
    return 1;
    }

    /* Capacity of putback buffer. */
    #define BUFSIZE 100
    /* Putback buffer. */
    char buf[BUFSIZE];
    /* Number of characters in putback buffer. */
    int bufp = 0;
    /* Retrieves and returns a character from fin or from the putback
    buffer.
    Returns EOF if end of file is encountered. */
    int my_getch(FILE *fin)
    {
    return bufp > 0 ? buf[--bufp] : getc(fin);
    }

    /* Stuffs character C into the putback buffer.
    From the caller's perspective, fails silently if the putback buffer
    is full. */

    static void my_ungetch(int c)
    {
    if (c == EOF)
    return;
    if (bufp >= BUFSIZE)
    fprintf(stderr,"my_ungetch: too many characters\n");
    else
    buf[bufp++] = c;
    }

    /* Stuffs character C into buffer *WORDP, which has *LIMP bytes
    available.
    Advances *WORDP and reduces *LIMP as appropriate.
    Drops the character on the floor if it would overflow the buffer.
    Ensures that *WORDP is null terminated if possible. */

    static void my_putch(char **wordp, int *limp, int c)
    {
    if (*limp > 1) {
    *(*wordp)++ = c;
    (*limp)--;
    }
    if (*limp > 0)
    **wordp = '\0';
    }

    int main(int argc, char **argv)
    {
    main0(argc, argv);
    fflush(NULL);
    main1(argc, argv);
    fflush(NULL);
    main2(argc, argv);
    fflush(NULL);
    return 0;
    }
    user923005, May 3, 2008
    #5
  6. arnuld

    user923005 Guest

    On May 2, 8:45 pm, user923005 <> wrote:
    > Sample for Arnuld to compare:
    > Here are the exercises from Chapter 6 (except the hash table stuff
    > which seems out of place to me) as purloined from Richard Heathfield's
    > K&R2 solution site, and then made to work with arbitrary files, and
    > then the missing exercise was added and then I made some more icky
    > hacks and called it good.  That didn't make it good, but that's what I
    > called it.

    [snip]
    It also needs this header called ch6.h :

    #ifndef KR2_CH6_PROTOTYPES_DEFINED
    #define KR2_CH6_PROTOTYPES_DEFINED

    #include <assert.h>
    #include <ctype.h>
    #include <limits.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>


    typedef struct WORD {
    char *Word;
    size_t Count;
    struct WORD *Left;
    struct WORD *Right;
    } WORD;

    struct linelist {
    struct linelist *next;
    int line;
    };
    struct wordtree {
    char *word;
    struct linelist *firstline;
    struct wordtree *left;
    struct wordtree *right;
    };


    enum token {
    TOK_ID = UCHAR_MAX + 1, /* Identifier. */
    TOK_STRING, /* String constant. */
    TOK_CHAR, /* Character constant. */
    TOK_EOF /* End of file. */
    };

    /* Classification of an input character. */
    enum class {
    CLS_WS = 0, /* Whitespace. */
    CLS_BEG_CMT, /* Slash-star beginning a comment. */
    CLS_END_CMT, /* Star-slash ending a comment. */
    CLS_OTHER, /* None of the above. */
    CLS_IN_CMT = 4 /* Combined with one of the above,
    indicates
    * we're inside a comment. */
    };

    #define SUCCESS 0
    #define CANNOT_MALLOC_WORDARRAY 1
    #define NO_WORDS_ON_INPUT 2
    #define NO_MEMORY_FOR_WORDNODE 3
    #define NO_MEMORY_FOR_WORD 4


    extern char *GetLine(char *, int, FILE *);
    extern char *char_in_string(char *, int);
    extern char *dupstr(char *);
    extern char *tokenise(char **, char *);

    extern int AddToTree(struct WORD **, size_t *, char *);
    extern int CompareCounts(const void *, const void *);
    extern int CompareWords(const void *, const void *);
    extern int CompareWordStems(const void *, const void *);
    extern int NoiseWord(char *);
    extern int OutputWords(FILE *, size_t, struct WORD **);
    extern int OutputStemmedWords(FILE *, size_t, struct WORD **);
    extern int ReadInputToTree(struct WORD **, size_t *, FILE *);
    extern int WalkTree(struct WORD **, struct WORD *);
    extern int i_strcmp(const char *, const char *);
    extern int i_strncmp(const char *s, const char *t, int n);

    extern enum token getword(char *, int, FILE *);
    extern struct linelist *addlink(int);
    extern struct wordtree *addword(struct wordtree **, char *, int);

    extern void FreeTree(struct WORD *);
    extern void deletelist(struct linelist *);
    extern void deleteword(struct wordtree **);
    extern void printlist(struct linelist *, FILE *);
    extern void printtree(struct wordtree *, FILE *);

    #endif /* KR2_CH6_PROTOTYPES_DEFINED */
    user923005, May 3, 2008
    #6
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. murali@pune

    doubly linked list

    murali@pune, Mar 23, 2006, in forum: Java
    Replies:
    3
    Views:
    924
    bugbear
    Mar 24, 2006
  2. darth
    Replies:
    0
    Views:
    454
    darth
    Apr 30, 2004
  3. chand
    Replies:
    7
    Views:
    321
    Terry Reedy
    Sep 5, 2005
  4. Daniel
    Replies:
    5
    Views:
    383
  5. Replies:
    2
    Views:
    995
    Ivan Vecerina
    May 3, 2006
Loading...

Share This Page