# Radix Tree

Discussion in 'C Programming' started by Foodbank, Nov 7, 2005.

1. ### FoodbankGuest

Hi,

I'm currently teaching myself C data structures in preparation for a
job transition and I've just passed the point in which radix sorting
was covered in my book. I've recently finished a radix sorting program
and was surprised by its speed. Therefore, I'd like help with a radix
tree program in order to compare it with a prior binary tree program
that I finished. The parts I'm having trouble with are collecting
statistics and inserting words into the tree (as commented in the
program below).

The purpose of the program is to scan the string from a text document
and provide various counts, what the deepest level of the tree is, etc.
(you'll see in my variables below, also it performs the same counts as
my previous binary tree exercise). Also, the way the way that the text
document is inputed into the program is on the command line under
Cygwin (UNIX emulator). I am not planning on using fopen in this one,
as I didn't in my previous binary tree program.

Thanks for your help,
James

Code (Text):

#include <stdio.h>
#include <math.h>
#include <malloc.h>
#include <strings.h>
#define     MAXLEN     100          // Maximum length
#define     TWID     27          // The tree branching width
#define     END     (TWID-1)                     // The code for a non
letter is 26
struct node {               // The tree node structure
int     count;
struct node *np[TWID];
} root;

// these will be the stats to collect
int nodes_used, total_words, unique_words, most_frequent_count,
deepest_level;
char most_frequent_word[MAXLEN];
char deepest_word[MAXLEN];

// function prototypes
void insert(struct node *, char *);
void treewalk(struct node *);
int get_word(char *);

main(int argc, char *argv[]) {
char word_buff[MAXLEN];     // the reusable word buffer
while(get_word(word_buff))
insert(&root, word_buff);
treewalk(&root);
printf("the root count is %d\n", root.count);
printf("the tree contains %d nodes\n", nodes_used);
printf("there are %d unique words out of %d total words\n",
unique_words, total_words);
printf("most frequent word <%s> appears %d times\n",
most_frequent_word, most_frequent_count);
printf("the deepest word <%s> is at level %d\n",
deepest_word, deepest_level);
}

void insert(struct node *np, char *word) {
char c;
/* ***** CODE TO INSERT A WORD IN THE TREE ***** */
}

char current[MAXLEN];              // keep track of the current word
during recursion
int treewalk_level = -1;     // keep track of the recursion depth
void treewalk(struct node *np) {
int i, c;
/* ***** CODE TO WALK THE TREE TO COLLECT THE STATS ***** */

}

#include <ctype.h>
//tells what a string is and ignores numbers
int get_word(char *s) {
int c;
do {
c = getchar();
if(c == EOF)
return(0);
} while(!isalpha(c));
do {
if(isupper(c))
c = tolower(c);
*s++ = c;
c = getchar();
} while(isalpha(c));
*s = 0;
return(1);
}

Foodbank, Nov 7, 2005

2. ### Michael MairGuest

Foodbank wrote:
> Hi,
>
> I'm currently teaching myself C data structures in preparation for a
> job transition and I've just passed the point in which radix sorting
> was covered in my book. I've recently finished a radix sorting program
> and was surprised by its speed. Therefore, I'd like help with a radix
> tree program in order to compare it with a prior binary tree program
> that I finished. The parts I'm having trouble with are collecting
> statistics and inserting words into the tree (as commented in the
> program below).
>
> The purpose of the program is to scan the string from a text document
> and provide various counts, what the deepest level of the tree is, etc.
> (you'll see in my variables below, also it performs the same counts as
> my previous binary tree exercise). Also, the way the way that the text
> document is inputed into the program is on the command line under
> Cygwin (UNIX emulator). I am not planning on using fopen in this one,
> as I didn't in my previous binary tree program.

>
Code (Text):

> #include <stdio.h>
> #include <math.h>
> #include <malloc.h>
> #include <strings.h>
> #define     MAXLEN     100          // Maximum length
> #define     TWID     27          // The tree branching width
> #define     END     (TWID-1)                     // The code for a non
> letter is 26[/color]

Note: Even if you use C99 or higher which permits C++ style comments,
using // is a bad idea for code posted to newsgroups as is nicely
illustrated by the unintended line break above.
[color=blue]
> struct node {               // The tree node structure
>      int     count;[/color]

Sizes of any kind sometimes are better represented by size_t.
[color=blue]
>      struct node *np[TWID];
> } root;
>
> // these will be the stats to collect
> int nodes_used, total_words, unique_words, most_frequent_count,
> deepest_level;
> char most_frequent_word[MAXLEN];
> char deepest_word[MAXLEN];[/color]

Note: I'd rather collect stats in a structure type passed by reference
to some statistics function instead of using a wild collection of
variables with external or internal linkage.
[color=blue]
>
> // function prototypes
> void insert(struct node *, char *);
> void treewalk(struct node *);[/color]

Here, you could pass your stats structure instead;
void treewalk(struct node *pRoot, struct stats *pStats);
In addition, I'd introduce a printStats() function as well.
[color=blue]
> int get_word(char *);[/color]

Note: Use _full_ prototypes. This enables your compiler to
warn you about different parameter names which may remind/tell
you of a forgotten/unknown change in semantics.
[color=blue]
>
> main(int argc, char *argv[]) {[/color]

If you use C99, it is
int main (int argc, char *argv[])
Note that this is equivalent to
int main (int argc, char **argv)
which is potentially less misleading w.r.t the type of argv.
[color=blue]
>      char word_buff[MAXLEN];     // the reusable word buffer
>      while(get_word(word_buff))
>           insert(&root, word_buff);
>      treewalk(&root);
>      printf("the root count is %d\n", root.count);
>      printf("the tree contains %d nodes\n", nodes_used);
>      printf("there are %d unique words out of %d total words\n",
>           unique_words, total_words);
>      printf("most frequent word <%s> appears %d times\n",
>           most_frequent_word, most_frequent_count);
>      printf("the deepest word <%s> is at level %d\n",
>           deepest_word, deepest_level);[/color]

You forgot to destroy the tree.
This is not a nice-to-have but a must.

return 0;[color=blue]
> }
>
> void insert(struct node *np, char *word) {
>      char c;
>      /* ***** CODE TO INSERT A WORD IN THE TREE ***** */[/color]

comp.lang.c is a _C_ newsgroup, not an algorithms & data structures
newsgroup.
If you want to know the basic algorithm, ask in comp.programming or
similar. If you want a code review or have problems with the actual
implementation, ask here.

You probably will want to use malloc() here. Returning something
to indicate whether an error occurred or not is a good idea, too.
[color=blue]
> }
>
> char current[MAXLEN];              // keep track of the current word
> during recursion
> int treewalk_level = -1;     // keep track of the recursion depth
> void treewalk(struct node *np) {
>      int i, c;
>      /* ***** CODE TO WALK THE TREE TO COLLECT THE STATS ***** */[/color]

The same as above apart from the return type.[color=blue]
>
>
> }
>
> #include <ctype.h>
> //tells what a string is and ignores numbers
> int get_word(char *s) {[/color]

In addition, I would also pass a "size_t maxcount"
parameter, to make sure that you cannot have a buffer overflow.[color=blue]
>      int c;[/color]

if (maxcount == 0)
return NOTHING_READ; /* To be defined */[color=blue]
>      do {
>           c = getchar();
>           if(c == EOF)
>                return(0);
>      } while(!isalpha(c));
>      do {
>           if(isupper(c))
>                c = tolower(c);
>           *s++ = c;
>           c = getchar();
>      } while(isalpha(c));[/color]

With maxcount, depending on the size passed:
while (isalpha(c) && --maxcount > 0);[color=blue]
>      *s = 0;
>      return(1);[/color]

BTW: return is no function, so
return 1;
will do fine.
[color=blue]
> }
>

Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.

Michael Mair, Nov 7, 2005

## Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.