B
bolega
This braggart admits that he had to put this code in TWO books and
visit it twice to be explained. I am puting the excerpt from pp2-4 of
this book and the C code. The C code will become indented and syntax
highlighted once you paste in emacs etc. It is my belief and
observation on a lot of problems by these so called "demi gods" that
they are actually all average and no more intelligent. Its all that
they got some opportunities to study some things at opportune time and
facilities and also spent a lot of time in a productive environment
and team.
I know that lisp eval is written more clear than this recursion below
because I am able to read it easily. and that code is almost self
explanatory. C is more quirky. When you really mean recursively call
another function, you are using return so you can have tail
recursion ???? .
Anyway, its your chance to show how clear C/C++/lisp/Scheme code you
can write that is clearer. Also, i dont exclude pseudocode but it
should be clear enough to be instantly translatable to a programming
language. The real goal is to learn how to write or DERIVE recursion,
how to enumerate cases, order them, and build recursion. You may even
put some introductory tuturial and dont have to reply here in ascii
but can upload a pdf at some link in rapidshare etc. Look how he put
the code after problem statement and then has the need to explain it.
Ideally, the discussion should be such that the student or reader
himself jumps to the solution. That is why I give these unix school of
pike/thomson/kernighan low grade in almost all their expositions
except that they wrote earliest books to make millions of dollars in
royalties and since now they are nobody due to linux, they are poorly
regurgitating old material.
Enjoy .............
============================
The Practice of Programming
In 1998, Rob Pike and I were writing The Practice of Programming
(Addison-Wesley). The
last chapter of the book, “Notation,” collected a number of examples
where good notation
led to better programs and better programming. This included the use
of simple data specifications
(printf, for instance), and the generation of code from tables.
Because of our Unix backgrounds and nearly 30 years of experience with
tools based on
regular expression notation, we naturally wanted to include a
discussion of regular
expressions, and it seemed mandatory to include an implementation as
well. Given our
emphasis on tools, it also seemed best to focus on the class of
regular expressions found in
grep—rather than, say, those from shell wildcards—since we could also
then talk about the
design of grep itself.
The problem was that any existing regular expression package was far
too big. The local
grep was over 500 lines long (about 10 book pages) and encrusted with
barnacles. Open
source regular expression packages tended to be huge—roughly the size
of the entire
book—because they were engineered for generality, flexibility, and
speed; none were
remotely suitable for pedagogy.
I suggested to Rob that we find the smallest regular expression
package that would illustrate
the basic ideas while still recognizing a useful and nontrivial class
of patterns. Ideally,
the code would fit on a single page.
Rob disappeared into his office. As I remember it now, he emerged in
no more than an
hour or two with the 30 lines of C code that subsequently appeared in
Chapter 9 of The
Practice of Programming. That code implements a regular expression
matcher that handles
the following constructs.
Character Meaning
c Matches any literal character c.
.. (period) Matches any single character.
^ Matches the beginning of the input string.
$ Matches the end of the input string.
* Matches zero or more occurrences of the previous character.
This is quite a useful class; in my own experience of using regular
expressions on a day-today
basis, it easily accounts for 95 percent of all instances. In many
situations, solving the
right problem is a big step toward creating a beautiful program. Rob
deserves great credit
for choosing a very small yet important, well-defined, and extensible
set of features from
among a wide set of options.
Rob’s implementation itself is a superb example of beautiful code:
compact, elegant,
efficient, and useful. It’s one of the best examples of recursion that
I have ever seen, and it
shows the power of C pointers. Although at the time we were most
interested in conveying
the important role of good notation in making a program easier to use
(and perhaps
easier to write as well), the regular expression code has also been an
excellent way to
illustrate algorithms, data structures, testing, performance
enhancement, and other
important topics.
Implementation
In The Practice of Programming, the regular expression matcher is part
of a standalone program
that mimics grep, but the regular expression code is completely
separable from its
surroundings. The main program is not interesting here; like many Unix
tools, it reads
either its standard input or a sequence of files, and prints those
lines that contain a match
of the regular expression.
This is the matching code:
/* match: search for regexp anywhere in text */
int match(char *regexp, char *text)
{
if (regexp[0] == '^')
return matchhere(regexp+1, text);
do { /* must look even if string is empty */
if (matchhere(regexp, text))
return 1;
} while (*text++ != '\0');
return 0;
}
/* matchhere: search for regexp at beginning of text */
int matchhere(char *regexp, char *text)
{
if (regexp[0] == '\0')
return 1;
if (regexp[1] == '*')
return matchstar(regexp[0], regexp+2, text);
Character Meaning
c Matches any literal character c.
.. (period) Matches any single character.
^ Matches the beginning of the input string.
$ Matches the end of the input string.
* Matches zero or more occurrences of the previous character.
4 C H A P T E R O N E
if (regexp[0] == '$' && regexp[1] == '\0')
return *text == '\0';
if (*text!='\0' && (regexp[0]=='.' || regexp[0]==*text))
return matchhere(regexp+1, text+1);
return 0;
}
/* matchstar: search for c*regexp at beginning of text */
int matchstar(int c, char *regexp, char *text)
{
do { /* a * matches zero or more instances */
if (matchhere(regexp, text))
return 1;
} while (*text != '\0' && (*text++ == c || c == '.'));
return 0;
}
visit it twice to be explained. I am puting the excerpt from pp2-4 of
this book and the C code. The C code will become indented and syntax
highlighted once you paste in emacs etc. It is my belief and
observation on a lot of problems by these so called "demi gods" that
they are actually all average and no more intelligent. Its all that
they got some opportunities to study some things at opportune time and
facilities and also spent a lot of time in a productive environment
and team.
I know that lisp eval is written more clear than this recursion below
because I am able to read it easily. and that code is almost self
explanatory. C is more quirky. When you really mean recursively call
another function, you are using return so you can have tail
recursion ???? .
Anyway, its your chance to show how clear C/C++/lisp/Scheme code you
can write that is clearer. Also, i dont exclude pseudocode but it
should be clear enough to be instantly translatable to a programming
language. The real goal is to learn how to write or DERIVE recursion,
how to enumerate cases, order them, and build recursion. You may even
put some introductory tuturial and dont have to reply here in ascii
but can upload a pdf at some link in rapidshare etc. Look how he put
the code after problem statement and then has the need to explain it.
Ideally, the discussion should be such that the student or reader
himself jumps to the solution. That is why I give these unix school of
pike/thomson/kernighan low grade in almost all their expositions
except that they wrote earliest books to make millions of dollars in
royalties and since now they are nobody due to linux, they are poorly
regurgitating old material.
Enjoy .............
============================
The Practice of Programming
In 1998, Rob Pike and I were writing The Practice of Programming
(Addison-Wesley). The
last chapter of the book, “Notation,” collected a number of examples
where good notation
led to better programs and better programming. This included the use
of simple data specifications
(printf, for instance), and the generation of code from tables.
Because of our Unix backgrounds and nearly 30 years of experience with
tools based on
regular expression notation, we naturally wanted to include a
discussion of regular
expressions, and it seemed mandatory to include an implementation as
well. Given our
emphasis on tools, it also seemed best to focus on the class of
regular expressions found in
grep—rather than, say, those from shell wildcards—since we could also
then talk about the
design of grep itself.
The problem was that any existing regular expression package was far
too big. The local
grep was over 500 lines long (about 10 book pages) and encrusted with
barnacles. Open
source regular expression packages tended to be huge—roughly the size
of the entire
book—because they were engineered for generality, flexibility, and
speed; none were
remotely suitable for pedagogy.
I suggested to Rob that we find the smallest regular expression
package that would illustrate
the basic ideas while still recognizing a useful and nontrivial class
of patterns. Ideally,
the code would fit on a single page.
Rob disappeared into his office. As I remember it now, he emerged in
no more than an
hour or two with the 30 lines of C code that subsequently appeared in
Chapter 9 of The
Practice of Programming. That code implements a regular expression
matcher that handles
the following constructs.
Character Meaning
c Matches any literal character c.
.. (period) Matches any single character.
^ Matches the beginning of the input string.
$ Matches the end of the input string.
* Matches zero or more occurrences of the previous character.
This is quite a useful class; in my own experience of using regular
expressions on a day-today
basis, it easily accounts for 95 percent of all instances. In many
situations, solving the
right problem is a big step toward creating a beautiful program. Rob
deserves great credit
for choosing a very small yet important, well-defined, and extensible
set of features from
among a wide set of options.
Rob’s implementation itself is a superb example of beautiful code:
compact, elegant,
efficient, and useful. It’s one of the best examples of recursion that
I have ever seen, and it
shows the power of C pointers. Although at the time we were most
interested in conveying
the important role of good notation in making a program easier to use
(and perhaps
easier to write as well), the regular expression code has also been an
excellent way to
illustrate algorithms, data structures, testing, performance
enhancement, and other
important topics.
Implementation
In The Practice of Programming, the regular expression matcher is part
of a standalone program
that mimics grep, but the regular expression code is completely
separable from its
surroundings. The main program is not interesting here; like many Unix
tools, it reads
either its standard input or a sequence of files, and prints those
lines that contain a match
of the regular expression.
This is the matching code:
/* match: search for regexp anywhere in text */
int match(char *regexp, char *text)
{
if (regexp[0] == '^')
return matchhere(regexp+1, text);
do { /* must look even if string is empty */
if (matchhere(regexp, text))
return 1;
} while (*text++ != '\0');
return 0;
}
/* matchhere: search for regexp at beginning of text */
int matchhere(char *regexp, char *text)
{
if (regexp[0] == '\0')
return 1;
if (regexp[1] == '*')
return matchstar(regexp[0], regexp+2, text);
Character Meaning
c Matches any literal character c.
.. (period) Matches any single character.
^ Matches the beginning of the input string.
$ Matches the end of the input string.
* Matches zero or more occurrences of the previous character.
4 C H A P T E R O N E
if (regexp[0] == '$' && regexp[1] == '\0')
return *text == '\0';
if (*text!='\0' && (regexp[0]=='.' || regexp[0]==*text))
return matchhere(regexp+1, text+1);
return 0;
}
/* matchstar: search for c*regexp at beginning of text */
int matchstar(int c, char *regexp, char *text)
{
do { /* a * matches zero or more instances */
if (matchhere(regexp, text))
return 1;
} while (*text != '\0' && (*text++ == c || c == '.'));
return 0;
}