Can anyone write this recursion for simple regexp more beautifullyand clearly than the braggarts

K

Kenny McCormack

spinoza1111 said:
Trolls don't admit they are wrong to their worst enemies, but
Heathfield, who knows C if little else that I can discern, did prove
to me (along with my own experiments) that all parameters in C are
really and truly call be value, period, and as soon as I realized it,
I admitted it. I still have a problem with the fact that I had to beat
the answer out of him as well as mess with C.

A few comments:
1) That they totally mis-use the word "troll" is, of course, legion. It
wouldn't be such a big deal if it weren't for how obsessively anal they
are about words here. I.e., the hypocrisy is self-evident.

1a) It has certainly become common usage over the past decade or so to
use the word "troll" as a synonym for "people we don't like" (where
"we" is interpreted to mean "the Establishment"). This makes it no
less wrong, of course, but people would be more inclined to let it
pass if it weren't for the "clc personality type" (described above).

2) I have to admit also that Dicky was right about C's parameter passing
being exclusively call by value. I will further say that this fact
always impressed me - that everything is "by value". That is, in the
context of the minimalist/portable-assembler kind of language that C is.

2a) Nonetheless, that fact is that, in all but the most obscuirely
CLC-sense, it is true that C implements call-by-reference via pointers.
To argue otherwise, is just being a jerk, and I think we all
recognize that. It is the CLC disease to be technically right, but
so obviously wrong, at the same time.
 
R

rjf

In case anyone is actually interested in the writing of a parser
in lisp that implements regular expressions, in part, I recommend this
article by Henry Baker, http://home.pipeline.com/~hbaker1/Prag-Parse.html

which displays in its entirety a lisp program remarkable (in my
opinion)
for its brevity and generality.


Somehow I think this thread is not about lisp or regular expressions,
but I can't figure out what it IS about, so apologies in advance if my
response is off topic. I'm just sort of going by the subject line.
RJF
 
G

George Neuner

From Chambers: "*bilge* noun 1a the broadest part of a ship's bottom;
b (usually bilges) the lowermost parts on the inside of a ship's
hull, below the floorboards. 2 (also bilge-water) the dirty water
that collects in a ship's bilge. 3 /rather dated colloq/ rubbish,
nonsense, drivel."

I can see why you might object to the use of the neologism
"nilgewater", but I see no sensible reason for objecting to the
perfectly normal word "bilgewater". One might reasonably object to a
particular application thereof in a given context, but the word
itself is surely unobjectionable.

<snip>

Can we, at least, object to the making of a single word noun from a
two word "adjective noun" combination?

English already has more than 1.2 million perfectly good words. Let's
all try using them before inventing new ones.

George
 
J

Jerry Coffin

Even though I'm quite sure the original post really was some close
relative of trolling, let's dissect the code itself, and see if it
really can be improved a great deal:

int match(char *regexp, char *text)
{
if (regexp[0] == '^')
return matchhere(regexp+1, text);

If an RE starts with '^', it has to match at the beginning of the
text, so we check only for a match of the remainder of the RE,
starting from the beginning of the text.

do { /* must look even if string is empty */
if (matchhere(regexp, text))
return 1;
} while (*text++ != '\0');

Otherwise, we step through each possible position in the text, and
check for a match at that position.

/* matchhere: search for regexp at beginning of text */
int matchhere(char *regexp, char *text)
{
if (regexp[0] == '\0')
return 1;

If we've reached the end of the RE, return 1, indicating we found a
match.

if (regexp[1] == '*')
return matchstar(regexp[0], regexp+2, text);

If we find a '*', use 'matchstar' to check for a match.

if (regexp[0] == '$' && regexp[1] == '\0')
return *text == '\0';

If the RE ends in '$', we have a match only if we've reached the end
of the text.

if (*text!='\0' && (regexp[0]=='.' || regexp[0]==*text))
return matchhere(regexp+1, text+1);

If we're at the end of the text, but there's more to the RE, we don't
have a match. Otherwise, if the current character of the RE is '.',
OR it's any other (non-meta-) character that matches the current
character in the text, we have a match iff the remainder of the RE
matches the remainder of the text.

return 0;

Otherwise (e.g. we were at the end of the text, but not the end of
the RE) we do not have a match.

So far, it's simple and straightforward -- we've simply enumerated
the possibilities, and handled each in a fairly straightforward
manner.

/* matchstar: search for c*regexp at beginning of text */
int matchstar(int c, char *regexp, char *text)
{
do { /* a * matches zero or more instances */
if (matchhere(regexp, text))
return 1;
} while (*text != '\0' && (*text++ == c || c == '.'));

Here's where things get a little bit tricky. Recall that matchstar()
is called when matchhere() encountered an RE something like 'x*E'. It
calls matchstar(), passing the 'x' as the first parameter, "E" as the
second, and the text as the third (where 'x' is some single
character, and "E" is an arbitrary RE).

matchstar() then says that such an RE matches if 1) 'x' matches some
number of characters in text, followed by E matching the remainder of
the text.

return 0;

If we fall out of the loop, we either reached the end of the text
without finding a match for the remainder of the RE, or else because
we found a mismatch between 'x' and the current character in the RE
before finding a match between the remainder of the RE and the
remainder of the text.

As to the original implication that the code is particularly
difficult to read or understand, I'd have to say no, not really. In
fact, much like most well written recursive code, it's structured
very much like an inductive proof: establish an invariant for what's
currently being considered, then show that the invariant remains true
for the remainder.

In this case, it does that by (mostly in matchhere) checking whether
the current item in the RE matches the current character in the text,
and if it does recursing to establish whether the remainder of the RE
matches the remainder of the text.

It's also rather similar to rather a large amount of Lisp code. The
primary difference is that in Lisp, the primary data structure is
typically a list, whereas here it's a string (or at least C's rather
limited version of a string). Nonetheless the similarity in usage is
striking. In essence, the code works primarily in terms of a CAR and
a CDR. The code examines the CAR directly, and assuming that's
successful, examines the CDR via a recursive call.

For most practical purposes, this _is_ Lisp code that happens to use
a rather strange syntax.
 
G

gavino

let me paste the code separately since it has some garbage that
inserted in the middle although it was just one block of text.

This is the matching code:
/* match: search for regexp anywhere in text */
int match(char *regexp, char *text)
{
if (regexp[0] == '^')
return matchhere(regexp+1, text);
do { /* must look even if string is empty */
if (matchhere(regexp, text))
return 1;} while (*text++ != '\0');
return 0;
}

/* matchhere: search for regexp at beginning of text */
int matchhere(char *regexp, char *text)
{
if (regexp[0] == '\0')
return 1;
if (regexp[1] == '*')
return matchstar(regexp[0], regexp+2, text);
if (regexp[0] == '$' && regexp[1] == '\0')
return *text == '\0';
if (*text!='\0' && (regexp[0]=='.' || regexp[0]==*text))
return matchhere(regexp+1, text+1);
return 0;}

/* matchstar: search for c*regexp at beginning of text */
int matchstar(int c, char *regexp, char *text)
{
do { /* a * matches zero or more instances */
if (matchhere(regexp, text))
return 1;

} while (*text != '\0' && (*text++ == c || c == '.'));
return 0;
}

By the way did you note the spicy style of narration by which they
promote each other ? and can you abstract the technique of narration
for image building ? Another scum in electronic, Bob Pease of national
semiconductor writes in the same style. I know all their basic ideas
in that field, which are trivial. The only art is the art of writing
and propaganda.

The one man I have real respect for and is humble is McCarthy. He
really put some original ideas together but still did not give any
clue how he constructed them. However, in lisp recursion is lucidly
clear. You test if a thing is nil. Else you check if it is an atom and
act appropriately to call a handler and otherwise recursion.

I am curious, so you are saying that one should not be impressed by c
and K+R and plan9 and all the rest? That they kind of suck and the
mccarthy really is much better and his lisp can do things much more
simply in fewer line of code?
 
G

gavino

By the walks-like,swims-like,quacks-like theorem, I do see why you
think so, but I think we should be slow to make such accusations,
since they are so often wrong. I have never actually been accused of
being a sock puppet as far as I am aware, but I have certainly been
accused of running them (which I have never done, but of course those
who accuse me don't believe that).

People can and often do use similar styles, especially when responding
to people with whom they're perhaps not the best of friends. The
other day, an irregular poster, a guy - who'd been away for a while
and is unlikely to have become familiar with my occasional habit of
marking nilgewater with "nonsense snipped - nothing left" - made
precisely the same response. It would be easy to accuse /him/ of
being a sock puppet... and yet his name has been well-known in
another group for many years and he knows far more about that group's
subject than I do, and I vaguely recall having a few extended
disagreements with him in the past. He also posts from a different
continent, by the way.

Sock puppets undoubtedly happen, but they don't really matter because
what counts is not who says what, but what they say (and, to a
certain extent, how they say it). We might pay a lot of attention to
a name we recognise (e.g. if Donald Knuth started posting, we'd sit
up and take notice) - but *only* after it became obvious that it
wasn't some spotty teenager mucking about. And we judge that by
content. "By their fruits you shall know them" and all that.

In the current case, bolega (unknown name, to me at least) is slagging
off Brian Kernighan and Rob Pike (both names known to me) and their
book (which I've read). He starts with the word "braggart", and the
rest is similar ranting. It's just perfectly normal bilgewater of the
sort you often find on the Internet, but I don't see why we need to
accuse it of being nilgewater as well.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
"Usenet is a strange place" - dmr 29 July 1999
Sig line vacant - apply within

reality: show me the apps!! apps I am using now: windows7[c?],
firefox3[c], pidgin char[gtk+c?], opera browser[c?], and vlc media
player[c] seem all to be made in c, so where are the lisp replacements
and I shall happily use them, esp if they are more felxible and fast
enuf.
 
G

gavino

Read this excellent article: Software Fault Prevention by Language
Choice: Why C is Not My Favorite Language, Richard Fateman. Fateman is
on the faculty of the rather C-centric Univ of California at Berkeley.http://www.eecs.berkeley.edu/~fateman/papers/software.pdf.

My current language C Sharp addresses some but not all of Fateman's
points.








Is C here to stay?
No way.
In 2038, lemme tellya straight,
C's inability to get a date
Right will cause the thousand years of darkness.
Ever wonder why spam sorts high, and makes you cry?
You're looking at the answer, my oh my.

forth for the win!
 
G

gavino

Don't make mock of my patronym, punk.




Wise words, for wisdom crieth in the streets. The only time I have
used a sock puppet was once on wikipedia.




Don't make mock of my patronym or I will make mock of you as I have in
the past. You **** with me I fight back, you dig? My relatives are war
heroes and professionals and they don't deserve this shit.






Don't make mock of my patronym, since it's an insult not so much to me
as to my relatives. If you must make mock, my first name is Edward and
my handle is spinoza1111.

My uncle, Edward Joseph NILGES, was killed in Italy in March 1945
leading Japanese American Nisei against the Germans. Don't mock him.

My father, Richard G Nilges, fought unethical practises in medicine.
Don't mock him.

Mock my first name or posting handle. Lay off "Nilges".

inyego montoya, you killed my father! you are about to die!
 
G

gavino

In <[email protected]>,





From Chambers: "*bilge* noun 1a the broadest part of a ship's bottom;
b (usually bilges) the lowermost parts on the inside of a ship's
hull, below the floorboards. 2 (also bilge-water) the dirty water
that collects in a ship's bilge. 3 /rather dated colloq/ rubbish,
nonsense, drivel."

I can see why you might object to the use of the neologism
"nilgewater", but I see no sensible reason for objecting to the
perfectly normal word "bilgewater". One might reasonably object to a
particular application thereof in a given context, but the word
itself is surely unobjectionable.

<snip>

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
"Usenet is a strange place" - dmr 29 July 1999
Sig line vacant - apply within

everyone knows this
 
G

gavino

Can we, at least, object to the making of a single word noun from a
two word "adjective noun" combination?

English already has more than 1.2 million perfectly good words.  Let's
all try using them before inventing new ones.

George

just as in forth one should learn the base word list before making a
word to do something from lots fo base words that does the same as a
low level word
 
G

gavino

Even though I'm quite sure the original post really was some close
relative of trolling, let's dissect the code itself, and see if it
really can be improved a great deal:

int match(char *regexp, char *text)
{
        if (regexp[0] == '^')
                return matchhere(regexp+1, text);

If an RE starts with '^', it has to match at the beginning of the
text, so we check only for a match of the remainder of the RE,
starting from the beginning of the text.

        do { /* must look even if string is empty */
                if (matchhere(regexp, text))
                        return 1;
        } while (*text++ != '\0');

Otherwise, we step through each possible position in the text, and
check for a match at that position.

/* matchhere: search for regexp at beginning of text */
int matchhere(char *regexp, char *text)
{
        if (regexp[0] == '\0')
                return 1;

If we've reached the end of the RE, return 1, indicating we found a
match.

        if (regexp[1] == '*')
                return matchstar(regexp[0], regexp+2, text);

If we find a '*', use 'matchstar' to check for a match.

        if (regexp[0] == '$' && regexp[1] == '\0')
                return *text == '\0';

If the RE ends in '$', we have a match only if we've reached the end
of the text.

        if (*text!='\0' && (regexp[0]=='.' || regexp[0]==*text))
                return matchhere(regexp+1, text+1);

If we're at the end of the text, but there's more to the RE, we don't
have a match. Otherwise, if the current character of the RE is '.',
OR it's any other (non-meta-) character that matches the current
character in the text, we have a match iff the remainder of the RE
matches the remainder of the text.

        return 0;

Otherwise (e.g. we were at the end of the text, but not the end of
the RE) we do not have a match.

So far, it's simple and straightforward -- we've simply enumerated
the possibilities, and handled each in a fairly straightforward
manner.

/* matchstar: search for c*regexp at beginning of text */
int matchstar(int c, char *regexp, char *text)
{
        do { /* a * matches zero or more instances */
                if (matchhere(regexp, text))
                        return 1;
        } while (*text != '\0' && (*text++ == c || c == '.'));

Here's where things get a little bit tricky. Recall that matchstar()
is called when matchhere() encountered an RE something like 'x*E'. It
calls matchstar(), passing the 'x' as the first parameter, "E" as the
second, and the text as the third (where 'x' is some single
character, and "E" is an arbitrary RE).

matchstar() then says that such an RE matches if 1) 'x' matches some
number of characters in text, followed by E matching the remainder of
the text.

        return 0;

If we fall out of the loop, we either reached the end of the text
without finding a match for the remainder of the RE, or else because
we found a mismatch between 'x' and the current character in the RE
before finding a match between the remainder of the RE and the
remainder of the text.

As to the original implication that the code is particularly
difficult to read or understand, I'd have to say no, not really. In
fact, much like most well written recursive code, it's structured
very much like an inductive proof: establish an invariant for what's
currently being considered, then show that the invariant remains true
for the remainder.

In this case, it does that by (mostly in matchhere) checking whether
the current item in the RE matches the current character in the text,
and if it does recursing to establish whether the remainder of the RE
matches the remainder of the text.

It's also rather similar to rather a large amount of Lisp code. The
primary difference is that in Lisp, the primary data structure is
typically a list, whereas here it's a string (or at least C's rather
limited version of a string). Nonetheless the similarity in usage is
striking. In essence, the code works primarily in terms of a CAR and
a CDR. The code examines the CAR directly, and assuming that's
successful, examines the CDR via a recursive call.

For most practical purposes, this _is_ Lisp code that happens to use
a rather strange syntax.

are you talking about using a perl compat regex package in lisp? or
doing regex natively in lisp somehow?
 
M

Michael Foukarakis

This braggart admits that he had to put this code in TWO books and
visit it twice to be explained. I am puting the excerpt from pp2-4 of
this book and the C code. The C code will become indented and syntax
highlighted once you paste in emacs etc. It is my belief and
observation on a lot of problems by these so called "demi gods" that
they are actually all average and no more intelligent. Its all that
they got some opportunities to study some things at opportune time and
facilities and also spent a lot of time in a productive environment
and team.

I know that lisp eval is written more clear than this recursion below
because I am able to read it easily. and that code is almost self
explanatory. C is more quirky. When you really mean recursively call
another function, you are using return so you can have tail
recursion ???? .

Anyway, its your chance to show how clear C/C++/lisp/Scheme code you
can write that is clearer. Also, i dont exclude pseudocode but it
should be clear enough to be instantly translatable to a programming
language. The real goal is to learn how to write or DERIVE recursion,
how to enumerate cases, order them, and build recursion. You may even
put some introductory tuturial and dont have to reply here in ascii
but can upload a pdf at some link in rapidshare etc. Look how he put
the code after problem statement and then has the need to explain it.
Ideally, the discussion should be such that the student or reader
himself jumps to the solution. That is why I give these unix school of
pike/thomson/kernighan low grade in almost all their expositions
except that they wrote earliest books to make millions of dollars in
royalties and since now they are nobody due to linux, they are poorly
regurgitating old material.

"He" has the need to explain it because its goal is to provide advice
and guidance, ie. be educational.

Now go fetch Lisp device drivers.
 
A

Aleksej Saushev

gavino said:
forth for the win!

The world still waits for you to write any program longer than 8 lines,
in any language, be it Forth, Lisp or Scheme.
 
S

spinoza1111

The world still waits for you to write any program longer than 8 lines,
in any language, be it Forth, Lisp or Scheme.

How about 26000 lines in Visual Basic? 50000 in Cobol? 75000 in
assembler? 25000 in PL/I? 20000 in C sharp? 10000 in C? 1000 in TI-79
machine language, a compiler and run time in 1K for Mouse, a Forth
knockoff?
 
N

Nick Keighley

[...] It is my belief and
observation on a lot of problems by these so called "demi gods" that
they are actually all average and no more intelligent. Its all that
they got some opportunities to study some things at opportune time and
facilities and also spent a lot of time in a productive environment
and team.

it's a *bad* thing that they spent time in a productive team?

I know that lisp eval is written more clear than this recursion below
because I am able to read it easily. and that code is almost self
explanatory. C is more quirky. When you really mean recursively call
another function, you are using return so you can have tail
recursion ???? .

ah, what? In what sense is C's recursion any more difficult that
any other language's recursion?


<snip>
 
N

Nick Keighley

[some program is] not extensible unless you're [familiar with idiomatic C] who's
comfortable with things like using parameters as work areas.

you keep on saying this as if it's some problem with C.
You can do it with *most* languages.

FORTRAN, Algol-60, Coral, Pascal, C++, Scheme all allow it

You can be prevented from doing it in Ada AND C
 
J

Janis Papanagnou

spinoza1111 said:
Do you think C syntax is brain damaged ? [...]

In advance to the comment I give below I'd like to point out that C is
not my Favourite Language.
Read this excellent article: Software Fault Prevention by Language
Choice: Why C is Not My Favorite Language, Richard Fateman. Fateman is
on the faculty of the rather C-centric Univ of California at Berkeley.
http://www.eecs.berkeley.edu/~fateman/papers/software.pdf.

You call that article "excellent"? I've got curious and read it. Wow.
It's one of the lousiest articles I've been pointed to for quite some
time. After the first pages I thought the author must love that language
(lisp) so much[*] that he got carried away, and just because of that
enthusiasm he's lacking to be precise or sophisticated in his comparisons,
deductions, and conclusions. But then the comparisons got partly even so
ridiculous that I thought this is certainly a joke, a fun paper, to subtly
promote C instead of lisp. In the end there's a note that the author was
a founder of a lisp systems vendor; while that might at least explain why
the paper is written as it is, it's nonetheless something I'd not expected
from an MIT emeritus professor in computer science, who should be, as all
scientists, obliged to to argue conclusively and fair. YMMV, of course.

Epilog: I've stumbled across the above quoted comment due to the posting
having been crossposted to a lot of groups, and I probably won't see your
replies.[**] Though, I don't care about discussions of *that* paper anyway,
because of it's extremely low quality, and it would thus require a huge
amount of effort if someone would really like to seriously argue about it;
I think the paper is not worth that effort. My intention was to warn folks
about so called "excellent articles" from (probably) sacrosanct authors.

Have fun with your favourite language(s) of choice, be it Lisp, or C, or
whatever else you enjoy to struggle with.

Janis

[*] Confirmed by the author at the end of his article.

[**] I haven't subscribed the two groups to which I reduced the x-posting
list. (But feel free to drop me an email if you like.)
 
P

Pascal J. Bourguignon

Richard Heathfield said:
spinoza1111 said:
Do you think C syntax is brain damaged ? [...]

In advance to the comment I give below I'd like to point out that C
is not my Favourite Language.
Read this excellent article: Software Fault Prevention by Language
Choice: Why C is Not My Favorite Language, Richard Fateman. Fateman
is on the faculty of the rather C-centric Univ of California at
Berkeley.
http://www.eecs.berkeley.edu/~fateman/papers/software.pdf.

You call that article "excellent"? I've got curious and read it.

And now you've got me curious, so I've read it. It's an interesting
article, but it could be made a lot shorter.

Yes, you've made me too read again that excellent article. Not all
articles written by academic guys need to be full of formula,
statistics and graphs. If you want that kind of article comparing
languages, they exist too. And for precise enumeration of C pitfalls,
a single article is not enough, whole libraries are written about
that!

I hereby present the full text of the proposed Second Edition:

++++++++++++++++++++++++++++++++++++++++++++++++++

WHY C IS NOT MY FAVOURITE PROGRAMMING LANGUAGE

Because I prefer Lisp.

++++++++++++++++++++++++++++++++++++++++++++++++++

Really, I didn't see much in there at all about C. In the few places
it was referenced, he seemed to be talking about lousy C code, which
certainly exists but by no means has a monopoly. Is there no such
thing as lousy (louthy?) Lisp code?

No. That's the point of that article.

You just cannot write Object* a; in Lisp.
You just cannot write a**b+++c in Lisp.
You just cannot write 4["abc"] in Lisp.
You just cannot write int a=100000;short b=a; in Lisp.
You just cannot write if(b=100000); in Lisp.
You just cannot write int* a=(int*)42; in Lisp.

You're right that there are also some pitfalls in Lisp, but instead of
being pitfalls that any programmers including newbies encounter
everyday, and overlook everyday, thus leaving numerous bugs in C
programs, pitfalls in Lisp are a few sophisticated technicalities
that are generally not encountered by programmers. The set of bug
producing pitfalls in Lisp is infinitesimal compared to C.
 
P

Pascal J. Bourguignon

Richard Heathfield said:
Hmmm. Okay, I'll bite - or try to. I guess you're not normally a
comp.lang.c kind of guy, and I'm certainly not a comp.lang.lisp kind
of guy, so perhaps we can take this up in c.p when I've had a chance
to install Lisp on my box?

Sure.

In the meantime have a look at the articles referenced on
http://www.cliki.net/Performance
comparing programmer productivity in various languages.

(And mind the http://www.cliki.net/Education links for lisp beginners).
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,535
Members
45,007
Latest member
obedient dusk

Latest Threads

Top