Can anyone write this recursion for simple regexp more beautifullyand clearly than the braggarts

Vassil Nikolov · Sep 12, 2009

...
In another example, he seems to
think C programmers should know what "b+++c" does,
but don't and that this is an argument against C.
(I'm not too shy to admit *I* wasn't sure what "b+++c"
does. FWIW, gcc behaves as expected when faced with
either "b+ ++c" or "b++ +c" and treats "b+++c" as the
latter. But I can't imagine the "+++" form except in a
trolling question or in an Obfuscation Contest entry.)

The meaning of

b+++c

in the C programming language cannot be asserted by what a compiler
does with it, but by the syntactic rule (which I thought was very
well known) that, essentially, tokenization is greedy (there is no
`+++' token in the grammar, and `++' is longer than `+', therefore
the tokenization of the above is `b' `++' `+' `c').

(BTW, gcc accepts both "a+ + + +b" and "a+ ++ +b"
without complaint but generates *different* results
for them! Is *this* compliant?)

These are two different expressions, of course, but is

+b

an lvalue?

...
By this measure, Fateman might win the debate with
(dolist (x Foo)
(Guzzle x))
compared with C's slightly less terse
for (x = Foo_head; x; x = x->next)
Guzzle(x);
but a C programmer might #define dolist
if tersity is all that matters.

What would that definition look like? (The body of DOLIST may
contain an arbitrary number of forms.)

---Vassil.

vippstar · Sep 12, 2009

On Sep 12 said:
By this measure, Fateman might win the debate with
(dolist (x Foo)
(Guzzle x))
compared with C's slightly less terse
for (x = Foo_head; x; x = x->next)
Guzzle(x);
but a C programmer might #define dolist
if tersity is all that matters.

I just tried it; I must say: I'm surprised by the flexibility of the C
preprocessor, and was unable to find a difference between the two
macros. It's likely most of common lisp can be macro'ed that way into
C, so the jokes on CL bragging about its macros!

Frankly, I'd much rather type out the for-loop
than remember what 'dolist' does, especially since
Fateman implies there's a host of such keywords
I'll need to memorize.

He seems proud of a
'remove-duplicates' keyword, but *don't* tell him
you could write a remove_duplicates() function if
needed: he stipulates at the top of Page 7 that any C
function will be hard to decode and "possibly erroneous"!

Of course in the real world, with all the incompetent programmers, he
is quite right: anything with such flexibility quite likely has a bug-
infested implementation. He's not talking about a small circle of
people (c.l.c. inhabitats for instance), which might achieve better
results on average, he's talking about the average of C programmers -
and I think we'd agree there's more average programmers than experts.
But it is shown on a daily basis at c.l.c., that most code hides a bug
or two that only the regulars know (plus those who know C but are not
regulars). Of course, in that case he's also making a point about
incompetent lisp programmers but in any case, he's correct saying
that.

I've no idea if the performance cost of the blithely-made
copies is important, but Fateman seems to think so, going
so far as to "counterattack" with a C "inefficiency":
"implementations of C typically waste some number of bits
in each 32-bit pointer for machines that have an actual
address space less than 4 gigabytes"
Wow! Great moments in Computer science. Not!

We'd have releases of software for computers with 1GB memory, 2GB
memory, etc, and incompability problems when upgrading memory (in
reality, the optimization scheme can't work at all).

Willem · Sep 12, 2009

vippstar wrote:
) Of course in the real world, with all the incompetent programmers, he
) is quite right: anything with such flexibility quite likely has a bug-
) infested implementation. He's not talking about a small circle of
) people (c.l.c. inhabitats for instance), which might achieve better
) results on average, he's talking about the average of C programmers -
) and I think we'd agree there's more average programmers than experts.

Perhaps there are a lot more average programmers doing C than there are
doing Lisp ? Who knows, perhaps those that learn Lisp are inherently those
that are better at programming in general.

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

Keith Thompson · Sep 12, 2009

Richard Heathfield said:
In <[email protected]>,
James Dow Allen wrote: [...]

(BTW, gcc accepts both "a+ + + +b" and "a+ ++ +b"
without complaint but generates *different* results
for them! Is *this* compliant?)

Click to expand...

The first one is fine - unary +. Since +b is the same as b (unary +,
remember), and + +b is likewise the same as +b, and + + +b is
likewise the same as + +b, the a+ at the beginning just means "add a
to" - throwing in a binary + curveball for you.

The second one looks wrong to me because unary + is still an operator,
so it yields a value, and you can only ++ an object, not a value. If
that's legal (which I doubt), it's only because of a weird exception
for unary +. Does the Standard grant such licence? I doubt it very
much indeed.

[...]

The second one certainly looks like a bug in gcc. +b is not an
lvalue, so you can't legally apply ++ to it. (The standard doesn't
say so explicitly, but it seems clear enough. For one thing, the
integer promotions are applied to the operand; if b is of type short,
then +b is of type int, and it doesn't make sense for it to be an
value of type int with no int object in sight.)

I wouldn't be surprised if gcc allowed this as an extension, but it
won't complain about it even with options that tell it to attempt to
behave as a conforming compiler.

Sun's compiler complains:

"c.c", line 6: operand must be modifiable lvalue: op "++"

James Dow Allen · Sep 12, 2009

On Sat, 12 Sep 2009 00:30:15 -0700 (PDT), James Dow Allen <[email protected]> said:
The meaning of
b+++c
in the C programming language cannot be asserted by what a compiler
does with it ...

Yep. Sorry if my post implied otherwise.

I sometimes report gcc's behavior in lieu of quoting
The Standard(tm) (which I've never read) since, if there's
a difference, it would be interesting to know!

... the syntactic rule (which I thought was very
well known) that, essentially, tokenization is greedy ...

I'm sure it's well known to C language experts, but count me
out there! In 30 years of C programming (which never included
a C parser, obviously) I've *never* needed to remember this
fact! Perhaps my C coding is just too tame, but
I don't think it's been overly unsuccessful. For me, C
is about interesting algorithms like some of the code or puzzles
at my website, but I know for some of you it's more important
to see who can parse "a+ + ++ + +b" first.

These are two different expressions, of course, but is
+b
an lvalue?

I sure don't think so! This gcc behavior is mysterious to me;
I stumbled on it only because I'd already typed in the
unremembered "+++" case and added some more "+"s just for
fun.

What would that definition look like? (The body of DOLIST may
contain an arbitrary number of forms.)

As I imply, I don't advocate it (except for special purposes)
but simplest might be
#define dolist(x,h) for (x = h; x; x = x->next)
Now the Lisp fragment above "translates" to
dolist(x, Foo) {
Guzzle(x);
}
Very similar -- except for trivia like a prefix "("
becoming an infix "{".

James

Flash Gordon · Sep 12, 2009

Richard said:
Richard Heathfield wrote:
Is there no such
Ahem.

It may help some people to look at the inspiration for my article,
namely
Weider D. Yu. “A Software Fault Prevention Approach in Coding and Root
Cause Analysis,” Bell Labs Technical Journal, vol. 3 no. 2 April-June
1998, 3–21.

The (unfortunately unstated) point of Yu's article and (explicitly
stated) point of mine is that one should consider language choice if
reliability is important. I show that (as one possibility) it would be
rather hard to commit many of those errors in Lisp.

A number of us over on comp.lang.c don't consider C to be the best
language for everything, and indeed many of us use several languages.

Some of your article is a bit out of date now. For example...

A number of good compilers and analysis tools will detect the use of
uninitialised variables, so to catch the logic error of using an
uninitialised variable just read what the tools tell you!

You comment that Lisp code is properly indented in the normal
development environment. The same applies to C if you are using a decent
development environment, including many free editors which are
independent of specific C compilers. This correct indentation (and free
tools and editors will reindent) make it clear what you are breaking out
of or continuing.

Your comments about interface flaws are now irrelevant in "properly"
written C. Function prototypes have been available for a long time which
provide parameter type checking (your suggestion of a function expecting
a pointer but the call disagreeing would be required to generate a
diagnostic given a function prototype). C99 has added a requirement that
there is always a declaration in scope (although not a full prototype),
and for a long time compilers have been able to warn when a function is
called without a prototype in scope.

Your tradition of functions in lisp returning a condition if there is
nothing else to return is also a tradition in C. You can also return
multiple values in C by wrapping them in a struct, although this is
fiddly and not generally done.

Your section 4.3 I consider irrelevant since a modern editor will show
you if the extra statement for an if is not part of the "then" clause
since it will indent it in a way which makes this clear!

There are times when your hard real-time constraints mean that even a
1ms delay for GC would be problematic, so for such code I would still
not use a language with GC. I would happily consider other languages,
however, and would not insist on C. I also would not rule out Lisp for
other tasks just because it might not be suitable for this one.

I've done a lot of work on systems with less than 64K of memory, where
again the overhead of Lisp would not be acceptable. Adding more memory
would up production costs, power requirements and heat dissipation which
are all things which need to be kept down (under hard limits) for some
systems.

In short, use the right language for the job, which may not be Lisp or
C, and bare in mind all of the points. I won't spend a wek learning a
new language to write a 50 line program, but I might for a far larger
program. I've sometimes spent an hour learning a language to write 1
line of code, because I knew it was the right tool for the job!

James Kuyper · Sep 12, 2009

James said:
... Is confusion between "(*n)++" and "*n++"
or between "**p" and "*p" really a major source of C's alleged
productivity problem?? ...

It's been decades since I stopped making mistakes that simple, but it's
not uncommon in newbies.

... Isn't "0" the only Boolean false
in C? ...

If you mean a value of boolean type no. The constant 0 has int type. C90
doesn't have any boolean type as such; in C99 you could use (_Bool)0 or,
if you #include <stdbool.h>, you could also use (bool)0.

If you mean a value that is treated as false when it appears as a
condition in if(), while(), or for() statements, or in ?: expressions,
the answer is still "No". Any C expression with a value of 0 is treated
as false in those contexts. The value 0 can be represented in any
integer, floating point (including complex and _Imaginary), or character
type. As a special rule, a null pointer of any type compares equal to 0,
even though 0 is an integer, not a value of pointer type: the 0 gets
implicitly converted to a null pointer of the corresponding type, for
purposes of the comparison.

As a result, there are LOTS of ways to write a constant with a value
that counts as false in C. Here's just a few of them:

0, 0X0, 0U, 0L, 0LL, 0.0, 0.0E10, 0.0F, 0.0L, 0.0P-5, '\0', L'\0', '\x0'

And of course, (T*)0, where T represents any arbitrary type.

If you #include <stdbool.h>, you can also use the macro named "false",
which expands to a 0. I would have thought it slight more appropriate to
have it expand to (_Bool)0, but given the way C's integer promotions
work, that wouldn't make much difference.

I think it would be a good idea for conditions to require an expression
of boolean type, with all comparison operators returning values of that
type, and with no conversions allowed to and from boolean type. The
equivalent of such conversions could be achieved by

(nonbool == 0)

and

(boolvalue ? 1 : 0)

Such rules would be a minor inconvenience when writing code, but would
make the code clearer and more maintainable. However, the need for
backwards compatibility means that such a change could never be made to C.

....

(BTW, gcc accepts both "a+ + + +b" and "a+ ++ +b"
without complaint but generates *different* results
for them! Is *this* compliant?)

I don't think so. I believe that the first form is equivalent to

a + (+(+(+b)))

or in other words, a + b. The second form is equivalent to

a + (++(+b))

Which is a constraint violation because the result of the unary +
operator is not an lvalue, and the ++ operator requires an operand that
is an lvalue.

Nicolas Neuss · Sep 12, 2009

Richard Heathfield said:
But performance is clearly one factor. (Not the only factor, as
Richard Gabriel rightly points out.) And although I am no Lisp expert
(in fact, I'm currently a "hello world" kind of guy), it seems from my
(minimal) research on the subject that it is quite difficult to
squeeze enough out of Lisp to get it anywhere near C in performance
terms.

If one uses a good Common Lisp implementation (especially one which
compiles to native code, as e.g. CMUCL/SBCL, Allegro CL, or Lispworks),
fI would expect that for most code the results for equivalent code are
not worse than C by a factor of 2 or maybe 3. The reason is that one
can write Common Lisp code also very C-like, with type and optimization
declarations, so there is no language reason why compiled code should be
slower. (The observable difference is due to slightly less elaborate
compiler backends and possibly also garbage collection.)

Nicolas

Michael Weber · Sep 12, 2009

But performance is clearly one factor. (Not the only factor, as
Richard Gabriel rightly points out.) And although I am no Lisp expert
(in fact, I'm currently a "hello world" kind of guy), it seems from
my (minimal) research on the subject that it is quite difficult to
squeeze enough out of Lisp to get it anywhere near C in performance
terms.

Most things we do the first time are "quite difficult". With some
practice and a profiler which tells you where the hot spots are, it's
actually not hard at all to optimize Lisp code; certainly not harder
than optimizing C++ code, IME.

In fact, this is something that I like about CL in comparison to other
dynamic languages: I don't have to drop down to C to make it fast. I
can eliminate performance bottlenecks while staying within the
language, and keep all the flexibility and comfort of a dynamic
language where performance does not matter.

Some pointers:
* ICFP 2006 VM in CL, run-time within 20% of C implementations (but
quite a bit shorter) by adding a few declarations:
http://www.foldr.org/~michaelw/log/programming/lisp/icfp-contest-2006-vm

* B. Svingen, "When Lisp is faster than C",
http://www.cs.bham.ac.uk/~wbl/biblio/gecco2006/docs/p957.pdf

* D. Verna, "Beating C in Scientific Computing Applications",
http://www.lrde.epita.fr/~didier/research/verna.06.ecoop.pdf

Richard Fateman · Sep 12, 2009

For people who think that I set up straw men -- that is, made up
implausible mistakes in the C programming language:

I once again recommend that you look at Weider Yu's article, where he
itemizes these pitfalls as frequent causes of erroneous behavior that
were found in production code, in a Bell System ESS switching system.
These items were presumably written by programmers who were paid (some
would call them "professional programmers").

I didn't make them up.

My point was, Yu didn't consider the obvious solution to avoiding these
pitfalls: Choose a programming language that did not have them. Is this
point weakened by the availability of tools that can catch some of these
errors automatically (if they are used)? Maybe somewhat. Is there a good
argument that C is now the right language to use if your priority is
reliable code? Maybe the argument would be that academics are working
on proving the correctness of programs written in C? Maybe the argument
is that there are so many more C programmers that you should just have
multiple teams writing the code, and run the programs at the same time
and have them "vote"? (Especially after recent layoffs, you should be
able to hire lots of them.)

By the way, when I wrote the article, I just posted it on my web page. I
did not try to publish it anywhere. The editor of that publication
found it on my web page and asked if he could publish it. I allowed it.
RJF

PS. While I wrote the article, and I still pretty much agree with what
it says, I did not start this thread

Michael Weber · Sep 12, 2009

I believe it was another famous Lisp hacker, David Moon, who said that
no language can prevent a bad programmer from writing bad code. The
most we can say is that some languages provide more ways for you
easily write bad code. C is, in some respects, such a language, but
in return it provides various advantages to the careful programmer.

It's not for nothing that many Lisp implementations are written at
least partly in C.

The reason that part of the runtime of many language implementations,
including CL, is written in C is because the environment (OS, system
libraries, etc.) is written in C, and at some point we need to
interface with them. The simplest way to interface with C is using C,
wouldn't you think?

People have found creative ways to minimize the amount of code that
needs to be written in C; for example: Pre-Scheme <http://
citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.3.4031>, or ThinLisp
<http://www.thinlisp.org/>, etc..

Richard Tobin · Sep 12, 2009

Michael Weber said:
The reason that part of the runtime of many language implementations,
including CL, is written in C is because the environment (OS, system
libraries, etc.) is written in C, and at some point we need to
interface with them.

That's one reason.

But numerous Lisp interpreters have been written in C, and there are
several Lisps that compile to C. The latter case is an example of
C's role as a portable assembler language.

-- Richard

Richard Tobin · Sep 12, 2009

Keith Thompson said:
The second one certainly looks like a bug in gcc. +b is not an
lvalue, so you can't legally apply ++ to it.

gcc 4.0.1 says:

plus.c:5: warning: target of assignment not really an lvalue; this will be a hard error in the future

Quite possibly it doesn't give an error with older versions unless you
specify -ansi or similar.

-- Richard

Pascal Costanza · Sep 12, 2009

Richard said:
<cough, splutter> Er, okay, I think I'll put Lisp back on the shelf.
Look, it isn't dirty or anything. Thanks for your time, but I was
looking for something a little sportier. I mean, I knew it would be
slower, but a factor of TWO?

Keep in mind that we're talking about a language that's extremely
flexible and dynamic, similar to 'modern' scripting languages like
Python, Ruby, JavaScript, and the like, with all the benefits of
providing higher productivity, etc.

Considering that, a factor of only two is very impressive, IMHO.

Pascal

Bart · Sep 12, 2009

Keep in mind that we're talking about a language that's extremely
flexible and dynamic, similar to 'modern' scripting languages like
Python, Ruby, JavaScript, and the like, with all the benefits of
providing higher productivity, etc.

Considering that, a factor of only two is very impressive, IMHO.

Yes, that's pretty good. I'm working on a dynamic language now which
at present is some 3 to 10 times as slow as optimised C (although
there's some way to go yet...).

But that's measured for tight integer code. When you throw in some
string processing, higher level datatypes, and calls into the runtime,
then they can be comparable, say between 1 and 2 times as slow, for a
language considerably more expressive (ie. increase in apparent
runtime of 0 to 100%). In theory...

So perhaps some of that may be true for Lisp, although it did sound as
though you were already pulling out all the stops to get it down to
only 2x as slow as C.

And ultimately, for programs with a short runtime, it really doesn't
matter if it takes 100ms or 200ms.

Keith Thompson · Sep 12, 2009

James Kuyper said:
James Dow Allen wrote: [...]

... Isn't "0" the only Boolean false
in C? ...

Click to expand...

If you mean a value of boolean type no. The constant 0 has int
type. C90 doesn't have any boolean type as such; in C99 you could use
(_Bool)0 or, if you #include <stdbool.h>, you could also use (bool)0.

If you mean a value that is treated as false when it appears as a
condition in if(), while(), or for() statements, or in ?: expressions,
the answer is still "No". Any C expression with a value of 0 is
treated as false in those contexts. The value 0 can be represented in
any integer, floating point (including complex and _Imaginary), or
character type. As a special rule, a null pointer of any type compares
equal to 0, even though 0 is an integer, not a value of pointer type:
the 0 gets implicitly converted to a null pointer of the corresponding
type, for purposes of the comparison.

To be painfully precise, it's not quite "Any C expression with a
value of 0" that's treated as false, it's any C expression whose
value *compares equal* to 0.

0, which is syntactically an octal (yes, octal) integer constant,
yields the value zero of type int. 0.0 yields the value zero of type
double; it's really a different value, since it's of a different
type. But the comparison (0.0 == 0) yields a true result because
of the conversion rules, so in "if (0.0)" it's treated as false.

So rather than "Any C expression with a value of 0", it's any C
expression whose value, when compared to 0 using "==", after any
conversions imposed by the "==" operator to force both operands to
be of the same type.

As a result, there are LOTS of ways to write a constant with a value
that counts as false in C. Here's just a few of them:

0, 0X0, 0U, 0L, 0LL, 0.0, 0.0E10, 0.0F, 0.0L, 0.0P-5, '\0', L'\0', '\x0'

And of course, (T*)0, where T represents any arbitrary type.
'-'-'-'

[...]

I think it would be a good idea for conditions to require an
expression of boolean type, with all comparison operators returning
values of that type, and with no conversions allowed to and from
boolean type. The equivalent of such conversions could be achieved by

(nonbool == 0)

and

(boolvalue ? 1 : 0)

Such rules would be a minor inconvenience when writing code, but would
make the code clearer and more maintainable. However, the need for
backwards compatibility means that such a change could never be made
to C.

I agree, both that it would be better and that it's too late for C.

Richard Fateman · Sep 12, 2009

Richard Heathfield wrote:
....

<cough, splutter> Er, okay, I think I'll put Lisp back on the shelf.
Look, it isn't dirty or anything. Thanks for your time, but I was
looking for something a little sportier. I mean, I knew it would be
slower, but a factor of TWO?

<snip>

Oh, trolling through the park. I think you should put C back on the
shelf and program in assembler. Real men program in assembler.
But maybe if you program in binary you would save all the time it takes
to run through an assembler. So maybe real men program in binary.
RJF

Alessio Stalla · Sep 12, 2009

Yep. Sorry if my post implied otherwise.
I sometimes report gcc's behavior in lieu of quoting
The Standard(tm) (which I've never read) since, if there's
a difference, it would be interesting to know!

I'm sure it's well known to C language experts, but count me
out there! In 30 years of C programming (which never included
a C parser, obviously) I've *never* needed to remember this
fact! Perhaps my C coding is just too tame, but
I don't think it's been overly unsuccessful. For me, C
is about interesting algorithms like some of the code or puzzles
at my website, but I know for some of you it's more important
to see who can parse "a+ + ++ + +b" first.

I sure don't think so! This gcc behavior is mysterious to me;
I stumbled on it only because I'd already typed in the
unremembered "+++" case and added some more "+"s just for
fun.

As I imply, I don't advocate it (except for special purposes)
but simplest might be
#define dolist(x,h) for (x = h; x; x = x->next)
Now the Lisp fragment above "translates" to
dolist(x, Foo) {
Guzzle(x);
}
Very similar -- except for trivia like a prefix "("
becoming an infix "{".

Where is x declared in the C version? dolist in Lisp introduces a new
local variable.

Alessio

Slobodan Blazeski · Sep 12, 2009

Richard Heathfield wrote:

...

Oh, trolling through the park. I think you should put C back on the
shelf and program in assembler. Real men program in assembler.
But maybe if you program in binary you would save all the time it takes
to run through an assembler. So maybe real men program in binary.
RJF

Nope, real programmers use butterflies http://xkcd.com/378/

Bobi

http://www.linkedin.com/in/slobodanblazeski

Pascal Costanza · Sep 12, 2009

Bart said:
Yes, that's pretty good. I'm working on a dynamic language now which
at present is some 3 to 10 times as slow as optimised C (although
there's some way to go yet...).

But that's measured for tight integer code. When you throw in some
string processing, higher level datatypes, and calls into the runtime,
then they can be comparable, say between 1 and 2 times as slow, for a
language considerably more expressive (ie. increase in apparent
runtime of 0 to 100%). In theory...

So perhaps some of that may be true for Lisp, although it did sound as
though you were already pulling out all the stops to get it down to
only 2x as slow as C.

That depends on the kind of application you want to build. The good
thing about Lisp is, though, that you can stay within the same language
both for the convenient, flexible parts as well as for the
performance-critical parts. You don't have to get outside the boundaries
of Lisp "just" for performance purposes.

Pascal

Help!! Can anyone provide this solution?	1	Jan 30, 2022
Can anyone help me code a simple python code?	1	Mar 13, 2022
RegExp - Match specific words, but not if they're inside parenthesis (with or without other words within)	6	Jan 29, 2023
Can someone tell me what's wrong with this question on StackOverflow?	0	Aug 19, 2023
Can anyone please help me in this code	1	Apr 13, 2022
Can anyone help me whats wrong with this	2	Jun 2, 2022
Idk need help in editing this source code	0	Nov 5, 2022
May anyone tell me why the values aren't swapping?	0	Feb 29, 2020

Can anyone write this recursion for simple regexp more beautifullyand clearly than the braggarts

Vassil Nikolov

vippstar

Willem

Keith Thompson

James Dow Allen

Flash Gordon

James Kuyper

Nicolas Neuss

Michael Weber

Richard Fateman

Michael Weber

Richard Tobin

Richard Tobin

Pascal Costanza

Bart

Keith Thompson

Richard Fateman

Alessio Stalla

Slobodan Blazeski

Pascal Costanza

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads