embarrassing spaghetti code needs stylistic advice

luserXtrog

luser-ex-troll said:
On Wed, 25 Mar 2009 11:55:07 -0700 (PDT), luser-ex-troll
Traced, debugged, sieved, and splinted; is it stylish yet?
snip
/* ^[+-]?(\d+(\.\d*)?)|(\d*\.\d+)([eE][+-]?\d+)?$ */
test fsm_real[] = {
/* 0*/ { issign,   1,  1 },
/* 1*/ { isdigit,  2,  4 }, /* [+-]? */
/* 2*/ { isdigit,  2,  3 }, /* [+-]?\d\d* yes! */
/* 3*/ { isdot,    6,  7 }, /* [+-]?\d\d*[^\d] */
/* 4*/ { isdot,    5, -1 }, /* [+-]?[^\d] */
/* 5*/ { isdigit,  6, -1 }, /* [+-]?\. */
/* 6*/ { isdigit,  6,  7 }, /* [+-]?(\d\d*)?\.\d* yes! */
/* 7*/ { ise,      8, -1 }, /* [+-]?(\d\d*)?(\.\d*)? */
/* 8*/ { issign,   9,  9 }, /* [+-]?(\d\d*)?(\.\d*)?[eE] */
/* 9*/ { isdigit, 10, -1 }, /* [+-]?(\d\d*)?(\.\d*)?[eE][+-]? */
/*10*/ { isdigit, 10, -1 }, /* [+-]?(\d\d*)?(\.\d*)?[eE][+-]?\d\d* yes! */
Just out of curiosity, I wonder why 2e7 is not valid?
I wonder why you wonder.
I'm getting correct recognition of 2e7. Perhaps there was a transient
error in one of the intermediate versions, but the current one accepts
2e7 just fine; and the machine quoted here appears to be the correct
one. Let's trace it to make sure.
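For anyone who wants to repeat the trace mechanically, here is a minimal sketch of a driver for the table above. It is not the driver from the original program. It assumes that a "yes" consumes the character and moves to the first state number, that a "no" moves to the second state number without consuming anything, that -1 means reject, and that the accepting states are exactly the ones marked "yes!" (2, 6 and 10). The name accepts_real and the accepting[] array are made up for illustration.

```c
#include <ctype.h>

/* One table row: test the current character; on success consume it and go
   to `yes`, otherwise stay on the same character and go to `no`. */
typedef struct {
    int (*pred)(int);
    int yes;
    int no;
} test;

/* Predicate names are taken from the post. Strictly speaking, is* names are
   reserved for the library when <ctype.h> is included; they are kept here
   only to match the table. */
static int issign(int c) { return c == '+' || c == '-'; }
static int isdot(int c)  { return c == '.'; }
static int ise(int c)    { return c == 'e' || c == 'E'; }

static const test fsm_real[] = {
/* 0*/ { issign,   1,  1 },
/* 1*/ { isdigit,  2,  4 },
/* 2*/ { isdigit,  2,  3 },
/* 3*/ { isdot,    6,  7 },
/* 4*/ { isdot,    5, -1 },
/* 5*/ { isdigit,  6, -1 },
/* 6*/ { isdigit,  6,  7 },
/* 7*/ { ise,      8, -1 },
/* 8*/ { issign,   9,  9 },
/* 9*/ { isdigit, 10, -1 },
/*10*/ { isdigit, 10, -1 },
};

/* Assumption: the states commented "yes!" are the accepting ones. */
static const int accepting[] = { 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1 };

/* Returns 1 if the whole string is accepted by fsm_real, else 0. */
int accepts_real(const char *s)
{
    int state = 0;
    while (state >= 0 && *s != '\0') {
        const test *t = &fsm_real[state];
        if (t->pred((unsigned char)*s)) {
            ++s;                /* "yes": consume the character */
            state = t->yes;
        } else {
            state = t->no;      /* "no": look at the same character again */
        }
    }
    return state >= 0 && accepting[state];
}
```

Under those assumptions, "2e7" walks through states 0, 1, 2, 3, 7, 8, 9, 10 and is accepted, while "e2" dies in state 4 and ".e2" dies in state 5.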

I see you have correctly ignored my remark about string such as e2,
.e2 etc!  I thought you might be getting these wrong based on the
later comments.  I would correct these to reflect what has been seen
by the time the machine gets into that particular state.  The last
line suggest that .e2 will be taken as a real but, presumably, it
won't be because of previous matches to get into state 10.

Also, I don't think your top comment is correct or at least it is a
little misleading:

 /* ^[+-]?(\d+(\.\d*)?)|(\d*\.\d+)([eE][+-]?\d+)?$ */

suggests that '+2' will be taken as a real when it is not one (at not
least formally).

Thanks for the attention. I had some trouble coming up with an
expression that would guarantee at least one digit before or after the
decimal point (so .e1 would be rejected). I went around in circles
with: write an expression, draw the directed graph, fix expression,
fix graph, new expression, newfangled graph. When I came to a graph
that seemed to work, I just copied the expression that led me there.

The RE for reals that you posted is superior to mine. At some point I
plan to clean up the comments to better reflect the accumulated
knowledge that each state represents; but for the moment, I'm focused
on expanding the code to process the remaining syntactic entities so
it can be reincorporated into the larger project.

I wonder if I can modify the machine to count balanced parentheses in
strings?
 
luserXtrog

I don't want to pass all code through indent before reading it.  For
one thing, depending on the environment, that's not always even an
option; for another, indent can change the layout in ways that
shouldn't affect what the compiler sees, but can adversely affect what
a human reader sees.

Different programmers use different brace placement, among other
things.  I don't want to change that; for example, I might need to
conform to the existing style when I make changes and check them in.

And if there's a stray semicolon:
    while (condition);
        do_something;
indent will quietly change it to:
    while (condition);
    do_something;
making it harder to figure out what was actually intended.  (That's
admittedly an unusual case.)

What I frequently do is run code through expand (a Unix tool that
replaces each tab with the right number of spaces) -- but figuring out
what options to pass to expand can be non-trivial.  For many years, I
only saw code written with the assumption that tabstops are set every
8 columns, because that's the default setting on the Unix systems I
used.  Now I often see code that looks incorrect unless I set tabstops
to 4 columns, or sometimes 2.  There's no explicit indication of what
the tabstop setting should be; I just have to play with it until the
code looks right.
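To make the dependence on the tabstop setting concrete, here is a rough sketch of what a tab expander has to do. It is not the source of expand; the name expand_tabs, its signature and its error convention are invented for this example.

```c
#include <stddef.h>

/* Expand tabs in `in` to spaces, assuming tab stops every `tabstop` columns.
   Writes a NUL-terminated result to `out` and returns its length, or
   (size_t)-1 if `out` is too small or an argument is unusable. */
size_t expand_tabs(const char *in, char *out, size_t outsz, size_t tabstop)
{
    size_t col = 0;     /* current display column on this line */
    size_t n = 0;       /* characters written so far */

    if (tabstop == 0 || outsz == 0)
        return (size_t)-1;

    for (; *in != '\0'; ++in) {
        if (*in == '\t') {
            /* pad up to the next multiple of tabstop */
            size_t pad = tabstop - col % tabstop;
            if (n + pad >= outsz)
                return (size_t)-1;
            while (pad-- > 0) {
                out[n++] = ' ';
                ++col;
            }
        } else {
            if (n + 1 >= outsz)
                return (size_t)-1;
            out[n++] = *in;
            col = (*in == '\n') ? 0 : col + 1;
        }
    }
    out[n] = '\0';
    return n;
}
```

The same input "ab\tx" comes out as "ab", two spaces, "x" with a tabstop of 4, and as "ab", six spaces, "x" with a tabstop of 8, which is why a file written for one setting looks wrong under the other.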

Vim has a :retab command which could help with this. To update a file
written with 4-column tabs to use 8-column tabs (& retain original
appearance), you could do:
:%s/^I/    /g
:%retab
I only recently saw the light and did this to all my files. Of course
this method will clobber any tabs which may be present for other
purposes (ie. in string literals), so may not be appropriate for code
acquired from elsewhere. In fact, one of the programs I've posted here
needlessly duplicated the standard isspace() function; it contained a
tab in a string literal that perhaps should be preserved during such a
conversion. I recommend investigation of the possibility of doing this
within your editor; be careful.
I'd be interested in seeing such an argument.

As a former partisan for that side, I can give you what I thought was
an argument for it: It's easier. Of course, not being true (or not for
long, anyway), it's a pretty lousy argument.
Perhaps -- except that I'm right and everyone who disagrees with me is
wrong.  8-)}

Back in the old days, I did use tabs for indentation -- but not
necessarily one tab per level.  My usual style was to indent 3 columns
per level (I now use 4), but with tabstops set to 8 columns.  So the
beginning of a deeply indented line might have one or more tabs
followed by one or more spaces.  As long as everyone reading or
editing the code had their tabstops set to 8 columns, that was ok (and
it saved a little disk space).  Later people seem to have gotten the
idea that each indentation level must be represented by a single tab
character, with the tabstop settings adjusted as necessary.  But until
there's universal agreement on tabstop settings, there's always the
risk that code will be formatted inconsistently.  I've found that the
best way to avoid inconsistency is to use spaces exclusively.

Your editor should have settings to do this transparently; so pressing
the tab key inserts 4 spaces, but a second press replaces those with a
TAB character (which is always 8 by this method).

for vim, it's:
:set softtabstop=4
and these are nice with it (tangentially related):
:set shiftwidth=4
:set autoindent
:set smartindent
(Except in Makefiles, which require tab characters -- sigh.)

Yes, they really look weird when you're used to 4 columns.
 
luserXtrog

Bad idea.  If you are having problems with code, the only way we can
help is if you post the exact code (preferably using cut and paste).
While adding new lines is probably low risk, it is not risk free.

snip


Decide where a visual break will introduce the least problem for the
reader.  For example, the above if could be

  if (check(buf,decimal,dec_accept))
    {printf( "dec: %s\n", buf);
      return 0;
     }

I'm starting to appreciate this one. Two-space indents, no space after
initial bracketing, and the closing curly balances the opening one on
two axes. The condition is tight and the payload is loose.
Would padding out the closing curly allow this to be compressed thus?:

if (check(buf,decimal,dec_accept)) {
printf( "dec: %d\n", (int)strtol(buf,NULL,10));
return 0; }

Or is that sheer madness?
 
Ian Collins

luserXtrog said:
I'm starting to appreciate this one. Two-space indents, no space after
initial bracketing, and the closing curly balances the opening one on
two axes. The condition is tight and the payload is loose.
Would padding out the closing curly allow this to be compressed thus?:

if (check(buf,decimal,dec_accept)) {
printf( "dec: %d\n", (int)strtol(buf,NULL,10));
return 0; }

Or is that sheer madness?

Are you really trying to make your code as illegible as possible?
What's wrong with

if (check(buf,decimal,dec_accept))
{
   printf( "dec: %s\n", buf);
   return 0;
}

?
 
luserXtrog

"Richard" <[email protected]> wrote in message


i not use while() much these times


How horrendous. The first thing I looked for seeing that was a higher
level loop.
Hint : "while(--x);" executes no conditional statement so include no
conditional statement.
If a C programmer can not see what
while(x--);
does then he has no business in the code and will certainly screw up at

the above should be the same of "x=-1;" right??
( should be the same of w[x++]=a but with x--)
x=3, x--=(2,3)(1,2)(0,-1)

how many of you can do this without one compiler or book?

x--, x++ , ++x, ++x

it is all over complicated "++x" is enough
while(*d++=*s++);

this is over complex too.
     "while(*r++=*s++);" is not more easy than

.0:  a=*s|++s|*r=a|++r|a#.0

or in C
W:  a=*s;++s;*r=a;++r; if(a) goto W;

It's valid, but fails my subjective test for "pretty".
but no one would believe in that
and in the facts that a multiple instructions for line are ok
etc etc

I have no opposition to multiple statements on a line, in fact one of
my goals is the maximum power per line (another subjective measure,
but I have a very small screen!). But the primary purpose of this
thread was to replace gotos with more appropriate structures (where
appropriate, iff appropriate). I think the unnecessary temporary
variable sinks this one.

But it appears this was motivated by a difficulty with (pre-/post-)fix
(in-/de)crement operations, so how about a little refresher.

The compiler (in a broad sense) discovers prefix operators first.
So it does them first. Then it finds the variable, so it resolves that
level of the expression to the value of that variable.
You just have to correlate the direction of time with the direction
the text reads and it should make sense naturally (they both go left
to right).

++ x ;
increment value do it
x ++ ;
value gonna get incremented now

* s ++ /*...*/
the thing at which value (points) (and increment value later)...
= * /*...*/
assumes the value of the thing at which ...
buf ++ /*...*/
value (points) (and increment value later)...
;
it's later. all increments shall have happened.
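The points above can be checked with a few small functions. This is only a sketch; the function names are invented for illustration, and after_countdown assumes it is called with x >= 0.

```c
/* Value of x once `while (x--);` finishes. Assumes x >= 0 on entry. */
int after_countdown(int x)
{
    while (x--)
        ;           /* empty body: the loop is nothing but the test */
    return x;       /* the failing test saw 0 and still decremented: -1 */
}

/* The classic copy loop, written with an explicit comparison so that the
   assignment inside the condition is clearly intentional. */
char *copy_string(char *d, const char *s)
{
    char *start = d;
    while ((*d++ = *s++) != '\0')
        ;           /* copies up to and including the terminating '\0' */
    return start;
}

/* Postfix yields the old value; prefix yields the new one. */
int post_value(int x) { return x++; }
int pre_value(int x)  { return ++x; }
```

So, for a non-negative int x, "while(x--);" does leave x at -1: the test that finally fails reads 0 and still performs the decrement.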
 
Ben Bacarisse

luserXtrog said:
On Mar 27, 10:55 am, Ben Bacarisse <[email protected]> wrote:
I see you have correctly ignored my remark about string such as e2,
.e2 etc!  I thought you might be getting these wrong based on the
later comments.  I would correct these to reflect what has been seen
by the time the machine gets into that particular state.  The last
line suggest that .e2 will be taken as a real but, presumably, it
won't be because of previous matches to get into state 10.

Also, I don't think your top comment is correct or at least it is a
little misleading:

 /* ^[+-]?(\d+(\.\d*)?)|(\d*\.\d+)([eE][+-]?\d+)?$ */

suggests that '+2' will be taken as a real when it is not one (at not
least formally).

Thanks for the attention.

Well done for understanding it. I just read through it again (I do
before posting, I really do) and the typo rate is so high it is almost
nonsense.

I wonder if I can modify the machine to count balanced parentheses in
strings?

It would not be hard I think. Obviously you can not do it with a
plain FSM, but the FSM driver (czek) could keep a counter. Each state
structure would need an extra +1, -1 or 0 to get added to the counter
when a "yes" transition occurs.
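Something along these lines, as a rough sketch only. The real czek is not shown here, so the driver below is a stand-in, and the names ptest, paren_fsm, balanced and the predicates are all invented. It assumes the same convention as the table for reals: a "yes" consumes the character, a "no" does not, and -1 rejects.

```c
/* A table row with an extra counter adjustment applied on "yes". */
typedef struct {
    int (*pred)(int);
    int yes;
    int no;
    int delta;      /* +1, -1 or 0, added to the depth on a "yes" */
} ptest;

static int is_open(int c)  { return c == '('; }
static int is_close(int c) { return c == ')'; }
static int is_any(int c)   { return c != '\0'; }

static const ptest paren_fsm[] = {
/* 0*/ { is_open,  1, -1, +1 },  /* must start with '(' */
/* 1*/ { is_open,  1,  2, +1 },  /* nested '(' */
/* 2*/ { is_close, 1,  3, -1 },  /* ')' closes one level */
/* 3*/ { is_any,   1, -1,  0 },  /* any other character */
};

/* Returns 1 if s is exactly one parenthesised string with balanced
   nesting, e.g. "(a(b)c)", else 0. */
int balanced(const char *s)
{
    int state = 0;
    int depth = 0;

    while (state >= 0 && *s != '\0') {
        const ptest *t = &paren_fsm[state];
        if (t->pred((unsigned char)*s)) {
            depth += t->delta;
            ++s;
            state = t->yes;
            if (depth == 0)         /* outermost ')' just consumed */
                return *s == '\0';  /* accept only if nothing follows */
        } else {
            state = t->no;
        }
    }
    return 0;   /* never opened, rejected, or input ended while still open */
}
```

The table itself stays a plain FSM; all the counting lives in the driver, which is the only place that looks at depth.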

You could be a little more general and pass some sort of "match state"
structure to the checking functions. You could keep track of pretty
much anything that way but the extra generality might not be worth the
extra complexity.

In fact, a lot of lexical objects that have matching parentheses are
so simple it may not be worth writing it into these FSM functions. I
have, on occasion, just written a "collect_until_matching(']');"
function to do this sort of thing.
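A function of that kind might look like the sketch below. The signature is an assumption made for the example (the call quoted above takes only the closing character and presumably reads from the program's own input), so treat the parameters and the return convention as hypothetical.

```c
#include <stddef.h>

/* Collect the text that follows an already-consumed opening delimiter, up
   to its matching closer, allowing nested pairs. The collected text is
   written to `out` as a NUL-terminated string. Returns a pointer just past
   the matching closer, or NULL if there is no match or `out` is too small. */
const char *collect_until_matching(const char *s, char open, char close,
                                   char *out, size_t outsz)
{
    int depth = 1;      /* the caller has already seen one opener */
    size_t n = 0;

    if (outsz == 0)
        return NULL;

    for (; *s != '\0'; ++s) {
        if (*s == open) {
            ++depth;
        } else if (*s == close && --depth == 0) {
            out[n] = '\0';
            return s + 1;
        }
        if (n + 1 >= outsz)
            return NULL;
        out[n++] = *s;  /* inner delimiters are kept as part of the text */
    }
    return NULL;        /* ran out of input before the matching closer */
}
```

Called on "a[b]c] tail" with '[' and ']', it collects "a[b]c" and returns a pointer to " tail".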
 
luserXtrog

Are you really trying to make your code as illegible as possible?
What's wrong with

if (check(buf,decimal,dec_accept))
{
   printf( "dec: %s\n", buf);
   return 0;
}

?

Fluff. Is a curly brace of such immense significance that it deserves
an entire line to itself? For my purposes, the answer is no. Within
the context of that decision, where does it most sensibly belong? For
a larger block, I would keep it on the prior line. For a one line
block, the most natural place (to me) is surrounding the line on the
same line. One learns to balance parentheses in this manner; the skill
should be easily transposable to curlies.

But the question I asked was intended to indicate that I wasn't
entirely convinced, but from the analysis of conventions, it appeared
to be of equivalent utility.

I'm programming on a dynabook, gimme a break! With 4.5 inches,
vertical space is my most precious commodity (I can rotate the screen,
but it's hard to type lying on one side).
 
luserXtrog

luserXtrog said:
On Mar 27, 10:55 am, Ben Bacarisse <[email protected]> wrote:
I see you have correctly ignored my remark about string such as e2,
.e2 etc!  I thought you might be getting these wrong based on the
later comments.  I would correct these to reflect what has been seen
by the time the machine gets into that particular state.  The last
line suggest that .e2 will be taken as a real but, presumably, it
won't be because of previous matches to get into state 10.
Also, I don't think your top comment is correct or at least it is a
little misleading:
 /* ^[+-]?(\d+(\.\d*)?)|(\d*\.\d+)([eE][+-]?\d+)?$ */
suggests that '+2' will be taken as a real when it is not one (at not
least formally).
Thanks for the attention.

Well done for understanding it.  I just read through it again (I do
before posting, I really do) and the typo rate is so high it is almost
nonsense.

I wonder if I can modify the machine to count balanced parentheses in
strings?

It would not be hard I think.  Obviously you can not do it with a
plain FSM, but the FSM driver (czek) could keep a counter.  Each state
structure would need an extra +1, -1 or 0 to get added to the counter
when a "yes" transition occurs.

You could be a little more general and pass some sort of "match state"
structure to the checking functions.  You could keep track of pretty
much anything that way but the extra generality might not be worth the
extra complexity.

In fact, a lot of lexical objects that have matching parentheses are
so simple it may not be worth writing it into these FSM functions.  I
have, on occasion, just written a "collect_until_matching(']');"
function to do this sort of thing.

That's pretty much what I ended up doing for procedures in the
spaghetti version.

A nice thing I noticed while going over the (new) code again is that
if the buffer does not contain a number, it either contains an
executable name (bare word) or a single delimiter character.

I'm tempted to switch into functions that just loop and read until the
end. But after all this trauma, I'm hesitating to do anything more
until a really elegant solution occurs.

Of course to everyone else it appears that I'm wasting time, having
fun. In reality, I've passed the question over to my unconscious and
am waiting for hints to its answer.

For procedures, a simple recursive call until '}'.

For strings it would also need to handle all the escape-sequences in a
concise and efficient manner.
I should investigate what library functions are available for parts of
the task and stitch those together. That seems to be working so far.
 
Keith Thompson

luserXtrog said:
Vim has a :retab command which could help with this. To update a file
written with 4-column tabs to use 8-column tabs (& retain original
appearance), you could do:
:%s/^I/    /g
:%retab

Two problems. First, I have to know that the file assumes 4-column
tabs. Second, rather than expanding tabs to spaces, it replaces them
with fewer tabs, which is not an improvement.

Once I know I'm dealing with 4-column tabs, I run the entire buffer
through "expand -t 4".
I only recently saw the light and did this to all my files. Of course
this method will clobber any tabs which may be present for other
purposes (ie. in string literals), so may not be appropriate for code
acquired from elsewhere. In fact, one of the programs I've posted here
needlessly duplicated the standard isspace() function; it contained a
tab in a string literal that perhaps should be preserved during such a
conversion. I recommend investigation of the possibility of doing this
within your editor; be careful.

Why would you use tabs in string literals? Just use "\t".

[...]
Your editor should have settings to do this transparently; so pressing
the tab key inserts 4 spaces, but a second press replaces those with a
TAB character (which is always 8 by this method).

Well, I use control-T to increase indentation by 1 level, and
control-D to decrease by 1 level. And with shiftwidth=4 and expandtab
settings, I don't get a literal tab character unless I ask for it
explicitly with a control-V prefix.

[...]

This is getting a bit off-topic because we're discussing tools rather
than the language, but we're also discussing the form of C source
files.
 
luserXtrog

Two problems.  First, I have to know that the file assumes 4-column
tabs.  Second, rather than expanding tabs to spaces, it replaces them
with fewer tabs, which is not an improvement.

Once I know I'm dealing with 4-column tabs, I run the entire buffer
through "expand -t 4".


Why would you use tabs in string literals?  Just use "\t".

You're right, of course. Momentary stupidity.
[...]
Your editor should have settings to do this transparently; so pressing
the tab key inserts 4 spaces, but a second press replaces those with a
TAB character (which is always 8 by this method).

Well, I use control-T to increase indentation by 1 level, and
control-D to decrease by 1 level.  And with shiftwidth=4 and expandtab
settings, I don't get a literal tab character unless I ask for it
explicitly with a control-V prefix.

I didn't know about expandtab. Excellent. I like.
[...]

This is getting a bit off-topic because we're discussing tools rather
than the language, but we're also discussing the form of C source
files.

Agreed. But useful nevertheless.
 
CBFalconer

Ben said:
These would make no difference.


No, you have missed the point of the example. It was to show how
confusing bad indentation can be, particularly when there is a tiny,
but significant, ';' on the while line.

Yes, I missed that isolated semi entirely, making my entire
fun-pointing pointless. Bah.
 
JosephKK

No, the wise user uses only spaces, never tabs, for indentation.

Which is better, a comment telling every person viewing the file how
to set up the tabs (possibly using different methods depending on
which editor or other tool you're using), or using spaces and thereby
cleanly eliminating the need for either the comment or the setup?

Yes, this is my personal opinion, but it's a very strongly held one.

Well, while I agree, I once came across one programmer whose favorite
editor had the habit of replacing those spaces with tabs. Often when
the source had to be handed to another programmer, the tab expansion
defaulted to 8 spaces and craziness broke loose.