Musings, alternatives to multiple return, named breaks?

Eric Sosman

Don't let my use of the word "label" lead you into the idea that it is a
C label of the existing type - it is a "{}" label (something we
currently have no name for). It is probably better considered a name
for the block of code delimited by a pair of brackets. So break(name)
doesn't mean "take me to the C label 'name'" but rather "take me to the
line after the 'name' block of code". (And only from within that block
of code.)

In other words, not "take me THERE" but "get me out of HERE".

I wasn't misled, and I understood where your control transfer
would go (and it's the same place Java's goes). My dislike of the
construct is that `break label;' takes you to a point that may be
distant from where 'label: while(c)' is, to a point that is *not*
labelled or noted or marked or distinguished in any visible way.

"The ships hung in the sky in much the same way that
bricks don't."

"He had found a Nutri-Matic machine which had provided
him with a plastic cup filled with a liquid that was
almost, but not quite, entirely unlike tea."

Here we have statements in the form of descriptions, but
lacking useful descriptive power. "Go where the label isn't"
seems similar to me. YMMV.
 
mathog

glen said:
That probably includes a comment near the end, indicating
that there are other returns.

Amen to that!

I recently spent far too long trying to find a bug that passed execution
through a very long C++ method (not my code) where for mysterious
reasons the break point at the return was not being reached in some
cases. You guessed it, there were a couple of return statements buried
in the code several screens up.

This was much less obvious than it sounds because the method was making
recursive calls to itself (indirectly, through a short series of
intervening functions and a signal or two).

That missing comment would have saved me a lot of time.

Regards,

David Mathog
 
glen herrmannsfeldt

(snip, I wrote)
Amen to that!
I recently spent far too long trying to find a bug that passed execution
through a very long C++ method (not my code) where for mysterious
reasons the break point at the return was not being reached in some
cases. You guessed it, there were a couple of return statements buried
in the code several screens up.

(snip)

I have written some very small recursive tree processing programs,
which often include multiple returns. But small enough that it
would be hard to miss. (Might be one loop with an if and return,
followed by a return.)

If a break would do the same thing, I would probably use that instead
of return, but often enough it doesn't.
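
A minimal sketch of the kind of small recursive routine described above: one loop containing an if and a return, followed by a final return. The `struct node` layout and the name `find` are assumptions made for illustration, not code from the thread.

```c
#include <stddef.h>

/* Hypothetical binary tree node used only for this illustration. */
struct node {
    int value;
    struct node *child[2];
};

/* Depth-first search; returns the node holding key, or NULL if absent. */
const struct node *find(const struct node *n, int key)
{
    if (n == NULL || n->value == key)
        return n;                       /* empty subtree or direct hit */
    for (int i = 0; i < 2; i++) {
        const struct node *hit = find(n->child[i], key);
        if (hit != NULL)
            return hit;                 /* found below: stop searching */
    }
    return NULL;                        /* not in this subtree */
}
```

A break inside the loop would only leave the loop, so the result would still have to be carried to a return afterwards; with a function this short, the extra returns are hard to miss.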

-- glen
 
Stefan Ram

mathog said:
I recently spent far too long trying to find a bug that passed execution
through a very long C++ method (not my code) where for mysterious

That's why I wrote in my first post of the thread that
multiple returns are not a problem /when the method is
short/.

When a method is not short, we often can refactor.

However, this is not about breaking a method into smaller
methods at any cost. The method has to be broken into
/meaningful/ units. And we cannot get meaning from code
fragments like »something«.
 
glen herrmannsfeldt

(snip)
Related to this simple case, the only reason the while(1) construct is
needed, with its accompanying terminal "break", is because C does not
support this:

int function(void){
    int status = 0;
    { /* start a block of code */
        if(something)break;
        if(something_else)break;
        // many more cases to handle, OK?
        status = 1;
    } /* end a block of code */
    return(status);
}

As someone else noted, do { ... } while(0); is commonly used for
this case, probably more often than for what it is supposed to
be used for, actually testing at the end.

It is also used with preprocessor macros that need to look like
a single statement.
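
A minimal sketch of both uses of do { ... } while (0) mentioned above. The names `check_input`, `order_pair`, and `SWAP_INT`, and the specific checks, are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical macro: the do { ... } while (0) wrapper makes the
   multi-statement body act as a single statement, so it can be used
   in an unbraced if/else and still take a trailing semicolon. */
#define SWAP_INT(a, b) do { int tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

/* Returns 1 if buf is non-NULL and len is in 1..100, otherwise 0.
   The body runs exactly once; each break jumps past the closing brace. */
int check_input(const char *buf, int len)
{
    int status = 0;
    do {
        if (buf == NULL) break;
        if (len <= 0) break;
        if (len > 100) break;
        status = 1;
    } while (0);
    return status;
}

/* Puts two ints in ascending order, using the macro without braces. */
void order_pair(int *x, int *y)
{
    if (*x > *y)
        SWAP_INT(*x, *y);
    else
        (void)0;    /* already ordered, nothing to do */
}
```

Without the wrapper, the semicolon after `SWAP_INT(*x, *y)` would end the if statement early and leave the else without a matching if.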

-- glen
 
Ian Collins

mathog said:
Related to this simple case, the only reason the while(1) construct is
needed, with its accompanying terminal "break", is because C does not
support this:

int function(void){
    int status = 0;
    { /* start a block of code */
        if(something)break;
        if(something_else)break;
        // many more cases to handle, OK?
        status = 1;
    } /* end a block of code */
    return(status);
}

When a break is used within a delimiting set of brackets like that there
is a compile-time error like:

break statement not within loop or switch

That limitation results because this needs to do what it currently does:

while(1){
    if(test){
        break; //from outer switch or loop
    }
}

Because in C {} delimited code blocks are not nameable, it was
impossible to specify with a break the identity of the block to exit.
So other rules were employed to determine "break from what", and one
corollary was that some pairs of brackets could not be exited with a break.

Well in one way they are; seeing cases like this is a strong hint that
the code in the block should be extracted into a (named!) function.
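
A minimal sketch of that extraction, assuming hypothetical checks (`is_negative`, `is_too_large`) in place of "something" and "something_else". The block becomes a named function and each early return does the job of the unsupported block-level break.

```c
/* Hypothetical checks standing in for "something" and "something_else". */
static int is_negative(int x)  { return x < 0; }
static int is_too_large(int x) { return x > 1000; }

/* The former anonymous block, now with a name. */
static int value_is_usable(int x)
{
    if (is_negative(x))  return 0;
    if (is_too_large(x)) return 0;
    /* many more cases could be handled here */
    return 1;
}

int function(int x)
{
    int status = value_is_usable(x);
    return status;
}
```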
 
Ian Collins

mathog said:
Amen to that!

I recently spent far too long trying to find a bug that passed execution
through a very long C++ method (not my code) where for mysterious
reasons the break point at the return was not being reached in some
cases. You guessed it, there were a couple of return statements buried
in the code several screens up.

It's unfortunate that people write code like that. I tend to see it
more often in poor C++ than in C code, mainly because the writer knows
that C++ has cleanup mechanisms to safely handle early returns and
therefore abuses them.

A reasonable debugger should be able to break on the closing brace of a
function which will catch any return.
 
BartC

On thinking about this a bit further, the if/else if/else case is
interesting in terms of block naming, it would not be:

if(test1){:block1:
}:block1:
else if(test2){:block2:
}:block2:

but rather

if(test1){:block_for_whole:
}
else if(test2){
}:block_for_whole:

since a break from "block1" in the first example goes where? Certainly
not to the "else" line after }:block1:, but rather to the one
after }:block2:. Similarly, a break from any other part of that construct
would go to the same place. This indicates that the entire if/else
if/else block should have one name for the whole code block. Similarly,
if the if/else if/else block was not explicitly named, then a break would
still go to the line after the entire block.

C doesn't have if/else-if.../else blocks. It just has if/else. I think this
can cause confusion, as sometimes a nested if-statement is used to form a
linear if/else-if chain, and sometimes it is meant to be nested (and you
want a multi-level break).

Further confusion is caused by the fact that a break statement will usually
be conditional, and therefore inside an if statement itself! So in:

if (test) break;

Does this refer to *this* if-statement, or an outer one? What about
if (test) {break;} or if (test) {a; break; b;}?

(And are the branches of an if-statement always blocks, or are the braces
needed?)

Also, sometimes you have blocks within blocks (because they form another
nested statement, or to create a local scope). Then you have so many blocks
that it's no longer meaningful to talk about exiting a block. And if you
have to start labeling arbitrary { or } braces, then you can end up with
something as undisciplined as a goto!

So I don't think it's a good idea to be able to break out of an 'if'
statement, because there are so many of them; some branches will have {}
blocks, and some won't; they can have a complex structure which can be
regarded as a single statement, or several nested ones; and the break
statement itself is likely to be wrapped in a conditional statement of its
own to further confuse things.
 
Jens Thoms Toerring

I think it's also OK with extra returns at the start of a function, for
quick exit. So you might have something like this:

int sortOfSquareRoot(int x) {
    if (x < 0) return 0;
    ... big lump of code ...
    return y;
}

Having an early return saves a layer of bracketing and indentation, and
should be quite clear in such cases.
But multiple returns in the middle of a function are often hard to
trace, and that's never a good idea.

I've got to admit that I'm a big fan of multiple returns when
it comes to error handling. A lot of stuff I write starts with
checking the function arguments, and if one of them isn't kosher
I return immediately with an error value. And even later, when
something doesn't add up and there's nothing within the function
that can be done I have no qualms returning at that point with
a return value to indicate that something went wrong.

A typical example is a function that's supposed to send some command
to a device. If the input to the function is goofy I bail out
immediately. If I find that the state of the device isn't compatible
with what the function is supposed to do with the device
I bail out, immediately. If there's a problem communicating with
the device that can't be fixed I bail out, immediately. If I
followed the mantra of "only one return" such a function would
be a complete mess - too many layers of "if" and "else".
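
A minimal sketch of such a function, with one early return per failed precondition. The `struct device` fields, the error codes, and `send_command` itself are all hypothetical and stand in for a real device interface.

```c
#include <stddef.h>

/* Hypothetical error codes for this illustration. */
enum { CMD_OK = 0, CMD_BAD_ARG = -1, CMD_BAD_STATE = -2, CMD_IO_ERROR = -3 };

/* Hypothetical device model: just enough state to show the checks. */
struct device {
    int powered_on;
    int link_up;
    int last_command;
};

int send_command(struct device *dev, int command)
{
    if (dev == NULL || command < 0)
        return CMD_BAD_ARG;         /* goofy input: bail out */
    if (!dev->powered_on)
        return CMD_BAD_STATE;       /* device state incompatible */
    if (!dev->link_up)
        return CMD_IO_ERROR;        /* cannot talk to the device */

    dev->last_command = command;    /* the actual work, unindented */
    return CMD_OK;
}
```

With a single return, the same logic would need three nested if/else layers around the one line that does the work.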

Other languages have exceptions. I'd love to have them in C, to
be honest. And throwing an exception is, basically, a return on
steroids. Why have I never seen a similar criticism of exceptions
but lots of complaints about multiple returns as if they would be
something the devil invented to make life even more miserable?

My impression is that this "no multiple returns" was originally
about practices where functions did lots of different things that
shouldn't have been done in a single function and that this has
morphed into a mantra that isn't allowed to be questioned
anymore. Things like what the OP (in principle) proposed, like

int fail = OK;
while ( 1 ) {
    if ( fail = do_something_1( ) )
        break;
    if ( fail = do_something_2( ) )
        break;
    ...
}
return fail;

etc. aren't any different from multiple returns, just harder to
read. It follows the dogma of "no multiple returns" to the letter
but not in spirit.

My take on it is to return immediately whenever there's something
going wrong, but not to misuse multiple returns to write functions
that do several, unrelated things. Multiple returns aren't a
problem per se but only when used for the wrong reasons (just like
gotos).
Regards, Jens
 
Stefan Ram

be honest. And throwing an exception is, basically, a return on
steroids. Why have I never seen a similar criticism of exceptions

C has its return on steroids: It's called »longjmp«. But C
has no RAII or »finally«, so the use of »longjmp« is limited.

but lots of complaints about multiple returns as if they would be
something the devil invented to make life even more miserable?

When you want to formally prove assertions about code,
it is easier with structured programming.

int fail = OK;
while ( 1 ) {
    if ( fail = do_something_1( ) )
        break;
    if ( fail = do_something_2( ) )
        break;
    ...
}
return fail;

This can be written avoiding both breaks and returns:

int f()
{ int fail = 0;
if( fail = do_something_1() );
else if( fail = do_something_2() );
else { ... }
return fail; }

, and this will also exit immediately when »do_something_1()«
is nonzero.
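
A minimal runnable variant of the chain above, to show that later steps are skipped once one fails. The three `do_something_N` stubs, their return values, and the `calls_made` counter are assumptions added for illustration; empty braces replace the bare semicolons only for readability.

```c
/* Hypothetical counter and stubs: step 2 is made to fail with code 7. */
static int calls_made;
static int do_something_1(void) { calls_made++; return 0; }
static int do_something_2(void) { calls_made++; return 7; }
static int do_something_3(void) { calls_made++; return 0; }

int f(void)
{
    int fail = 0;
    if ((fail = do_something_1()) != 0) {
        /* step 1 failed: nothing else runs */
    } else if ((fail = do_something_2()) != 0) {
        /* step 2 failed: step 3 is skipped */
    } else {
        fail = do_something_3();
    }
    return fail;
}
```

With these stubs `f()` returns 7 after two calls; `do_something_3()` is never reached.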
 
Keith Thompson

Eric Sosman said:
Java has something a bit like this. You can stick a label
on a loop (the loop constructs are much the same as C's), and
inside the loop you can `break label;' (or `continue label;').
It's occasionally (but only occasionally) helpful, when an inner
loop wants to break out of (or continue) a containing loop. Your
technique would work, but it's hard to imagine it surviving a
thoughtful code review.

(Unfortunately, Java's syntax shares a drawback with the one
you propose: The label is *here*, and `break label;' transfers
control to *there*, possibly far away. Ugh.)
[...]

I haven't done much with Java, but I've used languages that have a
similar feature: loops can be labeled, and a break or continue (or
equivalent) can refer to the label.

The idea is that the label isn't just a location to which you can branch
(you might as well use a goto for that); it's the *name of the loop*.

For example, using a C-like syntax:

PROCESS_ROWS:
for (row = 0; row < MAX_ROW; row ++) {
    PROCESS_COLUMNS:
    for (col = 0; col < MAX_COL; col ++) {
        if (done_processing_columns) {
            break PROCESS_COLUMNS;
        }
        if (done_with_all_rows) {
            break PROCESS_ROWS;
        }
    }
}

With the syntax I've presented, a loop name has the same syntax
as a goto label, which could cause confusion (though Perl does the
same thing and I haven't known it to be a problem, perhaps because
gotos are rare in Perl). Ada uses distinct syntax for loop names
(and block names) vs. goto labels.

On the other hand, in Perl, the "break" and "continue" statements
are spelled "last" and "next", and the label name tends to refer
to what's processed by one iteration of the loop. In the example
above, the outer loop would probably be called "ROW", and the "break
PROCESS_ROWS;" would be "last ROW;"; skipping to the next row is
written "next ROW;". Not that changing C's keywords is an option,
of course.

If this were to be added to C, you'd have to define exactly how a
name is associated with a loop. I suppose it could just be defined
so that if a labeled statement is a loop, then a break or continue
can use the label name. And you could do the same thing for labeled
switch statements.

The counterargument, I suppose, is that if a function is complex
enough that loop names are helpful, it should probably be decomposed
into simpler functions anyway. Still, I've found labeled breaks useful
enough in other languages that I'd like to see them in C.
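
For comparison, a minimal sketch of how the effect of "break PROCESS_ROWS;" is usually obtained in C as it stands, with a goto to a label placed just after the outer loop. The function `find_first_negative`, the grid size, and the search condition are assumptions made for illustration.

```c
#define MAX_ROW 4
#define MAX_COL 4

/* Returns 1 and stores the position of the first negative entry
   (in row-major order), or returns 0 if there is none. */
int find_first_negative(int grid[MAX_ROW][MAX_COL], int *row_out, int *col_out)
{
    int found = 0;
    for (int row = 0; row < MAX_ROW; row++) {
        for (int col = 0; col < MAX_COL; col++) {
            if (grid[row][col] < 0) {
                *row_out = row;
                *col_out = col;
                found = 1;
                goto rows_done;     /* stands in for "break PROCESS_ROWS;" */
            }
        }
    }
rows_done:
    return found;
}
```

Here the label marks the destination rather than naming the loop, which is the opposite of the labeled-loop convention discussed above.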
 
Keith Thompson

BartC said:
I don't think it's good when attempts to avoid goto lead to more convoluted
code. Why not just embrace it, but make the goto a little less obvious; this
defines a new 'statement' to jump to a common return point:

#define break_end goto end

int function(void){
    int status = 0;
    if (something) break_end;
    if (something_else) break_end;
    status = 1;
end:
    return(status);
}

You can probably dress up multiple-level loop breaks in the same way.

I don't mind using a goto to jump to error-handling code at the end of a
function, but I really don't think wrapping "goto end;" in a "break_end"
macro serves any good purpose. Just drop the macro definition and write
"goto end;".
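
A minimal sketch of the plain "goto end;" pattern for error handling, where the code at the label releases whatever was acquired. The function `duplicate_pair` and its ownership rules are assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Copies two strings onto the heap. Returns 0 and hands both copies to
   the caller on success; returns -1 and leaks nothing on failure. */
int duplicate_pair(const char *a, const char *b, char **a_out, char **b_out)
{
    int status = -1;
    char *a_copy = NULL;
    char *b_copy = NULL;

    if (a == NULL || b == NULL || a_out == NULL || b_out == NULL)
        goto end;
    a_copy = malloc(strlen(a) + 1);
    if (a_copy == NULL)
        goto end;
    b_copy = malloc(strlen(b) + 1);
    if (b_copy == NULL)
        goto end;

    strcpy(a_copy, a);
    strcpy(b_copy, b);
    *a_out = a_copy;
    *b_out = b_copy;
    a_copy = NULL;      /* ownership has passed to the caller */
    b_copy = NULL;
    status = 0;
end:
    free(a_copy);       /* free(NULL) is a no-op */
    free(b_copy);
    return status;
}
```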
 
Jens Thoms Toerring

C has its return on steroids: It's called »longjmp«. But C
has no RAII or »finally«, so the use of »longjmp« is limited.

I know - I have my own implementation of "exceptions" for a
larger project, based on longjmp(), but it has, due to the
constraints of C, several problems (and I'm still not 100% sure
that it has no chance of involving undefined behaviour -
no problems yet after about 10 years, but there's still a lurking
feeling that it may break somewhere in the future on an
unanticipated new platform ;-)
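
A minimal sketch of the general idea of longjmp()-based "exceptions", not a reconstruction of the implementation mentioned above. The single global `on_error` buffer and the names `step` and `process_all` are assumptions made for illustration.

```c
#include <setjmp.h>

static jmp_buf on_error;            /* hypothetical single recovery point */

/* "Throws" on a negative value by jumping back to the setjmp() site. */
static void step(int value)
{
    if (value < 0)
        longjmp(on_error, 1);
}

/* Returns count if every value was accepted, or -1 if a step "threw". */
int process_all(const int *values, int count)
{
    if (setjmp(on_error) != 0)
        return -1;                  /* "catch": reached via longjmp() */
    for (int i = 0; i < count; i++)
        step(values[i]);
    return count;
}
```

Nothing is cleaned up on the way back: any resource acquired between setjmp() and longjmp() has to be released by hand, which is the limitation noted about the lack of RAII or »finally«.
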
When you want to formally prove assertions about code,
it is easier with structured programming.

Ok, that may be a problem - but how many programs ever get this
treatment?

This can be written avoiding both breaks and returns:
int f()
{ int fail = 0;
if( fail = do_something_1() );
else if( fail = do_something_2() );
else { ... }
return fail; }
, and this will also exit immediately when »do_something_1()«
is nonzero.

This was in the "UNIX spirit" of returning 0 on success. But the
real point is that you often have to do a bit of computation
between the "if" and the next "else if". You could put that into
another function, definitely. But while there's an argument for
splitting everything up into the smallest possible functions, there's
also one against it: too many functions not in immediate view
can become a problem in itself. If I had to do e.g. three lines
of code between the first "if" and the "else if" that are pertinent
to the task at hand, I'd tend to do them there in place
instead of creating a new function for them that may be out of
sight.

I completely agree that some rules are warranted in principle - all
I wanted to express was that making a religion out of them is
detrimental. My point is to keep them in mind but with an understanding
of why they were made up, and, based on that knowledge, to decide
when to adhere to them and when it's time to break 'em.

Best regards, Jens
 
Eric Sosman

Eric Sosman said:
[...]
(Unfortunately, Java's syntax shares a drawback with the one
you propose: The label is *here*, and `break label;' transfers
control to *there*, possibly far away. Ugh.)
[...]

I haven't done much with Java, but I've used languages that have a
similar feature: loops can be labeled, and a break or continue (or
equivalent) can refer to the label.

The idea is that the label isn't just a location to which you can branch
(you might as well use a goto for that); it's the *name of the loop*.

I propose an economization for traffic signals. As things
now stand, every intersection needs its own set of lights to
say GO or STOP. Installing and wiring up one set of lights
may not be all that terribly expensive in the grand scheme of
things, but the scheme is in fact rather grand: Many thousands
of intersections and many thousands of lights, even in a fairly
small city.

So, my suggestion is to eliminate two-thirds of the lights
and reduce the total cost by maybe half. Instead of giving each
intersection its own set of lights, we'll give every third
intersection a fancier set of lights with three indications:
RED/GREEN for the local intersection, ORANGE/AQUA for the next,
suitably time-delayed, BLACK/CYAN for the one after that (the
color scheme is only an illustration; I'll leave the actual hue
choice to the perceptual psychologists). In other words, each
intersection has a color pair as its "name," so a light HERE can
unambiguously refer to an intersection THERE.

There's the suggestion, now for a question: Would you drive in
this town, or would you flee to the Andaman Islands instead? ;-)

Okay, okay, mountains, molehills. Whatever the merits of a
label that means "not here, elsewhere" may be, I strongly doubt
you'll ever convince me to *like* them.

[*] Keith Thompson
 
glen herrmannsfeldt

Which is especially useful with no goto. Java has goto as a reserved
word, but no goto statement using it.

I haven't done much with Java, but I've used languages that have a
similar feature: loops can be labeled, and a break or continue (or
equivalent) can refer to the label.
The idea is that the label isn't just a location to which you can branch
(you might as well use a goto for that); it's the *name of the loop*.

It takes getting used to, and I probably haven't yet, but that is
the way it is supposed to work.

For example, using a C-like syntax:

PROCESS_ROWS:
for (row = 0; row < MAX_ROW; row ++) {
    PROCESS_COLUMNS:
    for (col = 0; col < MAX_COL; col ++) {
        if (done_processing_columns) {
            break PROCESS_COLUMNS;
        }
        if (done_with_all_rows) {
            break PROCESS_ROWS;
        }
    }
}

Seems that Java tried to keep as much C syntax as they could,
more than C++, so that should be right for Java.

With the syntax I've presented, a loop name has the same syntax
as a goto label, which could cause confusion (though Perl does the
same thing and I haven't known it to be a problem, perhaps because
gotos are rare in Perl). Ada uses distinct syntax for loop names
(and block names) vs. goto labels.
On the other hand, in Perl, the "break" and "continue" statements
are spelled "last" and "next", and the label name tends to refer
to what's processed by one iteration of the loop. In the example
above, the outer loop would probably be called "ROW", and the "break
PROCESS_ROWS;" would be "last ROW;"; skipping to the next row is
written "next ROW;". Not that changing C's keywords is an option,
of course.

MORTRAN2, from about 1975, has NEXT and EXIT.

MORTRAN processors are written as macro processors a little fancier
than cpp. Among others, macros can, when expanded, define other macros.
I believe that they can also extend (append to) the value of an already
defined macro. That, and an output routine that formats into fixed-form
Fortran is about all that it needs, other than the macros.

Oh, and in:

http://www.slac.stanford.edu/cgi-wrap/getdoc/slac-pub-1527.pdf

they also use ROW and COLUMN for their sample loops.

If this were to be added to C, you'd have to define exactly how a
name is associated with a loop. I suppose it could just be defined
so that if a labeled statement is a loop, then a break or continue
can use the label name. And you could do the same thing for labeled
switch statements.

The counterargument, I suppose, is that if a function is complex
enough that loop names are helpful, it should probably be decomposed
into simpler functions anyway. Still, I've found labeled breaks useful
enough in other languages that I'd like to see them in C.

Well, they don't have to be all that big. In processing through a 2D
matrix, it isn't hard to need break or continue on the row and column
level, even with very small loops.

Otherwise, another possibility is a form of break or continue that would
break or continue a specified number of nesting levels out. Not so good
if you add nesting levels later, though.
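
A minimal sketch of a row-level continue from inside the column loop of a small 2D matrix, written in today's C with a goto to a label at the end of the outer loop body. The function name, matrix size, and the "all entries positive" test are assumptions made for illustration.

```c
#define ROWS 3
#define COLS 3

/* Counts the rows whose entries are all strictly positive. */
int count_all_positive_rows(int m[ROWS][COLS])
{
    int count = 0;
    for (int row = 0; row < ROWS; row++) {
        for (int col = 0; col < COLS; col++) {
            if (m[row][col] <= 0)
                goto next_row;      /* stands in for "continue ROW;" */
        }
        count++;                    /* reached only if no entry failed */
next_row:
        ;                           /* a label must precede a statement */
    }
    return count;
}
```

Unlike a "continue 2 levels out" form, the label keeps working if another nesting level is added later.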

-- glen
 
Les Cargill

James said:
The biggest problem is that it doesn't avoid the real problem of code
with multiple returns: tracing the control flow. It is precisely as
difficult to track down "return" statements as it is to track down
"break;" statements. If the breaks are acceptable, replacing them with
returns should be equally acceptable.


The point of early return/break is to make explicit invariants that if
violated, preclude the main thing the function is for.

....

// prevent division overflow...
if (denom > epsi) ;
else if (denom < -epsi);
else
    return RANGERR;

num = numerator / denom ;

....

This should *clarify* control flow, not obfuscate it.
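
A minimal self-contained sketch of the same guard, with the invariant stated once at the top of the function. The value of `EPSI`, the value of `RANGERR`, and the `safe_divide` signature are assumptions made for illustration.

```c
#define RANGERR (-1)    /* hypothetical error code */
#define EPSI    1e-9    /* hypothetical smallest usable |denom| */

/* Stores numerator / denom in *result and returns 0, or returns
   RANGERR without touching *result when |denom| <= EPSI. */
int safe_divide(double numerator, double denom, double *result)
{
    if (denom <= EPSI && denom >= -EPSI)
        return RANGERR;             /* invariant violated: no main work */

    *result = numerator / denom;
    return 0;
}
```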
 
Les Cargill

Stefan said:
C has its return on steroids: It's called »longjmp«. But C
has no RAII or »finally«, so the use of »longjmp« is limited.


When you want to formally prove assertions about code,
it is easier with structured programming.


This can be written avoiding both breaks and returns:

int f()
{ int fail = 0;
if( fail = do_something_1() );
else if( fail = do_something_2() );
else { ... }
return fail; }

, and this will also exit immediately when »do_something_1()«
is nonzero.


I'd have to agree that this is slightly cleaner.
 
Les Cargill

Ian said:
It's unfortunate that people write code like that. I tend to see it
more often in poor C++ than in C code, mainly because the writer knows
that C++ has cleanup mechanisms to safely handle early returns and
therefore abuses them.

A reasonable debugger should be able to break on the closing brace of a
function which will catch any return.


This will fail with legendary frequency. :)

However, if you're good with mixed listings, you can always
make it work.
 
Les Cargill

Keith said:
I don't mind using a goto to jump to error-handling code at the end of a
function, but I really don't think wrapping "goto end;" in a "break_end"
macro serves any good purpose.

It's a bit of "loincloth" syntax.
 
Ian Collins

Les said:
This will fail with legendary frequency. :)

However, if you're good with mixed listings, you can always
make it work.

Eh? I've never known it not to work.
 
