Favourite Pattern for Processing Zero, Once, Multiple

Shao Miller · Jan 10, 2013

Suppose you have a function 'bar' that returns a value that means one of:
- Do 'foo' zero times
- Do 'foo' one time
- Do 'foo' more than one time

What's your favourite way of handling such a value? For example:

while ((more = bar()) != cv_zero_times) {
    /* Do it at least once */
    foo();

    /* Do it more than once? */
    if (more != cv_more_times)
        break;
}

- Shao Miller

BartC · Jan 10, 2013

Shao Miller said:
Suppose you have a function 'bar' that returns a value that means one of:
- Do 'foo' zero times
- Do 'foo' one time
- Do 'foo' more than one time

What's your favourite way of handling such a value?

Obvious return values would be (0, 1, 2), or (0, N) (or just N) if the
actual number of times is being specified.

Shao Miller · Jan 10, 2013

Obvious return values would be (0, 1, 2), or (0, N) (or just N) if the
actual number of times is being specified.

Agreed.

And what might be the pattern, in C, that you might use in order to
handle the value and to call 'foo' the right number of times? As
another example:

while (1) {
    switch (bar()) {
    case cv_zero_times:
        /* Exit switch */
        break;

    case cv_more_times:
        foo();
        /* Continue loop */
        continue;

    default:
        /* One time */
        foo();
    }
    /* Exit loop */
    break;
}

- Shao Miller

BartC · Jan 11, 2013

Shao Miller said:
On 1/10/2013 18:26, BartC wrote:

Agreed.

And what might be the pattern, in C, that you might use in order to handle
the value and to call 'foo' the right number of times? As another
example:

while (1) {
switch (bar()) {

OK, you want to know what is actually done with such a value.

I like using switch when decoding multiple values of the same expression. So
your second example is better, IMO, when your bar() function returns one of
three possibilities.

Ben Bacarisse · Jan 11, 2013

Shao Miller said:
Suppose you have a function 'bar' that returns a value that means one of:
- Do 'foo' zero times
- Do 'foo' one time
- Do 'foo' more than one time

What's your favourite way of handling such a value? For example:

while ((more = bar()) != cv_zero_times) {
    /* Do it at least once */
    foo();

    /* Do it more than once? */
    if (more != cv_more_times)
        break;
}

I'd re-think the whole thing. It's confusing to me since a return of
"more than one time" does not seem to me any such thing (unless you've
got your loop wrong). It seems to mean "do it once and call bar again"
which is messy because it implies an awkward linkage between bar and
foo.

Does this crop up for real? I'd try to find a cleaner way to write the
whole bar() + foo() interaction.

Shao Miller · Jan 11, 2013

I'd re-think the whole thing. It's confusing to me since a return of
"more than one time" does not seem to me any such thing (unless you've
got your loop wrong). It seems to mean "do it once and call bar again"
which is messy because it implies an awkward linkage between bar and
foo.

Does this crop up for real? I'd try to find a cleaner way to write the
whole bar() + foo() interaction.

Thanks, BartC, for your response (I sent you an e-mail, but "no such user").

Thanks, Ben, for your response. Yes, the linkage is awkward!

Yes, 'foo' was just an example of a "do-something." The actual problem
that got me thinking about this is a bit boring, but I wondered if
anyone might have a pattern to suggest.

Consider 'bar' as translating a sequence of characters, one at a time.
The caller uses it for translation, but it's the caller's responsibility
to do something useful with the translated characters.

Sometimes 'bar' will emit no translated characters, but simply track
some state. ("Deal with zero translated characters.")

Sometimes 'bar' will emit a single translated character that's a simple
mapping from the input character. ("Deal with one translated character.")

Sometimes 'bar' will have multiple characters to emit, and the caller
needs to call it multiple times to get all of the characters. ("Here's
one translated character, but call me again because I have more.")

Could be useful for escape sequences: "123${special}ABC" where the '$'
through until the 'l' wouldn't emit anything, but then the subsequent
'}' would cause some special sequence to be emitted, where 'bar' would
know what to emit, but would have to be queried repeatedly until it
finished emitting the special sequence.

One way to deal with this would be to pass a call-back and context to
'bar' so that it could do the emitting itself, but that requires the
programmer to create such a call-back function.

Another way is to use EOF-style, and the case of a single character
output means that 'bar' is always called a second time simply to return
end-of-sequence, which thus suffers a bit of a performance drawback for
that case.
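
A minimal sketch of the EOF-style approach, for comparison. Everything here is an assumption made for illustration: the names emitter_feed, emitter_next and EMIT_END, and the toy mapping (tab becomes four spaces, carriage return is dropped, anything else is copied). The caller's loop is uniform, at the cost of one extra call per input character just to learn that nothing more is pending.

```c
#include <assert.h>
#include <stddef.h>

#define EMIT_END (-1)   /* hypothetical end-of-sequence marker, like EOF */

struct emitter {
    const char *pending;    /* characters still to hand out; never NULL */
    char single[2];         /* storage for a one-character mapping */
};

/* Accept one input character and queue whatever it translates to. */
void emitter_feed(struct emitter *em, char ic)
{
    assert(em != NULL);
    if (ic == '\t') {
        em->pending = "    ";       /* one input, several outputs */
    } else if (ic == '\r') {
        em->pending = "";           /* one input, no output */
    } else {
        em->single[0] = ic;         /* one input, one output */
        em->single[1] = '\0';
        em->pending = em->single;
    }
}

/* Return the next queued character, or EMIT_END once drained. */
int emitter_next(struct emitter *em)
{
    assert(em != NULL);
    if (*em->pending == '\0')
        return EMIT_END;
    return (unsigned char)*em->pending++;
}

/* Caller side: dst must be large enough for the expanded text. */
size_t emitter_translate(const char *src, char *dst)
{
    struct emitter em = { "", { 0, 0 } };
    size_t n = 0;
    int c;

    for (; *src != '\0'; ++src) {
        emitter_feed(&em, *src);
        /* Same loop for zero, one or many outputs. */
        while ((c = emitter_next(&em)) != EMIT_END)
            dst[n++] = (char)c;
    }
    dst[n] = '\0';
    return n;
}
```

With this shape the zero/one/many distinction disappears from the caller: every input character is followed by "drain until EMIT_END".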

So I thought it was a bit of an interesting challenge and I wonder what
strategies folks might've or might come up with, as a general pattern.

Thanks for your time.

Bart van Ingen Schenau · Jan 11, 2013

One way to deal with this would be to pass a call-back and context to
'bar' so that it could do the emitting itself, but that requires the
programmer to create such a call-back function.

This might be a good option, but it depends on how well it fits in the
overall application design.

Another way is to use EOF-style, and the case of a single character
output means that 'bar' is always called a second time simply to return
end-of-sequence, which thus suffers a bit of a performance drawback for
that case.

This would be my first choice, before discriminating between 1 and
multiple values to return.
I would make that discrimination only if actual profiling measurements
have shown that the additional call really is a performance bottleneck.

So I thought it was a bit of an interesting challenge and I wonder what
strategies folks might've or might come up with, as a general pattern.

Thanks for your time.

Bart v Ingen Schenau

BartC · Jan 11, 2013

Shao Miller said:
On 1/10/2013 22:20, Ben Bacarisse wrote:

That's the key I think (see below).

Thanks, BartC, for your response (I sent you an e-mail, but "no such
user").

It was a valid email at one time. (Try using bcas instead of bc.)

Sometimes 'bar' will have multiple characters to emit, and the caller
needs to call it multiple times to get all of the characters. ("Here's
one translated character, but call me again because I have more.")

Could be useful for escape sequences: "123${special}ABC" where the '$'
through until the 'l' wouldn't emit anything, but then the subsequent '}'
would cause some special sequence to be emitted, where 'bar' would know
what to emit, but would have to be queried repeatedly until it finished
emitting the special sequence.

At first glance this seems perfectly straightforward: bar is just called
repeatedly until the 'EOF' marker you mention is returned. In this case,
bar() returns the next character in the sequence, rather than some code. If
the "123" in your example was consumed with no output, then bar wouldn't
return until there was something to return, in this case the first character
of whatever replaces "${special}".

But it gets more complicated when you have to consider where this input
string comes from. There might be some global state containing the input
stream, the current position in that stream, and the current position in any
translation string. Or that state might be contained inside bar().

Or perhaps the caller might supply bar() with the next input character (and
bar() needs some look-ahead to operate properly). Or a pointer to a struct
containing the state (so that bar() can be called from different places, all
with their own strings to translate).

So I thought it was a bit of an interesting challenge and I wonder what
strategies folks might've or might come up with, as a general pattern.

It's not clear what the roles of bar() and foo() are. Maybe one of those
returns the next translated character, but some more information is needed.
The whole thing seems a bit like a tokeniser, or at least has some of the
same problems to be solved.

Shao Miller · Jan 11, 2013

This might be a good option, but it depends on how well it fits in the
overall application design.

This would be my first choice, before discriminating between 1 and
multiple values to return.
I would make that discrimination only if actual profiling measurements
have shown that the additional call really is a performance bottleneck.

Yeah the call-back strategy has added complexity, but my natural
inclination would be towards this same strategy as you've stated you
prefer, with the same rationale.

Thanks for the feed-back!

Shao Miller · Jan 11, 2013

That's the key I think (see below).

At first glance this seems perfectly straightforward: bar is just called
repeatedly until the 'EOF' marker you mention is returned. In this case,
bar() returns the next character in the sequence, rather than some code. If
the "123" in your example was consumed with no output, then bar wouldn't
return until there was something to return, in this case the first character
of whatever replaces "${special}".

But it gets more complicated when you have to consider where this input
string comes from. There might be some global state containing the input
stream, the current position in that stream, and the current position in any
translation string. Or that state might be contained inside bar().

Or perhaps the caller might supply bar() with the next input character (and
bar() needs some look-ahead to operate properly). Or a pointer to a struct
containing the state (so that bar() can be called from different places, all
with their own strings to translate).

It's not clear what the roles of bar() and foo() are. Maybe one of those
returns the next translated character, but some more information is
needed. The whole thing seems a bit like a tokeniser, or at least has
some of the same problems to be solved.

Yeah I was trying to abstract away the details and ask about a general
pattern (since that's what I'm interested in), but maybe the code
example was not illustrative enough.

For the subsequent translation scenario I tried to describe, the caller
is responsible for getting a single input character 'ic'. However they
do it should be opaque to the translating function, 'bar'. The caller
calls 'bar' to translate 'ic' to '*oc'.

extern return_type bar(state_type state, char ic, char * oc);

'bar' might not emit any output character '*oc' if it's not yet ready.
In that case, the caller shouldn't work with '*oc'; shouldn't send it
wherever output is going.

'bar' might emit '*oc' and have nothing more to emit until processing
the next 'ic' or beyond.

'bar' might emit '*oc' and inform the caller that there are more output
characters pending, so the caller should call 'bar' again until they've
been consumed. In this case, 'ic' is ignored for subsequent calls to
'bar' as 'bar' already knows what sequence it wishes to emit based on
the 'ic' that triggered it.

'foo' was really just a place-holder for "something the caller does with
the results of 'bar'". This might be to put the output characters in a
buffer or to send them to a stream.

I'm interested in general C patterns for this kind of scenario without
getting too distracted by this particular example of "translating
characters". Having said that, maybe working with this example scenario
is a better idea.
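
One way the three-valued interface described above could be sketched, with all state kept in a caller-owned struct. The names xlat, xlat_state, XLAT_NONE/XLAT_ONE/XLAT_MORE and the fixed replacement text "<sp>" are assumptions for illustration; the thread only gives the general shape of 'bar'. The caller side uses a do/while so that each input character costs exactly one call unless more output is pending.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes: nothing in *oc, one char, one char + more. */
enum xlat_result { XLAT_NONE, XLAT_ONE, XLAT_MORE };

struct xlat_state {
    int in_escape;          /* nonzero while between '$' and '}' */
    const char *pending;    /* rest of a multi-character emission, or NULL */
};

/* Replacement for every "${...}" escape; assumed and must be non-empty. */
static const char xlat_replacement[] = "<sp>";

enum xlat_result xlat(struct xlat_state *st, char ic, char *oc)
{
    assert(st != NULL && oc != NULL);
    if (st->pending != NULL) {          /* mid-sequence: ic is ignored */
        *oc = *st->pending++;
        if (*st->pending != '\0')
            return XLAT_MORE;
        st->pending = NULL;
        return XLAT_ONE;                /* last character of the sequence */
    }
    if (st->in_escape) {
        if (ic != '}')
            return XLAT_NONE;           /* swallow the escape body */
        st->in_escape = 0;
        st->pending = xlat_replacement;
        return xlat(st, ic, oc);        /* emit the first replacement char */
    }
    if (ic == '$') {
        st->in_escape = 1;
        return XLAT_NONE;
    }
    *oc = ic;                           /* plain one-to-one mapping */
    return XLAT_ONE;
}

/* Caller side: dst must be large enough for the translated text. */
size_t xlat_string(const char *src, char *dst)
{
    struct xlat_state st = { 0, NULL };
    size_t n = 0;
    char oc;

    for (; *src != '\0'; ++src) {
        enum xlat_result r;
        do {
            r = xlat(&st, *src, &oc);
            if (r != XLAT_NONE)
                dst[n++] = oc;          /* stands in for the thread's foo() */
        } while (r == XLAT_MORE);
    }
    dst[n] = '\0';
    return n;
}
```

Because the state lives in the struct rather than in static storage, several translations can be in progress at once, each with its own xlat_state.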

Fred K · Jan 11, 2013

Yeah I was trying to abstract away the details and ask about a general
pattern (since that's what I'm interested in), but maybe the code
example was not illustrative enough.

For the subsequent translation scenario I tried to describe, the caller
is responsible for getting a single input character 'ic'. However they
do it should be opaque to the translating function, 'bar'. The caller
calls 'bar' to translate 'ic' to '*oc'.

extern return_type bar(state_type state, char ic, char * oc);

'bar' might not emit any output character '*oc' if it's not yet ready.
In that case, the caller shouldn't work with '*oc'; shouldn't send it
wherever output is going.

'bar' might emit '*oc' and have nothing more to emit until processing
the next 'ic' or beyond.

'bar' might emit '*oc' and inform the caller that there are more output
characters pending, so the caller should call 'bar' again until they've
been consumed. In this case, 'ic' is ignored for subsequent calls to
'bar' as 'bar' already knows what sequence it wishes to emit based on
the 'ic' that triggered it.

Ouch! shades of strtok()
This design is not re-entrant-safe, and definitely not threadsafe.

'foo' was really just a place-holder for "something the caller does with
the results of 'bar'". This might be to put the output characters in a
buffer or to send them to a stream.

I'm interested in general C patterns for this kind of scenario without
getting too distracted by this particular example of "translating
characters". Having said that, maybe working with this example scenario
is a better idea.

I would consider something like this:

bar(state_type state, char ic, char * oc, MyCallback foo);

where bar internally calls foo the requisite number of times.
Perhaps in this scenario the oc parameter is not even needed.
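
A rough sketch of that callback shape, dropping the oc parameter as suggested. The names bar_cb, emit_fn, cb_state, sink and append_char are hypothetical, as is the fixed replacement text "<sp>"; the void *ctx argument is an assumed addition so the callback can reach the caller's data without globals.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: receives one output char plus caller context. */
typedef void (*emit_fn)(char oc, void *ctx);

struct cb_state {
    int in_escape;          /* nonzero while between '$' and '}' */
};

/* Translate one input character, calling emit() zero, one or many times. */
void bar_cb(struct cb_state *st, char ic, emit_fn emit, void *ctx)
{
    assert(st != NULL && emit != NULL);
    if (st->in_escape) {
        if (ic == '}') {
            const char *p;
            st->in_escape = 0;
            for (p = "<sp>"; *p != '\0'; ++p)   /* many outputs */
                emit(*p, ctx);
        }
        return;                                 /* escape body: no output */
    }
    if (ic == '$') {
        st->in_escape = 1;
        return;                                 /* no output */
    }
    emit(ic, ctx);                              /* exactly one output */
}

/* Example callback: append each character to a caller-supplied buffer. */
struct sink {
    char *buf;
    size_t len;
};

void append_char(char oc, void *ctx)
{
    struct sink *s = ctx;
    s->buf[s->len++] = oc;
}
```

The caller then needs no loop beyond the one over its input: it calls bar_cb(&st, ic, append_char, &s) once per input character and never inspects a return code.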

Greg Martin · Jan 11, 2013

Yeah I was trying to abstract away the details and ask about a general
pattern (since that's what I'm interested in), but maybe the code
example was not illustrative enough.

For the subsequent translation scenario I tried to describe, the caller
is responsible for getting a single input character 'ic'. However they
do it should be opaque to the translating function, 'bar'. The caller
calls 'bar' to translate 'ic' to '*oc'.

extern return_type bar(state_type state, char ic, char * oc);

'bar' might not emit any output character '*oc' if it's not yet ready.
In that case, the caller shouldn't work with '*oc'; shouldn't send it
wherever output is going.

'bar' might emit '*oc' and have nothing more to emit until processing
the next 'ic' or beyond.

'bar' might emit '*oc' and inform the caller that there are more output
characters pending, so the caller should call 'bar' again until they've
been consumed. In this case, 'ic' is ignored for subsequent calls to
'bar' as 'bar' already knows what sequence it wishes to emit based on
the 'ic' that triggered it.

'foo' was really just a place-holder for "something the caller does with
the results of 'bar'". This might be to put the output characters in a
buffer or to send them to a stream.

I'm interested in general C patterns for this kind of scenario without
getting too distracted by this particular example of "translating
characters". Having said that, maybe working with this example scenario
is a better idea.

I found it a little difficult to grasp the initial scenario because of
the magical occurrences going on in the functions. I think that to
discuss patterns in any language the example needs to have enough
detail to represent the problem space. When I looked at your first post
I thought you were working on some strange sort of iterator, but now it
looks more like an emitter/observer type problem. The former I'd solve in C
with an object passed as a parameter that maintains state information,
whereas for the latter, registering callbacks for each of the states might
be easier to work with.

It sounds like an asynchronous problem, and callbacks tend to work well
for those. You don't actually inform the caller through state variables
but through calling the functions they've provided.

Shao Miller · Jan 11, 2013

Ouch! shades of strtok()
This design is not re-entrant-safe, and definitely not threadsafe.

I meant for that to be in the 'state' parameter, shown above.

I would consider something like this:

bar(state_type state, char ic, char * oc, MyCallback foo);

where bar internally calls foo the requisite number of times.
Perhaps in this scenario the oc parameter is not even needed.

Right, that was one of the possibilities I'd mentioned upthread. Just
out of curiosity, does this call-back strategy score special points as
opposed to other strategies (such as repeated calls to 'bar' until
observing an "end-of-sequence" value), or is it roughly a matter of
individual preference, for you?

Thanks for the feed-back!

Shao Miller · Jan 11, 2013

I found it a little difficult to grasp the initial scenario because of
the magical occurrences going on in the functions. I think that to
discuss patterns in any language the example needs to have enough
detail to represent the problem space. When I looked at your first post
I thought you were working on some strange sort of iterator, but now it
looks more like an emitter/observer type problem.

Sorry about that.

The former I'd solve in C
with an object passed as a parameter that maintains state information
whereas the latter registering callbacks for each of the states might be
easier to work with.

It sounds like an asynchronous problem, and callbacks tend to work well
for those. You don't actually inform the caller through state variables
but through calling the functions they've provided.

Right, callbacks were one of the possibilities mentioned upthread. It
does seem a bit like an asynchronous problem... Maybe even possible to
think of in terms of co-routines. (But not in C.)

Do call-backs have a major advantage over some other strategy, such as
repeated calls to the "emitter" and having the "observer" continue until
observing an "end-of-sequence" value? Or is it more a matter of
individual preference, would you say?

Thanks for the feed-back!

glen herrmannsfeldt · Jan 11, 2013

Shao Miller said:
Suppose you have a function 'bar' that returns a value that means one of:
- Do 'foo' zero times
- Do 'foo' one time
- Do 'foo' more than one time

What's your favourite way of handling such a value?

It isn't very C-like to do it that way.

Since C doesn't allow multiple entry points to a function
(like Fortran or PL/I ENTRY statement) it is usual to have one function
do both jobs.

Note, for example, the C strtok() function.

With strtok, on the first call you pass the string to tokenize,
on subsequent calls you pass NULL. (It does keep state in static
storage, and so can't be used in reentrant code.)

In the zero case, strtok() returns NULL on the first call,
for the one time case, NULL on the second call, and more than
one, on the N+1th call.

So, why do you special case the 0 and 1 case?

If you really need that, loop through counting the number of
times the function returns non-NULL, then start over with
the appropriate choice.
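
For reference, the usual strtok() loop handles zero, one and many tokens with the same code, and can count them along the way. This is only a sketch; the wrapper name for_each_token and its visit callback are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Call visit() once per token in s and return how many tokens were found.
   Note that strtok() modifies s and keeps hidden static state, so this
   is neither re-entrant nor thread-safe. */
size_t for_each_token(char *s, const char *delims,
                      void (*visit)(const char *tok))
{
    size_t count = 0;
    char *tok;

    assert(s != NULL && delims != NULL);
    /* First call passes the string; later calls pass NULL to continue. */
    for (tok = strtok(s, delims); tok != NULL; tok = strtok(NULL, delims)) {
        if (visit != NULL)
            visit(tok);
        ++count;
    }
    return count;
}
```

A string with no tokens ends the loop on the first call, a single token ends it on the second, and N tokens end it on call N+1, which is the behaviour described above.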

Also, note for example, the Java Iterator interface.

-- glen

Greg Martin · Jan 11, 2013

Right, callbacks were one of the possibilities mentioned upthread. It
does seem a bit like an asynchronous problem... Maybe even possible to
think of in terms of co-routines. (But not in C.)

Do call-backs have a major advantage over some other strategy, such as
repeated calls to the "emitter" and having the "observer" continue until
observing an "end-of-sequence" value? Or is it more a matter of
individual preference, would you say?

It depends, I think. If you have the means to say "wait until something
happens and, when it does, run the code suitable to the event" while
another part of your program does other work, a callback makes sense.
There are so many models for this. It depends on where it works to spend
cycles and on the timing of the data. If there's nothing for your program to
do until foo() returns, then blocking in it is fine. If you want to get
back to foo quickly, then a callback as an argument to a thread might be
the answer.

e.g.

void* poll_thread (void* args) {
    struct Callbacks* cb = (struct Callbacks*) args;
    struct DataAndState* it;

    /* Keep polling until foo() returns a null pointer */
    while ((it = foo()) != NULL) {
        switch (it->type) {
        case this:
            cb->do_this (&it);
            break;
        case that:
            cb->do_that (&it);
            break;
        default:
            break;
        }
    }

    return args;
}

// or

/* args may need to be locked on access */
while (it = foo(&args)) {
    switch (it) {
    case this:
        fire_off_thread (&do_this, &args);
        break;
    case that:
        fire_off_thread (&do_that, &args);
        break;
    default:
        break;
    }
}

// or
....

BartC · Jan 11, 2013

Shao Miller said:
On 1/11/2013 07:15, BartC wrote:

For the subsequent translation scenario I tried to describe, the caller is
responsible for getting a single input character 'ic'. However they do it
should be opaque to the translating function, 'bar'. The caller calls
'bar' to translate 'ic' to '*oc'.

extern return_type bar(state_type state, char ic, char * oc);

'bar' might not emit any output character '*oc' if it's not yet ready. In
that case, the caller shouldn't work with '*oc'; shouldn't send it
wherever output is going.

Your function seems to be deceptively simple, but it's not! It could be
used, for example, to translate one language to another.

But suppose it is kept simple, so that all it does is translate N characters
of input (N>=1) to M characters of output (M>=0).

It might not know the value of N either, until it's read the N'th (or even
subsequent) character (for example, to map repeated characters to just one
or zero occurrences).

One problem I can see is that both N and M could be very large, for various
reasons, but let's say the state variables can cope with buffering unlimited
numbers of input and/or output characters.

Then we're pretty much back to the dialog in your original post. I don't
know if there's any standard way of doing these things.

I can tell you that I probably wouldn't stick with the character-at-a-time
model, but allow a string to be returned.

I think (not having tried) the states can be reduced to just two: (1) bar is
waiting for the character to start the next block of N; (2) it is waiting
for the character to delimit this block. With (2), bar() *has* to be called
again until this pattern is complete, and it reverts to state (1). (But it's
also possible some patterns will never complete, and the caller needs a way
of terminating.)

After each call there will also be some output in the form of a string
(which can also be done by a callback on each character), of any length
including zero.
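
A sketch of that string-returning variant, under assumed names (bar_str, str_state, translate_str) and with an assumed fixed replacement "<sp>" for every "${...}" escape. Each call consumes one input character and returns a possibly empty string, so the caller never has to call again for the same input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct str_state {
    int in_escape;          /* nonzero while between '$' and '}' */
    char one[2];            /* storage for a single-character result */
};

/* Returns the output for this input character; "" means no output.
   The result is only valid until the next call with the same state. */
const char *bar_str(struct str_state *st, char ic)
{
    assert(st != NULL);
    if (st->in_escape) {
        if (ic != '}')
            return "";              /* still inside the escape */
        st->in_escape = 0;
        return "<sp>";              /* whole replacement in one go */
    }
    if (ic == '$') {
        st->in_escape = 1;
        return "";
    }
    st->one[0] = ic;
    st->one[1] = '\0';
    return st->one;
}

/* Caller side: dst must be large enough for the translated text. */
size_t translate_str(const char *src, char *dst)
{
    struct str_state st = { 0, { 0, 0 } };
    size_t n = 0;

    for (; *src != '\0'; ++src) {
        const char *out = bar_str(&st, *src);
        size_t len = strlen(out);   /* zero, one or many characters */
        memcpy(dst + n, out, len);
        n += len;
    }
    dst[n] = '\0';
    return n;
}
```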

Paul N · Jan 12, 2013

Yeah I was trying to abstract away the details and ask about a general
pattern (since that's what I'm interested in), but maybe the code
example was not illustrative enough.

For the subsequent translation scenario I tried to describe, the caller
is responsible for getting a single input character 'ic'. However they
do it should be opaque to the translating function, 'bar'. The caller
calls 'bar' to translate 'ic' to '*oc'.

extern return_type bar(state_type state, char ic, char * oc);

'bar' might not emit any output character '*oc' if it's not yet ready.
In that case, the caller shouldn't work with '*oc'; shouldn't send it
wherever output is going.

'bar' might emit '*oc' and have nothing more to emit until processing
the next 'ic' or beyond.

'bar' might emit '*oc' and inform the caller that there are more output
characters pending, so the caller should call 'bar' again until they've
been consumed. In this case, 'ic' is ignored for subsequent calls to
'bar' as 'bar' already knows what sequence it wishes to emit based on
the 'ic' that triggered it.

'foo' was really just a place-holder for "something the caller does with
the results of 'bar'". This might be to put the output characters in a
buffer or to send them to a stream.

I'm interested in general C patterns for this kind of scenario without
getting too distracted by this particular example of "translating
characters". Having said that, maybe working with this example scenario
is a better idea.

What you're doing seems much the same as what a compiler does, so you
could read up on how they do it. I think this stage is called lexical
analysis.

I'd be inclined to drop the whole idea of sending characters to your
function. Make the function return the next output character, or EOF
if it is actually the end of the input file, and leave the function to
read in as many characters as it needs to get an output character.
(This may be what you are getting at by talking about callbacks, but
there's no need to use a callback as such if there is only one other
function producing the input characters.) In short, write a function
that you can pull characters from, and which will in turn pull in the
characters it needs, instead of trying to push characters into the
function.
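
The pull model could look something like the sketch below. To keep it self-contained the input is an in-memory string rather than a file, and the names puller and pull_char, along with the fixed replacement "<sp>", are assumptions for illustration only.

```c
#include <assert.h>
#include <stdio.h>   /* for EOF */

struct puller {
    const char *in;         /* remaining input; stands in for a file */
    const char *pending;    /* remaining replacement text, "" if none */
};

/* Return the next output character, or EOF at the end of the input.
   Reads as many input characters as it needs to produce one output. */
int pull_char(struct puller *p)
{
    assert(p != NULL);
    for (;;) {
        if (*p->pending != '\0')            /* drain queued output first */
            return (unsigned char)*p->pending++;
        if (*p->in == '\0')
            return EOF;
        if (*p->in != '$')                  /* ordinary character */
            return (unsigned char)*p->in++;
        /* Skip the "${...}" escape; queue its replacement if terminated. */
        while (*p->in != '\0' && *p->in != '}')
            ++p->in;
        if (*p->in == '}') {
            ++p->in;
            p->pending = "<sp>";
        }
    }
}
```

The caller's loop is then the familiar getc() idiom: while ((c = pull_char(&p)) != EOF), do something with c.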

Phil Carmody · Jan 13, 2013

Shao Miller said:
Suppose you have a function 'bar' that returns a value that means one of:
- Do 'foo' zero times
- Do 'foo' one time
- Do 'foo' more than one time

What's your favourite way of handling such a value? For example:

while ((more = bar()) != cv_zero_times) {
/* Do it at least once */
foo();

/* Do it more than once? */
if (more != cv_more_times)
break;
}

Vomit. Doesn't even satisfy your initial description either (which
makes no mention of calling 'bar' multiple times).

From what you've actually described, I'd do:

int count=bar(); /* 0, 1, or more */
while(count) {
    foo();
    count -= (count&1); /* never drops below 2 */
}

Phil

Shao Miller · Jan 13, 2013

Vomit. Doesn't even satisfy your initial description either (which
makes no mention of calling 'bar' multiple times).

From what you've actually described, I'd do:

int count=bar(); /* 0, 1, or more */
while(count) {
    foo();
    count -= (count&1); /* never drops below 2 */
}

Yes, yet another instance of a sacrifice for brevity's sake backfiring
and sabotaging the intended meaning. 'foo()' was intended to be a
place-holder for "the caller does something with the result of 'bar'".
I did indeed fail to specify much about 'bar'. Sorry about that.

Please consider:

int bar(state_t state, char ic, char * oc);

And that if 'bar' returns 0, the caller doesn't work with '*oc' and
simply moves on to a new 'ic'.

If 'bar' returns 1, the caller works with '*oc' and moves on to a new 'ic'.

If 'bar' returns something else, the caller works with '*oc' and knows
to call 'bar' repeatedly for more '*oc' (in which case 'bar' ignores
'ic') until some later time when it'll finally move on to a new 'ic'.
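
One possible driver for that contract is a single flat loop that fetches a new 'ic' only when the previous call reported nothing pending. This is a sketch under assumptions: the toy 'bar' below (drops spaces, expands '&' to "and", copies everything else), the bar_state struct standing in for state_t, and the helper name drive are all invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct bar_state {
    const char *pending;    /* rest of a multi-char emission, or NULL */
};

/* Toy bar(): returns 0 (nothing in *oc), 1 (*oc valid, done) or
   2 (*oc valid, call again; ic is ignored on those calls). */
int bar(struct bar_state *st, char ic, char *oc)
{
    assert(st != NULL && oc != NULL);
    if (st->pending == NULL) {
        if (ic == ' ')
            return 0;               /* zero outputs */
        if (ic != '&') {
            *oc = ic;
            return 1;               /* one output */
        }
        st->pending = "and";        /* several outputs */
    }
    *oc = *st->pending++;
    if (*st->pending != '\0')
        return 2;
    st->pending = NULL;
    return 1;
}

/* Flat driver: the short-circuit || skips the fetch while r > 1.
   dst must be large enough for the translated text. */
size_t drive(const char *src, char *dst)
{
    struct bar_state st = { NULL };
    size_t n = 0;
    int r = 0;
    char ic = '\0', oc;

    while (r > 1 || (ic = *src++) != '\0') {
        r = bar(&st, ic, &oc);
        if (r != 0)
            dst[n++] = oc;          /* "the caller works with *oc" */
    }
    dst[n] = '\0';
    return n;
}
```

Compared with a nested do/while, this keeps one call site for 'bar' and one place where the output is handled.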

Have you come across this type of thing, before? Thanks for the feed-back.
