Thad Smith wrote about the following problem [I added more notes, since
the problem has been partially misunderstood by some]
Disclaimer: This is a hard problem, only minimally disguised into
something on-topic for comp.lang.c
You are programming a C99 conforming freestanding implementation (e.g. a
microprocessor device programmed in C, but we can't use <stdio.h> or
otherwise store data on disk). Disruption (reset or power off/on cycle)
can occur at *ANY* time without advance warning, and stops whatever
operation is underway. The program is re-executed from main() after a
disruption.
The objective is to implement a single bit resistant to disruption,
accessed using two procedures to be written:
int bRead(void); // Return 0 or 1 according to bit state
void bToggle(void); // Change the state of the bit
and such that these two properties are met:
a) The bit is stable in the absence of Toggle. That is, the result given
by any two undisrupted bRead() is the same unless bToggle() was invoked
in-between.
b) The bit is changed by Toggle. That is, the result given by any two
undisrupted bRead() is different if bToggle() was invoked exactly once
in-between and no disruption occurred during that execution of bToggle().
Note: The bit is therefore unspecified from the invocation of bToggle()
until the end of this bToggle() if it is undisrupted, or otherwise until
the end of the next undisrupted bRead(). This is the only definition of
"bit resistant to disruption" worth of interest.
The only way to store information across disruptions is using a number
of independent physical EEPROM cells, designated by index j. Access to
the EEPROM cells is by the following three functions. The properties
stated for theses functions apply for any fixed non-negative integer j
less than the number of cells, and "for that cell" is implied everywhere.
// Read a designated EEPROM cell; always returns 0 or 1.
extern int eRead(int j);
// Erase a designated EEPROM cell.
// After undisrupted eErase, the cell is "erased" and eRead returns
// 0 until eProgram is invoked (regardless of invocation of eErase
// and disruptions)
extern void eErase(int j);
// Program a designated *ERASED* EEPROM cell.
// After undisrupted eProgram, eRead returns 1 until eErase is invoked
// (regardless of disruptions)
// Programming a cell that is not "erased" cause undefined behavior.
extern void eProgram(int j);
An EEPROM cell can be left neither erased nor programmed by a disrupted
eErase unless that cell was already erased, or by a disrupted eProgram.
For a cell in such halfway state, eRead returns a value specified only
to be 0 or 1. That otherwise unspecified value may in particular change
after disruption (due e.g. to uncontrollable random noise and/or
temperature and/or voltage drift in the hardware used by eRead), even
though the sate of cells is unchanged by disruption or/and eRead.
Note: if a cell is not erased (left in a halfway state or programmed),
an eProgram of that cell is prohibited. A workaround is to use the
following rather than eProgram():
void mySafeProgram(int j)
{
eErase(j);
eProgram(j);
}
Before the first run, all EEPROM cells have been erased. eRead, eErase,
and eProgram terminate unless a disruption occurs, assuming constraints
on j and eProgram are met. bRead and bToggle must similarly terminate
unless a disruption occurs.
Can it be done? If yes, what is the minimum number of cells needed? Give
a proof.
Depending on the answer, try to minimize the expected failure rate or
maximize the expected LED flash rate in the following use case:
*/
#define NCELLS 10 // number of cells
extern void sLed(int x); // 0: LED on green then off
// 1: LED on red then off.
extern int eRead(int j); // Read a cell.
extern void eErase(int j); // Erase a cell to 0.
extern void eProgram(int j); // Program an erased cell to 1.
int bRead(void); // Return 0 or 1 according to bit state.
void bToggle(void); // Change the state of the bit.
// Declarations, and the code for bRead/bToggle, go here
int main(void)
{
int vState;
vState = bRead();
while(1)
{
sLed(vState);
bToggle();
vState = !vState;
}
return 0;
}
/*
Assume eProgram, eErase, sLed each last 1 second; everything else is
comparatively instant; a disruption occurs with odds p each second;
unspecified eRead values are assigned at random with mean 0.5; the LED
is off following disruption; a failure is a violation of the
requirement: if a disruption occurs with the LED on in some color, then
the first time the program lights a LED it must be with that color.
Note: any bRead/bToggle conforming to a) b) works in this test program,
and any bRead/bToggle making this test program working can be changed
into bRead/bToggle2 conforming to a) b), by defining bToggle2 as:
void bToggle2(void)
{
(void)bRead();
bToggle();
}
Notice that disruption without warning is the standard under which most
real-life hardware actually operates (with the exception of systems with
physical protection against adversarial reset and a power reserve with
means to sense imminent power loss), even though this is often left out
of the specification. And that real disruption-surviving media (EEPROM,
Flash, CDRW, though not battery-backed SRAM) do have the limitation that
read after interrupted change of state sometime gives an unstable
result. While "programming a cell that is not erased cause undefined
behavior" is not the rule, it can exist due to a poorly specified API,
an overcautious hardware specification, or in some cases due to tue
hardware limitations.
The problem is derived from actual requirements in a Smart Card. This is
neither homework, nor an attempt to gather brainpower for free: I'm the
author of the problem and have conclusively determined the feasibility,
with a concise proof. A similar problem "A bit in bottles" is in
rec.puzzles, some of the replies contain answers, and I committed mine.
http://groups.google.com/group/rec.puzzles/browse_thread/thread/ab892b3cf88af0de
Francois Grieu
> First, this is an uninteresting problem because if a power failure
> occurs during a toggle attempt, the restored state gives no
> information at all.
Is the prefix "UN" intended? Thad Smith shows continued INterest, and
I'm grateful to him for that.
> I'm taking another stab at the problem, based on more analysis and
> an understanding of possible use. Thanks to Daniel Giaimo for
> finding a problem in the earlier proposal.
>
> Let's assume the variable has three state categories: s1, s2, and t.
> s1 and s2 are stable states (will not change spontaneously).
> Other values are transition values. If the variable is read in a
> state t, the variable is then changed to s1 or s2.
>
> This can be used to perform a consistent database state update:
> Assume initially database copy A is consistent and dbstate=s1
> (database A is valid). To update, make an updated copy B. At
> this point both A and B are consistent. Set dbstate=s2 (database B
> is valid). Move B to A. Set dbstate=s1.
>
> If power is lost while dbstate is being changed, both database
> copies are valid, so restoring the state to either stable state
> results in a consistent database.
>
> We just need to make sure that s1 and s2 are always stable when
> an modification is in progress on copy A or B.
>
> In the following list of sequential states, each of the four bits
> are independently erasable and writable (normally implemented in
> separate bytes).
>
> Bit
> dbstate dcba Stable bits
> s1 0001 ca
> t1a 0011 da
> t1b 0111 db
> t1c 0110 db
> s2 0010 ba
> t2a 1010 cb
> t2b 1011 dc
> t2c 1001 ca
> (s1) 0001 ca
>
> To go from state s1 to s2 or s2 to s1, we first re-erase the last
> bit erased, then sequentially write the variable to the following
> transition and stable states, changing a single bit at a time.
>
> When reading dbstate, if value is s1 or s2, return the value.
> If value is txa or txb, erase and rewrite the last bit set,
> pr*o*cede to the next stable state. If value is txc, re-erase the
> last bit erased, then procede to the next stable state.
Going from t2b to t2c, disruption occurs while we erase "b".
On restart, state is "10?1". We read it as "1011" that is t2b.
While we "erase the last bit set", that is "a", disruption occurs.
On restart, state is "10??". We read it as "1010" that is t2a.
[notice the "Stable bits: cb" condition is not met]
While we "erase the last bit set", that is "d", disruption occurs.
On restart, state is "?0??" [or maybe "?0?0" is we have been prudent
and erased "a" right after seeing it as 0].
We read that as "0010" and assume state s2.
Before we make any change/call to bToggle(), disruption occurs
and we read the same state as "0001" and assume state s1, this is
a violation of requirement a.
[is we have erased "a" we will read "0000" and I guess we
will next move to s1 and be equally busted]
> If power fails when in a transition state, a reading of the state
> with the above rules will restore dbstate to a stable value.
Not so, see above.
> Multiple interrupts during a restoration can cause the state values
> to go from t1b to t1a to s1, but it doesn't revert any further
Right, unless I err.
> (similar for t2b to t2a to s2).
I don't think so..
[snip]
I plan to post my answer by the end of the week.
Francois Grieu