A bit resistant to disruption

Francois Grieu · Feb 9, 2010

Seebs said:
void
bToggle(void) {
int t;
if (eRead(2)) {
t = eRead(1);
eErase(0);
if (t)
eProgram(0); //
eErase(2); // [A]
}
t = eRead(0);
eErase(1);
if (t)
eProgram(1);
eProgram(2);
eErase(0);
if (!t)
eProgram(0);
eErase(2);
}

Click to expand...

Click to expand...

This one is incorrect. Assume a disruption where I inserted [A]; now
cell 2 is uncertain. Assume cell 2 is read as 1 during the next bToggle,
and a disruption occurs at ; now cells 0 and 2 are uncertain, and
bRead can return 0 (if cells 2 and 0 are read as 0) or 1 (if cell 2 is
read as 0 and cell 0 is read as 1).

Click to expand...

Okay, I misunderstood the spec. I thought that an erase necessarily
either worked or failed to work, and couldn't leave a cell indeterminate.
Re-reading the spec, I see that isn't correct. That does make this a
lot trickier.

Hint: there's nothing to prevent bRead from invoking eErase or/and eProgram.

Click to expand...

True. I'm not sure how it matters, though; a hypothetical user can just run
bToggle() over and over without ever hitting bRead(), so I can't usefully
rely on anything it does, I don't think.

Notice that after one (or several) disrupted bToggle(), the next
undisrupted bRead() can return anything. One attractive option is thus
that this undisrupted bRead() does cleanup so that the next undisrupted
bRead() will return the right value.

I'm assuming this has to work
in the general case, not just in the specific example case given.

Click to expand...

Yes. But if you can make some bRead()/bToggle() that work in my use
case, it is easy to exhibit a fully correct bRead()/bToggle2(): just define

void bToggle2(void )
{
(void) bRead();
bToggle();
}

The proof is left as a relatively easy exercise to the reader.

I suspect this could be solved with another bit, but I'm too sleepy to figure
out how right now.

Click to expand...

Good night..

Francois Grieu

Seebs · Feb 9, 2010

[re: my first attempt]

This one is incorrect. Assume a disruption where I inserted [A]; now
cell 2 is uncertain. Assume cell 2 is read as 1 during the next bToggle,
and a disruption occurs at ; now cells 0 and 2 are uncertain, and
bRead can return 0 (if cells 2 and 0 are read as 0) or 1 (if cell 2 is
read as 0 and cell 0 is read as 1).

Hint: there's nothing to prevent bRead from invoking eErase or/and eProgram.

I get it! The risk here is that if eRead() returns unreliable results,
future queries of the same bit may not yield the same results, resulting
in inconsistent flow through the program logic; in particular, attempts
might be made to modify things based on spurious data.

So.

int
bSureOf(int j) {
if (eRead(j)) {
eErase(j);
eProgram(j);
return 1;
} else {
eErase(j);
return 0;
}
}

int
bRead(void) {
if (bSureOf(2)) {
return eRead(1);
} else {
return eRead(0);
}
}

void
bToggle(void) {
int t;
if (bSureOf(2)) {
t = eRead(1);
eErase(0);
if (t)
eProgram(0);
eErase(2);
}
t = eRead(0);
eErase(1);
if (t)
eProgram(1);
eProgram(2);
eErase(0);
if (!t)
eProgram(0);
eErase(2);
}

We start with all bits clear. bSureOf(2) erases 2, just to be cautious,
but returns 0. We read a bit. We erase bit 1 (also clear at the moment,
but who cares), and don't program it. So far, nothing has been changed.
At this point, we start trying to write 2. If this is interrupted, two
possibilities exist:

1. The attempted read in the next hit of bSureOf() comes up 0; we
forcibly erase bit 2, and trust bit 0, which was correct.
2. The attempted read in the next hit of bSureOf() comes up 1; we
forcibly erase-and-write bit 2, and trust bit 1, which was correct.

Either way, if we get past any check on bit 2, we have a
correct and consistent state. Thus, eventually we come to the eErase(0),
which occurs only when bit 2 is definitely known to be set to 1. We now
try to set bit 0 to its new value. Once that happens, we try to clear
bit 2. If we succeed, all is well -- bit 2 is clear, bit 0 is the toggled
state. If we are interrupted, one of two things can happen on the next
read:

1. The attempted read in bSureOf(2) comes up 0; we forcibly erase
bit 2, and we continue to use the new, toggled, state of bit 0. Our
behavior is consistent.

2. The attempted read in bSureOf(2) comes up 1; we try to write bit
2, and if we succeed, we fall back on the not-yet-toggled state of bit
1. Since the toggle was interrupted, this behavior is acceptable.

3. The attempted read in bSureOf(2) comes up 1; we try to write bit
2, but are interrupted somewhere during it. It is still ambiguous whether
the toggle has succeeded, but no matter what happens, if bRead() returns,
we will have either toggled or failed to toggle the bit.

If bit 2 is set, then bit 1 has definitely been set to the most recently
returned value. (It is possible that we've since tried to store a toggled
value in bit 0, but if we were interrupted, we haven't returned it so we
don't need to show that new value yet.) If bit 2 is not set, then bit 0
definitely holds either the last returned value, or the most recently toggled
value. Either way, before we return, we commit to bit 2's perceived state.
If that is interrupted, that's okay -- eRead() doesn't return a value, because
it was disrupted, so we haven't returned a wrong value.

.... I think.

-s

Francois Grieu · Feb 9, 2010

Seebs said:
[re: my first attempt]

This one is incorrect. Assume a disruption where I inserted [A]; now
cell 2 is uncertain. Assume cell 2 is read as 1 during the next bToggle,
and a disruption occurs at ; now cells 0 and 2 are uncertain, and
bRead can return 0 (if cells 2 and 0 are read as 0) or 1 (if cell 2 is
read as 0 and cell 0 is read as 1).

Hint: there's nothing to prevent bRead from invoking eErase or/and eProgram.

Click to expand...

I get it! The risk here is that if eRead() returns unreliable results,
future queries of the same bit may not yield the same results, resulting
in inconsistent flow through the program logic; in particular, attempts
might be made to modify things based on spurious data.

Yes. See below ;-)

So.

int
bSureOf(int j) {
if (eRead(j)) {
eErase(j);
eProgram(j);
return 1;
} else {
eErase(j);
return 0;
}
}

int
bRead(void) {
if (bSureOf(2)) {
return eRead(1);
} else {
return eRead(0);
}
}

void
bToggle(void) {
int t;
if (bSureOf(2)) {
t = eRead(1);
eErase(0);
if (t)
eProgram(0); //
eErase(2);
}
t = eRead(0);
eErase(1);
if (t)
eProgram(1);
eProgram(2);
eErase(0); // [A]
if (!t)
eProgram(0);
eErase(2);
}

Click to expand...

Assume that on a bToggle(), a disruption occurs where I added [A], tus
with cell 2 fully programmed. Assume the next disruption occurs in the
next bToggle(), which will thus take the branch where I added .
Assume this disruption is during the eProgram(0) on the left of ,
leaving cell 0 undefined, and cell 2 fully programmed.
An undisrupted bRead now returns eRead(0), which is unspecified, and
leaves cell 2 fully programmed. Thus two undisrupted bRead() without a
bToggle() in between can return different results. Bust.

[snip]
Francois Grieu

Francois Grieu · Feb 9, 2010

[I messed up my earlier reply, sorry; let me try again]

Seebs wrote :

[re: my first attempt]

This one is incorrect. Assume a disruption where I inserted [A]; now
cell 2 is uncertain. Assume cell 2 is read as 1 during the next bToggle,
and a disruption occurs at ; now cells 0 and 2 are uncertain, and
bRead can return 0 (if cells 2 and 0 are read as 0) or 1 (if cell 2 is
read as 0 and cell 0 is read as 1).

Hint: there's nothing to prevent bRead from invoking eErase or/and eProgram.

Click to expand...

I get it! The risk here is that if eRead() returns unreliable results,
future queries of the same bit may not yield the same results, resulting
in inconsistent flow through the program logic; in particular, attempts
might be made to modify things based on spurious data.

Yes. See below ;-)

So.

int
bSureOf(int j) {
if (eRead(j)) {
eErase(j); // [C]
eProgram(j);
return 1;
} else {
eErase(j);
return 0;
}
}

int
bRead(void) {
if (bSureOf(2)) {
return eRead(1);
} else {
return eRead(0);
}
}

void
bToggle(void) {
int t;
if (bSureOf(2)) {
t = eRead(1);
eErase(0);
if (t)
eProgram(0); //
eErase(2);
}
t = eRead(0);
eErase(1);
if (t)
eProgram(1);
eProgram(2);
eErase(0); // [A]
if (!t)
eProgram(0);
eErase(2);
}

Click to expand...

We get a fist disruption at [A]. Cell 2 is fully programmed. Next
disruption is when bToggle() is doing eProgram(0) on the left of .
Cell 0 is undefined. We then do bRead() which is disrupted during
bSureOf(2), after eErase(2) at [C]. Now cell 2 is erased, and an
undisrupted bRead() returns eRead(0), which is unspecified. Two
undisrupted bRead() without a bToggle() in betwwen can thus return
different values. Bust.

[snip]
Francois Grieu

Seebs · Feb 9, 2010

[I messed up my earlier reply, sorry; let me try again]

No worries. I already found a bug in this one, which is that if we're
currently pointing at cell 1, and cell 1 and cell 0 are different, multiple
successive bRead() calls could inadvertantly flip back to cell 0 without
a toggle.

Hmm.

Does an eErase() of a cell which is already zero risk creating indeterminacy,
or does it stay zero regardless of disruptions?

-s

Francois Grieu · Feb 9, 2010

Seebs said:
[I messed up my earlier reply, sorry; let me try again]

Click to expand...

No worries. I already found a bug in this one, which is that if we're
currently pointing at cell 1, and cell 1 and cell 0 are different, multiple
successive bRead() calls could inadvertantly flip back to cell 0 without
a toggle.

Hmm.

Does an eErase() of a cell which is already zero risk creating indeterminacy,

No. I wanted that to be implied by "After undisrupted eErase, eRead
returns 0 until eProgram is invoked." and confirmed by "An EEPROM cell
can be left neither erased nor programmed by a disrupted eErase unless
that cell was already erased".

or does it stay zero regardless of disruptions?

I stays zero.

It took some effort to have the maker of the EEPROM assert that. I have
strong arguments towards the conjecture that the problem can't be solved
if this insurance is removed.

Francois Grieu

Seebs · Feb 9, 2010

No. I wanted that to be implied by "After undisrupted eErase, eRead
returns 0 until eProgram is invoked." and confirmed by "An EEPROM cell
can be left neither erased nor programmed by a disrupted eErase unless
that cell was already erased".

I stays zero.
Okay.

It took some effort to have the maker of the EEPROM assert that. I have
strong arguments towards the conjecture that the problem can't be solved
if this insurance is removed.

I believe you are correct.

So, it seems like the key gimmick would be to have values where "0" is a
safe state, so if they look to be 0, you can set them to 0 unconditionally.

Hmm. This is pretty tricky. I am pretty sure it's doable, but everything
I've come up with so far ends up in a case where you can't tell which bit
is indeterminate, and trying to stabilize them risks leaving two bits
indeterminate.

-s

Moi · Feb 10, 2010

Moi wrote :

Yes I see your point. One redeeming difference is that things are
strictly sequential / singe-process in my case.

Yes, but in fact, messages are the building blocks of any kind
of atomicity / serialisation. A message can be recieved (and processed)
or not.

IMHO your eeprom-casus is in essence a sychronisation problem; there
may be a difference (in state) between the eeprom and what the program
knows. Sending (lossy) messages to and from the eeprom helps the program to
adapt its views. (or impose its will on the poor eeprom ;-)
The fact that there is only one "writer" makes it simpler than concurrency
problems, but not basically different.

BTW is there an elegant way to emulate a faulty eeprom by e.g. a diskfile ?
something like:
{ write random value to cell; sync;
wait a second;
write good value to cell; sync;
}??

(with the exception of erasing an empty cell, which should always be a no-op)

BTW, I think I need 3 bits to store one non-lossy bit.
AvK

Francois Grieu · Feb 10, 2010

Moi said:
Yes, but in fact, messages are the building blocks of any kind
of atomicity / serialisation. A message can be recieved (and processed)
or not.

IMHO your eeprom-casus is in essence a sychronisation problem; there
may be a difference (in state) between the eeprom and what the program
knows. Sending (lossy) messages to and from the eeprom helps the program to
adapt its views. (or impose its will on the poor eeprom ;-)
The fact that there is only one "writer" makes it simpler than concurrency
problems, but not basically different.
Agreed.

BTW is there an elegant way to emulate a faulty eeprom by e.g. a diskfile ?
something like:
{ write random value to cell; sync;
wait a second;
write good value to cell; sync;
}??

[OT] behavior of diskfiles under power disruption is notoriously
erratic; many hard disks have a write cache and some reorder writes..[/OT].

I'm thinking of a testbed program that does a randomized simulation,
using setjmp/longjmp to gracefully exit eProgram/eErase.

One issue is to zero the global or/and static variables on restart; for
now I only see these solutions:
- impose a special convention to the code under test;
- implement a C interpreter;
- assume things on the addresses of global or/and static variable;
- restart the test program using a script;
- restart the test program using system() from a supervising C program.

(with the exception of erasing an empty cell, which should always be a no-op)

BTW, I think I need 3 bits to store one non-lossy bit.

I come to the same conclusion. My best (hopefully) correct solution uses
at most 5 eErase() or eProgram() for the first bRead(), 2 eErase() or
eProgram() for the first subsequent bToggle(), 1 eErase or eProgram for
another subsequent bToggle().

Francois Grieu

Moi · Feb 10, 2010

Moi wrote:

BTW is there an elegant way to emulate a faulty eeprom by e.g. a
diskfile ? something like:
{ write random value to cell; sync;
wait a second;
write good value to cell; sync;
}??

Click to expand...

[OT] behavior of diskfiles under power disruption is notoriously
erratic; many hard disks have a write cache and some reorder
writes..[/OT].

But, in this case it is just what we want!

I come to the same conclusion. My best (hopefully) correct solution uses
at most 5 eErase() or eProgram() for the first bRead(), 2 eErase() or
eProgram() for the first subsequent bToggle(), 1 eErase or eProgram for
another subsequent bToggle().

Mine uses one bit_set() or bit_erase() per toggle.
initialisation (after possible crash) is a bit more expensive;
2 or 3 bitops.

BTW: can I rely on the program to crash (and be restarted) on a glitch,
or should the program test itself whether an operation succeeded ?

AvK

Francois Grieu · Feb 10, 2010

Moi said:
Moi wrote:

BTW is there an elegant way to emulate a faulty eeprom by e.g. a
diskfile ? something like:
{ write random value to cell; sync;
wait a second;
write good value to cell; sync;
}??

Click to expand...

[OT] behavior of diskfiles under power disruption is notoriously
erratic; many hard disks have a write cache and some reorder
writes..[/OT].

Click to expand...

But, in this case it is just what we want!

I come to the same conclusion. My best (hopefully) correct solution uses
at most 5 eErase() or eProgram() for the first bRead(), 2 eErase() or
eProgram() for the first subsequent bToggle(), 1 eErase or eProgram for
another subsequent bToggle().

Click to expand...

Mine uses one bit_set() or bit_erase() per toggle.
initialisation (after possible crash) is a bit more expensive;
2 or 3 bitops.

I'll post mine soon.. after rechecking it!

BTW: can I rely on the program to crash (and be restarted) on a glitch,
Yes.

or should the program test itself whether an operation succeeded ?

Assume an error-check by eRead is built into eProgram and eErase.

Francois Grieu

Noob · Feb 10, 2010

Francois said:
With actual physical EEPROM or Flash, after a change of a cell has been
disrupted, two readings without a change are not guaranteed to return
the same value. This is because a cell is read by comparing the charge
in the cell to a reference value, with some uncertainty, and that charge
only changes slowly (in the order of a millisecond, and a power loss, at
least a deliberate one as in short circuit, occurs faster than that).
Note this is one area where SRAM + battery backup is better (positive
feedback between the two sides of a SRAM cell makes it inherently
stable, or at least makes metastability plain unobservable). All this
stuff is very often overlooked, and gives rise to all kind of subtle and
hard to track issues, especially when temperature changes and when
disruption is frequent or/and adversarial (which must be assumed in
Smart Cards, especially when remotely powered).

This reminds me of the way FPGABoy dumped the GBC Boot ROM

http://www.fpgb.org/?page_id=17

Thad Smith · Feb 13, 2010

Francois said:
/*
Disclaimer: This is a hard problem, only minimally disguised into
something on-topic for comp.lang.c

comp.programming and comp.arch.embedded added to the list.

You are programming a C99 conforming freestanding implementation.
Disruption (reset or power off/on cycle) can occur at *ANY* time without
advance warning, and stops whatever operation is underway. The program
is re-executed from main() after a disruption.

The objective is to implement a single bit resistant to disruption,
accessed using two procedures to be written:
int bRead(void); // Return 0 or 1 according to bit state.
void bToggle(void); // Change the state of the bit.

These two properties shall be met:
a) The bit is stable in the absence of Toggle. That is, the result given
by any two undisrupted bRead() is the same unless bToggle() was invoked
in-between.
b) The bit is changed by Toggle. That is, the result given by any two
undisrupted bRead() is different if bToggle() was invoked exactly once
in-between and no disruption occurred during that execution of bToggle(). ....
> An EEPROM cell can be left neither erased nor programmed by a disrupted
> eErase unless that cell was already erased, or by a disrupted eProgram.
> For a cell in such halfway state, eRead returns a value specified only
> to be 0 or 1. That otherwise unspecified value may in particular change
> after disruption (due e.g. to uncontrollable random noise and/or
> temperature and/or voltage drift in the hardware used by eRead), even
> though the sate of cells is unchanged by disruption or/and eRead.
>
> Note: if a cell is not erased (left in a halfway state or programmed),
> an eProgram of that cell is prohibited. A workaround is to use the

First, this is an uninteresting problem because if a power failure occurs during
a toggle attempt, the restored state gives no information at all.

Consider the following alternate problem that guarantees to restore some
information.

Let a variable have 3 stable states s1, s2, and s3. A stable state is preserved
across power losses. Assume that the state always changes s1 to s2, s2 to s3,
and s3 to s1 (mod 3 counter). Since transitions are not instantaneous, there
are 3 transient states: t12, t23, and t31. How many bits, using your rules, are
required to, on power restoration, recover either the preceding stable state, or
if failure during transition, eith the preceding of succeeding stable state?
This explicitly prohibits restoring the third state.

This can be done with 3 independent bits:
B3B2B1
s1 0 0 1
t12 0 1 1
s2 0 1 0
t23 1 1 0
s3 1 0 0
t31 1 0 1

Transition from s1 to s2: write B2, then erase B1, etc.

Recovery from reset:
If in a stable state, erase all zero bits. This makes the stable state robust.
If in a transient state, erase the bit to revert to the previous stable state.

The only remaining problem is to getting a robust first state, since if the
power is lost when first attempting to write state s1 and the B1 is a weak 1, it
will not be restored. You thus need a robust initialization to the first state.

Proof: There is either one or two one bits set, since once two bits are set an
erasure precedes writing the other bit. If failure during a transition before
the written bit becomes readable, it is erased on recovery, making it stable.
If the transition state is read on restart, the new bit may be weak. It is
erased when the previous state is restored. If a failure occurs during a the
second step of the transition when the erasure was not complete, but intially
read as zero, the rule for zero erasure makes the new state robust.

This can be extended, of course, to a variable with additional states. All real
EEPROM that I am familiar with erases and writes more than one bit at a time, so
that would affect the design of an optimal algorithm for storing more information.

Daniel Giaimo · Feb 13, 2010

comp.programming and comp.arch.embedded added to the list.

First, this is an uninteresting problem because if a power failure
occurs during a toggle attempt, the restored state gives no information
at all.

Consider the following alternate problem that guarantees to restore some
information.

Let a variable have 3 stable states s1, s2, and s3. A stable state is
preserved across power losses. Assume that the state always changes s1
to s2, s2 to s3, and s3 to s1 (mod 3 counter). Since transitions are not
instantaneous, there are 3 transient states: t12, t23, and t31. How many
bits, using your rules, are required to, on power restoration, recover
either the preceding stable state, or if failure during transition, eith
the preceding of succeeding stable state? This explicitly prohibits
restoring the third state.

This can be done with 3 independent bits:
B3B2B1
s1 0 0 1
t12 0 1 1
s2 0 1 0
t23 1 1 0
s3 1 0 0
t31 1 0 1

Transition from s1 to s2: write B2, then erase B1, etc.

Recovery from reset:
If in a stable state, erase all zero bits. This makes the stable state
robust.
If in a transient state, erase the bit to revert to the previous stable
state.

The only remaining problem is to getting a robust first state, since if
the power is lost when first attempting to write state s1 and the B1 is
a weak 1, it will not be restored. You thus need a robust initialization
to the first state.

Proof: There is either one or two one bits set, since once two bits are
set an erasure precedes writing the other bit. If failure during a
transition before the written bit becomes readable, it is erased on
recovery, making it stable. If the transition state is read on restart,
the new bit may be weak. It is erased when the previous state is
restored. If a failure occurs during a the second step of the transition
when the erasure was not complete, but intially read as zero, the rule
for zero erasure makes the new state robust.

I think there is a flaw here. Suppose you are moving from transition
state t12 to s2 when you are interrupted. In this case bit b1 will be
in a halfway state when you come back up. Suppose when you come back up
you read the state as t12. You then follow your procedure to revert to
state s1 by erasing b2. You are then interrupted again. When you come
back, you read all bits 0 because bit b1 was actually in a halfway
state.

Moi · Feb 13, 2010

I think there is a flaw here. Suppose you are moving from transition
state t12 to s2 when you are interrupted. In this case bit b1 will be
in a halfway state when you come back up. Suppose when you come back up
you read the state as t12. You then follow your procedure to revert to
state s1 by erasing b2. You are then interrupted again. When you come
back, you read all bits 0 because bit b1 was actually in a halfway
state.

I had exactly the same coding, and I am suffering exactly the same problems at recovery.

The thing that counts is, that (given that at any time *only one bit can be halfway*)
- for the "stable" patters with one 1 bit the 1 bit can be trusted, and one of the 0s may
actually be halfway
- for the "transient" patterns with two 1s, only the 0 can be trusted, and one
of the 1s may actually be halfway.

At recovery: in the t31 (101) case above, the intended state is s3 (100),
and the way to get there is obviously to erase the rightmost bit (followed
by an additional erase of the middle bit)
This will work fine if the rightmost bit was actually halfway, but
if the leftmost bit was halfway, the next state *may* become 000,
(which in my implementation is a "forbidden" uninitialized state,
which steps forward to s1 (001), which is not what we want.
I am a bit hesitating to set the middle bit, since that is the only trusted bit
in the t31 (101) state.

So the real problem becomes: there are two bits can be halfway, but not both,
and we don't know which one. To be sure we need to touch them both, in the
right order (avoiding unwanted states, which, after a crash in our recovery process,
might be picked up by a second recovery)

AvK

Thad Smith · Feb 13, 2010

Daniel said:
I think there is a flaw here. Suppose you are moving from transition
state t12 to s2 when you are interrupted. In this case bit b1 will be
in a halfway state when you come back up. Suppose when you come back up
you read the state as t12. You then follow your procedure to revert to
state s1 by erasing b2. You are then interrupted again. When you come
back, you read all bits 0 because bit b1 was actually in a halfway
state.

Good catch, Dan. Actually what I would really do is to violate the rule against
reprogramming. If it is read as t12 on powerup I would reprogram B2, then erase
B1. I use the rationale that it is a rare occurrance (power must be lost in
brief transition period) and I expect reprogramming to reduce the lifetime
cycles of the cell, rather than immediately kill it. Also, I would add more
transition states in eliminate erasures for stable states on powerup, since
erasures reduce cell life, as well.

Thad Smith · Feb 14, 2010

Thad said:
comp.programming and comp.arch.embedded added to the list.

First, this is an uninteresting problem because if a power failure
occurs during a toggle attempt, the restored state gives no information
at all.

I'm taking another stab at the problem, based on more analysis and an
understanding of possible use. Thanks to Daniel Giaimo for finding a problem in
the earlier proposal.

Let's assume the variable has three state categories: s1, s2, and t. s1 and s2
are stable states (will not change spontaneously). Other values are transition
values. If the variable is read in a state t, the variable is then changed to
s1 or s2.

This can be used to perform a consistent database state update:
Assume initially database copy A is consistent and dbstate=s1 (database A is
valid). To update, make an updated copy B. At this point both A and B are
consistent. Set dbstate=s2 (database B is valid). Move B to A. Set dbstate=s1.

If power is lost while dbstate is being changed, both database copies are valid,
so restoring the state to either stable state results in a consistent database.
We just need to make sure that s1 and s2 are always stable when an
modification is in progress on copy A or B.

In the following list of sequential states, each of the four bits are
independently erasable and writable (normally implemented in separate bytes).

Bit
dbstate dcba Stable bits
s1 0001 ca
t1a 0011 da
t1b 0111 db
t1c 0110 db
s2 0010 ba
t2a 1010 cb
t2b 1011 dc
t2c 1001 ca
(s1) 0001 ca

To go from state s1 to s2 or s2 to s1, we first re-erase the last bit erased,
then sequentially write the variable to the following transition and stable
states, changing a single bit at a time.

When reading dbstate, if value is s1 or s2, return the value.
If value is txa or txb, erase and rewrite the last bit set, then precede to the
next stable state. If value is txc, re-erase the last bit erased, then precede
to the next stable state.

If power fails when in a transition state, a reading of the state with the above
rules will restore dbstate to a stable value. Multiple interrupts during a
restoration can cause the state values to go from t1b to t1a to s1, but it
doesn't revert any further (similar for t2b to t2a to s2).

Assume power fails during the t1c-s2 or t2c-s1 transition such that the powerup
state reads s2 or s1 with a partially erased bit (d for s1, c for s2). If the
partially erased bit is read as zero, dbstate will be considered stable, even
though there is an unstable bit. The state may revert to the previous
transition state, which, when read, will be advanced to the same stable state.
Assume d is a weak zero in s1 when dbstate starts to advance. The first
operation on advance is to re-erase d, eliminating the weak state, then start
transition to s2.

Mark Borgerson · Feb 14, 2010

I'm taking another stab at the problem, based on more analysis and an
understanding of possible use. Thanks to Daniel Giaimo for finding a problem in
the earlier proposal.

Let's assume the variable has three state categories: s1, s2, and t. s1 and s2
are stable states (will not change spontaneously). Other values are transition
values. If the variable is read in a state t, the variable is then changed to
s1 or s2.

This can be used to perform a consistent database state update:
Assume initially database copy A is consistent and dbstate=s1 (database A is
valid). To update, make an updated copy B. At this point both A and B are
consistent. Set dbstate=s2 (database B is valid). Move B to A. Set dbstate=s1.

If power is lost while dbstate is being changed, both database copies are valid,
so restoring the state to either stable state results in a consistent database.
We just need to make sure that s1 and s2 are always stable when an
modification is in progress on copy A or B.

In the following list of sequential states, each of the four bits are
independently erasable and writable (normally implemented in separate bytes).

Bit
dbstate dcba Stable bits
s1 0001 ca
t1a 0011 da
t1b 0111 db
t1c 0110 db
s2 0010 ba
t2a 1010 cb
t2b 1011 dc
t2c 1001 ca
(s1) 0001 ca

To go from state s1 to s2 or s2 to s1, we first re-erase the last bit erased,
then sequentially write the variable to the following transition and stable
states, changing a single bit at a time.

When reading dbstate, if value is s1 or s2, return the value.
If value is txa or txb, erase and rewrite the last bit set, then precede to the
next stable state. If value is txc, re-erase the last bit erased, then precede
to the next stable state.

If power fails when in a transition state, a reading of the state with the above
rules will restore dbstate to a stable value. Multiple interrupts during a
restoration can cause the state values to go from t1b to t1a to s1, but it
doesn't revert any further (similar for t2b to t2a to s2).

Assume power fails during the t1c-s2 or t2c-s1 transition such that the powerup
state reads s2 or s1 with a partially erased bit (d for s1, c for s2). If the
partially erased bit is read as zero, dbstate will be considered stable, even
though there is an unstable bit. The state may revert to the previous
transition state, which, when read, will be advanced to the same stable state.
Assume d is a weak zero in s1 when dbstate starts to advance. The first
operation on advance is to re-erase d, eliminating the weak state, then start
transition to s2.

Wouldn't all this be simplified if you simply used a non-volatile
memory with a single-cycle read and write? MRAM and FRAM come to
mind. IIRC, they have mechanisms to prevent false writes at
power-down and don't suffer from any awkward in-between states.

Mark Borgerson

Thad Smith · Feb 14, 2010

Mark said:
[Proposal for robust use of EEPROM through power outages]

Click to expand...

Wouldn't all this be simplified if you simply used a non-volatile
memory with a single-cycle read and write? MRAM and FRAM come to
mind. IIRC, they have mechanisms to prevent false writes at
power-down and don't suffer from any awkward in-between states.

It should be, assuming that there is brownout detection. The OP is apparently
working with Smart Card hardware using EEPROM. Sometimes the hardware is dictated.

Francois Grieu · Feb 15, 2010

Thad Smith wrote about the following problem [I added more notes, since
the problem has been partially misunderstood by some]

Disclaimer: This is a hard problem, only minimally disguised into
something on-topic for comp.lang.c

You are programming a C99 conforming freestanding implementation (e.g. a
microprocessor device programmed in C, but we can't use <stdio.h> or
otherwise store data on disk). Disruption (reset or power off/on cycle)
can occur at *ANY* time without advance warning, and stops whatever
operation is underway. The program is re-executed from main() after a
disruption.

The objective is to implement a single bit resistant to disruption,
accessed using two procedures to be written:
int bRead(void); // Return 0 or 1 according to bit state
void bToggle(void); // Change the state of the bit

and such that these two properties are met:
a) The bit is stable in the absence of Toggle. That is, the result given
by any two undisrupted bRead() is the same unless bToggle() was invoked
in-between.
b) The bit is changed by Toggle. That is, the result given by any two
undisrupted bRead() is different if bToggle() was invoked exactly once
in-between and no disruption occurred during that execution of bToggle().

Note: The bit is therefore unspecified from the invocation of bToggle()
until the end of this bToggle() if it is undisrupted, or otherwise until
the end of the next undisrupted bRead(). This is the only definition of
"bit resistant to disruption" worth of interest.

The only way to store information across disruptions is using a number
of independent physical EEPROM cells, designated by index j. Access to
the EEPROM cells is by the following three functions. The properties
stated for theses functions apply for any fixed non-negative integer j
less than the number of cells, and "for that cell" is implied everywhere.

// Read a designated EEPROM cell; always returns 0 or 1.
extern int eRead(int j);

// Erase a designated EEPROM cell.
// After undisrupted eErase, the cell is "erased" and eRead returns
// 0 until eProgram is invoked (regardless of invocation of eErase
// and disruptions)
extern void eErase(int j);

// Program a designated *ERASED* EEPROM cell.
// After undisrupted eProgram, eRead returns 1 until eErase is invoked
// (regardless of disruptions)
// Programming a cell that is not "erased" cause undefined behavior.
extern void eProgram(int j);

An EEPROM cell can be left neither erased nor programmed by a disrupted
eErase unless that cell was already erased, or by a disrupted eProgram.
For a cell in such halfway state, eRead returns a value specified only
to be 0 or 1. That otherwise unspecified value may in particular change
after disruption (due e.g. to uncontrollable random noise and/or
temperature and/or voltage drift in the hardware used by eRead), even
though the sate of cells is unchanged by disruption or/and eRead.

Note: if a cell is not erased (left in a halfway state or programmed),
an eProgram of that cell is prohibited. A workaround is to use the
following rather than eProgram():
void mySafeProgram(int j)
{
eErase(j);
eProgram(j);
}

Before the first run, all EEPROM cells have been erased. eRead, eErase,
and eProgram terminate unless a disruption occurs, assuming constraints
on j and eProgram are met. bRead and bToggle must similarly terminate
unless a disruption occurs.

Can it be done? If yes, what is the minimum number of cells needed? Give
a proof.

Depending on the answer, try to minimize the expected failure rate or
maximize the expected LED flash rate in the following use case:
*/

#define NCELLS 10 // number of cells

extern void sLed(int x); // 0: LED on green then off
// 1: LED on red then off.
extern int eRead(int j); // Read a cell.
extern void eErase(int j); // Erase a cell to 0.
extern void eProgram(int j); // Program an erased cell to 1.

int bRead(void); // Return 0 or 1 according to bit state.
void bToggle(void); // Change the state of the bit.

// Declarations, and the code for bRead/bToggle, go here

int main(void)
{
int vState;
vState = bRead();
while(1)
{
sLed(vState);
bToggle();
vState = !vState;
}
return 0;
}

/*
Assume eProgram, eErase, sLed each last 1 second; everything else is
comparatively instant; a disruption occurs with odds p each second;
unspecified eRead values are assigned at random with mean 0.5; the LED
is off following disruption; a failure is a violation of the
requirement: if a disruption occurs with the LED on in some color, then
the first time the program lights a LED it must be with that color.

Note: any bRead/bToggle conforming to a) b) works in this test program,
and any bRead/bToggle making this test program working can be changed
into bRead/bToggle2 conforming to a) b), by defining bToggle2 as:

void bToggle2(void)
{
(void)bRead();
bToggle();
}

Notice that disruption without warning is the standard under which most
real-life hardware actually operates (with the exception of systems with
physical protection against adversarial reset and a power reserve with
means to sense imminent power loss), even though this is often left out
of the specification. And that real disruption-surviving media (EEPROM,
Flash, CDRW, though not battery-backed SRAM) do have the limitation that
read after interrupted change of state sometime gives an unstable
result. While "programming a cell that is not erased cause undefined
behavior" is not the rule, it can exist due to a poorly specified API,
an overcautious hardware specification, or in some cases due to tue
hardware limitations.

The problem is derived from actual requirements in a Smart Card. This is
neither homework, nor an attempt to gather brainpower for free: I'm the
author of the problem and have conclusively determined the feasibility,
with a concise proof. A similar problem "A bit in bottles" is in
rec.puzzles, some of the replies contain answers, and I committed mine.
http://groups.google.com/group/rec.puzzles/browse_thread/thread/ab892b3cf88af0de

Francois Grieu

> First, this is an uninteresting problem because if a power failure
> occurs during a toggle attempt, the restored state gives no
> information at all.

Is the prefix "UN" intended? Thad Smith shows continued INterest, and
I'm grateful to him for that.

> I'm taking another stab at the problem, based on more analysis and
> an understanding of possible use. Thanks to Daniel Giaimo for
> finding a problem in the earlier proposal.
>
> Let's assume the variable has three state categories: s1, s2, and t.
> s1 and s2 are stable states (will not change spontaneously).
> Other values are transition values. If the variable is read in a
> state t, the variable is then changed to s1 or s2.
>
> This can be used to perform a consistent database state update:
> Assume initially database copy A is consistent and dbstate=s1
> (database A is valid). To update, make an updated copy B. At
> this point both A and B are consistent. Set dbstate=s2 (database B
> is valid). Move B to A. Set dbstate=s1.
>
> If power is lost while dbstate is being changed, both database
> copies are valid, so restoring the state to either stable state
> results in a consistent database.
>
> We just need to make sure that s1 and s2 are always stable when
> an modification is in progress on copy A or B.
>
> In the following list of sequential states, each of the four bits
> are independently erasable and writable (normally implemented in
> separate bytes).
>
> Bit
> dbstate dcba Stable bits
> s1 0001 ca
> t1a 0011 da
> t1b 0111 db
> t1c 0110 db
> s2 0010 ba
> t2a 1010 cb
> t2b 1011 dc
> t2c 1001 ca
> (s1) 0001 ca
>
> To go from state s1 to s2 or s2 to s1, we first re-erase the last
> bit erased, then sequentially write the variable to the following
> transition and stable states, changing a single bit at a time.
>
> When reading dbstate, if value is s1 or s2, return the value.
> If value is txa or txb, erase and rewrite the last bit set,
> pr*o*cede to the next stable state. If value is txc, re-erase the
> last bit erased, then procede to the next stable state.

Going from t2b to t2c, disruption occurs while we erase "b".
On restart, state is "10?1". We read it as "1011" that is t2b.
While we "erase the last bit set", that is "a", disruption occurs.
On restart, state is "10??". We read it as "1010" that is t2a.
[notice the "Stable bits: cb" condition is not met]
While we "erase the last bit set", that is "d", disruption occurs.
On restart, state is "?0??" [or maybe "?0?0" is we have been prudent
and erased "a" right after seeing it as 0].
We read that as "0010" and assume state s2.
Before we make any change/call to bToggle(), disruption occurs
and we read the same state as "0001" and assume state s1, this is
a violation of requirement a.
[is we have erased "a" we will read "0000" and I guess we
will next move to s1 and be equally busted]

> If power fails when in a transition state, a reading of the state
> with the above rules will restore dbstate to a stable value.

Not so, see above.

> Multiple interrupts during a restoration can cause the state values
> to go from t1b to t1a to s1, but it doesn't revert any further

Right, unless I err.

> (similar for t2b to t2a to s2).

I don't think so..

[snip]

I plan to post my answer by the end of the week.

Francois Grieu

Why does 1 represent a negative sign bit?	0	Aug 10, 2022
Comparison of Integer and Pointer (that's supposed to be an Integer). Where did I go wrong?	0	Nov 19, 2022
WIN32 - Update Text in a Window in order to show its size in Pixels and coordinates	0	Oct 4, 2023
bit operations	6	Nov 16, 2009
64 bit design	2	Oct 15, 2010
How to accept text and put each letter into a 2d matrix?	0	Jun 3, 2022
Filter sober in c++ don't pass test	0	Dec 2, 2023
How to try a range of hex values in C# code ?	0	Nov 19, 2022

A bit resistant to disruption

Francois Grieu

Seebs

Francois Grieu

Francois Grieu

Seebs

Francois Grieu

Seebs

Moi

Francois Grieu

Moi

Francois Grieu

Noob

Thad Smith

Daniel Giaimo

Moi

Thad Smith

Thad Smith

Mark Borgerson

Thad Smith

Francois Grieu

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads