Crash during file writing: how to recover?


Nate Smith

what about using copy first?
and what about exclusive access?

outline:
---------------------------------------------------------

actually first, grab the oldFile exclusively
if successful
then
open oldFile for Read
open saveFile for Write

copy oldFile to saveFile

close oldFile
close saveFile

if copy was successful
then
open oldFile for Write
open saveFile for Read

do the cloudy processing thing that rewrites oldFile
from the saveFile stuff & the "new stuff", whatever
the new stuff is (the changes, updates, etc.)

close oldFile
close saveFile

in case of a crash in the middle, saveFile should have the recovery;
or if the copy step failed, oldFile is still unchanged,
and politely notify any interested parties

now release the exclusive hold on oldFile & let the race resume

else
wait until oldFile can be grabbed exclusively, retrying
a "second" later each time, with some established
waiting period after which we quit trying
or
in case of timeout, notify interested parties and
maybe quit the thread with a flag (duck out of the race)

NOTE:
presumably there are already other things in place elsewhere
that preserve the "new stuff" (changes, etc.) through crashes
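For concreteness, here is a minimal Java sketch of the outline above, assuming java.nio file locking (which is only advisory on Linux); the names SafeRewriter and applyChanges are made up:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class SafeRewriter {

    // Grab oldFile exclusively, copy it to saveFile, then rewrite
    // oldFile in place -- the sequence from the outline above.
    static void rewrite(Path oldFile, Path saveFile) throws IOException {
        try (FileChannel ch = FileChannel.open(oldFile,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
             FileLock lock = ch.tryLock()) {            // exclusive grab
            if (lock == null) {
                throw new IOException("oldFile is held by someone else");
            }
            // copy oldFile to saveFile: the recovery copy
            Files.copy(oldFile, saveFile, StandardCopyOption.REPLACE_EXISTING);
            // rewrite oldFile from saveFile plus the "new stuff"
            byte[] updated = applyChanges(Files.readAllBytes(saveFile));
            ch.truncate(0);
            ch.write(ByteBuffer.wrap(updated));
            ch.force(true);                             // push to disk
        }   // lock released here: let the race resume
    }

    // placeholder for the cloudy processing thing
    static byte[] applyChanges(byte[] original) {
        return original;
    }
}
```

A caller that gets the "held by someone else" failure would sleep and retry up to the established timeout, as the outline's else-branch describes.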
 

Calum

Chris said:
Not surprisingly, it's not specified whether File.renameTo results in a
call to the rename system call. More surprisingly, it's not specified
whether File.renameTo will succeed if a file already exists by the
target name. That said, renameTo returns a success flag (which is ugly
in Java, but nevertheless happens). So it's entirely possible to write:

if (!newFile.renameTo(fileBeingProcessed))
{
    fileBeingProcessed.delete();
    newFile.renameTo(fileBeingProcessed);
}

Just to be inconsistent, the Microsoft "rename" function specifically
requires "The new name must not be the name of an existing file or
directory". It makes some sense - I wouldn't expect a function called
"rename" to delete a file, but I can see situations where either
behaviour would be desired.

So it's possible Java will have different behaviour on Windows and
POSIX, though I can't be bothered to check this.

I guess for ultra-safety, the old file could be renamed to something
else, before renaming the new file to the target filename. Of course
then you'd just clutter up the directory with old files, but there are
circumstances where you want to be able to roll back.
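With the later java.nio.file API this ambiguity can be resolved explicitly. A sketch, assuming a POSIX filesystem where ATOMIC_MOVE replaces an existing target (the Javadoc leaves that implementation-specific):

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicReplace {

    // Try an atomic replacing move first; where the filesystem cannot
    // do it atomically, fall back to a plain replacing move.
    static void replace(Path newFile, Path target) throws IOException {
        try {
            // On POSIX this maps to rename(2), which atomically
            // replaces an existing target.
            Files.move(newFile, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            Files.move(newFile, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

Unlike File.renameTo, Files.move reports failure by throwing, so the success-flag dance above is not needed.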

Calum
 

Roedy Green

I guess for ultra-safety, the old file could be renamed to something
else, before renaming the new file to the target filename. Of course
then you'd just clutter up the directory with old files, but there are
circumstances where you want to be able to roll back.
The problem is a lack of naming convention for temporary files.

If Java at least named all temps in a standard way, some tool
such as batik could periodically clean up the trash left over from
crashes.
 

Norm Dresner

goose said:
Joseph wrote:
Why not just write your 'dirty flag' in the same file?

Because if the power fails during this write-op, the entire track of the
media may be lost.

The real question is, what else do you need to hold up your pants
besides a belt and suspenders?

Norm
 

Nick Landsberg

Norm said:
Because if the power fails during this write-op, the entire track of the
media may be lost.

The real question is, what else do you need to hold up your pants
besides a belt and suspenders?

Norm

Well, that depends, Norm :)

How critical is that data? How much is the customer
willing to pay to ensure data integrity and consistency?

How does the customer define "reliability" for the OP's
application? Have they quantified it? ("Never losing
any data" is not a reasonable requirement. "Never"
is not a number.) If it's something like "no more than 3
records per 1,000,000 are allowed to be lost", how
will this be validated/tested? At what cost?
What is the test plan? What is the probability
of a hardware outage (on this specific hardware)?
What is the probability of a software bug scribbling
all over the data? (Suggestions about keeping a
previous copy of "good" data address a part of this
problem.) How do you validate any particular
data file at startup? How long will it take?
How long will it take to bring it up to date?
(Is it necessary to bring it up to date, or is
last night's copy ok?)

All of these questions (and more) may have
to be answered before the final
solution is decided upon. Every time you kick
up the requirement a notch, the development (not just
coding) cost usually goes up proportionately,
which brings us back to "how much is the
customer willing to pay?"

<War Story>

I have heard of (but not personally worked on) a system
where the customer *insisted* on triple-mirroring
(primary and 3 copies). Each string was on a separate
controller and on a separate UPS. Each "CPU-box" was also
duplicated and on a separate UPS and could take over the
processing in case of a single CPU failure. All data was also
replicated to a remote site which could pick up the
load in case of a real disaster (e.g. tornado or
earthquake ... yep, one of the data centers was sitting
right smack dab on the Hayward Fault, the other was
in Kansas... go figure).

(As you may have intuited, this was a *financial* application
with some ungodly number of transactions per hour.
The bean counters get apoplectic when they lose track
of a few cents here or there. To me, this was
an extreme case of overkill, but the customer
was willing to pay for it. This may also give you
some clue as to why your credit card interest
rates are so high :)

</War Story>

It is highly doubtful that the OP needs this kind
of fault tolerance or fault recovery, but we
really don't know the customer's requirements.
 

Corey Murtagh

Calum said:
Just to be inconsistent, the Microsoft "rename" function specifically
requires "The new name must not be the name of an existing file or
directory". It makes some sense - I wouldn't expect a function called
"rename" to delete a file, but I can see situations where either
behaviour would be desired.

The Win32 API supplies MoveFile() and MoveFileEx().

MoveFile() will fail if the destination exists, or if you're trying to
move a directory to another device.

MoveFileEx() allows you to specify a number of options, including an
option to replace an existing file.

Of course MoveFileEx() isn't available on Win9x/Me...

Another option is to copy oldfile to backup, open oldfile for writing,
replace its contents with newfile, close everything and delete backup
and newfile. While far from being atomic, it is recoverable. If the
operation fails at any point you still have the files (oldfile and
newfile) in one form or another.

Of course if you're talking about a multi-gigabyte data store, this is
gonna take a while :>
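That copy-then-overwrite sequence can be sketched in Java, since that is the OP's platform; the class name and the ".bak" suffix are arbitrary choices of mine:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class RecoverableReplace {

    // Keep both the backup and newFile on disk until oldFile has been
    // fully rewritten; a crash at any step leaves a usable copy around.
    static void replaceContents(Path oldFile, Path newFile) throws IOException {
        Path backup = oldFile.resolveSibling(oldFile.getFileName() + ".bak");
        Files.copy(oldFile, backup, StandardCopyOption.REPLACE_EXISTING);
        // overwrite oldFile in place with newFile's contents
        Files.write(oldFile, Files.readAllBytes(newFile));
        // only now is it safe to drop the recovery copies
        Files.delete(backup);
        Files.delete(newFile);
    }
}
```

As noted, nothing here is atomic: recovery code at startup must inspect which of the three files survived and decide which copy wins.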
 

perry

You're talking about a forward error recovery pattern.
Check out the design philosophy of POET, the object database engine that
used a technique known today under Linux as journaling.
Essentially, they maintained a separate "ledger" that tracked all
transactions to the object database, and if on a restart
that ledger was found out of sync or not properly closed, they knew
there was unfinished work with the object file.
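A toy version of that ledger idea, in Java to match the thread; the record format and method names are invented for illustration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class Ledger {

    // Append an intent record before touching the data file, and a
    // commit record after. On restart, a trailing BEGIN with no
    // matching COMMIT means unfinished work on the data file.
    static void begin(Path log, String txn) throws IOException {
        Files.write(log, ("BEGIN " + txn + "\n").getBytes(),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    static void commit(Path log, String txn) throws IOException {
        Files.write(log, ("COMMIT " + txn + "\n").getBytes(),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // true if the last record is a BEGIN without a COMMIT after it
    static boolean unfinished(Path log) throws IOException {
        List<String> lines = Files.readAllLines(log);
        return !lines.isEmpty()
                && lines.get(lines.size() - 1).startsWith("BEGIN");
    }
}
```

A real journal would also fsync the log before the data write, so the intent record is durable before the risky operation starts.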

- perry
 

Gerry Quinn

I just checked: SUSv3 does require rename to make
sure that at any point in time the target name will
refer to either the old or the new file. And if
rename fails, the target must be unaffected.

Well, that justifies your approach, so.
But then you'd have a window where no file exists with
the given name. The approach I suggested is safe. When
creating the new file, first create it with a different
name. And when you have finished writing, you rename it
such that it atomically replaces the old file.

There is nothing dangerous about it.

Given the above guarantee (which probably means the OS does something
like I was suggesting!), the window where no file exists is not a
problem, because the software knows
what the alternative name for the original file is if the proper file
is corrupt or non-existent.

- Gerry Quinn
 

Joseph

Hi all

I'm thankful to everyone for sharing their opinions. I read everybody's posts
and learned a lot. It turns out that my algorithm will be simple, maybe
something like this:


Make a copy of existing data_file in the same directory

operate on this new copy (data_file2) to add data

close data_file2 when done

delete data_file (original)

rename data_file2 to data_file

on startup the software will be able to detect whether a crash occurred and respond
appropriately. The code on startup should be pretty straightforward.

By the way, the platform is Java on Linux (probably Red Hat).
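Those steps translate almost line for line into java.nio; the class name is mine, and the startup check assumes the only crash window that loses data_file is the gap between the delete and the rename:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class SafeSave {

    // Joseph's algorithm: copy, add data to the copy, delete the
    // original, rename the copy into place.
    static void save(Path dataFile, byte[] newData) throws IOException {
        Path copy = dataFile.resolveSibling("data_file2");
        Files.copy(dataFile, copy, StandardCopyOption.REPLACE_EXISTING);
        Files.write(copy, newData, StandardOpenOption.APPEND); // add data
        Files.delete(dataFile);
        Files.move(copy, dataFile);
    }

    // Startup check: data_file missing but data_file2 present means we
    // crashed between the delete and the rename; finish the rename.
    static void recover(Path dataFile) throws IOException {
        Path copy = dataFile.resolveSibling("data_file2");
        if (!Files.exists(dataFile) && Files.exists(copy)) {
            Files.move(copy, dataFile);
        }
    }
}
```

A leftover data_file2 alongside an intact data_file just means the crash happened before the delete, so the recovery code can simply discard the copy in that case.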

Joseph
 

Chris Sonnack

Calum said:
Just to be inconsistent, the Microsoft "rename" function
specifically requires "The new name must not be the name
of an existing file or directory". It makes some sense - I
wouldn't expect a function called "rename" to delete a file,
but I can see situations where either behaviour would be desired.

I can't. You should *NOT* be able to rename/mv a file ON TOP of
an existing file (IMO, obviously).

I've always considered the Unix ability to do so one of those
razor-sharp pointy bits you need to be very, very careful about.
 

Chris Sonnack

Nick said:
The bean counters get apoplectic when they lose track
of a few cents here of there. To me, this was
an extreme case of overkill, but the customer
was willing to pay for it.

Personally, this is a level of overkill I'm glad my bank and
CC companies indulge in.

Maybe YOU don't mind if a few cents disappear from your accounts,
but it makes ME apoplectic! (-:
 

Nick Landsberg

Chris said:
Nick Landsberg wrote:

Personally, this is a level of overkill I'm glad my bank and
CC companies indulge in.

Maybe YOU don't mind if a few cents disappear from your accounts,
but it makes ME apoplectic! (-:

I was referring to the triple mirroring when
I used the word overkill. :) I would
agree with you that I, too, would get upset
if my bank balance was wrong. The triple
mirroring was /their/ preconceived solution
to the problem which could have been solved
more cheaply.
 

Kasper Dupont

Nick said:
I was referring to the triple mirroring when
I used the word overkill. :) I would
agree with you that I, too, would get upset
if my bank balance was wrong. The triple
mirroring was /their/ preconceived solution
to the problem which could have been solved
more cheaply.

Actually a bank in Denmark recently introduced a system
with three mirrors. That happened after IBM screwed up
their system with only two copies of the data. IIRC the
bank had to pay around 10 million dollars in expenses to
their customers.
 

Roedy Green

Actually a bank in Denmark recently introduced a system
with three mirrors. That happened after IBM screwed up
their system with only two copies of the data. IIRC the
bank had to pay around 10 million dollars in expenses to
their customers.

If you really wanted to be safe, you have three teams running it and
three programming teams working to the same spec.

I was watching a Sun video the other day where the Sun guy was saying
with a flick of his mouse he could install software on 1000 servers.

On the other hand, he could bring 1000 servers to their knees with a
flick of his mouse.

Must feel like Bush toying with the big red button.
 

CBFalconer

Chris said:
Personally, this is a level of overkill I'm glad my bank and
CC companies indulge in.

Maybe YOU don't mind if a few cents disappear from your accounts,
but it makes ME apoplectic! (-:

Actually, I have no objection whatsoever to a few cents
disappearing from your accounts, with the sole proviso that they
reappear in mine.
 

Nick Landsberg

Kasper said:
Actually a bank in Denmark recently introduced a system
with three mirrors. That happened after IBM screwed up
their system with only two copies of the data. IIRC the
bank had to pay around 10 million dollars in expenses to
their customers.

Three mirrors implies four copies.

There are two situations which could
have caused the problem, one of which is
*probably* not IBM's fault. For example,
a disk goes off-line because of a hardware
failure. Diagnostics are issued, but the
customer does not immediately take steps
to replace that disk. If the customer
waits a week or more to replace it, they
are "at risk" of another disk failure
during that time period, all the more so
if all the disks were from the same
batch. I am no apologist for IBM, and
I do not know all the details of this situation
so this is just conjecture on my part.
(But I have seen it happen in the past.)

The other situation is when a software bug
scribbles all over the data. In this case,
it will scribble over *all* copies of the
data. Adding additional disks does not
solve this problem. :)
 

Nick Landsberg

CBFalconer said:
Actually, I have no objection whatsoever to a few cents
disappearing from your accounts, with the sole proviso that they
reappear in mine.

Wasn't this the case in one of the "urban legends"
of the early days of computers?

What I remember (and memory is the second thing
to go) is that there was a story about some bank
customer who computed what his compound interest
should have been and found it to be a few pennies
off. (This was in the days before calculators.)
He went to his local bank branch and confronted
them with the data. After checking his data,
they confirmed that he was right. They subsequently
found that the person who had programmed their
system had taken the "breakage" (fractions of cents)
and had it credited to his account. This
amounted to quite a bundle!

I do not know if this story is true, just
relating it as a piece of trivia/memorabilia.

Were you that person, Chuck? :)
 

Sudsy

Nick Landsberg wrote:
What I remember (and memory is the second thing
to go) is that there was a story about some bank
customer who computed what his compound interest
should have been and found it to be a few pennies
off. (This was in the days before calculators.)
He went to his local bank branch and confronted
them with the data. After checking his data,
they confirmed that he was right. They subsequently
found that the person who had programmed their
system had taken the "breakage" (fractions of cents)
and had it credited to his account. This
amounted to quite a bundle!

I do not know if this story is true, just
relating it as a piece of trivia/memorabilia.

The way I heard it was that it was a bank in California. The programmer
made off with millions and moved to Mexico. Because of publicity fears,
not only did the bank not go after him (extradition and all) but they
hired him (at a six-figure salary!) to work remotely and ensure that
nobody else could pull off a similar stunt.
Again, little concrete documentation as the bank didn't want to be
identified (although I'll admit to having heard the name...)
 

CBFalconer

Nick said:
Wasn't this the case in one of the "urban legends"
of the early days of computers?

What I remember (and memory is the second thing
to go) is that there was a story about some bank
customer who computed what his compound interest
should have been and found it to be a few pennies
off. (This was in the days before calculators.)
He went to his local bank branch and confronted
them with the data. After checking his data,
they confirmed that he was right. They subsequently
found that the person who had programmed their
system had taken the "breakage" (fractions of cents)
and had it credited to his account. This
amounted to quite a bundle!

I do not know if this story is true, just
relating it as a piece of trivia/memorabilia.

Were you that person, Chuck? :)

Unfortunately, no. However it goes on today, and is known as
'float'. When some bill handling firm receives your payment to
JoesFancyGrocery, they age it for some period of time, collecting
the interest (which may be only one day) and then forward the
original amount.

A better technique (for your tale) would be to have taken
advantage of the bias in rounding. In the US we round 0.50..0 of
anything up to 1.0, which produces a bias of some size. If the
perpetrator had been satisfied with this he might still be
collecting. Rounding down also produces a bias, while round to
even is (normally) unbiased.
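That bias is easy to demonstrate over the exact half-way cases, where round-half-up always errs upward while round-half-to-even alternates and cancels. A small sketch (Math.rint does half-to-even):

```java
public class RoundingBias {

    // Sum the rounding error over the exact half-way cases 0.5, 1.5,
    // ..., n-0.5. Half-up errs +0.5 every time; half-to-even
    // alternates sign and cancels.
    static double bias(boolean halfToEven, int n) {
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            double x = i + 0.5;                     // exact half-way value
            double rounded = halfToEven
                    ? Math.rint(x)                  // banker's rounding
                    : Math.floor(x + 0.5);          // round half up
            total += rounded - x;                   // accumulate the error
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(bias(false, 100));       // half-up drifts: 50.0
        System.out.println(bias(true, 100));        // half-even cancels: 0.0
    }
}
```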
 

Roedy Green

The other situation is when a software bug
scribbles all over the data. In this case,
it will scribble over *all* copies of the
data. Adding additional disks does not
solve this problem. :)

The scariest sort of software bug is one that corrupts data just a
tiny bit, so that the problem may not be noticed for a long time. By then
all the backups are corrupt too.

All you could do is find a very old database and play transactions
against it, or write some one-shot program to compensate for the
trouble.

In banking it would be catastrophic since you have already sent out
erroneous statements.

The RZ-1000 DMA controller hardware bug was horrible for this same
reason. The corruption was sporadic and minor.
 
