knowing when file is flushed to disk

John Pote · Aug 9, 2006

Hello,

I'm using a Python CGI script on a web server to log data from a remote site
every few minutes. I do not want to lose any data for whatever rare reason
(power outage, OS crash at just the wrong moment, etc.). So I would like to
know when the data has actually been written to disk and the file closed. At
that point I can signal the remote site, which has very limited storage, to
delete its copy of the data.

Is there some way from my Python script to know when the data is actually on
the disk? BTW, the server OS is Linux. Presumably calling flush() and close() on
the output file will initiate the disk write, but do they wait for the
actual disk write, or do they return immediately, leaving the OS to do the
write when it sees fit?

Any thoughts appreciated,

John

daftspaniel · Aug 9, 2006

John said:
> Is there some way from my Python script to know when the data is actually on
> the disk. BTW server OS is Linux. Presumably calling flush() and close() on
> the output file will initiate the disk write, but do they wait for the
> actual disk write or immediately return leaving the OS to do the write when
> it sees fit?

All you can do in Python (or similar) is call flush() and close() and hope
for the best.

There are many factors outwith the control of the language e.g.
* Library behaviour
* OS behaviour
* Hardware cache on the disk itself

That said, I've only found it an issue when a computer is under heavy
load.

Hope this helps,
Davy Mitchell

http://www.latedecember.com/sites/personal/davy/

Neil Hodgson · Aug 10, 2006

John Pote:

> Is there some way from my Python script to know when the data is actually on
> the disk. BTW server OS is Linux. Presumably calling flush() and close() on
> the output file will initiate the disk write, but do they wait for the
> actual disk write or immediately return leaving the OS to do the write when
> it sees fit?

No, commonly they will schedule these operations and return quickly.
You can try os.fsync but there are no real guarantees about what that
does either. There's an amusing message from Tim Peters about this:
http://mail.zope.org/pipermail/zodb-dev/2004-July/007689.html
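
For readers who want to try the os.fsync route, here is a minimal sketch of
the flush-then-fsync sequence, assuming a Linux server as in the original
question. The function name write_durably is hypothetical. As noted above,
fsync only asks the OS to push its cache to the device; a drive with a
volatile write cache may still report success early.

```python
import os


def write_durably(path, data):
    """Append bytes to path, then ask the OS to push them to the device."""
    with open(path, "ab") as f:
        f.write(data)
        f.flush()             # Python's userspace buffer -> OS page cache
        os.fsync(f.fileno())  # ask the OS to write its cache to the device

    # Assumption: Linux. Syncing the containing directory as well helps a
    # newly created file's directory entry survive a crash.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Only after write_durably() returns without raising would the script tell the
remote site that its copy can be deleted.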

Neil

John Pote · Aug 10, 2006

Thanks for the replies. I guessed the situation would be flush() and trust.
The probability of a crash between flush() returning and the data actually
being written, resulting in a trashed disk, must be very small. But if you
can be certain without too much effort, it's got to be a good idea, so I
thought I'd ask anyway.

How does the banking industry handle this sort of thing? Could be big bucks
if something goes wrong for them!

Thanks again,

John

Slawomir Nowaczyk · Aug 10, 2006

On Wed, 09 Aug 2006 16:13:19 +0000 (GMT)

#> Is there some way from my Python script to know when the data is actually on
#> the disk. BTW server OS is Linux. Presumably calling flush() and close() on
#> the output file will initiate the disk write, but do they wait for the
#> actual disk write or immediately return leaving the OS to do the write when
#> it sees fit?

You may want to look into sqlite -- it is a single-file SQL database
which is known to be extremely robust in the face of the problems you
describe. One of its design goals was to provide a replacement for
file storage. There is a Python binding, http://pysqlite.org, which is,
IIRC, supposed to be in the stdlib for Python 2.5.

That said, if your disk and/or OS is lying about whether it has
actually written the data or not, there is not much you can do.
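
A minimal sketch of the sqlite approach, using the sqlite3 module that ships
with the standard library from Python 2.5 onwards. The function name
log_reading, the table name readings and its two columns are hypothetical
choices for illustration. The commit returns only after sqlite has asked the
OS to sync, so the same caveat about lying disks applies.

```python
import sqlite3


def log_reading(db_path, timestamp, payload):
    """Store one reading in a transaction; return once the commit is done."""
    conn = sqlite3.connect(db_path)
    try:
        # FULL makes sqlite sync to disk at the critical points of a commit.
        conn.execute("PRAGMA synchronous = FULL")
        conn.execute(
            "CREATE TABLE IF NOT EXISTS readings (ts TEXT, payload TEXT)"
        )
        # 'with conn' commits on success and rolls back on an exception.
        with conn:
            conn.execute(
                "INSERT INTO readings (ts, payload) VALUES (?, ?)",
                (timestamp, payload),
            )
    finally:
        conn.close()
```

If log_reading() returns without raising, the row has been committed and the
CGI script can acknowledge the upload to the remote site.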

--
Best wishes,
Slawomir Nowaczyk
( (e-mail address removed) )

If vegetarians love animals so much, why do they eat all their food???

Dennis Lee Bieber · Aug 10, 2006

> How does the banking industry handle this sort of thing? Could be big bucks
> if something goes wrong for them!

Redundancy via transactional databases and journalling systems; such
that data is first written to one area, and then later that area is used
to update the main area.

A write failure will only corrupt one of the two areas at a time, so
restarts can examine the system and rebuild a known configuration,
possibly reporting those transactions that were lost so they can be
re-run. If the write from the journal to main fails, you restore a
backup and rerun all the journal entries. If the write to the journal
failed, the main has not been corrupted and is valid up to the last
successful journal->main transfer: back up the main, flush the journal,
and rerun the transactions that were made after that last successful
transfer.
--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/
