High reliability files: writing, reading, maintaining


John Pote

Hello, help/advice appreciated.

Background:
I am writing some web scripts in Python to receive small amounts of data
from remote sensors and store the data in a file: 50 to 100 bytes every 5
or 10 minutes, with a new file for each day. Of considerable importance is
the long-term availability of this data and its gathering and storage
without gaps.

As the remote sensors have little on-board storage, it is important that a
web server is always available to receive the data. To that end, two
separately located servers will be running at all times, updating each
other as new data arrives.

I also assume each server will maintain two copies of the current data
file, only one of which would be open at any one time, plus some means of
indicating whether a file write has failed and which file contains valid
data. The latter is less important, as the data itself will indicate both
its completeness (missing samples) and its newness, because each sample
carries a time stamp.
I want to secure this data gathering against OS crashes, hardware failures
and power outages.
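
Something like this untested sketch is the sort of write routine I have in
mind; the file naming and the tab-separated sample format are just
placeholders:

import os
import time

def append_sample(base_name, sample_text):
    """Append one timestamped sample to both copies of today's file,
    syncing each write to disk before moving on to the next."""
    day = time.strftime("%Y%m%d")
    line = "%d\t%s\n" % (int(time.time()), sample_text)
    for suffix in ("a", "b"):                # the two copies, in turn
        path = "%s-%s.%s.dat" % (base_name, day, suffix)
        with open(path, "a") as f:
            f.write(line)
            f.flush()                        # push Python's buffer to the OS
            os.fsync(f.fileno())             # ask the OS to commit to disk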

So my request:
1. Are there any Python modules 'out there' that might help in securely
writing such files?
2. Can anyone suggest a book or two on this kind of file management?
(These kinds of problems must have been solved in the financial world many
times.)

Many thanks,

John Pote
 

Martin P. Hellwig

John Pote wrote:
So my request:
1. Are there any Python modules 'out there' that might help in securely
writing such files?
2. Can anyone suggest a book or two on this kind of file management?
(These kinds of problems must have been solved in the financial world many
times.)
<cut>
I can't answer your specific questions, but I get the feeling that you're
barking up the wrong tree ;-)

You don't want to solve this in your application; file management is what
the OS and hardware are for. "Military"-grade solutions are often
(depending on how critical they are) double or triple hot spares in the
same room, which can be seen as a "unit", and these units are duplicated
at remote locations (at least 25 km apart), syncing their data with
standard tools like rsync.

If you don't have a military budget, I would suggest doing it a little
less expensively, for example with a couple of Solaris machines (three or
four will do) in the same room, using part of their disk space for ZFS.

Then let your application write your data to that ZFS partition, and if
you are particularly paranoid you can build in a checksum that can be
verified by other machines without needing the originally received data
(ZFS has a built-in mechanism for that, so you might just want to use it).
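
If you do want an application-level check as well, a sketch along these
lines would let the machines compare short digests instead of shipping the
raw data around (the chunked read is just so large files need not fit in
memory):

import hashlib

def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

Each machine computes the digest of its own copy, and only the short hex
strings need to be exchanged and compared.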

There is nothing wrong with assuming a certain level of hardware, at least
not if it is very clearly communicated to all parties.
ZFS is open source (by Sun) but currently only implemented in Solaris,
which you can also use (commercially) without charge.

Now you only have to figure out how to implement a heartbeat mechanism
between your fail-over applications :)
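
A bare-bones heartbeat can be as little as a periodic UDP datagram; in
this sketch the peer address, port and interval are all made-up values:

import socket
import time

PEER = ("peer.example.com", 9999)    # assumed address of the other server
INTERVAL = 10                        # seconds between beats

def send_heartbeats():
    """Sender side: fire a small datagram at the peer every INTERVAL."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        sock.sendto(b"alive", PEER)
        time.sleep(INTERVAL)

def watch_heartbeats(timeout=3 * INTERVAL):
    """Receiver side: treat the peer as dead after `timeout` of silence."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", PEER[1]))
    sock.settimeout(timeout)
    while True:
        try:
            sock.recvfrom(64)                # any datagram counts as alive
        except socket.timeout:
            print("peer missed heartbeat - time to take over")
            return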
 

Cameron Laird

Martin P. Hellwig wrote:
<cut>
Also, the whole idea of basing reliability on HTTP uploads of 50 bytes
at a time sounds to me a bit ... strained. There *must* be simpler
ways--and simpler goes a long way in the direction of trustworthy.
 

John Pote

Thanks for all the replies,
<cut from Cameron Laird>
Also, the whole idea of basing reliability on HTTP uploads of 50 bytes
at a time sounds to me a bit ... strained. There *must* be simpler
ways--and simpler goes a long way in the direction of trustworthy.

The motivation to look at http: is the widespread availability of internet
connections and standard servers able to run CGI scripts. In particular,
low-cost GPRS modems (powered by a solar panel and/or wind generator) are
very attractive for remote locations where there is no power, telephone
line or cable company.

However, I take your point that HTTP uploads of 50 or so bytes are a
little overhead-heavy. Recently I've come to know of reasonably priced,
fully managed dedicated servers in secure buildings, so I am now thinking
of a direct TCP/IP port connection. I know Python can do this, as I found
the appropriate standard modules and set up a simple proving system on my
private network (one port listener, one sender - no problem!). Are there
security issues in this approach?
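
For anyone curious, the proving system amounted to little more than a
listener/sender pair along these lines (a sketch; the port number is
arbitrary):

import socket

PORT = 5001  # arbitrary test port

def listener():
    """Accept one connection at a time and print whatever arrives."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", PORT))
    srv.listen(1)
    while True:
        conn, addr = srv.accept()
        data = conn.recv(1024)
        print("from %s: %r" % (addr[0], data))
        conn.close()

def send(host, sample):
    """Open a connection, push one sample, close."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((host, PORT))
    s.sendall(sample.encode("ascii"))
    s.close()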

I realise http: reliability is not as great as it might be, but we have a
reasonable amount of time for retries (an hour or two) if some packets get
lost. My recent experience of telnetting to our server in London (I'm in
the Midlands) is quite revealing: the round-trip time is generally under
0.5 s, which makes me think all that waiting for web pages is down to the
server rather than the internet itself.

Any thoughts on *simpler* ways to achieve my goal of getting remote data
to web servers would be greatly appreciated.

It would still be useful to have anyone's thoughts on securely writing the
data to disk files. The off-the-shelf "military" approach suggested by
Martin Hellwig is outside our budget.

Thanks again.

John
 

Cameron Laird

John Pote wrote:
<cut>
Ah! I *completely* understand, now. We actually do some of the
same ... Now that I have a fuller grasp of your situation, let
me think over the weekend. I like projects like this.
 
