python-noob - which container is appropriate for later exporting into MySQL + matplotlib ?


someone

Hi,

Here's my data:
-----------------------
20130315T071500 39000. 10 26 48000. 1 40
20130315T071501 39000. 10 26 48000. 2 42
20130315T071501 39000. 10 26 47520. 15 69
20130315T071501 39000. 10 26 47160. 1 70
20130315T071501 39000. 10 26 47000. 1 72
20130315T071501 39000. 10 26 47000. 2 81
20130315T071501 39000. 10 26 47000. 6 85
20130315T071501 39000. 10 26 46520. 10 95
20130315T071501 43000. 10 36 46520. 10 95
20130315T071501 43200. 4 43 46520. 10 104
20130315T071501 44040. 1 45 46520. 10 108
20130315T071501 44080. 3 48 46520. 10 109
20130315T071501 44080. 3 48 46520. 11 113
20130315T071501 44080. 3 48 46400. 2 131
20130315T071501 45080. 1 51 46400. 2 145
20130315T071501 45080. 1 51 46200. 1 147
20130315T071501 45080. 1 60 46120. 1 182
20130315T071501 45520. 1 65 46120. 1 225
20130315T071501 45520. 1 73 46120. 2 247
20130315T080000 45760. 1 133 46120. 2 378
20130315T080241 45760. 2 199 46120. 2 453
20130315T080945 45760. 3 217 46120. 2 456
20130315T081103 45760. 3 217 46080. 1 457
20130315T081105 45760. 3 218 46080. 2 458
20130315T081106 45760. 4 222 46080. 2 458
20130315T081107 45800. 1 229 46080. 2 458
20130315T082754 45800. 8 266 46080. 2 514
......
.....
....
etc.
-----------------------

The first column is date + time. I know how to use simple tuples, lists,
and dicts. But I don't think it's a good idea to make each line an
element in a list/dict, is it?

I want to put this table into an appropriate container such that
afterwards I can:

1) Put the data into a MySQL table
2) Easily plot column 1 vs. any of the other columns using
matplotlib etc...

I consider myself a python newbie so I'm not that good with
containers... I found something here:
http://docs.python.org/dev/library/collections.html but I'm not really
sure what a good, pythonic way of storing this data would be.

I also think that maybe I ought to convert the first date+time column
into a single number, because right now it's a string (there's a
"T" in the middle of column 1)... Letting the date+time column (=column
1) become entirely numeric makes it easier to plot in matplotlib
afterwards...
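
Something like this is what I have in mind for the conversion (just a
sketch - datetime is from the standard library, and
matplotlib.dates.date2num turns a datetime into a plain float):
-----------------------
from datetime import datetime
from matplotlib.dates import date2num

stamp = '20130315T071500'
dt = datetime.strptime( stamp, '%Y%m%dT%H%M%S' )  # parse the fused column
num = date2num( dt )  # days since an epoch, as a float - plottable
print num
-----------------------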

I'm not sure how to store this in an efficient manner... How would you
do it?

Thanks...
 

Cousin Stanley

someone said:
....
I want to put this table into an appropriate container
such that afterwards I can:

1) Put the data into a MySQL table
2) Easily plot column 1 vs. any of the other columns
using matplotlib etc...
....

Consider editing your data file
into a csv file named someone.csv ....

20130315T071500,39000.,10,26,48000.,1,40
20130315T071501,39000.,10,26,48000.,2,42
20130315T071501,39000.,10,26,47520.,15,69
20130315T071501,39000.,10,26,47160.,1,70
20130315T071501,39000.,10,26,47000.,1,72
20130315T071501,39000.,10,26,47000.,2,81
20130315T071501,39000.,10,26,47000.,6,85
20130315T071501,39000.,10,26,46520.,10,95
20130315T071501,43000.,10,36,46520.,10,95
20130315T071501,43200.,4,43,46520.,10,104
20130315T071501,44040.,1,45,46520.,10,108
20130315T071501,44080.,3,48,46520.,10,109
20130315T071501,44080.,3,48,46520.,11,113
20130315T071501,44080.,3,48,46400.,2,131
20130315T071501,45080.,1,51,46400.,2,145
20130315T071501,45080.,1,51,46200.,1,147
20130315T071501,45080.,1,60,46120.,1,182
20130315T071501,45520.,1,65,46120.,1,225
20130315T071501,45520.,1,73,46120.,2,247
20130315T080000,45760.,1,133,46120.,2,378
20130315T080241,45760.,2,199,46120.,2,453
20130315T080945,45760.,3,217,46120.,2,456
20130315T081103,45760.,3,217,46080.,1,457
20130315T081105,45760.,3,218,46080.,2,458
20130315T081106,45760.,4,222,46080.,2,458
20130315T081107,45800.,1,229,46080.,2,458
20130315T082754,45800.,8,266,46080.,2,514
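
One way to make that edit programmatically - a sketch, where the name
of the original whitespace-separated file is an assumption ....

fs = open( 'someone.txt' )

dst = open( 'someone.csv' , 'w' )

for line in fs :

    # collapse whitespace-separated fields to comma-separated

    dst.write( ','.join( line.split() ) + '\n' )

fs.close()

dst.close()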

# -----------------------------------------------
#
# The csv data can be loaded using the csv module
#
# named tuples might be used
# for convenience to access
# individual columns


#!/usr/bin/env python

import csv

from collections import namedtuple as NT

file_source = open( 'someone.csv' )

# -------------------> individual column names ---------------

nt = NT( 'csv_data' , 'date time col1 col2 col3 col4 col5 col6' )

list_tuples = [ ]

for this_row in csv.reader( file_source ) :

    # unpack the current row

    zed , one , two , tre , fur , fiv , six = this_row

    # split the date and time

    d , t = zed.split( 'T' )

    # convert individual columns in row to a named tuple

    this_tuple = nt( d ,
                     t ,
                     float( one ) ,
                     int( two ) ,
                     int( tre ) ,
                     float( fur ) ,
                     int( fiv ) ,
                     int( six ) )

    # save the current named tuple into a list

    list_tuples.append( this_tuple )

    # update_data_base( this_tuple )
    # .... or ....
    # update_data_base( choose individual columns )


# individual elements of the named tuples
# can be accessed by name
#
# this might be convenient for setting up
# data for plots of different columns


print

for row in list_tuples :

    print ' ' , row.date , row.time , row.col1 , row.col3 , row.col4

file_source.close()
 

someone

for row in list_tuples :

    print ' ' , row.date , row.time , row.col1 , row.col3 , row.col4

file_source.close()

Oh, that's great - thank you - I didn't know about this named-tuple
container before... I'm still wondering whether or not it's the optimal
container type for me, because I just added a bit of matplotlib code:

-----------------
#!/usr/bin/env python

import csv
from collections import namedtuple as NT

file_source = open( 'someone.csv' )
# -------------------> individual column names ---------------
nt = NT( 'csv_data' , 'date time col1 col2 col3 col4 col5 col6' )
list_tuples = [ ]
for this_row in csv.reader( file_source ) :
    # unpack the current row
    zed , one , two , tre , fur , fiv , six = this_row
    # split the date and time
    d , t = zed.split( 'T' )
    # convert individual columns in row to a named tuple
    this_tuple = nt( d ,
                     t ,
                     float( one ) ,
                     int( two ) ,
                     int( tre ) ,
                     float( fur ) ,
                     int( fiv ) ,
                     int( six ) )
    # save the current named tuple into a list
    list_tuples.append( this_tuple )
    # update_data_base( this_tuple )
    # .... or ....
    # update_data_base( choose individual columns )

file_source.close()
# individual elements of the named tuples
# can be accessed by name
#
# this might be convenient for setting up
# data for plots of different columns

x = []
y = []
print
for row in list_tuples :
    print ' ' , row.date , row.time , row.col1 , row.col3 , row.col4
    x.append( row.col3 )
    y.append( row.col4 )

import matplotlib.pyplot as plt
plt.plot(x, y)
plt.ylabel('some numbers')
plt.show()
-----------------


As you can see, in order for me to make the x- and y-vectors, I need a
for-loop to access the individual rows in list_tuples, and then I
append to the x- and y-lists...

Is there any clever way of avoiding this for loop, for either this
container or another clever container type?

If there isn't, then this is absolutely also an acceptable/good solution
for me... I also use Matlab, where for matrices you can type e.g.
plot( matrix(:,3), matrix(:,4) ) to plot column 3 against column 4. But
Matlab also has the problem that it cannot store strings and numbers
in the same matrix - matrices must be entirely numeric, which my data
isn't (due to the 1st column)...
 

someone

On 2013-04-11 03:39, Cousin Stanley wrote:
Is there any clever way of avoiding this for loop, for either this
container or another clever container type?

Ah, I see - I can also just add a numpy array, i.e:
------------------------------
import numpy
import matplotlib.pyplot as plt

test = numpy.random.rand(8,2)
new_tuple = nt(d, t, float(one), int(two), int(tre), float(fur),
               int(fiv), test)

#new_tuple is now:
#csv_data(date='20130315', time='071500', col1=39000.0, col2=10,
#col3=26, col4=48000.0, col5=1, col6=array([[ 0.77714064, 0.06729907],
# [ 0.20418563, 0.97882722],
# [ 0.39130897, 0.06611205],
# [ 0.94938335, 0.50254674],
# [ 0.82047434, 0.71624034],
# [ 0.66618477, 0.92025612],
# [ 0.2789423 , 0.19212809],
# [ 0.7048946 , 0.79112071]]))

x=new_tuple.col6[:,0]
y=new_tuple.col6[:,1]
plt.plot(x,y)
plt.show()
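
Another way to skip the loop entirely would be to let numpy parse the
csv straight into a 2-D array, Matlab-style (just a sketch - the
converter that strips the 'T' from the fused date+time column is one
option among several):
------------------------------
import numpy
import matplotlib.pyplot as plt

# column 0 holds e.g. '20130315T071500'; dropping the 'T' lets it
# parse as one big float, so the whole table becomes numeric
data = numpy.loadtxt( 'someone.csv', delimiter=',',
                      converters={ 0: lambda s: float(s.replace('T','')) } )

x = data[:,3]   # col3
y = data[:,4]   # col4
plt.plot(x, y)
plt.show()
------------------------------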
 

Cousin Stanley

someone said:
....
I want to put this table into an appropriate container
such that afterwards I can:

1) Put the data into a MySQL table
....

You might consider using sqlite3 as a database manager
since it is "batteries included" with python ....

The stand-alone sqlite interpreter can first be used
to create an empty database named some.sql3
and create a table named xdata in that data base ....

sqlite3 some.sql3 '.read xdata_create.sql'

where the file xdata_create.sql contains ....

create table xdata
    (
    xdate integer ,
    xtime integer ,
    col1 real ,
    col2 integer ,
    col3 integer ,
    col4 real ,
    col5 integer ,
    col6 integer
    ) ;


# -----------------------------------------------------------

The csv data file can then be inserted into the xdata table
in the some.sql3 database via python ....

import sqlite3 as DBM

fs = open( 'some.csv' )

ls = [ ]

dbc = DBM.connect( 'some.sql3' )

cur = dbc.cursor()

sql = 'insert into xdata values( ? , ? , ? , ? , ? , ? , ? , ? )'

for row in fs :

    dt , col1 , col2 , col3 , col4 , col5 , col6 = row.strip().split( ',' )

    xdate , xtime = dt.split( 'T' )

    xtuple = ( xdate , xtime , col1 , col2 , col3 , col4 , col5 , col6 )

    cur.execute( sql , xtuple )

fs.close()

dbc.commit()

dbc.close()
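
As a side note, cursor.executemany could replace the explicit loop -
a sketch re-using the same table and file names ....

import sqlite3 as DBM

def rows_from_csv( path ) :

    # yield one 8-tuple per csv line, splitting the date from the time

    for line in open( path ) :

        dt , c1 , c2 , c3 , c4 , c5 , c6 = line.strip().split( ',' )

        xdate , xtime = dt.split( 'T' )

        yield ( xdate , xtime , c1 , c2 , c3 , c4 , c5 , c6 )

dbc = DBM.connect( 'some.sql3' )

cur = dbc.cursor()

sql = 'insert into xdata values( ? , ? , ? , ? , ? , ? , ? , ? )'

cur.executemany( sql , rows_from_csv( 'some.csv' ) )

dbc.commit()

dbc.close()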


# ----------------------------------------------------------------

# python data selection example
# selecting col4 for times between 8 and 9 o'clock


import sqlite3 as DBM

fs = open( 'some.csv' )

ls = [ ]

dbc = DBM.connect( 'some.sql3' )

dbc.row_factory = DBM.Row

cur = dbc.cursor()

list_sql = [
    'select xtime , col4' ,
    'from xdata' ,
    'where xtime >= 80000 and xtime <= 90000 ; ' ]

str_sql = '\n'.join( list_sql )

cur.execute( str_sql )

for row in cur :

    print row[ 'xtime' ] , row[ 'col4' ]

fs.close()

dbc.close()


# ----------------------------------------------------

You can be creative with the data selections
and pass them off to be plotted as needed ....

If mysql is used instead of sqlite3
you should only have to monkey with
the data type declarations in xdata_create.sql
and the dbc.connect strings in the python code ....
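
For example, the connect call might become something like this -
a sketch assuming the third-party MySQLdb package, with made-up
credentials ....

import MySQLdb as DBM

dbc = DBM.connect( host = 'localhost' ,
                   user = 'me' ,
                   passwd = 'secret' ,
                   db = 'somedb' )

# note that MySQLdb uses %s placeholders in queries
# where sqlite3 uses ?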
 

Cousin Stanley

Cousin Stanley said:
The stand-alone sqlite interpreter can first be used
to create an empty database named some.sql3
and create a table named xdata in that data base ....

sqlite3 some.sql3 '.read xdata_create.sql'

This step can also be done in python
without using the stand-alone sqlite interpreter ....


# -----------------------------------------

import sqlite3 as DBM

dbc = DBM.connect( 'some.sql3' )

cur = dbc.cursor()

list_sql = [
    'create table if not exists xdata ' ,
    '( ' ,
    ' xdate integer , ' ,
    ' xtime integer , ' ,
    ' col1 real , ' ,
    ' col2 integer , ' ,
    ' col3 integer , ' ,
    ' col4 real , ' ,
    ' col5 integer , ' ,
    ' col6 integer ' ,
    ') ; ' ]

str_sql = '\n'.join( list_sql )

cur.execute( str_sql )

dbc.commit()

dbc.close()
 

someone

Cousin Stanley wrote:
You can be creative with the data selections
and pass them off to be plotted as needed ....

If mysql is used instead of sqlite3
you should only have to monkey with
the data type declarations in xdata_create.sql
and the dbc.connect strings in the python code ....

Uh, thank you very much for providing me with this (+ also the example
in the other post)! Unfortunately, I'm struggling a bit with my code
(I'm writing some Python classes), so it'll take a few days before I
begin on the SQL stuff...

I'll get back if the SQL code you suggested causes any problems - thank
you VERY much for both examples (in both posts)... I'll try it out ASAP,
once I've made my code object-oriented and well-organized :)

Thanks!
 

someone

You might consider using sqlite3 as a database manager
since it is "batteries included" with python ....

The stand-alone sqlite interpreter can first be used
to create an empty database named some.sql3
and create a table named xdata in that data base ....

sqlite3 some.sql3 '.read xdata_create.sql'

where the file xdata_create.sql contains ....

create table xdata
    (
    xdate integer ,
    xtime integer ,
    col1 real ,
    col2 integer ,
    col3 integer ,
    col4 real ,
    col5 integer ,
    col6 integer
    ) ;

Oh, thank you very much! Now I understand this (I haven't really worked
much with sql before, so this was/is new to me, thanks!).
The csv data file can then be inserted into the xdata table
in the some.sql3 database via python .... and ....
# python data selection example
# selecting col4 for times between 8 and 9 o'clock

I combined both code snippets into:

==============================
#!/usr/bin/python

import sqlite3 as DBM
import ipdb

fs = open( 'some.csv' )
ls = [ ]
dbc = DBM.connect( 'some.sql3' )
cur = dbc.cursor()
if 0:
    sql = 'insert into xdata values( ? , ? , ? , ? , ? , ? , ? , ? )'
    for row in fs :
        dt, col1, col2, col3, col4, col5, col6 = row.strip().split( ',' )
        xdate , xtime = dt.split( 'T' )
        xtuple = ( xdate, xtime, col1, col2, col3, col4, col5, col6 )
        cur.execute( sql , xtuple )
    dbc.commit()
else:
    list_sql = [
        'select xtime , col4' ,
        'from xdata' ,
        'where xtime >= 80000 and xtime <= 90000 ; ' ]
    str_sql = '\n'.join( list_sql )
    cur.execute( str_sql )
    for row in cur :
        #ipdb.set_trace()
        # I get: TypeError: "tuple indices must be integers, not str"
        # "ipdb> row" says: "(80000, 46120.0)"
        #print row[ 'xtime' ] , row[ 'col4' ]
        print row[0] , row[1]

fs.close()
dbc.close()
==============================

I don't fully understand it yet, but it's nice to see that it works!
Thank you very much for that! Now I'll have to concentrate on figuring
out how/why it works :)
You can be creative with the data selections
and pass them off to be plotted as needed ....

Yes, I understand. Thank you very much. As you can see, on my system I
had to use:

print row[0] , row[1]

instead of:

print row[ 'xtime' ] , row[ 'col4' ]

I'm not sure exactly why - is it because you have another version of
sqlite3 ? This is a bit strange, but anyway I can live with that - at
least for now...
If mysql is used instead of sqlite3
you should only have to monkey with
the data type declarations in xdata_create.sql
and the dbc.connect strings in the python code ....

Actually I didn't know anything about sqlite3 before now. This is
the first time I've tried it out, so I don't really know the
difference between sqlite3 and mysql...

But thank you very much for providing some code I can now study and
learn from !!! Much appreciated....
 

someone

This step can also be done in python
without using the stand-alone sqlite interpreter ....

Ah, that's great (and even better, since I don't have to create the
xdata_create.sql file) - thank you!

I collected what you wrote and put together this script:

=====================
#!/usr/bin/python

import sqlite3 as DBM
import ipdb

# ls = [ ] # this seems to be un-used ?
dbc = DBM.connect( 'some.sql3' )
cur = dbc.cursor()

fs = open( 'some.csv' )
if 0: # select whether to create new database file or query from it?
    if 0: # switch between "create table xdata" and "... if not exists"
        list_sql = [
            'create table xdata ',
            '( ',
            ' xdate integer , ',
            ' xtime integer , ',
            ' col1 real , ',
            ' col2 integer , ',
            ' col3 integer , ',
            ' col4 real , ',
            ' col5 integer , ',
            ' col6 integer ',
            ') ;' ]
    else:
        list_sql = [
            'create table if not exists xdata ' ,
            '( ' ,
            ' xdate integer , ' ,
            ' xtime integer , ' ,
            ' col1 real , ' ,
            ' col2 integer , ' ,
            ' col3 integer , ' ,
            ' col4 real , ' ,
            ' col5 integer , ' ,
            ' col6 integer ' ,
            ') ; ' ]
    # ---------------------------------------------
    str_sql = '\n'.join( list_sql )
    cur.execute( str_sql )
    # ---------------------------------------------
    # Insert data from input file fs ("some.csv")
    sql = 'insert into xdata values( ? , ? , ? , ? , ? , ? , ? , ? )'
    for row in fs :
        dt, col1, col2, col3, col4, col5, col6 = row.strip().split( ',' )
        xdate , xtime = dt.split( 'T' )
        xtuple = ( xdate, xtime, col1, col2, col3, col4, col5, col6 )
        cur.execute( sql , xtuple )
    dbc.commit()

else:
    list_sql = [
        'select xtime , col4' ,
        'from xdata' ,
        'where xtime >= 80000 and xtime <= 81104 ; ' ]
    str_sql = '\n'.join( list_sql )
    cur.execute( str_sql )
    for row in cur :
        #ipdb.set_trace()
        # I get: TypeError: "tuple indices must be integers, not str"
        # "ipdb> row" says: "(80000, 46120.0)"
        #print row[ 'xtime' ] , row[ 'col4' ]
        print row[0] , row[1]

fs.close()
dbc.close()
=====================

I think I can learn a lot from google, based on this code - I'm very
grateful for your help!

Now I just need to make a nice interface and couple it to matplotlib, so
it's easy to plot - I think there's a good chance that I can come up
with a good solution from here, based on the help I got from you people...

Thanks again!
 

Cousin Stanley

someone said:
As you can see, on my system I
had to use:

print row[0] , row[1]

instead of:

print row[ 'xtime' ] , row[ 'col4' ]

I'm not sure exactly why

The magic there is setting up the row_factory
after the database connection ....

dbc = DBM.connect( 'some.sql3' )

dbc.row_factory = DBM.Row
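
Put together, the name-based access looks like this -
a short sketch re-using the names from the earlier examples ....

import sqlite3 as DBM

dbc = DBM.connect( 'some.sql3' )

dbc.row_factory = DBM.Row

cur = dbc.cursor()

cur.execute( 'select xtime , col4 from xdata' )

for row in cur :

    # with the Row factory set, columns are reachable
    # by name as well as by integer index

    print row[ 'xtime' ] , row[ 'col4' ]

dbc.close()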
I don't really know the difference
between sqlite3 and mysql...

MySQL is used through a client/server system
where the db server is always running
and client processes submit requests to it
in the form of sql statements ....

SQLite is used as a stand-alone single process
with no external server involved ....

Both speak sql but there are some differences
mostly in data base connection strings
and data type declarations ....

Basic sql selection is ....

select these fields
from these files
where these conditions are met

And that part of sql doesn't vary much
among different data base managers ....
 

someone

someone said:
As you can see, on my system I
had to use:

print row[0] , row[1]

instead of:

print row[ 'xtime' ] , row[ 'col4' ]

I'm not sure exactly why

The magic there is setting up the row_factory
after the database connection ....

dbc = DBM.connect( 'some.sql3' )

dbc.row_factory = DBM.Row

Ah, thanks a lot - now it works! This is much more "user-friendly"...
MySQL is used through a client/server system
where the db server is always running
and client processes submit requests to it
in the form of sql statements ....

SQLite is used as a stand-alone single process
with no external server involved ....

Ok, I see... So SQLite is very good for "practicing"... I'll remember
that, thank you.
Both speak sql but there are some differences
mostly in data base connection strings
and data type declarations ....

Basic sql selection is ....

select these fields
from these files
where these conditions are met

And that part of sql doesn't vary much
among different data base managers ....

Great, thank you very much...

Looks like everything is on track now... I just have to sit and play
with it and make a good interface with matplotlib, but I think I should
be able to come up with something nice, based on the help I got in this
thread...

Thanks again... I just love this python language - makes it possible to
do so much, in so little time and without being an expert at all...
 

Cousin Stanley

someone said:
....
So SQLite is very good for "practicing"
....

Yes it is but it is also very good
for much more than just practice ....

Check the wikipedia info ....

http://en.wikipedia.org/wiki/Sqlite

"It is arguably the most widely deployed database engine,
as it is used today by several widespread browsers,
operating systems, and embedded systems, among others"

The firefox browser keeps different sqlite database files
for various uses ....

If you use firefox check its default directory
and you will see several files with .sqlite
file type extensions ....

Under debian linux ....

~/.mozilla/firefox/*.default

Many programmers, including pythonistas,
use sqlite for a convenient and persistent
data store where data can be stashed now
and used later in many different ways
through the diversity of sql selections ....

Thanks again ....

You're welcome ....
I just love this python language

Me too .... :)
 

someone

Yes it is but it is also very good
for much more than just practice ....

Check the wikipedia info ....

http://en.wikipedia.org/wiki/Sqlite

Very interesting...
"It is arguably the most widely deployed database engine,
as it is used today by several widespread browsers,
operating systems, and embedded systems, among others"

The firefox browser keeps different sqlite database files
for various uses ....

I should remember to use this in the future for my small apps...
If you use firefox check its default directory
and you will see several files with .sqlite
file type extensions ....

Under debian linux ....

~/.mozilla/firefox/*.default

You're right:

/home/myUser/.mozilla/firefox/pv079lxv.default/addons.sqlite
/home/myUser/.mozilla/firefox/pv079lxv.default/chromeappsstore.sqlite
/home/myUser/.mozilla/firefox/pv079lxv.default/content-prefs.sqlite
/home/myUser/.mozilla/firefox/pv079lxv.default/cookies.sqlite
/home/myUser/.mozilla/firefox/pv079lxv.default/downloads.sqlite
/home/myUser/.mozilla/firefox/pv079lxv.default/extensions.sqlite
/home/myUser/.mozilla/firefox/pv079lxv.default/formhistory.sqlite
/home/myUser/.mozilla/firefox/pv079lxv.default/permissions.sqlite
/home/myUser/.mozilla/firefox/pv079lxv.default/places.sqlite
/home/myUser/.mozilla/firefox/pv079lxv.default/search.sqlite
/home/myUser/.mozilla/firefox/pv079lxv.default/signons.sqlite
/home/myUser/.mozilla/firefox/pv079lxv.default/urlclassifier3.sqlite
/home/myUser/.mozilla/firefox/pv079lxv.default/webappsstore.sqlite

Very interesting, I didn't know that :)
Many programmers, including pythonistas,
use sqlite for a convenient and persistent
data store where data can be stashed now
and used later in many different ways
through the diversity of sql selections ....

I'll try to do this in the future also... I just have to practice a bit
more with the SQL commands, but now I can create, update, delete and
query, and I guess those are the most important things to know :)

Thanks for your help and for providing interesting background info :)
 

Steven D'Aprano

The firefox browser keeps different sqlite database files for various
uses ....

Yes, and I *really* wish they wouldn't. It's my number 1 cause of major
problems with Firefox. E.g.

http://kb.mozillazine.org/Bookmarks_history_and_toolbar_buttons_not_working_-_Firefox


Using a database for such lightweight data as bookmarks is, in my
opinion, gross overkill and adds to the complexity of Firefox. More
complexity leads to more bugs, e.g.:

https://bugzilla.mozilla.org/show_bug.cgi?id=465684#c11

https://bugzilla.mozilla.org/show_bug.cgi?id=431558


Please don't use a full-featured database if you don't need the overhead
of ACID compliance. And if you do, well, Sqlite is not fully ACID compliant.
 

someone

Yes, and I *really* wish they wouldn't. It's my number 1 cause of major
problems with Firefox. E.g.

http://kb.mozillazine.org/Bookmarks_history_and_toolbar_buttons_not_working_-_Firefox

Oh, sorry to hear that... Actually I mostly use chromium (but I also
have firefox installed). I just opened a few of my sqlite files from
within sqlite3 - it was very interesting to see the contents of these
files without being anything like a "hacker"... :)
Using a database for such lightweight data as bookmarks is, in my
opinion, gross overkill and adds to the complexity of Firefox. More
complexity leads to more bugs, e.g.:

https://bugzilla.mozilla.org/show_bug.cgi?id=465684#c11

https://bugzilla.mozilla.org/show_bug.cgi?id=431558

On the other hand, I guess it's in the spirit of "open source" that it's
easy for everyone to go in and see what's in the configuration files and
(if one wants) modify them and/or make one's own improvements/programs
that tamper with these sql-files ?
Please don't use a full-featured database if you don't need the overhead

Ok, you're saying there's overhead I should think about... Most of my
programs are rather small in comparison with commercial programs, so I
think I don't have to worry about overhead (I don't have any really
speed-critical applications).
of ACID compliance. And if you do, well, Sqlite is not fully ACID compliant.

I just had to google what ACID compliance means, and according to this:

http://en.wikipedia.org/wiki/SQLite

"SQLite is ACID-compliant and implements most of the SQL standard, using
a dynamically and weakly typed SQL syntax that does not guarantee the
domain integrity."

So you seem to disagree with wikipedia?

I however don't understand what it means "to not guarantee domain
integrity"... As I read this, I get the feeling that sqlite *IS* ACID
compliant (wikipedia however doesn't use the wording: "fully ACID
compliant", maybe this is the culprit) ?
 

Chris Angelico

I just had to google what ACID compliance means, and according to this:

http://en.wikipedia.org/wiki/SQLite

"SQLite is ACID-compliant and implements most of the SQL standard, using a
dynamically and weakly typed SQL syntax that does not guarantee the domain
integrity."

So you seem to disagree with wikipedia?

Disagreeing with Wikipedia doesn't mean much, but try this:

http://www.sqlite.org/atomiccommit.html

Note that there's a caveat: You have to tell SQLite to be ACID
compliant, effectively.
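
(The switch in question is essentially sqlite's synchronous PRAGMA - a
sketch, since which settings a given build actually needs depends on
its compiled-in defaults:)

import sqlite3 as DBM

dbc = DBM.connect( 'some.sql3' )

# synchronous = FULL makes commits wait for the data to reach disk;
# the rollback journal is what lets half-done transactions be undone
dbc.execute( 'PRAGMA synchronous = FULL ;' )
dbc.execute( 'PRAGMA journal_mode = DELETE ;' )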

ChrisA
 

someone

Disagreeing with Wikipedia doesn't mean much, but try this:

http://www.sqlite.org/atomiccommit.html

Ok, thanks - I didn't read it all, but interesting.
Note that there's a caveat: You have to tell SQLite to be ACID
compliant, effectively.

So, you're saying to me that by default SQLite isn't ACID compliant, if
I begin to use it in my own small programs?

I don't know so much about it - maybe it's a matter of definition... If
I just google for the 3 words: "sqlite acid compliance" I get:

Hit no. 1 is wikipedia.
Hit no. 3 says: "SQLite is an ACID-compliant embedded relational
database management system"
Hit no. 4 says: "SQLite implements ACID-compliance by way of a
transaction journal"
Hit no. 5 says: "SQLite transactions are fully ACID-compliant, allowing
safe access from.."
Hit no. 6 says: "Techopedia explains SQLite. SQLite is atomicity,
consistency, isolation, durability (ACID) compliant."
Hit no. 7: "Tell me what you know about SQLite, the ACID-compliant
embedded relational"
Hit no. 9: "SQLite is superior to Jet for the major reason that SQLite
is ACID-compliant whereas Jet, unfortunately, isn't..."
Hit no. 10: "SQLite for Linux 3.6.17. An ACID-compliant relational
database management system"

I think maybe it's a question of definitions, i.e. "well, Sqlite
is not fully ACID compliant" vs. all the google hits that just say
that sqlite is "ACID compliant"...

Do I understand you correctly, that by "You have to tell SQLite to be ACID
compliant, effectively", you're saying that by default SQLite isn't ACID
compliant ?

Next question: Is it something I should worry about in my own programs
(I'm not sure, I'm an SQL noob)... ?

Thanks.
 

Chris Angelico

So, you're saying to me that by default SQLite isn't ACID compliant, if I
begin to use it in my own small programs?
...
Do I understand you correct, that by "You have to tell SQLite to be ACID
compliant, effectively", you're saying that by default SQLite isn't ACID
compliant ?

First off: I am NOT inherently familiar with sqlite. I'm more familiar
with PostgreSQL, DB2, and MySQL. I'm also not an expert at database
engine design, so this discussion is from the point of view of an
applications developer who has used databases from his apps.

True ACID compliance demands support at every level:

1) The application has to operate in logical units of work, which -
apart from with DB2 - requires an explicit "BEGIN" query, or
single-statement transactions.

2) The database engine must employ some form of write-ahead log.
Different databases do this somewhat differently (according to the
page I linked to, SQLite does this in reverse, maintaining a log
that's sufficient to *undo* the transaction, while PostgreSQL does
this forwards, maintaining a log that's sufficient to *redo* it as
well - more effort, but it can be used for database replication), but
one way or another, there must be a way to detect half-done
transactions.

3) The operating system and filesystem must support a forced file
synchronization (fsync/fdatasync), so the database engine can wait for
the data to be written to disk.

4) The underlying media (hard disk, SSD, USB stick, etc) must respond
to the fsync call by actually writing the content to persistent
storage before returning.

Failure at any level means the overall system is not ACID compliant.
PostgreSQL has a huge amount of code in it to try to deal with (or at
least recognize) a level-3 failure, but nothing in the database engine
can deal with level 1 or 4 issues.

You'd have to actually test it. The easiest way is to get two
computers, side by side, and run the database engine on one and a
monitor on the other. To test some SSDs at work, I knocked together a
little program that worked somewhat thus:

* Connect to the database over TCP/IP (easy, as we were doing this
with PostgreSQL)
* Create a table with a number of rows with an ID and a counter,
initialized to 0
* Repeatedly, in parallel, perform a transaction:
- Increment the counter on one of the rows (at random)
- Increment a "possible" in-memory counter for that row
- Commit the database transaction
- Increment a "confirmed" in-memory counter for that row
* When an error of "database seems to be down" is detected, wait for
it to come up again, then query the table. The counters must all be at
least their corresponding "confirmed" value and at most the
"possible".

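A minimal single-threaded sketch of that loop (psycopg2 is assumed,
since we were on PostgreSQL; the table name and connection details are
made up):

import random
import psycopg2

ROWS = 10
possible  = [0] * ROWS   # bumped just before COMMIT
confirmed = [0] * ROWS   # bumped only after COMMIT returns

# assumes a pre-made table:
#   create table acid_test ( id integer primary key , n integer )
# with ROWS rows already inserted, all n = 0
dbc = psycopg2.connect( host='dbhost', dbname='test', user='me' )
cur = dbc.cursor()

try:
    while True:
        i = random.randrange( ROWS )
        cur.execute( 'update acid_test set n = n + 1 where id = %s', (i,) )
        possible[i] += 1       # this increment *might* now be durable
        dbc.commit()
        confirmed[i] += 1      # this increment *must* now be durable
except psycopg2.OperationalError:
    # database went down (plug pulled); after it restarts, each row
    # must satisfy  confirmed[i] <= n <= possible[i]
    print possible
    print confirmed
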
With that running, I simply pulled the plug on the database computer.
With a properly-configured hard disk, every one of the counters was
within its correct range. With a lying SSD, though, they could be
anywhere from "pretty close" (with a low workload - simulated by
having only a single thread doing transactions and having it sleep for
a few ms each iteration) to "pretty appalling" (with a bunch of
threads spinning tightly, keeping the workload high). Once the SSD
starts doing major write reordering, its throughput soars, but at the
cost of trustworthiness.
Next question: Is it something I should worry about in my own programs (I'm
not sure, I'm an SQL noob)... ?

Yes, it most certainly is. If you have any data that you care about,
put together some kind of test that will allow you to literally pull
the plug on the database, while still knowing whether or not your
transaction was completed (so you'll most likely need some kind of
"possible" / "confirmed" counter pair as I used above).

ChrisA
 

Roy Smith

Chris Angelico said:
2) The database engine must employ some form of write-ahead log.
[...]
one way or another, there must be a way to detect half-done
transactions.

3) The operating system and filesystem must support a forced file
synchronization (fsync/fdatasync), so the database engine can wait for
the data to be written to disk.

4) The underlying media (hard disk, SSD, USB stick, etc) must respond
to the fsync call by actually writing the content to persistent
storage before returning.

Some of the early Unix file systems were very fragile. One of the
(often under-appreciated) major advances in BSD (it was certainly in
4.2, not sure how much earlier) was a new filesystem which was much more
robust in the face of hardware failures and system crashes. Prior to
BSD, the on-disk data could be left in an inconsistent state if the
system crashed at the wrong time. In BSD, data was written to disk in
such a way that every operation could either be backed out cleanly or
had enough information to complete the transaction.
 

someone

First off: I am NOT inherently familiar with sqlite. I'm more familiar
with PostgreSQL, DB2, and MySQL. I'm also not an expert at database
engine design, so this discussion is from the point of view of an
applications developer who has used databases from his apps.

Ok, it would be nice to hear the opinion of an sqlite expert then...
True ACID compliance demands support at every level:

1) The application has to operate in logical units of work, which -
apart from with DB2 - requires an explicit "BEGIN" query, or
single-statement transactions.

2) The database engine must employ some form of write-ahead log.
Different databases do this somewhat differently (according to the
page I linked to, SQLite does this in reverse, maintaining a log
that's sufficient to *undo* the transaction, while PostgreSQL does
this forwards, maintaining a log that's sufficient to *redo* it as
well - more effort, but it can be used for database replication), but
one way or another, there must be a way to detect half-done
transactions.

3) The operating system and filesystem must support a forced file
synchronization (fsync/fdatasync), so the database engine can wait for
the data to be written to disk.

4) The underlying media (hard disk, SSD, USB stick, etc) must respond
to the fsync call by actually writing the content to persistent
storage before returning.

Ok.

Failure at any level means the overall system is not ACID compliant.

Roger... But google says sqlite is supposed to be ACID compliant
(although maybe not "fully" as you indicate, I'm not sure about this)...
PostgreSQL has a huge amount of code in it to try to deal with (or at
least recognize) a level-3 failure, but nothing in the database engine
can deal with level 1 or 4 issues.

You'd have to actually test it. The easiest way is to get two
computers, side by side, and run the database engine on one and a
monitor on the other. To test some SSDs at work, I knocked together a
little program that worked somewhat thus:

* Connect to the database over TCP/IP (easy, as we were doing this
with PostgreSQL)
* Create a table with a number of rows with an ID and a counter,
initialized to 0
* Repeatedly, in parallel, perform a transaction:
- Increment the counter on one of the rows (at random)
- Increment a "possible" in-memory counter for that row
- Commit the database transaction
- Increment a "confirmed" in-memory counter for that row
* When an error of "database seems to be down" is detected, wait for
it to come up again, then query the table. The counters must all be at
least their corresponding "confirmed" value and at most the
"possible".

Ok, that doesn't sound so simple after all...
With that running, I simply pulled the plug on the database computer.
With a properly-configured hard disk, every one of the counters was
within its correct range. With a lying SSD, though, they could be
anywhere from "pretty close" (with a low workload - simulated by
having only a single thread doing transactions and having it sleep for
a few ms each iteration) to "pretty appalling" (with a bunch of
threads spinning tightly, keeping the workload high). Once the SSD
starts doing major write reordering, its throughput soars, but at the
cost of trustworthiness.

Ok, it would be nice to hear/read the opinion of others in here
who've worked (a lot?) with sqlite...
Yes, it most certainly is. If you have any data that you care about,
put together some kind of test that will allow you to literally pull
the plug on the database, while still knowing whether or not your
transaction was completed (so you'll most likely need some kind of
"possible" / "confirmed" counter pair as I used above).

I'm not so rich, so I prefer to go for a free database solution rather
than an expensive license... I've heard good things about oracle and
that's also what they used at my previous company, but it's not
something I am willing to pay for out of my own money for my
spare-time projects...

Maybe what you've written explains why somebody got corrupted firefox
sqlite files... I'll just practice a bit more and remember your advice
about testing - at least for "important" projects, I'll remember how you
tested this by pulling the plug and monitoring the data...
 
