Improve reduce functions of SQLite3 request

S

Steffen Mutter

Hi,

I am writing some code to manage handball leagues more easy.
Problem:
MISSON:
Get single club ids glued together with the shortest teamname.

EXAMPLE:
SELECT homenr as nr, home as club FROM Runde20122013 WHERE
place="karlsruhe" UNION SELECT guestnr as nr, guest as club FROM
20122013 WHERE place="karlsruhe" GROUP BY nr LIMIT 10

ACTUAL RESULT:
359|TV Calmbach
21101|SG Heidel/Helm
21236|JSG Neuth/Büch
23108|TG Eggenstein
23108|TG Eggenstein 2
23109|TV Ettlingenw
23109|TV Ettlingenw 2
23112|TSV Jöhlingen
23112|TSV Jöhlingen 2
23112|TSV Jöhlingen 3

NEEDED RESULT:
359|TV Calmbach
21101|SG Heidel/Helm
21236|JSG Neuth/Büch
23108|TG Eggenstein
23109|TV Ettlingenw
23112|TSV Jöhlingen

the nr needs to be unique together with the shortest clubname returned
by the where clause.

Any hints how to get this done either with SQLite3 tecneeqs or python
functions.

Kind regards.
SMut
 
D

Dennis Lee Bieber

EXAMPLE:
SELECT homenr as nr, home as club FROM Runde20122013 WHERE
place="karlsruhe" UNION SELECT guestnr as nr, guest as club FROM
20122013 WHERE place="karlsruhe" GROUP BY nr LIMIT 10
the nr needs to be unique together with the shortest clubname returned
by the where clause.

Any hints how to get this done either with SQLite3 tecneeqs or python
functions.
Untested:

SELECT DISTINCT * from
(select homenr as nr, home as club FROM Runde20122013
WHERE place="karlsruhe"
UNION SELECT guestnr as nr, guest as club FROM 20122013
WHERE place="karlsruhe")
limit 10
 
S

Steffen Mutter

Dennis said:
Untested:

SELECT DISTINCT * from
(select homenr as nr, home as club FROM Runde20122013
WHERE place="karlsruhe"
UNION SELECT guestnr as nr, guest as club FROM 20122013
WHERE place="karlsruhe")
limit 10

Hi Dennis,

here the output of your suggested solution:
SELECT DISTINCT * FROM (
SELECT HeimNr as nr, Heim as club FROM Runde20122013
WHERE kreis ="karlsruhe"
UNION
SELECT GastNr as nr, gast as club FROM Runde20122013
WHERE kreis ="karlsruhe") LIMIT 10;

359|TV Calmbach
21101|SG Heidel/Helm
21236|JSG Neuth/Büch
23108|TG Eggenstein
23108|TGEggenstein 2 <-
23109|TV Ettlingenw
23109|TV Ettlingenw 2 <-
23112|TSV Jöhlingen
23112|TSV Jöhlingen 2 <-
23112|TSV Jöhlingen 3 <-

Still not like what I'm looking for.
Maybe I should iterate through the list, pick out the nr and look for
the club stick it to a new list and leave out those ones, with the longe
r club name...
 
D

Dave Angel

On 02/04/2013 09:30 AM, Steffen Mutter wrote:
359|TV Calmbach
21101|SG Heidel/Helm
21236|JSG Neuth/Büch
23108|TG Eggenstein
23108|TGEggenstein 2 <-
23109|TV Ettlingenw
23109|TV Ettlingenw 2 <-
23112|TSV Jöhlingen
23112|TSV Jöhlingen 2 <-
23112|TSV Jöhlingen 3 <-

Still not like what I'm looking for.
Maybe I should iterate through the list, pick out the nr and look for
the club stick it to a new list and leave out those ones, with the longe
r club name...

Perhaps you should define how you choose to match a shorter & longer
club name as "equivalent". For example, why isn't 'E' the shorter name
to for Ettlingneow and Eggenstein ?

Are you maybe just throwing everything away after the first blank? Or what?
 
P

Peter Otten

Steffen said:
Hi Dennis,

here the output of your suggested solution:
SELECT DISTINCT * FROM (
SELECT HeimNr as nr, Heim as club FROM Runde20122013
WHERE kreis ="karlsruhe"
UNION
SELECT GastNr as nr, gast as club FROM Runde20122013
WHERE kreis ="karlsruhe") LIMIT 10;

359|TV Calmbach
21101|SG Heidel/Helm
21236|JSG Neuth/Büch
23108|TG Eggenstein
23108|TGEggenstein 2 <-
23109|TV Ettlingenw
23109|TV Ettlingenw 2 <-
23112|TSV Jöhlingen
23112|TSV Jöhlingen 2 <-
23112|TSV Jöhlingen 3 <-

Still not like what I'm looking for.
Maybe I should iterate through the list, pick out the nr and look for
the club stick it to a new list and leave out those ones, with the longe
r club name...

How about

SELECT nr, min(club) FROM (
SELECT HeimNr as nr, Heim as club FROM Runde20122013
WHERE kreis ="karlsruhe"
UNION
SELECT GastNr as nr, gast as club FROM Runde20122013
WHERE kreis ="karlsruhe")
GROUP BY nr;

However, I'm smelling a data normalization issue. It looks like you are
interested in the club, but that there may be multiple teams per club and
you are trying to derive the club from the team name. If that's the case you
should consider a database layout with tables similar to the following (* to
mark primary keys)

matches
-------
matchID*, hometeamID, guestteamID, ...

teams
-----
teamID*, clubID, teamname, ...

clubs
-----
clubID*, clubname, ...

With such a layout you could get all clubs with (untested, of course)

SELECT clubs.clubId, clubs.clubname FROM (
SELECT hometeamID as teamID from matches
UNION
SELECT guestteamID as teamID from matches) as participants
INNER JOIN teams on teams.teamID = participants.teamID
INNER JOIN clubs on teams.clubID = clubs.teamID;

I hope you can make sense of it...
 
D

Dennis Lee Bieber

here the output of your suggested solution:
SELECT DISTINCT * FROM (
SELECT HeimNr as nr, Heim as club FROM Runde20122013
WHERE kreis ="karlsruhe"
UNION
SELECT GastNr as nr, gast as club FROM Runde20122013
WHERE kreis ="karlsruhe") LIMIT 10;

359|TV Calmbach
21101|SG Heidel/Helm
21236|JSG Neuth/Büch
23108|TG Eggenstein
23108|TGEggenstein 2 <-
23109|TV Ettlingenw
23109|TV Ettlingenw 2 <-
23112|TSV Jöhlingen
23112|TSV Jöhlingen 2 <-
23112|TSV Jöhlingen 3 <-

Still not like what I'm looking for.

I presume you mean the " 2" and " 3" on those entries should not be
considered when matching "distinct".

I suspect you have a poorly normalized database (what does that
trailing number identify? Heck, are the leading initials unique to the
subsequent name?). The trailing number should probably be something
stored as a separate field. If the initials are unique, they should be a
separate field used as a foreign reference to retrieve the longer name.

For MySQL, MAYBE (there is a single space in the ' ')

SELECT DISTINCT nr, substring_index(cb, ' ', 2) as club FROM (
SELECT HeimNr as nr, Heim as cb FROM Runde20122013
WHERE kreis ="karlsruhe"
UNION
SELECT GastNr as nr, gast as cb FROM Runde20122013
WHERE kreis ="karlsruhe") LIMIT 10;

This function keeps only the part before the 2nd space (if such is
found) in the club name.

Oh wait -- you are using SQLite3... SQLite3 doesn't have the
substring_index function, and I don't even find an index function.
However, you CAN write a Python function to do the equivalent, and
register it with SQLite3 so that it can be called from within the SQL
statement. I'm not going to demonstrate that, though it may be better
than another alternative:

SELECT DISTINCT nr, min(cb) as club FROM (
SELECT HeimNr as nr, Heim as cb FROM Runde20122013
WHERE kreis ="karlsruhe"
UNION
SELECT GastNr as nr, gast as cb FROM Runde20122013
WHERE kreis ="karlsruhe")
GROUP BY nr
LIMIT 10;

Let's see...
E:\UserData\Wulfraed\My Documents>sqlite3 demo
SQLite version 3.6.3
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> create table test
...> (
...> ID integer primary key autoincrement,
...> nr integer,
...> club varchar(50)
...> );
sqlite> insert into test (nr, club) values (359, "TV Calmbach");
sqlite> insert into test (nr, club) values (21101,"SG Heidel/Helm");
sqlite> insert into test (nr, club) values (23112,"TSV Johlingen 3");
sqlite> insert into test (nr, club) values (23109,"TV Ettlingenw 2");
sqlite> insert into test (nr, club) values (23112,"TSV Johlingen");
sqlite> insert into test (nr, club) values (23109,"TV Ettlingenw");
sqlite> insert into test (nr, club) values (23108,"TG Eggenstein");
sqlite> insert into test (nr, club) values (21236,"JSG Neuth/Buch");
sqlite> insert into test (nr, club) values (23112,"TSV Johlingen 2");
sqlite> insert into test (nr, club) values (23108,"TG Eggenstein 2");
sqlite> .mode column
sqlite> .head on
sqlite> .width 10, 25, 25
sqlite> select * from test;
ID nr club
---------- ------------------------- -------------------------
1 359 TV Calmbach
2 21101 SG Heidel/Helm
3 23112 TSV Johlingen 3
4 23109 TV Ettlingenw 2
5 23112 TSV Johlingen
6 23109 TV Ettlingenw
7 23108 TG Eggenstein
8 21236 JSG Neuth/Buch
9 23112 TSV Johlingen 2
10 23108 TG Eggenstein 2
sqlite> select nr, min(club) from test group by nr;
nr min(club)
---------- -------------------------
359 TV Calmbach
21101 SG Heidel/Helm
21236 JSG Neuth/Buch
23108 TG Eggenstein
23109 TV Ettlingenw
23112 TSV Johlingen
sqlite>

Don't even need the DISTINCT with the GROUP BY.
 
S

Steffen Mutter

Dennis said:
I suspect you have a poorly normalized database (what does that
trailing number identify? Heck, are the leading initials unique to the
subsequent name?). The trailing number should probably be something
stored as a separate field. If the initials are unique, they should be a
separate field used as a foreign reference to retrieve the longer name.

There's much more stuff in the table of the database, but these ones
does not matter, the table Runde20122013 stores all the data needed for
gameplay during this season and is made to handle ALL leagues in germany
and if you enter another federation name as 'dhb' it could handle even
more...
Jutst to explain: the numbers define the team(s) of the club in a
special category (male/female, youth E,D,C,B,A adults, senior), playing
in a league.
Those leagues matter in the where clause, so the area
'karlsruhe' represents one of the lowest areas a team can play in -
called 'Kreisklasse'.
So, if a club has only team 2 playing here, and
the team (1) is playing higher, the number 2 needs to remain here.
Just to give you a peek on what I am doing:
https://handball.ws/generator.html has sometimes over 50.000 clicks in
a weekend - and manages it very nicely with a very small hetzner
shared server running lighttpd and python cgi.

At home I am working on a new bugfixed version which I hopefully get
online before planning for season 20132014 starts (April)

-- SNIP --
sqlite> select nr, min(club) from test group by nr;
nr min(club)
---------- -------------------------
359 TV Calmbach
21101 SG Heidel/Helm
21236 JSG Neuth/Buch
23108 TG Eggenstein
23109 TV Ettlingenw
23112 TSV Johlingen
sqlite>

Don't even need the DISTINCT with the GROUP BY.

Woha!
Why didn't I get this out by myself?
So, let's see what my machine spits out:

SQL:
SELECT nr, min(club) FROM (
SELECT HeimNr as nr, Heim as club
FROM Runde20122013 WHERE kreis = "karlsruhe"
UNION SELECT GastNr as nr, gast as club
FROM Runde20122013 WHERE kreis= "karlsruhe")
GROUP BY nr;

REPLY:
359|TV Calmbach
21101|SG Heidel/Helm
21236|JSG Neuth/Büch
23108|TG Eggenstein
23109|TV Ettlingenw
23112|TSV Jöhlingen
23113|HC Karlsbad
23114|MTV Karlsruhe
23115|Post Südst KA
23117|TSV Bulach
23118|TS Durlach
23119|TV Knielingen
23120|TS Mühlburg
23121|TG Neureut
23122|TSV Rintheim
23123|TUS Rüppurr
23124|SV Langenstb.
23125|FV Leopoldshfn
23126|TV Malsch
23128|TV Wössingen
23130|HSG Ettl/Bruch
23132|HSG Li-Ho-Li
23133|HSG PSV/SSC KA
23136|HSG Ri/Wei/Grö 2 <--- PERFECT!!!
23138|HSG Wei/Grö
23231|SG Stutensee
23234|KIT SC 2010
23251|SG MTV/Bulach
23503|SG RüBu
23516|SG Malsch/Ettl
25149|HC Neuenbürg
25201|SG PF/Eutingen
25224|HSG Pforzheim
25232|JSG Goldst. P

I need to glue home/guest aka heim/gast together, because there are K.O.
leagues, where a team may play only as guest, before kicked out of the
game (Amateur Deutschlandpokal)

HSG Rintheim-Weingarten-Grötzingen 2nd team plays in Karlsruhe, the 1st
one plays Badenliga.

If you come to Karlsruhe one day, send me an email, I take you out for a
beer or so...

Kind regars,
Steffen (very happy)

Steffen
 
D

Dennis Lee Bieber

There's much more stuff in the table of the database, but these ones
does not matter, the table Runde20122013 stores all the data needed for
gameplay during this season and is made to handle ALL leagues in germany
and if you enter another federation name as 'dhb' it could handle even
more...
Jutst to explain: the numbers define the team(s) of the club in a
special category (male/female, youth E,D,C,B,A adults, senior), playing
in a league.

Which does, to me, imply an unnormalized database. The
"team/category" should be a separate field.

club(*ID*, name)

team(*ID*, /club/, category)

{where *..* is primary key, /../ is a foreign key}

and league should probably be another table (and "club" may have a
/league/ entry).

Yes, that would imply using JOINs to get a full identifier, and
maybe more complex GROUP BY terms...
Woha!
Why didn't I get this out by myself?

Well, I only got the idea after someone else mentioned using min();
but I don't recall if they had GROUP BY in that suggestion. GROUP BY
ensures the min() only applies when the "nr" is the same.
 
S

Steffen Mutter

Dennis said:
Which does, to me, imply an unnormalized database. The
"team/category" should be a separate field.
club(*ID*, name)

team(*ID*, /club/, category)

{where *..* is primary key, /../ is a foreign key}

You are right, but as I mentioned above I had to use some data stored
somwhere, strip it out of html, seperate it and store it in a database,
where it is useful.
To be specific: the only thing really needed is the club number. Which
team plays in the specific league is not really important, cause a club
can only have one team in a league.
and league should probably be another table (and "club" may have a
/league/ entry).

The way I did it is this:
store all data for a complete season in one table. There are more tables
especially for the gyms or stadium data if more information is requested
by the user.
My plan is to do it better and user friendly than the commercial stuff,
but I need their results and design to get my thing to work. I offer the
game-plans as html for direct viewing and cvs and pdf files for
download - onother feature no commercial site offers.
Yes, that would imply using JOINs to get a full identifier, and
maybe more complex GROUP BY terms...

Certainly a more complex normalization would imply a complete
reprogramming of all SQL-requests send to the DB. I am a great fan of SQ
Lite, because it is fast, easy to use and very easy to maintain and
backup. (just a nice rsync job backups all without any hassle)

I prefer you help next time I run into a problem I can't fix on my own
:)
Well, I only got the idea after someone else mentioned using min();
but I don't recall if they had GROUP BY in that suggestion. GROUP BY
ensures the min() only applies when the "nr" is the same.

I played around with min() already but I got stuck with the 'DISTINCT'
at the beginning so I got only a single result. This SQL spell you were
casting for me gave me a deeper understanding, how SQL works and what
can be done with it. Thanks :)

Actually I am fixing bugs and think about the user managment and
permission handling and privacy of data.

Just give you a peek:
referees need to be kept safe.
But they need to be planned and stored in the DB.
How to handle this?
I think I use GnuPG.
Every referee and referee-date-planner needs a private and public key so
only these persons can see it, the planner every referee he dates and
the referee only these games he is planned for.

You see: a lot of work to do - after fixing some bugs and get the
<select id="team" size=7 multiple> to work, that will be a big task...
 
D

Dennis Lee Bieber

Actually I am fixing bugs and think about the user managment and
permission handling and privacy of data.

Just give you a peek:
referees need to be kept safe.
But they need to be planned and stored in the DB.
How to handle this?
I think I use GnuPG.
Every referee and referee-date-planner needs a private and public key so
only these persons can see it, the planner every referee he dates and
the referee only these games he is planned for.

You see: a lot of work to do - after fixing some bugs and get the
<select id="team" size=7 multiple> to work, that will be a big task...

If this is meant to be a web-based system, you probably should be
looking into using some sort of framework: Django, Zope/Plone, etc.
These often have at least two classes of users (controlling what
operations they are permitted to perform), if not more, with session
management controls.

Most of these frameworks use some form of object-relational mapper
(ORM) to go between Python code objects (each table is a class, and each
record is an object instance, to put it is very general terms) and the
database itself.

Problem with going that route is that you really need to normalize
the database (as soon as you said you just dump everything into one
table I shuddered), since a lot of the capability is tied to using
foreign key linkages to filter data.

You shouldn't need the public key cryptography system since the
application itself should be able to control visibility of the data
using some privilege table and a log-in identifier. Using PKCS to
encrypt data would mean the creator of that data would be unable to edit
it, as they'd need the private key to decrypt it for editing. Not to
mention you'd need to have a way of transmitting the keys to the
appropriate people. If you kept the keys on the database, you'd then
need a secure way to ensure the proper key for the user was picked --
anything that secure could be used to control the direct access to the
data itself without encrypting it.
 
S

Steffen Mutter

Hi Dennis,
If this is meant to be a web-based system, you probably should be
looking into using some sort of framework: Django, Zope/Plone, etc.

I tried both frameworks but I always see, that these frameworks
improve the progress to get things done but also limit what you can do
and how you can do it.
I really want to code all by myself:
I want a browser user interface based on HTML5, jQuery and jQueryUI, CSS
on the client side, JSON to submit the data and all stuff to display,
alter and modify is almost completely done on the client.
So the server itself just waits for something to do (en-/decrypt the dat
a for the referees for example)
I've seen those commercial handball sites use something like TYPO3,
they can't even get the easiest things to work I've been asking for,
even their sites are not SSL secured.
The userinterface I offer to my users is specially designed for the
needs of handball play.
There are still a lot of features missing (my competitors don't even
think about) so if you return as a user to the page using HTML5s
session- or localstorage - it would improve the usage of the site
again, so the user sees automatically the club, team and league he's
interested in, stores his GnuPG password in his browser...
Most of these frameworks use some form of object-relational mapper
(ORM) to go between Python code objects (each table is a class, and each
record is an object instance, to put it is very general terms) and the
database itself.

I am not very good in object oriented programming yet, I'm just a
hobbyist. I bet you would losse patience if you'd see my coding...
Problem with going that route is that you really need to normalize
the database (as soon as you said you just dump everything into one
table I shuddered), since a lot of the capability is tied to using
foreign key linkages to filter data.

I don't think normalization would really improve the speed on getting
the data needed for the user interface, would it?
You shouldn't need the public key cryptography system since the
application itself should be able to control visibility of the data
using some privilege table and a log-in identifier.

No it should not. Not even I as the 'master' of the system shall not be
able to see the referee stuff - see, if the club got teams in higher
leagues some guys would certainly like to pay referees to lead the game
in a certain direction...
That's why I thought about using GnuPG...
It's quite easy to use with a plugin, available as a tarball.
http://packages.python.org/python-gnupg/
I got very exited when I did some tests, but found out that I need a lot
of entropy to work properly. So I summoned a haveged deamon on my
systems (available as Debian Squeeze backport)
Using PKCS to
encrypt data would mean the creator of that data would be unable to edit
it, as they'd need the private key to decrypt it for editing.

That's indeed my idea:
The GnuPG software on the server handles the keyrings
and all the decryption for the users.
I will force them to use a password for their private
key, when they initially set up their pub/sec rings so the server can
decrypt it for them:

They will be asked for their private key when needed the session is SSL
secured (I use CAcert as CA) so no externals can spy their password.

Okay, I could do some server side scripting to create a save-passwords
tool, but that would break user privacy and make my efforts useless.
It thought about doing it this way: The keyring will reside in a
directory. Every user gets his own directory inside of it for
the keys, named as their userID. So is easy to pick the right key
for the logged in user, isn't it?
Not to
mention you'd need to have a way of transmitting the keys to the
appropriate people.

Users can get download their individual pub/sec-keyring via https.
If you kept the keys on the database, you'd then
need a secure way to ensure the proper key for the user was picked --
anything that secure could be used to control the direct access to the
data itself without encrypting it.

Er - but the data in the database would be readable, wouldn't it?
That's exactly not what I want...

Maybe my thinking is absolute nonsense:
If you have an idea, how to get it done in an easier way, there are
only 2 things to get done:
- data in the database needs to be kept save
- even I can't see it (without breaking spying out the users)
- only the users who need it for gameplay can see it

I am listening
:)
 
D

Dennis Lee Bieber

I want a browser user interface based on HTML5, jQuery and jQueryUI, CSS
on the client side, JSON to submit the data and all stuff to display,
alter and modify is almost completely done on the client.

I'll get back to that "alter and modify" later...
So the server itself just waits for something to do (en-/decrypt the dat
a for the referees for example)
I've seen those commercial handball sites use something like TYPO3,
they can't even get the easiest things to work I've been asking for,
even their sites are not SSL secured.

I'll confess that I've not looked at any such sites -- mainly
because I wouldn't understand enough about the sport to understand why
one would do something one way or another. I'm mainly responding from
the point of view of a general software developer with a bit too much
understanding of relational database theory.
I don't think normalization would really improve the speed on getting
the data needed for the user interface, would it?
It may not improve initial speed -- but ideally it should in the end
simplify the data you have to maintain. Any time you have a table design
that requires duplicating data from record to record, you run a risk of
inconsistent data on input.
No it should not. Not even I as the 'master' of the system shall not be
able to see the referee stuff - see, if the club got teams in higher
leagues some guys would certainly like to pay referees to lead the game
in a certain direction...
That's why I thought about using GnuPG...
It's quite easy to use with a plugin, available as a tarball.
http://packages.python.org/python-gnupg/
I got very exited when I did some tests, but found out that I need a lot
of entropy to work properly. So I summoned a haveged deamon on my
systems (available as Debian Squeeze backport)


That's indeed my idea:
The GnuPG software on the server handles the keyrings
and all the decryption for the users.
I will force them to use a password for their private
key, when they initially set up their pub/sec rings so the server can
decrypt it for them:

Here we get to your "alter and modify". The originator (the person
using the "public" key of a key pair) will NOT be able to come back
later and modify the data -- only the owner of the "secret" key can
decrypt it.

You've created a "write once" data system. The only operations the
originator will be able to perform are to delete the record, or replace
the contents blindly (and anyone with delete/replace privileges could do
that unless you implement some sort of database privilege scheme that
doesn't use the PKCS system to lock records to the originator user and
to the destination [reading] user).

Given those restrictions, you might as well just email the encrypted
data to the reader and forget about storing it anywhere.
They will be asked for their private key when needed the session is SSL
secured (I use CAcert as CA) so no externals can spy their password.
In other words -- you are going to use a single session symmetric
encryption system just to transmit a passphrase to unlock a PKCS key.
AND you want the server keyrings to have both public and secret keys for
each user... If I were paranoid, I sure wouldn't trust that. When PGP
was originally released, the recommendation was that the key-rings
should not even be on the computer hard-drive, but on a floppy that the
user would insert only long enough to perform an encryption/decryption
operation -- and PGP then went and wiped any memory that the key had
passed through. GnuPG, as I recall, is an open source replacement based
on PGP after PGP became a commercial product. Only public keys were ever
given to anyone else.
 
S

Steffen Mutter

Hi Dennis,

I really appreciate your input :)
I'll confess that I've not looked at any such sites -- mainly
because I wouldn't understand enough about the sport to understand why
one would do something one way or another.

You better do not. The first one I had to use was horrible in design and
there was only one person doing the programming stuff who stopped
development. So the leagues of the 'Badischer Handball Verband' went to
a commercial solution, done by the 'Württembergischer Handball Verband'.
I'm mainly responding from
the point of view of a general software developer with a bit too much
understanding of relational database theory.

Which is fine, I'll get to that aspect later, you showed me a row where
I have a duplicate field, which is not needed at all.
It may not improve initial speed -- but ideally it should in the end
simplify the data you have to maintain. Any time you have a table design
that requires duplicating data from record to record, you run a risk of
inconsistent data on input.

Okay, let's have a look at the scheme.
Every row represents all the data needed to display a single game fully,
here it is:
tables for each season are named like this: Runde[begin-year][end-year]
so:
CREATE TABLE "Runde20122013" (
"Spiel" INTEGER PRIMARY KEY,
"Staffel" TEXT NOT NULL,
"Datum" TEXT,
"Zeit" TEXT,
"Halle" INTEGER,
"Heim" TEXT,
"Gast" TEXT,
"HeimNr" INTEGER,
"GastNr" INTEGER,
"Termin" TEXT, <--- OOPS! Thank you for pointing on the design!!!
"HT" INTEGER,
"GT" INTEGER,
"HP" INTEGER,
"GP" INTEGER,
"melderID" INTEGER,
"bemerkung" TEXT,
"verband" TEXT,
"regional" TEXT,
"landes" TEXT,
"kreis" TEXT,
"schiriID" INTEGER,
"editorID" INTEGER,
"lastchange" TEXT );

The 'Spiel' column holds the number of the game, it's the primary key
which could be used to join data later for a game report, where the
players and the scores they did are inserted by their trainers.
Not implemented yet...

So 'Staffel' = league. There are many similar leagues with the exactly
the same name all over the table, so I need some other rows to get
things sorted correctly.
Together with 'verband','regional','landes','kreis' you can pick the
correct position of this special league in the command structure of
the german handball federation called 'dhb'.

/dhb/shv/baden/karlsruhe/mJC-KL1 is my team I train.

Halle is the ID-Number of
the place where the game is played.

Datum = date, Time = start time of the game.
And you are right, I have done a double insert of those two rows in a
third row: 'Termin' - I should remove this one, because sorting is
easy done by ORDER BY Datum, Zeit - instead ORDER BY Termin, I actually
use.

lastchange represents the last timestamp when the last modification
was done.

melderID represents the one (man or machine), who inserted
the result of the game.

editorID the last real personID who changed
someting.
bemerkung = remarks is for the editor what was edited.
SchiriID is the column which will be changed to TEXT so it can hold
the GnuPG text. This is a required field, depending in how the remark
looks like an icon is displayed to sign the viewer that there is
something he should take care about.

Okay, I see, I have to do some changes:
Remove (Termin) from the table.
Alter schiriID to TEXT the get the GnuPG stuff in there.
(Besides that I remember SQLite does not really care about that)
Here we get to your "alter and modify". The originator (the person
using the "public" key of a key pair) will NOT be able to come back
later and modify the data -- only the owner of the "secret" key can
decrypt it.

I think I will do it this way:
Let's assume the planner for the referees for all
places in 'karlsruhe' logs in.
(Meaning all those places with the Halle ID between 23000-23999 in the
league-area 'karlsruhe' will be provided with referees by him - see, a
lot of data needed and stored in extra tables)

So, as soon as he clicks on the button to do
'referee planning' he will be asked for his private-GnuPG to decrypt
the data.
The data will be displayed with ALL games in those places,
even the games with a higher level he CAN NOT edit - those referee
sets will hold a closed 'lock' as a sign or something like that.

So, let's say he has a complete blank referee section and starts
planning (it will be about August 2013, just to mention) the system
offers him the matching referees who can be used for this game in a
select box.
The referee must match these requirements:
permission to lead he lead this game in hierarchy?
he's not a member of a club involved in this game
can lead only a max. number of 3 games a day
So, he picks a referee, the save button appears in the box. Nothing
happens until he klicks on it to store the data.
When he does, the schiriID TEXT will be encrypted by GnuPG using his AND
the referees public key.

Same thing happens, when a referee logs in and wants to see the games he
was already planned for:
asking for the unlock phrase, accessing the db,
decrypting the data and displaying it.
I have no idea about the performing issue yet - but I think it's worth a
try :)
I think it will perform nicely for the bhv (country section) maybe for
shv(region section) - but if ALL german referees want to get
their individual games plan and all trainers, players come to my place
too and want it as a PDF file generated. Wooooo....
Disk store caching needed...
You've created a "write once" data system. The only operations the
originator will be able to perform are to delete the record, or replace
the contents blindly (and anyone with delete/replace privileges could do
that unless you implement some sort of database privilege scheme that
doesn't use the PKCS system to lock records to the originator user and
to the destination [reading] user).

See above. Use more public keys to encrypt the data and more people can
see it.
Given those restrictions, you might as well just email the encrypted
data to the reader and forget about storing it anywhere.

That would imply the referees have basic knowledge about PGP or GnuPG
and how to use it. I prefer the way to give them a system, anyone can
use as long as they remember their passwords.
In other words -- you are going to use a single session symmetric
encryption system just to transmit a passphrase to unlock a PKCS key.
AND you want the server keyrings to have both public and secret keys for
each user... If I were paranoid, I sure wouldn't trust that.

That is always the same I run into:
The referee stuff needs to be secure and safe.
But needs to be accessible by the persons who need access to it.
Seems to be a classic 'hen and egg' problem - what's first...

The thing is, that I could give the handball.db to anyone, noone could
get the referee data readable without having the matching private keys.
When PGP
was originally released, the recommendation was that the key-rings
should not even be on the computer hard-drive, but on a floppy that the
user would insert only long enough to perform an encryption/decryption
operation -- and PGP then went and wiped any memory that the key had
passed through. GnuPG, as I recall, is an open source replacement based
on PGP after PGP became a commercial product. Only public keys were ever
given to anyone else.

That is meant to keep your personal data safe like programs, credit
card, bank balances, pictures noone should see - whatever.
Not so for my system I am planning: as soon as the game is played
noone would want to pay the referee for extras :)
 
D

Dennis Lee Bieber

Okay, let's have a look at the scheme.
Every row represents all the data needed to display a single game fully,
here it is:
tables for each season are named like this: Runde[begin-year][end-year]
so:
CREATE TABLE "Runde20122013" (

Is that table name specifying a playing season? What happens next
season -- you create a new table and duplicate all the code or edit the
code to use the new name (and lose all access to the previous seasons)?

The playing season should be a field in the table so that you don't
have to change the application each year.
"Spiel" INTEGER PRIMARY KEY,
"Staffel" TEXT NOT NULL,
"Datum" TEXT,
"Zeit" TEXT,
"Halle" INTEGER,
"Heim" TEXT,
"Gast" TEXT,
"HeimNr" INTEGER,
"GastNr" INTEGER,

These four items appear, to me, to be a weakness. They should,
(again) to me, be a pair of foreign key references (to the same single
table). (warning, the following is pseudo code, not real SQL -- also,
I'm going to be mixing your German names with English names where I'm
creating something totally new)

"HeimID" integer foreign key references Teams (ID),
"GastID" integer foreign key references Teams (ID),

where

create table Teams
(
ID integer primary key autoincrement,
Name text,
Number integer,
unique index (Name, Number)
)

You have only one place where a team name&number appears, and a
combination appears only once. This avoids the possibility of someone
changing "Heim", say, but not changing "HeimNr" at the same time. It
also ensures that you don't have different spellings appear for the same
team.

"Heim" and "Gast" are just the role the teams are taking for that
game; but overall there is no difference between teams, so just one
Teams table. A Teams table could also be expanded to hold information
about where the team is based (a contact address, say); and gives you
something your player list could link to

create table TeamPlayers
(
ID integer primary key autoincrement,
TeamID integer foreign key references Teams(ID),
PlayerID integer foreign key references Players(ID),
StartDate date not null #date the player started on this team
EndDate date default null #date player leaves team
)

create table Players
(
ID integer primary key autoincrement,
Name text,
Address text,
Phone text,
otherstuff
)

"Termin" TEXT, <--- OOPS! Thank you for pointing on the design!!!
"HT" INTEGER,
"GT" INTEGER,
"HP" INTEGER,
"GP" INTEGER,
"melderID" INTEGER,
"bemerkung" TEXT,
"verband" TEXT,
"regional" TEXT,
"landes" TEXT,
"kreis" TEXT,
"schiriID" INTEGER,
"editorID" INTEGER,
"lastchange" TEXT );

The 'Spiel' column holds the number of the game, it's the primary key
which could be used to join data later for a game report, where the
players and the scores they did are inserted by their trainers.
Not implemented yet...

create table PlayerScores
(
ID integer primary key autoincrement,
SpielID integer foreign key references Rundes(Spiel),
PlayerID integer foreign key references Players(ID),
Score integer not null
)
So 'Staffel' = league. There are many similar leagues with the exactly
the same name all over the table, so I need some other rows to get
things sorted correctly.

Just like I did with Teams above -- this should be

StaffelID integer foreign key references Leagues(ID)

with

create table Leagues
(
ID integer primary key autoincrement,
Name text,
whatever-else-is-needed #to fully identify the specific league
)

(oh, by the way, I think what you are calling "rows" are normally called
"columns"; a "row" is one record with everything in the schema)
Together with 'verband','regional','landes','kreis' you can pick the
correct position of this special league in the command structure of

Okay, those are probably the "whatever-else-is-needed".
the german handball federation called 'dhb'.

/dhb/shv/baden/karlsruhe/mJC-KL1 is my team I train.

Halle is the ID-Number of
the place where the game is played.

Since you say "ID-Number", this is probably a candidate for another
foreign key to a table of "Halles" (which could hold a formal name, the
address, etc.)
lastchange represents the last timestamp when the last modification
was done.

melderID represents the one (man or machine), who inserted
the result of the game.

editorID the last real personID who changed
someting.
bemerkung = remarks is for the editor what was edited.

These all come across as fields that should be handled by the
application, and never entered/changed by users (except the remarks
entry)...

See above. Use more public keys to encrypt the data and more people can
see it.

Okay -- looking at GPG sites does indicate it can do multi-recipient
encryption -- but that's likely to result in very large blocks of
cypher-text (depending on what it uses internally -- one copy of the
message per recipient, or a smaller one-time key that is encrypted for
each and then used to decrypt the single message.

I'd still prefer to use, as much as possible, the database system
itself to handle this task...

Stuff the "private" information into another table, and use an
intersection table to link it to those permitted to see it, and to the
particular game (I could over normalize this, but won't)

create table PrivateViewers
(
ID integer primary key autoincrement,
MessageID integer foreign key references Messages(ID),
ViewerID integer foreign key references Viewers(ID),
unique index (SpielID, ViewerID)
)

create table Viewers
(
ID integer primary key autoincrement,
Name text,
Role text #planner or referee, or player, or unprivileged
)

Note: "Viewers" may just be your authorized user login system --
somehow you need to get an ID for each user authorized to make changes
to the data

create table Messages
(
ID integer primary key autoincrement,
SpielID integer foreign key references Rundes(Spiel),
Message text
)

You only allow access to Messages IF the current user is in
PrivateViewers.

select m.Message from Messages as m
inner join PrivateViewers as pv
on m.ID = pv.MessageID
inner join Viewers as v
on pv.ViewerID = v.ID
where m.SpielID = <desired game ID>
and v.Name = <current user name if not identified some other way>
and v.Role in ("planner", "referee"); #others not allowed access
 
S

Steffen Mutter

Dennis said:
Is that table name specifying a playing season?
Yes.

What happens next
season -- you create a new table and duplicate all the code or edit the
code to use the new name (and lose all access to the previous seasons)?

Of course I create a new table for the next season, with exactly the
same code. (Okay, I will leave out those doubled 'Termin' column)
The playing season should be a field in the table so that you don't
have to change the application each year.

I prefer a table, because it's easy to drop or backup whenever I want
:)


You really put much effort to improve my database design, so I print
this out and think about it later, what sounds practicable.

You were thinking about players in the team, and gamereport.
So I will explain how I need to do it and how it is noted down during
the game.

To be a 'player' is just a role for a user.
Users and their personal data reside in the users table.
The role table represents the user permissions, what he can do.

The users table will have an ID for every user which will be generated
when a user registeres. To make it really unique and random (and not
too easy to remember) I'll create it like this:
import uuid, OpenSSL
userID = uuid.UUID(bytes = OpenSSL.rand.bytes(16))

This userID is the primary, unique key which will be placed into other
tables.
Now exact planning is needed, to normalize data to get all things
sorted in a way, the system can display a game (which will open, when
you click on the gamesID (Spiel).

So, 14 players join the game for each club, makes 28 players.
(Professional teams may have 16 which makes 32 players)
Up to three teamleaders for each team,
a secretary and a timekeeper and the referee(s) (max 2).
The numbers of the players shirts may vary from game to game, to make it
more difficult.

These are just the persons :p
The secretary writes down the goal result every time a goal is scored.
(Professionals do this even with the goalgetter and the exact playing
time with their high-end equipment)

The amateur-stuff will look like this
hometeam:guestteam
Example,
0:1, 1:1, 2:1, 3:1, 3:2, 3:3, 4:3, 4:4, 4:5...
When a player gets amonitioned (yellow card) the minute is noted down
(round up 4:11 = 5th minute). When a player gets a 2 minute punishment
(still yellow card) the exact time is noted down, same with
disqualification (red card), with a ban for doing handball sports it
is exactly noted down what happend (this is not public and will be
taken to court)
Example: referee Wilbur McDonald was attacked and knocked out by player
John Smith (No. 12 team Panic Poodles) after a discussion about a
decision of Mr McDonald. Mr John Smith is banned until further notice

All this data shall be displayed according to the status the user has.

Complex, hm :) ?

So, what would be nice is to diplay the data for the game, when you
klick on the game-number. If you klick on a players name, its data would
be nice to show: Games played in the actual seasion games played in
life, goals scored , yellow cards, 2min punishmends and red cards and
persons banned for lifetime won't show up at all :)
BUT: only for those ersons who are allowed to see it: trainers
responsible for the team
the player
the other players in the team

By the way: no handall site offers actually this complex data, but it
would be very impressive if I could offer this.

Okay, but I think I will go to bed now and back to fix some bugs
tomorrow.
 
S

Steffen Mutter

Dennis said:
If you use separate tables you make it more difficult to generate
the SQL (as you have to create the SQL with the season specific table
name, instead of just using a "where ...." clause to restrict data), and
lose the potential to produce reports covering multiple seasons.

I think, when the season is over, the table will be closed to change its
contents, so the history data for this season will be written to
the data tables of the clubs or users.
You gave me a lot of input about databases and how to put them together.
I removed the double time/data stuff from the seasons tables and it
works fine I made a speed-test and it is a bit faster now.

Now there are some features missing, I will put into, a checkbox 'only
today' shows you only the games played today. In combination with a
checkbox 'games without result' gives the visitor an easy way to find
out if a result wasn't reported yet - interesting for coaches, league
managers and players, too.
<snip>
Have you considered defining "views" and "triggers" -- based upon
the user privileges you run queries using views that only expose the
permitted data (and use triggers to permit properly updating the
underlying tables when edited).

I will do the user managment based on the tables in the
user/club/federation tables I am actually thinking about - views are an
interesting feature.
This is very complex, the user table will be the beginning to start
from, where the personal data is stored. userID, name, gender, date of
birth, club and e-mail as must haves.
Based on this there is a role table, where the roles for all persons for
an area in the system will be placed. Not very easy to do, but an
interesting task to plan...
Fuller DBMS would allow you to define access controls on individual
columns -- though it would mean you couldn't made the main database
connection until you know the access level of the user, as you'd connect
using a name/password specific to the level of the user (I'm not saying
each user has a database name/password, though that is an alternative --
it just means you'd have to do a lot of database administration each
time a user is created).

Hopefully my python code will handle that for me :)

I really like SQLite, very easy to use and to backup.
The idea behind the user managment is, that the users manage themselves,
I will only step in when something does not work as it should...
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top