How can I reduce the number of queries to my PostgreSQL database?

S

SR

As a starter project for learning Python/PostgreSQL, I am building a
Books database that stores information on the books on my bookshelf.

Say I have three tables.

Table "books" contains rows for book_id, title, subtitle, ISBN.

Table "authors" contains rows for author_id, author surname, author
first names, biographical notes.

Table "bookauthors" contains two rows: book_id, author_id.

The bookauthors table links the books and authors tables.

Scenario: I have a python script which creates web page listing all
books in the database, and all authors for each book. My python script
does essentially three things:

1. retrieve a list of all book_ids and book_titles.

2. for each book_id, query the bookauthors table and retrieve all
author names for that book_id.

3. display it all out as an html table on a web page.

The script works fine, if a little slow. I think that's because if I
have 50 books in my database, my script performs 51 database queries (1
for all book names; then 1 for each book). A colleague of mine
suggested that I could get away with two queries, 1 to read the book
ids and titles, and 1 to read the bookauthors table to pull in *all*
relations, and then do all the work in Python.

I think I know where he's coming from, but I don't know where to begin.
Any clues? Is there a specific name for this technique?
 
F

Frank Millman

SR said:
As a starter project for learning Python/PostgreSQL, I am building a
Books database that stores information on the books on my bookshelf.

Say I have three tables.

Table "books" contains rows for book_id, title, subtitle, ISBN.

Table "authors" contains rows for author_id, author surname, author
first names, biographical notes.

Table "bookauthors" contains two rows: book_id, author_id.

The bookauthors table links the books and authors tables.

Scenario: I have a python script which creates web page listing all
books in the database, and all authors for each book. My python script
does essentially three things:

1. retrieve a list of all book_ids and book_titles.

2. for each book_id, query the bookauthors table and retrieve all
author names for that book_id.

3. display it all out as an html table on a web page.

The script works fine, if a little slow. I think that's because if I
have 50 books in my database, my script performs 51 database queries (1
for all book names; then 1 for each book). A colleague of mine
suggested that I could get away with two queries, 1 to read the book
ids and titles, and 1 to read the bookauthors table to pull in *all*
relations, and then do all the work in Python.

I think I know where he's coming from, but I don't know where to begin.
Any clues? Is there a specific name for this technique?

The specific name you are looking for is to 'join' tables. There will
be many references and tutorials available, but I suggest you start
with the PostgreSQL tutorial, which is part of the documentation
supplied with PostgreSQL.

Here is a link to the 'join' command in the online manual.

http://www.postgresql.org/docs/8.1/interactive/tutorial-join.html

HTH

Frank Millman
 
M

Martin Christensen

SR> Scenario: I have a python script which creates web page listing
SR> all books in the database, and all authors for each book. My
SR> python script does essentially three things:

SR> 1. retrieve a list of all book_ids and book_titles.

SR> 2. for each book_id, query the bookauthors table and retrieve all
SR> author names for that book_id.

SR> 3. display it all out as an html table on a web page.

That's one query, if you're willing to make it advanced enough,
although you need to make an aggregate to enable PostgreSQL to
concatenate and comma separate author names. However, this aggregate
will typically need more than one database function. Such an aggregate
could be as follows:

CREATE OR REPLACE FUNCTION author_agg_sfunc(TEXT, authors.name%TYPE)
RETURNS TEXT AS '
SELECT $1 || '', '' || $2;
' LANGUAGE sql;

CREATE OR REPLACE FUNCTION author_agg_ffunc(TEXT)
RETURNS TEXT AS '
SELECT trim(trailing '', '' from $1);
' LANGUAGE sql;

CREATE AGGREGATE author_agg (
basetype = VARCHAR(100),
sfunc = author_agg_sfunc,
stype = TEXT,
finalfunc = author_agg_ffunc,
initcond = ''
);

Then you could use it as follows:

SELECT author_agg(authors.name),
foo,
bar
FROM authors, writes, books
WHERE authors.id = writes.author_id
AND writes.book_id = books.id
GROUP BY foo, bar;

This is the solution that I would use after working nearly a decade
with databases. It is neither simple nor obvious to the novice, but
it's the Right Way To Do It. For a learning exercise, this is way over
the top, but I thought you might benefit from seeing that - so long as
you only need information that would reasonably fit in one table on a
web page or the equivalent - one query is always enough. Or perhaps
that should be One Query Is Always Enough. :) Learn at your own pace,
though, but you might want to keep this in mind for future reference.

Martin
 
D

Dennis Lee Bieber

As a starter project for learning Python/PostgreSQL, I am building a
Books database that stores information on the books on my bookshelf.

Say I have three tables.
Only three? <G> {My never completed design runs more like:

Book(_ID_, Title, Description, Keywords, ISBN, ShelfNumber, pubID,
categoryID, formatID)
Author(_ID_, Lastname, Firstname)
BookAuthor(_bookID, authorID_)
Publisher(_ID_, PublisherName, ...)
Category(_ID_, CategoryName, Description)
Format(_ID_, Format) #paperback, HC, tradePB, etc.
}
Table "books" contains rows for book_id, title, subtitle, ISBN.
"rows" are records in simple nomenclature... I think you mean
"columns" -- each "row" has all four columns...
Table "authors" contains rows for author_id, author surname, author
first names, biographical notes.

Table "bookauthors" contains two rows: book_id, author_id.

The bookauthors table links the books and authors tables.

Scenario: I have a python script which creates web page listing all
books in the database, and all authors for each book. My python script
does essentially three things:
said:
I think I know where he's coming from, but I don't know where to begin.
Any clues? Is there a specific name for this technique?

Read the documentation for your RDBM for specifics, or any text on
SQL for generics. Two subjects: normalization, and joins.

At the moment, you've already normalized the tables for storage...
To reduce the query overhead, you need a SELECT statement that uses a
JOIN to return "unnormalized" data.

Something like {untested... Might need to do a subselect for the
second JOIN}:

SELECT book_id, title, subtitle, ISBN, surname, firstname, notes from
books
LEFT OUTER JOIN bookauthors on books.book_id = bookauthors.book_id
JOIN authors on bookauthors.author_id = authors.author_id
ORDER BY books.book_id

The reason for the LEFT OUTER JOIN, if I recall the syntax, is to
ensure that you get any books that don't have any authors. The sort
order is to: one) make sure the records are grouped properly for later
processing

The output will duplicate the book information for those books that
have multiple authors (the simple meaning of "unnormalized"):

2,A Book, Of Nothing, 123, Who, Guess, something
2,A Book, Of Nothing, 123, Second, I'm, or other

To make your report, you would output the book specific information
only when it changes (this means you need to initialize a temp record to
null data, and compare each record to the temp; when the compare fails,
put out the new book data, and copy it into the temp -- in this example,
just saving the book ID number would be sufficient, as long as it is a
unique/primary key). THEN, put out the Author information. If the
comparison of book data passes, it is the same book with an additional
author, you just need to output the author data.

tmp_bookID = None
for bk in theCursor:
if tmp_bookID != bk[0]: #assumes book_id is first field
Output_Book_Data(bk)
tmp_bookID = bk[0]
Output_Author_Data(bk)

--
 
A

Alan Morgan

As a starter project for learning Python/PostgreSQL, I am building a
Books database that stores information on the books on my bookshelf.

Say I have three tables.

Table "books" contains rows for book_id, title, subtitle, ISBN.

Table "authors" contains rows for author_id, author surname, author
first names, biographical notes.

Table "bookauthors" contains two rows: book_id, author_id.

The bookauthors table links the books and authors tables.

Scenario: I have a python script which creates web page listing all
books in the database, and all authors for each book. My python script
does essentially three things:

1. retrieve a list of all book_ids and book_titles.

2. for each book_id, query the bookauthors table and retrieve all
author names for that book_id.

3. display it all out as an html table on a web page.

The script works fine, if a little slow. I think that's because if I
have 50 books in my database, my script performs 51 database queries (1
for all book names; then 1 for each book). A colleague of mine
suggested that I could get away with two queries, 1 to read the book
ids and titles, and 1 to read the bookauthors table to pull in *all*
relations, and then do all the work in Python.

I think I know where he's coming from, but I don't know where to begin.
Any clues? Is there a specific name for this technique?

Yup. The technique is called "using a relational database". This is
precisely the sort of thing SQL does well. Let's say you want to find
out who wrote 'The Hitchhikers Guide to the Galaxy'. You could do the
following (all sql untested and, let's face it, probably not understood
by author):

1. Query for that book to get the book_id
SELECT id FROM books WHERE title='The Hitchhikers Guide To The Galaxy'

2. Look up that author id in the bookauthor database
SELECT author_id FROM bookauthors WHERE book_id=<book id>

3. Look up that author in the author database
SELECT name FROM authors WHERE id=<author id>

or do

SELECT name FROM authors, books, bookauthors
WHERE books.id=bookauthors.book_id
AND authors.id=bookauthors.author_id
AND title='The Hitchhikers Guide To The Galaxy'

Slick, no? You want something like:

SELECT title, name, book_id FROM authors, books, bookauthors
WHERE books.id=bookauthors.book_id
AND authors.id=bookauthors.author_id

If you have more than one author for a book then the book will
appear in the table multiple times. You'll have to combine
those yourself (the book_id row can help here. I don't know
if you can leverage more SQL for that job).

You can optimize some of these SQL queries if you like.
Optimizing JOINs, which is what these are) is a serious
business, but for piddly databases of this size it really
isn't necessary.

Alan
 
L

Lawrence D'Oliveiro

"SR said:
The script works fine, if a little slow. I think that's because if I
have 50 books in my database, my script performs 51 database queries (1
for all book names; then 1 for each book)

If your database is that small, why bother with sophisticated
relational-database queries at all? Why not just load everything into
memory, and use sets and dicts and things to put it all together?

This is feasible for databases with up to thousands or even tens of
thousands of records in them, on today's machines.
 
S

SR

Martin Christensen said:
SR> Scenario: I have a python script which creates web page listing
SR> all books in the database, and all authors for each book. My
SR> python script does essentially three things:

SR> 1. retrieve a list of all book_ids and book_titles.

SR> 2. for each book_id, query the bookauthors table and retrieve all
SR> author names for that book_id.

SR> 3. display it all out as an html table on a web page.

That's one query, if you're willing to make it advanced enough,
although you need to make an aggregate to enable PostgreSQL to
concatenate and comma separate author names. However, this aggregate
will typically need more than one database function. Such an aggregate
could be as follows:

This is the solution that I would use after working nearly a decade
with databases. It is neither simple nor obvious to the novice, but
it's the Right Way To Do It. For a learning exercise, this is way over
the top, but I thought you might benefit from seeing that - so long as
you only need information that would reasonably fit in one table on a
web page or the equivalent - one query is always enough. Or perhaps
that should be One Query Is Always Enough. :) Learn at your own pace,
though, but you might want to keep this in mind for future reference.

Thanks for that... I'm not going to argue with a decade's experience!
I'd never heard of aggregates before, but I'll look into them. Perhaps
I'll be able to impress my friends with them one day.

The reason for keeping the authors separate was to wrap them with an
appropriate HTML href, but presumably your solution could be adapted
for this purpose?

Cheers,

Shay
 
S

SR

Say I have three tables.
Only three? <G>

Well, yeah, OK, it's more than that, but after years of being worn away
by "Post a minimal example" requests on comp.text.tex, a minimal
example is what you got...
Something like {untested... Might need to do a subselect for the
second JOIN}:

SELECT book_id, title, subtitle, ISBN, surname, firstname, notes from
books
LEFT OUTER JOIN bookauthors on books.book_id = bookauthors.book_id
JOIN authors on bookauthors.author_id = authors.author_id
ORDER BY books.book_id

The reason for the LEFT OUTER JOIN, if I recall the syntax, is to
ensure that you get any books that don't have any authors. The sort
order is to: one) make sure the records are grouped properly for later
processing

Thanks for the stuff on LEFT OUTER JOIN. Authorless books would be one
of those things I wouldn't have noticed going astray.
The output will duplicate the book information for those books that
have multiple authors (the simple meaning of "unnormalized"):

2,A Book, Of Nothing, 123, Who, Guess, something
2,A Book, Of Nothing, 123, Second, I'm, or other

I think this goes along with what I thought of immediately after
posting the question: one query to gather all info needed, then
post-process in Python to order it all (so *that's* why I posted
here...). My thoughts had been to turn

[ 1, "Puppetry", "Bill" ]
[ 1, "Puppetry", "Ben" ]
[ 1, "Puppetry", "Flowerpot Men" ]

into

[ 1, "Puppetry", [ "Bill", "Ben", "Flowerpot Men" ] ]

(if that's not overcomplicating it a bit)...
To make your report, you would output the book specific information
only when it changes (this means you need to initialize a temp record to
null data, and compare each record to the temp; when the compare fails,
put out the new book data, and copy it into the temp -- in this example,
just saving the book ID number would be sufficient, as long as it is a
unique/primary key). THEN, put out the Author information. If the
comparison of book data passes, it is the same book with an additional
author, you just need to output the author data.

tmp_bookID = None
for bk in theCursor:
if tmp_bookID != bk[0]: #assumes book_id is first field
Output_Book_Data(bk)
tmp_bookID = bk[0]
Output_Author_Data(bk)

.... which appears to be along the lines of what your code does! (Where
Output_Author_Data(bk) could append to the author list of the current
book.

I'll go away and see how I can 'adapt' your example code.

Thanks!

Shay
 
D

Dennis Lee Bieber

Well, yeah, OK, it's more than that, but after years of being worn away
by "Post a minimal example" requests on comp.text.tex, a minimal
example is what you got...
<heh> If you were supplying an SQL backup/dump of the schema and
data, yeah...
Thanks for the stuff on LEFT OUTER JOIN. Authorless books would be one
of those things I wouldn't have noticed going astray.
Hopefully you never have to worry about a bookless author said:
I think this goes along with what I thought of immediately after
posting the question: one query to gather all info needed, then
post-process in Python to order it all (so *that's* why I posted
here...). My thoughts had been to turn

[ 1, "Puppetry", "Bill" ]
[ 1, "Puppetry", "Ben" ]
[ 1, "Puppetry", "Flowerpot Men" ]

into

[ 1, "Puppetry", [ "Bill", "Ben", "Flowerpot Men" ] ]

(if that's not overcomplicating it a bit)...
said:
... which appears to be along the lines of what your code does! (Where
Output_Author_Data(bk) could append to the author list of the current
book.
Well, I was sort of focused on your mention of "web page listing"...

I had some old MySQL dump at work (they discovered I had MySQL
running on my desktop and I had to remove it). With some modifications,
I was able to feed the data into SQLite (since SQLite is not a server,
per se, but just a file handler it should be invisible to the corporate
security scans). I then hacked up this little thing:

booklist.py
-=-=-=-=-=-=-=-=-
#
# Simple test using SQLite and a port of test data from MySQL
#

from pysqlite2 import dbapi2 as sqlite
import os

DATABASE = "booklist"
OUTPUT = "booklist.html"

THESELECT = """
select title.ID,
Title,
publisher.Name,
Form,
Last,
First from title
left outer join title_author
on title.ID = Title_ID
join author
on Author_ID = author.ID
join publisher
on Publisher_ID = publisher.ID
join format
on Format_ID = format.ID
order by Title,
title_author.ID
"""

HTMLSTART = """<html>
<head>
<title>Simple Database Retrieval Test</title>
</head>
<body bgcolor="#AAAAAA">
<table width="95%" align="center" frame="box" border="3">
<tr><th>Title</th><th>Publisher</th><th>Format</th><th>Author(s)</th></tr>
"""


html = open(OUTPUT, "w")
html.write(HTMLSTART)

con = sqlite.connect(DATABASE)
crs = con.cursor()

crs.execute(THESELECT)

lastID = None

for row in crs:
(ID, Title, Publisher, Format, Last, First) = row
if lastID != ID:
if lastID:
html.write("</table>\n</td>\n</tr>\n")
html.write("""<tr>
<td>%s</td><td>%s</td><td>%s</td><td>
<table border="0">""" %
(Title, Publisher, Format))
lastID = ID

if Last and First:
html.write("<tr><td>%s, %s</td></tr>\n" %
(Last, First))
elif not Last:
html.write("<tr><td>%s</td></tr>\n" % First)
elif not First:
html.write("<tr><td>%s</td></tr>\n" % Last)


html.write("</table>\n</td>\n</tr>\n</table>\n</body></html>")

html.close()
crs.close()
con.close()

os.startfile(OUTPUT) #windows only
-=-=-=-=-=-=-=-=-=-

It generated the following HTML (hope it isn't too long... Cut and
save everything between the -=-=-=-=-=-=- and save with as some .html
file, then open it in a browser...

-=-=-=-=-=-=-=-=-
<html>
<head>
<title>Simple Database Retrieval Test</title>
</head>
<body bgcolor="#AAAAAA">
<table width="95%" align="center" frame="box" border="3">
<tr><th>Title</th><th>Publisher</th><th>Format</th><th>Author(s)</th></tr>
<tr>
<td>Access Database: Design &
Programming</td><td>O'Reilly</td><td>Soft-cover</td><td>
<table border="0"><tr><td>Roman, Steven</td></tr>
</table>
</td>
</tr>
<tr>
<td>Ada for Software Engineers</td><td>John Wiley &
Sons</td><td>Soft-cover</td><td>
<table border="0"><tr><td>Ben-Ari, M.</td></tr>
</table>
</td>
</tr>
<tr>
<td>Learning Python</td><td>O'Reilly</td><td>Soft-cover</td><td>
<table border="0"><tr><td>Lutz, Mark</td></tr>
<tr><td>Ascher, David</td></tr>
</table>
</td>
</tr>
<tr>
<td>Linux in a
Nutshell</td><td>O'Reilly</td><td>Soft-cover</td><td>
<table border="0"><tr><td>Hekman, Jessica Perry</td></tr>
</table>
</td>
</tr>
<tr>
<td>MySQL Administrator's Guide</td><td>MySQL
Press</td><td>Soft-cover</td><td>
<table border="0"><tr><td>MySQL AB</td></tr>
</table>
</td>
</tr>
<tr>
<td>MySQL Language Reference</td><td>MySQL
Press</td><td>Soft-cover</td><td>
<table border="0"><tr><td>MySQL AB</td></tr>
</table>
</td>
</tr>
<tr>
<td>MySQL Reference Manual</td><td>O'Reilly Community
Press</td><td>Soft-cover</td><td>
<table border="0"><tr><td>Widenius, Michael
"Monty"</td></tr>
<tr><td>Axmark, David</td></tr>
<tr><td>MySQL AB</td></tr>
</table>
</td>
</tr>
<tr>
<td>Programming Perl, 2nd
Edition</td><td>O'Reilly</td><td>Soft-cover</td><td>
<table border="0"><tr><td>Wall, Larry</td></tr>
<tr><td>Christiansen, Tom</td></tr>
<tr><td>Schwartz, Randal L.</td></tr>
</table>
</td>
</tr>
<tr>
<td>Python & XML</td><td>O'Reilly</td><td>Soft-cover</td><td>
<table border="0"><tr><td>Jones, Christopher A.</td></tr>
<tr><td>Drake Jr., Fred L.</td></tr>
</table>
</td>
</tr>
<tr>
<td>Python and Tkinter
Programming</td><td>Manning</td><td>Soft-cover</td><td>
<table border="0"><tr><td>Grayson, John E.</td></tr>
</table>
</td>
</tr>
<tr>
<td>SQL in a
Nutshell</td><td>O'Reilly</td><td>Soft-cover</td><td>
<table border="0"><tr><td>Kline, Kevin</td></tr>
<tr><td>Kline, Daniel</td></tr>
</table>
</td>
</tr>
<tr>
<td>UML in a
Nutshell</td><td>O'Reilly</td><td>Soft-cover</td><td>
<table border="0"><tr><td>Si Alhir, Sinan</td></tr>
</table>
</td>
</tr>
</table>
</body></html>
-=-=-=-=-=-=-=-=-
--
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,764
Messages
2,569,567
Members
45,041
Latest member
RomeoFarnh

Latest Threads

Top