[ANN] Struqtural: High level database interface library


N

Nathan Rice

Struqtural makes it easy to get data into a database, and easy to work
with it once it's there. Some of the big features include:

* Automatically generate all tables and relations needed to represent
XML in a database, including one to one, one to many, many to one and
many to many relationships (JSON and YAML support is planned for a
future release), returning mapped objects.
* Automatically generate python objects for a specified table in a
database, with support for discovery of all types of relationships
(sort of like SQL Soup on 4 different kinds of steroids).
* Automatically create persisted representations of python objects in
a database, along with a persistent version of the object class.
* Automatically infer SQL data types and create table representations
for delimited text files (i.e. CSV/TSV), returning mapped objects.
* Easily generate and query EAV/vertical attribute tables.
* Easily generate and query graphs/trees/directed acyclic graphs.
* Easily manage session configuration and creation for multiple databases.
* Easily override almost all behavior with custom code or
configuration variables
* And much more...


Once you're up and running, it's SQL Alchemy under the hood, with a
few tune-ups that you are free to bypass if you don't like them.


Home page: http://www.turnkey-analytics.com/struqtural/
PyPI: http://pypi.python.org/pypi/Struqtural
Source: https://code.launchpad.net/pydatastep
Bugs: https://bugs.launchpad.net/pydatastep


To see Struqtural in action, take a look at the tutorial...
=================================================

Struqtural has been designed to be as smart as possible. Because of
that, most common use cases are retardedly simple. Let’s take a look,
shall we?

First, let’s just examine the most basic use case:
from struqtural.structures.structure import Structure
from struqtural.loaders import loader
data = [{"A":True, "B":1, "C":1.5, "D":"a"},
.... {"A":False, "B":2, "C":2.5, "D":"b"},
.... {"A":True, "B":3, "C":3.5, "D":"c"},
.... {"A":False, "B":4, "C":4.5, "D":"d"}]primary keys: ['B']
# Note that you could have also specified the primary key using an iterabletable columns: ['SimpleInstances.A', 'SimpleInstances.C',
'SimpleInstances.B', 'SimpleInstances.D'].... print true_instance
SimpleInstance(A=True, C=3.5, B=3, D="c")
SimpleInstance(A=False, C=4.5, B=4, D="d").... print small_true_instance
SimpleInstance(A=True, C=1.5, B=1, D="a")
Another nice feature is that Struqtural is pretty flexible about how
you pass it data:
from struqtural.structures.structure import Structure
from struqtural.loaders import loader
more_data = [["T", "1", "1.0"],
.... ["FALSE", "2", "2.0"],
.... ["true", "3", "3.0"],
.... ["False", "4", "4.0"]]
# Note that more_data is strings, which isn't uncommon in the real world
data_headers = ["A", "B", "E"]
categories = Structure(loader.CollectionLoader, (data_headers, more_data), ["B", "E"], "Category") # Just a quick aside to demonstrate that Struqtural gets your table names right
print "table name:", categories.table table name: Categories
for category in categories:
.... print category
....
Category(A=True, B=1, E=1.0)
Category(A=False, B=2, E=2.0)
Category(A=True, B=3, E=3.0)
Category(A=False, B=4, E=4.0)
As you can see the strings have been handled elegantly. Type
conversion is completely controllable and extensible. A small
collection of useful type inference and conversion functions have been
included.

How about if we want to create a new structure out of multiple
previously existing ones:
example_structure = Structure(loader.CollectionLoader, data, "B", "SimpleInstance")
connector = example_structure.connector
categories = Structure(loader.CollectionLoader, (data_headers, more_data), ["B", "E"], "Category", connector)
joint_structure = Structure(loader.JoinLoader, categories, example_structure)
for instance in joint_structure:
.... print instance
Category(A=True, B=1, E=1.0, C=1.5, D="a")
Category(A=False, B=2, E=2.0, C=2.5, D="b")
Category(A=True, B=3, E=3.0, C=3.5, D="c")
Category(A=False, B=4, E=4.0, C=4.5, D="d")
Note that we took advantage of some intelligence on the join we did
there, since we have identically named, identically typed columns in
the two tables. If you needed to be explicit about it you could just
as easily have called structure like so (where the dictionary key is
the column on the left table, in this case categories, and the value
is the column on the joined table):
Next, let’s create persisted versions of pre-existing objects:
.... def __init__(self, a, b, c):
.... self.a = a
.... self.b = b
.... self.c = c
mocks = [Mock(1, 2, 3), Mock(4, 5, 6), Mock(7, 8, 9)]
another_example_structure = Structure(loader.ObjectLoader, mocks, "a")
for mock in another_example_structure.query():
.... print mock
Mock(a=1, c=3, b=2)
Mock(a=4, c=6, b=5)
Mock(a=7, c=9, b=8)
Pretty easy, right? Notice that you got a nice __repr__ for free
there, too. Any type which can be converted to a known SQL primitive
will be mapped automatically by default, though this is easily
configurable. If a variable cannot be directly mapped, or more than
one type occurs on for a given variable name, by default it will be
pickled so that what comes off the database is identical to what went
in, though this can also be easily disabled.

Note
This does not directly instrument pre-existing objects. It populates
the database with the data of the objects. Any changes made to the
original Mock objects in the above example will not be persisted. I
suggest replacing the original objects with the new objects from the
session immediately. Also, rather than modify the original class, a
modified subclass is created, so it is advisable to save the modified
class as well. This was a design decision, however it would be trivial
to switch, so the behavior may change depending on user feedback.
Now on to something a little bit more complicated... Struqtural can
automatically generate a schema from XML. As a basic example, let’s
use the following document:

<Sales>
<Order>
<Number>123</Number>
<Date>10/29/00</Date>
<Customer>
<CustNum>007</CustNum>
<Name>Bond, Inc.</Name>
</Customer>
<Line>
<LineNum>1</LineNum>
<Quantity>3</Quantity>
<Part>
<PartNum>ABC</PartNum>
<Price>12.95</Price>
</Part>
</Line>
</Order>
from struqtural.structures.structure import MultiStructure
from struqtural.loaders import loader, file as file_
multi = MultiStructure(file_.XmlMultiLoader, fake_file_1)
order_structure = multi.properties["Order"]
for order in order_structure.query():
.... print "Order number:", order.number
.... print "Order date:", order.date
.... print "Order customer count:", len(order.customers)
.... for customer in order.customers:
.... print "Order customer:", customer
.... print "Order line count:", len(order.lines)
.... for line in order.lines:
.... print "Order line:", line
.... print "Order line part count:", len(line.parts)
.... for part in line.parts:
.... print "Order line part:", part
....
Order number: 123
Order date: 2000-10-29
Order customer count: 1
Order customer: Customer(order_id=1, custnum=7, name="Bond, Inc.", id=1)
Order line count: 1
Order line: Line(order_id=1, linenum=1, id=1, quantity=3)
Order line part count: 1
Order line part: Part(line_id=1, price=12.95, id=1, partnum="ABC")
Ok, Struqtural automatically figured out all the relations and
properties from the XML file, that’s pretty neat. Because we didn’t
set any tag primary keys or provide a SchemaFormatter object with a
primary key identification engine, an id key is automatically added.
Since this behavior is typically not desirable, it is very easy to
override.

Note
You will probably notice that our customer has an order_id... In this
case, that customer appears only on the one Order, so that is not too
odd. Had the customer appeared on multiple orders, the order would
have had a customer_id. Even better, if multiple customers appeared on
any order, a secondary table would be automatically created to handle
the relationship, as we’ll demonstrate in a minute.
So Struqtural can get data into a database easily, you say. What if my
data is already in the database? What can you do for me? No problem,
check it out:
from struqtural.structures.structure import MultiStructure
from struqtural.loaders import loader, file as file_
multi = MultiStructure(file_.XmlMultiLoader, fake_file_1)
order_structure = multi.properties["Order"]
connector = order_structure.connector
# This isn't strictly needed, but gives us nice, informative names.
# The key in names is the table name, the value is the name you would
like to give
# the instance you are creating to represent that table.
names = {"Customers":"Customer", "Lines":"Line", "Parts":"Part"}
multi = MultiStructure(loader.DatabaseMultiLoader, "Orders", connector, names)
order_structure = multi.properties["Orders"]
for order in order_structure.query():
.... print "Order number:", order.number
.... print "Order date:", order.date
.... print "Order customer count:", len(order.customers)
.... for customer in order.customers:
.... print "Order customer:", customer
.... print "Order line count:", len(order.lines)
.... for line in order.lines:
.... print "Order line:", line
.... print "Order line part count:", len(line.parts)
.... for part in line.parts:
.... print "Order line part:", part
Order number: 123
Order date: 2000-10-29
Order customer count: 1
Order customer: Customer(order_id=1, custnum=7, name="Bond, Inc.", id=1)
Order line count: 1
Order line: Line(order_id=1, linenum=1, id=1, quantity=3)
Order line part count: 1
Order line part: Part(line_id=1, price=12.95, partnum="ABC", id=1)
You just got the “Orders” table and everything connected with it
(including stuff that points to it as well as stuff it points to)
loaded and mapped, and it took almost no work! You’re free to do
interesting stuff! Pretty awesome, huh?

Let’s push things to the edge now with a quick demo of many to many
relationship support. For this example we’re going to be using the
following XML:

<Departments>
<Department>
<DeptNum>123</DeptNum>
<DeptName>Sales</DeptName>
<Employee>
<Number>143</Number>
<Name>Raul Lopez</Name>
</Employee>
<Employee>
<Number>687</Number>
<Name>John Smith</Name>
</Employee>
<Employee>
<Number>947</Number>
<Name>Ming Chu</Name>
</Employee>
</Department>
<Department>
<DeptNum>456</DeptNum>
<DeptName>Marketing</DeptName>
<Employee>
<Number>157</Number>
<Name>Jim Jones</Name>
</Employee>
<Employee>
<Number>687</Number>
<Name>John Smith</Name>
</Employee>
<Employee>
<Number>947</Number>
<Name>Ming Chu</Name>
</Employee>
</Department>
</Departments>
Note that we’re also going to be passing tag primary keys here, so
that things work properly:
from struqtural.structures.structure import MultiStructure
from struqtural.loaders import loader, file as file_
primary_keys = {"Department":"DeptNum", "Employee":"Number"}
multi = MultiStructure(file_.XmlMultiLoader, fake_file_2, .... primary_keys=primary_keys)
department_structure = multi.properties["Department"]
for department in department_structure.query():
.... print "Department number:", department.deptnum
.... print "Employee count:", len(department.employees)
.... for employee in department.employees:
.... print "Employee name:", employee.name
.... print "Employee number:", employee.number
....
Department number: 123
Employee count: 3
Employee name: Raul Lopez
Employee number: 143
Employee name: John Smith
Employee number: 687
Employee name: Ming Chu
Employee number: 947
Department number: 456
Employee count: 3
Employee name: Jim Jones
Employee number: 157
Employee name: John Smith
Employee number: 687
Employee name: Ming Chu
Employee number: 947
If you already have a database with many to many relationships, that
is no problem. As long as the secondary tables for the relationship
don’t contain any non-foreign key columns or foreign keys for more
than two tables, you’re in good shape. For example:
from struqtural.structures.structure import MultiStructure
from struqtural.loaders import loader, file as file_
primary_keys = {"Department":"DeptNum", "Employee":"Number"}
multi = MultiStructure(file_.XmlMultiLoader, fake_file_2, .... primary_keys=primary_keys)
department_structure = multi.properties["Department"]
connector = department_structure.connector
names = {"Departments":"Department", "Employees":"Employee"}
multi = MultiStructure(loader.DatabaseMultiLoader, "Departments", connector, .... names, backref_by_default=True)
departments = multi.properties["Departments"]
for department in department_structure.query():
.... print "Department number:", department.deptnum
.... print "Employee count:", len(department.employees)
.... for employee in department.employees:
.... print "Employee name:", employee.name
.... print "Employee number:", employee.number
....
Department number: 123
Employee count: 3
Employee name: Raul Lopez
Employee number: 143
Employee name: John Smith
Employee number: 687
Employee name: Ming Chu
Employee number: 947
Department number: 456
Employee count: 3
Employee name: Jim Jones
Employee number: 157
Employee name: John Smith
Employee number: 687
Employee name: Ming Chu
Employee number: 947
You will note that in all these examples we used a memory resident
SQLite database. If you want to use a real database, you will need to
pass a connector to your structure. You create one like so:
That’s it! In fact, Struqtural will even automatically look in a few
logical places for your configuration information, so you can set it
up once and you’ll never have to pass it the location of a
configuration file again. Just to give you an example of the format
for the configuration file (this is included in the resources
subdirectory):

[current engine]
engine=<db engine>

[parameter values]
user=<user>
pass=<pass>
host=<host>
db=<database>

[parameter lists]
mysql=mysql://{user}:{pass}@{host}/{db}
postgresql=postgresql://{user}:{pass}@{host}/{db}
Note
The configuration file shown here does not include driver, port or
other information that you would probably want to be able to specify
in a real case. Those parameters are supported, so if you modify the
configuration to include them, the
struqtural.database.connector.Databaseconnector will work as expected.
 
Ad

Advertisements

S

sturlamolden

Let’s push things to the edge now with a quick demo of many to many
relationship support. For this example we’re going to be using the
following XML:

<Departments>
    <Department>
        <DeptNum>123</DeptNum>
        <DeptName>Sales</DeptName>
        <Employee>
            <Number>143</Number>
            <Name>Raul Lopez</Name>
        </Employee>
        <Employee>
            <Number>687</Number>
            <Name>John Smith</Name>
        </Employee>
        <Employee>
            <Number>947</Number>
            <Name>Ming Chu</Name>
        </Employee>
    </Department>
    <Department>
        <DeptNum>456</DeptNum>
        <DeptName>Marketing</DeptName>
        <Employee>
            <Number>157</Number>
            <Name>Jim Jones</Name>
        </Employee>
        <Employee>
            <Number>687</Number>
            <Name>John Smith</Name>
        </Employee>
        <Employee>
            <Number>947</Number>
            <Name>Ming Chu</Name>
        </Employee>
    </Department>
</Departments>


Oh yes, I'd rather write pages of that rather than some SQL in a
Python string.
 
E

Emile van Sebille

Let’s push things to the edge now with a quick demo of many to many
relationship support. For this example we’re going to be using the
following XML:

<Departments>
<Department>
<DeptNum>123</DeptNum>
<DeptName>Sales</DeptName>
<Employee>
<Number>143</Number>
<Name>Raul Lopez</Name>
</Employee>
<Employee>
<Number>687</Number>
<Name>John Smith</Name>
</Employee>
<Employee>
<Number>947</Number>
<Name>Ming Chu</Name>
</Employee>
</Department>
<Department>
<DeptNum>456</DeptNum>
<DeptName>Marketing</DeptName>
<Employee>
<Number>157</Number>
<Name>Jim Jones</Name>
</Employee>
<Employee>
<Number>687</Number>
<Name>John Smith</Name>
</Employee>
<Employee>
<Number>947</Number>
<Name>Ming Chu</Name>
</Employee>
</Department>
</Departments>


Oh yes, I'd rather write pages of that rather than some SQL in a
Python string.
[/QUOTE]

That's not the point. I've got examples of XML content that I don't
create that could be tamed quite easily (if I understand from a quick
once over).

This looks really interesting. I've got to go read more now...

Emile
 
Ad

Advertisements

N

Nathan Rice

Oh yes, I'd rather write lines of that rather than pages of SQL in a Python string.

(not to mention, avoid some easy to fall into security flaws, not have
to worry about porting dialect specific SQL code, etc, etc).

Fixed that for you. I can't take the credit for that part though,
that magic comes from SQL Alchemy. All Struqtural does is let you
move data into a database easily, move data from a database to Python
easily, and simplify the creation of some common complex schemas.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top