Storing files in a BLOB field via SQL

Juergen Gerner

Hello Python fans,

I've been trying and searching for many days for an acceptable
solution... without success. I want to store files in a database using
BLOB fields. The database table has an ID field (INT, auto_increment),
an ORDER field (INT, for keeping the parts in the right order) and a
"normal" BLOB field. The plan is to split large files into 64k parts
and sort these parts by the ORDER field.

Here's some pseudo code how I wanted to implement this in my app:

f = open(myFileName, 'rb')
order = 0
data = f.read(65535)
while data:
    query = "INSERT INTO table (order,data) VALUES (%i,%s)" % (order, data)
    mysql_exec(query)
    order = order + 1
    data = f.read(65535)

The main problem is handling the binary data. The SQL syntax breaks if
special characters (quotes etc.) appear, or the statement becomes
invalid because of bytes that can't be encoded in the current
character set. And you can't strip those bytes, because that would
alter the binary data or push a part over 64k.

Does anybody have an idea?
Any suggestions would be very helpful.

Additionally, I want to compress the data and store a checksum
somewhere. Any hint (links, sites, ...) is welcome...!

Thanks in advance,
Juergen
 
Irmen de Jong

Juergen said:
> field. The plan is to split large files into 64k parts and sort
> these parts by the ORDER field.

Is there a special reason why you can't store the whole file in a
single BLOB? That's what a BLOB is for, after all... L=Large :)

> Additionally, I want to compress the data and store a checksum
> somewhere. Any hint (links, sites, ...) is welcome...!

Compress: look at the zlib module (gzip-style compression).
Checksum: what kind? 32-bit CRC: zlib.crc32; MD5: md5.md5.
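A sketch of both suggestions together (using hashlib.md5, the modern spelling of the old md5.md5 API; the sample data stands in for one file chunk):

```python
import zlib
import hashlib

data = b'some file contents' * 100      # stand-in for a 64k chunk
compressed = zlib.compress(data)
crc = zlib.crc32(data) & 0xffffffff     # mask keeps the value unsigned everywhere
digest = hashlib.md5(data).hexdigest()  # 32-char hex checksum

assert zlib.decompress(compressed) == data   # lossless round trip
assert len(compressed) < len(data)           # repetitive data compresses well
```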

--Irmen
 
Juergen Gerner

Hi Irmen,

first of all, thanks for your help with compression & checksums!
> Is there a special reason why you can't store the whole file in a
> single BLOB? That's what a BLOB is for, after all... L=Large :)

Yes, there's a special reason. After reading a lot of documentation, I
think it's better to split large files into little blobs. It doesn't
matter if the SQL server is on the same machine as the application,
but if the two are on different machines, large files have to be
transmitted over the network. During this transfer the application
isn't responding, I guess. So splitting would be much more flexible.
Additionally, I think splitting files makes the database more scalable
and uses the space on the hard drive better.

But the splitting isn't my main problem. It's the way I transmit the
binary data to the database via SQL syntax. Today I saw how
phpMyAdmin handles binary data: it encodes each byte as a hexadecimal
value ("0x..."). Is there any way to do something like that (or
similar) with Python, or maybe with PyQt/QString/QByteArray?

Thanks in advance!
Juergen
 
Irmen de Jong

Juergen said:
> Yes, there's a special reason. After reading a lot of documentation,
> I think it's better to split large files into little blobs. It
> doesn't matter if the SQL server is on the same machine as the
> application, but if the two are on different machines, large files
> have to be transmitted over the network. During this transfer the
> application isn't responding, I guess. So splitting would be much
> more flexible.

How would splitting the file in chunks improve the responsiveness
of the application? This would only work if your app needs only
a specific chunk of the larger file to work on. If you need to read
the full file, reading 10 chunks will take even longer than reading
one big BLOB.
You may decide to do it 'in the background' using a thread, but
then again, you could just as well load the single big BLOB inside
that separate thread.
> Additionally, I think splitting files makes the database more
> scalable and uses the space on the hard drive better.

In my humble opinion this kind of assumption is generally false.
Let the database decide what the most efficient storage method is
for your 100 MB BLOB. I don't want to make these kinds of assumptions
about the inner workings of my database server, and I certainly don't
want to wire them into my application code... what happens when you
switch platforms/DBMS? Is your code still 'the most efficient' then?
Just my €0.02.
> But the splitting isn't my main problem. It's the way I transmit the
> binary data to the database via SQL syntax.

Sorry, I can't help you with this. I would expect the database driver
module to do the 'right' escaping.


--Irmen
 
Michael Porter

Juergen Gerner said:
> Hello Python fans,
>
> The main problem is handling the binary data. The SQL syntax breaks
> if special characters (quotes etc.) appear, or the statement becomes
> invalid because of bytes that can't be encoded in the current
> character set. And you can't strip those bytes, because that would
> alter the binary data or push a part over 64k.
>
> Does anybody have an idea?
> Any suggestions would be very helpful.

You can either use the MySQL hex literal format (x'AABBCC'...) or use
the Python DB API, which will handle the parameter conversion for
you...

In the first case your query becomes something like:

# str.encode('hex') is Python 2; on Python 3 use data.hex().
# Backticks because ORDER (and TABLE) are reserved words in MySQL.
query = "INSERT INTO `table` (`order`,data) VALUES (%i,x'%s')" % (
    order, data.encode('hex'))
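On Python 3, where str.encode('hex') no longer exists, binascii.hexlify (or bytes.hex()) produces the same hex string for the literal:

```python
import binascii

data = b'\x00\xff\x27'   # includes a quote byte (0x27) -- harmless here
hexlit = "x'%s'" % binascii.hexlify(data).decode('ascii')
# hexlit == "x'00ff27'" -- safe to splice into the SQL statement
```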

In the second, preferred version you use something like:

cur = conn.cursor()
# Note: MySQLdb uses the 'format' paramstyle (%s), not qmark (?)
cur.execute("INSERT INTO `table` (`order`,data) VALUES (%s,%s)",
            (order, data))

and the DBAPI/Database driver takes care of the rest.
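The same pattern can be verified with the stdlib sqlite3 driver (used here only as a stand-in; the principle is the same for any DB-API module): the data is passed as a bound parameter, so quotes and NUL bytes never touch the SQL text.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE parts (ord INTEGER, data BLOB)")

# Chunk deliberately full of troublesome bytes: quotes and NULs.
chunk = b"binary 'with' quotes \x00\xff and NULs"
conn.execute("INSERT INTO parts (ord, data) VALUES (?, ?)", (0, chunk))

(stored,) = conn.execute("SELECT data FROM parts WHERE ord = 0").fetchone()
assert stored == chunk   # the bytes survive untouched
```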

> Additionally, I want to compress the data and store a checksum
> somewhere. Any hint (links, sites, ...) is welcome...!

compressed = data.encode('zip')  # Python 2 codec; same as zlib.compress(data)


Mike.
 
Bruno Widmann

Juergen said:
> But the splitting isn't my main problem. It's the way I transmit the
> binary data to the database via SQL syntax. Today I saw how
> phpMyAdmin handles binary data: it encodes each byte as a
> hexadecimal value ("0x..."). Is there any way to do something like
> that (or similar) with Python, or maybe with PyQt/QString/QByteArray?

Don't know if this is of help to you, but I use the following
to insert binary data into an MS-SQL db with ADO.
Suppose the variable "rawdata" contains the binary data:

def bcd2str(bcs):
    """Convert a BCD-coded string to an ASCII hex string.

    Note: also works for arbitrary byte values, e.g. '\x2d'."""
    out = ''
    for c in bcs:
        out = out + hex(ord(c))[2:].zfill(2)
    return out


def str2hex(s):
    """Convert binary byte data (0x00 - 0xff) in a Python string
    into the 0x... literal format needed to insert into a binary
    datatype on SQL Server."""
    return '0x' + bcd2str(s)


# Note: the statement needs two values -- the sID and the hex data.
insertstring = "insert into foo (sID, RawData) VALUES (%s, %s)" \
    % (sID, str2hex(rawdata))
adoconn.Execute(insertstring)
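For what it's worth, binascii.hexlify does the same job as the byte-by-byte loop above in one call (shown here on Python 3 bytes; on Python 2 it accepts a plain str directly):

```python
import binascii

def str2hex(data):
    # Equivalent to the bcd2str() loop, wrapped in the 0x prefix.
    return '0x' + binascii.hexlify(data).decode('ascii')

# str2hex(b'\x00\x2d\xff') == '0x002dff'
```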


regards,
Bruno
 
francois lepoutre

Hi,

> I'm trying and searching for many days for an acceptable solution...
> without success. I want to store files in a database using BLOB
> fields.

This is a reasonable data organization scheme.

> The database table has an ID field (INT, auto_increment), an
> ORDER field (INT, for knowing the right order) and a "normal" BLOB
> field. It is planned to split large files into 64k parts and sort
> these parts by the ORDER field.

Why split the BLOBs into 64k parts? A db works with page blocks
that will not map onto these parts anyway. The structure of the
database should map your "application logic", not the other way
round.

In a perfect world your blob data would be stored "as is",
whatever its size, with no translation of any kind.

But this is personal opinion, not law... since most middleware
software still tends to choke on transfers of heavy binary data.
A pity in 2004 :)
> The main problem is handling the binary data. ...
> Does anybody have an idea?
> Any suggestions would be very helpful.

Blobs are now commonly supported by most db engines.
But middleware often gets in the way with haphazard
code translation (you want none) or size limitations (you
need none either).

I have not tested MySQL, so I cannot say. Test your
middleware first; if it does not cope (it should), try
another database and its associated middleware.

I would dare to propose Firebird (open source) through
its Python middleware, or Sybase ASA (commercial).
They both have strong blob support.

François
 
Martin Bless

[[email protected] (Juergen Gerner)]
> I want to store files in a database using BLOB [...]
> Does anybody have an idea?

Maybe it's a version issue. I recently grabbed the new version 1.0.0
of MySQLdb. In the readme.html of the Win binary package you will find
this note:

"""
MySQL-Python 1.0.0 for win32 Notes:
June 28 2004
I needed to get mysql-python working for win32, so I compiled it. I
know a lot of people are looking for this, so enjoy... With 0.9.2,
BLOBs weren't working properly for me, ...
"""

Second:
Skimming over the docs I noticed that the Python API converts BLOBs to
an array. I don't know if this hint is of significance in your case.
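If the driver does hand back an array instance rather than a plain string, converting it back to raw bytes is a one-liner. A sketch using the stdlib array module as a stand-in for whatever the driver actually returns:

```python
from array import array

blob = array('B', b'\x00\x7f\xff')  # pretend this came from the driver
data = blob.tobytes()               # back to plain bytes
# data == b'\x00\x7f\xff'
```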

Hope it helps,
Martin
 
