Storing files in a BLOB field via SQL

Discussion in 'Python' started by Juergen Gerner, Jun 6, 2004.

  1. Hello Python fans,

    I've been trying and searching for an acceptable solution for many
    days... without success. I want to store files in a database using BLOB
    fields. The database table has an ID field (INT, auto_increment), an
    ORDER field (INT, for keeping the parts in the right order) and a
    "normal" BLOB field. The plan is to split large files into 64k parts
    and sort these parts by the ORDER field.

    Here's some pseudo code showing how I want to implement this in my app:

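    f = open(myFileName, 'rb')    # open the file read-only, in binary mode
    order = 0
    data = f.read(65535)
    while data:
        # naive: paste the chunk straight into the SQL text
        query = "INSERT INTO table (order,data) VALUES (%i,'%s')" % (order, data)
        mysql_exec(query)         # placeholder for "send this query to MySQL"
        order = order + 1
        data = f.read(65535)
    f.close()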
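    The read loop above can be written as a small generator that yields (order, chunk) pairs, which keeps the splitting separate from the database code. A minimal sketch in present-day Python 3, assuming the file is opened in binary mode; `iter_chunks` is a hypothetical helper name, and 65535 is the chunk size from the post.

```python
import io


def iter_chunks(fileobj, size=65535):
    """Yield (order, chunk) pairs from a binary file object.

    Hypothetical helper: 'order' starts at 0 and increases by one per
    chunk, matching the ORDER column described above.
    """
    order = 0
    while True:
        chunk = fileobj.read(size)
        if not chunk:  # b'' means end of file
            break
        yield order, chunk
        order += 1


# Usage with an in-memory "file" of 150000 bytes.
parts = list(iter_chunks(io.BytesIO(b"x" * 150000)))
print([(order, len(chunk)) for order, chunk in parts])
```

    With a real file, `with open(path, "rb") as f: for order, chunk in iter_chunks(f): ...` closes the file even if an insert fails.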

    The main problem is the handling of the binary data. The SQL syntax
    may break if special characters (quotes etc.) appear in the data, or
    the SQL statement may be invalid because of strange characters that
    can't be encoded in the current character set. Another problem is that
    you can't simply strip or replace these characters, because you would
    change the binary data or make it bigger than 64k.
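    To see the quoting problem concretely, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; the post targets MySQL, and the `chunks` table is hypothetical). A chunk that happens to contain an apostrophe byte ends the SQL string literal early, so the naively built statement is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (ord INTEGER, data BLOB)")

# A "binary" chunk that happens to contain a single-quote byte (0x27)
# and a byte (0xff) that is not valid UTF-8.
chunk = b"it's \xff binary"

# Naive approach: paste the bytes into the SQL text as characters.
# latin-1 maps each byte 0-255 to one character, so this decode never fails.
query = "INSERT INTO chunks (ord, data) VALUES (%d, '%s')" % (0, chunk.decode("latin-1"))

error = None
try:
    conn.execute(query)
except sqlite3.Error as exc:  # the quote in "it's" ends the string literal early
    error = exc
print("rejected:", error)
```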

    Does anyone have an idea?
    Any suggestions would be very helpful.

    Additionally, I want to compress the data and store a checksum
    somewhere. Any hint (links, sites, ...) is welcome...!

    Thanks in advance,
    Juergen
     
    Juergen Gerner, Jun 6, 2004

  2. Juergen Gerner wrote:

    > field. It is planned to split large files in 64k-parts and sort these
    > parts by the ORDER field.


    Is there a special reason why you can't store the whole file in a
    single BLOB? That's what it's a BLOB for, after all... L=Large :)
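    A minimal sketch of that single-BLOB approach, again using the stdlib sqlite3 module as a stand-in for MySQL (an assumption); the table `files` and its columns are hypothetical. The whole file goes into one row, and the driver's `?` placeholder passes the bytes through without any escaping in the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per file, the whole content in one BLOB.
conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT, data BLOB)")

content = bytes(range(256)) * 1000  # 256000 bytes covering every byte value

# The placeholders let the driver pass the bytes through unchanged.
cur = conn.execute("INSERT INTO files (name, data) VALUES (?, ?)", ("example.bin", content))
file_id = cur.lastrowid

(stored,) = conn.execute("SELECT data FROM files WHERE id = ?", (file_id,)).fetchone()
print(len(stored), stored == content)
```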


    > Additionally, I want to compress the data and store a checksum
    > somewhere. Any hint (links, sites, ...) is welcome...!


    Compress: look at the zlib module (deflate, the same compression gzip uses)
    Checksum: what kind? 32-bit CRC: zlib.crc32; MD5: md5.md5 (hashlib.md5 in current Python)
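    A minimal sketch combining both hints in Python 3: compress with zlib, compute a CRC-32 and an MD5 digest, then verify them after decompressing. Whether the checksum covers the original or the compressed bytes, and where it is stored, is not settled in the thread; this sketch assumes the original data.

```python
import hashlib
import zlib

data = b"some file content " * 1000

compressed = zlib.compress(data, 9)       # deflate at the highest compression level
crc = zlib.crc32(data) & 0xFFFFFFFF       # mask keeps the CRC unsigned (matters on Python 2)
digest = hashlib.md5(data).hexdigest()    # 128-bit MD5 as 32 hex characters

# On the way back: decompress, then recompute and compare both checksums.
restored = zlib.decompress(compressed)
assert zlib.crc32(restored) & 0xFFFFFFFF == crc
assert hashlib.md5(restored).hexdigest() == digest
print(len(data), len(compressed))
```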

    --Irmen
     
    Irmen de Jong, Jun 6, 2004

  3. Hi Irmen,

    first of all, thanks for your help with compression & checksums!

    > Is there a special reason why you can't store the whole file in a
    > single BLOB? That's what it's a BLOB for, after all... L=Large :)


    Yes, there's a special reason. After reading a lot of documentation, I
    think it's better to split large files into small BLOBs. It doesn't
    matter if the SQL server is on the same machine as the application,
    but if the two are on different machines, large files have to be
    transmitted over the network, and I guess the application isn't
    responsive during that transfer. So splitting would be much more
    flexible. Additionally, I think splitting files makes the database
    more scalable and makes better use of the hard drive space.

    But the splitting isn't my main problem. It's the way I transmit the
    binary data to the database in SQL syntax. Today I saw how
    phpMyAdmin handles binary data: it encodes each byte as a hexadecimal
    value ("\0x..."). Is there any way to do the same (or something similar)
    with Python, or maybe with PyQt/QString/QByteArray?

    Thanks in advance!
    Juergen
     
    Juergen Gerner, Jun 7, 2004
  4. Juergen Gerner wrote:

    >>Is there a special reason why you can't store the whole file in a
    >>single BLOB? That's what it's a BLOB for, after all... L=Large :)

    >
    >
    > Yes, there's a special reason. After reading a lot of documentation I
    > think it's better splitting large files in little blobs. It doesn't
    > matter if the SQL server is on the same machine as the application,
    > but if both parts are on different machines, large files have to be
    > transmitted over the network. During this transfer the application
    > isn't responding, I guess. So splitting would be much more flexible.


    How would splitting the file in chunks improve the responsiveness
    of the application? This would only work if your app needs only
    a specific chunk of the larger file to work on. If you need to read
    the full file, reading 10 chunks will take even longer than reading
    one big BLOB.
    You may decide to do it 'in the background' using a thread, but
    then again, you could just as well load the single big BLOB inside
    that separate thread.
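    A minimal sketch of the "load it in a separate thread" idea, using the standard threading module. `load_blob_in_background` and `fake_load` are hypothetical names; in a real application `load` would run the SELECT for the single big BLOB. Many DB-API drivers (MySQLdb declares threadsafety = 1) do not allow sharing a connection between threads, so the loader should open its own connection.

```python
import threading
import time


def load_blob_in_background(load, on_done):
    """Run the blocking call load() in a worker thread and pass its
    result to on_done(). Returns the Thread so the caller can join() it.
    """
    def worker():
        on_done(load())

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


# Usage with a fake loader that sleeps to simulate the network transfer.
results = []


def fake_load():
    time.sleep(0.1)
    return b"\x00" * 1024


t = load_blob_in_background(fake_load, results.append)
# ... the application keeps running here while the BLOB is transferred ...
t.join()
print(len(results[0]))
```

    In a GUI toolkit such as PyQt, `on_done` runs in the worker thread, so the result would still have to be handed to the GUI thread (for example through a signal) before touching any widgets.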

    > Additionally I think, splitting files makes the database more scalable
    > and the space on the harddrive better used.


    In my humble opinion, these kinds of assumptions are generally false.
    Let the database decide what the most efficient storage method is
    for your 100 MB BLOB. I don't want to make these kinds of assumptions
    about the inner workings of my database server, and I certainly don't
    want to wire them into my application code... What happens when you
    switch platforms/DBMS? Is your code still 'the most efficient' then?
    Just my €0.02

    > But the splitting isn't my main problem. It's the way I transmit the
    > binary data to the database via an SQL syntax.


    Sorry, I can't help you with this. I would expect the database driver
    module to do the 'right' escaping.


    --Irmen
     
    Irmen de Jong, Jun 7, 2004
  5. "Juergen Gerner" <> wrote in message
    news:...
    > Hello Python fans,
    >
    > The main problem is the handling of the binary data. There might be
    > errors in the SQL syntax if some special chars (quotas etc.) appear,
    > or the SQL statement is incorrect because of strange chars that can't
    > be encoded by the current codeset. Another problem is, you can't strip
    > these chars because you would change the binary data or make it bigger
    > than 64k.
    >
    > Does anybody of you have an idea?
    > Any suggestions would be very helpful.


    You can either use the MySQL hex literal format (x'AABBCC'...) or use the
    Python DB API which will handle the parameter conversion for you...

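    In the first case your query becomes something like:

    import binascii
    # `table` and `order` are reserved words in MySQL, hence the backticks
    query = "INSERT INTO `table` (`order`,data) VALUES (%i,x'%s')" % (
        order, binascii.hexlify(data).decode('ascii'))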

    In the second, preferable version you use something like:

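    cur = conn.cursor()
    # MySQLdb's placeholder is %s (paramstyle 'format'); qmark-style
    # drivers such as sqlite3 use ? instead
    cur.execute("INSERT INTO `table` (`order`,data) VALUES (%s,%s)", (order, data))

    and the DB-API/database driver takes care of the rest.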
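    Putting the chunking and the parameterised insert together, here is a runnable sketch that uses the stdlib sqlite3 module as a stand-in for MySQL (an assumption). The `file_id` column and the helper names are hypothetical, and the column is called `ord` to avoid the reserved word ORDER. Reading the file back is a single query sorted by the order column.

```python
import sqlite3


def store_chunks(conn, file_id, data, size=65535):
    """Split data into chunks and insert one row per chunk.

    sqlite3 uses the '?' paramstyle; MySQLdb uses '%s' instead, but in
    both cases the values travel separately from the SQL text.
    """
    rows = [
        (file_id, order, data[start:start + size])
        for order, start in enumerate(range(0, len(data), size))
    ]
    conn.executemany("INSERT INTO chunks (file_id, ord, data) VALUES (?, ?, ?)", rows)
    conn.commit()


def load_chunks(conn, file_id):
    """Reassemble a file by concatenating its chunks in order."""
    cur = conn.execute("SELECT data FROM chunks WHERE file_id = ? ORDER BY ord", (file_id,))
    return b"".join(row[0] for row in cur)


conn = sqlite3.connect(":memory:")
# Hypothetical schema modelled on the one in the first post.
conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, file_id INTEGER, ord INTEGER, data BLOB)")

payload = bytes(range(256)) * 600  # 153600 bytes, including quotes and NULs
store_chunks(conn, 1, payload)
print(conn.execute("SELECT COUNT(*) FROM chunks").fetchone()[0])
```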


    > Additionally, I want to compress the data and store a checksum
    > somewhere. Any hint (links, sites, ...) is welcome...!


    import zlib
    compressed = zlib.compress(data)  # compress the data (deflate)


    Mike.
     
    Michael Porter, Jun 7, 2004
  6. Juergen Gerner wrote:
    >
    > But the splitting isn't my main problem. It's the way I transmit the
    > binary data to the database via an SQL syntax. Today I saw how
    > PhpMyAdmin handles binary data: it codes each byte in hexadecimal
    > values ("\0x..."). Is there any way to do something (or similar) with
    > Python, or maybe with PyQt/QString/QByteArray?
    >


    Don't know if this is of help to you, but I use the following
    to insert binary data into an MS SQL Server database with ADO.
    Suppose the variable "rawdata" contains the binary data:

    def bcd2str(bcs):
        r""" converts binary data to a string of hex digits,
        two lowercase digits per byte, e.g. b'\x2d' -> '2d' """
        # bytearray() yields integer byte values on both Python 2 and 3
        return ''.join('%02x' % b for b in bytearray(bcs))


    def str2hex(s):
        """ converts binary byte (hex 0x00 - 0xff)
        data in a python string into the format needed to
        insert into a binary datatype on SQL Server """

        return '0x' + bcd2str(s)


    # sID (the row's key) and adoconn (an open ADO connection) are assumed
    # to exist already; the 0x... hex literal is inserted unquoted
    insertstring = "insert into foo (sID, RawData) VALUES (%s, %s)" \
        % (sID, str2hex(rawdata))
    adoconn.Execute(insertstring)
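    The hex conversion itself is also available in the standard library. A minimal sketch of an equivalent `str2hex` built on `binascii.hexlify`, which produces the same lowercase two-digits-per-byte output as the hand-written loop (Python 3.5+ also offers `bytes.hex()`):

```python
import binascii


def str2hex(data):
    """Return '0x' followed by two lowercase hex digits per byte,
    the T-SQL binary literal format used above."""
    return "0x" + binascii.hexlify(data).decode("ascii")


print(str2hex(b"\x2d\x00\xff"))  # prints 0x2d00ff
```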


    regards,
    Bruno
     
    Bruno Widmann, Jun 7, 2004
  7. Hi

    > I'm trying and searching for many days for an acceptable solution...
    > without success. I want to store files in a database using BLOB
    > fields.


    This is a reasonable data organization scheme.

    >The database table has an ID field (INT, auto_increment), an
    > ORDER field (INT, for knowing the right order) and a "normal" BLOB
    > field. It is planned to split large files in 64k-parts and sort these
    > parts by the ORDER field.


    Why split the BLOBs into 64k parts? A database works with page blocks
    that will not match these parts anyway. The structure of the database
    should follow your "application logic", not the other way round.

    In a perfect world your BLOB data should be stored "as is",
    whatever its size, with no translation of any kind.

    But this is personal opinion, not law... since most middleware
    software still tends to choke on transfers of heavy binary data.
    A pity in 2004 :)

    > The main problem is the handling of the binary data. ...
    >Does anybody of you have an idea?
    > Any suggestions would be very helpful.


    BLOBs are now commonly supported by most db engines.
    But middleware often gets in the way with haphazard
    code translation (you want none) or size limitations (you
    need none as well).

    I have not tested MySQL, so I can't comment. Try other
    middleware, if there is any. If it does not work (it should), try
    another database and its associated middleware.

    I would dare to propose Firebird (open source) through
    its Python middleware, or Sybase ASA (commercial).
    They both have strong BLOB support.

    François
     
    francois lepoutre, Jun 10, 2004
  8. [ (Juergen Gerner)]

    >I want to store files in a database using BLOB

    [...]
    >Does anybody of you have an idea?


    Maybe it's a version issue. I recently grabbed the new version 1.0.0 of
    MySQLdb. In the readme.html of the Win binary package you will find
    this note:

    """
    MySQL-Python 1.0.0 for win32 Notes:
    June 28 2004
    I needed to get mysql-python working for win32, so I compiled it. I
    know a lot of people are looking for this, so enjoy... With 0.9.2,
    BLOBs weren't working properly for me, ...
    """

    Second:
    Skimming over the docs, I noticed that the Python API converts BLOBs to
    arrays. I don't know if this hint is significant in your case.

    Hope it helps,
    Martin
     
    Martin Bless, Jul 6, 2004
