pyodbc data corruption problem

Discussion in 'Python' started by Joe Salmeri, May 19, 2007.

  1. Joe Salmeri (Guest)

    I have found a data corruption problem with pyodbc.

    OS = Windows XP SP2
    DB = Microsoft Access XP

    PROBLEM:

    When selecting columns of type Memo from a table, the value returned
    is padded with a run of null characters at the end.

    The problem does not appear until the length of the Memo value
    exceeds 2048 bytes.

    I have attached several scripts to help demonstrate the problem.

    To recreate the problem:

    1. Create a blank Access database named test.
    2. Create an ODBC DSN named test for that database
    3. Run the createtable.py script to create the table
    and load it with the dummy data
    4. Run the broke.py script to show the problem.

    The issue is when the data value is > 2048 bytes.

    The value created by createtable.py is 2046 bytes plus 3 bytes at the
    end containing "JOE", for a total of 2049 bytes.

    If you change it from 2046 to 2045 (or less) then the problem does not
    occur.

    #
    # createtable.py script
    #

    import pyodbc

    dbs = pyodbc.connect('dsn=test')

    c = dbs.cursor()

    try:
        sql = 'drop table test_memo'
        c.execute(sql)
        dbs.commit()
    except pyodbc.Error:
        # ignore drop table failure (the table may not exist yet)
        pass

    sql = 'create table test_memo (c1 int not null, c2 memo not null)'

    c.execute(sql)

    dbs.commit()

    sql = 'insert into test_memo values(1, ?)'

    c2_value = '1' * 2046

    c2_value = '%sJOE' % (c2_value)

    c.execute(sql, (c2_value,))

    dbs.commit()

    c.close()
    dbs.close()

    #
    # broke.py script
    #


    import pyodbc

    dbs = pyodbc.connect('dsn=test')

    c = dbs.cursor()

    sql = 'select c2, len(c2) as c2_db_len from test_memo where c1 = 1'

    c.execute(sql)

    row = c.fetchone()

    c2, c2_db_len = row

    print repr(c2)

    print 'c2 length :', len(c2)
    print 'c2_db_length :', c2_db_len

    print 'before nul length:', len(c2[0:c2.find('\x00')])

    c.close()
    dbs.close()
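    Until this is fixed in pyodbc itself, a workaround (my own sketch, not
    part of the scripts above) is to strip the trailing NULs after each
    fetch. This is safe only on the assumption that genuine Memo text never
    contains embedded NUL bytes, which holds for ordinary Access data:

```python
def clean_memo(value):
    """Strip the trailing NUL padding that pyodbc appends to long
    Memo values. Safe as long as real Memo text never contains
    embedded NUL bytes."""
    if value is None:
        return value
    return value.rstrip('\x00')

# A value corrupted the way broke.py demonstrates:
corrupted = '1' * 2046 + 'JOE' + '\x00' * 2049
print(len(clean_memo(corrupted)))  # 2049
```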
     
    Joe Salmeri, May 19, 2007
    #1

  2. Guest

    On May 18, 6:46 pm, "Joe Salmeri" <> wrote:
    > I have found a data corruption problem with pyodbc.
    >
    > OS = Windows XP SP2
    > DB = Microsoft Access XP
    >
    > PROBLEM:
    >
    > When selecting columns from a table that are of type Memo the value
    > returned is padded with a bunch of null characters at the end.
    >
    > The problem does not seem to occur until the length of the Memo column
    > exceeds 2048 bytes.


    Interesting. MS-Access has had a bug involving Memo fields
    ever since Version 2.0 (the last time they worked). I was trying
    to use a Memo field on a form and append short status messages.
    This mysteriously stopped working in the version following 2.0
    (Access 95?) and has never worked since. There seems to be some
    kind of 2048-byte buffer involved, created by some punk
    Microsoft gave the lowly job of software revision to while
    those more educated did more important things.

    My guess is you won't find a Python solution short of
    accepting what Access gives you and dealing with it.
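    For what it's worth, the data-then-NULs layout reported in this thread
    is consistent with one plausible mechanism (my speculation, not
    pyodbc's actual code): a buffer sized for wide characters (2 bytes per
    character) is filled with narrow characters, and the byte length rather
    than the character count is reported as the string length. A toy
    simulation of that mis-sizing:

```python
def simulate_missized_fetch(data):
    """Toy model of the suspected mis-sizing (an assumption, not
    pyodbc internals): a buffer is allocated for wide characters
    (2 bytes per char) but filled with narrow characters, and the
    byte length is then treated as the string length -- yielding
    the real data followed by an equal run of NULs."""
    byte_len = 2 * len(data)  # wide-char buffer size in bytes
    return data + '\x00' * (byte_len - len(data))

result = simulate_missized_fetch('1' * 2046 + 'JOE')
print(len(result))  # 4098, i.e. twice the real length of 2049
```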

     
    , May 19, 2007
    #2

  3. Joe Salmeri (Guest)

    Thank you for your response but this is not an Access problem.

    The exact same code works fine using mx.ODBC or the old odbc module
    that comes with the win32 extensions.

    It only fails with pyodbc.

     
    Joe Salmeri, May 19, 2007
    #3
  4. Joe Salmeri (Guest)

    I have done some additional investigation into this problem and found
    the following:

    As described in my original post, the problem does not appear until the
    size of the returned value exceeds 2048 bytes.

    Once the return value is larger than 2048 bytes, the value returned by
    pyodbc is twice the actual size of the data.

    The data is padded with exactly as many null characters as there are
    bytes in the actual data.

    In other words if the actual database value is 4868 bytes, then pyodbc will
    return a value that is 9736 bytes long. The first 4868 bytes will be the
    real data, followed by 4868 bytes of nulls.

    I did a second test where the actual data size was 11,109 bytes. In that
    case pyodbc returned a value that was 22,218 bytes long. The first 11,109
    bytes are the real data, followed by 11,109 null bytes.

    This seems to confirm the bug.
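    The two cases above fit a simple pattern, which can be checked with a
    small helper (the function name and the dummy data are mine):

```python
def looks_like_padding_bug(value, db_len):
    """True if `value` matches the corruption pattern reported here:
    twice the database length, real data first, then NUL padding."""
    return (len(value) == 2 * db_len
            and value[db_len:] == '\x00' * db_len
            and '\x00' not in value[:db_len])

# Both observed cases, reconstructed with dummy data:
for n in (4868, 11109):
    corrupted = 'x' * n + '\x00' * n
    print(looks_like_padding_bug(corrupted, n))  # True, True
```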

     
    Joe Salmeri, May 24, 2007
    #4
