Literal Escaped Octets

Discussion in 'Python' started by Chason Hayes, Feb 6, 2006.

  1. Chason Hayes

    Chason Hayes Guest

    I am trying to convert raw binary data to data with escaped octets in
    order to store it in a bytea field on postgresql server. I could do this
    easily in c/c++ but I need to do it in python. I am not sure how to read
    and evaluate the binary value of a byte in a long string when it is a non
    printable ascii value in python. I read some ways to use unpack from the
    struct module, but i really couldn't understand where that would help. I
    looked at the MIMIEncode module but I don't know how to convert the object
    to a string. Is there a module that will convert the data? It seems to me
    that this question must have been answered a million times before but I
    can't find anything.



    See http://www.postgresql.org/docs/8.1/interactive/datatype-binary.html
    for a description of the problem domain.
     
    Chason Hayes, Feb 6, 2006
    #1
    1. Advertising

  2. Chason Hayes <> wrote:
    ...
    > easily in c/c++ but I need to do it in python. I am not sure how to read
    > and evaluate the binary value of a byte in a long string when it is a non
    > printable ascii value in python.


    If you have a bytestring (AKA plain string) s, the binary value of its
    k-th byte is ord(s[k]).


    Alex
     
    Alex Martelli, Feb 6, 2006
    #2
    1. Advertising

  3. Chason Hayes

    Steve Holden Guest

    Chason Hayes wrote:
    > I am trying to convert raw binary data to data with escaped octets in
    > order to store it in a bytea field on postgresql server. I could do this
    > easily in c/c++ but I need to do it in python. I am not sure how to read
    > and evaluate the binary value of a byte in a long string when it is a non
    > printable ascii value in python. I read some ways to use unpack from the
    > struct module, but i really couldn't understand where that would help. I
    > looked at the MIMIEncode module but I don't know how to convert the object
    > to a string. Is there a module that will convert the data? It seems to me
    > that this question must have been answered a million times before but I
    > can't find anything.
    >
    >
    >
    > See http://www.postgresql.org/docs/8.1/interactive/datatype-binary.html
    > for a description of the problem domain.
    >
    >

    The URL you reference is discussing how you represent arbitrary values
    in string literals. If you already have the data in a Python string the
    best advise is to use a parameterized query - that way your Python DB
    API module will do the escaping for you!

    regards
    Steve
    --
    Steve Holden +44 150 684 7255 +1 800 494 3119
    Holden Web LLC www.holdenweb.com
    PyCon TX 2006 www.python.org/pycon/
     
    Steve Holden, Feb 6, 2006
    #3
  4. Chason Hayes

    Chason Hayes Guest

    On Mon, 06 Feb 2006 13:39:17 +0000, Steve Holden wrote:

    > Chason Hayes wrote:
    >> I am trying to convert raw binary data to data with escaped octets in
    >> order to store it in a bytea field on postgresql server. I could do this
    >> easily in c/c++ but I need to do it in python. I am not sure how to read
    >> and evaluate the binary value of a byte in a long string when it is a non
    >> printable ascii value in python. I read some ways to use unpack from the
    >> struct module, but i really couldn't understand where that would help. I
    >> looked at the MIMIEncode module but I don't know how to convert the object
    >> to a string. Is there a module that will convert the data? It seems to me
    >> that this question must have been answered a million times before but I
    >> can't find anything.
    >>
    >>
    >>
    >> See http://www.postgresql.org/docs/8.1/interactive/datatype-binary.html
    >> for a description of the problem domain.
    >>
    >>

    > The URL you reference is discussing how you represent arbitrary values
    > in string literals. If you already have the data in a Python string the
    > best advise is to use a parameterized query - that way your Python DB
    > API module will do the escaping for you!
    >
    > regards
    > Steve


    Thanks for the input. I tried that with a format string and a
    dictionary, but I still received a database error indicating illegal
    string values. This error went away completely when I used a test file
    consisting only of text, but reproduced everytime with a true binary file.
    If you can let me know where I am wrong or show me a code snippet with a
    sql insert that contains a variable with raw binary data that works,
    I would greatly appreciate it.

    Chason
     
    Chason Hayes, Feb 6, 2006
    #4
  5. Chason Hayes

    Chason Hayes Guest

    On Sun, 05 Feb 2006 21:07:23 -0800, Alex Martelli wrote:

    > Chason Hayes <> wrote:
    > ...
    >> easily in c/c++ but I need to do it in python. I am not sure how to read
    >> and evaluate the binary value of a byte in a long string when it is a non
    >> printable ascii value in python.

    >
    > If you have a bytestring (AKA plain string) s, the binary value of its
    > k-th byte is ord(s[k]).
    >
    >
    > Alex


    Thank you very much, That is the function that I was looking for to write
    a filter.

    Chason
     
    Chason Hayes, Feb 6, 2006
    #5
  6. Chason Hayes

    Steve Holden Guest

    Chason Hayes wrote:
    > On Mon, 06 Feb 2006 13:39:17 +0000, Steve Holden wrote:

    [...]
    >>
    >>The URL you reference is discussing how you represent arbitrary values
    >>in string literals. If you already have the data in a Python string the
    >>best advise is to use a parameterized query - that way your Python DB
    >>API module will do the escaping for you!
    >>
    >>regards
    >> Steve

    >
    >
    > Thanks for the input. I tried that with a format string and a
    > dictionary, but I still received a database error indicating illegal
    > string values. This error went away completely when I used a test file
    > consisting only of text, but reproduced everytime with a true binary file.
    > If you can let me know where I am wrong or show me a code snippet with a
    > sql insert that contains a variable with raw binary data that works,
    > I would greatly appreciate it.
    >

    I tried and my experience was exactly the same, which made me think less
    of PostgreSQL.

    They don't seem to implement the SQL BLOB type properly, so it looks as
    though that rebarbative syntax with all the backslashes is necessary. Sorry.

    regards
    Steve
    --
    Steve Holden +44 150 684 7255 +1 800 494 3119
    Holden Web LLC www.holdenweb.com
    PyCon TX 2006 www.python.org/pycon/
     
    Steve Holden, Feb 7, 2006
    #6
  7. On Mon, 06 Feb 2006 04:40:31 GMT, Chason Hayes <> wrote:

    >I am trying to convert raw binary data to data with escaped octets in
    >order to store it in a bytea field on postgresql server. I could do this
    >easily in c/c++ but I need to do it in python. I am not sure how to read
    >and evaluate the binary value of a byte in a long string when it is a non
    >printable ascii value in python. I read some ways to use unpack from the
    >struct module, but i really couldn't understand where that would help. I
    >looked at the MIMIEncode module but I don't know how to convert the object
    >to a string. Is there a module that will convert the data? It seems to me
    >that this question must have been answered a million times before but I
    >can't find anything.
    >

    Have you considered just encoding the data as text in hex or base64, e.g.,

    >>> import binascii
    >>> s = '\x00\x01\x02\x03ABCD0123'
    >>> binascii.hexlify(s)

    '000102034142434430313233'
    >>> binascii.b2a_base64(s)

    'AAECA0FCQ0QwMTIz\n'

    which is also reversible later of course:
    >>> h = binascii.hexlify(s)
    >>> binascii.unhexlify(h)

    '\x00\x01\x02\x03ABCD0123'
    >>> b64 = binascii.b2a_base64(s)
    >>> binascii.a2b_base64(b64)

    '\x00\x01\x02\x03ABCD0123'

    Regards,
    Bengt Richter
     
    Bengt Richter, Feb 7, 2006
    #7
  8. Chason Hayes

    Chason Hayes Guest

    On Tue, 07 Feb 2006 15:06:49 +0000, Bengt Richter wrote:

    > On Mon, 06 Feb 2006 04:40:31 GMT, Chason Hayes <> wrote:
    >
    >>I am trying to convert raw binary data to data with escaped octets in
    >>order to store it in a bytea field on postgresql server. I could do this
    >>easily in c/c++ but I need to do it in python. I am not sure how to read
    >>and evaluate the binary value of a byte in a long string when it is a non
    >>printable ascii value in python. I read some ways to use unpack from the
    >>struct module, but i really couldn't understand where that would help. I
    >>looked at the MIMIEncode module but I don't know how to convert the object
    >>to a string. Is there a module that will convert the data? It seems to me
    >>that this question must have been answered a million times before but I
    >>can't find anything.
    >>

    > Have you considered just encoding the data as text in hex or base64, e.g.,
    >
    > >>> import binascii
    > >>> s = '\x00\x01\x02\x03ABCD0123'
    > >>> binascii.hexlify(s)

    > '000102034142434430313233'
    > >>> binascii.b2a_base64(s)

    > 'AAECA0FCQ0QwMTIz\n'
    >
    > which is also reversible later of course:
    > >>> h = binascii.hexlify(s)
    > >>> binascii.unhexlify(h)

    > '\x00\x01\x02\x03ABCD0123'
    > >>> b64 = binascii.b2a_base64(s)
    > >>> binascii.a2b_base64(b64)

    > '\x00\x01\x02\x03ABCD0123'
    >
    > Regards,
    > Bengt Richter


    I had just about come to that conclusion last night while I was working on
    it. I was going to use
    import base64
    base64.stringencode(binarydata)
    and
    base64.stringdecode(stringdata)

    I then wasn't sure if I should still use the bytea field or just use a
    text field.

    Do you have a suggestion?
     
    Chason Hayes, Feb 8, 2006
    #8
  9. Chason Hayes

    Chason Hayes Guest

    On Tue, 07 Feb 2006 01:58:00 +0000, Steve Holden wrote:

    > Chason Hayes wrote:
    >> On Mon, 06 Feb 2006 13:39:17 +0000, Steve Holden wrote:

    > [...]
    >>>
    >>>The URL you reference is discussing how you represent arbitrary values
    >>>in string literals. If you already have the data in a Python string the
    >>>best advise is to use a parameterized query - that way your Python DB
    >>>API module will do the escaping for you!
    >>>
    >>>regards
    >>> Steve

    >>
    >>
    >> Thanks for the input. I tried that with a format string and a
    >> dictionary, but I still received a database error indicating illegal
    >> string values. This error went away completely when I used a test file
    >> consisting only of text, but reproduced everytime with a true binary file.
    >> If you can let me know where I am wrong or show me a code snippet with a
    >> sql insert that contains a variable with raw binary data that works,
    >> I would greatly appreciate it.
    >>

    > I tried and my experience was exactly the same, which made me think less
    > of PostgreSQL.
    >
    > They don't seem to implement the SQL BLOB type properly, so it looks as
    > though that rebarbative syntax with all the backslashes is necessary. Sorry.
    >
    > regards
    > Steve


    with regards to escaping data parameters I have found that I have to
    specifically add quotes to my strings for them to be understood by
    pstgresql. For example

    ifs=open("binarydatafile","r")
    binarydata=ifs.read()
    stringdata=base64.encodestring(binarydata)

    #does not work
    cursor.execute("insert into binarytable values(%s)" % stringdata)

    #need to do this first
    newstringdata = "'" + stringdata + "'"

    then the select statment works.
    Is this expected behavior? Is there a better way of doing this?

    thanks for any insight
    Chason
     
    Chason Hayes, Feb 8, 2006
    #9
  10. Chason Hayes

    Steve Holden Guest

    Chason Hayes wrote:
    > On Tue, 07 Feb 2006 01:58:00 +0000, Steve Holden wrote:
    >
    >
    >>Chason Hayes wrote:
    >>
    >>>On Mon, 06 Feb 2006 13:39:17 +0000, Steve Holden wrote:

    >>
    >>[...]
    >>
    >>>>The URL you reference is discussing how you represent arbitrary values
    >>>>in string literals. If you already have the data in a Python string the
    >>>>best advise is to use a parameterized query - that way your Python DB
    >>>>API module will do the escaping for you!
    >>>>
    >>>>regards
    >>>> Steve
    >>>
    >>>
    >>>Thanks for the input. I tried that with a format string and a
    >>>dictionary, but I still received a database error indicating illegal
    >>>string values. This error went away completely when I used a test file
    >>>consisting only of text, but reproduced everytime with a true binary file.
    >>>If you can let me know where I am wrong or show me a code snippet with a
    >>>sql insert that contains a variable with raw binary data that works,
    >>>I would greatly appreciate it.
    >>>

    >>
    >>I tried and my experience was exactly the same, which made me think less
    >>of PostgreSQL.
    >>
    >>They don't seem to implement the SQL BLOB type properly, so it looks as
    >>though that rebarbative syntax with all the backslashes is necessary. Sorry.
    >>
    >>regards
    >> Steve

    >
    >
    > with regards to escaping data parameters I have found that I have to
    > specifically add quotes to my strings for them to be understood by
    > pstgresql. For example
    >
    > ifs=open("binarydatafile","r")
    > binarydata=ifs.read()
    > stringdata=base64.encodestring(binarydata)
    >
    > #does not work
    > cursor.execute("insert into binarytable values(%s)" % stringdata)
    >
    > #need to do this first
    > newstringdata = "'" + stringdata + "'"
    >
    > then the select statment works.
    > Is this expected behavior? Is there a better way of doing this?
    >
    > thanks for any insight


    Yes, parameterize your queries. I assume you are using psycopg or
    something similar to create the database connection (i.e. I something
    that expects the "%s" parameter style - there are other options, but we
    needn't discuss them here).

    The magic incantation you seek is:

    cursor.execute("insert into binarytable values(%s)", (stringdata, ))

    Note that here there are TWO arguments to the .execute() method. The
    first is a parameterized SQL statement, and the second is a tuple of
    data items, one for each parameter mark in the SQL.

    Using this technique all necessary quoting (and even data conversion
    with a good database module) is performed inside the database driver,
    meaning (among other things) that your program is no longer vulnerable
    to the dreaded SQL injection errors.

    This is the technique I was hoping would work with the bytea datatype,
    but alas it doesn't. ISTM that PostgreSQL needs a bit of work there,
    even though it is otherwise a very polished product.

    regards
    Steve
    --
    Steve Holden +44 150 684 7255 +1 800 494 3119
    Holden Web LLC www.holdenweb.com
    PyCon TX 2006 www.python.org/pycon/
     
    Steve Holden, Feb 8, 2006
    #10
  11. On Wed, 08 Feb 2006 05:11:28 GMT, Chason Hayes <>
    declaimed the following in comp.lang.python:


    > #does not work
    > cursor.execute("insert into binarytable values(%s)" % stringdata)
    >

    cursor.execute("insert into binarytable values (%s)", (stringdata,))

    Assuming the database module follows the DB-API spec, IT will
    determine that a set of surrounding quotes will be needed, and apply
    them. You may still have to handle converting other stuff internal to
    the data.
    --
    > ============================================================== <
    > | Wulfraed Dennis Lee Bieber KD6MOG <
    > | Bestiaria Support Staff <
    > ============================================================== <
    > Home Page: <http://www.dm.net/~wulfraed/> <
    > Overflow Page: <http://wlfraed.home.netcom.com/> <
     
    Dennis Lee Bieber, Feb 8, 2006
    #11
  12. Chason Hayes

    Chason Hayes Guest

    On Wed, 08 Feb 2006 00:57:45 -0500, Steve Holden wrote:

    > Chason Hayes wrote:
    >> On Tue, 07 Feb 2006 01:58:00 +0000, Steve Holden wrote:
    >>
    >>
    >>>Chason Hayes wrote:
    >>>
    >>>>On Mon, 06 Feb 2006 13:39:17 +0000, Steve Holden wrote:
    >>>
    >>>[...]
    >>>
    >>>>>The URL you reference is discussing how you represent arbitrary values
    >>>>>in string literals. If you already have the data in a Python string the
    >>>>>best advise is to use a parameterized query - that way your Python DB
    >>>>>API module will do the escaping for you!
    >>>>>
    >>>>>regards
    >>>>> Steve
    >>>>
    >>>>
    >>>>Thanks for the input. I tried that with a format string and a
    >>>>dictionary, but I still received a database error indicating illegal
    >>>>string values. This error went away completely when I used a test file
    >>>>consisting only of text, but reproduced everytime with a true binary file.
    >>>>If you can let me know where I am wrong or show me a code snippet with a
    >>>>sql insert that contains a variable with raw binary data that works,
    >>>>I would greatly appreciate it.
    >>>>
    >>>
    >>>I tried and my experience was exactly the same, which made me think less
    >>>of PostgreSQL.
    >>>
    >>>They don't seem to implement the SQL BLOB type properly, so it looks as
    >>>though that rebarbative syntax with all the backslashes is necessary. Sorry.
    >>>
    >>>regards
    >>> Steve

    >>
    >>
    >> with regards to escaping data parameters I have found that I have to
    >> specifically add quotes to my strings for them to be understood by
    >> pstgresql. For example
    >>
    >> ifs=open("binarydatafile","r")
    >> binarydata=ifs.read()
    >> stringdata=base64.encodestring(binarydata)
    >>
    >> #does not work
    >> cursor.execute("insert into binarytable values(%s)" % stringdata)
    >>
    >> #need to do this first
    >> newstringdata = "'" + stringdata + "'"
    >>
    >> then the select statment works.
    >> Is this expected behavior? Is there a better way of doing this?
    >>
    >> thanks for any insight

    >
    > Yes, parameterize your queries. I assume you are using psycopg or
    > something similar to create the database connection (i.e. I something
    > that expects the "%s" parameter style - there are other options, but we
    > needn't discuss them here).
    >
    > The magic incantation you seek is:
    >
    > cursor.execute("insert into binarytable values(%s)", (stringdata, ))
    >
    > Note that here there are TWO arguments to the .execute() method. The
    > first is a parameterized SQL statement, and the second is a tuple of
    > data items, one for each parameter mark in the SQL.
    >
    > Using this technique all necessary quoting (and even data conversion
    > with a good database module) is performed inside the database driver,
    > meaning (among other things) that your program is no longer vulnerable
    > to the dreaded SQL injection errors.
    >
    > This is the technique I was hoping would work with the bytea datatype,
    > but alas it doesn't. ISTM that PostgreSQL needs a bit of work there,
    > even though it is otherwise a very polished product.
    >
    > regards
    > Steve


    That was it. Thanks for your great help.

    Chason
     
    Chason Hayes, Feb 8, 2006
    #12
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Replies:
    18
    Views:
    887
  2. Andreas Leitgeb
    Replies:
    0
    Views:
    472
    Andreas Leitgeb
    May 15, 2009
  3. Mark Space
    Replies:
    0
    Views:
    508
    Mark Space
    May 15, 2009
  4. Lew
    Replies:
    0
    Views:
    974
  5. Joshua Cranmer
    Replies:
    0
    Views:
    492
    Joshua Cranmer
    May 15, 2009
Loading...

Share This Page