Literal Escaped Octets

Chason Hayes

I am trying to convert raw binary data to data with escaped octets in
order to store it in a bytea field on a PostgreSQL server. I could do this
easily in C/C++, but I need to do it in Python. I am not sure how to read
and evaluate the binary value of a byte in a long string when it is a
non-printable ASCII value in Python. I read about some ways to use unpack
from the struct module, but I really couldn't understand where that would
help. I looked at the MIMIEncode module, but I don't know how to convert
the object to a string. Is there a module that will convert the data? It
seems to me that this question must have been answered a million times
before, but I can't find anything.



See http://www.postgresql.org/docs/8.1/interactive/datatype-binary.html
for a description of the problem domain.
 
Alex Martelli

Chason Hayes said:
easily in c/c++ but I need to do it in python. I am not sure how to read
and evaluate the binary value of a byte in a long string when it is a non
printable ascii value in python.

If you have a bytestring (AKA plain string) s, the binary value of its
k-th byte is ord(s[k]).
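
As a rough sketch of the kind of filter this enables, assuming Python 3 (where indexing or iterating a bytes object already yields the integer that ord() gives for a Python 2 string): the helper name escape_bytea below is hypothetical, and it only produces the backslash-octal "escape" text form described on the PostgreSQL page linked above. Placing the result inside a quoted SQL literal may need a further round of escaping, depending on server settings.

```python
def escape_bytea(data: bytes) -> str:
    """Hypothetical helper: render bytes in bytea 'escape' text form."""
    pieces = []
    for value in data:  # iterating over bytes yields ints in range 0..255
        if value == 0x5C:
            pieces.append("\\\\")            # a backslash becomes two backslashes
        elif value == 0x27 or value < 32 or value > 126:
            pieces.append("\\%03o" % value)  # quote and non-printables as \ooo
        else:
            pieces.append(chr(value))        # printable ASCII passes through
    return "".join(pieces)


raw = b"\x00A\\'\xff"
print(raw[0], ord("A"))   # integer values of a byte and of a character
print(escape_bytea(raw))
```

Escaping the single quote as an octal sequence is a design choice made here to keep the output free of quote characters; the documentation also allows other spellings.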


Alex
 
Steve Holden

The URL you reference discusses how to represent arbitrary values
in string literals. If you already have the data in a Python string, the
best advice is to use a parameterized query - that way your Python DB
API module will do the escaping for you!

regards
Steve
 
Chason Hayes

Thanks for the input. I tried that with a format string and a
dictionary, but I still received a database error indicating illegal
string values. This error went away completely when I used a test file
consisting only of text, but reproduced every time with a true binary file.
If you can let me know where I am wrong, or show me a working code snippet
with a SQL insert that contains a variable holding raw binary data,
I would greatly appreciate it.

Chason
 
Chason Hayes

Chason Hayes said:
easily in c/c++ but I need to do it in python. I am not sure how to read
and evaluate the binary value of a byte in a long string when it is a non
printable ascii value in python.

If you have a bytestring (AKA plain string) s, the binary value of its
k-th byte is ord(s[k]).


Alex

Thank you very much. That is the function I was looking for to write
a filter.

Chason
 
Steve Holden

Chason said:
Thanks for the input. I tried that with a format string and a
dictionary, but I still received a database error indicating illegal
string values. This error went away completely when I used a test file
consisting only of text, but reproduced every time with a true binary file.
If you can let me know where I am wrong or show me a code snippet with a
sql insert that contains a variable with raw binary data that works,
I would greatly appreciate it.

I tried and my experience was exactly the same, which made me think less
of PostgreSQL.

They don't seem to implement the SQL BLOB type properly, so it looks as
though that rebarbative syntax with all the backslashes is necessary. Sorry.

regards
Steve
 
Bengt Richter

Have you considered just encoding the data as text in hex or base64? For
example, base64-encoding the bytes '\x00\x01\x02\x03ABCD0123' gives

'AAECA0FCQ0QwMTIz\n'

which is also reversible later of course:

'\x00\x01\x02\x03ABCD0123'
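
A minimal sketch of that round trip, assuming Python 3, where the old encodestring/decodestring functions are spelled encodebytes/decodebytes and operate on bytes objects. The variable names are illustrative only.

```python
import base64

raw = b"\x00\x01\x02\x03ABCD0123"

# base64: binary in, ASCII-only bytes out (with a trailing newline)
encoded = base64.encodebytes(raw)
decoded = base64.decodebytes(encoded)
text = encoded.decode("ascii")   # a str suitable for a text column

# hex is the other text encoding mentioned; it is also reversible
hex_text = raw.hex()
from_hex = bytes.fromhex(hex_text)

print(text)
print(hex_text)
print(decoded == raw, from_hex == raw)
```

Base64 grows the data by roughly a third, while hex doubles it, so base64 is usually the more compact of the two.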

Regards,
Bengt Richter
 
Chason Hayes

I had just about come to that conclusion last night while I was working on
it. I was going to use

import base64
base64.encodestring(binarydata)

and

base64.decodestring(stringdata)

I then wasn't sure whether I should still use the bytea field or just use
a text field.

Do you have a suggestion?
 
Chason Hayes

I tried and my experience was exactly the same, which made me think less
of PostgreSQL.

They don't seem to implement the SQL BLOB type properly, so it looks as
though that rebarbative syntax with all the backslashes is necessary. Sorry.

regards
Steve

With regard to escaping data parameters, I have found that I have to
specifically add quotes to my strings for them to be understood by
PostgreSQL. For example:

import base64

ifs = open("binarydatafile", "rb")  # binary mode, so the bytes are read unmodified
binarydata = ifs.read()
ifs.close()
stringdata = base64.encodestring(binarydata)

# does not work: the value lands in the SQL text without quotes
cursor.execute("insert into binarytable values(%s)" % stringdata)

# need to do this first
newstringdata = "'" + stringdata + "'"
cursor.execute("insert into binarytable values(%s)" % newstringdata)

Then the statement works.
Is this expected behavior? Is there a better way of doing this?

thanks for any insight
Chason
 
Steve Holden

Yes, parameterize your queries. I assume you are using psycopg or
something similar to create the database connection (i.e. something
that expects the "%s" parameter style - there are other options, but we
needn't discuss them here).

The magic incantation you seek is:

cursor.execute("insert into binarytable values(%s)", (stringdata, ))

Note that here there are TWO arguments to the .execute() method. The
first is a parameterized SQL statement, and the second is a tuple of
data items, one for each parameter mark in the SQL.

Using this technique all necessary quoting (and even data conversion
with a good database module) is performed inside the database driver,
meaning (among other things) that your program is no longer vulnerable
to the dreaded SQL injection errors.
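
To make the two-argument form concrete, here is a small runnable sketch. It is an assumption-laden stand-in: it uses the standard library's sqlite3 module rather than a PostgreSQL driver, so the parameter mark is "?" instead of "%s", and the column name payload is made up for the example. The DB-API 2.0 specification also defines a Binary() constructor that drivers expose for wrapping binary values, which may be worth trying with binary columns.

```python
import sqlite3

# In-memory database standing in for a real server connection
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table binarytable (payload blob)")

# Bytes that would break naive string interpolation: NUL, quote, backslash
binarydata = b"\x00\x01'quote\\backslash\xff"

# Two arguments: the SQL with a parameter mark, then a tuple of values.
# The driver handles quoting and conversion of the value.
cur.execute("insert into binarytable values (?)", (binarydata,))
conn.commit()

cur.execute("select payload from binarytable")
stored = cur.fetchone()[0]
print(stored == binarydata)
conn.close()
```

Note that the value is never formatted into the SQL string with the % operator; it travels separately as the second argument.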

This is the technique I was hoping would work with the bytea datatype,
but alas it doesn't. ISTM that PostgreSQL needs a bit of work there,
even though it is otherwise a very polished product.

regards
Steve
 
Dennis Lee Bieber

#does not work
cursor.execute("insert into binarytable values(%s)" % stringdata)

cursor.execute("insert into binarytable values (%s)", (stringdata,))

Assuming the database module follows the DB-API spec, IT will
determine that a set of surrounding quotes will be needed, and apply
them. You may still have to handle converting other stuff internal to
the data.
 
Chason Hayes

That was it. Thanks for your great help.

Chason
 
