using strings from extension module question + possible documentation error

Oleg Leschov

Hi,

I am writing an extension module in which I want to do some heavy
number crunching on a large amount of data.

The input data is prepared using Python code and passed to this
extension module via strings containing binary data, and the result is
also passed back as a list of pretty large strings containing binary
data.

So first I thought I'll just pass the empty strings made in Python
along with input data strings, so my extension code would just fill
those empty strings with the results.

But then I read the PyString docs, it says I must not modify any
strings even though it seems to be possible...

Ok then I decided I'll create a list and fill it with strings from my
C extension code..

However, to avoid data copying I wish to fill the newly created (using
PyString_FromStringAndSize(NULL, x)) strings' buffers (obtained with
PyString_AsString) with data after creating them in my extension
module.

The question is - is this allowed? Because the doc says this:
"The data must not be modified in any way, unless the string was just
created using PyString_FromStringAndSize(NULL, size)."

But what exactly does "just created" mean? Will it not be considered
"just created" if I call any more Python stuff after
PyString_FromStringAndSize, like another PyString_FromStringAndSize
along with PyString_AsString? Which I certainly intend to do, since I
first create all the strings I want, and then do my calculations which
fill those with actual data.


Another question is this - why does the PyString_AsString doc state
"Return a NUL-terminated representation of the contents of string."
when strings may contain binary data and thus are NOT NUL-terminated in
general? Is this a documentation error, or can't I access binary
strings using PyString_AsString?

P.S. the doc quotes are from
http://docs.python.org/release/2.6.6/c-api/string.html
 
Stefan Sonnenberg-Carstens

On 24.12.2010 13:31, Oleg Leschov wrote:
> Hi,
>
> I am writing an extension module in which I want to do some heavy
> number crunching on a large amount of data.
>
> The input data is prepared using Python code and passed to this
> extension module via strings containing binary data, and the result is
> also passed back as a list of pretty large strings containing binary
> data.
>
> So first I thought I'll just pass the empty strings made in Python
> along with input data strings, so my extension code would just fill
> those empty strings with the results.
>
> But then I read the PyString docs, it says I must not modify any
> strings even though it seems to be possible...

Strings are immutable.
If you pass a string and the underlying C module changes its contents,
that guarantee is broken.
That is a source of endless pain...

> Ok then I decided I'll create a list and fill it with strings from my
> C extension code..
>
> However, to avoid data copying I wish to fill the newly created (using
> PyString_FromStringAndSize(NULL, x)) strings' buffers (obtained with
> PyString_AsString) with data after creating them in my extension
> module.

You could as an alternative just use byte arrays. These are changeable.

> The question is - is this allowed? Because the doc says this:
> "The data must not be modified in any way, unless the string was just
> created using PyString_FromStringAndSize(NULL, size)."

Once the string has been used by Python code, you should not change it.
Strings are immutable in Python, so every operation on a string
returns a new one.
See above.

> But what exactly does "just created" mean? Will it not be considered

"Just created" means just that: the string has been created and has not
yet been passed back to the interpreter.
Until then, you may change it.

> "just created" if I call any more Python stuff after
> PyString_FromStringAndSize, like another PyString_FromStringAndSize
> along with PyString_AsString? Which I certainly intend to do, since I
> first create all the strings I want, and then do my calculations which
> fill those with actual data.
>
> Another question is this - why does the PyString_AsString doc state
> "Return a NUL-terminated representation of the contents of string."
> when strings may contain binary data and thus are NOT NUL-terminated in
> general? Is this a documentation error, or can't I access binary
> strings using PyString_AsString?

Read carefully: NUL-terminated representation of the contents of the string.
The buffer may contain other data, including embedded NUL bytes, but the
C API takes care that this "contract" holds: there is always a NUL byte
after the last byte of the string's contents.
There do they live, indeed.
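
To illustrate the NUL-termination point without building an extension, here is a small sketch in plain Python using ctypes. It is only an analogy, not the PyString API itself: ctypes.create_string_buffer also appends a terminating NUL after the data, so it shows the same distinction between "a NUL follows the contents" and "the contents contain no NULs". The variable names are made up for the example.

```python
import ctypes

# Binary data with an embedded NUL byte in the middle.
data = b"ab\x00cd"

# create_string_buffer(bytes) allocates len(data) + 1 bytes and
# places a terminating NUL after the copied data.
buf = ctypes.create_string_buffer(data)

# C-style access stops at the first NUL, so the embedded NUL
# truncates the result.
c_style = buf.value

# Access by explicit length sees the whole payload; the final byte
# is the terminator that was added after the data.
full = buf.raw
payload = full[:len(data)]
```

So the buffer is always NUL-terminated, but for binary data the length has to come from elsewhere rather than from scanning for the first NUL. In the C API, PyString_Size gives the length of a string object.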
 
Oleg Leschov

> You could as an alternative just use byte arrays. These are changeable.

Thanks, that's exactly what I need. I had completely missed those
since they're pretty new.
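
For anyone finding this thread later, a minimal sketch of the byte array idea, assuming the output buffers are preallocated in Python and then overwritten in place. ctypes stands in here for the real extension code, and fill_in_place is a hypothetical helper, not part of any library.

```python
import ctypes

def fill_in_place(buf, value):
    """Overwrite every byte of a bytearray without copying it.

    Hypothetical stand-in for extension code that writes results
    straight into a caller-provided buffer.
    """
    # from_buffer() shares memory with the bytearray; no copy is made.
    view = (ctypes.c_ubyte * len(buf)).from_buffer(buf)
    # Write through the raw address, as C code would.
    ctypes.memset(ctypes.addressof(view), value, len(buf))

# Preallocate the result buffers first, then fill them afterwards.
results = [bytearray(8) for _ in range(3)]
for i, out in enumerate(results):
    fill_in_place(out, i + 1)
```

Unlike a str, a bytearray is meant to be mutated, so filling it after creation does not break any immutability guarantee.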
 
