Assigning generator expressions to ctype arrays

Patrick Maupin

Bug or misunderstanding?

Python 2.7.1+ (r271:86832, Apr 11 2011, 18:13:53)
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> x = 32 * [0]
>>> x[:] = (x for x in xrange(32))
>>> from ctypes import c_uint
>>> x = (32 * c_uint)()
>>> x[:] = xrange(32)
>>> x[:] = (x for x in xrange(32))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Can only assign sequence of same size

Thanks,
Pat
 
Steven D'Aprano

Bug or misunderstanding?

Python 2.7.1+ (r271:86832, Apr 11 2011, 18:13:53) [GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> x = 32 * [0]
>>> x[:] = (x for x in xrange(32))
>>> from ctypes import c_uint
>>> x = (32 * c_uint)()
>>> x[:] = xrange(32)
>>> x[:] = (x for x in xrange(32))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Can only assign sequence of same size

From the outside, you can't tell how big a generator expression is. It
has no length:
>>> len(x for x in xrange(32))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'generator' has no len()

Since the array object has no way of telling whether the generator will
have the correct size, it refuses to guess. I would argue that it should
raise a TypeError with a less misleading error message, rather than a
ValueError, so "bug".

The simple solution is to use a list comp instead of a generator
expression. If you have an arbitrary generator passed to you from the
outside, and you don't know how long it is yourself, you can use
itertools.islice to extract just the number of elements you want. Given
some generator expression g, rather than doing this:

# risky, if g is huge, the temporary list will also be huge
x[:] = list(g)[:32]

do this instead:

# use lazy slices guaranteed not to be unexpectedly huge
x[:] = list(itertools.islice(g, 32))
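
For instance, here is a minimal self-contained sketch of that approach
(the generator is arbitrary, and this assumes Python 2.7 as in the
session above):

from ctypes import c_uint
from itertools import islice

g = (i * i for i in xrange(1000000))  # arbitrarily long generator
x = (32 * c_uint)()
# islice stops after 32 items, so the temporary list stays small;
# if g yields fewer than 32 items, the assignment raises ValueError.
x[:] = list(islice(g, 32))
print(list(x)[:5])  # [0, 1, 4, 9, 16]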
 
Patrick Maupin

From the outside, you can't tell how big a generator expression is. It has no length:

I understand that.
Since the array object has no way of telling whether the generator will have the correct size, it refuses to guess.

It doesn't have to guess. It can assume that I, the programmer, know
what the heck I am doing, and then validate that assumption -- trust,
but verify. It merely needs to fill the slice and then ask for one
more and check that StopIteration is raised.
I would argue that it should raise a TypeError
with a less misleading error message, rather
than a ValueError, so "bug".

And I would argue that it should simply work, unless someone can
present a more compelling reason why not.
The simple solution is to use a list comp
instead of a generator expression.

I know how to work around the issue. I'm not sure I should have to.
It violates the principle of least surprise for the ctypes array to
not be able to interoperate with the iterator protocol in this
fashion.

Regards,
Pat
 
Terry Reedy

x[:] = (x for x in xrange(32))

This translates to

s.__setitem__(slice(None,None), generator_object)

where 'generator_object' is completely opaque, except that it will yield
0 to infinity objects in response to next() before raising StopIteration.

Given that a ctype_array is a *fixed-length* array, *unlike* Python's
extensible lists and arrays, failure is a possibility due to mis-matched
lengths. So ctype_array can either look first, as it does, by calling
len(value_object), or leap first and create a temporary array, see if it
fills up exactly right, and if it does, copy it over.
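
To make the contrast concrete, here is a rough sketch of the two
strategies in plain Python (illustrative only; the real ctypes code is
C, and these function names are made up):

def assign_look_first(arr, values):
    # "look first": demand a sized sequence, as the current code does
    # (plain len() raises TypeError for a generator; ctypes itself
    # reports the mismatch as a ValueError)
    if len(values) != len(arr):
        raise ValueError('Can only assign sequence of same size')
    arr[:] = values

def assign_leap_first(arr, values):
    # "leap first": drain into a temporary list, copy only on exact fit
    tmp = list(values)  # works for any finite iterable, costs memory
    if len(tmp) != len(arr):
        raise ValueError('Can only assign sequence of same size')
    arr[:] = tmp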
I know how to work around the issue. I'm not sure I should have to.

I do not think everyone else should suffer substantial increase in space
and run time to avoid surprising you.
It violates the principle of least surprise

for ctypes to do what is most efficient in 99.9% of uses?
for the ctypes array to
not be able to interoperate with the iterator protocol in this
fashion.

It could, but at some cost. Remember, people use ctypes for efficiency,
so the temp array path would have to be conditional. When you have a
patch, open a feature request on the tracker.
 
Steven D'Aprano

I understand that.


It doesn't have to guess. It can assume that I, the programmer, know
what the heck I am doing, and then validate that assumption -- trust,
but verify. It merely needs to fill the slice and then ask for one more
and check that StopIteration is raised.

Simple, easy, and wrong.

It needs to fill in the slice, check that the slice has exactly the right
number of elements (it may have fewer), and then check that the iterator
is now empty.

If the slice has too few elements, you've just blown away the entire
iterator for no good reason.

If the slice is the right length, but the iterator doesn't next raise
StopIteration, you've just thrown away one perfectly good value. Hope it
wasn't something important.
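
A small sketch of that second failure mode, with a plain list standing
in for the array (the names are made up; the point is that the
verification probe itself consumes an item):

it = iter([1, 2, 3, 4, 5])
slots = [None] * 3
for i in range(3):
    slots[i] = next(it)   # fill the slice: takes 1, 2, 3
try:
    next(it)              # verify exhaustion -- but this eats the 4
except StopIteration:
    pass                  # exact fit
else:
    print("too many values")
print(list(it))           # [5]: the probed value 4 is gone for good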

And I would argue that it should simply work, unless someone can present
a more compelling reason why not.

I think that "the iterator protocol as it exists doesn't allow it to work
the way you want" is a pretty compelling reason.

I know how to work around the issue. I'm not sure I should have to. It
violates the principle of least surprise for the ctypes array to not be
able to interoperate with the iterator protocol in this fashion.

Perhaps you're too easily surprised by the wrong things.
 
Terry Reedy

If the slice has too few elements, you've just blown away the entire
iterator for no good reason.
If the slice is the right length, but the iterator doesn't next raise
StopIteration, you've just thrown away one perfectly good value. Hope it
wasn't something important.

You have also over-written values that should be set back to what they
were, before the exception is raised, which is why I said the test needs
to be done with a temporary array.
 
Patrick Maupin

I do not think everyone else should suffer substantial increase in space
and run time to avoid surprising you.

What substantial increase? There's already a check that winds up
raising an exception. Just make it empty an iterator instead.
for ctypes to do what is most efficient in 99.9% of uses?

It doesn't work at all with an iterator, so it's most efficient 100%
of the time now. How do you know how many people would use iterators
if it worked?
It could, but at some cost. Remember, people use ctypes for efficiency,

yes, you just made my argument for me. Thank you. It is incredibly
inefficient to have to create a temp array.
so the temp array path would have to be conditional.

I don't understand this at all. Right now, it just throws up its
hands and says "I don't work with iterators." Why would it be a
problem to change this?
 
Terry Reedy

What substantial increase?

of time and space, as I said, for the temporary array that I think would
be needed and which I also described in the previous paragraph that you
clipped
There's already a check that winds up
raising an exception. Just make it empty an iterator instead.

It? I have no idea what you intend that to refer to.

It doesn't work at all with an iterator, so it's most efficient 100%
of the time now. How do you know how many people would use iterators
if it worked?

I doubt it would be very many because it is *impossible* to make it work
in the way that I think people would want it to.
yes, you just made my argument for me. Thank you. It is incredibly
inefficient to have to create a temp array.

But necessary to work with black box iterators. Now you are agreeing
with my argument.
I don't understand this at all. Right now, it just throws up its
hands and says "I don't work with iterators."

If ctype_array slice assignment were to be augmented to work with
iterators, that would, in my opinion (and see below), require use of
temporary arrays. Since slice assignment does not use temporary arrays
now (see below), that augmentation should be conditional on the source
type being a non-sequence iterator.
Why would it be a problem to change this?

CPython comes with immutable fixed-length arrays (tuples) that do not
allow slice assignment and mutable variable-length arrays (lists) that
do. The definition is 'replace the indicated slice with a new slice
built from all values from an iterable'. Point 1: This works for any
properly functioning iterable that produces any finite number of items.
Iterators are always exhausted.

Replace can be thought of as delete followed by add, but the
implementation is not that naive. Point 2: If anything goes wrong and an
exception is raised, the list is unchanged. This means that there must
be temporary internal storage of either old or new references. An
example that uses an improperly functioning generator:

>>> a = [0, 1, 2, 3, 4, 5, 6, 7]
>>> def g():
...     yield None
...     raise ValueError
...
>>> a[3:6]=g()
Traceback (most recent call last):
  File "<pyshell#21>", line 1, in <module>
    a[3:6]=g()
  File "<pyshell#20>", line 3, in g
    raise ValueError
ValueError
>>> a
[0, 1, 2, 3, 4, 5, 6, 7]

A c_uint array is a new kind of beast: a fixed-length mutable array. So
it has to have a different definition of slice assignment than lists.
Thomas Heller, the ctypes author, apparently chose 'replacement by a
sequence with exactly the same number of items, else raise an
exception', though I do not know what the doc actually says.

An alternative definition would have been to replace as much of the
slice as possible, from the beginning, while ignoring any items in
excess of the slice length. This would work with any iterable. However,
partial replacement of a slice would be a surprising innovation to most.

The current implementation assumes that the reported length of a
sequence matches the valid indexes and dispenses with temporary storage.
This is shown by the following:

from ctypes import c_uint
n = 20

class Liar:
    def __len__(self): return n
    def __getitem__(self, index):
        if index < 10:
            return 1
        else:
            raise ValueError()

x = (n * c_uint)()
print(list(x))
x[:] = range(n)
print(list(x))
try:
    x[:] = Liar()
except:
    pass
print(list(x))

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

I consider such unintended partial replacement to be a glitch. An
exception could be raised, but without adding temp storage, the array
could not be restored. And making a change *and* raising an exception
would be a different sort of glitch. (One is possible with augmented
assignment involving a mutable member of a tuple.) So I would leave this
as undefined behavior for an input outside the proper domain of the
function.

Anyway, as I said before, you are free to propose a specific change
('work with iterators' is too vague) and provide a corresponding patch.
 
Patrick Maupin

You have also over-written values that should be set back to what they
were, before the exception is raised, which is why I said the test needs
to be done with a temporary array.

Sometimes when exceptions happen, data is lost. You both make a big
deal out of simultaneously (a) not placing burden on the normal case
and (b) defining the normal case by way of what happens during an
exception. Iterators are powerful and efficient, and ctypes are
powerful and efficient, and the only reason you've managed to give why
I shouldn't be able to fill a ctype array slice from an iterator is
that, IF I SCREW UP and the iterator doesn't produce the right amount
of data, I will have lost some data.

Regards,
Pat
 
Patrick Maupin

And, BTW, the example you give of, e.g.

a,b,c = (some generator expression)

ALREADY LOSES DATA if the iterator isn't the right size and it raises
an exception.

It doesn't overwrite a or b or c, but you're deluding yourself if you
think that means it hasn't altered the system state.
 
Steven D'Aprano

And, BTW, the example you give of, e.g.

a,b,c = (some generator expression)

ALREADY LOSES DATA if the iterator isn't the right size and it raises an
exception.

Yes. What's your point? This fact doesn't support your proposal in the
slightest. You have argued against using a temporary array. I quote:

"It is incredibly inefficient to have to create a temp array."

[Aside: how do you know this is not just inefficient but *incredibly* so?]

But that's exactly what happens in tuple unpacking: the generator on the
right hand side is unpacked into a temporary tuple, and then assigned to
the targets on the left hand side. If the number of elements on both
sides aren't the same, the assignment fails. That is, tuple unpacking
behaves like this pseudo-code:

targets = left hand side
values = tuple(right hand side)
if len(targets) != len(values):
    fail
otherwise:
    for each pair target, value:
        target = value

This has the important property that the assignment is atomic: it either
succeeds in full, or it doesn't occur. The only side-effect is to exhaust
the generator, which is unavoidable given that generators don't have a
length.
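
A short sketch of that behaviour (the generator here is arbitrary):

g = (i for i in range(2))
try:
    a, b, c = g       # needs three values, g yields only two
except ValueError:
    pass              # the targets a, b and c were never bound
print(list(g))        # []: the generator was exhausted anyway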

Without temporary storage for the right hand side, you lose the property
of atomicity. That would be unacceptable.

In the case of the ctypes array, the array slice assignment is also
treated as atomic: it either succeeds in full, or it fails in full. This
is an important property. Unlike tuple unpacking, the array is even more
conservative about what is on the right hand side: it doesn't accept
iterators at all, only sequences. This is a sensible default, because it
is *easy* to work around: if you want to unpack the iterator, just make a
temporary list:

array[:] = list(x+1 for x in range(32))

Assignment remains atomic, and the generator will be unpacked into a
temporary list at full C speed.

If you don't care about assignment being atomic -- and it's your right to
weaken the array contract in your own code -- feel free to write your own
helper function based on your earlier suggestion:

"It merely needs to fill the slice and then ask for one more and check
that StopIteration is raised."

def array_assign(array, values):
    try:
        if len(values) == len(array):
            array[:] = values  # runs at full C speed
    except TypeError:
        try:
            for i in xrange(len(array)):
                array[i] = next(values)  # or values.next() in Python 2.5
        except StopIteration:
            raise TypeError('too few items')
        try:
            next(values)
        except StopIteration:
            pass
        else:
            raise TypeError('too many values')
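
A hypothetical usage sketch of the helper above (the array size and
values are arbitrary):

from ctypes import c_uint

x = (4 * c_uint)()
array_assign(x, [9, 8, 7, 6])    # sized sequence: full C-speed path
array_assign(x, iter(range(4)))  # iterator: element-by-element path
print(list(x))                   # [0, 1, 2, 3]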


But this non-atomic behaviour would be entirely inappropriate as the
default behaviour for a ctypes array.
 
Patrick Maupin

On Oct 28, 8:01 pm, Steven D'Aprano wrote:

Yes. What's your point? This fact doesn't support your proposal in the
slightest.

You earlier made the argument that "If the slice has too few elements,
you've just blown away the entire iterator for no good reason. If the
slice is the right length, but the iterator doesn't next raise
StopIteration, you've just thrown away one perfectly good value. Hope it
wasn't something important."

In other words, you WERE arguing that it's very important not to lose
information. Now that I show that information is ALREADY LOST in a
similar scenario, you are arguing that's irrelevant. Whatever.
You have argued against using a temporary array. I quote:

[Aside: how do you know this is not just inefficient but *incredibly* so?]

By profiling my app, looking for places to save time without having to
resort to C.
But that's exactly what happens in tuple unpacking:
the generator on the right hand side is unpacked into
a temporary tuple, and then assigned to the targets
on the left hand side.

Agreed. But Terry was making the (correct) argument that people using
ctypes are often looking for efficiency.
If the number of elements on both sides aren't the same,
the assignment fails. That is, tuple unpacking behaves
like (snip)

Yes, and that loses information from the right hand side if it fails
and the right hand side happens to be an iterator. For some reason
you're OK with that in this case, but it was the worst thing in the
world a few messages ago.
This has the important property that the
assignment is atomic: it either succeeds in full,
or it doesn't occur.

Not right at all.
The only side-effect is to exhaust the generator,
which is unavoidable given that generators don't have a
length.

Yes. But earlier you were acting like it would be problematic for me
to lose information out of the generator if there were a failure, and
now the sanctity of the information on the LHS is so much more than on
the RHS. Frankly, that's all a crock. In your earlier email, you
argued that my proposal loses information, when in fact, in some cases
it PRESERVES information (the very information you are trying to
transfer) that ISN'T preserved when this temp array is tossed, and the
only information it LOSES is information the programmer declared his
clear intent to kill. But that's an edge case anyway.
Without temporary storage for the right hand side,
you lose the property of atomicism. That would be
unacceptable.

As if the temporary storage workaround exhibited the "necessary"
atomicity, or as if you have even showed a valid argument why the
atomicity is important for this case.
In the case of the ctypes array, the array slice assignment is also
treated as atomic: it either succeeds in full, or it fails in full.
This is an important property. Unlike tuple unpacking, the array is even more
conservative about what is on the right hand side: it doesn't accept
iterators at all, only sequences. This is a sensible default,

How it works is not a bad start, but it's arguably unfinished.
because it is *easy* to work around: if you want to unpack the iterator, just make a temporary list: (snip)

I already said I know the workaround. I don't know why you can't
believe that. But one of the purposes of the iterator protocol is to
reduce the number of cases where you have to create huge lists.
Assignment remains atomic, and the generator will be unpacked into a
temporary list at full C speed.

Which can be very slow if your list has several million items in it.
If you don't care about assignment being atomic -- and it's your right to
weaken the array contract in your own code

Currently, there IS no contract between ctype arrays and generators.

I'm suggesting that it would be useful to create one, and further
suggesting that if a programmer attempts to load a ctypes array from a
generator of the wrong size, it is certainly important to let the
programmer know he screwed up, but it is not at all important that
some elements of the ctypes array, that the programmer was in fact
trying to replace, were in fact correctly replaced before the size
error was noticed.
-- feel free to write your own
helper function based on your earlier suggestion: (snip)

Doing this in Python is also slow and painful, and the tradeoff of
this vs the temp list depends on the system, the caching, the amount
of memory available, and the size of the data to be transferred. I
could do it in C, but that defeats my whole purpose of using ctypes to
avoid having to ship C code to customers.
But this non-atomic behaviour would be entirely inappropriate as the
default behaviour for a ctypes array.

That's obviously your opinion, but your supporting arguments are quite
weak.

Regards,
Pat
 
Patrick Maupin

of time and space, as I said, for the temporary array that I think would
be needed and which I also described in the previous paragraph that you
clipped

That's because I don't think it needs a temporary array. A temporary
array would provide some invariant guarantees that are nice but not
necessary in a lot of real-world cases.
It? I have no idea what you intend that to refer to.

Sorry, code path.

There is already a "code path" that says "hey I can't handle this."
To modify this code path to handle the case of a generic iterable
would add a tiny bit of code, but would not add any appreciable space
(no temp array needed for my proposal) and would not add any runtime
to people who are not passing in iterables or doing other things that
currently raise exceptions.
I doubt it would be very many because it is *impossible* to make it work
in the way that I think people would want it to.

How do you know? I have a use case that I really don't think is all
that rare. I know exactly how much data I am generating, but I am
generating it piecemeal using iterators.

No, I don't think I did "make your argument for you." I am currently
making a temp list because I have to, and am proposing that with a
small change to the ctypes library, that wouldn't always need to be
done.
But necessary to work with black box iterators.

With your own preconceived set of assumptions. (Which I will admit,
you have given quite a bit of thought to, which I appreciate.)
Now you are agreeing with my argument.

Nope, still not doing that.
If ctype_array slice assignment were to be augmented to work with
iterators, that would, in my opinion (and see below),

That's better for not being absolute. Thank you for admitting other
possibilities.
require use of
temporary arrays. Since slice assignment does not use temporary arrays
now (see below), that augmentation should be conditional on the source
type being a non-sequence iterator.

I don't think any temporary array is required, but in any case, yes
the code path through the ctypes array library __setslice__ would have
to be modified where it gives up now, in order to decide to do
something different if it is passed an iterable.
CPython comes with immutable fixed-length arrays (tuples) that do not
allow slice assignment and mutable variable-length arrays (lists) that
do. The definition is 'replace the indicated slice with a new slice
built from all values from an iterable'. Point 1: This works for any
properly functioning iterable that produces any finite number of items.

Agreed.

Iterators are always exhausted.

And my proposal would continue to exhaust iterators, or would raise an
exception if the iterator wasn't exhausted.
Replace can be thought of as delete follewed by add, but the
implementation is not that naive.

Sure, on a mutable length item.
Point 2: If anything goes wrong and an
exception is raised, the list is unchanged.

This may be true on lists, and is quite often true (and is nice when
it happens), but it isn't always true in general. For example, with
the current tuple packing/unpacking protocol across an assignment, the
only real guarantee is that everything is gathered up into a single
object before the assignment is done. It is not the case that nothing
will be unpacked unless everything can be unpacked. For example:

>>> g = (i for i in xrange(4))
>>> a, b, c = g
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: too many values to unpack
>>> list(g)
[]

This means that there must
be temporary internal storage of either old or new references.

As I show with the tuple unpacking example, it is not an inviolate law
that Python won't unpack a little unless it can unpack everything.
An
example that uses an improperly functioning generator. (snip)

Yes, I agree that lists are wondrously robust. But one of the reasons
for this is the "flexible" interpretation of slice start and end
points, that can be as surprising to a beginner as anything I'm
proposing.
A c_uint array is a new kind of beast: a fixed-length mutable
array. So it has to have a different definition of slice
assignment than lists. Thomas Heller, the ctypes author,
apparently chose 'replacement by a sequence with exactly
the same number of items, else raise an exception', though
I do not know what the doc actually says.

Yes, but ctypes was designed and developed before generator
expressions were available, and before or contemporaneously with the
first cut of itertools. We arguably use Python differently than we
did in those days.
An alternative definition would have been to replace as much of the
slice as possible, from the beginning, while ignoring any items in
excess of the slice length. This would work with any iterable.

I think that an iterable that doesn't match the slice length should be
an error condition and raise an exception. Both for too much data and
too little data.
However, partial replacement of a slice would be a surprising innovation to most.

Yes, but when an exception is raised it doesn't always mean that
nothing got replaced. See my tuple unpacking example earlier.
The current implementation assumes that the reported length of a
sequence matches the valid indexes and dispenses with temporary storage. This is shown by the following: (snip)
I consider such unintended partial replacement to be a glitch.

Now that's actually interesting. I agree with you that it's not
desired behavior to not raise an exception. OTOH, exploiting this
misfeature might actually increase performance for my specific case.
An
exception could be raised, but without adding temp storage, the array
could not be restored. And making a change *and* raising an exception
would be a different sort of glitch. (One possible with augmented
assignment involving a mutable member of a tuple.)

It's also possible with non-augmented assignments with immutable
tuples, as I showed above.
So I would leave this as undefined behavior for an input
outside the proper domain of the function.

Not sure what you mean by "this."

I don't think that the interpreter should always paternalistically say
"no, you can't assign an item that doesn't have a __len__ attribute
because you obviously don't know what you're doing if you're trying to
do that." I think the interpreter should do the same as it does on my
tuple unpacking example -- try to do the right thing, and raise an
exception if it fails during the process.
Anyway, as I said before, you are free to propose a specific change
('work with iterators' is too vague) and provide a corresponding patch.

I will try to see if I can do that some time in the next few months,
if I ever get out of crunch mode.

Thanks,
Pat
 
