Glenn Linderman
I think we miscommunicated there--I'm actually agreeing with you. I
was trying to make the same point you were: that intricate and/or
large structures are meant to be passed around by a top-level pointer,
not via any serialization/messaging. This is what I've been trying
to explain to others here: that IPC and shared memory unfortunately
aren't viable options, leaving app threads (rather than child
processes) as the solution.
And I think we still are miscommunicating! Or maybe communicating anyway!
So when you said "object", I actually don't know whether you meant a
Python object or something else. I assumed a Python object, which may not
have been correct... but read on; I think the stuff below clears it up.
Your instincts are right. I'd only add that when you're talking
about the data structures associated with an intricate video format, the
complexity and depth of the data structures is insane -- the LAST
thing you want to burn cycles on is serializing and deserializing that
stuff (so IPC is out)--again, we're already on the same page here.
I think at one point you made the comment that shared memory is a
solution for handling large data sets between a child process and the
parent. Although this is certainly true in principle, it doesn't hold
up in practice, since complex data structures often contain 3rd party
and OS API objects that have their own allocators. For example, in
video encoding, there are TONS of objects that comprise memory-resident
video from all kinds of APIs, so the idea of having them allocated
from a shared/mapped memory block isn't even possible. Again, I only
raise this to offer evidence that doing real-world work in a child
process is a deal breaker--a shared address space is just way too much
to give up.
So I was thinking of multimedia data structures as a blob, and, in fact,
as a contiguous blob... one that would be easy to toss into shared memory.
Then when you mentioned thousands of objects, I imagined thousands of
Python objects, and somehow transforming the blob into same... and back
again. And Python objects certainly would need to be
serialized/deserialized, either via pickle or some more efficient
application-specific mechanism, but that process would still add 3
copies to the job of moving data from one process to another.
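To make those 3 copies concrete, here's a minimal sketch using pickle on a
hypothetical stand-in object graph (the `frames` list and its contents are
invented for illustration; a real application's structures would be far
deeper):

```python
import pickle

# Hypothetical stand-in for a graph of Python objects describing frames.
frames = [{"index": i, "data": bytes(16)} for i in range(4)]

# Moving this between processes means at least three copies:
blob = pickle.dumps(frames)    # copy 1: serialize in the sender
# ...blob is written through a pipe or queue (copy 2: the transfer)...
restored = pickle.loads(blob)  # copy 3: deserialize in the receiver

assert restored == frames      # same content, but a brand-new object graph
```

The receiver ends up with an equal but entirely separate object graph, which
is exactly the cost a shared address space avoids.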
But now I think I understand your issue, about why shared memory is a
problem.
In addition to contiguous blobs, a multimedia application might have
non-contiguous blobs. For video, the contiguous blob might be an MPEG
stream (of one standard or another). This might get transformed into a
list of frames; the MPEG stream has frames of different types due to
compression, but that isn't the best format for doing transformations,
so a 3rd party library might be called to decompress the stream into a
set of independently allocated chunks, each containing one frame (each
possibly consisting of several allocations of memory for associated
metadata) that is independent of other frames (there may still be some
internal compression for the frame, such as the difference between JPEG
and BMP). This collection of frames is now subdivided into 8 parts, and
each of the 8 parts wants to be passed to a thread for processing. The
application provides each thread a pointer to its part of the frames;
each thread has been loaded with modules that understand the structure
of the frames, and the user code manipulates those frames via these
manipulation modules. If there are 8 processors, this goes 8 times as
fast as it would otherwise--except for the GIL.
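The fan-out described above can be sketched with the standard threading
module (integers stand in for decoded frames, and the doubling is a
placeholder transformation -- both are assumptions for illustration):

```python
import threading

def process_part(part, results, idx):
    # Placeholder per-frame transformation; a real application would
    # call into extension modules that understand the frame structure.
    results[idx] = [frame * 2 for frame in part]

NUM_THREADS = 8
frames = list(range(64))            # stand-in for the decoded frames
chunk = len(frames) // NUM_THREADS
parts = [frames[i * chunk:(i + 1) * chunk] for i in range(NUM_THREADS)]

results = [None] * NUM_THREADS
threads = [threading.Thread(target=process_part, args=(p, results, i))
           for i, p in enumerate(parts)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Reassemble the processed frames in order.
processed = [frame for part in results for frame in part]
```

Every thread works on the same address space via plain pointers/references;
but under the current GIL, any pure-Python portion of the work is serialized,
which is exactly the limitation being discussed.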
Hence, shared memory or shared temporary files are hard, because the
splitter is a 3rd party library that uses the standard C allocator, and
the data would have to be reconstructed/copied into shared space
afterwards to use it in multiple processes, which is not only
performance-killing, but is extra code to maintain.
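That extra copy is visible even in the simplest case. A minimal sketch with
`multiprocessing.shared_memory` (the 16-byte `frame` is a made-up stand-in
for one decoder-allocated frame):

```python
from multiprocessing import shared_memory

# One decoded frame, living in ordinary heap memory as the decoder left it.
frame = bytes(range(16))

# To share it, every byte must be copied into the shared segment --
# this is the reconstruction/copy step (and extra code) objected to above.
shm = shared_memory.SharedMemory(create=True, size=len(frame))
shm.buf[:len(frame)] = frame
copied = bytes(shm.buf[:len(frame)])

shm.close()
shm.unlink()
```

And this only handles one flat buffer; a real frame with its metadata
allocations would need every piece copied and every internal pointer
rebuilt as offsets into the segment.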
So I think this description is of a problem for which PyC (non-GIL,
independent) threads would be useful, and other solutions would be lower
performance.
It'll be interesting to see if anyone can suggest an alternative, now
that the problem is described this way. I somehow doubt it.