object references/memory access

dlomsak

Hello,
I have searched a good deal about this topic and have not found
any good information yet. It seems that the people asking all want
something a bit different than what I want and also don't divulge much
about their intentions. I wish to improve the rate of data transfer
between two python programs on the same machine. The project in
question is a database server that I built (written in python) that
works in conjunction with another python script for searching which is
executed via Apache. Basically, I am serving a database but since the
database is large, it remains in the memory of a server program. Upon
submitting a search request, the search script is called by Apache.
This search script makes a connection to the server, sends over the
query and waits on the reply. Queries and records are represented by
dictionaries and the database is a list of dictionaries. The only
problem here is that for large returns, it takes more time than I'd
like to transmit the data. The part that is slowing the data
transmission down seems to be the fact that the data has to travel
across a socket connection to be put back together on the other side.
The server makes a list of dictionaries which represents the records
that match the given criteria and then sends a pickle string of that
object back to the client (the search script).
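
(For orientation, a minimal sketch of the exchange described above; the
names are made up and the matching rule is just a stand-in for the real
search logic. The server pickles the list of matching record
dictionaries and writes it back over the accepted connection:)

import socket, cPickle

def serve_one_query(server_sock, database):
    # database: the in-memory list of record dictionaries
    conn, addr = server_sock.accept()
    query = cPickle.loads(conn.recv(4096))              # query arrives as a pickled dict
    matches = [rec for rec in database
               if all(rec.get(k) == v for k, v in query.items())]
    conn.sendall(cPickle.dumps(matches, 2))             # pickled list of matching dicts
    conn.close()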

What I'd like to do is somehow take the returned results of the search
and send over a memory address through the socket and let the search
script read the results directly from the server's memory rather than
send it all through the sockets. There is one more solution which
would work just as well for this application but would be less useful
in general terms. The second method would involve some tricks with the
python CGI module. I am fairly ignorant of how Apache works with the
CGI module but here is what I'd like to do. I want to somehow let the
server print out to the user's browser instead of the search script in
order to cut out the time of sending the results over the socket. This
sounds like it is not possible but I would like to know in greater
detail how the CGI module and Apache work together such that the
'print' statements write out to the user's browser. Mainly, I'd like
to know if there is any kind of descriptor or ID that can be passed
and used by another process to print output to the user's browser
instead of the script that Apache invoked.

If there is not a good Pythonic way to do the above, I am open to
mixing in some C to do the job if that is what it takes. I apologize
if this topic has been brought up many times before but hopefully I
have stated my intentions clearly enough for those with a higher
knowledge of the topic to help. If the above are not possible but you
have a really good idea for zipping large amounts of data from one
program to another, I'd like to hear it.

Thanks to all who take the time to read my request and also those with
a response.
 
Evan Klitzke

If there is not a good Pythonic way to do the above, I am open to
mixing in some C to do the job if that is what it takes. I apologize
if this topic has been brought up many times before but hopefully I
have stated my intentions clearly enough for those with a higher
knowledge of the topic to help. If the above are not possible but you
have a really good idea for zipping large amounts of data from one
program to another, I'd like to hear it.

You can do things like this in C with shared memory segments. I'm not
familiar enough with the Python C API to tell you how easy this would
be, but that might be a good place to start your investigations.
 
Paul Rubin

dlomsak said:
knowledge of the topic to help. If the above are not possible but you
have a really good idea for zipping large amounts of data from one
program to another, I'd like to hear it.

One cheesy thing you might try is serializing with marshal rather than
pickle. It won't handle as many object types, and it's not guaranteed
interoperable from one Python version to another, but it's potentially
faster than pickle. Of course for pickling I presume you're already
using cPickle (written in C) instead of Pickle (written in Python).
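
(One quick way to check whether marshal buys anything here; a list of
flat dictionaries of ints, floats and strings is well within the types
marshal supports:)

import marshal, cPickle, time

records = [{'id': i, 'name': 'rec%d' % i, 'score': i * 0.5}
           for i in xrange(100000)]

t0 = time.time()
p = cPickle.dumps(records, 2)
t1 = time.time()
m = marshal.dumps(records)
t2 = time.time()

print "cPickle: %.3f s, %d bytes" % (t1 - t0, len(p))
print "marshal: %.3f s, %d bytes" % (t2 - t1, len(m))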
 
dlomsak

Paul said:
One cheesy thing you might try is serializing with marshal rather than
pickle. It won't handle as many object types, and it's not guaranteed
interoperable from one Python version to another, but it's potentially
faster than pickle. Of course for pickling I presume you're already
using cPickle (written in C) instead of Pickle (written in Python).

Well, I was using the regular pickle at first but then I switched to
just using repr() / eval() because the resulting string doesn't have
all the extra 's1=' and all that so it cuts down on the amount of data
I have to send for large returns when you cut out all of that
formatting. The speed of the above method is pretty high even for
really large returns and it works fine for a list of dictionaries.
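
(For reference, the repr()/eval() round trip being described is just
this; it works for plain lists of dictionaries, though eval trusts the
sender completely, which is only acceptable because both ends are the
poster's own code:)

records = [{'title': 'foo', 'score': 1}, {'title': 'bar', 'score': 2}]
payload = repr(records)      # plain text, no pickle framing
restored = eval(payload)     # rebuilds the list of dictionaries
assert restored == records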

To the previous poster: I'll be looking into C shared memory like you
said. I have a feeling C will let me lay my hands on the things I want
and that Python probably will not. That's fine though because Python
is a wonderful language and I don't mind calling in tedious C to do
the real dirty jobs if that is what it takes.
 
Martin v. Löwis

I have searched a good deal about this topic and have not found
any good information yet. It seems that the people asking all want
something a bit different than what I want and also don't divulge much
about their intentions. I wish to improve the rate of data transfer
between two python programs on the same machine.

I don't understand why you want exactly this. Wouldn't it be
sufficient/better if the response time for a request as seen
by the client web browser would improve? Why is it necessary
to start optimizing at the data transfer rate?

As a starting point, I would try to eliminate the CGI part. There
are two ways to do that:
a) run the Python code inside the Apache process, and
b) use a single Python server (possibly shared with the database
process), and connect this to Apache through the
reverse proxy protocol.

The cost you observe might be in the repeated creation of
new processes, and the repeated establishment of new TCP
connections. Either solution would drop some of that overhead.
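
(A minimal sketch of option (b), assuming a plain HTTP server that
Apache would forward matching requests to via a reverse proxy. search()
and render() are stand-ins for the existing lookup and output code; the
point is that the database and the request handling live in one
long-running process, so there is no per-request CGI process and no
fresh TCP connection to a separate database server:)

from BaseHTTPServer import HTTPServer, BaseHTTPRequestHandler
import cgi

def search(query):
    # stand-in for the existing matching logic against the in-memory database
    return []

def render(records):
    # stand-in for whatever HTML the search script currently prints
    return '<html><body>%d records found</body></html>' % len(records)

class SearchHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # e.g. /search?author=smith -- parse the query string into a dict
        query = dict(cgi.parse_qsl(self.path.partition('?')[2]))
        body = render(search(query))
        self.send_response(200)
        self.send_header('Content-Type', 'text/html')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

HTTPServer(('127.0.0.1', 8000), SearchHandler).serve_forever()
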
I am fairly ignorant of how Apache works with the
CGI module but here is what I'd like to do. I want to somehow let the
server print out to the user's browser instead of the search script in
order to cut out the time of sending the results over the socket.

That is not possible. The CGI script does not "print out to the
user's browser". Instead, it prints to its stdout, which is a pipe
being read by Apache; Apache then copies all data to the socket
going to the user's browser (possibly after manipulating the
headers also).
Mainly, I'd like
to know if there is any kind of descriptor or ID that can be passed
and used by another process to print output to the user's browser
instead of the script that Apache invoked.

No. The CGI script has a file handle, and it is not possible to pass
a file handle to a different process.
If there is not a good Pythonic way to do the above, I am open to
mixing in some C to do the job if that is what it takes.

No, it's not Python that fails to support that - it's the operating
system. See above for solutions that avoid one such copying in
the first place.

Regards,
Martin
 
Martin v. Löwis

b) use a single Python server (possibly shared with the database
process), and connect this to Apache through the
reverse proxy protocol.

Following up to myself: Instead of using a reverse proxy, you can
also implement the FastCGI protocol in the server.
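
(Roughly what that looks like with the third-party flup package, which
bridges FastCGI to a WSGI callable; Apache would talk to this port
through mod_fastcgi or mod_fcgid, and search() is again a stand-in for
the real in-memory lookup:)

from flup.server.fcgi import WSGIServer
import cgi

def search(query):
    # stand-in for the existing in-memory lookup
    return []

def application(environ, start_response):
    query = dict(cgi.parse_qsl(environ.get('QUERY_STRING', '')))
    body = '<html><body>%d records found</body></html>' % len(search(query))
    start_response('200 OK', [('Content-Type', 'text/html'),
                              ('Content-Length', str(len(body)))])
    return [body]

WSGIServer(application, bindAddress=('127.0.0.1', 8888)).run()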

Regards,
Martin
 
Paul Rubin

Martin v. Löwis said:
No. The CGI script has a file handle, and it is not possible to pass
a file handle to a different process.


No, it's not Python that fails to support that - it's the operating
system. See above for solutions that avoid one such copying in
the first place.

If this is a Linux server, it might be possible to use the SCM_RIGHTS
message to pass the socket between processes. That would require a
patch to Python's socket library which I've never gotten around to
writing but it's been on my want-to-do list for a long time. There is
something similar for Solaris and probably for *BSD. I've been under
the impression that this is how preforked Apache distributes requests
between processes, but I never got around to checking.

http://sourceforge.net/tracker/index.php?func=detail&aid=814689&group_id=5470&atid=355470
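
(For the record, Python later grew this ability without a C patch:
socket.sendmsg()/recvmsg() were added in Python 3.3, so descriptor
passing over an AF_UNIX socket can be done roughly as below -- a sketch,
not something the 2.x code in this thread could use:)

import socket, array

def send_fd(sock, fd):
    # ship one open file descriptor to the peer via SCM_RIGHTS
    sock.sendmsg([b'F'], [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
                           array.array('i', [fd]))])

def recv_fd(sock):
    fds = array.array('i')
    msg, ancdata, flags, addr = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            fds.frombytes(data[:fds.itemsize])
    return fds[0]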
 
Martin v. Löwis

If this is a Linux server, it might be possible to use the SCM_RIGHTS
message to pass the socket between processes.

I very much doubt that the OP's problem is what he thinks it is,
i.e. that copying over a local TCP connection is what makes his
application slow.
That would require a
patch to Python's socket library which I've never gotten around to
writing but it's been on my want-to-do list for a long time. There is
something similar for Solaris and probably for *BSD. I've been under
the impression that this is how preforked Apache distributes requests
between processes, but I never got around to checking.

No, it doesn't. Instead, it is the operating system itself which
distributes the requests: the parent process opens the server socket,
then forks the actual server processes. They all do accept, and
the operating system selects an arbitrary one for the next request.
That process returns from accept, so for the next incoming
connection, one of the remaining processes will be selected.

Regards,
Martin
 
Paul Rubin

Martin v. Löwis said:
I very much doubt that the OP's problem is what he thinks it is,
i.e. that copying over a local TCP connection is what makes his
application slow.

Right, the copying should be very fast, but serializing and
deserializing the stuff being copied can be slow. This is an issue
with something I'm currently working on, for example.
[Apache] Instead, it is the operating system itself which
distributes the requests: the parent process opens the server socket, ...

Ah, thanks.
 
dlomsak

Thanks for the responses folks. I'm starting to think that there is
merely an inefficiency in how I'm using the sockets. The expensive
part of the program is definitely the socket transfer because I timed
each part of the routine individually. For a small return, the whole
search and return takes a fraction of a second. A large return (in
this case 21,000 records - 8.3 MB) takes 18 seconds. 15 of those
seconds are spent sending the serialized results from the server to
the client. I did a little bit of a blind experiment and doubled the
bytes on the client's socket.recv line. This improved the rate of
transfer each time. The original rate when I was accepting 1024 bytes
per recv took 47 seconds to send the 8.3 MB result. By doubling this
size several times, I reduced the time to 18 seconds until doubling it
further produced diminishing results. I was always under the
impression that keeping the send and recv byte sizes around 1024 is a
good idea and I'm sure that jacking those rates up is a lousy way to
mitigate the transfer. It is also interesting to note that increasing
the bytes sent per socket.send on the server side had no visible
effect. Again, that was just a curious experiment.

What bothers me is that I am sure sending data over the local loopback
address should be blazing fast. 8.3 MB should be a breeze because I've
transferred files over AIM to people connected to the same router as
me and was able to send hundreds of megabytes in two or three
seconds. With that said, I feel like something about how I'm
send/recv-ing the data is causing lots of overhead and that I can
avoid reading the memory directly if I can speed that up.

I guess now I'd like to know what are good practices in general to get
better results with sockets on the same local machine. I'm only
instantiating two sockets total right now - one client and one server,
and the transfer is taking 15 seconds for only 8.3MB. If you guys have
some good suggestions on how to better utilize sockets to transfer
data at the speeds I know I should be able to achieve on a local
machine, let me know what you do. At present, I find that using
sockets in python requires very few steps so I'm not sure where I
could really improve at this point.

Thanks for the replies so far, I really appreciate you guys
considering my situation and helping out.
 
Martin v. Löwis

I guess now I'd like to know what are good practices in general to get
better results with sockets on the same local machine. I'm only
instantiating two sockets total right now - one client and one server,
and the transfer is taking 15 seconds for only 8.3MB.

It would be good if you had shown the code that does that. It is hard
for us to guess what programming error you have made.

As you won't show code, I will. Please try the attached cli.py and
server.py on your machine, and report the timing. On my machine, I get

0.00105595588684 0.076632976532 8300000

which means I can transmit 8.3MB in 76ms, which is a lot less than
15s.

My guess is that you sum up the incoming data with

total_data += received_data

That is O(n**2).
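
(Recast as a helper, the linear-time pattern looks like this -- collect
the chunks in a list and join once at the end, equivalent to the
cStringIO approach in the attached cli.py:)

def recv_all(sock):
    # append each chunk (O(1)), then pay for a single O(n) copy at the end
    chunks = []
    while True:
        data = sock.recv(65536)
        if not data:
            break
        chunks.append(data)
    return ''.join(chunks)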

Regards,
Martin
 
dlomsak

Martin said:
It would be good if you had showed the code that does that. It is hard
for us to guess what programming error you have made.

As you won't show code, I will. Please try the attached cli.py and
server.py on your machine, and report the timing. On my machine, I get

0.00105595588684 0.076632976532 8300000

which means I can transmit 8.3MB in 76ms, which is a lot less than
15s.

My guess is that you sum up the incoming data with

total_data += received_data

That is O(n**2).

Regards,
Martin

import socket,time,cStringIO

t1 = time.time()
s = socket.socket()
s.connect(('localhost', 8989))
t2 = time.time()
storage = cStringIO.StringIO()
while True:
    data = s.recv(1024)
    if not data:
        break
    storage.write(data)
result = storage.getvalue()
t3 = time.time()

print t2-t1,t3-t2,len(result)

import socket

data = ' '*8300000
s = socket.socket()
s.bind(('', 8989))
s.listen(10)
while True:
    s1, peer = s.accept()
    print s1,peer
    s1.send(data)
    s1.close()

I would have put my code up if it were here but it is on my machine at
work which I can't touch until Monday. I know people tend to like to
see code when you're asking for help but it is not available to me
right now so I apologize. You are right though, I believe I made the
mistake of using += to sum the data up and I had never considered the
fact that the runtime of that approach is O(n^2). I am willing to bet
that this was my major shortcoming and you just solved my problem. I
bet the reason that jacking up the socket.recv size helped is that it
meant fewer concatenations. I'll give an official report tomorrow on
whether or not that was the fix, but I am very convinced that you got
it and that I won't have to step around the socket transmission.

Thanks a lot Martin and also to the others who responded.
 
Alex Martelli

dlomsak said:
search and return takes a fraction of a second. For a large return (in
this case 21,000 records - 8.3 MB) is taking 18 seconds. 15 of those
seconds are spent sending the serialized results from the server to
the client. I did a little bit of a blind experiment and doubled the

So here's a tiny example to show that the mere transfer of bytes on the
socket should be taking nowhere like that long:

#!/usr/local/bin/python
import socket, os, time, sys

port = 8881
sendsize = 1024
recvsize = 1024
totsize = 8*1024*sendsize

def server():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(('', 8881))
    sock.listen(5)
    newSocket, address = sock.accept()
    totbytes = 0
    start = time.time()
    while totbytes < totsize:
        receivedData = newSocket.recv(recvsize)
        if not receivedData: break
        totbytes += len(receivedData)
    newSocket.close()
    sock.close()
    return totbytes, time.time()-start

def client():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(('localhost', 8881))
    totbytes = 0
    while totbytes < totsize:
        sock.sendall(sendsize*'x')
        totbytes += sendsize
    sock.close()

def main():
    print "moving %d bytes (ss=%d, rs=%d)" % (totsize, sendsize, recvsize)
    if os.fork():
        # parent process
        forbytes, tooktime = server()
    else:
        # child process
        time.sleep(0.5)
        client()
        sys.exit(0)
    stend = time.time()
    print "%d bytes in %5.2f sec (ss=%d, rs=%d)" % (forbytes, tooktime, sendsize, recvsize)

main()


brain:~/downloads alex$ python sere.py
moving 8388608 bytes (ss=1024, rs=1024)
8388608 bytes in 0.08 sec (ss=1024, rs=1024)

So, moving 8.3 MB on a bare socket should take about 100 milliseconds,
give or take.

So let's try WITH pickling and unpickling (done right):

#!/usr/local/bin/python
import socket, os, time, sys, random, cPickle

port = 8881
sendsize = 1024
recvsize = 1024

data = [random.random() for i in xrange(1000*1000)]
pickled_data = cPickle.dumps(data, 2)
totsize = len(pickled_data)

def server():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(('', 8881))
    sock.listen(5)
    newSocket, address = sock.accept()
    totbytes = 0
    recvdata = []
    start = time.time()
    while totbytes < totsize:
        receivedData = newSocket.recv(recvsize)
        if not receivedData: break
        totbytes += len(receivedData)
        recvdata.append(receivedData)
    newSocket.close()
    sock.close()
    data = cPickle.loads(''.join(recvdata))
    return totbytes, time.time()-start

def client():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(('localhost', 8881))
    totbytes = 0
    while totbytes < totsize:
        totbytes += sock.send(pickled_data[totbytes:totbytes+sendsize])
    sock.close()

def main():
    print "moving %d bytes (ss=%d, rs=%d)" % (totsize, sendsize, recvsize)
    if os.fork():
        # parent process
        forbytes, tooktime = server()
    else:
        # child process
        time.sleep(0.5)
        client()
        sys.exit(0)
    stend = time.time()
    print "%d bytes in %5.2f sec (ss=%d, rs=%d)" % (forbytes, tooktime, sendsize, recvsize)

main()


brain:~/downloads alex$ python sere.py
moving 9002006 bytes (ss=1024, rs=1024)
9002006 bytes in 0.32 sec (ss=1024, rs=1024)

So, a bit more data, quite a bit longer, but still on the order of
magnitude of 300 milliseconds or so.

Again this suggests the problems are not "intrinsic" to the task.

It's hard to guess at exactly what it may be that you're doing wrong.
For example, if recvdata was a string (grown with +=) rather than a list
(grown with append), this would boost the runtime to 0.76 seconds; a
huge waste (more than a factor of two blown away by a minor programming
gaucheness) but still a long way from the several orders of magnitude
you're observing.

So, I suggest you try programming the interaction directly to bare
sockets, as I do here (and in several examples in Chapter 20 in "Python
in a Nutshell" 2nd edition), and see what difference that makes to your
timings.


Alex
 
dlomsak

Okay, I'm back at work and got to put some of these suggestions to use.
cPickle is doing a great job of hiking up the serialization rate, and
cutting out the +=data helped a lot too. The entire search process now
for this same data set is down to about 4-5 seconds from pressing
'search' to having the records posted to the browser. Only a fraction
of a second is spent transmitting the data now. Some of the time is
spent waiting for the sockets to actually make their connection. I'll
now be looking into FastCGI to see how much more time I can trim off
the total process.

Once again I would like to say thanks to everyone for the help and
taking the time out to give me some example code to study. I'm glad
that what I thought I wanted to do is not necessary and that the
sockets can send at the speed I hoped they could. This was my first
posting in this group and I plan to remain active and try to help out
where I can. I am fully satisfied with the responses and consider my
problem solved.
 
Karthik Gurusamy

Thanks for the responses folks. I'm starting to think that there is
merely an inefficiency in how I'm using the sockets. The expensive
part of the program is definitely the socket transfer because I timed
each part of the routine individually. For a small return, the whole
search and return takes a fraction of a second. For a large return (in
this case 21,000 records - 8.3 MB) is taking 18 seconds. 15 of those
seconds are spent sending the serialized results from the server to
the client. I did a little bit of a blind experiment and doubled the
bytes on the client's socket.recv line. This improved the rate of
transfer each time. The original rate when I was accepting 1024 bytes
per recv took 47 seconds to send the 8.3 MB result. By doubling this
size several times, I reduced the time to 18 seconds until doubling it
further produced diminishing results. I was always under the
impression that keeping the send and recv byte sizes around 1024 is a
good idea and I'm sure that jacking those rates up is a lousy way to
mitigate the transfer. It is also interesting to note that increasing
the bytes sent per socket.send on the server side had no visible
effect. Again, that was just a curious experiment.

What bothers me is that I am sure sending data over the local loopback
address should be blazing fast. 8.3 MB should be a breeze because I've
transferred files over AIM to people connected to the same router as
me and was able to send hundreds of megabytes in less than a two or
three seconds. With that said, I feel like something about how I'm
send/recv-ing the data is causing lots of overhead and that I can
avoid reading the memory directly if I can speed that up.

I guess now I'd like to know what are good practices in general to get
better results with sockets on the same local machine. I'm only
instantiating two sockets total right now - one client and one server,
and the transfer is taking 15 seconds for only 8.3MB. If you guys have
some good suggestions on how to better utilize sockets to transfer
data at the speeds I know I should be able to achieve on a local
machine, let me know what you do. At present, I find that using
sockets in python requires very few steps so I'm not sure where I
could really improve at this point.

I have found the stop-and-go between two processes on the same machine
leads to very poor throughput. By stop-and-go, I mean the producer and
consumer are constantly getting on and off of the CPU since the pipe
gets full (or empty for consumer). Note that a producer can't run at
its top speed as the scheduler will pull it out since its output pipe
got filled up.

When you increased the underlying buffer, you mitigated this
shuffling a bit, and hence saw a slight increase in performance.

My guess is that you can transfer across machines at really high speed
because there is no process swapping, as the producer and consumer run
on different CPUs (different machines, actually).

Since the two processes are on the same machine, try using a temporary
file for IPC. This is not as efficient as real shared memory -- but it
does avoid the IPC stop-n-go. The producer can generate the multi-mega
byte file in one go and inform the consumer. File systems have gone
through decades of performance tuning, so this job is done really
efficiently.
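
(A small sketch of that hand-off, with made-up names: the producer
dumps the pickled results into a temporary file, and only the short
file path has to cross the socket or pipe:)

import os, tempfile, cPickle

# producer (database server) side
def publish_results(records):
    fd, path = tempfile.mkstemp(prefix='results-')
    f = os.fdopen(fd, 'wb')
    cPickle.dump(records, f, 2)
    f.close()
    return path                      # send just this string to the consumer

# consumer (search script) side
def read_results(path):
    f = open(path, 'rb')
    try:
        return cPickle.load(f)
    finally:
        f.close()
        os.unlink(path)              # clean up once the records are loaded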

Thanks,
Karthik
 
Steve Holden

Karthik said:
I have found the stop-and-go between two processes on the same machine
leads to very poor throughput. By stop-and-go, I mean the producer and
consumer are constantly getting on and off of the CPU since the pipe
gets full (or empty for consumer). Note that a producer can't run at
its top speed as the scheduler will pull it out since it's output pipe
got filled up.
But when both processes are in the memory of the same machine and they
communicate through an in-memory buffer, what's to stop them from
keeping the CPU fully-loaded (assuming they are themselves compute-bound)?
When you increased the underlying buffer, you mitigated a bit this
shuffling. And hence saw a slight increase in performance.

My guess that you can transfer across machines at real high speed, is
because there are no process swapping as producer and consumer run on
different CPUs (machines, actually).
As a concept that's attractive, but it's easy to demonstrate that (for
example) two machines will get much better throughput using the
TCP-based FTP to transfer a large file than they do with the UDP-based
TFTP. This is because the latter protocol requires the sending unit to
stop and wait for an acknowledgment for each block transferred. With
FTP, if you use a large enough TCP sliding window and have enough
content, you can saturate a link as long as its bandwidth isn't greater
than your output rate.

This isn't a guess ...
Since the two processes are on the same machine, try using a temporary
file for IPC. This is not as efficient as real shared memory -- but it
does avoid the IPC stop-n-go. The producer can generate the multi-mega
byte file at one go and inform the consumer. The file-systems have
gone thru' decades of performance tuning that this job is done really
efficiently.
I'm afraid this comes across a bit like superstition. Do you have any
evidence this would give superior performance?

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
--------------- Asciimercial ------------------
Get on the web: Blog, lens and tag the Internet
Many services currently offer free registration
----------- Thank You for Reading -------------
 
Karthik Gurusamy

But when both processes are in the memory of the same machine and they
communicate through an in-memory buffer, what's to stop them from
keeping the CPU fully-loaded (assuming they are themselves compute-bound)?

If you are a producer and if your output goes thru' a pipe, when the
pipe gets full, you can no longer run. Someone must start draining the
pipe.
On a single core CPU when only one process can be running, the
producer must get off the CPU so that the consumer may start the
draining process.
As a concept that's attractive, but it's easy to demonstrate that (for
example) two machines will get much better throughput using the
TCP-based FTP to transfer a large file than they do with the UDP-based
TFTP. This is because the latter protocol requires the sending unit to
stop and wait for an acknowledgment for each block transferred. With
FTP, if you use a large enough TCP sliding window and have enough
content, you can saturate a link as ling as its bandwidth isn't greater
than your output rate.

This isn't a guess ...

What you say about a stop-n-wait protocol versus TCP's sliding window
is correct.
But I think it's totally orthogonal to the discussion here. The issue
I'm talking about is how to keep the end nodes chugging along, if they
are able to run simultaneously. They can't if they aren't on a multi-
core CPU or on different machines.

I'm afraid this comes across a bit like superstition. Do you have any
evidence this would give superior performance?

I did some testing before when I worked on boosting a shell pipeline
performance and found using file-based IPC was very good.
(some details at http://kar1107.blogspot.com/2006/09/unix-shell-pipeline-part-2-using.html
)

Thanks,
Karthik
 
Steve Holden

Karthik said:
If you are a producer and if your output goes thru' a pipe, when the
pipe gets full, you can no longer run. Someone must start draining the
pipe.
On a single core CPU when only one process can be running, the
producer must get off the CPU so that the consumer may start the
draining process.
Wrong. The process doesn't "get off" the CPU, it remains loaded, and
will become runnable again once the buffer has been depleted by the
other process (which is also already loaded into memory and will become
runnable as soon as a filled buffer becomes available).
What you say about a stop-n-wait protocol versus TCP's sliding window
is correct.
But I think it's totally orthogonal to the discussion here. The issue
I'm talking about is how to keep the end nodes chugging along, if they
are able to run simultaneously. They can't if they aren't on a multi-
core CPU or one different machines.
If you only have one CPU then sure, you can only run one process at a
time. But your understanding of how multiple processes on the same CPU
interact is lacking.
I did some testing before when I worked on boosting a shell pipeline
performance and found using file-based IPC was very good.
(some details at http://kar1107.blogspot.com/2006/09/unix-shell-pipeline-part-2-using.html
)

Thanks,
Karthik

If you get better performance by writing files and reading them instead
of using pipes to communicate then something is wrong.

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
--------------- Asciimercial ------------------
Get on the web: Blog, lens and tag the Internet
Many services currently offer free registration
----------- Thank You for Reading -------------
 
Karthik Gurusamy

Steve Holden said:
Karthik Gurusamy wrote:
[...]
I have found the stop-and-go between two processes on the same machine
leads to very poor throughput. By stop-and-go, I mean the producer and
consumer are constantly getting on and off of the CPU since the pipe
gets full (or empty for consumer). Note that a producer can't run at
its top speed as the scheduler will pull it out since it's output pipe
got filled up.
But when both processes are in the memory of the same machine and they
communicate through an in-memory buffer, what's to stop them from
keeping the CPU fully-loaded (assuming they are themselves compute-bound)?
If you are a producer and if your output goes thru' a pipe, when the
pipe gets full, you can no longer run. Someone must start draining the
pipe.
On a single core CPU when only one process can be running, the
producer must get off the CPU so that the consumer may start the
draining process.

Wrong. The process doesn't "get off" the CPU, it remains loaded, and
will become runnable again once the buffer has been depleted by the
other process (which is also already loaded into memory and will become
runnable as soon as a filled buffer becomes available).

huh? "get off" when talking about scheduling and CPU implies you are
not running.
It is a common term to imply that you are not running -- doesn't mean
it goes away from main memory. Sorry, where did you learn your CS
concepts?
If you only have one CPU then sure, you can only run one process at a
time. But your understanding of how multiple processes on the same CPU
interact is lacking.

huh?



If you get better performance by writing files and reading them instead
of using pipes to communicate then something is wrong.

Why don't you provide a better explanation for the observed behavior
than to just claim that a given explanation is wrong? I did mention
using real shared memory is better. I do know the cost of using a file
("physical disk movements") - but with the amount of buffering that
goes on today's file-system implementations, for this problem, we will
see big improvement.

Karthik
 
urielka

If both the search server and the web server/script are on the same
computer, you could use POSH (http://poshmodule.sourceforge.net/) for
memory sharing, or if you are on UNIX you can use mmap. This is way
faster than using sockets and doesn't require the
serialization/deserialization step.
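
(A rough sketch of the mmap variant on UNIX, with a made-up file path.
Note this still pickles the records once; what it avoids is streaming
them through a socket. POSH goes further and shares live Python objects
directly.)

import mmap, cPickle

PATH = '/tmp/search-results.bin'     # hypothetical rendezvous file

# writer (search server)
def publish(records):
    f = open(PATH, 'wb')
    f.write(cPickle.dumps(records, 2))
    f.close()

# reader (web script)
def consume():
    f = open(PATH, 'rb')
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    try:
        return cPickle.loads(m[:])   # one copy out of the shared mapping
    finally:
        m.close()
        f.close()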
 
