IPC looking for simple/best way to communicate

Discussion in 'Perl Misc' started by whansen_at_corporate-image_dot_com@us.com, Jan 7, 2005.

  1. Guest

    I'm trying to find the best way to do something. I've got 50 processes
    (may have 200 soon) that each need to broadcast simple messages to all
    of the others. I tried doing this with sockets, but although I was able
    to get them to read a socket without blocking, I found that when one
    process reads a socket it removes the message from the socket and the
    other processes don't see it. My current solution works using
    IPC::Shareable, but it is slow and hogs memory as well as the CPU.
    Shareable lets you set a variable that multiple programs can read and
    write to. In my case they read and write to the list that they all run
    off.

    Basically each process is iterating over a list (array), and every so
    often a process gets a result that means an item no longer needs to
    be run, so it should remove it from its list and notify the other
    processes so that they can remove it from theirs as well. With
    IPC::Shareable it works nicely: when one process removes the item,
    all the others have it removed also, but it appears that the Shareable
    module is slowing things down considerably (CPU usage doubled).

    If someone could point me in the right direction, that would be great.
    I have an idea for speeding up shareable a little, but it's still not
    going to be fast.
     
    , Jan 7, 2005
    #1

  2. Guest

    wrote:
    > I'm trying to find the best way to do something. I've got 50 processes
    > (may have 200 soon) that each need to broadcast simple messages to all
    > of the others. I tried doing this with sockets, but although I was able
    > to get them to read a socket without blocking, I found that when one
    > process reads a socket it removes the message from the socket and the
    > other processes don't see it.


    You need a separate socket to each process, and send the message to each of
    those sockets. I'd probably have one house-keeper process which collects
    messages from all 50 processes and redistributes them to all 50 processes,
    although I don't know that that is necessary.

    > My current solution works using
    > IPC::Shareable, but it is slow and hogs memory as well as the CPU.
    > Shareable lets you set a variable that multiple programs can read and
    > write to. In my case they read and write to the list that they all run
    > off.


    You probably shouldn't do it that way. Share only an array holding
    exceptions to the main array. Then periodically update the (local)
    main array based on the shared exception array.
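
    A minimal sketch of what I mean, assuming IPC::Shareable is installed
    (the 'RMVD' glue key, the item ids, and the fake work loop are all made
    up for illustration):

    #!/usr/bin/perl
    # Sketch: share only a small list of removed item ids; keep the big
    # work list private to each process.
    use strict;
    use warnings;
    use IPC::Shareable;

    tie my @removed, 'IPC::Shareable', 'RMVD', { create => 1, destroy => 0 }
        or die "cannot tie shared array";

    my @work = (1 .. 1000);   # private work list; stands in for the real items
    my %skip;                 # ids this process already knows are gone

    for my $id (@work) {
        # Fold the shared exception list into the private skip list.
        (tied @removed)->shlock;
        $skip{$_} = 1 for @removed;
        (tied @removed)->shunlock;

        next if $skip{$id};

        # ... do the real work here; pretend every 100th item gets "taken" ...
        if ($id % 100 == 0) {
            (tied @removed)->shlock;
            push @removed, $id;
            (tied @removed)->shunlock;
        }
    }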

    > Basicly each process is iterating over a list (array) and every so
    > often a process gets a result that means that item no longer needs to
    > be ran, so it should remove it from it's list and notify the other
    > processes so that they can remove it from theirs as well.


    What are the consequences if a process doesn't get the message and runs
    that task anyway? Is it just a waste of resources, or is it fatal to the
    whole thing you are trying to do?

    How many such removal messages do you generate, in relation to the full
    size of the array to iterate over? If small, it would probably be most
    efficient to just run even the "removed" tasks and then filter them out in
    post-processing.

    How does it work that your processes are iterating over an array which
    is changing during the iteration? That seems like a problem waiting to
    happen.


    > Whith
    > IPC::Shareable it works nicely as when one process removes the item,
    > all the others have it removed also, but it appers that the shareable
    > module is slowing things down considerablly (CPU usage doubled).
    >
    > If someone could point me in the right direction, that would be great.
    > I have an idea for speeding up shareable a little, but it's still not
    > going to be fast.


    Each process could keep its own private version of the array, and
    only refresh it against the shared version (or against a shared
    exception list) every now and then. How often it does this refresh
    would depend on the cost of the refresh vs. the wasted effort that goes
    into processing tasks that have been removed since the last refresh.


    When I had to do something sort of like this I used a much simpler
    approach. Each parallel child process was given a batch to do, would
    report back to the house-keeper on which things should be removed, and
    then exit. The house-keeper would process those exceptions to make a new
    set of batches, and spawn another round of child processes.

    Xho

     
    , Jan 7, 2005
    #2

  3. Guest

    Interspersed below:

    On 07 Jan 2005 20:24:52 GMT, wrote:

    > wrote:
    >> I'm trying to find the best way to do something. I've got 50 processes
    >> (may have 200 soon) that each need to broadcast simple messages to all
    >> of the others. I tried doing this with sockets, but although I was able
    >> to get them to read a socket without blocking, I found that when one
    >> process reads a socket it removes the message from the socket and the
    >> other processes don't see it.

    >
    >You need a separate socket to each process, and send the message to each of
    >those sockets. I'd probably have one house-keeper process which collects
    >messages from all 50 processes and redistributes them to all 50 processes,
    >although I don't know that that is necessary.


    If I go this way I probably should set up a server on a socket and
    have it maintain the list centrally, but I'm a little unsure of the
    speed. Each of the 50 processes can iterate as fast as 3 times a
    second, or 150 iterations a second for all of them. A central server
    process is going to need to be able to keep up with this rate.


    >> My current solution works using
    >> IPC::Shareable, but it is slow and hogs memory as well as the CPU.
    >> Shareable lets you set a variable that multiple programs can read and
    >> write to. In my case they read and write to the list that they all run
    >> off.

    >
    >You probably shouldn't do it that way. Share only an array holding
    >exceptions to the main array. Then periodically update the (local)
    >main array based on the shared exception array.


    I actually did something like that, but used a string instead of an
    array. It greatly sped things up. Each process merely looks for a
    change to the string, and if there is one it decodes the string
    and modifies its internal list.
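
    Roughly the idea, much simplified and not the actual code (the 'RMVS'
    glue key and the comma-separated encoding are made up for illustration):

    # One shared scalar holds a comma-separated list of removed item
    # numbers; each process compares it against the last copy it saw.
    use strict;
    use warnings;
    use IPC::Shareable;

    tie my $removed, 'IPC::Shareable', 'RMVS', { create => 1, destroy => 0 };
    $removed = '' unless defined $removed;

    my $last_seen = '';
    my %gone;

    sub check_removals {
        return if $removed eq $last_seen;         # unchanged since last look
        $last_seen = $removed;
        $gone{$_} = 1 for split /,/, $last_seen;  # decode into the local list
    }

    sub announce_removal {
        my ($id) = @_;
        (tied $removed)->shlock;
        $removed .= ($removed eq '' ? '' : ',') . $id;
        (tied $removed)->shunlock;
    }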

    >> Basically each process is iterating over a list (array), and every so
    >> often a process gets a result that means an item no longer needs to
    >> be run, so it should remove it from its list and notify the other
    >> processes so that they can remove it from theirs as well.

    >
    >What are the consequences if a process doesn't get the message and runs
    >that task anyway? Is it just a waste of resources, or is it fatal to the
    >whole thing you are trying to do?
    >
    >How many such removal messages do you generate, in relation to the full
    >size of the array to iterate over? If small, it would probably be most
    >efficient to just run even the "removed" tasks and then filter them out in
    >post-processing.


    It's not fatal, but it wastes the request. Each request has a chance
    of getting a product for my company. So concentrating on products
    that are still available, and not wasting requests on taken products,
    should improve our chances of getting products by 10-20% or so.

    >How does it work that your processes are iterating over an array which
    >is changing during the iteration? That seems like a problem waiting to
    >happen.


    I didn't like that either, so instead of removing an element I set it
    to "". Then the main loop has a line:
    unless ($number) {next}
    to skip over removed entries. The extra time for the partial iteration
    is tiny compared to the other factors, so this seemed the most elegant
    solution.
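
    In other words, something like this toy loop (the data and the print
    are placeholders, not the real request code):

    use strict;
    use warnings;

    my @list = (101, 102, "", 104);   # 103 was "removed" by blanking it out
    foreach my $number (@list) {
        unless ($number) { next }     # skip blanked entries
        print "would send a request for item $number\n";
    }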

    >> With
    >> IPC::Shareable it works nicely: when one process removes the item,
    >> all the others have it removed also, but it appears that the Shareable
    >> module is slowing things down considerably (CPU usage doubled).
    >>
    >> If someone could point me in the right direction, that would be great.
    >> I have an idea for speeding up shareable a little, but it's still not
    >> going to be fast.

    >
    >Each process could keep its own private version of the array, and
    >only refresh it against the shared version (or against a shared
    >exception list) every now and then. How often it does this refresh
    >would depend on the cost of the refresh vs. the wasted effort that goes
    >into processing tasks that have been removed since the last refresh.


    I have it do that for every request. During actual conditions the
    requests slow down to a 10-second response, which is all the time in
    the world for this use.

    >When I had to do something sort of like this I used a much simpler
    >approach. Each parallel child process was given a batch to do, would
    >report back to the house-keeper on which things should be removed, and
    >then exit. The house-keeper would process those exceptions to make a new
    >set of batches, and spawn another round of child processes.


    That would work nicely, but timing is very important and just bringing
    up 50 processes can take up to ten minutes if they are really active.
    I thought of using the parent/child communication methods, but I'd
    rather do it outright as I am already down this design path.

    >Xho
     
    , Jan 10, 2005
    #3
  4. Guest

    So your advice boils down to, don't ask questions on Usenet?

    some stats:
    1.7 GHz processor
    1024 MB RAM
    2x80 GB RAID 1
    500 GB monthly bandwidth
    10 Mbps internet connection
    Mandrake 9.2
    Perl 5.8.1


    Speed is important: each process should be able to iterate at 3 per
    second, with the entire lot coming in at 150 iterations per second,
    although under actual conditions this slows down to 10 seconds per
    cycle, or about 5 iterations per second for all processes.

    CPU utilization is important, as the current Shareable implementation
    uses about 2% of the CPU per process running at 3 requests per second,
    meaning that with 50 processes we're at very high CPU usage. However,
    this drops considerably during the actual run time when things slow
    down to 10 seconds per response.

    Memory is important as it is limited. The current processes each use
    less than 1% of the gig of memory. There is about 400 MB free on the
    system when they are all running. This could be a factor if we
    increase the 50 processes to 200 or so. We are unable to upgrade
    memory without a new server and additional costs, so memory is the
    major factor limiting the number of processes we run.

    > - portability,

    should work on any reasonable Linux box with Perl
    > - maintenance,

    the code you write needs its oil changed every once in a while?
    > - scalability,

    more processes doing the same thing as described above.
    > - development time,

    not a huge factor. Getting it done right is more important
    > - development costs,

    not really a factor. I'm getting paid
    > - ease of deployment,

    sftp
    > - simplicity,

    so long as I can figure it out
    > - operational costs,

    not even going to try to figure that out; it doesn't matter or is
    insignificant.


    On 08 Jan 2005 20:33:03 GMT, Abigail <> wrote:

    >
    >() wrote on MMMMCXLVII September MCMXCIII in <URL:news:>:
    >() I'm trying to find the best way to do something.
    >
    >This is an example of a really bad question. Who's going to decide what
    >the "best way" to do something is? All you do is describe a problem,
    >but you don't say *ANYTHING* at all which even remotely hints of what
    >will be a good way for you, let alone the best way.
    >
    >Do you want a solution that is optimized for:
    >
    > - speed,
    > - memory,
    > - portability,
    > - maintenance,
    > - scalability,
    > - development time,
    > - development costs,
    > - ease of deployment,
    > - simplicity,
    > - operational costs,
    > - ...
    >
    >?
    >
    >Note that none of these can be answered without knowing a lot of your
    >development and operation environments.
    >
    >If you want to know the "best way" (of anything, not this problem),
    >hire a good consultant. Don't go to Usenet for answers.
    >
    >
    >
    >Abigail
    >--
    >$_ = "\nrekcaH lreP rehtona tsuJ"; my $chop; $chop = sub {print chop; $chop};
    >$chop -> () -> () -> () -> () -> () -> () -> () -> () -> () -> () -> () -> ()
    >-> () -> () -> () -> () -> () -> () -> () -> () -> () -> () -> () -> () -> ()
     
    , Jan 10, 2005
    #4
  5. Guest

    I'm starting to think that Shareable may be the way to go unless I
    want to go to a server/client connection. I already mentioned that I
    have found a way to use Shareable for communication instead of for the
    actual list and that sped things up a great deal.

    A server would work like this:

    Server accepts connections on a given port. A connection can be just a
    request, or a request with a removal. The server accepts any removals,
    removes them from its list, and then replies with the next data item
    from its list for the client to run. Disadvantage - each process must
    connect to the server once for each iteration. What happens if the
    server is busy with another process? Advantage - very exact control
    over the list. The server can easily make sure the list is run evenly,
    where the current model uses randomization so that the processes
    won't all be running the same list in sync.

    Now the server process would be very simple. It just maintains the
    list and gives out the next element. But I wonder if a single process
    could deal well with 150 or more requests per second. I think I'd just
    have to program it and give it a try. Can anyone comment on this? What
    would happen if the server was not available? I'm assuming the
    requesting process would just wait for the port to be clear. Again,
    this would be much less important in actual use, when it slows down
    to 5 requests per second.
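
    Something along these lines is what I'm picturing - a bare, untested
    sketch; the socket path, the "REMOVE <id>" line protocol, and the toy
    work list are all made up for illustration:

    #!/usr/bin/perl
    # Dispatcher sketch: keeps the only copy of the list and answers one
    # request per connection (as described above).
    use strict;
    use warnings;
    use IO::Socket::UNIX;
    use Socket qw(SOCK_STREAM);

    my $path = '/tmp/dispatcher.sock';
    unlink $path;

    my $server = IO::Socket::UNIX->new(
        Type   => SOCK_STREAM,
        Local  => $path,
        Listen => 64,
    ) or die "cannot listen on $path: $!";

    my @list    = (1 .. 500);   # the work list, kept only in this process
    my %removed;                # items reported as taken
    my $next    = 0;            # round-robin cursor

    while (my $client = $server->accept) {
        my $line = <$client>;
        next unless defined $line;
        chomp $line;

        # "REMOVE 123" marks item 123 as taken; every request gets work back.
        $removed{$1} = 1 if $line =~ /^REMOVE (\d+)/;

        # Hand back the next item that hasn't been removed yet.
        my $give;
        for (1 .. @list) {
            my $candidate = $list[ $next++ % @list ];
            next if $removed{$candidate};
            $give = $candidate;
            last;
        }
        print {$client} defined $give ? "$give\n" : "NONE\n";
        close $client;
    }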



    On Fri, 07 Jan 2005 12:29:16 -0800, Jim Gibson
    <> wrote:

    >In article <>,
    ><> wrote:
    >
    >> I'm trying to find the best way to do something. I've got 50 processes
    >> (may have 200 soon) that each need to broadcast simple messages to all
    >> of the others. I tried doing this with sockets, but although I was able
    >> to get them to read a socket without blocking, I found that when one
    >> process reads a socket it removes the message from the socket and the
    >> other processes don't see it. My current solution works using
    >> IPC::Shareable, but it is slow and hogs memory as well as the CPU.
    >> Shareable lets you set a variable that multiple programs can read and
    >> write to. In my case they read and write to the list that they all run
    >> off.

    >
    >Sockets implement a data connection between two processes, potentially
    >on different systems. Three or more processes cannot share a socket.
    >When you write data to a socket that connects two processes, only the
    >reading process will get the data. Any other process will be reading
    >from its own unique socket and will not receive the data. You would
    >have to connect each pair of processes with a unique socket pair. For
    >200 processes, that would mean 39800 separate sockets.
    >
    >A better approach using sockets would implement a single dispatcher
    >process that sends the current list to each of the 200 analysis
    >processes (or whatever you want to call them). Each analysis process
    >would send the updated list to the central dispatching process.
    >
    >If all of your processes are on the same system, then internet protocol
    >(IP) domain sockets impose a network overhead that is unnecessary. You
    >are better off using Unix domain sockets (assuming you are on Unix,
    >that is).
    >
    >However, it would seem for this application that shared memory is the
    >fastest way to go, and IPC::Shareable gives you access to shared memory
    >(disclaimer: I have not used it). If you are having performance
    >problems, then you can try to optimize your use of shared memory. One
    >suggestion would be to make a local copy of the shared memory list and
    >iterate over the local copy. This works if you can tolerate using a
    >slightly stale list for a short time after it has been modified. You
    >can check periodically (using another shared variable perhaps) whether
    >the list has been modified and fetch the new version. You can use a
    >simple counter to indicate when the list has been updated.
    >
    >If you can't get IPC::Shareable to work, you can put the list into a
    >file, periodically read the file if it is unlocked, and have any
    >process that wants to update the file lock it and re-write it.
    >
    >
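
    For reference, the file-based fallback could look roughly like this - a
    sketch I haven't run, with a made-up file name; readers take a shared
    lock and writers take an exclusive lock before rewriting:

    use strict;
    use warnings;
    use Fcntl qw(:flock);

    my $file = '/tmp/worklist.txt';

    # Read the whole list under a shared lock.
    sub read_list {
        open my $fh, '<', $file or return ();
        flock $fh, LOCK_SH or die "flock: $!";
        chomp(my @items = <$fh>);
        close $fh;
        return @items;
    }

    # Rewrite the list in place under an exclusive lock.
    sub rewrite_list {
        my (@items) = @_;
        open my $fh, '+<', $file or die "open: $!";
        flock $fh, LOCK_EX or die "flock: $!";
        truncate $fh, 0;
        seek $fh, 0, 0;
        print {$fh} "$_\n" for @items;
        close $fh;
    }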
     
    , Jan 10, 2005
    #5
  6. Guest

    wrote:

    > I actually did something like that, but used a string instead of an
    > array. It greatly sped things up. Each process merely looks for a
    > change to the string, and if there is one it decodes the string
    > and modifies its internal list.


    When you have your variable tied to IPC::Shareable, merely looking for a
    change isn't just "merely". It has a lot of overhead. (Not to mention
    potential for corruption if you aren't careful about locking).

    >
    > >> Basically each process is iterating over a list (array), and every so
    > >> often a process gets a result that means an item no longer needs to
    > >> be run, so it should remove it from its list and notify the other
    > >> processes so that they can remove it from theirs as well.

    > >
    > >What are the consequences if a process doesn't get the message and runs
    > >that task anyway? Is it just a waste of resources, or is it fatal to
    > >the whole thing you are trying to do?
    > >
    > >How many such removal messages do you generate, in relation to the full
    > >size of the array to iterate over? If small, it would probably be most
    > >efficient to just run even the "removed" tasks and then filter them out
    > >in post-processing.

    >
    > It's not fatal, but it wastes the request.


    I'm not sure what the "request" is that you are talking about. It
    sounds like you are doing some kind of HTTP or other network processing,
    rather than the parallel computation in an SMP environment that
    I had originally thought you were talking about. If you just have one CPU
    and are issuing many slow IOs, maybe you should look at using non-blocking
    IO in just one process rather than spawning an extravagant number of
    processes.

    Anyway, unless someone is charging you per request, a request is not
    something that can be wasted. Only the resources associated with it can be
    wasted, and you should weigh those resources against the resources that, as
    you have discovered, are consumed by heavy use of IPC::Shareable (or any
    other synchronization method).

    > Each request has a chance
    > of getting a product for my company. So concentrating on products
    > that are still available, and not wasting requests on taken products,
    > should improve our chances of getting products by 10-20% or so.


    Let us say that the overhead of extremely fine-grained synchronization
    means that you can only perform 50 requests per second, with none of them
    wasted, while the lowered overhead of looser synchronization means you
    can do 150 requests per second, with 5 of them wasted. Would it be
    preferable to have 50 good requests per second, or 145 good requests per
    second?

    > >Each process could keep its own private version of the array, and
    > >only refresh it against the shared version (or against a shared
    > >exception list) every now and then. How often it does this refresh
    > >would depend on the cost of the refresh vs. the wasted effort that goes
    > >into processing tasks that have been removed since the last refresh.

    >
    > I have it do that for every request. During actual conditions the
    > requests slow down to a 10-second response, which is all the time in
    > the world for this use.


    If you already have all the time in the world, why are you worried about
    further optimizing it?

    Xho

     
    , Jan 11, 2005
    #6
  7. Guest

    wrote:
    >
    > A server would work like this:
    >
    > Server accepts connections on a given port. A connection can be just a
    > request, or a request with a removal. The server accepts any removals,
    > removes them from its list, and then replies with the next data item
    > from its list for the client to run. Disadvantage - each process must
    > connect to the server once for each iteration.


    Why would it have to connect once for each iteration? Just connect at
    the beginning, and keep reusing that connection.
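
    A client along these lines only pays the connection cost once (sketch
    only, untested; it assumes a dispatcher on a made-up Unix-domain socket
    path that answers one line per line sent):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use IO::Socket::UNIX;

    # Connect once, then loop over request/response pairs on that socket.
    my $sock = IO::Socket::UNIX->new(Peer => '/tmp/dispatcher.sock')
        or die "cannot connect: $!";
    $sock->autoflush(1);

    while (1) {
        print {$sock} "NEXT\n";       # ask for the next item to work on
        my $item = <$sock>;
        last if !defined $item or $item =~ /^NONE/;
        chomp $item;
        # ... make the real request for $item here ...
    }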

    > Now the server process would be very simple. It just maintains the
    > list and gives out the next element. But I wonder if a single process
    > could deal well with 150 or more requests per second. I think I'd just
    > have to program it and give it a try. Can anyone comment on this?


    My very simple server can process 100 times that much: 1,000,000 requests
    per minute. See below. I'm sure the code is lousy in many ways, but it
    is just a quick and dirty benchmark.


    Xho



    #!/usr/bin/perl -w
    use strict;
    use IO::Select;
    use IO::Handle;

    my $s = IO::Select->new();
    my %s;

    foreach (1..50) {
        pipe my ($pin, $cout);
        pipe my ($cin, $pout);
        my $pid = fork(); defined $pid or die;
        unless ($pid) {             # In the child, interrogate the parent.
            close $pin; close $pout;
            select $cout; $| = 1;
            foreach (1..20000) {
                print "giveme!\n";
                my $x = scalar <$cin>;
                # warn "$$: received $x" if rand() < 0.001;
            };
            exit;
        };
        close $cout; close $cin;
        $pout->autoflush();
        $s{$pin} = $pout;
        $s->add($pin);
    };

    my $serial = 0;

    while ($s->count()) {
        my @read = $s->can_read();
        foreach (@read) {
            my $x = <$_>;
            unless (defined $x) { $s->remove($_); next };
            die "'$x' ne giveme!" unless $x eq "giveme!\n";
            print {$s{$_}} "you get " . $serial++ . "\n";
        };
    };

     
    , Jan 11, 2005
    #7
  8. Guest

    I'm going to go over your quick and dirty server code. I haven't been
    able to work on this much recently due to other related issues being
    more important.

    I think my largest concern is how many processes can work well
    together with memory being finite. Each process is maintaining a
    telnet session with a mainframe. Current tests with 50 processes
    operate at around 150 requests per second combined. A request is
    wasted if it is made for a product that is known to already be
    'taken', and yes, it would be better to make 145 valid requests per
    second while wasting 5 requests than to make only 50 valid requests.
    That's why I think it's just something that I'll have to experiment with.

    IPC::Shareable has a lot of overhead, so a server solution may be able
    to decrease memory usage significantly, increasing the number of
    processes that can be run at the same time.

    I hope I haven't pissed everyone off. Talking about things like this
    is very helpful for thinking this through and I value all of your
    suggestions.


    On 11 Jan 2005 01:52:29 GMT, wrote:

    > wrote:
    >
    >> I actually did something like that, but used a string instead of an
    >> array. It greatly sped things up. Each process merely looks for a
    >> change to the string, and if there is one it decodes the string
    >> and modifies its internal list.

    >
    >When you have your variable tied to IPC::Shareable, merely looking for a
    >change isn't just "merely". It has a lot of overhead. (Not to mention
    >potential for corruption if you aren't careful about locking).
    >
    >>
    >> >> Basically each process is iterating over a list (array), and every so
    >> >> often a process gets a result that means an item no longer needs to
    >> >> be run, so it should remove it from its list and notify the other
    >> >> processes so that they can remove it from theirs as well.
    >> >
    >> >What are the consequences if a process doesn't get the message and runs
    >> >that task anyway? Is it just a waste of resources, or is it fatal to
    >> >the whole thing you are trying to do?
    >> >
    >> >How many such removal messages do you generate, in relation to the full
    >> >size of the array to iterate over? If small, it would probably be most
    >> >efficient to just run even the "removed" tasks and then filter them out
    >> >in post-processing.

    >>
    >> It's not fatal, but it wastes the request.

    >
    >I'm not sure what the "request" is that you are talking about. It
    >sounds like you are doing some kind of HTTP or other network processing,
    >rather than the parallel computation in an SMP environment that
    >I had originally thought you were talking about. If you just have one CPU
    >and are issuing many slow IOs, maybe you should look at using non-blocking
    >IO in just one process rather than spawning an extravagant number of
    >processes.
    >
    >Anyway, unless someone is charging you per request, a request is not
    >something that can be wasted. Only the resources associated with it can be
    >wasted, and you should weigh those resources against the resources that, as
    >you have discovered, are consumed by heavy use of IPC::Shareable (or any
    >other synchronization method).
    >
    >> Each request has a chance
    >> of getting a product for my company. So concentrating on products
    >> that are still available, and not wasting requests on taken products,
    >> should improve our chances of getting products by 10-20% or so.

    >
    >Let us say that the overhead of extremely fine-grained synchronization
    >means that you can only perform 50 requests per second, with none of them
    >wasted, while the lowered overhead of looser synchronization means you
    >can do 150 requests per second, with 5 of them wasted. Would it be
    >preferable to have 50 good requests per second, or 145 good requests per
    >second?
    >
    >> >Each process could keep its own private version of the array, and
    >> >only refresh it against the shared version (or against a shared
    >> >exception list) every now and then. How often it does this refresh
    >> >would depend on the cost of the refresh vs. the wasted effort that goes
    >> >into processing tasks that have been removed since the last refresh.

    >>
    >> I have it do that for every request. During actual conditions the
    >> requests slow down to a 10-second response, which is all the time in
    >> the world for this use.

    >
    >If you already have all the time in the world, why are you worried about
    >further optimizing it?
    >
    >Xho
    >
     
    , Jan 20, 2005
    #8
