IO::Pipe and loss of data


Genevieve S.

Hello,

I use the module IO::Pipe to let many child processes communicate with their
father. As they all send the same kind of information (and the father does
not have time to listen to a separate pipe for each process), only one pipe
object is used for all of them. Everything seemed to be OK, but then I
noticed a loss of data - small, but a loss nonetheless. So I decided to use
a message id to check whether the loss happens inside the pipe or in my
code. To put it in numbers: I have 71 child processes which send between
400 and 400,000 messages each to the father. The father found 6 cases where
between 2 and 10 messages seemed to get lost on the way through the pipe.

The module documentation does not help me here: Is it bad to have so many
writing ends on a single IO::Pipe, or did I just hit a memory limit? (At
script start the amount of data grows very fast because the children work
through old log files, but after a while the father catches up and reads
faster than they write, so the amount of data sitting in the pipe shrinks
again.) I am just too new to Perl to read the module's source and see what
my problem is. Is there an easy way to find out? Or does anyone already
know what might help here?

I don't think pasting code here helps, as it would get chopped up and not
be very useful anymore. If anyone thinks it might help to understand my
problem, I will do it anyway :)

Thanks a lot for at least reading this far, and thanks for any advice!
 

xhoster

Genevieve S. said:
> Hello,
>
> I use the module IO::Pipe to let many child processes communicate with
> their father. As they all send the same kind of information (and the
> father does not have time to listen to a separate pipe for each
> process),

How much time do you expect that to take?

> only one pipe object is used for all of them. Everything seemed to be
> OK, but then I noticed a loss of data - small, but a loss nonetheless.
> So I decided to use a message id to check whether the loss happens
> inside the pipe or in my code. To put it in numbers: I have 71 child
> processes which send between 400 and 400,000 messages each to the
> father. The father found 6 cases where between 2 and 10 messages seemed
> to get lost on the way through the pipe.

Please post runnable example code to demonstrate this.

> The module documentation does not help me here: Is it bad to have so
> many writing ends on a single IO::Pipe

Since IO::Pipe doesn't mention otherwise, I would assume it is not good to
have even 2 writing ends.

> or did I just hit a memory limit? (At script start the amount of data
> grows very fast because the children work through old log files, but
> after a while the father catches up and reads faster than they write,
> so the amount of data sitting in the pipe shrinks again.) I am just too
> new to Perl to read the module's source and see what my problem is. Is
> there an easy way to find out? Or does anyone already know what might
> help here?
>
> I don't think pasting code here helps, as it would get chopped up and
> not be very useful anymore.

After you chop it to remove the irrelevant parts, you need to sew it back
up again (and verify it still shows the problem).

> If anyone thinks it might help to understand my problem, I will do it
> anyway :)
>
> Thanks a lot for at least reading this far, and thanks for any advice!

You could try "flock"ing the write handle before each write, but I doubt
that will help. I think this problem is much more complicated than you
think it is. Look into IO::Select.
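
For instance, a minimal IO::Select sketch might look like this (not the
OP's code; the per-child reader handles and process_line are assumed):

use IO::Select;

# One pipe per child; select() tells us which ones have data waiting.
my $sel = IO::Select->new(@reader_handles);

while (my @ready = $sel->can_read) {
    for my $fh (@ready) {
        my $line = <$fh>;       # safe only if whole lines arrive at once;
                                # see the read/sysread discussion below
        if (!defined $line) {   # a writer closed its end
            $sel->remove($fh);
            next;
        }
        process_line($line);    # hypothetical handler
    }
}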

Xho
 

Brian McCauley

Genevieve said:
> I use the module IO::Pipe to let many child processes communicate with
> their father. As they all send the same kind of information (and the
> father does not have time to listen to a separate pipe for each
> process), only one pipe object is used for all of them. Everything
> seemed to be OK, but then I noticed a loss of data - small, but a loss
> nonetheless. So I decided to use a message id to check whether the loss
> happens inside the pipe or in my code. To put it in numbers: I have 71
> child processes which send between 400 and 400,000 messages each to the
> father. The father found 6 cases where between 2 and 10 messages seemed
> to get lost on the way through the pipe.

IO::Pipe is just a pretty layer over the built-in pipe(). Personally,
now that we have handle autovivification, I'd not bother and just say:

pipe(my $reader, my $writer) or die $!;

Anyhow - FIFOs should be lossless no matter how many writers there are.

This means all your bytes get through, but with multiple writers you
need to flush the writer handle after each message so that you don't
get partial messages interleaved.

Note also that each underlying OS-level write() (aka Perl's syswrite)
is only sure to be atomic provided it does not exceed some OS-defined
limit. The latter can be found by calling POSIX::PIPE_BUF. If your
message size exceeds this limit there is a small but not always
negligible chance that messages will still be interleaved.

If your messages exceed Perl's IO buffering limits then the situation
is worse still.

With large messages you probably should flock() the handle to ensure
only one child can write at a time.

I'm not sure all OSs support flock()ing pipes.
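
Putting the pipe()/autoflush/PIPE_BUF points above together, a minimal
multi-writer sketch might look like this (illustration only, not the
OP's setup; message text and writer count are made up):

use IO::Handle;
use POSIX ();

pipe(my $reader, my $writer) or die "pipe: $!";

for my $id (1 .. 5) {
    defined(my $pid = fork()) or die "fork: $!";
    next if $pid;                  # parent keeps forking
    # child: write whole messages, flushed, each at most PIPE_BUF bytes
    close $reader;
    $writer->autoflush(1);
    my $msg = "child $id reporting\n";
    die "message exceeds PIPE_BUF" if length($msg) > POSIX::PIPE_BUF;
    print {$writer} $msg for 1 .. 1000;
    exit 0;
}

close $writer;                     # parent only reads
print while <$reader>;             # each line should arrive intact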
 

xhoster

Brian McCauley said:
> With large messages you probably should flock() the handle to ensure
> only one child can write at a time.
>
> I'm not sure all OSs support flock()ing pipes.

On Linux, this doesn't work. I suspect the problem is not that it doesn't
support flock()ing on pipes per se, but rather that it doesn't support
locking on handles inherited across a fork, regardless of whether those
handles are to pipes or to files. (Locks held through one forked copy of
the handle will not block other forked copies of that same handle, but
they will block independently acquired handles.)

$ perl -le 'open my $p, ">f" or die $!; fork; flock $p,2 or die $!; \
warn "done"; sleep 10;'
done at -e line 1.
done at -e line 1.

Both "done" messages arrive immediately.

If you swap the fork and the open, then one message arrives 10 seconds
after the other.
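
For comparison, that swapped version of the same toy demo might look like:

$ perl -le 'fork; open my $p, ">f" or die $!; flock $p,2 or die $!; \
warn "done"; sleep 10;'
done at -e line 1.
done at -e line 1.

with the second "done" arriving roughly 10 seconds after the first, since
the two processes now hold independently acquired handles.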

Xho
 

Genevieve S.

Hello again,

> How much time do you expect that to take?

I cannot express this in seconds or anything, but as reading from any pipe
blocks until there is something to read, I would have to build in a
mechanism that shortens that waiting time. Whatever that is (I already have
something like that for another 'special' child), it takes much longer when
used 71 times than for a single pipe.

xhoster said:
> After you chop it to remove the irrelevant parts, you need to sew it
> back up again (and verify it still shows the problem).

I chopped it as much as possible, but it did not yet show the error. Maybe
I cut too much, but as the last run of my big script was fine for 6 hours
before the error appeared, I think the error is just not easy to reproduce.
Anyway I put some code at the end of this message, which should run and
show at least how I send a message and use the pipe.

> You could try "flock"ing the write handle before each write, but I
> doubt that will help.

I do not have a clue how this could work, so I am willing to believe
xhoster that this does not really work with Linux (which I use). But this
will work with named pipes - so maybe I will just switch to that type to
see if the error still exists.
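
A minimal sketch of that named-pipe variant might be (the path here is an
assumption, and each writer must open its own handle so that flock()
actually excludes the others, per xhoster's point above):

use Fcntl qw(:flock);
use IO::Handle;
use POSIX qw(mkfifo);

my $fifo = '/tmp/log_pipe';                # hypothetical path
unless (-p $fifo) {
    mkfifo($fifo, 0600) or die "mkfifo: $!";
}

# each child opens the FIFO itself, giving independently acquired handles
open my $w, '>', $fifo or die "open: $!";  # blocks until a reader opens it
$w->autoflush(1);

flock($w, LOCK_EX) or die "flock: $!";
print {$w} $message;                       # $message: the assembled report
flock($w, LOCK_UN);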
> I think this problem is much more complicated than you
> think it is. Look into IO::Select.

That first sentence is not making this an easy day :) I had another look at
IO::Select - which I had tried to use to find out if my pipe is readable
(which was quite nonsense, as there is a difference between 'open for
reading' and 'something there to be read'). I thought it might help to use
has_exception(), but it did not find any exception on the pipe, even
directly after a message was lost. But maybe I just jumped into that
function and did not recognize what you really wanted me to look at.

> IO::Pipe is just a pretty layer over the built-in pipe(). Personally,
> now that we have handle autovivification, I'd not bother and just say:
>
> pipe(my $reader, my $writer) or die $!;

I had that originally and replaced it with IO::Pipe, as that seemed to be
what IO::Select expects (an object). But that seems obsolete now anyway.

> Anyhow - FIFOs should be lossless no matter how many writers there are.

I think you're right; that might be my next step to try.

The only thing left now is that I'm scared whether my errors depend on
having different writers on the same pipe or on the amount of data waiting
in the pipe to be read, as the errors occurring yesterday after 6 hours
were 5 within a period of only one minute.

Anyway, thanks for your help, I now have at least some hints on how to
proceed!


******************************************************
These are the remains of my code. But please use it with care. It is an
endless loop, so you have to stop it manually, and if it runs too long it
will have an enormous amount of data in the pipe (in the original script
the line to be sent is assembled after a line of a logfile is read via
tail; in this version the same line is sent every time, which is very fast
and never pauses between lines).

Sadly I must say I have not yet received the error with this...
As the logging of this script is nearly the same as in my big one, I'd like
to show you what I got yesterday:
my_log>>>>>
Wed Jan 4 13:22:17 2006 script started 24687
[...]
Wed Jan 4 13:49:56 2006 log reader started for 3070: 14508
Wed Jan 4 19:40:27 2006 Server 3201: Missing message between 54 and 56
Wed Jan 4 19:40:27 2006 Server 1665: Missing message between 38 and 40
Wed Jan 4 19:40:30 2006 Server 4002: Missing message between 9 and 11
Wed Jan 4 19:40:40 2006 Server 3022: Missing message between 52 and 61
Wed Jan 4 19:55:25 2006 Server 3025: Missing message between 70 and 74
<<<<<

script>>>>>>>>>>>>>>>>>>>>>>>
#!/usr/bin/perl

use strict;
use warnings;

use IO::Pipe;
$|++;

####################################
#####  Variable declarations  ######
####################################
my %avl_server_ports;  # saves all servers for which we have data incl. their ports

# first get the updated server list from db[...]
# to change number of created children change loop condition here
for (1..30) {
    $avl_server_ports{$_} = $_;
}

my $server;        # name of the server to be processed (for child processes)
my %proc_counter;  # ID for the procs (used in child procs)

my $logFile;       # contains the name of the logfile for this script

my $msid = 1;      # the message id used on child-side
my %mshash;        # current message id per server, used on father side

###########################
#####  Main program   #####
###########################

writeLog('create');

# creation of pipes used to communicate with cmd-child and log-file readers
my $pipe_log = new IO::Pipe();

# start the initial creation of all log-file-readers
syncLFR();
$pipe_log->reader();

# starting of working routine
while (1) {
    # do some other things...

    # read the next info about processes
    logProcessing();
}

##### this is the task of a log-file-reader #####
#################################################
sub logReader {
    my $server = shift;
    writeLog("log reader started for $server: $$");
    $pipe_log->writer();

    # and ensure not to buffer
    $pipe_log->autoflush(1);

    my $sentence = "2005/08/16 02:00:44 pid 27009 compute end 2s 90+30us 211+0io 0+0net 0k 103pf\n";

    while (1) {
        sendWithMsId($server.'::'.$sentence);
    } # end of while 1
} # end of sub: logReader

##### father's routine to get the messages out of the pipe and handle them correctly #####
###########################################################################################
sub logProcessing {
    my $line = '';
    my ($msg, $server);
    eval {
        local $SIG{ALRM} = sub { die "lesedauer" };
        alarm(1);
        $line = <$pipe_log>;
        alarm(0);
    };

    return if ($line !~ /::/);
    ($msg, $server, $line) = split(/::/, $line);

    if (!exists $mshash{$server}) {
        $mshash{$server} = $msg;
    } else {
        if ($msg != $mshash{$server} + 1) {
            writeLog("Server $server: Missing message between $mshash{$server} and $msg");
        }
        $mshash{$server} = $msg;
    }
} # end of sub: logProcessing

##### write the log file #####
##############################
sub writeLog {
    my ($msg) = @_;

    # IF YOU DO NOT WANT TO HAVE A PRINT ON YOUR COMMAND LINE
    # YOU CAN CHANGE COMMENTS HERE TO WRITE A LOGFILE INSTEAD

    if ($msg eq 'create') {
        # #my $day = getTodaysDate();
        # my $day = '2006/01/05';
        # $day =~ s/\///g;
        # my $count = 1;
        #
        # open(LS, "2>&1 ls log/log$day.* | ");
        # while (<LS>) {
        #     if (my ($c) = /log$day.(\d+)/) {
        #         $count++ if ($c == $count);
        #     }
        # }
        # close(LS);
        # $logFile = "log/log".$day.".".$count;
        # open(LOG, ">$logFile");
        # print LOG localtime(time) . " script started $$\n";
        # close(LOG);
        print localtime(time) . " script started $$\n";
    }
    # write the message in the logfile
    else {
        # open(LOG, ">>$logFile");
        # print LOG localtime(time) . " " . $msg . "\n";
        # close(LOG);
        print localtime(time) . " " . $msg . "\n";
    }
}

##### sends messages to the father and adds a message ID #####
##############################################################
sub sendWithMsId {
    my $info = shift;

    print $pipe_log $msid.'::'.$info;
    $msid++;
}

##### Synchronization of log-file-readers according to avl. servers #####
##########################################################################
sub syncLFR {
    # start an lfr for all servers which were not in the list already
    foreach my $newServer (keys %avl_server_ports) {
        my $server = $newServer;
        my $forked_pid = fork();

        # this is the new log-file-reader
        if (!$forked_pid) {
            logReader($server);
            exit(0);
        }
    }
} # end of sub: syncLFR
 

Brian McCauley

xhoster said:
> On Linux, this doesn't work. I suspect the problem is not that it
> doesn't support flock()ing on pipes per se, but rather that it doesn't
> support locking on handles inherited across a fork, regardless of
> whether those handles are to pipes or to files.

D'oh! I knew that, really.
 

xhoster

Genevieve S. said:
> Hello again,
>
> I cannot express this in seconds or anything, but as reading from any
> pipe blocks until there is something to read,

Not if you use nonblocking approaches. That is painful to program, but
generally quite fast to run.

> I do not have a clue how this could work, so I am willing to believe
> xhoster that this does not really work with Linux (which I use). But
> this will work with named pipes - so maybe I will just switch to that
> type to see if the error still exists.

Please let us know if it works.

> That first sentence is not making this an easy day :) I had another
> look at IO::Select - which I had tried to use to find out if my pipe is
> readable (which was quite nonsense, as there is a difference between
> 'open for reading' and 'something there to be read').

I don't quite understand what this statement means.

If can_read reports it is readable, then it should be readable. The
problem is that you need to use read, not readline, to read it, and then do
your own buffering and end-of-message detection.
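
A minimal sketch of that buffering (my illustration, not the OP's code;
messages are assumed to be newline-terminated as in the posted script):

my %buf;   # per-handle buffer of not-yet-complete messages

sub read_messages {
    my ($fh) = @_;
    $buf{$fh} = '' unless defined $buf{$fh};
    # append whatever is available right now to this handle's buffer
    my $n = sysread($fh, $buf{$fh}, 4096, length $buf{$fh});
    die "sysread: $!" unless defined $n;
    # peel off complete (newline-terminated) messages only
    my @msgs;
    push @msgs, $1 while $buf{$fh} =~ s/\A(.*?\n)//;
    return @msgs;   # a partial tail stays buffered for the next call
}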

Xho
 

xhoster

Genevieve S. said:
> I chopped it as much as possible, but it did not yet show the error.
> Maybe I cut too much, but as the last run of my big script was fine for
> 6 hours before the error appeared, I think the error is just not easy
> to reproduce.

Bummer. That always makes for hard-to-find problems.

> Anyway I put some code at the end of this message, which should run and
> show at least how I send a message and use the pipe.

In your sample code, the message is always short. Is it always about that
size in the real code?

....

> ******************************************************
> These are the remains of my code. But please use it with care. It is an
> endless loop, so you have to stop it manually, and if it runs too long
> it will have an enormous amount of data in the pipe

That isn't how pipes work. There is a fixed amount of buffer (probably 4K
on your system) for the pipe. If the reader can't keep up with the
writer(s), then that buffer gets full, and writers will block until the
reader has a chance to catch up, making more room in the buffer.

> (in the original script the line to be sent is assembled after a line
> of a logfile is read via tail; in this version the same line is sent
> every time, which is very fast and never pauses between lines).

This non-CPU-boundness of the real code might be the problem. See comments
below near the eval.

> Sadly I must say I have not yet received the error with this...
> As the logging of this script is nearly the same as in my big one, I'd
> like to show you what I got yesterday:
> my_log>>>>>
> Wed Jan 4 13:22:17 2006 script started 24687
> [...]
> Wed Jan 4 13:49:56 2006 log reader started for 3070: 14508
> Wed Jan 4 19:40:27 2006 Server 3201: Missing message between 54 and 56
> Wed Jan 4 19:40:27 2006 Server 1665: Missing message between 38 and 40
> Wed Jan 4 19:40:30 2006 Server 4002: Missing message between 9 and 11
> Wed Jan 4 19:40:40 2006 Server 3022: Missing message between 52 and 61
> Wed Jan 4 19:55:25 2006 Server 3025: Missing message between 70 and 74

I would make it log the just-received line in its entirety, not only the
serial number of that line. In fact, the main program should probably
store not just the previous message-id, but the entire previous message,
for each "server". Then, when there seems to be a gap, you can print out
the entire message before, and the entire message after, the gap. That
could help a lot in debugging.

> sub logProcessing {
>     my $line = '';
>     my ($msg, $server);
>     eval {
>         local $SIG{ALRM} = sub { die "lesedauer" };
>         alarm(1);
>         $line = <$pipe_log>;
>         alarm(0);
>     };

I don't understand the point of this time-out. If the timeout occurs,
you never complain about it or even detect it. You merely start another
call of the same thing that just timed out. If all you do upon a timeout
is restart the same thing, why bother timing out?

Anyway, if the alarm goes off after the next line is read from $pipe_log,
but before the "alarm(0)" gets executed, then you die out of the eval{} and
the line you just read from $pipe_log gets ignored. That could account for
your missing data!

>     return if ($line !~ /::/);

Wouldn't it be better to die upon malformed data?

>     ($msg, $server, $line) = split(/::/, $line);
>
>     if (!exists $mshash{$server}) {
>         $mshash{$server} = $msg;
>     } else {
>         if ($msg != $mshash{$server} + 1) {
>             writeLog("Server $server: Missing message between $mshash{$server} and $msg");

I would add $line, if not $mshash2{$server}, to the logged info.

>         }
>         $mshash{$server} = $msg;

          $mshash2{$server} = $line; # so prior line can also be logged

>     }
> } # end of sub: logProcessing


Xho
 

Ilya Zakharevich

[A complimentary Cc of this posting was sent to Genevieve S.]

> Hello,
>
> I use the module IO::Pipe to let many child processes communicate with
> their father. As they all send the same kind of information (and the
> father does not have time to listen to a separate pipe for each
> process), only one pipe object is used for all of them. Everything
> seemed to be OK, but then I noticed a loss of data - small, but a loss
> nonetheless.

Are the reports self-delimited? I assume they are...

It looks like you assume that writing to a pipe is atomic; in other
words, that if you syswrite() a chunk to a pipe, then this chunk appears
"connected" when read from the pipe, even if multiple children write
these chunks "simultaneously".

a) I can't recall any OS claiming this is guaranteed behaviour; but I
did not ever try looking hard ;-);

b) this is definitely not true if you use buffered output to write
to the pipe (i.e., print() from Perl).

So the first thing is to change print() to syswrite(). The second one
is to find out what POSIX says about "a".
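
That first change, sketched with the variable names from the posted script
(so the names are assumptions about how it would slot in):

my $msg = $msid . '::' . $server . '::' . $sentence;
my $wrote = syswrite($pipe_log, $msg);   # one unbuffered write()
die "syswrite: $!" unless defined $wrote;
warn "short write" if $wrote != length $msg;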

Hope this helps,
Ilya
 

Brian McCauley

Ilya said:
> It looks like you assume that writing to a pipe is atomic;
>
> a) I can't recall any OS claiming this is guaranteed behaviour; but I
> did not ever try looking hard ;-);
>
> So the first thing is to change print() to syswrite(). The second one
> is to find out what POSIX says about "a".

I don't know about POSIX, but "The Single UNIX Specification, Version 2"
(which is more or less the same thing) says...

    Write requests of {PIPE_BUF} bytes or less will not be interleaved with
    data from other processes doing writes on the same pipe. Writes of
    greater than {PIPE_BUF} bytes may have data interleaved, on arbitrary
    boundaries, with writes by other processes, whether or not the
    O_NONBLOCK flag of the file status flags is set.

http://www.opengroup.org/onlinepubs/007908799/xsh/write.html

I don't know if POSIX (er, SUSv2) says anything about the minimum
permissible value for PIPE_BUF.

On Linux...

$ perl -MPOSIX -e'print POSIX::PIPE_BUF,"\n"'
4096
 

Ilya Zakharevich

[A complimentary Cc of this posting was sent to Brian McCauley]

> I don't know about POSIX, but "The Single UNIX Specification, Version
> 2" (which is more or less the same thing) says...
>
>     Write requests of {PIPE_BUF} bytes or less will not be interleaved
>     with data from other processes doing writes on the same pipe.

Good, I was afraid that the limit was not PIPE_BUF, but the amount of space
currently available in the pipe buffer. It looks like the OS flushes the
buffer before a write if not enough space is currently available.

In this setting, the problem of the OP is solvable: since (as you have
shown) this limit, PIPE_BUF, can be computed at runtime, he needs the
reports broken into self-delimited chunks of size PIPE_BUF or less
(with a chunk number and PID/TID embedded in each chunk if more than
one chunk may ever be needed). The receiving end should assemble the
reports from the chunks.

I wonder whether this is already available as an encapsulated API?
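
A sketch of such a sender (the frame format here is made up for
illustration; a receiver would reassemble the payload by sender PID and
chunk number):

use POSIX ();

sub send_chunked {
    my ($fh, $payload) = @_;
    my $room = POSIX::PIPE_BUF - 64;   # leave room for the header
    my $seq  = 0;
    for my $chunk (unpack "(a$room)*", $payload) {
        # header: sender PID, chunk number, chunk length
        my $frame = sprintf "%d:%d:%d:%s", $$, $seq++, length($chunk), $chunk;
        die "frame exceeds PIPE_BUF" if length($frame) > POSIX::PIPE_BUF;
        defined syswrite($fh, $frame) or die "syswrite: $!";
    }
}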

Thanks,
Ilya
 

Ina Scherf

Hi again, and sorry for the delay,

> In your sample code, the message is always short. Is it always about
> that size in the real code?

No. Messages vary in their length but can reach up to 2000 characters. The
shortest might be about 20 characters, I think. I once had the problem that
everything over 2500 got cut off, so I tried to keep messages down to 2000.
But anyway, the error I got there was a different one.

> There is a fixed amount of buffer (probably 4K
> on your system) for the pipe. If the reader can't keep up with the
> writer(s), then that buffer gets full, and writers will block until the
> reader has a chance to catch up, making more room in the buffer.

Ok, thanks for this information, I did not know about that. I was just
wondering whether this might cause the error. But your explanation tells me
this is not the case.

> I would make it log the just-received line in its entirety, not only
> the serial number of that line. In fact, the main program should
> probably store not just the previous message-id, but the entire
> previous message, for each "server". Then, when there seems to be a
> gap, you can print out the entire message before, and the entire
> message after, the gap. That could help a lot in debugging.

Since switching to a named pipe I have started printing more info to my
log. The line before the gap is always complete and does not show any
smithereens.

> I don't understand the point of this time-out. If the timeout occurs,
> you never complain about it or even detect it. You merely start another
> call of the same thing that just timed out. If all you do upon a
> timeout is restart the same thing, why bother timing out?

This looks pointless because I cut away code. Normally the father process
shall do three things in parallel in its routine. One of them has to do
with user input to another script and sending data. So I wanted to reduce
waiting time at this point (although I believe this pipe is _never_ empty
for over one second).

> Anyway, if the alarm goes off after the next line is read from
> $pipe_log, but before the "alarm(0)" gets executed, then you die out of
> the eval{} and the line you just read from $pipe_log gets ignored. That
> could account for your missing data!

But should there not be at least half a line in $line then? I changed my
log to show the old line and the new line. No half lines so far. Either the
whole line is there or nothing at all. And it is very easy to check whether
a line is incomplete, as the last thing transferred is a dumped object and
that gives an error for every missing character at the end.

> Wouldn't it be better to die upon malformed data?

Yes - you're completely right :)

> I would add $line, if not $mshash2{$server}, to the logged info.

I do both right now.


Here is just a short update on what I have found out so far:

I checked the length of the message sent from the child (the first field is
now the length of the message followed by a delimiter; the father reads
this first and checks that it is correct). All messages sent match their
length. So it should not be a line cut off by anything.

The error occurred (on Thursday) when I had only _one_ child writing to the
pipe. To give a number: it happened between message no. 327059 and 327061,
so it took very long to occur. Well, sadly the error did not appear while I
was writing this (because I made some additional changes to the logging).

I log both the last line that was sent successfully and the new line after
the missing one. Both seem to be just normal - nothing is missing.

Thanks for your help!
 

xhoster

Ina Scherf said:
> This looks pointless because I cut away code. Normally the father
> process shall do three things in parallel in its routine. One of them
> has to do with user input to another script and sending data. So I
> wanted to reduce waiting time at this point (although I believe this
> pipe is _never_ empty for over one second).

If it is never empty for more than a second, then it should be safe to
remove the timeout :) At least for debugging purposes.

Or at least do a

    die "$@: $line:" if $@;

(or a warn, or a logging command) after the end of the eval block.
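
A sketch of the read with that check in place (still using the OP's
variable names, which are assumptions about how it would slot in; note the
caveat in the comments):

my $line = '';
eval {
    local $SIG{ALRM} = sub { die "lesedauer\n" };
    alarm(1);
    $line = <$pipe_log>;
    alarm(0);
};
alarm(0);   # make sure no alarm stays pending
if ($@) {
    die "unexpected: $@" unless $@ eq "lesedauer\n";
    # A real timeout: log it instead of silently retrying. This detects
    # the problem but cannot fully close the race - a line read just
    # before the alarm fired can still be lost - which is why
    # select()/sysread is the robust fix.
    writeLog("pipe read timed out");
    return;
}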
> But should there not be at least half a line in $line then?

Maybe, maybe not. I think that $line = <$pipe_log>; is executed first as a
read from pipe_log into some anonymous storage, then as an assignment from
that storage into $line. If the interruption happens after (or during)
the read but before the assignment, then $line will show no hint of the
missing data (other than being empty, which event is silently ignored in
the code as you originally posted it).

> I changed my log to show the old line and the new line. No half lines
> so far. Either the whole line is there or nothing at all. And it is
> very easy to check whether a line is incomplete, as the last thing
> transferred is a dumped object and that gives an error for every
> missing character at the end.
>
> Yes - you're completely right :)

Have you seen the error after making this change? (This is where you
silently ignore empty lines, which, in conjunction with your alarm
time-out, might be the cause of the missing data.)

.....

> I log both the last line that was sent successfully and the new line
> after the missing one. Both seem to be just normal - nothing is
> missing.

This definitely makes me think the problem is with the time-out, and not
with the pipe itself.

Xho
 

Genevieve S.

xhoster said:
> Maybe, maybe not. I think that $line = <$pipe_log>; is executed first
> as a read from pipe_log into some anonymous storage, then as an
> assignment from that storage into $line. If the interruption happens
> after (or during) the read but before the assignment, then $line will
> show no hint of the missing data (other than being empty, which event
> is silently ignored in the code as you originally posted it).

Ok, now that you say this, it sounds completely logical.

> This definitely makes me think the problem is with the time-out, and
> not with the pipe itself.

You're right.

I changed that evil eval thing. First I left it out completely and then
replaced it with something different. It has now run for a full 24 hours (I
wanted to wait for that before answering, to be sure) without telling me
about losing even a single line... So the pipe does not cause the error
(btw it now runs with a named pipe, and the writers lock it before using
it).

xhoster, you'd deserve at least a big hug (yes, in times of relief my
female attributes break through)! Thank you so much.
 

Ilya Zakharevich

[A complimentary Cc of this posting was sent to

....

> Maybe, maybe not. I think that $line = <$pipe_log>; is executed first
> as a read from pipe_log into some anonymous storage, then as an
> assignment from that storage into $line. If the interruption happens
> after (or during) the read but before the assignment, then $line will
> show no hint of the missing data (other than being empty, which event
> is silently ignored in the code as you originally posted it).

My impression was that signals are delivered at statement boundaries
(with the new, unreliable, signal model). So this should not be
applicable to 5.8 or some such.

Puzzled,
Ilya
 
