IO::Select and PerlIO

Peter J. Holzer · Nov 19, 2012

Does IO::Select take into account data buffered by PerlIO?

Specifically, I would like to do something like the following:

    my $s = IO::Select->new();
    $s->add($socket_fh);

    while (...) {
        print $socket_fh "$request\n";

        while ($s->can_read(0)) {
            my $response = <$socket_fh>;
            # do something with $response
        }
    }

to exploit pipelining in a protocol.

This wouldn't work with stdio and select(2) in C, but in C, the system
call select is at a lower layer than the stdio and cannot know about
stdio buffers, while in Perl, IO::Select works on filehandles so one
could hope that it is smart enough to know about them. I can't find
anything about that in the docs, though.

(perldoc -f select says you have to use sysread, but select uses fd
numbers, not filehandles)

hp
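
A minimal sketch of the behaviour the question is about, assuming a Unix-like system where socketpair works with AF_UNIX. The variable names are illustrative and not from the thread. Two lines are written at once, one is read with <>, and IO::Select is asked again. With the default PerlIO layers the second line should already sit in the PerlIO buffer, so can_read(0) is expected to report nothing even though another line can be read without blocking.

```perl
use strict;
use warnings;
use Socket;
use IO::Select;

# A connected pair of local sockets stands in for client and server.
socketpair(my $reader, my $writer, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

# Two complete responses arrive in the kernel buffer at once.
syswrite($writer, "first\nsecond\n") // die "syswrite: $!";

my $sel = IO::Select->new($reader);

my @before = $sel->can_read(1);    # kernel has data, handle is reported
my $line1  = <$reader>;            # buffered read pulls in both lines
my @after  = $sel->can_read(0);    # asks the kernel only, not PerlIO
my $line2  = <$reader>;            # served from the PerlIO buffer

printf "ready before read: %d, ready after read: %d\n",
    scalar @before, scalar @after;
print "second line was still readable: $line2";
```

This is why an inner loop driven only by can_read(0) can leave responses unprocessed until more data arrives from the network.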

Rainer Weikusat · Nov 19, 2012

Ben Morrow said:
Quoth "Peter J. Holzer":
[...]

Your options here are:

- Use sysread/write, as recommended in the docs.
- Push a :unix layer, so that print and <> do unbuffered IO. You
will have to be aware of this, and handle the buffering yourself
as needed (you will almost certainly want to do block reads rather
than line reads, for instance).

In all cases that select loop is not sufficient: you are not checking
for writability before writing.

This doesn't really make sense: Except in unusual cases (eg, when the
code does something like 'sending the contents of a large file'), the
socket will always be writeable and checking for this is a waste of
time. The better idea is usually to try to write something and check
for 'did this become writeable' only if a pending write could not be
completed.
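
The "write first, wait only if needed" approach described above can be sketched as follows. This is an illustration under assumptions, not code from the thread: write_all is a hypothetical helper name, and the handle is assumed to have been put into non-blocking mode by the caller.

```perl
use strict;
use warnings;
use IO::Handle;    # provides ->blocking for the caller
use IO::Select;

# Hypothetical helper: try to write immediately and wait for
# writability only when the kernel refuses to take more data.
sub write_all {
    my ($fh, $data) = @_;
    my $sel = IO::Select->new($fh);
    my $off = 0;
    while ($off < length $data) {
        my $n = syswrite($fh, $data, length($data) - $off, $off);
        if (defined $n) {
            $off += $n;          # partial writes are possible
            next;
        }
        if ($!{EAGAIN} || $!{EWOULDBLOCK}) {
            $sel->can_write;     # block until the socket drains
            next;
        }
        next if $!{EINTR};       # interrupted by a signal, retry
        die "syswrite failed: $!";
    }
    return $off;
}
```

A caller would typically run $socket_fh->blocking(0) once and then call write_all($socket_fh, "$request\n") for each request. In the common case the select call is never made.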

[...]

IO::Select is a very thin layer around the core select(): read the
source. All it does is handle the vec()s for you and maintain a map from
fds back to filehandles.

Using 'automatic I/O buffering' together with 'sockets' (or any other
kind of IPC channel) also doesn't make sense: Usually, the protocol
dictates when data needs to be written or can be read and this implies
that any 'hidden buffers' need to be flushed whenever data has to be
sent. Since Perl supports automatic memory management, constructing a
complete message in memory before sending and then using syswrite to
actually send it (instead of copying it into an application buffer
and then flushing that so that it gets copied into a kernel buffer)
shouldn't be difficult.

Xho Jingleheimerschmidt · Nov 19, 2012

No. It can't, because the underlying syscall can't.

So then don't make that underlying syscall if/when you don't need to.
Just because they chose not to implement it that way doesn't mean that
it can't be done.

That has always bothered me about IO::Select. An object-oriented
interface shouldn't just be an alternative spelling of an underlying
syscall.

Xho

Peter J. Holzer · Nov 19, 2012

No. It can't, because the underlying syscall can't.

That's like saying <> can't read line by line because the underlying
syscall can't.

This loop never blocks (assuming your filehandles are nonblocking). You
have to block somewhere, usually in select, or you'll just spin round
the loop doing nothing and wasting CPU.

No. When $s->can_read returns true, the loop does something: It reads
the next response and processes it. When there is nothing to do, the
loop is aborted. So it can't spin around doing nothing.

If the client is consistently faster than the server the outer loop will
eventually block on print.

Your options here are:

- Use sysread/write, as recommended in the docs.

Yes. I knew about this one.

- Push a :unix layer, so that print and <> do unbuffered IO.

That's an idea. Unbuffered I/O for <> probably means 1 byte reads which
doesn't sound appealing (I'm on a Gbit LAN, so the latency I'm trying to
avoid is actually less than a millisecond per request).

You will have to be aware of this, and handle the buffering
yourself as needed (you will almost certainly want to do block
reads rather than line reads, for instance).

If I do that I can use sysread.

In all cases that select loop is not sufficient: you are not checking
for writability before writing.

This is intentional: I want the process to block when the write buffer
becomes full. With the amount of buffering in the network layer and
given that I always read all received responses after each request I'm
not worried about deadlocks.

You need to buffer writes as well, and keep trying to write what's
left in the buffer until it's all been written.

That's what buffered I/O is for ;-).

I'm not sure what you mean here: you are writing and then reading in
lockstep,

No, I'm not. I'm only reading if a response has been received. I can
probably send quite a few requests before the first response trundles
in.

so you won't be pipelining anything.

Yes, I am.

If you want to pipeline you need to build a buffer with several
requests in,

No. There is no reason to build that buffer beforehand.

then loop around a select for both read and write, sending when you
can

Testing whether it is possible to send would be necessary if there was a
possibility of deadlock (both client and server are blocked trying to
write). This is highly unlikely in this case, as requests and responses
are much smaller than the buffer.

and buffering the next response until you've read it all.

IO::Select is a very thin layer around the core select(): read the
source. All it does is handle the vec()s for you and maintain a map from
fds back to filehandles.

That's what I feared. Using sysread isn't a big hassle, but I would have
hoped that they fixed this when they replaced stdio with PerlIO.

hp
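
The sysread option discussed in this post can be sketched as a small line reader that keeps its own buffer, so that complete lines already received are never hidden from the caller. This is only an illustration: make_line_reader is a hypothetical name, the 8192-byte read size is arbitrary, and the sketch does not distinguish end-of-file from "no complete line yet".

```perl
use strict;
use warnings;
use IO::Select;

# Hypothetical helper: returns a closure that yields one complete line
# per call, or undef when no complete line is available right now.
sub make_line_reader {
    my ($fh)   = @_;
    my $sel    = IO::Select->new($fh);
    my $buffer = '';
    return sub {
        # Only touch the socket if our own buffer holds no full line.
        while ($buffer !~ /\n/) {
            my @ready = $sel->can_read(0) or last;    # never block
            my $n = sysread($fh, $buffer, 8192, length $buffer);
            die "sysread: $!" unless defined $n;
            last if $n == 0;                          # EOF
        }
        # Hand out the first complete line, keep the rest buffered.
        return $buffer =~ s/\A(.*?\n)//s ? $1 : undef;
    };
}
```

With this, the inner loop of the original example could become something like while (defined(my $response = $next_line->())) { ... }, where $next_line = make_line_reader($socket_fh).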

Rainer Weikusat · Nov 20, 2012

Ben Morrow said:
Quoth Xho Jingleheimerschmidt:
[...]

The IO::* classes are all just very thin wrappers around the core
functions. Except in the case of IO::Socket (and *maybe* ::Select),
where the core APIs are rather nasty to get right, I've never seen the
point of them.
[...]

You have to remember these were all written right at the very start of
Perl OO, when the whole *idea* of objects was somewhat
revolutionary.

By the time the 'Perl I/O objects' hit the scene, I had been doing
'object-oriented programming' for something like seven or eight years
and the concept is much older. Even Smalltalk was already a successor
of earlier implementations (IIRC). OTOH, I'd completely agree with the
notion that the Perl 'object-oriented I/O' is essentially 'why does
the dog lick its balls' programming, IOW, "We could do it. So, we
did" (and downright awfully at times, eg, in Graham Barr's poll-module
which mostly introduces the most serious deficiency in the select,
that it destroys the interest after each call, by 'carefully deficient
coding' into poll).

It's hardly surprising they don't always stand up to modern standards of
'good' OO programming (and in fact a little surprising when, sometimes,
they do).

If 'modern standards of OO programming' mean that abstractions for
delayed, block-based I/O intended to make efficient use of block
devices easy, are forcefully introduced into real-time communications
where they have exactly no place, and this even in a language where
the original problem, 'memory management in C is just too hard', does
not and has never existed, they are not particularly modern. Rather
something like "we're mindlessly aping some idea some guy had in 1976
in a completely different context because we don't even understand
that" ...

Rainer Weikusat · Nov 22, 2012

This is broken and will likely just hang (or loop) forever: The remote
server won't send any data until it has received a request and the
intermediate buffering layer won't send any data until the buffer has
been filled, cf the 'webget client' section in perlipc. At the very
least, the handle needs to be configured to flush everything out as
soon as it was written or the calling code needs to do explicit
flushes once a complete request has been copied to the intermediate
buffer (for whatever reason).

Thinking a little more about this, would something like this be
sufficient?

use IO::Pending qw/pending_read/;

while (pending_read($socket_fh) || $s->can_read(0)) {

PerlIO provides the information at the C level, so it shouldn't be too
hard to export it to Perl, at least for real PerlIO filehandles. (For
tied filehandles all bets are off.)

This whole approach is fundamentally wrong and there's no point in
trying to make it work somehow: Communication with a process on
another computer somewhere 'on the internet' is something which
happens in 'real time', as opposed to writing data to a block device,
and the best thing the intermediate buffering layer provides for this
case is that it requires an additional copy of all data which is
either sent or received if it is tendered properly, that is, it makes
the programmer work more in order to achieve a technically worse
result.

There's (or used to be) a nice comment somewhere in the Samba sources
which was roughly "We have to get over our fprintf habit!". That
should be authoritative enough even for the most academic academic
mind which will not listen to anything which doesn't come from some
Prof Dr Dr Dr (and will accept everything the latter says, no matter
how nonsensical).

Peter J. Holzer · Nov 22, 2012

Thinking a little more about this, would something like this be
sufficient?

use IO::Pending qw/pending_read/;

while (pending_read($socket_fh) || $s->can_read(0)) {

Is this an idea for a new module? IO::Pending doesn't seem to exist.

Yes, that sounds useful.

PerlIO provides the information at the C level, so it shouldn't be too
hard to export it to Perl, at least for real PerlIO filehandles.

I'll have a look at it.

hp

Peter J. Holzer · Nov 22, 2012

This is broken and will likely just hang (or loop) forever: The remote
server won't send any data until it has received a request and the
intermediate buffering layer won't send any data until the buffer has
been filled,

Autoflush is enabled for sockets and has been for a long time.

cf the 'webget client' section in perlipc.

This is out of date.

[Rest deleted. Really, Rainer, you start to sound like Detlef Bosau's
twin brother]

hp
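
The two positions on flushing can be checked with a small experiment. The sketch below assumes a Unix-like system and uses a plain core socketpair handle, which is not an IO::Socket object and therefore does not get autoflush turned on for it (the IO::Socket documentation states that its objects have had autoflush on by default since version 1.18). The variable names are illustrative.

```perl
use strict;
use warnings;
use Socket;
use IO::Handle;
use IO::Select;

socketpair(my $server, my $client, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
my $sel = IO::Select->new($server);

# Plain core handle: the request stays in the PerlIO output buffer.
print {$client} "request 1\n";
my @unflushed = $sel->can_read(0);

# Enabling autoflush pushes out pending output, and every later
# print is sent to the kernel immediately.
$client->autoflush(1);
print {$client} "request 2\n";
my @flushed = $sel->can_read(1);

printf "readable without autoflush: %d, with autoflush: %d\n",
    scalar @unflushed, scalar @flushed;
```

For a handle created through IO::Socket::INET the explicit autoflush(1) call should be unnecessary. For handles opened any other way it is worth setting it explicitly.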

Rainer Weikusat · Nov 24, 2012

Ben Morrow said:
OK, I've uploaded an implementation to github as

https://github.com/mauzo/IO-Select-Buffered

Quote from 'perldoc -f select':

WARNING: One should not attempt to mix buffered I/O (like
"read" or <FH>) with "select", except as permitted by POSIX,
and even then only on POSIX systems. You have to use
"sysread" instead.

Peter J. Holzer · Nov 24, 2012

Cool, thanks.

Quote from 'perldoc -f select':

WARNING: One should not attempt to mix buffered I/O (like
"read" or <FH>) with "select", except as permitted by POSIX,
and even then only on POSIX systems. You have to use
"sysread" instead.

Guten Morgen, Herr Weikusat!

That paragraph was the reason I started the thread and Ben wrote
IO::Select::Buffered specifically to get around that limitation.

hp
