Read socket using both <> and sysread()

Yohan N. Leder

Hello. In the framework of a dialog with an ESMTP server, I would like
to alternate some incoming socket reading using <> (when server replies
with a single line) and sysread() (when server replies with several
lines), is there something to take care with this kind of alternance ?

Here is a typical piece (without response code checking):

use Socket;
[here code to connect using two sockets : SOCKET_IN and SOCKET_OUT]
select(SOCKET_IN); $| = 1; select(SOCKET_OUT); $| = 1; select(STDOUT);
my $resp = <SOCKET_IN>; # wait for the server to be ready
syswrite(SOCKET_OUT, "EHLO test\cM\cJ"); # start the dialog
sysread(SOCKET_IN, $resp, 1024); # fetch a multi-line response
syswrite(SOCKET_OUT, "AUTH PLAIN\cM\cJ"); # start authentication (length omitted so the full 12 bytes are sent)
$resp = <SOCKET_IN>; # wait for the server to ask for the username
[...]
 
Uri Guttman

YNL> Hello. In the framework of a dialog with an ESMTP server, I would like
YNL> to alternate some incoming socket reading using <> (when server replies
YNL> with a single line) and sysread() (when server replies with several
YNL> lines), is there something to take care with this kind of alternance ?

do you know the difference between <> and sysread? the docs or faq
(not sure where ATM) strongly state that you should never mix them on a
single handle unless you are extremely sure of what you are doing. the
problem is that <> uses stdio which will buffer input from the handle
and then return only one line to you. a following sysread won't (or may
not) find any data to read as it was already read into a buffer by
stdio. the best way to do this is to only use sysread and do your own
buffering and then parse out a line from the beginning of the buffer
when you need one. this isn't too hard to do but it does require some
code beyond just a simple sysread call.
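
A minimal sketch of the buffering problem described above, not taken from the original poster's script. It assumes a pipe as a stand-in for the socket and a made-up three-line server burst; the variable names are illustrative only.

```perl
use strict;
use warnings;

# A pipe stands in for the socket so the sketch runs without a server.
pipe(my $in, my $out) or die "pipe: $!";

# The pretend server sends three lines in one burst, then closes its end.
my $burst = "220 ready\r\n250-first\r\n250 last\r\n";
syswrite($out, $burst) == length($burst) or die "syswrite: $!";
close $out;

# <> asks the kernel for a whole buffer's worth and hands back one line.
my $line = <$in>;

# sysread goes straight to the file descriptor, which is already drained,
# so it reports end-of-file (0) although two lines are still unread.
my $n = sysread($in, my $chunk, 1024);

# The "missing" data is still sitting in the buffered layer.
my $next = <$in>;

print "line: $line", "sysread returned: $n\n", "next: $next";
```

On a live socket the sysread would typically block instead of returning 0, because the server has not closed the connection.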

uri
 
xhoster

Yohan N. Leder said:
Hello. In the framework of a dialog with an ESMTP server, I would like
to alternate some incoming socket reading using <> (when server replies
with a single line) and sysread() (when server replies with several
lines), is there something to take care with this kind of alternance ?

Why not just use read() instead of sysread()?

Xho
 
Uri Guttman

x> Why not just use read() instead of sysread()?

that works but since the api is the same, all you get is more buffering
with read with no benefits. sockets already have buffering so there is
no need for more in the user space. having done tons of c and socket
stuff before, sysread is my choice as it is just a wrapper around c's
read.

but maybe your point is mixing read with <> and that could work if done
carefully (it is more a protocol and implementation problem then). it is
just not my style so i would never do that.

uri
 
Ilya Zakharevich

[A complimentary Cc of this posting was sent to Uri Guttman]

UG> do you know the difference between <> and sysread? the docs or faq
UG> (not sure where ATM) strongly state that you should never mix them on a
UG> single handle unless you are extremely sure of what you are doing. the
UG> problem is that <> uses stdio which will buffer input from the handle

I must object to (details of) this statement:

a) AFAIU: most probably, stdio is not going to be used with
contemporary Perls (should be enabled by a compile option of perl);
nowadays Perl uses its own (and very buggy) buffering routines;

b) Perl core knows perfectly well when an unbuffered operation follows
a buffered one (same as for read/write mixes).

So it is a bug in PerlIO that the necessary seek()s needed on such a
boundary are not performed.

Hope this helps,
Ilya
 
Uri Guttman

IZ> [A complimentary Cc of this posting was sent to Uri Guttman]

IZ> I must object to (details of) this statement:

IZ> a) AFAIU: most probably, stdio is not going to be used with
IZ> contemporary Perls (should be enabled by a compile option of perl);
IZ> nowadays Perl uses its own (and very buggy) buffering routines;

makes no difference from the point of view of <> vs sysread. i don't
care if it is stdio or perlio, they both buffer.

IZ> b) Perl core knows perfectly well when an unbuffered operation follows
IZ> a buffered one (same as for read/write mixes).

IZ> So it is a bug in PerlIO that the necessary seek()s needed on such a
IZ> boundary are not performed.

hmm, you should know better. this is doing reads on a SOCKET. seek
doesn't apply there so perl would have to trap any sysread calls and
check if there is anything in the buffer and return that and possibly
even do more kernel reads to satisfy the read request size. sounds like
something only you would like. i will stick to pure sysread and do my
own buffering. been doing that for 25 years so it is second nature.
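
A rough sketch of what "pure sysread plus your own buffering" could look like. `read_line` is a hypothetical helper invented for illustration, not code from this thread; the pipe and the sample data are assumptions so the example runs without a server. It does not retry on EINTR.

```perl
use strict;
use warnings;

# Hypothetical helper: return the next LF-terminated line (CRLF included)
# from $fh, using only sysread and a caller-owned buffer. Returns undef at
# end-of-file or on a read error when no complete line is buffered.
sub read_line {
    my ($fh, $buf_ref) = @_;
    until ($$buf_ref =~ /\n/) {
        # Append at the end of the buffer instead of overwriting it.
        my $n = sysread($fh, $$buf_ref, 4096, length $$buf_ref);
        return unless $n;    # undef = error, 0 = end-of-file
    }
    $$buf_ref =~ s/\A(.*?\n)//s;    # cut the first line off the buffer
    return $1;
}

# A pipe stands in for the socket; the last chunk has no line terminator.
pipe(my $in, my $out) or die "pipe: $!";
syswrite($out, "250-first\r\n250 last\r\npartial") or die "syswrite: $!";
close $out;

my $buffer = '';
my @lines;
while (defined(my $l = read_line($in, \$buffer))) {
    push @lines, $l;
}
print scalar(@lines), " complete lines, leftover: '$buffer'\n";
```

Because the buffer belongs to the caller, anything read past the end of a line stays available for the next call.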

uri
 
Ben Morrow

Quoth (e-mail address removed):
Why not just use read() instead of sysread()?

Or if you're going down that route, why not just set $/ = \1024; when
you want a block read and always use <> ?
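
A small sketch of the `$/` approach, assuming an in-memory filehandle in place of the socket and a record size of 8 instead of 1024 so the effect is visible on short data. The data is made up.

```perl
use strict;
use warnings;

# An in-memory handle stands in for the socket in this sketch.
my $data = "220 ready\r\n" . ("x" x 20);
open(my $fh, '<', \$data) or die "open: $!";

my $line = <$fh>;            # default $/ is "\n": one line

my $block = do {
    local $/ = \8;           # record mode: at most 8 bytes per <>
    <$fh>;
};

my $rest = do { local $/; <$fh> };   # undef $/: slurp whatever is left

printf "line=%d bytes, block=%d bytes, rest=%d bytes\n",
    length $line, length $block, length $rest;
```

Using `local` inside a block restores `$/` automatically. On a real socket a record-mode read is likely to wait for the full record or end-of-file, much like read(), so it may not suit replies of unknown length.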

Ben
 
xhoster

Ben Morrow said:
Quoth (e-mail address removed):

Or if you're going down that route, why not just set $/ = \1024; when
you want a block read and always use <> ?

I'd find it a bit confusing to be constantly changing $/ back and
forth between "\n" and \1024. If I could just set it to one thing
at the top of the program and forget it, then that is what I would do.
But I would rather use read() than have to keep changing $/ back and forth.

Xho
 
xhoster

Uri Guttman said:
x> Why not just use read() instead of sysread()?

that works but since the api is the same, all you get is more buffering
with read with no benefits. sockets already have buffering so there is
no need for more in the user space.

I'm not sure I understand this. Is this special to sockets? What form of
IO *doesn't* already have some buffering at the system level?

Anyway, additional user-space buffering does seem like it makes things
faster for sockets (as well as for regular files) with small reads, but I
don't know if the difference is meaningful for real-world cases.
time perl -le 'use IO::Socket;
my $x=IO::Socket::INET->new("localhost:9871") or die $@;
$y+=read $x,$buf,10 foreach 1..10_000_000;
print $y'
100000000
3.694u 0.138s 0:03.84 99.4% 0+0k 0+0io 0pf+0w
time perl -le 'use IO::Socket;
my $x=IO::Socket::INET->new("localhost:9871") or die $@;
$y+=sysread $x,$buf,10 foreach 1..10_000_000;
print $y'
100000000
4.097u 5.682s 0:09.80 99.6% 0+0k 0+0io 0pf+0w

having done tons of c and socket
stuff before, sysread is my choice as it is just a wrapper around c's
read.

but maybe your point is mixing read with <> and that could work if done
carefully (it is more a protocol and implementation problem then).

Yep, that is what I meant.

it is just not my style so i would never do that.

So then maybe you are the wrong person to ask, but why does Perl's read
exist?

Was it intended to be just like sysread only doesn't interfere with <>?

Or was it intended to be just like sysread only faster because it uses
an extra layer of buffering?

Is the fact that read will block until the requested size (or eof) is read
(as opposed to sysread, which blocks until at least 1 byte is read, but
after that will return a partial buffer if need be rather than blocking) a
real feature or a malfeature or a bug?

Xho
 
Uri Guttman

x> Why not just use read() instead of sysread()?
x> I'm not sure I understand this. Is this special to sockets? What form of
x> IO *doesn't* already have some buffering at the system level?

but why have system buffering in the socket AND also user process
buffering via stdio (or perlio)? that just means extra copying of all
the data.

x> Anyway, additional user-space buffering does seem like it makes things
x> faster for sockets (as well as for regular files) with small reads, but I
x> don't know if the difference is meaningful for real-world cases.
x> time perl -le 'use IO::Socket;
x> my $x=IO::Socket::INET->new("localhost:9871") or die $@;
x> $y+=read $x,$buf,10 foreach 1..10_000_000;
x> print $y'
x> 100000000
x> 3.694u 0.138s 0:03.84 99.4% 0+0k 0+0io 0pf+0w
x> time perl -le 'use IO::Socket;
x> my $x=IO::Socket::INET->new("localhost:9871") or die $@;
x> $y+=sysread $x,$buf,10 foreach 1..10_000_000;
x> print $y'
x> 100000000
x> 4.097u 5.682s 0:09.80 99.6% 0+0k 0+0io 0pf+0w

10 bytes is not a typical size read on sockets for the kind of
applications i do or see. sure local buffering will help there as it
bypasses the overhead of sysread calls. but with a larger read size
that shouldn't matter as much.

x> So then maybe you are the wrong person to ask, but why does Perl's read
x> exist?

not sure in a way. it really is the opposite of print in most ways (like
sysread/syswrite are opposites) so it is there to provide a sized read
on the stdio (or perlio to appease ilya. as i said, from this point of
view they are the same - just user space buffered i/o).

x> Was it intended to be just like sysread only doesn't interfere with <>?

x> Or was it intended to be just like sysread only faster because it uses
x> an extra layer of buffering?

all are reasonable answers but as i said, i wouldn't expect as much (and
you got very little! :) speedup with larger read sizes.

x> Is the fact that read will block until the requested size (or eof)
x> is read (as opposed to sysread, which blocks until at least 1 byte
x> is read, but after that will return a partial buffer if need be
x> rather than blocking) a real feature or a malfeature or a bug?

read will restart if it was interrupted by a signal and sysread will
return a partial read or some EINTR error instead.

sysread will block on its read size unless you set the socket to
non-blocking mode. this is defined in the unix read call. you should
never use perl's read on an non-blocking socket since typical buffered
i/o (like stdio or perlio) expect a blocking socket. another reason i
use sysread, is that stem uses non-blocking sockets.

uri
 
Ilya Zakharevich

[A complimentary Cc of this posting was sent to Uri Guttman]

IZ> b) Perl core knows perfectly well when an unbuffered operation follows
IZ> a buffered one (same as for read/write mixes).

IZ> So it is a bug in PerlIO that the necessary seek()s needed on such a
IZ> boundary are not performed.

UG> hmm, you should know better.

I do.

UG> this is doing reads on a SOCKET. seek doesn't apply there

seek() applies to any buffered operation.

UG> so perl would have to trap any sysread calls

It should not "trap" anything. It is Perl who initiates them, not
some third party.

UG> and check if there is anything in the buffer and return that

Right.

UG> and possibly even do more kernel reads to satisfy the read request size.

No need to. Partial read is OK.

UG> sounds like something only you would like.

Right; so it is because of me these questions appear again and again...

UG> i will stick to pure sysread and do my
UG> own buffering. been doing that for 25 years so it is second nature.

Sure; feel free to do your own buffering - on top of Perl's one,
(possibly) on top of CRTL one, on top of OS' one.

Yours,
Ilya
 
xhoster

Uri Guttman said:
x> Is the fact that read will block until the requested size (or eof)
x> is read (as opposed to sysread, which blocks until at least 1 byte
x> is read, but after that will return a partial buffer if need be
x> rather than blocking) a real feature or a malfeature or a bug?

read will restart if it was interrupted by a signal and sysread will
return a partial read or some EINTR error instead.

sysread will block on its read size unless you set the socket to
non-blocking mode.

That doesn't seem to be the case in my hands.
My localhost:9871 server now prints 100 bytes to the socket every 8
seconds.
perl -le 'use IO::Socket;
my $x=IO::Socket::INET->new("localhost:9871") or die $@;
$SIG{ALRM}=sub{alarm 5}; alarm 5;
print sysread ($x,$buf,75), q": $! is:", $! foreach 1..1000;
'
75: $! is:
25: $! is:
: $! is:Interrupted system call
75: $! is:
25: $! is:
: $! is:Interrupted system call
: $! is:Interrupted system call
75: $! is:
25: $! is:

The first 75- and 25-byte reads are instant (the 75 because that is what
you asked for, the 25 because that is all that is left of the 100 printed
by the other end), then a 5 second pause until the alarm times out giving
us the interrupted system call, then 3 seconds until the next 75 and 25
reads. So it will only block on the first byte, not for the full read size.
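
The same difference can be sketched without a server or timers. This is an illustrative example, assuming a pipe in place of the socket; the strings are arbitrary.

```perl
use strict;
use warnings;

pipe(my $in, my $out) or die "pipe: $!";

# Only 5 bytes are available, but 10 are requested.
syswrite($out, "hello") or die "syswrite: $!";
my $got_sys = sysread($in, my $buf1, 10);   # returns at once with what is there

# Queue 6 more bytes and close the writer so read() can reach end-of-file.
# With the writer still open, read() would keep waiting for all 10 bytes.
syswrite($out, "world!") or die "syswrite: $!";
close $out;
my $got_read = read($in, my $buf2, 10);     # short only because of end-of-file

print "sysread: $got_sys, read: $got_read\n";
```

Here sysread is called first, while nothing has been buffered yet, so the later buffered read() does not lose any data.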

Of course by turning on non-blocking mode by using O_NONBLOCK, then it
doesn't block even for the first byte.

Xho
 
Ben Morrow

Quoth (e-mail address removed):
That doesn't seem to be the case in my hands.
The first 75- and 25-byte reads are instant (the 75 because that is what
you asked for, the 25 because that is all that is left of the 100 printed
by the other end), then a 5 second pause until the alarm times out giving
us the interrupted system call, then 3 seconds until the next 75 and 25
reads. So it will only block on the first byte, not for the full read size.

Yes. That is what read(2) (and SUS) says it does. If you want reliable
behaviour you want to use O_NONBLOCK and select/poll anyway, and deal
with EINTR.
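
One possible shape for that advice, as a hedged sketch rather than a complete solution. `read_available` is a hypothetical helper; a socketpair stands in for the SMTP connection, and the 4096-byte chunk size and timeouts are arbitrary assumptions.

```perl
use strict;
use warnings;
use Errno;
use IO::Handle;
use IO::Select;
use Socket;

# Hypothetical helper: wait up to $timeout seconds for data, then return
# whatever a single sysread delivers. Returns '' at end-of-file and undef
# on timeout. The timeout restarts after an interrupted wait.
sub read_available {
    my ($sock, $timeout) = @_;
    my $sel = IO::Select->new($sock);
    while (1) {
        $! = 0;
        my @ready = $sel->can_read($timeout);
        unless (@ready) {
            next if $!{EINTR};      # select was interrupted: wait again
            return;                 # genuine timeout
        }
        my $n = sysread($sock, my $buf, 4096);
        return $buf if defined $n;  # '' here means end-of-file
        next if $!{EINTR} || $!{EAGAIN};
        die "sysread: $!";
    }
}

socketpair(my $client, my $server, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
$client->blocking(0);               # O_NONBLOCK on the reading side

syswrite($server, "250 ok\r\n") or die "syswrite: $!";
my $reply   = read_available($client, 1);     # data is already waiting
my $nothing = read_available($client, 0.1);   # nothing more: times out
```

A caller would normally append each chunk to its own buffer and split lines from there, since one sysread may return part of a line or several lines.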

Ben
 
Yohan N. Leder

Hello. In the framework of a dialog with an ESMTP server, I would like
to alternate some incoming socket reading using <> (when server replies
with a single line) and sysread() (when server replies with several
lines), is there something to take care with this kind of alternance ?

Hi all. Sorry to reply here in the thread (I mean: to myself), but I've
had problems with my newsreader and/or news server these last few days
and only saw all of your replies today (oops), using another news server
and reader from another machine... Also, since I've now solved my problem
without being aware of your opinions, there is little point in replying
to each of your posts individually.

Well, I've understood the difference between <> (which waits for and
reads one line only) and sysread (which reads a certain amount of data),
and the buffering considerations. Also, as some of you said, it's possible
to mix both methods in a context where we know exactly what the server
returns, which is the case in an SMTP exchange. Because of this,
I've kept the mixed approach and will keep an eye on the behavior of this
part of the script. However, it would be quite easy to switch every <> to
a sysread() call some day if that proves useful in practice.
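
For comparison, a sketch of how a multi-line reply could be collected with <> alone, so that no sysread is needed on the same handle. It relies on the SMTP convention that continuation lines put a hyphen after the three-digit code and the final line does not. `read_reply`, the canned reply text and the host name `example.test` are invented for illustration, and an in-memory handle stands in for the socket.

```perl
use strict;
use warnings;

# Hypothetical helper: collect one complete SMTP reply using <> only.
sub read_reply {
    my ($fh) = @_;
    my @lines;
    while (defined(my $line = <$fh>)) {
        push @lines, $line;
        # "250-..." means more lines follow; anything else ends the reply.
        last unless $line =~ /\A\d{3}-/;
    }
    return @lines;
}

# Canned server output: a one-line greeting, then a three-line EHLO reply.
my $canned = "220 ready\r\n"
           . "250-example.test\r\n250-AUTH PLAIN\r\n250 HELP\r\n";
open(my $fh, '<', \$canned) or die "open: $!";

my @greeting = read_reply($fh);
my @ehlo     = read_reply($fh);
print scalar(@greeting), " greeting line, ", scalar(@ehlo), " EHLO lines\n";
```

The same loop handles single-line and multi-line replies, which removes the need to know in advance how many lines the server will send.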

Thanks again.
Now, I have to figure out whether it's my newsreader or the news server I'm
going through that has been having problems these days.
 
