Buffered socket I/O with sysread-style blocking?

James Marshall

I need to read data from a socket, with sysread-style blocking (i.e. block
if no data available, but return immediately with anything that is
available, possibly less than the requested amount). Is this possible
when using buffered I/O (read(), <S>), or only with non-buffered I/O
(sysread(), recv())?

I can't use non-blocking I/O, because that implies a busy loop when no
data is available (doesn't it?).

I can't use select(), because that won't work with buffered I/O (will
it?).

I can't really use signals, because the program can't really continue
until it has the data.


How does one do this? Am I missing something, or making a bad assumption
somewhere? Has anyone documented the "heavy wizardry" that's required to
use buffered I/O and non-buffered I/O on the same socket? I'd really
rather not rewrite my (large) program to use unbuffered I/O... that could
get messy. Socket flags and fcntl() etc. are allowed, as long as it's
reasonably portable. Standard modules are OK too, but ideally for this
application I don't want to require the user to install extra modules.
Since that's not restrictive enough, ideally I'd also like it to run in
Perl 5.6.1, but I'm flexible on that.

Thanks a lot for any ideas!

James
.............................................................................
James Marshall (e-mail address removed) Berkeley, CA @}-'-,--
"Teach people what you know."
.............................................................................
 
xhoster

I need to read data from a socket, with sysread-style blocking (i.e.
block if no data available, but return immediately with anything that is
available, possibly less than the requested amount). Is this possible
when using buffered I/O (read(), <S>), or only with non-buffered I/O
(sysread(), recv())?

I don't think that it is possible with buffered I/O. Why have buffered I/O
if you don't want it to actually do buffering? I guess you could specify a
length of 1 for each read. I don't see what that would get you, but then
again I don't see what you are trying to accomplish in the first place.

I can't use non-blocking I/O, because that implies a busy loop when no
data is available (doesn't it?).

I'm not sure exactly what you mean by non-blocking I/O. It seems that when
people say that in a Perl context, they mean that they are using select or
IO::Select, but since you mention select separately, maybe you don't mean
that.

I can't use select(), because that won't work with buffered I/O (will
it?).

It will under some conditions and/or interpretations. The trick is that
you need to empty the buffer before the next call to select. This results
in semi-nonblocking, where you won't block waiting for the client to start
sending a message, but then once it is started you block until the whole
message is received. Often, this type of thing is good enough. But it
seems like you want the opposite, blocking until a message is started, but
then nonblocking until it is all received. I guess I just don't see the
point in that.
I can't really use signals, because the program can't really continue
until it has the data.

Then why worry about not blocking?
How does one do this? Am I missing something, or making a bad assumption
somewhere? Has anyone documented the "heavy wizardry" that's required to
use buffered I/O and non-buffered I/O on the same socket? I'd really
rather not rewrite my (large) program to use unbuffered I/O... that could
get messy.

Why? If all you use is read, you just need to change each one of them to
sysread. Not a day at the beach, but still not too bad. If you use things
other than read on these handles, what are they?
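
The behaviour the original question asks for is what sysread() already does on an ordinary blocking socket. A minimal sketch, assuming a Unix-like system where socketpair() with AF_UNIX is available; the handle names are illustrative only and the local socket pair stands in for the real network connection:

```perl
use strict;
use warnings;
use Socket;

# A connected pair of local sockets stands in for the remote connection.
socketpair(my $reader, my $writer, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

# Send only 5 bytes, bypassing Perl's output buffer.
defined(syswrite($writer, "hello")) or die "syswrite: $!";

# Ask for up to 4096 bytes. sysread() sleeps in the kernel until at least
# one byte is available, then returns what is there without waiting for more.
my $got = sysread($reader, my $buf, 4096);
die "sysread: $!" unless defined $got;

print "requested 4096, received $got: $buf\n";
```

A buffered read($reader, $buf, 4096) at the same point would keep waiting for the remaining bytes or for end of file.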

Xho
 
James Marshall

Thanks for the response. Comments interspersed below:


I don't think that it is possible with buffered I/O. Why have buffered
I/O if you don't want it to actually do buffering? I guess you could
specify a length of 1 for each read. I don't see what that would get
you, but then again I don't see what you are trying to accomplish in the
first place.

I use the <> input operator (which is buffered) in several different ways.
Also, it's nice to not handle interrupted system calls, which (at least
according to Camel) one must do when using sysread() and presumably
recv().

Reading one character at a time would be too inefficient for this app.

I'm not sure exactly what you mean by non-blocking I/O. It seems that when
people say that in a Perl context, they mean that they are using select or
IO::Select, but since you mention select separately, maybe you don't
mean that.

By "non-blocking", I mean when the socket has the O_NONBLOCK flag set.

It will under some conditions and/or interpretations. The trick is that
you need to empty the buffer before the next call to select. This
results in semi-nonblocking, where you won't block waiting for the
client to start sending a message, but then once it is started you block
until the whole message is received. Often, this type of thing is good
enough. But it seems like you want the opposite, blocking until a
message is started, but then nonblocking until it is all received. I
guess I just don't see the point in that.

OK, interesting. Thanks for the info.

What I want to do is read and process the (in this case HTML) data as it
comes in, rather than wait for the whole resource to download, or even
wait for a whole buffer block. But if there's no incoming data waiting, I
don't want the program to eat up the CPU in a busy loop-- the program is a
CGI script that may have many instances running simultaneously, and busy
loops are bad style anyway.

Then why worry about not blocking?

Because I *can* process some data from a partial read.

Why? If all you use is read, you just need to change each one of them
to sysread. Not a day at the beach, but still not too bad. If you use
things other than read on these handles, what are they?

That's probably what I'll end up doing, but I don't think the switch is so
simple (is it?). I also might try to read one HTML tag at a time using <>
with $/ set to '>', though that has pitfalls and is somewhat inefficient.

Besides read(), I use <>. It's handy for line-oriented input. HTTP is an
odd case in that an HTTP message starts with textual line-oriented data,
and then changes to potentially binary data. Those can even be mixed
together, in the case of chunked data.
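
A rough sketch of that mix of line-oriented and binary reads, assuming Perl 5.8 or later for the in-memory filehandle. The sample body and the read_chunked_body() helper are hypothetical, and chunk extensions and trailers are ignored:

```perl
use strict;
use warnings;

# Hypothetical chunked body: two chunks ("Hello, " and "world!") followed by
# the terminating zero-length chunk.
my $wire = "7\r\nHello, \r\n6\r\nworld!\r\n0\r\n\r\n";
open(my $in, '<', \$wire) or die "open: $!";
binmode($in);

sub read_chunked_body {
    my ($fh) = @_;
    my $body = '';
    local $/ = "\n";                         # chunk-size lines end in CRLF
    while (defined(my $line = <$fh>)) {
        # The size is hexadecimal; anything after it on the line is ignored.
        my ($hex) = $line =~ /^([0-9A-Fa-f]+)/
            or die "bad chunk-size line";
        my $size = hex $hex;
        last if $size == 0;                  # last-chunk marker
        my $n = read($fh, my $chunk, $size); # buffered read, data may be binary
        die "short chunk" unless defined $n && $n == $size;
        $body .= $chunk;
        my $crlf = <$fh>;                    # discard the CRLF after the data
    }
    return $body;
}

my $body = read_chunked_body($in);
print "$body\n";
```

Because <> and read() share the same buffer, they can be interleaved safely; it is only sysread() that must not be mixed in on the same handle.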


Thanks,
James
 
A. Sinan Unur

What I want to do is read and process the (in this case HTML) data as
it comes in, rather than wait for the whole resource to download, or
even wait for a whole buffer block. But if there's no incoming data
waiting, I don't want the program to eat up the CPU in a busy loop--
the program is a CGI script that may have many instances running
simultaneously, and busy loops are bad style anyway.

You are making a huge assumption that is most likely not warranted.

Sinan
--
A. Sinan Unur <[email protected]>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
 
A. Sinan Unur

On Tue, 28 Mar 2006, A. Sinan Unur wrote:

ASU>
ASU> > What I want to do is read and process the (in this case HTML) data as
ASU> > it comes in, rather than wait for the whole resource to download, or
ASU> > even wait for a whole buffer block. But if there's no incoming data
ASU> > waiting, I don't want the program to eat up the CPU in a busy loop--
ASU> > the program is a CGI script that may have many instances running
ASU> > simultaneously, and busy loops are bad style anyway.
ASU>
ASU> You are making a huge assumption that is most likely not warranted.

James Marshall said:
I don't understand. What is the assumption I'm making, and
why is it not warranted?

Please do not respond by email to newsgroup posts.

The assumption you are making is that somehow

while ( <$socket> ) {

}

is a busy loop and is going to consume a lot of CPU if it blocks.

I doubt that assumption is correct.

For example:

D:\Home\asu1\UseNet\clpmisc\cs> cat server.pl
#!/usr/bin/perl

use strict;
use warnings;

use IO::Socket;

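my $server = IO::Socket::INET->new(
    Listen    => 5,
    LocalAddr => q{127.0.0.1},
    LocalPort => 9999,
    Proto     => 'tcp',
) or die "Cannot listen: $@";

# Accept one connection, then send nothing so that the client stays blocked.
my $client = $server->accept;

sleep 100;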

__END__

D:\Home\asu1\UseNet\clpmisc\cs> cat client.pl
#!/usr/bin/perl

use strict;
use warnings;

use IO::Socket;

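my $client = IO::Socket::INET->new(
    PeerAddr => q{127.0.0.1},
    PeerPort => 9999,
    Proto    => 'tcp',
) or die "Cannot connect: $@";

# <$client> blocks inside the read until a line arrives; it does not spin.
print while <$client>;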

__END__


On Windows XP SP2, client.pl's CPU usage remains at 0
(whereas server.pl's CPU usage for some reason shoots up when
it hits the sleep line).

Sinan

 
xhoster

A. Sinan Unur said:
On Tue, 28 Mar 2006, A. Sinan Unur wrote:

ASU> > What I want to do is read and process the (in this case HTML) data as
ASU> > it comes in, rather than wait for the whole resource to download, or
ASU> > even wait for a whole buffer block. But if there's no incoming data
ASU> > waiting, I don't want the program to eat up the CPU in a busy loop--
ASU> > the program is a CGI script that may have many instances running
ASU> > simultaneously, and busy loops are bad style anyway.
ASU>
ASU> You are making a huge assumption that is most likely not warranted.


The assumption you are making is that somehow

while ( <$socket> ) {

}

is a busy loop and is going to consume a lot of CPU if it blocks.

Your assumption that he is making that assumption is not warranted.
There are two things he wants to avoid. Ordinary <$socket> does one of them
(block), naive O_NONBLOCK does the other (busy loop).

Xho
 
James Marshall

On Wed, 29 Mar 2006, A. Sinan Unur wrote:

ASU> On Tue, 28 Mar 2006, A. Sinan Unur wrote:
ASU>
ASU> ASU> You are making a huge assumption that is most likely not warranted.
ASU>
ASU> James Marshall <[email protected]> to Sinan
ASU>
ASU> > I don't understand. What is the assumption I'm making, and
ASU> > why is it not warranted?
ASU>
ASU> Please do not respond by email to newsgroup posts.
ASU>
ASU> The assumption you are making is that somehow
ASU>
ASU> while ( <$socket> ) {
ASU>
ASU> }
ASU>
ASU> is a busy loop and is going to consume a lot of CPU if it blocks.


I made no such assumption, and I don't know what gave you that impression.

By the way, there are valid reasons for responding to a posting via email.
However, it *is* a violation to post private email to a newsgroup. :)


James
 
A. Sinan Unur

Your assumption that he is making that assumption is not warranted.
There are two things he wants to avoid. Ordinary <$socket> does one of
them (block), naive O_NONBLOCK does the other (busy loop)

Thanks for the clarification. I misinterpreted the issue.

Sinan
 
A. Sinan Unur

On Wed, 29 Mar 2006, A. Sinan Unur wrote:

ASU> On Tue, 28 Mar 2006, A. Sinan Unur wrote:
ASU>
ASU> You are making a huge assumption that is most likely not
ASU> warranted.
ASU> James Marshall <[email protected]> to Sinan
ASU>
ASU> > I don't understand. What is the assumption I'm making, and
ASU> > why is it not warranted?
ASU>
ASU> Please do not respond by email to newsgroup posts.
ASU>
ASU> The assumption you are making is that somehow
ASU>
ASU> while ( <$socket> ) {
ASU>
ASU> }
ASU>
ASU> is a busy loop and is going to consume a lot of CPU if it blocks.


I made no such assumption, and I don't know what gave you that
impression.

I misinterpreted your question.
By the way, there are valid reasons for responding to a posting via
email. However, it *is* a violation to post private email to a
newsgroup. :)

OK, you are right. Apologies.

Sinan

 
xhoster

Thanks for the response. Comments interspersed below:



I use the <> input operator (which is buffered) in several different
ways. Also, it's nice to not handle interrupted system calls, which (at
least according to Camel) one must do when using sysread() and presumably
recv().

Reading one character at a time would be too inefficient for this app.

What you want to do is going to have you processing one character at a time
anyway, if that is what the client sends you. If your client is not
generally going to be sending one character at a time, but rather large
chunks, then what is the harm in blocking until you get one of those
chunks?
By "non-blocking", I mean when the socket has the O_NONBLOCK flag set.

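Ah. I think that if you very carefully combine O_NONBLOCK, and read, and
select, that you could arrive at what you want, except that
1) It will play havoc with the <> which you are mixing in with your read().
2) Combining them will probably be at least as hard as just using sysread.
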
OK, interesting. Thanks for the info.

What I want to do is read and process the (in this case HTML) data as it
comes in, rather than wait for the whole resource to download, or even
wait for a whole buffer block. But if there's no incoming data waiting,
I don't want the program to eat up the CPU in a busy loop-- the program
is a CGI script that may have many instances running simultaneously, and
busy loops are bad style anyway.


Because I *can* process some data from a partial read.

Sure, you can, but why? Let's say your client sends you a chunk of 100
bytes every second, and that it takes on average 100 microseconds per byte
to process.

You wait for a chunk of 1000 bytes:
block 10 seconds, process 100 milliseconds. Total 10.10 seconds.

Alternatively, your desired way:
block 1 second, process 10 milliseconds, block 0.99 second, process 10
milliseconds, block 0.99 second and so on for 10 chunks. Total of 100
milliseconds processing and 9.91 seconds blocking. Total time of 10.01
seconds.

That seems like an awful lot of work to get that .09 seconds!
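
The two totals can be checked with a few lines of arithmetic. This sketch assumes a processing cost of 100 microseconds per byte, which is the figure the millisecond totals above imply; the variable names are illustrative only:

```perl
use strict;
use warnings;

my $chunk_bytes = 100;      # bytes arriving per interval
my $interval    = 1.0;      # seconds between arrivals
my $per_byte    = 100e-6;   # assumed processing cost: 100 microseconds/byte
my $chunks      = 10;       # ten arrivals make up the 1000-byte block

# Strategy 1: block until all 1000 bytes have arrived, then process them.
my $wait_all = $chunks * $interval
             + $chunks * $chunk_bytes * $per_byte;

# Strategy 2: process each 100-byte chunk as it arrives. Processing overlaps
# the wait for the next chunk, so only the final chunk's processing time
# extends the total.
my $streamed = $chunks * $interval + $chunk_bytes * $per_byte;

printf "wait for all: %.2f s\n", $wait_all;
printf "streamed:     %.2f s\n", $streamed;
printf "difference:   %.2f s\n", $wait_all - $streamed;
```

The difference is the processing time of the nine chunks that the streamed approach handles while it would otherwise have been idle.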

OK, so what happens if you have 100 server processes running at once,
doesn't that 0.09 start to add up? Nope. At that point, pure CPU becomes
the bottleneck. If anything, the blocking chunk method will be more CPU
efficient and thus have a higher throughput.

Well, what if I have 1e8 bytes to send (Well, then why the heck are you
dilly dallying around sending only 100 bytes per second?), wouldn't the
difference add up then? Still wouldn't. The 100 milliseconds that the
"chunk of 1000" method spends processing each chunk (other than the last
one) do not count, as they just come out of the blocking time of the next
chunk. So only the last chunk of 1000 gets the 0.09 second penalty.

It seems like it would take some extremely unusual circumstances to make
your semi-blocking method turn out to be more than trivially better. Of
course, maybe you do in fact face extremely unusual circumstances--if so
sorry for the lecture.

That's probably what I'll end up doing, but I don't think the switch is
so simple (is it?).

Not quite so simple. If I were determined to do this, I would probably
change read() to my_read(). Then define my_read to do the sysread, plus
whatever restarting-upon-interrupts and error checking is necessary. The
bigger wrinkle is that, since you also use <>, you would need a different
set of special my_readline() code to replace that. And my_read() and
my_readline() would probably need to share state with each other.
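
One possible shape for such a pair, as a sketch only. The names my_read() and my_readline() come from the suggestion above; the %pending hash, the fill() helper, and the socketpair() demonstration (which assumes a Unix-like system) are hypothetical:

```perl
use strict;
use warnings;
use Errno qw(EINTR);
use Socket;

my %pending;    # shared state: bytes read but not yet consumed, per handle

# Pull whatever is available into the shared buffer. Blocks only when the
# kernel has nothing, and restarts if a signal interrupts the call.
sub fill {
    my ($fh) = @_;
    while (1) {
        my $n = sysread($fh, my $chunk, 8192);
        if (defined $n) {
            $pending{fileno($fh)} .= $chunk;
            return $n;                      # 0 means end of file
        }
        next if $! == EINTR;
        die "sysread: $!";
    }
}

# Like sysread(): return up to $max bytes as soon as any are available.
sub my_read {
    my ($fh, $max) = @_;
    my $key = fileno($fh);
    $pending{$key} = '' unless defined $pending{$key};
    fill($fh) if $pending{$key} eq '';
    return substr($pending{$key}, 0, $max, '');
}

# Like <$fh> with $/ = "\n": return one line, or the tail at end of file.
sub my_readline {
    my ($fh) = @_;
    my $key = fileno($fh);
    $pending{$key} = '' unless defined $pending{$key};
    my $end;
    while (($end = index($pending{$key}, "\n")) < 0) {
        last unless fill($fh);              # end of file: return what is left
    }
    my $len = $end < 0 ? length $pending{$key} : $end + 1;
    return $len ? substr($pending{$key}, 0, $len, '') : undef;
}

# Demonstration: header lines first, then a partial body.
socketpair(my $r, my $w, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
syswrite($w, "HTTP/1.0 200 OK\r\n\r\n<p>partial");

my $status = my_readline($r);
my $blank  = my_readline($r);
my $body   = my_read($r, 4096);   # returns without waiting for 4096 bytes
print "body so far: $body\n";
```

Keeping one buffer per handle is what lets the line reader and the block reader be interleaved without losing data.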

I also might try to read one HTML tag at a time
using <> with $/ set to '>', though that has pitfalls and is somewhat
inefficient.

Besides read(), I use <>. It's handy for line-oriented input. HTTP is
an odd case in that an HTTP message starts with textual line-oriented
data, and then changes to potentially binary data. Those can even be
mixed together, in the case of chunked data.

Have you looked into using LWP::Parallel to handle both sending and
efficiently receiving the requests?

Xho
 
Uri Guttman

x> Your assumption that he is making that assumption is not warranted.
x> There are two things he wants to avoid. Ordinary <$socket> does one of them
x> (block), naive O_NONBLOCK does the other (busy loop)

how does O_NONBLOCK mean busy loop? the OP knows that select loops exist
but my impression is that he doesn't know how they work. in fact they
are easy to use and can work with buffered i/o as long as you only do
buffered reads using <>. mixing <> and sysread is the danger, not the
blocking or select loop stuff.
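
For reference, a minimal sketch of waiting in select() without spinning. It uses the standard IO::Select module and pairs it with sysread() rather than <>, so that no data can hide in Perl's own buffer; the socketpair() setup assumes a Unix-like system and stands in for a real connection:

```perl
use strict;
use warnings;
use IO::Select;
use Socket;

socketpair(my $r, my $w, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
syswrite($w, "first section");

my $sel = IO::Select->new($r);

# can_read() with no timeout sleeps inside select() until a handle is
# readable, so the process uses no CPU while it waits.
my @ready = $sel->can_read;

my $data = '';
for my $fh (@ready) {
    my $n = sysread($fh, $data, 4096);
    die "sysread: $!" unless defined $n;
    print "got $n bytes: $data\n";
}
```

With a single handle this is equivalent to a plain blocking sysread(); select() earns its keep once several handles have to be watched at the same time.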

also the OP mentioned http can send binary after text? from what i know
http only handles text and all binary data (post params or uploaded
files) are encoded. so he can still read that data with <>. now as to
why he is reading http data himself is another question. so many of the
things the OP mentioned were all over the map and show a deep
misunderstanding of net protocols, sockets, buffering, event/select
loops and more.

uri
 
xhoster

Uri Guttman said:
x> Your assumption that he is making that assumption is not warranted.
x> There are two things he wants to avoid. Ordinary <$socket> does one
x> of them (block), naive O_NONBLOCK does the other (busy loop)

how does O_NONBLOCK mean busy loop?

By using it naively. (That is why I used the word naive) :)
the OP knows that select loops exist
but my impression is that he doesn't know how they work. in fact they
are easy to use and can work with buffered i/o as long as you only do
buffered reads using <>.

Right, but they are only semi-nonblocking (and the semi part of it is not
in the way the OP wants, or thinks he wants.)

mixing <> and sysread is the danger, not the
blocking or select loop stuff.

<> isn't the only thing he uses, he also uses read, which does not
work well in a select loop. And <> doesn't mix with O_NONBLOCK.

also the OP mentioned http can send binary after text? from what i know
http only handles text and all binary data (post params or uploaded
files) are encoded. so he can still read that data with <>.

You can read either text or binary data with <>. With binary data, you
probably shouldn't chomp, at least not if $/ is "\n". Other than that, it
doesn't make much difference, except maybe if you turn on utf or maybe
on Windows. Binary data might go for hundreds of megabytes without using a
\n (or whatever $/ is), but http data can do that too. If this lack of
line-orientation is what makes binary data binary, then http is binary.
On the other hand, if it is just 8-bit versus 7-bit, it seems like http can
produce data that is binary in that regard too (whether it is supposed to do
that or not may be a different matter, but experimentally, it can do
that.)

now as to
why he is reading http data himself is another question. so many of the
things the OP mentioned were all over the map and show a deep
misunderstanding of net protocols, sockets, buffering, event/select
loops and more.

I don't understand why he wants what he wants, maybe he doesn't either.
The stuff you mention, he seems to understand those things enough to
discuss them rationally.

Xho
 
James Marshall

Thanks again... comments below:


On Tue, 28 Mar 2006, Jim Gibson wrote:

JG> There doesn't seem to be much advantage in using the <> operator for
JG> reading HTML. It isn't really line-oriented. While most HTML pages
JG> might have newlines in them, you can't really count on it, nor can you
JG> count on newlines being in logical places. Valid HTML need not contain
JG> any newlines after the HTTP header.

I agree 100%. That's why after the message body starts, I (usually) no
longer use <>. But while reading the HTTP headers, or when reading chunk
sizes in a chunked body, I use <>.


JG> I would use sysread and buffer up the data received until the HTML
JG> page is finished (at the </HTML> tag, presumably), or the socket
JG> server disconnects (hopefully because the page has been sent), or
JG> until some timeout period. You can then process all of the HTML at one
JG> time.

That's what the program does now. However, there are some operations,
such as long multi-database queries, that take a long time (minutes) to
complete, and they provide partial results to entertain the user while
waiting for the rest of the page. That's why I need to send partial
results to the user.

[The application is something like a proxy, taking requests for remote Web
pages and returning the pages to the user.]


JG> > >> I can't use non-blocking I/O, because that implies a busy loop when no
JG> > >> data is available (doesn't it?).
JG>
JG> Not necessarily. You can do other work between checking for input, or
JG> you can sleep for a second to let other processes use the CPU.

You're right. These wouldn't really work here, though.


JG> > What I want to do is read and process the (in this case HTML) data as it
JG> > comes in, rather than wait for the whole resource to download, or even
JG> > wait for a whole buffer block. But if there's no incoming data waiting, I
JG> > don't want the program to eat up the CPU in a busy loop-- the program is a
JG> > CGI script that may have many instances running simultaneously, and busy
JG> > loops are bad style anyway.
JG>
JG> It is hard to see the advantage unless the source of the HTML is really
JG> slow or you have to do a lot of processing with the HTML you read.

Unfortunately, both of those are true.


JG> > Because I *can* process some data from a partial read.
JG>
JG> Are you really gaining anything by processing a partial read before the
JG> page has been delivered? Can't you just wait until it is all there or
JG> until some timeout period has elapsed, in the event of a slow or
JG> unreliable network?

Not really-- see explanation above.


Thanks,
James
 
James Marshall

Thanks. As before:


What you want to do is going to have you processing one character at a time
anyway, if that is what the client sends you. If your client is not
generally going to be sending one character at a time, but rather large
chunks, then what is the harm in blocking until you get one of those
chunks?

Yes, ultimately we're processing one character at a time, but my
impression is that using Perl functions to do so will take longer. For
example, reading 1 character 1000 times will take a lot longer than
reading 1000 characters once.

The harm is that it keeps the user waiting unnecessarily, when they could
instead be reading partial results.

[The app in question is similar to an HTTP proxy, retrieving pages,
modifying them, and sending them to the client/user.]

Ah. I think that if you very carefully combine O_NONBLOCK, and read, and
select, that you could arrive at what you want, except that
1) It will play havoc with the <> which you are mixing in with your read().
2) Combining them will probably be at least as hard as just using sysread.

Yeah, I may just go with sysread().

Sure, you can, but why? Let's say your client sends you a chunk of 100
bytes every second, and that it takes on average 100 microseconds per
byte to process.

You wait for a chunk of 1000 bytes: block 10 seconds, process 100
milliseconds. Total 10.10 seconds.

Alternatively, your desired way:
block 1 second, process 10 milliseconds, block 0.99 second, process 10
milliseconds, block 0.99 second and so on for 10 chunks. Total of 100
milliseconds processing and 9.91 seconds blocking. Total time of 10.01
seconds.

That seems like an awful lot of work to get that .09 seconds!

OK, so what happens if you have 100 server processes running at once,
doesn't that 0.09 start to add up? Nope. At that point, pure CPU becomes
the bottleneck. If anything, the blocking chunk method will be more CPU
efficient and thus have a higher throughput.

Well, what if I have 1e8 bytes to send (Well, then why the heck are you
dilly dallying around sending only 100 bytes per second?), wouldn't the
difference add up then? Still wouldn't. The 100 milliseconds that the
"chunk of 1000" method spends processing each chunk (other than the last
one) do not count, as they just come out of the blocking time of the next
chunk. So only the last chunk of 1000 gets the 0.09 second penalty.

It seems like it would take some extremely unusual circumstances to make
your semi-blocking method turn out to be more than trivially better. Of
course, maybe you do in fact face extremely unusual circumstances--if so
sorry for the lecture.

No no, thanks for the effort! I do think that in this case my desired
approach can get useful partial pages back to the user, rather than have
them wait minutes for the complete page all at once. In the case I'm
debugging with right now, complete sections of the query results page are
sent, but there is a long delay between those sections of results. This
means that if a section is sent, it will not fully get to the user until
the *next* section's delivery starts (unless the size of the section is an
exact multiple of the input buffer size).

Not quite so simple. If I were determined to do this, I would probably
change read() to my_read(). Then define my_read to do the sysread, plus
whatever restarting-upon-interrupts and error checking is necessary.
The bigger wrinkle is that, since you also use <>, you would need a
different set of special my_readline() code to replace that. And
my_read() and my_readline() would probably need to share state with each
other.

OK, thanks. I may still go with setting $/ to ">" and using <> . Then
the results section's end will certainly fall on an input "record"
boundary. Problem solved, I think, but potentially a little slow... but
then again, maybe not.
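
A small sketch of the record-separator idea, assuming Perl 5.8 or later for the in-memory filehandle; the sample markup is made up, and on a real socket the same loop would apply to the socket handle:

```perl
use strict;
use warnings;

my $html = "<h1>Results</h1><p>Section one</p>";
open(my $in, '<', \$html) or die "open: $!";

my @records;
{
    local $/ = '>';             # a "line" now ends at the next '>'
    while (defined(my $rec = <$in>)) {
        push @records, $rec;
    }
}
print "[$_]\n" for @records;
```

Each record ends at a '>', so a completed tag is never held back waiting for the buffer to fill. One of the pitfalls is that a '>' inside an attribute value, a comment, or a script will also end a record.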

Have you looked into using LWP::Parallel to handle both sending and
efficiently receiving the requests?

No, but maybe I will.


Thanks,
James
 
James Marshall

The OP chiming in here:

On Tue, 28 Mar 2006, Uri Guttman wrote:

UG>
UG> x> Your assumption that he is making that assumption is not warranted.
UG> x> There are two things he wants to avoid. Ordinary <$socket> does one of them
UG> x> (block), naive O_NONBLOCK does the other (busy loop)
UG>
UG> how does O_NONBLOCK mean busy loop? the OP knows that select loops exist
UG> but my impression is that he doesn't know how they work. in fact they
UG> are easy to use and can work with buffered i/o as long as you only do
UG> buffered reads using <>. mixing <> and sysread is the danger, not the
UG> blocking or select loop stuff.

I know select(), thank you. Trouble is, I don't think select() works in
conjunction with buffered input (e.g. the Camel book strongly recommends
against it). If it can be made to so work, I'm interested in those
details (such as what xhoster posted).


UG> also the OP mentioned http can send binary after text? from what i
UG> know http only handles text and all binary data (post params or
UG> uploaded files) are encoded. so he can still read that data with <>.
UG> now as to why he is reading http data himself is another question. so
UG> many of the things the OP mentioned were all over the map and show a
UG> deep misunderstanding of net protocols, sockets, buffering,
UG> event/select loops and more.

Um, I'm not sure how to respond to this, but I'm giggling. For one, HTTP
sends more than just form submissions; it sends all Web traffic, including
binary data-- read the RFC. Maybe even search for an HTTP tutorial to
make it easier. :) (You'll get the joke when you do the search.) That
word "misunderstanding", I don't think it means what you think it means
(apologies to "The Princess Bride").


Best,
James
 
Uri Guttman

x> Right, but they are only semi-nonblocking (and the semi part of it
x> is not in the way the OP wants, or thinks he wants.)

semi-nonblocking? huh??

x> <> isn't the only thing he uses, he also uses read, which does not
x> work well in a select loop. And <> doesn't mix with O_NONBLOCK.

<> doesn't mix with sysread. you can do nonblocking i/o with <> but it
makes little sense. <> will read until it hits a newline, blocking or
not. it just keeps in a read loop. so the non-blocking shouldn't even
affect <> (but i am not going to test it). the blocking nature of a
socket has nothing to do with buffering or how you read it. the mixing
of read methods (buffered <> vs unbuffered sysread) is the big no-no as
a sysread after <> may not see any data already read by <> and which is
sitting in the stdin buffers.
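
The hazard can be reproduced with a pipe. This is a sketch that assumes an ordinary Perl build in which the first buffered read pulls in everything the pipe currently holds:

```perl
use strict;
use warnings;

pipe(my $r, my $w) or die "pipe: $!";
syswrite($w, "line one\nline two\n");
close $w;

# The buffered read fetches both lines from the kernel but returns only one.
my $first = <$r>;

# sysread() bypasses Perl's buffer and asks the kernel, which has nothing
# left, so it reports end of file even though "line two" has not been seen.
my $n = sysread($r, my $rest, 4096);

# The second line is still sitting in Perl's buffer.
my $second = <$r>;

print "readline returned: $first";
print "sysread returned $n bytes\n";
print "readline returned: $second";
```

On a live socket the symptom is usually a hang rather than an early end of file, because sysread() waits for new data while the old data sits unread in the buffer.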

x> You can read either text or binary data with <>. With binary data,
x> you probably shouldn't chomp, at least not if $/ is "\n". Other
x> than that, it doesn't make much difference, except maybe if you
x> turn on utf or maybe on Windows. Binary data might go for hundreds
x> of megabytes without using a \n (or whatever $/ is), but http data
x> can do that too. If this lack of line-orientation is what makes
x> binary data binary, then http is binary. On the other hand, if it
x> is just 8-bit versus 7-bit, it seems like http can produce data that
x> is binary in that regard too (whether it is supposed to do that or
x> not may be a different matter, but experimentally, it can do
x> that.)

reading binary data with <> is just plain stupid (and we have seen
posted code that does that). as for http being text or binary,
it is text as it needs to have mime markers to handle uploads which are
encoded as text (uuencoded IIRC). the rfc's will cover this. it would
make no sense to have real binary data in a post/upload as you couldn't
detect the end of the file and have multiple uploads. same as with
email, all attachments must be encoded into some form of plain text. and
that encoded text will have line endings in them so you can read them
with <> and not kill your stdio.

x> I don't understand why he wants what he wants, maybe he doesn't either.
x> The stuff you mention, he seems to understand those things enough to
x> discuss them rationally.

rationally but with little understanding IMO. the way he discussed all
the different issues said he didn't grok them well. and the final
correct answer is to use an event loop or stem or poe (the latter two
being higher level event loops). all this other mishmosh is just
that. you don't handle multiple i/o requests with polled i/o or <> or buffered
i/o. in another post the OP said this was a http proxy thingy. that
cries out for an event loop. anything else (other than sickening
threads) is foolish coding.

uri
 
James Marshall

By using it naively. (That is why I used the word naive) :)

Yes, I forgot to clarify that I don't want to sleep(), nor is there other
processing that can be done before the data arrives.

You can read either text or binary data with <>. With binary data, you
probably shouldn't chomp, at least not if $/ is "\n". Other than that,
it doesn't make much difference, except maybe if you turn on utf or
maybe on Windows. Binary data might go for hundreds of megabytes
without using a \n (or whatever $/ is), but http data can do that too.
If this lack of line-orientation is what makes binary data binary, then
http is binary. On the other hand, if it is just 8-bit versus 7-bit, it
seems like http can produce data that is binary in that regard too
(whether it is supposed to do that or not, may be a different matter,
but experimentally, it can do that.)

HTTP can transmit any kind of data, by design. You can also declare the
character set or encoding as needed, in the headers. HTTP is very
flexible.
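
As a minimal sketch of that point (not code from this thread): the header block of an HTTP message is line-oriented and can be read with <>, while a body of declared length can be read by byte count with read(), whatever bytes it contains. The response text, the in-memory filehandle standing in for the socket, and the variable names are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical response: text headers, then a body of raw bytes. Some of the
# body bytes are "\n" or "\r" and must not be treated as line endings.
my $body     = join '', map { chr } 0, 10, 255, 13, 10, 128;
my $response = "HTTP/1.0 200 OK\r\n"
             . "Content-Type: image/png\r\n"
             . "Content-Length: " . length($body) . "\r\n"
             . "\r\n"
             . $body;

# An in-memory filehandle stands in for the socket in this sketch.
open my $fh, '<', \$response or die "open: $!";
binmode $fh;

# Read the header block line by line, stopping at the empty line.
my %header;
while (defined(my $line = <$fh>)) {
    $line =~ s/\r?\n\z//;
    last if $line eq '';
    $header{lc $1} = $2 if $line =~ /^([^:]+):\s*(.*)$/;
}

# Read the body by byte count, looping in case read() returns short.
my $want = $header{'content-length'};
my $got  = '';
while (length($got) < $want) {
    my $n = read($fh, $got, $want - length($got), length($got));
    die "read: $!" unless defined $n;
    last if $n == 0;    # EOF before the full body arrived
}

printf "read %d body bytes\n", length $got;
```

Both reads here go through the same buffered layer, so nothing is lost between the header loop and the body loop.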

Every time your browser retrieves an image, you're using HTTP to transmit
"binary" data.

I don't understand why he wants what he wants, maybe he doesn't either.

Nope, I'm pretty sure I know why I want it. :) I've been around this
problem for a couple of days now.

The stuff you mention, he seems to understand those things enough to
discuss them rationally.

Thank you. (Sheesh. What's become of Usenet these days?)


Cheers,
James
.............................................................................
James Marshall (e-mail address removed) Berkeley, CA @}-'-,--
"Teach people what you know."
.............................................................................
 
Uri Guttman

JM> The OP chiming in here:
JM> On Tue, 28 Mar 2006, Uri Guttman wrote:

UG> how does O_NONBLOCK mean busy loop? the OP knows that select loops exist
UG> but my impression is that he doesn't know how they work. in fact they
UG> are easy to use and can work with buffered i/o as long as you only do
UG> buffered reads using <>. mixing <> and sysread is the danger, not the
UG> blocking or select loop stuff.

JM> I know select(), thank you. Trouble is, I don't think select() works in
JM> conjunction with buffered input (e.g. the Camel book strongly recommends
JM> against it). If it can be made to so work, I'm interested in those
JM> details (such as what xhoster posted).

read the perl cookbook (at least the 1st edition covers this). it has
examples of event loops using <>. *I* don't recommend that because i
like to do my own buffering and i know how to do it correctly. and i
know sysread/write are much faster than stdio. but it is perfectly
logical to use <> and an event loop if the protocol is properly line
oriented.
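
a rough sketch of the do-your-own-buffering approach, for readers following along (this is an illustration, not code posted in the thread): a blocking select loop built on the core IO::Select module, with sysread() appending to a per-connection buffer and complete lines peeled off as they arrive. The socketpair standing in for a real client, the 4096-byte read size, and the variable names are assumptions.

```perl
use strict;
use warnings;
use IO::Select;
use Socket;

# A socketpair stands in for a real client connection in this sketch.
socketpair(my $server, my $client, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

# The "client" sends one and a half lines, then the rest, then closes.
syswrite($client, "GET / HTTP/1.0\r\nHost: exa");
syswrite($client, "mple.com\r\n\r\n");
close $client;

my $sel    = IO::Select->new($server);
my $buffer = '';    # our own buffer, used instead of the stdio one
my @lines;

while ($sel->count) {
    # can_read() with no timeout blocks until a handle is readable,
    # so there is no busy loop.
    for my $fh ($sel->can_read) {
        # Returns whatever is available, up to 4096 bytes.
        my $n = sysread($fh, $buffer, 4096, length $buffer);
        die "sysread: $!" unless defined $n;
        if ($n == 0) {    # EOF: stop watching this handle
            $sel->remove($fh);
            close $fh;
            next;
        }
        # Peel off complete lines; a partial line stays in $buffer.
        while ($buffer =~ s/\A(.*?)\r?\n//) {
            push @lines, $1;
        }
    }
}

print scalar(@lines), " lines\n";
```

with several connections, each handle would need its own buffer (for example a hash keyed on the handle); the single $buffer here is only enough for the one-connection case.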

and i will reiterate that you should use an event loop (not select
directly as that is painful) for this. and some systems (stem as i
mentioned) can do all the buffering for you. but you don't seem to want
to bite the bullet and do this correctly. i won't help anymore then.

uri
 
James Marshall

Uri,

Please stop pretending to know about things you don't. You might confuse
some people who are trying to learn here. You're also preventing yourself
from learning. Furthermore, spewing insults is not the way to behave in
this newsgroup, even if I *was* wrong (which, note, I wasn't). Or, for
that matter, in (almost) any newsgroup.

I'm not trying to provoke. I'm posting this so that future readers of
this thread will know to not believe your posts on this thread, because
many of your statements are factually wrong, despite the bravado.

James
.............................................................................
James Marshall (e-mail address removed) Berkeley, CA @}-'-,--
"Teach people what you know."
.............................................................................

On Tue, 28 Mar 2006, Uri Guttman wrote:

UG>
UG> x> Right, but they are only semi-nonblocking (and the semi part of it
UG> x> is not in the way the OP wants, or thinks he wants.)
UG>
UG> semi-nonblocking? huh??
UG>
UG> >> mixing <> and sysread is the danger, not the
UG> >> blocking or select loop stuff.
UG>
UG> x> <> isn't the only thing he uses, he also uses read, which does not
UG> x> work well in a select loop. And <> doesn't mix with O_NONBLOCK.
UG>
UG> <> doesn't mix with sysread. you can do nonblocking i/o with <> but it
UG> makes little sense. <> will read until it hits a newline, blocking or
UG> not. it just keeps in a read loop. so the non-blocking shouldn't even
UG> affect <> (but i am not going to test it). the blocking nature of a
UG> socket has nothing to do with buffering or how you read it. the mixing
UG> of read methods (buffered <> vs unbuffered sysread) is the big no-no as
UG> a sysread after <> may not see any data already read by <> and which is
UG> sitting in the stdio buffers.
UG>
UG> x> You can read either text or binary data with <>. With binary data,
UG> x> you probably shouldn't chomp, at least not if $/ is "\n". Other
UG> x> than that, it doesn't make much difference, except maybe if you
UG> x> turn on utf or maybe on Windows. Binary data might go for hundreds
UG> x> of megabytes without using a \n (or whatever $/ is), but http data
UG> x> can do that to. If this lack of line-orientation is what makes
UG> x> binary data binary, then http is binary. One the other hand, if it
UG> x> just 8-bit versus 7-bit, it seems like http can produce data that
UG> x> is binary in that regard to (whether it is supposed to do that or
UG> x> not, may be a different matter, but experimentally, it can do
UG> x> that.)
UG>
UG> reading binary data with <> is just plain stupid (and we have seen
UG> posted code that does that). as for http being text or binary,
UG> it is text as it needs to have mime markers to handle uploads which are
UG> encoded as text (uuencoded IIRC). the rfc's will cover this. it would
UG> make no sense to have real binary data in a post/upload as you couldn't
UG> detect the end of the file and have multiple uploads. same as with
UG> email, all attachments must be encoded into some form of plain text. and
UG> that encoded text will have line endings in them so you can read them
UG> with <> and not kill your stdio.
UG>
UG> x> I don't understand why he wants what he wants, maybe he doesn't either.
UG> x> The stuff you mention, he seems to understand those things enough to
UG> x> discuss them rationally.
UG>
UG> rationally but with little understanding IMO. the way he discussed all
UG> the different issues said he didn't grok them well. and the final
UG> correct answer is to use an event loop or stem or poe (the latter two
UG> being higher level event loops). all this other mishmosh is just that.
UG> you don't handle multiple i/o requests with polled i/o or <> or
UG> buffered i/o. in another post the OP said this was a http proxy
UG> thingy. that cries out for an event loop. anything else (other than
UG> sickening threads) is foolish coding.
UG>
UG> uri
UG>
UG> --
UG> Uri Guttman ------ (e-mail address removed) -------- http://www.stemsystems.com
UG> --Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
UG> Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
UG>
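
For future readers, the one hazard nobody in this thread disputes (a sysread() after <> not seeing data that <> has already buffered) can be reproduced with a small sketch. This is an illustration under stated assumptions, not code from the thread: a pipe stands in for the socket, and the behaviour shown is what Perl's default buffered layer typically does when all the data is already waiting in the pipe.

```perl
use strict;
use warnings;

# A pipe stands in for the socket; two lines are already waiting in it.
pipe(my $r, my $w) or die "pipe: $!";
syswrite($w, "line1\nline2\n");
close $w;

# Buffered read: returns one line, but the buffered layer typically pulls
# everything available from the pipe into its own buffer while doing so.
my $first = <$r>;

# Unbuffered read: goes straight to the file descriptor and bypasses that
# buffer. The pipe is now empty and closed, so this typically reports EOF.
my $n = sysread($r, my $raw, 4096);

# The "missing" data was never lost; it is still in the buffered layer.
my $second = <$r>;

print "first=$first", "sysread returned $n\n", "second=$second";
```

On a live socket the writer would still be connected, so the sysread() would block rather than return 0, even though a complete line is sitting unread in the buffer.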
 
Uri Guttman

JM> Please stop pretending to know about things you don't. You might
JM> confuse some people who are trying to learn here. You're also
JM> preventing yourself from learning. Furthermore, spewing insults
JM> is not the way to behave in this newsgroup, even if I *was* wrong
JM> (which, note, I wasn't). Or, for that matter, in (almost) any
JM> newsgroup.

i may have messed up with the http protocol stuff but not with the
i/o. but go on, i won't bother to help you anymore. 20 years of doing
event loops and i/o systems must be just delusions of my diseased
mind. i wasn't the one who was writing a server using <> and then
wondering how to fix it. believe what you want, it is your code and your
problem.

JM> I'm not trying to provoke. I'm posting this so that future
JM> readers of this thread will know to not believe your posts on this
JM> thread, because many of your statements are factually wrong,
JM> despite the bravado.

and what you said above was insulting as well. so get off your own high
horse and syswrite it into the sunset. as for the future readers, they
can make up their own minds and don't need your 'help'. you were the one
babbling about mixing random i/o operations.

uri
 
