Java Sockets/buffered I/O streams are full-duplex, right?


Richard Maher

Hi,

Sorry if this is another stupid question, but it's been a long day and I'm
trying to rule things out.

I've got a tcp/ip Socket connected to a remote host. I've also got a Reader
thread with a blocking read on a BufferedInputStream associated with that
Socket. In another thread, I then do a write to a BufferedOutputStream
associated with the same Socket.

What I think I'm seeing is the local buffered read picking up the message
rather than the remote host receiving it. Clearly a bug somewhere in my
code, but not being full bottle on the ins and outs of Java
BufferedInput/OutputStreams, I'd appreciate it if someone could rule out any
missing switch or config/constructor issues.

1) I create a Socket
2) in = new BufferedInputStream(sock.getInputStream(), maxbuf)
3) out = new BufferedOutputStream(sock.getOutputStream(), maxbuf)
4) Exchange a couple of messages happily with the remote host
5) Kick off a Reader thread that does an in.read()
6) Send off a message to the remote server, which looks like it's being
nabbed by 5)
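
In code, steps 1) to 3) amount to roughly this (host, port and buffer size
are placeholders, not my real values):

    import java.io.BufferedInputStream;
    import java.io.BufferedOutputStream;
    import java.io.IOException;
    import java.net.Socket;

    class Setup {
        public static void main(String[] args) throws IOException {
            int maxbuf = 8192;                                // placeholder
            Socket sock = new Socket("remote.example", 1234); // 1) the Socket
            BufferedInputStream in =
                new BufferedInputStream(sock.getInputStream(), maxbuf);   // 2)
            BufferedOutputStream out =
                new BufferedOutputStream(sock.getOutputStream(), maxbuf); // 3)
            // 4)-6): exchange messages, start the Reader thread, write again
        }
    }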

Cheers Richard Maher
 

Richard Maher

Hi Pete,

Peter Duniho said:
[...]
1) I create a Socket
2) in = new BufferedInputStream(sock.getInputStream(), maxbuf)
3) out = new BufferedOutputStream(sock.getOutputStream(), maxbuf)
4) Exchange a couple of messages happily with the remote host
5) Kick off a Reader thread that does an in.read()
6) Send off a message to the remote server, which looks like it's being
nabbed by 5)

Sounds like a bug in your code.

Yeah, a simple schoolboy error in the end: the message was getting through
to a different test server and just being echoed back. That might explain
why nothing was showing up in the log files I was looking at - oops!
But without a concise-but-complete code
example that reliably demonstrates the problem, it's not really possible
to say what the bug is.

For what it's worth, I would not use the BufferedOutputStream with a
Socket, especially not for a message/transactional protocol.

What do you recommend? Write directly to the underlying OutputStream? NIO?
(I remember liking NIO's built-in endianness and charset support but
balking at it for a few reasons - no connect timeout, or some such?)
The network
stack already provides some buffering, and unless you remember to call
flush() all the time,

I do flush() after each complete write, but I currently don't have a
problem with writing HeaderStuff1,2,3+Body then flush(), and I thought it
was a performance booster that mitigates byte-oriented I/O issues?
you may find data not getting out to the network
when you expect it to.

Such is TCP/IP and segment sizes, but I do have TCP_NODELAY set and
synchronize the writers before accessing the BufferedOutputStream. As long
as the messages (or message fragments) arrive in order, I'm good with it.
Are there nasty side effects we should be aware of?
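
To be concrete, the write path looks roughly like this (class and method
names are just illustrative):

    import java.io.BufferedOutputStream;
    import java.io.IOException;

    class MessageWriter {
        private final BufferedOutputStream out;

        MessageWriter(BufferedOutputStream out) { this.out = out; }

        // Writers synchronize here so header + body + flush leave as one unit.
        void sendMessage(byte[] header, byte[] body) throws IOException {
            synchronized (out) {
                out.write(header); // HeaderStuff1,2,3
                out.write(body);   // Body
                out.flush();       // hand the complete message to the TCP stack
            }
        }
    }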

(BTW the Tier3Socket.java example I referred to is somewhat dated; reads
are now repeated/streamed until N bytes arrive or read() returns -1.)
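
i.e. something along these lines (a sketch, not the actual Tier3Socket
code):

    import java.io.IOException;
    import java.io.InputStream;

    class ReadUtil {
        // Keep reading until exactly len bytes have arrived, or EOF (-1).
        static int readFully(InputStream in, byte[] buf, int len)
                throws IOException {
            int got = 0;
            while (got < len) {
                int n = in.read(buf, got, len - got);
                if (n == -1) break; // remote end closed the connection
                got += n;
            }
            return got;             // == len unless EOF intervened
        }
    }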
BufferedInputStream might be a bit more useful,
depending on how you're using the Socket.

I don't think that the use of either Buffered... class should have
anything to do with the problem you're seeing. More likely, you've
somehow confused the data structures involved in your own code (assuming,
of course, that the remote host is correctly implemented and you're sure
it wouldn't simply echo messages you've sent).

Guilty :)

Regards Richard Maher
 

Mark Space

Peter said:
In very low-bandwidth-usage situations, you can get the data there a
tiny bit quicker by disabling Nagle. But this is usually not really as
important as people think, and it's counter-productive any time you're
trying to send any significant amount of data all at once.


Er, this. Good lord, don't mess with Nagle, period. If you have to
ask, don't touch this. At all. Ever. Lest ye die.
 

EJP

Peter said:
For what it's worth, I would not use the BufferedOutputStream with a
Socket, especially not for a message/transactional protocol. The
network stack already provides some buffering, and unless you remember
to call flush() all the time, you may find data not getting out to the
network when you expect it to.

I wouldn't ever use a Socket +without+ a BufferedOutputStream,
especially if there's a DataOutputStream in the stack: just to conserve
context switches. The rule for calling flush() is very simple: call it
before you read. If the socket is an SSLSocket, the BufferedOutputStream
is an absolute *necessity*, to avoid a 42x space explosion when writing
single bytes, if you do that sort of thing.
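
That is, a stack along these lines (a sketch; the request/response shape
is illustrative, not any particular protocol):

    import java.io.BufferedInputStream;
    import java.io.BufferedOutputStream;
    import java.io.DataInputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.net.Socket;

    class RequestResponse {
        // Socket -> BufferedOutputStream -> DataOutputStream.
        static String roundTrip(Socket sock, String request)
                throws IOException {
            DataOutputStream out = new DataOutputStream(
                    new BufferedOutputStream(sock.getOutputStream()));
            DataInputStream in = new DataInputStream(
                    new BufferedInputStream(sock.getInputStream()));
            out.writeUTF(request);
            out.flush();         // the rule: flush before you read
            return in.readUTF(); // then block for the response
        }
    }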
 

EJP

Peter said:
your "serializing complex structures" scenario is where you naturally
have data that is literally only a single byte.

Yep, such as type octets in type-length-value protocols, which abound;
the TC_* constants in the Serialization protocol; ....
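
For instance, a TLV record naturally begins with a single type byte (a
rough sketch, not any specific protocol):

    import java.io.DataOutputStream;
    import java.io.IOException;

    class Tlv {
        // A type-length-value record: the type is literally one byte, which
        // would be one syscall per record without a buffer underneath.
        static void writeRecord(DataOutputStream out, byte type, byte[] value)
                throws IOException {
            out.writeByte(type);        // T: a single byte
            out.writeInt(value.length); // L
            out.write(value);           // V
        }
    }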
(do I really care that sending just ten different bytes takes ten bytes plus
TCP/IP overhead, or 370 bytes plus TCP/IP overhead?)

Plus the CPU overhead of all those extra SSL records at both ends,
especially computing and checking all those MACs. It's non-trivial.

Plus the CPU overhead of all those extra context switches in any
protocol, SSL or plaintext.

NB My 42x figure came from an experiment.
 

Roedy Green

Richard Maher said:
I've got a tcp/ip Socket connected to a remote host.

The socket is full duplex, but you need two threads if you want to
send and receive simultaneously.
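
i.e. something like this (a minimal sketch; host, port and message are
placeholders):

    import java.io.BufferedInputStream;
    import java.io.BufferedOutputStream;
    import java.io.IOException;
    import java.net.Socket;

    class FullDuplex {
        public static void main(String[] args) throws IOException {
            Socket sock = new Socket("remote.example", 1234);
            final BufferedInputStream in =
                    new BufferedInputStream(sock.getInputStream());
            BufferedOutputStream out =
                    new BufferedOutputStream(sock.getOutputStream());

            // One thread dedicated to receiving...
            new Thread(new Runnable() {
                public void run() {
                    byte[] buf = new byte[8192];
                    try {
                        int n;
                        while ((n = in.read(buf)) != -1) {
                            System.out.println("got " + n + " bytes");
                        }
                    } catch (IOException e) { /* socket closed */ }
                }
            }).start();

            // ...while this thread sends.
            out.write("hello".getBytes("UTF-8"));
            out.flush();
        }
    }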
--
Roedy Green Canadian Mind Products
http://mindprod.com

"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
~ Charles Darwin
 

EJP

Peter said:
But that's my point. Even there, typically that one byte is a
relatively small proportion of the overall data.

But it's another write. With another context switch. And another JNI
boundary jump. So its cost is disproportionate to its size, unless you
buffer.
What context switches?

The context switches between Java and the kernel for each write() call,
including crossing the JNI boundary. You seem to be completely ignoring
this issue. It's the entire reason C stdio exists, for example.
I'm just saying that bandying about numbers like 40x (or whatever)
seriously overstates the potential problems.

It provides a bound on the worst case.
 

EJP

Peter said:
Anyway, I'm not accustomed to people using the phrase "context switch"
to mean something other than a thread context switch. I'm not ignoring
the issue. I'm having trouble parsing your posts. Thank you for
explaining what you actually meant.

Peter, thank you too, but I am frankly astounded that you've apparently
never heard of context switches into the kernel, or why their cost
motivates user-space buffering. And I don't accept your representation
of that as a 'parsing' difficulty with *my* posts. I expressed myself
clearly using standard terminology.
In any case, as with the memory overhead, the performance overhead isn't
going to be that significant under normal circumstances, because
individual bytes as part of the overall data are relatively infrequent.

We've discussed that. Your terms are getting more and more rubbery. Now
we're down to 'under normal circumstances' and 'relatively infrequent'.
The plain fact is that network I/O is measurably more efficient with a
BufferedOutputStream than without one. Sun do it everywhere I can think
of. Me too. If you're going to offer advice *to the contrary* you need
to provide a lot more justification than you have.
And seriously overstates the potential problems in the process.

Again I can't accept that. I didn't 'bandy' anything about. I provided
an experimental result. And providing a bound on the worst case doesn't
'overstate' anything. It does the opposite in fact.

EJP
 

EJP

Peter said:
Here's the Wikipedia article on the phrase "context switch":

Peter, I didn't learn computing from Wikipedia, or from the x86
architecture either. The phrase 'context switch' predates Wikipedia by
forty-something years, and predates threads by decades too. It's had
nothing *except* kernel/user mode switches to apply to for most of its
history, and for most of mine in this industry too. Hence my usage. That
might make me a dinosaur but it doesn't make my statements unparseable,
which incidentally is a misuse of that terminology. However if calling
it a kernel/user mode transition will help, let's do that. They are
expensive and they are the motivation for buffering in user space. The
only motivation.
Your mischaracterization of my previous posts is unwarranted and frankly
quite a bit disingenuous.

You're right. I checked. Your terms aren't getting more and more rubbery
at all. They were rubbery all the time.

What really is 'unwarranted and frankly disingenuous' here is your
characterization of my previous posts as 'bandying numbers around'. I
did no such thing. Go back and read it. I gave a precise statement of an
upper bound and the circumstances under which it could be experimentally
reproduced. It is quantitative experimental data and it has been
supported within a small margin of error by another poster, who was
arguing from first principles rather than experiment. And you accepted
it. So there are two reasons to believe it, none to disbelieve it, and
none whatsoever to characterize it as 'bandying numbers around'. You owe
me an apology on that.
instead of once introduces overhead.

If that's what BufferedInput/OutputStream do - and they avoid it when
possible, which is another relevant fact. It's also patently obvious
that kernel/user mode switches introduce overhead, and it's also
patently obvious that BufferedInputStream exists for a reason. Let's ask
another question: would you read a socket (or a file) 2 bytes at a time?
4? 16? 128? Do you know where the break-even point is here?
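
If not, it's easy to measure (a rough sketch over a local file; the sizes
are placeholders and the numbers will vary by platform):

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    class ReadSizeBench {
        // Time how long it takes to drain a file with a given read size.
        static long drain(String path, int chunk) throws IOException {
            byte[] buf = new byte[chunk];
            long t0 = System.nanoTime();
            InputStream in = new FileInputStream(path);
            try {
                while (in.read(buf) != -1) { /* discard */ }
            } finally {
                in.close();
            }
            return System.nanoTime() - t0;
        }

        public static void main(String[] args) throws IOException {
            for (int chunk : new int[] {2, 4, 16, 128, 1024, 8192}) {
                System.out.println(chunk + "-byte reads: "
                        + drain(args[0], chunk) / 1000000 + " ms");
            }
        }
    }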
A person needs to read the statements given here very carefully to realize that in real-world
code, they're not going to see anywhere near a ~40x increase in
overhead. That's most certainly "overstating" in my book.

No, that's just failing to read what I actually wrote. What I wrote
wasn't an overstatement. Go and read it again.
you've decided to focus on this
issue to the detriment of the more important question of when to flush
the output stream.

(a) That's because you're arguing against using buffered streams at all,
a case in which you +never+ have to flush, and

(b) because I've given my answer and I don't see any reason to revise
it. In the normal request/response case it's a perfectly sound
principle. As you pointed out, it doesn't apply when there are separate
threads reading or writing the stream: this seems so obvious to me that
I didn't think it required elucidation beforehand, or further comment
afterward.

EJP
 

Lew

Peter said:
No, you didn't. You never responded to that particular point. You
simply ignored it.

FWIW, this observer has seen Pete's arguments in this thread supported by
evidence, citation of authoritative if not normative sources, agreement with
salient points (such as that a 37x increase can happen, if only in trivial and
unlikely situations), and a tone of reason and logic. EJP's arguments have
been characterized by unsupported assertions, personal remarks, lack of
evidence and diversion of the point.

I've certainly disagreed firmly with Pete in the past, so it's not like I have
a stake in one person's "side" over another here. EJP, I suggest you provide
actual citations for your idiolectic use of the term "context switch" and
focus on logic and evidence. Pete, I suggest that further argument will not
strengthen your already solid points.
 

EJP

Peter said:
I take it you are unable to support your claim with an authoritative
reference.

I don't know which 'claim' is under discussion here. If it's the current
meaning of 'context switch', I have no authority whatsoever, just 38
years of computer programming. No doubt I am a dinosaur, as I seem to
remember stating before. If it's the 42x figure I derived from a
repeatable experiment, I don't *need* an 'authoritative reference'. This
is science, not religion. Arguments from authority do not hold. I only
need the experimental data. If you have some contrary experimental data
please post it here.
misrepresent your opponent's statements

I'm not aware of having done that other than in circumstances where I
have myself been misrepresented, but if you can produce an example
please do so.
I have not argued "against using buffered streams at all".

That is precisely and exactly the content of your first posting in this
matter. You've subsequently resiled in the light of followups, the first
of which came from me.
the only possible
reason to disagree with that statement is if you _do_ believe that the
worst-case scenario is in fact the common case.

As I haven't asserted (and don't believe) that, the point is immaterial.
No, you didn't. You never responded to that particular point. You
simply ignored it.

The paragraph which you have selectively quoted here does precisely that.

Baci

EJP
 

EJP

Lew said:
EJP's arguments have been characterized by unsupported
assertions, personal remarks, lack of evidence and diversion of the point

On the contrary. My 'arguments' have consisted of reportage of
observable and repeatable facts. As a matter of fact I was the first
person to introduce actual observations, rather than unsupported
opinion, into the discussion.
 
