Reading from a socket: first characters, then octets

Stefan Ram

When a client sends an HTTP PUT request to my web server,
I start to read characters from the socket:

this.br =
  new java.io.BufferedReader
  ( new java.io.InputStreamReader
    ( socket.getInputStream(), "ANSI_X3.4-1968" ));

But sometimes, after the initial text, the socket will
change to emit binary data (octets), that is, it will
send me the actual data of the PUT request during the
same transmission (TCP session).

The read()-method of the InputStreamReader will give
me a converted character. But I need unconverted octets
from this point on. With the above code, Java will
convert some octets and modify their values, so that
the binary data will become corrupted.

When I then start to use the read() method of
socket.getInputStream(), I might miss some octets which were
already read by the InputStreamReader or the BufferedReader.
But I also want to have buffering, because I/O usually is
slower without buffering.

Is there already a well-known solution to such a problem,
that is, switch from text to binary mode when reading from
a socket?

One possibility would be an encoding name that hints to
Java to treat 0A and 0D as line separators, but otherwise
not to modify octet values, when converting octets to
characters. Then I could read the octets as characters.
(Or do I have to write a custom CharsetDecoder for this?)
 
markspace

Stefan said:
In the meantime, I have written a workaround that reads
only octets (binary data) from the socket and uses a
custom method (instead of a Reader from java.io) to
break lines from the binary data.


This is basically what I was going to suggest. Read the data as bytes.
Then, from a buffer, make an InputStream that reads from the buffer.
Wrap this in a Reader of some sort to read character data. If you want
to change character encodings or switch to binary, toss out the
InputStream & Reader pair and make a new one.

I have incomplete code around here that does this, although it's still
being noodled around with. Your custom code is probably good enough for
now.
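
The workaround Stefan describes (read only octets, and break lines with a custom method instead of a java.io Reader) could look roughly like the sketch below. The class name ByteLineReader and its readLine method are hypothetical, not Stefan's actual code, and the sketch assumes the text part is US-ASCII with lines ending in LF or CR LF.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not from the thread: splits lines on the byte level.
final class ByteLineReader {

    /**
     * Reads octets up to and including the next LF and returns them as text,
     * without the trailing LF or CR LF. Returns null if the stream is already
     * at its end.
     */
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        boolean readAnything = false;
        int b;
        while ((b = in.read()) != -1) {
            readAnything = true;
            if (b == '\n') {
                break;
            }
            line.write(b);
        }
        if (!readAnything) {
            return null;
        }
        byte[] raw = line.toByteArray();
        int length = raw.length;
        if (length > 0 && raw[length - 1] == '\r') {
            length--; // drop the CR of a CR LF pair
        }
        return new String(raw, 0, length, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = { 'P', 'U', 'T', ' ', '/', 'x', '\r', '\n', '\r', '\n',
                        (byte) 0x80, (byte) 0xFF };
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(wire));

        String line;
        while ((line = readLine(in)) != null && !line.isEmpty()) {
            System.out.println(line); // header lines
        }
        // Same stream, no Reader in between: the body octets are untouched.
        System.out.println(in.read()); // 128
        System.out.println(in.read()); // 255
    }
}
```

Because the only buffering happens in the BufferedInputStream, nothing is read ahead by a decoder, and the single-octet read() calls are served from that buffer rather than from the socket.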
 
Roedy Green

Stefan said:
But sometimes, after the initial text, the socket will
change to emit binary data (octets), that is, it will
send me the actual data of the PUT request during the
same transmission (TCP session).

Read it with a DataInputStream. For the character data, collect the bytes
with readUnsignedByte, readByte or read, and write them to a
ByteArrayOutputStream; process the binary data with the DataInputStream
methods. Once you have all the text bytes, decode them with a Reader.

See http://mindprod.com/applet/fileio.html for sample code.

ByteArrayInputStream bais = new ByteArrayInputStream( bytesCollected );
InputStreamReader eisr = new InputStreamReader( bais, "UTF-8" );
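
A fuller sketch of this idea, under some assumptions that are not in the post: the text part ends at the first empty line (CR LF CR LF), it is US-ASCII, and a Content-Length header gives the size of the binary part. The names HeaderThenBody, collectHeader and contentLength are made up for illustration.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class HeaderThenBody {

    /** Collects octets up to and including the first CR LF CR LF. */
    static byte[] collectHeader(DataInputStream in) throws IOException {
        final byte[] end = { '\r', '\n', '\r', '\n' };
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        int matched = 0; // how many octets of the end marker have been seen
        while (matched < end.length) {
            int b = in.readUnsignedByte(); // EOFException if the header is cut short
            collected.write(b);
            if (b == end[matched]) {
                matched++;
            } else {
                matched = (b == '\r') ? 1 : 0;
            }
        }
        return collected.toByteArray();
    }

    /** Decodes the collected octets with a Reader and looks for Content-Length. */
    static int contentLength(byte[] bytesCollected) throws IOException {
        BufferedReader text = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(bytesCollected), StandardCharsets.US_ASCII));
        String line;
        while ((line = text.readLine()) != null) {
            if (line.toLowerCase(Locale.ROOT).startsWith("content-length:")) {
                return Integer.parseInt(line.substring("content-length:".length()).trim());
            }
        }
        return 0; // assumption for this sketch: no header means no body
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        wire.write("PUT /f HTTP/1.1\r\nContent-Length: 3\r\n\r\n"
                .getBytes(StandardCharsets.US_ASCII));
        wire.write(new byte[] { 0x00, (byte) 0x80, (byte) 0xFF });

        DataInputStream in = new DataInputStream(
                new BufferedInputStream(new ByteArrayInputStream(wire.toByteArray())));
        byte[] body = new byte[contentLength(collectHeader(in))];
        in.readFully(body); // binary part, read as plain octets
        System.out.println(body.length + " body octets, last = " + (body[2] & 0xFF));
        // prints "3 body octets, last = 255"
    }
}
```

The Reader only ever sees the copy in the byte array, so it cannot consume anything from the socket that belongs to the binary part.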


--
Roedy Green Canadian Mind Products
http://mindprod.com

"For reason that have a lot to do with US Government bureaucracy, we settled on the one issue everyone could agree on, which was weapons of mass destruction."
~ Paul Wolfowitz 2003-06, explaining how the Bush administration sold the Iraq war to a gullible public.
 
Owen Jacobson

Stefan said:
When a client sends an HTTP PUT request to my web server,
I start to read characters from the socket:

this.br =
  new java.io.BufferedReader
  ( new java.io.InputStreamReader
    ( socket.getInputStream(), "ANSI_X3.4-1968" ));

What about

this.s = new java.io.BufferedInputStream(socket.getInputStream());
this.r = new java.io.InputStreamReader(s, encoding);

Stefan said:
But sometimes, after the initial text, the socket will
change to emit binary data (octets), that is, it will
send me the actual data of the PUT request during the
same transmission (TCP session).

The read()-method of the InputStreamReader will give
me a converted character. But I need unconverted octets
from this point on. With the above code, Java will
convert some octets and modify their values, so that
the binary data will become corrupted.

When I then start to use the read() method of
socket.getInputStream(), I might miss some octets which were
already read by the InputStreamReader or the BufferedReader.
But I also want to have buffering, because I/O usually is
slower without buffering.

By doing the buffering before conversion, you're guaranteed that only
the bytes you've already read as characters will have been consumed
from the buffer when you begin reading bytes, rather than those bytes
plus up to a whole buffer page. You could, for example:

int c;
while ((c = r.read()) != -1 && c != '.') ; // consume up to the first dot (or EOF)
readSomeBytes(s); // consume some binary data

and be assured that readSomeBytes would pick up at the first byte after
the code unit that was read as the '.' character.

-o
 
Lothar Kimmeringer

Stefan said:
When a client sends an HTTP PUT request to my web server,
I start to read characters from the socket:

this.br =
  new java.io.BufferedReader
  ( new java.io.InputStreamReader
    ( socket.getInputStream(), "ANSI_X3.4-1968" ));

But sometimes, after the initial text, the socket will
change to emit binary data (octets), that is, it will
send me the actual data of the PUT request during the
same transmission (TCP session).

I don't know that charset, but as long as a line break is
0x0a and/or 0x0d there as well, you can do the following:

// ISO-8859-1 maps every octet 0x00-0xFF to exactly one char, so nothing is lost
br = new BufferedReader(new InputStreamReader(is, "8859_1"));
String line;
while ((line = br.readLine()) != null){
    // re-decode the line in the charset the text really uses
    String realText = new String(line.getBytes("8859_1"), "ANSI_X3.4-1968");
    if (lineFitsCondition(realText)){
        break;
    }
}
char[] buf = new char[BUF_SIZE];
int read;
while ((read = br.read(buf)) != -1){
    // turn the chars back into the original octets
    addBinaryDataToWhatever(new String(buf, 0, read).getBytes("8859_1"));
}

It's ugly, but it works and you don't need to fiddle around with
the internals of InputStreamReader (which buffers quite a lot,
AFAIR 4096 bytes).
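
Why the ISO-8859-1 detour preserves binary data can be checked with a small round-trip test. This is only an illustrative sketch; the class name Latin1RoundTrip and its methods are made up. It decodes all 256 octet values to chars and encodes them back, once with ISO-8859-1 and once with US-ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Latin1RoundTrip {

    /** Returns the 256 octet values 0x00 to 0xFF in order. */
    static byte[] allOctets() {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        return all;
    }

    /** True if decoding to chars and encoding back yields the same octets. */
    static boolean survives(byte[] octets, Charset charset) {
        byte[] roundTripped = new String(octets, charset).getBytes(charset);
        return Arrays.equals(octets, roundTripped);
    }

    public static void main(String[] args) {
        byte[] all = allOctets();
        System.out.println(survives(all, StandardCharsets.ISO_8859_1)); // true
        System.out.println(survives(all, StandardCharsets.US_ASCII));   // false
    }
}
```

With US-ASCII, every octet above 0x7F is decoded to a replacement character and cannot be recovered, which is the corruption described in the original question.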


Regards, Lothar
--
Lothar Kimmeringer E-Mail: (e-mail address removed)
PGP-encrypted mails preferred (Key-ID: 0x8BC3CD81)

Always remember: The answer is forty-two, there can only be wrong
questions!
 
Karl Uppiano

Owen Jacobson said:
What about

this.s = new java.io.BufferedInputStream(socket.getInputStream());
this.r = new java.io.InputStreamReader(s, encoding);

This is my favorite solution so far. Just put a 'T' on the stream, and tap
off the raw bytes as needed!
 
Kevin McMurtrie

Karl Uppiano said:
This is my favorite solution so far. Just put a 'T' on the stream, and tap
off the raw bytes as needed!

InputStreamReader's documentation doesn't say it reads the minimum
number of bytes needed to produce a response. It delegates to
sun.nio.cs.StreamDecoder and there's no knowing what it does.
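
The concern can be tried out with a small experiment. This is a sketch with made-up names (ReadAheadDemo, firstByteAfterDot): it reads characters up to a '.' through an InputStreamReader and then asks the shared BufferedInputStream for the next byte. The result depends on the implementation; the OpenJDK decoder, as far as is known, fills a block of several kilobytes at once, so with a short input the reader has already swallowed everything.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

final class ReadAheadDemo {

    /** Reads chars up to the first '.', then returns the next byte from the stream. */
    static int firstByteAfterDot(byte[] wire) throws IOException {
        BufferedInputStream s = new BufferedInputStream(new ByteArrayInputStream(wire));
        Reader r = new InputStreamReader(s, StandardCharsets.US_ASCII);
        int c;
        while ((c = r.read()) != -1 && c != '.') {
            // consume up to the first dot
        }
        return s.read(); // what is left for a byte-level reader?
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = "AB.XYZ".getBytes(StandardCharsets.US_ASCII);
        // 88 ('X') would mean no read-ahead; OpenJDK is expected to give -1,
        // because the decoder has already taken "XYZ" into its own buffer.
        System.out.println(firstByteAfterDot(wire));
    }
}
```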

InputStream's mark()/reset() is very handy for parsing streams needing
an unknown amount of lookahead. If markSupported() returns false, wrap
the stream in a BufferedInputStream. Hopefully you can translate the
number of characters read into the number of bytes to skip.
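
A minimal sketch of the mark()/reset() approach, under assumptions not stated in the post: the text part ends with CR LF CR LF, it fits into MAX_HEADER octets, and it uses a single-byte charset, so the number of characters equals the number of bytes to skip. The names MarkResetHeader, readHeader and endOfHeader are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class MarkResetHeader {

    static final int MAX_HEADER = 8192; // assumed upper bound for the text part

    /** Returns the index just past the first CR LF CR LF, or -1 if there is none. */
    static int endOfHeader(byte[] buf, int filled) {
        for (int i = 0; i + 3 < filled; i++) {
            if (buf[i] == '\r' && buf[i + 1] == '\n'
                    && buf[i + 2] == '\r' && buf[i + 3] == '\n') {
                return i + 4;
            }
        }
        return -1;
    }

    /** Reads the text part and leaves the stream positioned at the first body octet. */
    static String readHeader(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("wrap the stream in a BufferedInputStream");
        }
        in.mark(MAX_HEADER);
        byte[] peek = new byte[MAX_HEADER];
        int filled = 0;
        int end = -1;
        while (end < 0 && filled < peek.length) {
            int n = in.read(peek, filled, peek.length - filled);
            if (n == -1) {
                break;
            }
            filled += n;
            end = endOfHeader(peek, filled);
        }
        if (end < 0) {
            throw new IOException("no end of header within " + MAX_HEADER + " octets");
        }
        in.reset(); // go back to the mark ...
        long toSkip = end; // ... and skip exactly the header octets
        while (toSkip > 0) {
            long skipped = in.skip(toSkip);
            if (skipped <= 0) {
                throw new IOException("could not skip header");
            }
            toSkip -= skipped;
        }
        return new String(peek, 0, end, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = { 'H', ':', ' ', 'v', '\r', '\n', '\r', '\n', (byte) 0x80 };
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(wire));
        String header = readHeader(in);
        System.out.println(header.trim()); // H: v
        System.out.println(in.read());     // 128
    }
}
```

The lookahead may pull in body octets, but reset() hands them back, so the over-read does no harm as long as it stays within the mark limit.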
 
Roedy Green

Is the behaviour of what should happen when you read both from s and r
fully defined? Or are you getting a pig in a poke with whatever the
implementation happens to do for now?

 
Karl Uppiano

Roedy Green said:
Is the behaviour of what should happen when you read both from s and r
fully defined? Or are you getting a pig in a poke with whatever the
implementation happens to do for now?

That's a good question. The BufferedInputStream could introduce some
uncertainty. A FilterInputStream might be a better choice as a 'T'. Might
have to implement the buffering in there, or wrap the BufferedInputStream
with the FilterInputStream.

In any event, I turn to a FilterInputStream whenever I have custom
processing to do on a stream.
 
