What replaces StringBufferInputStream

Patricia Shanahan

Lasse Reichstein Nielsen wrote:
....
Strings contain characters, so the most fitting sequential input to
convert it to would be a Reader.

Yes, but as far as I can tell Reader is a total dead end when the
objective is InputStream.

I got stuck, and had to ask for help, precisely because I was thinking
Reader, when I should have been taking a detour through byte arrays.

Patricia
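Patricia's "detour through byte arrays" can be sketched concretely. This is a minimal illustration (class and method names are mine, and the charset choice is an assumption): encode the String with an explicit charset, then wrap the resulting bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class StringToStream {
    // The "detour through byte arrays": encode the String with an
    // explicit charset, then wrap the bytes in a ByteArrayInputStream.
    static InputStream toStream(String s) {
        return new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        InputStream in = toStream("hello");
        System.out.println((char) in.read());  // prints 'h'
    }
}
```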
 
John W. Kennedy

Chris said:
Maybe that came out wrong, but taken literally it is completely false.
InputStreams are central to the IO architecture.

Only for reading and writing raw bytes. There is, so to speak, an
impedance mismatch between strings and streams, which is why, since 1.1,
strings are supposed to be processed by Reader and Writer classes.
That's why StringBufferInputStream is obsolete, producing the warning
messages that were the cause of this whole thread in the first place.
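When the consumer can accept a Reader rather than an InputStream, the post-1.1 replacement John describes is simply StringReader; a minimal sketch (names are mine):

```java
import java.io.Reader;
import java.io.StringReader;

public class ReadFromString {
    // Since 1.1, a String is read as characters via StringReader,
    // with no byte encoding involved at all.
    static Reader asReader(String s) {
        return new StringReader(s);
    }

    public static void main(String[] args) throws Exception {
        Reader r = asReader("abc");
        System.out.println((char) r.read());  // prints 'a'
    }
}
```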
 
Arne Vajhøj

John said:
Only for reading and writing raw bytes. There is, so to speak, an
impedance mismatch between strings and streams, which is why, since 1.1,
strings are supposed to be processed by Reader and Writer classes.
That's why StringBufferInputStream is obsolete, producing the warning
messages that were the cause of this whole thread in the first place.

Your first post said "Java doesn't want you to use an InputStream
for anything" without the "Only for reading and writing raw bytes".

Arne
 
M.J. Dance

Patricia said:
Lasse Reichstein Nielsen wrote:
...

Yes, but as far as I can tell Reader is a total dead end when the
objective is InputStream.

I got stuck, and had to ask for help, precisely because I was thinking
Reader, when I should have been taking a detour through byte arrays.

That seems to be the problem with this input stream <-> reader (and output
stream <-> writer, for that matter) dichotomy. Every(?) reader has an underlying
input stream which, I imagine, wouldn't be a problem to obtain. But that would
mean inviting problems: reading from both stream and reader simultaneously could
cause "unpredictable" behaviour: a few <khm/> years ago I was trying to make a
jsp serve binary content. No problem, I thought, one can obtain an output stream
from response (implicit object, instanceof (Http)ServletResponse, available in
every jsp) and send data through there. But it (the servlet engine, that is),
said that it already called .getWriter() and that .getOutputStream() cannot be
called after that. Maybe things changed since then, but it's a good
illustration. I think.
 
Chris Uppal

John said:
Only for reading and writing raw bytes.

?!? Only !?!

That's hardly a trivial, obscure, or unimportant application !

(In fact, judging from what I read in this ng, many applications, or would-be
applications, of Readers or Writers are technically wrong in that the posters
are trying to treat binary data as if it were "really" text.)

-- chris
 
Chris Uppal

Oliver said:
Setting an explicit encoding (to me) implies that that's the actual
encoding you want to use, as opposed to you having just chosen an encoding
randomly because you didn't know which one was appropriate.

Fair point.

With luck (i.e. I haven't bothered to check) the US-ASCII decoder will signal
an error if it is fed bytes outside the [0, 127] range. If so then setting
that would be one way to be explicit about the assumption (almost certainly
correct) that I think Patricia's making.

-- chris
 
Lasse Reichstein Nielsen

M.J. Dance said:
That seems to be the problem with this input stream <-> reader (and
output stream <-> writer, for that matter) dichotomy. Every(?) reader
has an underlying input stream which, I imagine, wouldn't be a problem
to obtain.

Except, e.g., StringReader.

If you are communicating between processes, or even computers, then at
some point you'll represent your data as the lowest common
denominator: bytes, but working inside a single program, you can start
out with characters and keep it that way.

There is no way to meaningfully convert a generic Writer to an
OutputStream, and it's an implementation detail whether there is an
underlying OutputStream for a given Writer, so the Writer interface
can't meaningfully expose a method for giving out an OutputStream.

On the opposite end, you shouldn't blindly convert an InputStream
to a Reader without knowing that the bytes do represent characters
in the encoding you have chosen.

/L
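Lasse's caveat, in code: wrap an InputStream in a Reader only once you know the encoding the bytes were written in. A sketch (names and the charset choice are mine) rather than a definitive implementation:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BytesToChars {
    // Decode an InputStream as text, naming the charset the bytes
    // are known to be in rather than trusting the platform default.
    static String readAll(InputStream in, Charset cs) throws IOException {
        Reader r = new InputStreamReader(in, cs);
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = r.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "héllo".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8));
    }
}
```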
 
Chris Uppal

Lasse said:
There is no way to meaningfully convert a generic Writer to an
OutputStream, and it's an implementation detail whether there is an
underlying OutputStream for a given Writer, so the Writer interface
can't meaningfully expose a method for giving out an OutputStream.

Certainly there is. You can create an OutputStream decorator which wraps a
Writer (or an InputStream which wraps a Reader) in just the same way as an
OutputStreamWriter wraps an OutputStream. All you need is a CharsetDecoder
(or Encoder).

The java.io package lacks such a beast, but it's trivial enough to create your
own if you have a need for it (and -- as I said before -- if you can think of
an acceptable name for it).

I admit there's not an /awful/ lot of use for it though...

-- chris
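For what it's worth, here is one way the decorator Chris describes might look. This is a sketch under a name of my choosing (java.io has no such class); it encodes chunk by chunk via the convenience CharsetEncoder.encode, so it would reject a surrogate pair that happens to straddle a buffer boundary.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical decorator: an InputStream that pulls characters from
// a Reader and encodes them to bytes with an explicit charset.
public class ReaderInputStream extends InputStream {
    private final Reader reader;
    private final CharsetEncoder encoder;
    private final CharBuffer chars = CharBuffer.allocate(1024);
    private ByteBuffer bytes = ByteBuffer.allocate(0);

    public ReaderInputStream(Reader reader, Charset charset) {
        this.reader = reader;
        this.encoder = charset.newEncoder();
    }

    @Override
    public int read() throws IOException {
        while (!bytes.hasRemaining()) {
            chars.clear();
            int n = reader.read(chars);      // refill the char buffer
            if (n == -1) {
                return -1;                   // end of character stream
            }
            chars.flip();
            // Note: encoding whole chunks at a time would fail on a
            // surrogate pair split across two chunks.
            bytes = encoder.encode(chars);
        }
        return bytes.get() & 0xFF;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ReaderInputStream(new StringReader("abc"), StandardCharsets.US_ASCII);
        int b;
        while ((b = in.read()) != -1) {
            System.out.print((char) b);      // prints abc
        }
        System.out.println();
    }
}
```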
 
Mike Schilling

Chris Uppal said:
Oliver said:
Setting an explicit encoding (to me) implies that that's the actual
encoding you want to use, as opposed to you having just chosen an encoding
randomly because you didn't know which one was appropriate.

Fair point.

With luck (i.e. I haven't bothered to check) the US-ASCII decoder will signal
an error if it is fed bytes outside the [0, 127] range. If so then setting
that would be one way to be explicit about the assumption (almost certainly
correct) that I think Patricia's making.


Not quite so much luck.

import java.io.*;

public class BadAscii
{
    public static void main(String[] args) throws Exception
    {
        byte arr[] = { (byte)0x40, (byte)0x80 };

        ByteArrayInputStream bais = new ByteArrayInputStream(arr);
        InputStreamReader isr = new InputStreamReader(bais, "US-ASCII");
        while (true)
        {
            int r = isr.read();
            if (r < 0)
                break;
            System.out.println(
                (char)r + "(" + Integer.toHexString(r) + ")");
        }
    }
}

results in

% java -cp . BadAscii
@(40)
?(fffd)

So, no exception, but the FFFD is a clear indication that there's been a
decoding error.
 
Mike Schilling

Chris Uppal said:
Certainly there is. You can create an OutputStream decorator which wraps a
Writer (or an InputStream which wraps a Reader) in just the same way as an
OutputStreamWriter wraps an OutputStream. All you need is a CharsetDecoder
(or Encoder).

The java.io package lacks such a beast, but it's trivial enough to create
your own if you have a need for it (and -- as I said before -- if you can
think of an acceptable name for it).

WriterOutputStream and ReaderInputStream will do.
 
Chris Uppal

Mike said:
So, no exception, but the FFFD is a clear indication that there's been a
decoding error.

Thanks for checking. A pity that it doesn't throw an exception. "Suppressing"
the error is probably the best behaviour for most purposes, but it would be
nice (now I come to think of it) if we could tell a decoding/encoding stream
that we want it to be strict (just as we can tell an actual CharsetDecoder how
it should treat "wrong" inputs).

-- chris
 
Piotr Kobzda

Chris said:
A pity that it doesn't throw an exception. "Suppressing"
the error is probably the best behaviour for most purposes, but it would be
nice (now I come to think of it) if we could tell a decoding/encoding stream
that we want it to be strict (just as we can tell an actual CharsetDecoder how
it should treat "wrong" inputs).

You can easily achieve that by setting "strictness" on the CharsetDecoder
produced by your desired Charset.

You can express strictness this way:

CharsetDecoder dec = Charset.forName("US-ASCII").newDecoder();
dec.onMalformedInput(CodingErrorAction.REPORT);
dec.onUnmappableCharacter(CodingErrorAction.REPORT);

Applying it to Mike's example as:

InputStreamReader isr = new InputStreamReader(bais, dec);

should give you the expected results.


piotr
 
Mike Schilling

Piotr Kobzda said:
You can easily achieve that by setting "strictness" on the CharsetDecoder
produced by your desired Charset.

You can express strictness this way:

CharsetDecoder dec = Charset.forName("US-ASCII").newDecoder();
dec.onMalformedInput(CodingErrorAction.REPORT);
dec.onUnmappableCharacter(CodingErrorAction.REPORT);

Applying it to Mike's example as:

InputStreamReader isr = new InputStreamReader(bais, dec);

should give you the expected results.

Thank you; I've changed my example to use it, resulting in

import java.io.*;
import java.nio.charset.*;

public class BadAscii
{
    public static void main(String[] args) throws Exception
    {
        byte arr[] = { (byte)0x40, (byte)0x80 };

        CharsetDecoder dec = Charset.forName("US-ASCII").newDecoder();
        dec.onMalformedInput(CodingErrorAction.REPORT);
        dec.onUnmappableCharacter(CodingErrorAction.REPORT);

        ByteArrayInputStream bais = new ByteArrayInputStream(arr);
        InputStreamReader isr = new InputStreamReader(bais, dec);
        while (true)
        {
            int r = isr.read();
            if (r < 0)
                break;
            System.out.println(
                (char)r + "(" + Integer.toHexString(r) + ")");
        }
    }
}

and it now produces

Exception in thread "main" java.nio.charset.MalformedInputException: Input length = 1
        at java.nio.charset.CoderResult.throwException(CoderResult.java:260)
        at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:463)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:182)
        at sun.nio.cs.StreamDecoder.read0(StreamDecoder.java:131)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:117)
        at java.io.InputStreamReader.read(InputStreamReader.java:151)
        at BadAscii.main(BadAscii.java:18)

By the way, you can observe that the decoder tries to process the entire
array at once, since the first character, which is legitimate ASCII, is
never returned.
 
Dale King

M.J. Dance said:
There is no proper replacement. The line of code above is mixing two
superficially similar but inherently different things: bytes and chars.
Of course the two are related but, in order to fully describe that
relationship, one needs additional information: character encoding.
Having that, one can getBytes() from a String and, using those, create a
ByteArrayInputStream.


The essential issue here is that Java (rightly so in my opinion)
associates direction with crossing the boundary between characters and
bytes. Going from character to bytes is only supported in the output or
writing direction. The implication being that conversion between the two
is associated with an external entity (a file, a server). The assumption
with Java is that once you have it as a character the program itself
should only deal with it as characters. It should only be converted to
bytes in order to send it outside of Java.

That is a fairly reasonable way to do things in my opinion. In almost all
cases it is correct. There are some cases where you do want to go the
other way, but they are rare. And if they did support it, it would
probably be encouraging abuse by those that try to handle text data as
bytes which is just wrong.

This assumption also simplifies the whole character encoder/decoder
system. Consider the fact that one character can map to multiple bytes,
but the reverse is not true. One byte doesn't map to multiple characters
in any known encoding. If I am reading a byte from a "ReaderInputStream"
and one character from the String can map to multiple bytes you end up
having to have some form of buffer in between because the byte you read
may only be one of several for that first character. You would also end
up buffering in the reverse direction for a "WriterOutputStream" class
because the byte you write may not be enough to write the character. But
in the directionality supported by Java there is no need for such a
buffer. The idea of converting from characters to bytes is not as
straightforward as it seems on the surface.

In this case the reason Patricia needs it is a poorly designed class.
Since that class expects to read textual data it should support Readers.
It could in addition support InputStream (although I would mark that
support as deprecated because users should not be using it).
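Dale's buffering argument in miniature: one char can encode to several bytes, so a byte-at-a-time stream over characters must hold leftover bytes somewhere. A small demonstration (the charset and example character are my choices):

```java
import java.nio.charset.StandardCharsets;

public class MultiByteChar {
    public static void main(String[] args) {
        // One Java char, 'é' (U+00E9), becomes two bytes in UTF-8;
        // a hypothetical ReaderInputStream handing these out one
        // read() at a time must buffer the second byte.
        byte[] b = "é".getBytes(StandardCharsets.UTF_8);
        System.out.println(b.length);  // prints 2
    }
}
```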
 
Mike Schilling

This assumption also simplifies the whole character encoder/decoder
system. Consider the fact that one character can map to multiple bytes,
but the reverse is not true. One byte doesn't map to multiple characters
in any known encoding. If I am reading a byte from a "ReaderInputStream"
and one character from the String can map to multiple bytes you end up
having to have some form of buffer in between because the byte you read
may only be one of several for that first character. You would also end up
buffering in the reverse direction for a "WriterOutputStream" class
because the byte you write may not be enough to write the character. But
in the directionality supported by Java there is no need for such a
buffer. The idea of converting from characters to bytes is not as
straightforward as it seems on the surface.

In UTF-8, a group of 4 bytes can map to two characters (when the code point
is > FFFF, and is represented in Java by a pair of 16-bit characters.) At
any rate, decoders don't decode a character at a time, as you'll note if you
check the Javadoc for CharsetDecoder; they decode an array or stream of
bytes into the appropriate characters, and there's always (at least
potentially) a buffer involved.
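Mike's UTF-8 point is easy to demonstrate. Code point U+1F600 (my choice of example) is above U+FFFF, so it is one code point, two Java chars, and four UTF-8 bytes:

```java
import java.nio.charset.StandardCharsets;

public class SurrogateDemo {
    public static void main(String[] args) {
        // A code point above U+FFFF becomes a surrogate pair in Java:
        // two 16-bit chars, encoded as four bytes in UTF-8.
        String s = new String(Character.toChars(0x1F600));
        System.out.println(s.length());                                 // prints 2
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);  // prints 4
    }
}
```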
 
Chris Uppal

Mike Schilling wrote:

[me:]
Those are worse than InputStreamReader and OutputStreamWriter because ...
? :)

Well, since you ask...

They are confusingly similar to pre-existing classes.

They violate rules of English phrase construction, and class name construction
based on that.

(Incidentally, part of the problem is the non-symmetry in the existing names.
"Reader" and "InputStream" do not follow the same grammatical pattern.
While -- as it chances -- you can qualify "Reader" with "InputStream", the
reverse doesn't sit happily).

-- chris
 
Chris Uppal

Piotr Kobzda wrote:

[me:]
A pity that it doesn't throw an exception. "Suppressing"
the error is probably the best behaviour for most purposes, but it
would be nice (now I come to think of it) if we could tell a
decoding/encoding stream that we want it to be strict (just as we can
tell an actual CharsetDecoder how it should treat "wrong" inputs).

You can easily achieve that setting "strictness" on CharsetDecoder
produced by your desired Charset. [,,,]
InputStreamReader isr = new InputStreamReader(bais, dec);

Ah! I hadn't realised you could do that. Thanks for the suggestion.

-- chris
 
M.J. Dance

Dale said:
The essential issue here is that Java (rightly so in my opinion)
associates direction with crossing the boundary between characters and
bytes. Going from character to bytes is only supported in the output or
writing direction. The implication being that conversion between the two
is associated with an external entity (a file, a server). The assumption
with Java is that once you have it as a character the program itself
should only deal with it as characters. It should only be converted to
bytes in order to send it outside of Java.

That is a fairly reasonable way to do things in my opinion. In almost all
cases it is correct. There are some cases where you do want to go the
other way, but they are rare. And if they did support it, it would
probably be encouraging abuse by those that try to handle text data as
bytes which is just wrong.

Well. There are cases where you can't do without bytes. Cryptography, digests,
signing etc. all operate on bytes. And people do want to encrypt, digest and/or
sign text (i.e. a string of characters) from time to time.

This assumption also simplifies the whole character encoder/decoder
system. Consider the fact that one character can map to multiple bytes,
but the reverse is not true. One byte doesn't map to multiple characters
in any known encoding. If I am reading a byte from a "ReaderInputStream"
and one character from the String can map to multiple bytes you end up
having to have some form of buffer in between because the byte you read
may only be one of several for that first character. You would also end
up buffering in the reverse direction for a "WriterOutputStream" class
because the byte you write may not be enough to write the character. But
in the directionality supported by Java there is no need for such a
buffer. The idea of converting from characters to bytes is not as
straightforward as it seems on the surface.

In this case the reason Patricia needs it is a poorly designed class.
Since that class expects to read textual data it should support Readers.
It could in addition support InputStream (although I would mark that
support as deprecated because users should not be using it).

Even if not all the readers/writers are wrapped around a stream, there could be
a public getInputStream(...) or getOutputStream(...). It would just throw an
UnsupportedOperationException or something.
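The cryptography point above, in code: a sketch (SHA-256 is my choice of digest) showing that hashing text forces an encoding decision first.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class DigestText {
    // A digest operates on bytes, so hashing a String means first
    // committing to an encoding; a different charset can give a
    // different hash for the same text.
    static byte[] sha256(String text) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        return md.digest(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sha256("hello").length);  // prints 32
    }
}
```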
 
