What replaces StringBufferInputStream

Patricia Shanahan

Lasse Reichstein Nielsen wrote:
....
Strings contain characters, so the most fitting sequential input to
convert it to would be a Reader.

Yes, but as far as I can tell Reader is a total dead end when the
objective is InputStream.

I got stuck, and had to ask for help, precisely because I was thinking
Reader, when I should have been taking a detour through byte arrays.

Patricia
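Patricia's "detour through byte arrays" can be sketched concretely. This is a minimal illustration (class and method names are mine, and the charset choice is an assumption): encode the String with an explicit charset, then wrap the resulting bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class StringToStream {
    // The "detour through byte arrays": encode the String with an
    // explicit charset, then wrap the bytes in a ByteArrayInputStream.
    static InputStream toStream(String s) {
        return new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        InputStream in = toStream("hello");
        System.out.println((char) in.read());  // prints 'h'
    }
}
```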
 
John W. Kennedy

Chris said:
Maybe that came out wrong, but taken literally it is completely false.
InputStreams are central to the IO architecture.

Only for reading and writing raw bytes. There is, so to speak, an
impedance mismatch between strings and streams, which is why, since 1.1,
strings are supposed to be processed by Reader and Writer classes.
That's why StringBufferInputStream is obsolete, producing the warning
messages that were the cause of this whole thread in the first place.
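When the consumer can accept a Reader rather than an InputStream, the post-1.1 replacement John describes is simply StringReader; a minimal sketch (names are mine):

```java
import java.io.Reader;
import java.io.StringReader;

public class ReadFromString {
    // Since 1.1, a String is read as characters via StringReader,
    // with no byte encoding involved at all.
    static Reader asReader(String s) {
        return new StringReader(s);
    }

    public static void main(String[] args) throws Exception {
        Reader r = asReader("abc");
        System.out.println((char) r.read());  // prints 'a'
    }
}
```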
 
Arne Vajhøj

John said:
Only for reading and writing raw bytes. There is, so to speak, an
impedance mismatch between strings and streams, which is why, since 1.1,
strings are supposed to be processed by Reader and Writer classes.
That's why StringBufferInputStream is obsolete, producing the warning
messages that were the cause of this whole thread in the first place.

Your first post said "Java doesn't want you to use an InputStream
for anything" without the "Only for reading and writing raw bytes".

Arne
 
M.J. Dance

Patricia said:
Lasse Reichstein Nielsen wrote:
...

Yes, but as far as I can tell Reader is a total dead end when the
objective is InputStream.

I got stuck, and had to ask for help, precisely because I was thinking
Reader, when I should have been taking a detour through byte arrays.

That seems to be the problem with this input stream <-> reader (and output
stream <-> writer, for that matter) dichotomy. Every(?) reader has an underlying
input stream which, I imagine, wouldn't be a problem to obtain. But that would
mean inviting problems: reading from both stream and reader simultaneously could
cause "unpredictable" behaviour: a few <khm/> years ago I was trying to make a
jsp serve binary content. No problem, I thought, one can obtain an output stream
from response (implicit object, instanceof (Http)ServletResponse, available in
every jsp) and send data through there. But it (the servlet engine, that is),
said that it already called .getWriter() and that .getOutputStream() cannot be
called after that. Maybe things changed since then, but it's a good
illustration. I think.
 
Chris Uppal

John said:
Only for reading and writing raw bytes.

?!? Only !?!

That's hardly a trivial, obscure, or unimportant application !

(In fact, judging from what I read in this ng, many applications, or would-be
applications, of Readers or Writers are technically wrong in that the posters
are trying to treat binary data as if it were "really" text.)

-- chris
 
Chris Uppal

Oliver said:
Setting an explicit encoding (to me) implies that that's the actual
encoding you want to use, as opposed to you having just chosen an encoding
randomly because you didn't know which one was appropriate.

Fair point.

With luck (i.e. I haven't bothered to check) the US-ASCII decoder will signal
an error if it is fed bytes outside the [0, 127] range. If so then setting
that would be one way to be explicit about the assumption (almost certainly
correct) that I think Patricia's making.

-- chris
 
Lasse Reichstein Nielsen

M.J. Dance said:
That seems to be the problem with this input stream <-> reader (and
output stream <-> writer, for that matter) dichotomy. Every(?) reader
has an underlying input stream which, I imagine, wouldn't be a problem
to obtain.

Except, e.g., StringReader.

If you are communicating between processes, or even computers, then at
some point you'll represent your data as the lowest common
denominator: bytes, but working inside a single program, you can start
out with characters and keep it that way.

There is no way to meaningfully convert a generic Writer to an
OutputStream, and it's an implementation detail whether there is an
underlying OutputStream for a given Writer, so the Writer interface
can't meaningfully expose a method for giving out an OutputStream.

On the opposite end, you shouldn't blindly convert an InputStream
to a Reader without knowing that the bytes do represent characters
in the encoding you have chosen.

/L
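Lasse's caveat, in code: wrap an InputStream in a Reader only once you know the encoding the bytes were written in. A sketch (names and the charset choice are mine) rather than a definitive implementation:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BytesToChars {
    // Decode an InputStream as text, naming the charset the bytes
    // are known to be in rather than trusting the platform default.
    static String readAll(InputStream in, Charset cs) throws IOException {
        Reader r = new InputStreamReader(in, cs);
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = r.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "héllo".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8));
    }
}
```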
 
Chris Uppal

Lasse said:
There is no way to meaningfully convert a generic Writer to an
OutputStream, and it's an implementation detail whether there is an
underlying OutputStream for a given Writer, so the Writer interface
can't meaningfully expose a method for giving out an OutputStream.

Certainly there is. You can create an OutputStream decorator which wraps a
Writer (or an InputStream which wraps a Reader) in just the same way as an
OutputStreamWriter wraps an OutputStream. All you need is a CharsetDecoder
(or Encoder).

The java.io package lacks such a beast, but it's trivial enough to create your
own if you have a need for it (and -- as I said before -- if you can think of
an acceptable name for it).

I admit there's not an /awful/ lot of use for it though...

-- chris
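For what it's worth, here is one way the decorator Chris describes might look. This is a sketch under a name of my choosing (java.io has no such class); it encodes chunk by chunk via the convenience CharsetEncoder.encode, so it would reject a surrogate pair that happens to straddle a buffer boundary.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical decorator: an InputStream that pulls characters from
// a Reader and encodes them to bytes with an explicit charset.
public class ReaderInputStream extends InputStream {
    private final Reader reader;
    private final CharsetEncoder encoder;
    private final CharBuffer chars = CharBuffer.allocate(1024);
    private ByteBuffer bytes = ByteBuffer.allocate(0);

    public ReaderInputStream(Reader reader, Charset charset) {
        this.reader = reader;
        this.encoder = charset.newEncoder();
    }

    @Override
    public int read() throws IOException {
        while (!bytes.hasRemaining()) {
            chars.clear();
            int n = reader.read(chars);      // refill the char buffer
            if (n == -1) {
                return -1;                   // end of character stream
            }
            chars.flip();
            // Note: encoding whole chunks at a time would fail on a
            // surrogate pair split across two chunks.
            bytes = encoder.encode(chars);
        }
        return bytes.get() & 0xFF;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ReaderInputStream(new StringReader("abc"), StandardCharsets.US_ASCII);
        int b;
        while ((b = in.read()) != -1) {
            System.out.print((char) b);      // prints abc
        }
        System.out.println();
    }
}
```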
 
Mike Schilling

Chris Uppal said:
Oliver said:
Setting an explicit encoding (to me) implies that that's the actual
encoding you want to use, as opposed to you having just chosen an encoding
randomly because you didn't know which one was appropriate.

Fair point.

With luck (i.e. I haven't bothered to check) the US-ASCII decoder will signal
an error if it is fed bytes outside the [0, 127] range. If so then setting
that would be one way to be explicit about the assumption (almost certainly
correct) that I think Patricia's making.


Not quite so much luck.

import java.io.*;

public class BadAscii
{
    public static void main(String[] args) throws Exception
    {
        byte arr[] = { (byte)0x40, (byte)0x80 };

        ByteArrayInputStream bais = new ByteArrayInputStream(arr);
        InputStreamReader isr = new InputStreamReader(bais, "US-ASCII");
        while (true)
        {
            int r = isr.read();
            if (r < 0)
                break;
            System.out.println(
                (char)r + "(" + Integer.toHexString(r) + ")");
        }
    }
}

results in

% java -cp . BadAscii
@(40)
?(fffd)

So, no exception, but the FFFD is a clear indication that there's been a
decoding error.
 
Mike Schilling

Chris Uppal said:
Certainly there is. You can create an OutputStream decorator which wraps a
Writer (or an InputStream which wraps a Reader) in just the same way as an
OutputStreamWriter wraps an OutputStream. All you need is a CharsetDecoder
(or Encoder).

The java.io package lacks such a beast, but it's trivial enough to create
your own if you have a need for it (and -- as I said before -- if you can
think of an acceptable name for it).

WriterOutputStream and ReaderInputStream will do.
 
Chris Uppal

Mike said:
So, no exception, but the FFFD is a clear indication that there's been a
decoding error.

Thanks for checking. A pity that it doesn't throw an exception. "Suppressing"
the error is probably the best behaviour for most purposes, but it would be
nice (now I come to think of it) if we could tell a decoding/encoding stream
that we want it to be strict (just as we can tell an actual CharsetDecoder how
it should treat "wrong" inputs).

-- chris
 
Piotr Kobzda

Chris said:
A pity that it doesn't throw an exception. "Suppressing"
the error is probably the best behaviour for most purposes, but it would be
nice (now I come to think of it) if we could tell a decoding/encoding stream
that we want it to be strict (just as we can tell an actual CharsetDecoder how
it should treat "wrong" inputs).

You can easily achieve that by setting "strictness" on the CharsetDecoder
produced by your desired Charset.

You can express strictness this way:

CharsetDecoder dec = Charset.forName("US-ASCII").newDecoder();
dec.onMalformedInput(CodingErrorAction.REPORT);
dec.onUnmappableCharacter(CodingErrorAction.REPORT);

Applying it to Mike's example as:

InputStreamReader isr = new InputStreamReader(bais, dec);

should give you the expected results.


piotr
 
Mike Schilling

Piotr Kobzda said:
You can easily achieve that by setting "strictness" on the CharsetDecoder
produced by your desired Charset.

You can express strictness this way:

CharsetDecoder dec = Charset.forName("US-ASCII").newDecoder();
dec.onMalformedInput(CodingErrorAction.REPORT);
dec.onUnmappableCharacter(CodingErrorAction.REPORT);

Applying it to Mike's example as:

InputStreamReader isr = new InputStreamReader(bais, dec);

should give you the expected results.

Thank you; I've changed my example to use it, resulting in

import java.io.*;
import java.nio.charset.*;

public class BadAscii
{
    public static void main(String[] args) throws Exception
    {
        byte arr[] = { (byte)0x40, (byte)0x80 };

        CharsetDecoder dec = Charset.forName("US-ASCII").newDecoder();
        dec.onMalformedInput(CodingErrorAction.REPORT);
        dec.onUnmappableCharacter(CodingErrorAction.REPORT);

        ByteArrayInputStream bais = new ByteArrayInputStream(arr);
        InputStreamReader isr = new InputStreamReader(bais, dec);
        while (true)
        {
            int r = isr.read();
            if (r < 0)
                break;
            System.out.println(
                (char)r + "(" + Integer.toHexString(r) + ")");
        }
    }
}

and it now produces

Exception in thread "main" java.nio.charset.MalformedInputException: Input length = 1
        at java.nio.charset.CoderResult.throwException(CoderResult.java:260)
        at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:463)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:182)
        at sun.nio.cs.StreamDecoder.read0(StreamDecoder.java:131)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:117)
        at java.io.InputStreamReader.read(InputStreamReader.java:151)
        at BadAscii.main(BadAscii.java:18)

By the way, you can observe that the decoder tries to process the entire
array at once, since the first character, which is legitimate ASCII, is
never returned.
 
Dale King

M.J. Dance said:
There is no proper replacement. The line of code above is mixing two
superficially similar but inherently different things: bytes and chars.
Of course the two are related but, in order to fully describe that
relationship, one needs additional information: character encoding.
Having that, one can getBytes() from a String and, using those, create a
ByteArrayInputStream.


The essential issue here is that Java (rightly so in my opinion)
associates direction with crossing the boundary between characters and
bytes. Going from character to bytes is only supported in the output or
writing direction. The implication being that conversion between the two
is associated with an external entity (a file, a server). The assumption
with Java is that once you have it as a character the program itself
should only deal with it as characters. It should only be converted to
bytes in order to send it outside of Java.

That is a fairly reasonable way to do things in my opinion. In almost all
cases it is correct. There are some cases where you do want to go the
other way, but they are rare. And if they did support it, it would
probably be encouraging abuse by those that try to handle text data as
bytes which is just wrong.

This assumption also simplifies the whole character encoder/decoder
system. Consider the fact that one character can map to multiple bytes,
but the reverse is not true. One byte doesn't map to multiple characters
in any known encoding. If I am reading a byte from a "ReaderInputStream"
and one character from the String can map to multiple bytes you end up
having to have some form of buffer in between because the byte you read
may only be one of several for that first character. You would also end
up buffering in the reverse direction for a "WriterOutputStream" class
because the byte you write may not be enough to write the character. But
in the directionality supported by Java there is no need for such a
buffer. The idea of converting from characters to bytes is not as
straightforward as it seems on the surface.

In this case the reason Patricia needs it is a poorly designed class.
Since that class expects to read textual data it should support Readers.
It could in addition support InputStream (although I would mark that
support as deprecated because users should not be using it).
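Dale's buffering argument in miniature: one char can encode to several bytes, so a byte-at-a-time stream over characters must hold leftover bytes somewhere. A small demonstration (the charset and example character are my choices):

```java
import java.nio.charset.StandardCharsets;

public class MultiByteChar {
    public static void main(String[] args) {
        // One Java char, 'é' (U+00E9), becomes two bytes in UTF-8;
        // a hypothetical ReaderInputStream handing these out one
        // read() at a time must buffer the second byte.
        byte[] b = "é".getBytes(StandardCharsets.UTF_8);
        System.out.println(b.length);  // prints 2
    }
}
```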
 
Mike Schilling

This assumption also simplifies the whole character encoder/decoder
system. Consider the fact that one character can map to multiple bytes,
but the reverse is not true. One byte doesn't map to multiple characters
in any known encoding. If I am reading a byte from a "ReaderInputStream"
and one character from the String can map to multiple bytes you end up
having to have some form of buffer in between because the byte you read
may only be one of several for that first character. You would also end up
buffering in the reverse direction for a "WriterOutputStream" class
because the byte you write may not be enough to write the character. But
in the directionality supported by Java there is no need for such a
buffer. The idea of converting from characters to bytes is not as
straightforward as it seems on the surface.

In UTF-8, a group of 4 bytes can map to two characters (when the code point
is > FFFF, and is represented in Java by a pair of 16-bit characters.) At
any rate, decoders don't decode a character at a time, as you'll note if you
check the Javadoc for CharsetDecoder; they decode an array or stream of
bytes into the appropriate characters, and there's always (at least
potentially) a buffer involved.
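Mike's UTF-8 point is easy to demonstrate. Code point U+1F600 (my choice of example) is above U+FFFF, so it is one code point, two Java chars, and four UTF-8 bytes:

```java
import java.nio.charset.StandardCharsets;

public class SurrogateDemo {
    public static void main(String[] args) {
        // A code point above U+FFFF becomes a surrogate pair in Java:
        // two 16-bit chars, encoded as four bytes in UTF-8.
        String s = new String(Character.toChars(0x1F600));
        System.out.println(s.length());                                 // prints 2
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);  // prints 4
    }
}
```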
 
Chris Uppal

Mike Schilling wrote:

[me:]
Those are worse than InputStreamReader and OutputStreamWriter because ...
? :)

Well, since you ask...

They are confusingly similar to pre-existing classes.

They violate rules of English phrase construction, and class name construction
based on that.

(Incidentally, part of the problem is the non-symmetry in the existing names.
"Reader" and "InputStream" do not follow the same grammatical pattern.
While -- as it chances -- you can qualify "Reader" with "InputStream", the
reverse doesn't sit happily).

-- chris
 
Chris Uppal

Piotr Kobzda wrote:

[me:]
A pity that it doesn't throw an exception. "Suppressing"
the error is probably the best behaviour for most purposes, but it
would be nice (now I come to think of it) if we could tell a
decoding/encoding stream that we want it to be strict (just as we can
tell an actual CharsetDecoder how it should treat "wrong" inputs).

You can easily achieve that setting "strictness" on CharsetDecoder
produced by your desired Charset. [,,,]
InputStreamReader isr = new InputStreamReader(bais, dec);

Ah! I hadn't realised you could do that. Thanks for the suggestion.

-- chris
 
M.J. Dance

Dale said:
The essential issue here is that Java (rightly so in my opinion)
associates direction with crossing the boundary between characters and
bytes. Going from character to bytes is only supported in the output or
writing direction. The implication being that conversion between the two
is associated with an external entity (a file, a server). The assumption
with Java is that once you have it as a character the program itself
should only deal with it as characters. It should only be converted to
bytes in order to send it outside of Java.

That is a fairly reasonable way to do things in my opinion. In almost all
cases it is correct. There are some cases where you do want to go the
other way, but they are rare. And if they did support it, it would
probably be encouraging abuse by those that try to handle text data as
bytes which is just wrong.

Well. There are cases where you can't do without bytes. Cryptography, digests,
signing etc. all operate on bytes. And people do want to encrypt, digest and/or
sign text (i.e. a string of characters) from time to time.

This assumption also simplifies the whole character encoder/decoder
system. Consider the fact that one character can map to multiple bytes,
but the reverse is not true. One byte doesn't map to multiple characters
in any known encoding. If I am reading a byte from a "ReaderInputStream"
and one character from the String can map to multiple bytes you end up
having to have some form of buffer in between because the byte you read
may only be one of several for that first character. You would also end
up buffering in the reverse direction for a "WriterOutputStream" class
because the byte you write may not be enough to write the character. But
in the directionality supported by Java there is no need for such a
buffer. The idea of converting from characters to bytes is not as
straightforward as it seems on the surface.

In this case the reason Patricia needs it is a poorly designed class.
Since that class expects to read textual data it should support Readers.
It could in addition support InputStream (although I would mark that
support as deprecated because users should not be using it).

Even if not all the readers/writers are wrapped around a stream, there could be
a public getInputStream(...) or getOutputStream(...). It would just throw an
UnsupportedOperationException or something.
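The cryptography point above, in code: a sketch (SHA-256 is my choice of digest) showing that hashing text forces an encoding decision first.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class DigestText {
    // A digest operates on bytes, so hashing a String means first
    // committing to an encoding; a different charset can give a
    // different hash for the same text.
    static byte[] sha256(String text) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        return md.digest(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sha256("hello").length);  // prints 32
    }
}
```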
 
