Java sockets and readLine

kahiga

I have a conceptual Java question: how is Java able to return from the
blocking readLine() after reading a line ending in \n (*nix), \r\n
(Windows) or \r (Mac)? I am asking specifically about an input stream
coming from the network (a socket). When you create a socket connection
to another computer running UNIX, Mac or Windows, how does Java know what
the line separator character is?

My basic analysis is that once you call readLine(), which is a blocking
I/O call, the JVM simply keeps reading incoming characters from the
network until it detects a \r\n (for Windows) and then returns the
string. Now if the other computer is a Mac, which uses \r for EOL, how
does Java know to stop waiting for the \n and just return the string? Or
am I misunderstanding the process?

Any ideas are welcome.
 
Mike Schilling

Any socket-to-socket communications protocol has to be defined at the byte
level. HTTP header lines, for instance, are defined always to end with CRLF
regardless of which platform is being used. You're right that doing
writeLine()s on a Mac expecting to be able to read the result with
readLine()s on Windows won't work. Don't do that.
 
Roedy Green

When you create a socket connection to another
Computer running UNIX, MAC or Windows how does java know what the line
separator character is.

readLine() seems to be smart and works no matter what the line separator
is. You can experiment with a file containing different line separators,
and it reads them all fine.
 
kahiga

I see your point about using a predefined communication protocol, but
I was thinking at a simpler level, e.g. a simple Java server with a
telnet client, where the server's only purpose is to echo any line it
receives from the client. Obviously this is possible and it shouldn't
matter what platform the client is running on.

Mike Schilling said:
You're right that doing writeLine()s on a Mac expecting to be able to read
the result with readLine()s on Windows won't work.

Why? It seems like it should work; otherwise, wouldn't it break the WORA
principle if the code has to be platform specific?

I think I figured out what Java is doing with readLine().

Case 1 - Line ends with a \n only.
Keep reading characters from the network and when you see a \n, return
the buffered characters as a string.

Case 2 - Line ends with a \r only.
Keep reading characters from the network and when you see a \r, return
the buffered characters as a string.

Case 3 - Line ends with a \r\n.
Keep reading characters from the network and when you see a \r, return
the buffered characters as a string. If the next character read from the
network is \n, discard it.
 
Pete Barrett

The documentation for BufferedReader says:

"Read a line of text. A line is considered to be terminated by any one
of a line feed ('\n'), a carriage return ('\r'), or a carriage return
followed immediately by a linefeed."

That seems fairly clear. Since the input is buffered, it can afford to
look ahead to the next character if it reads a carriage return.


Pete Barrett
 
Thomas Hawtin

kahiga said:
I see your point about using a predefined communication protocol, but
I was thinking at a simpler level, e.g. a simple Java server with a
telnet client, where the server's only purpose is to echo any line it
receives from the client. Obviously this is possible and it shouldn't
matter what platform the client is running on.

Telnet is a predefined protocol. It defines a Network Virtual Terminal
(NVT). Unless the BINARY option is negotiated, the default end of line
is CR LF. Other protocols use a similar convention.

http://www.ietf.org/rfc/rfc0854.txt

TCP itself really does just give you octet streams.

Tom Hawtin
 
kahiga

Telnet is a predefined protocol. It defines a Network Virtual Terminal
(NVT). Unless the BINARY option is negotiated, the default end of line
is CR LF. Other protocols use a similar convention.

http://www.ietf.org/rfc/rfc0854.txt

True indeed (I wasn't aware of the RFC). I also found this article, which
gives a more user-friendly description of the "Telnet EOL convention":
http://www.freesoft.org/CIE/RFC/1123/31.htm.

I guess my example of a client was flawed; what I was trying to
describe was a client using a non-predefined protocol. Maybe a better
example would be a custom Java client that only sends lines of text to
the server, while the server uses readLine() to read each line from the
client and echoes it locally.

I created a sample Java client and sent 3 lines ending in different
EOLs:
"Hello world\r"
"Hello world\n"
"Hello world\r\n"
The server was able to read all of these lines correctly using
readLine().
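
The experiment above can be reproduced without a network at all. The sketch below is not the original client and server; it assumes a StringReader is an acceptable stand-in for the socket's input stream, and the class name MixedEolDemo is made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class MixedEolDemo {
    // Reads every line from the text with BufferedReader.readLine().
    static List<String> readAllLines(String text) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // The same three lines as in the post, each with a different terminator.
        String wire = "Hello world\r" + "Hello world\n" + "Hello world\r\n";
        List<String> lines = readAllLines(wire);
        // prints "3 lines: [Hello world, Hello world, Hello world]"
        System.out.println(lines.size() + " lines: " + lines);
    }
}
```

All three terminators yield the same result, and the trailing \r\n does not produce an extra empty line.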
 
kahiga

The documentation for BufferedReader says:
"Read a line of text. A line is considered to be terminated by any one
of a line feed ('\n'), a carriage return ('\r'), or a carriage return
followed immediately by a linefeed."

That seems fairly clear. Since the input is buffered, it can afford to
look ahead to the next character if it reads a carriage return.

This might be fine for files, and for local cases it may also use
the line.separator system property to determine the EOL format.
However, in the case of the network, the server cannot "look ahead" to
see the next character. It has to wait for the client to send it. My
original case was this: the client so far has sent, for example,
"Hello world\r" and the server is blocking in the readLine() method.
How does it know to return the current string and not keep waiting to
receive the next "\n"?
 
Chris Uppal

kahiga said:
if the client so far has sent, for example,
"Hello world\r" and the server is blocking on the readLine() method.
How does it know to return the current string and not keep waiting to
receive the next "\n".

It doesn't. That's why protocols (including ones you create yourself) should
specify exactly what gets written on the wire, and why platform-specific
shortcuts like println() should not be used in their implementation.
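
As a rough illustration of that advice, the sketch below writes the terminator as explicit bytes instead of relying on println(). The choice of CR LF and US-ASCII is an assumption about a hypothetical protocol, and WireLineWriter is a made-up name; a ByteArrayOutputStream stands in for socket.getOutputStream().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class WireLineWriter {
    // The terminator belongs to the (hypothetical) protocol, not to the platform.
    static final String EOL = "\r\n";

    // Writes one line followed by the protocol's terminator, then flushes.
    static void writeLine(OutputStream out, String line) throws IOException {
        out.write((line + EOL).getBytes(StandardCharsets.US_ASCII));
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for socket.getOutputStream().
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        writeLine(wire, "Hello world");
        byte[] bytes = wire.toByteArray();
        // 11 characters of text plus CR and LF: prints 13 on every platform.
        System.out.println(bytes.length);
    }
}
```

Because neither the terminator nor the charset comes from the platform defaults, the bytes on the wire are the same wherever the code runs.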

-- chris
 
Steve Horsley

kahiga said:
I see your point about using a predefined communication protocol, but
I was thinking at a simpler level, e.g. a simple Java server with a
telnet client, where the server's only purpose is to echo any line it
receives from the client. Obviously this is possible and it shouldn't
matter what platform the client is running on.

The telnet RFC specifically says that a carriage return '\r' MUST
be followed by either a line feed '\n' or a NUL 0x00, depending on
whether a line feed action is required in addition to the
carriage return. A CR NUL implies that the current line will be
overwritten by the following line (or overtyped if printing).
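
A minimal sketch of that rule follows, assuming the whole message is already in a byte array and contains no telnet command (IAC) sequences, which a real implementation would have to handle. The class and method names are hypothetical.

```java
public class NvtLineEndings {
    // Maps CR LF to a single '\n' and CR NUL to a bare '\r'.
    // Every other byte is passed through unchanged as a character.
    static String decode(byte[] wire) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < wire.length; i++) {
            int b = wire[i] & 0xFF;
            if (b == '\r' && i + 1 < wire.length) {
                int next = wire[i + 1] & 0xFF;
                if (next == '\n') {      // CR LF: end of line
                    out.append('\n');
                    i++;
                    continue;
                }
                if (next == 0) {         // CR NUL: carriage return only
                    out.append('\r');
                    i++;
                    continue;
                }
            }
            out.append((char) b);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] wire = {'a', 'b', '\r', '\n', 'c', 'd', '\r', 0, 'e', 'f'};
        String text = decode(wire);
        System.out.println(text.length()); // prints 8
    }
}
```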

In addition, telnet can carry escape sequences that do things
like turn echo on/off and query the terminal type. So your echo
server will probably work in the sense that people would see what
they typed, but it would not be a "proper" telnet implementation.

kahiga said:
Why? Seems like it should, otherwise wouldn't it break the WORA
principle if the code has to be platform specific?

I think I figured out what Java is doing with readLine().

Case 1 - Line ends with a \n only.
Keep reading characters from the network and when you see a \n, return
the buffered characters as a string.

Case 2 - Line ends with a \r only.
Keep reading characters from the network and when you see a \r, return
the buffered characters as a string.

Case 3 - Line ends with a \r\n.
Keep reading characters from the network and when you see a \r, return
the buffered characters as a string. If the next character read from the
network is \n, discard it.

I think you are right - this describes readLine(). Case 3 is
really case 2 in disguise. All you need is a rule that says to
drop a '\n' if it immediately follows a '\r'.

Steve
 
Steve Horsley

kahiga said:
This might be fine for files and for the local cases it may also use
the info from the <line.separator> to determine the EOL format.
However, in the case of the network, the server cannot "look ahead" to
see the next character. It has to wait for the client to send it. My
original case was this; if the client so far has sent, for example,
"Hello world\r" and the server is blocking on the readLine() method.
How does it know to return the current string and not keep waiting to
receive the next "\n".

It can afford to return as soon as it sees the '\r'. It just has
to make a note that next time it is called, if the first
character out is a '\n' then this should be dropped.
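
The idea can be sketched as a tiny hand-rolled line reader. This is an illustration of the "make a note" approach, not the JDK's code: the class name LazyLfLineReader and its skipNextLf field are made up, and it reads one character at a time for clarity rather than speed.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class LazyLfLineReader {
    private final Reader in;
    // The "note": set when the previous line ended with '\r'.
    private boolean skipNextLf = false;

    LazyLfLineReader(Reader in) {
        this.in = in;
    }

    // Returns the next line without its terminator, or null at end of stream.
    String readLine() throws IOException {
        StringBuilder line = new StringBuilder();
        boolean sawAnything = false;
        int c;
        while ((c = in.read()) != -1) {
            if (skipNextLf) {
                skipNextLf = false;
                if (c == '\n') {
                    continue; // second half of a "\r\n" pair, drop it
                }
            }
            sawAnything = true;
            if (c == '\n') {
                return line.toString();
            }
            if (c == '\r') {
                skipNextLf = true; // return now, deal with a possible '\n' later
                return line.toString();
            }
            line.append((char) c);
        }
        return sawAnything ? line.toString() : null;
    }

    public static void main(String[] args) throws IOException {
        LazyLfLineReader r = new LazyLfLineReader(new StringReader("one\rtwo\r\nthree\n"));
        String line;
        while ((line = r.readLine()) != null) {
            System.out.println(line); // prints one, two, three on separate lines
        }
    }
}
```

The reader never has to wait for the character after a '\r'; the decision about a following '\n' is deferred to the next call.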

Steve
 
Roedy Green

The telnet RFC specifically says that Carriage Return '\r' MUST
be followed by either NewLine '\n' or Null 0x00, depending on
whether a line feed action is required in addition to the
carriage return. A CR-NULL implies that the current line will be
overwritten by the following line (or overtyped if printing).

That shows you how old the protocol must be. The null gives
additional time for the mechanical tty head to return to the left hand
side of the page.
 
Raymond DeCampo

Steve said:
It can afford to return as soon as it sees the '\r'. It just has to make
a note that next time it is called, if the first character out is a '\n'
then this should be dropped.

Exactly. There is not actually a need to "look ahead" at all.
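
One way to check this without a real socket is sketched below. OneChunkReader is a hypothetical stand-in that hands over "Hello world\r" once and throws where a real socket would block, so a readLine() that tried to look ahead for a '\n' would fail instead of hanging. This only shows the behaviour of the JDK you run it on.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

public class NoLookAheadDemo {
    // Hypothetical stand-in for a socket stream: delivers one chunk, then
    // fails loudly where a real socket would block waiting for more data.
    static class OneChunkReader extends Reader {
        private final char[] chunk;
        private boolean delivered = false;
        int readCalls = 0;

        OneChunkReader(String chunk) {
            this.chunk = chunk.toCharArray();
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            readCalls++;
            if (delivered) {
                throw new IOException("a real socket would block here");
            }
            delivered = true;
            int n = Math.min(len, chunk.length);
            System.arraycopy(chunk, 0, buf, off, n);
            return n;
        }

        @Override
        public void close() {
            // nothing to release
        }
    }

    // Reads a single line through a BufferedReader wrapped around the source.
    static String firstLine(OneChunkReader source) throws IOException {
        BufferedReader in = new BufferedReader(source);
        return in.readLine();
    }

    public static void main(String[] args) throws IOException {
        OneChunkReader source = new OneChunkReader("Hello world\r");
        String line = firstLine(source);
        System.out.println(line + " / reads: " + source.readCalls);
    }
}
```

If the line comes back after a single read of the underlying stream, readLine() returned on the '\r' alone.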


Ray
 
Pete Barrett

This might be fine for files and for the local cases it may also use
the info from the <line.separator> to determine the EOL format.
However, in the case of the network, the server cannot "look ahead" to
see the next character. It has to wait for the client to send it. My
original case was this; if the client so far has sent, for example,
"Hello world\r" and the server is blocking on the readLine() method.
How does it know to return the current string and not keep waiting to
receive the next "\n".

I don't think there's anything in the documentation to say that
readLine() MUST return as soon as the \r character is received. It
*could* wait until it can be sure whether there's a \n to follow the
\r, either because the next character has actually been received or
because the socket has closed. But that would be an implementation
detail, and others have suggested a better way of dealing with it.

As far as I can see, a worse problem arises in BufferedReader if the
buffer is full and doesn't contain either \r or \n - what on earth
does readLine() do then? I don't see anything in the documentation to
define what it does. If it doesn't expand the buffer, it can only
return the contents of the buffer as a String, which would hardly be
right.


Pete Barrett
 
Raymond DeCampo

Pete said:
I don't think there's anything in the documentation to say that
readLine() MUST return as soon as the \r character is received. It
*could* wait until it can be sure whether there's a \n to follow the
\r, either because the next character has actually been received or
because the socket has closed. But that would be an implementation
detail, and others have suggested a better way of dealing with it.

As far as I can see, a worse problem arises in BufferedReader if the
buffer is full and doesn't contain either \r or \n - what on earth
does readLine() do then? I don't see anything in the documentation to
define what it does. If it doesn't expand the buffer, it can only
return the contents of the buffer as a String, which would hardly be
right.

There's no need to expand the buffer, as in the buffer holding
characters yet to be read. BufferedReader can simply treat itself as a
client; once the readLine() method reads a character from the buffer
that space is available to receive characters from the underlying
stream. This means that there is a second buffer, in the form of a
StringBuffer or StringBuilder, which is local to readLine() and is
creating the String to be returned.
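
A quick way to test this is to give BufferedReader a deliberately tiny buffer and feed it a much longer line. The sketch below assumes an 8-character buffer and a 100-character line; both numbers and the class name LongLineDemo are arbitrary choices for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class LongLineDemo {
    // Reads the first line through a BufferedReader with the given buffer size.
    static String readWithTinyBuffer(String text, int bufferSize) throws IOException {
        try (BufferedReader in = new BufferedReader(new StringReader(text), bufferSize)) {
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a 100-character line, far longer than the 8-character buffer.
        StringBuilder longLine = new StringBuilder();
        for (int i = 0; i < 100; i++) {
            longLine.append('x');
        }
        String got = readWithTinyBuffer(longLine + "\r\nnext", 8);
        System.out.println(got.length()); // prints 100
    }
}
```

The whole line comes back intact, which is consistent with the line being accumulated somewhere other than the fixed-size input buffer.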

Note: This is all speculation, I haven't looked at the implementation of
readLine().

Ray
 
Roedy Green

Note: This is all speculation, I haven't looked at the implementation of
readLine().

Here is the main method behind BufferedReader.readLine(). It does not
return until it has hit EOL.

/**
 * Read a line of text. A line is considered to be terminated by any one
 * of a line feed ('\n'), a carriage return ('\r'), or a carriage return
 * followed immediately by a linefeed.
 *
 * @param ignoreLF If true, the next '\n' will be skipped
 *
 * @return A String containing the contents of the line, not including
 *         any line-termination characters, or null if the end of the
 *         stream has been reached
 *
 * @see java.io.LineNumberReader#readLine()
 *
 * @exception IOException If an I/O error occurs
 */
String readLine(boolean ignoreLF) throws IOException {
    StringBuffer s = null;
    int startChar;
    boolean omitLF = ignoreLF || skipLF;

    synchronized (lock) {
        ensureOpen();

    bufferLoop:
        for (;;) {

            if (nextChar >= nChars)
                fill();
            if (nextChar >= nChars) { /* EOF */
                if (s != null && s.length() > 0)
                    return s.toString();
                else
                    return null;
            }
            boolean eol = false;
            char c = 0;
            int i;

            /* Skip a leftover '\n', if necessary */
            if (omitLF && (cb[nextChar] == '\n'))
                nextChar++;
            skipLF = false;
            omitLF = false;

        charLoop:
            for (i = nextChar; i < nChars; i++) {
                c = cb[i];
                if ((c == '\n') || (c == '\r')) {
                    eol = true;
                    break charLoop;
                }
            }

            startChar = nextChar;
            nextChar = i;

            if (eol) {
                String str;
                if (s == null) {
                    str = new String(cb, startChar, i - startChar);
                } else {
                    s.append(cb, startChar, i - startChar);
                    str = s.toString();
                }
                nextChar++;
                if (c == '\r') {
                    skipLF = true;
                }
                return str;
            }

            if (s == null)
                s = new StringBuffer(defaultExpectedLineLength);
            s.append(cb, startChar, i - startChar);
        }
    }
}


/**
 * Fill the input buffer, taking the mark into account if it is valid.
 */
private void fill() throws IOException {
    int dst;
    if (markedChar <= UNMARKED) {
        /* No mark */
        dst = 0;
    } else {
        /* Marked */
        int delta = nextChar - markedChar;
        if (delta >= readAheadLimit) {
            /* Gone past read-ahead limit: Invalidate mark */
            markedChar = INVALIDATED;
            readAheadLimit = 0;
            dst = 0;
        } else {
            if (readAheadLimit <= cb.length) {
                /* Shuffle in the current buffer */
                System.arraycopy(cb, markedChar, cb, 0, delta);
                markedChar = 0;
                dst = delta;
            } else {
                /* Reallocate buffer to accommodate read-ahead limit */
                char ncb[] = new char[readAheadLimit];
                System.arraycopy(cb, markedChar, ncb, 0, delta);
                cb = ncb;
                markedChar = 0;
                dst = delta;
            }
            nextChar = nChars = delta;
        }
    }

    int n;
    do {
        n = in.read(cb, dst, cb.length - dst);
    } while (n == 0);
    if (n > 0) {
        nChars = dst + n;
        nextChar = dst;
    }
}
 
Scott Ellsworth

Steve Horsley said:
It can afford to return as soon as it sees the '\r'. It just has
to make a note that next time it is called, if the first
character out is a '\n' then this should be dropped.

This is certainly one way it _could_ be implemented, but earlier
versions of Java were not implemented this way. It was quite common to
see server software written that did a readLine() on a socket that
failed when run on a Mac, but that worked great on Windows.

Scott
 
Mike Schilling

kahiga said:
I see your point about using a predefined communication protocol, but
I was thinking at a simpler level, e.g. a simple Java server with a
telnet client, where the server's only purpose is to echo any line it
receives from the client. Obviously this is possible and it shouldn't
matter what platform the client is running on.

Why? Seems like it should, otherwise wouldn't it break the WORA
principle if the code has to be platform specific?

I think it's been explained why it won't. The WORA principle is an ideal,
not an absolute. Java creates an abstraction layer, and so long as you can
stay within that layer, WORA works reasonably well. Reading bytes from a
socket lives outside that layer, just as reading raw bytes from a disk
would.
 
Mike Schilling

Scott Ellsworth said:
This is certainly one way it _could_ be implemented, but earlier
versions of Java were not implemented this way. It was quite common to
see server software written that did a readLine() on a socket that
failed when run on a Mac, but that worked great on Windows.

I'm sorry to hear it was common to see server software written that did a
readLine().

One of the drawbacks of Java is that it provides a surface simplicity that
can disguise complex issues. It can fool people into thinking that building
a multi-threaded server is as simple as scattering some 'synchronized's
around, or that persistence can be addressed merely by declaring that some
classes implement Serializable.
 
