Java sockets and readLine

Discussion in 'Java' started by kahiga, Sep 17, 2005.

  1. kahiga

    kahiga Guest

    I have a conceptual Java question: how is Java able to return from the
    blocking readLine() after reading in a line ending in \n (*nix), \r\n
    (Win) or \r (Mac), specifically for an input stream coming from the
    network (a socket)? When you create a socket connection to another
    computer running UNIX, Mac or Windows, how does Java know what the line
    separator character is?

    The basic analysis I can think of is that once you call readLine(),
    which is a blocking I/O call, the JVM simply keeps reading
    incoming characters from the network until it detects a \r\n (for
    Windows) and then returns the string.
    Now if the computer is a Mac, which uses \r for EOL, how does Java know
    to stop waiting for the \n and just return the string, or am I
    misunderstanding the process?

    Any ideas are welcome.
    kahiga, Sep 17, 2005
    #1

  2. "kahiga" <> wrote in message
    news:...
    >I have a concept java question on how java is able to return from the
    > blocking readLine() after reading in a line ending in \n (*nix) or \r\n
    > (Win) or \r (Mac); Specifically for an inputstream coming from the
    > network (socket). When you create a socket connection to another
    > Computer running UNIX, MAC or Windows how does java know what the line
    > separator character is.
    >
    > The basic analysis I can think would be that once you call readLine()
    > which is a blocking IO process, the jvm simply keeps reading in
    > incoming characters from the network until it detects a \r\n (for
    > windows) and then returns with the string.
    > Now if the computer is a Mac with uses \r for EOL, how does java know
    > to stop waiting for the \n and just return the string, or am I
    > misunderstanding the process?
    >
    > Any Ideas are welcomed.


    Any socket-to-socket communications protocol has to be defined at the byte
    level. HTTP header lines, for instance, are defined always to end with CRLF
    regardless of which platform is being used. You're right that doing
    writeLine()s on a Mac expecting to be able to read the result with
    readLine()s on Windows won't work. Don't do that.
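
    A minimal sketch of what "defined at the byte level" can look like on the reading side, assuming a protocol whose lines always end with CR LF (as HTTP header lines do). The class name CrlfLines, the method readCrlfLine and the sample host example.org are made up for this illustration; a bare \n or \r is treated as ordinary data here, which is one possible policy and not the only one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads lines terminated by exactly CR LF, byte by byte.
final class CrlfLines {

    /** Returns the next CR LF terminated line as US-ASCII text, or null at end of stream. */
    static String readCrlfLine(InputStream in) {
        try {
            ByteArrayOutputStream line = new ByteArrayOutputStream();
            int prev = -1;
            int b;
            while ((b = in.read()) != -1) {
                if (prev == '\r' && b == '\n') {
                    byte[] bytes = line.toByteArray();
                    // The CR was buffered before the LF arrived, so drop it here.
                    return new String(bytes, 0, bytes.length - 1, StandardCharsets.US_ASCII);
                }
                line.write(b);
                prev = b;
            }
            // End of stream: hand back any unterminated tail, or null if there is none.
            return line.size() == 0
                    ? null
                    : new String(line.toByteArray(), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] wire = "Host: example.org\r\nbare\nstill same line\r\n"
                .getBytes(StandardCharsets.US_ASCII);
        InputStream in = new ByteArrayInputStream(wire);
        System.out.println(readCrlfLine(in)); // Host: example.org
        System.out.println(readCrlfLine(in).contains("\n")); // true: the lone \n is data
    }
}
```

    The point of the sketch is only that the terminator is chosen by the protocol, not by the platform the code happens to run on.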
    Mike Schilling, Sep 17, 2005
    #2

  3. Roedy Green

    Roedy Green Guest

    On 16 Sep 2005 16:13:43 -0700, "kahiga" <> wrote
    or quoted :

    >When you create a socket connection to another
    >Computer running UNIX, MAC or Windows how does java know what the line
    >separator character is.


    readLine seems to be smart and works no matter what the line separator
    is. You can experiment with a file using different line separators and
    it reads them all fine.
    --
    Canadian Mind Products, Roedy Green.
    http://mindprod.com Again taking new Java programming contracts.
    Roedy Green, Sep 17, 2005
    #3
  4. kahiga

    kahiga Guest

    I see your point about using a predefined communication protocol, but
    I was thinking at a simpler level, e.g. a simple Java server with a
    telnet client, where the server's only purpose is to echo any line it
    receives from the client. Obviously this is possible and it shouldn't
    matter what platform the client is running on.

    >You're right that doing writeLine()s on a Mac expecting to be able to read
    >the result with readLine()s on Windows won't work.

    Why? Seems like it should, otherwise wouldn't it break the WORA
    principle if the code has to be platform specific?

    I think I figured out what java is doing with readLine().

    Case 1 - Line ends with a \n only.
    Keep reading characters from network and when you see a \n, return the
    buffered characters as a string.

    Case2 - Line ends with a \r only.
    Keep reading characters from network and when you see a \r, return the
    buffered characters as a string.

    Case3 - Line ends with a \r\n.
    Keep reading characters from network and when you see a \r, return the
    buffered characters as a string. If the next character read from the
    network is \n, discard it.
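
    The three cases can be sketched as a tiny hand-rolled reader. This is an illustration under stated assumptions and not the JDK's code: the class SimpleLineReader and the helper readAll are hypothetical names, and the sketch reads one character at a time for clarity rather than speed.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader that accepts \n, \r and \r\n as line terminators.
final class SimpleLineReader {
    private final Reader in;
    private boolean skipLF; // true when the previous line ended with '\r'

    SimpleLineReader(Reader in) {
        this.in = in;
    }

    /** Returns the next line without its terminator, or null at end of stream. */
    String readLine() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (skipLF) {
                skipLF = false;
                if (c == '\n') {
                    continue; // case 3: second half of \r\n, discard it
                }
            }
            if (c == '\n') {
                return sb.toString(); // case 1
            }
            if (c == '\r') {
                skipLF = true;        // case 2, and the first half of case 3
                return sb.toString();
            }
            sb.append((char) c);
        }
        return sb.length() > 0 ? sb.toString() : null;
    }

    /** Convenience for experiments: splits an in-memory string into lines. */
    static List<String> readAll(String text) {
        List<String> lines = new ArrayList<>();
        try {
            SimpleLineReader reader = new SimpleLineReader(new StringReader(text));
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(readAll("a\nb\rc\r\nd")); // [a, b, c, d]
    }
}
```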
    kahiga, Sep 17, 2005
    #4
  5. Pete Barrett

    Pete Barrett Guest

    On 16 Sep 2005 16:13:43 -0700, "kahiga" <> wrote:

    >I have a concept java question on how java is able to return from the
    >blocking readLine() after reading in a line ending in \n (*nix) or \r\n
    >(Win) or \r (Mac); Specifically for an inputstream coming from the
    >network (socket). When you create a socket connection to another
    >Computer running UNIX, MAC or Windows how does java know what the line
    >separator character is.
    >
    >The basic analysis I can think would be that once you call readLine()
    >which is a blocking IO process, the jvm simply keeps reading in
    >incoming characters from the network until it detects a \r\n (for
    >windows) and then returns with the string.
    >Now if the computer is a Mac with uses \r for EOL, how does java know
    >to stop waiting for the \n and just return the string, or am I
    >misunderstanding the process?
    >
    >Any Ideas are welcomed.


    The documentation for BufferedReader says:

    "Read a line of text. A line is considered to be terminated by any one
    of a line feed ('\n'), a carriage return ('\r'), or a carriage return
    followed immediately by a linefeed."

    That seems fairly clear. Since the input is buffered, it can afford to
    look ahead to the next character if it reads a carriage return.


    Pete Barrett
    Pete Barrett, Sep 17, 2005
    #5
  6. kahiga wrote:
    > I see you're point about using a predefined protocol communication but
    > I was thinking at a simpler level e.g. a simple java server with a
    > telnet client whose only purpose (the server) is to echo any line it
    > receives from the client. Obviously this is possible and it shouldn't
    > matter what platform the client is running on.


    Telnet is a predefined protocol. It defines a Network Virtual Terminal
    (NVT). Unless the BINARY option is negotiated, the default end of line
    is CR LF. Other protocols use a similar convention.

    http://www.ietf.org/rfc/rfc0854.txt

    TCP itself really does just give you octet streams.

    Tom Hawtin
    --
    Unemployed English Java programmer
    http://jroller.com/page/tackline/
    Thomas Hawtin, Sep 17, 2005
    #6
  7. kahiga

    kahiga Guest

    > Telnet is a predefined protocol. It defines a Network Virtual Terminal
    > (NVT). Unless the BINARY option is negotiated, the default end of line
    > is CR LF. Other protocols use a similar convention.
    >
    > http://www.ietf.org/rfc/rfc0854.txt


    True indeed (Wasn't aware of the rfc). I also found this article that
    gave a more user friendly description of the "Telnet EOL convention":
    http://www.freesoft.org/CIE/RFC/1123/31.htm.

    I guess my example of a client was flawed, but what I was trying to
    specify was a client using a non-predefined protocol. Maybe a better
    example would be a custom java client that only sends lines of text to
    the server, while the server locally echoes each line of text and
    uses readLine() to read the text from the client.

    I created a sample Java client and sent 3 lines ending in different
    EOLs:
    "Hello world\r"
    "Hello world\n"
    "Hello world\r\n"
    And the server was able to read all these lines correctly using
    readLine().
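
    A sketch of that experiment that runs without a network, assuming a StringReader is an acceptable stand-in for the socket's input stream. The class name MixedEolDemo and the helper readLines are invented for the example; only BufferedReader.readLine() is the real API under test.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: feeds three differently terminated lines to BufferedReader.
final class MixedEolDemo {

    /** Reads every line that BufferedReader.readLine() finds in the given text. */
    static List<String> readLines(String wireText) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(wireText))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // The same three payloads as in the post, concatenated as they would arrive.
        String wire = "Hello world\r" + "Hello world\n" + "Hello world\r\n";
        System.out.println(readLines(wire)); // [Hello world, Hello world, Hello world]
    }
}
```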
    kahiga, Sep 17, 2005
    #7
  8. kahiga

    kahiga Guest

    > The documentation for BufferedReader says:
    >
    > "Read a line of text. A line is considered to be terminated by any one
    > of a line feed ('\n'), a carriage return ('\r'), or a carriage return
    > followed immediately by a linefeed."
    >
    > That seems fairly clear. Since the input is buffered, it can afford to
    > look ahead to the next character if it reads a carriage return.


    This might be fine for files, and for the local cases it may also use
    the info from <line.separator> to determine the EOL format.
    However, in the case of the network, the server cannot "look ahead" to
    see the next character. It has to wait for the client to send it. My
    original case was this: suppose the client has so far sent, for example,
    "Hello world\r" and the server is blocking in the readLine() method.
    How does it know to return the current string and not keep waiting to
    receive the next "\n"?
    kahiga, Sep 17, 2005
    #8
  9. Chris Uppal

    Chris Uppal Guest

    kahiga wrote:

    > if the client so far has sent, for example,
    > "Hello world\r" and the server is blocking on the readLine() method.
    > How does it know to return the current string and not keep waiting to
    > receive the next "\n".


    It doesn't. That's why protocols (including ones you create yourself) should
    specify exactly what gets written on the wire, and why platform-specific
    shortcuts like println() should not be used in their implementation.
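
    On the writing side, a minimal sketch of spelling out the terminator instead of relying on println(), assuming the home-made protocol chooses CR LF and US-ASCII. The names WireWriter, EOL, sendLine and encodeLine are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical sender that always emits the same bytes on every platform.
final class WireWriter {
    /** Assumption: this protocol terminates lines with CR LF. */
    static final String EOL = "\r\n";

    /** Writes one protocol line with an explicit terminator and flushes it. */
    static void sendLine(Writer out, String text) throws IOException {
        out.write(text);
        out.write(EOL); // explicit, unlike println(), which uses the platform separator
        out.flush();
    }

    /** Shows the exact bytes that sendLine would put on the wire. */
    static byte[] encodeLine(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, StandardCharsets.US_ASCII)) {
            sendLine(w, text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] wire = encodeLine("HELLO");
        System.out.println(wire.length); // 7: five letters, then 13 and 10
    }
}
```

    With a real socket, the Writer would wrap socket.getOutputStream() with the same explicit charset.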

    -- chris
    Chris Uppal, Sep 17, 2005
    #9
  10. kahiga wrote:
    > I see you're point about using a predefined protocol communication but
    > I was thinking at a simpler level e.g. a simple java server with a
    > telnet client whose only purpose (the server) is to echo any line it
    > receives from the client. Obviously this is possible and it shouldn't
    > matter what platform the client is running on.
    >


    The telnet RFC specifically says that Carriage Return '\r' MUST
    be followed by either NewLine '\n' or Null 0x00, depending on
    whether a line feed action is required in addition to the
    carriage return. A CR-NULL implies that the current line will be
    overwritten by the following line (or overtyped if printing).
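
    A rough sketch of that end-of-line rule, assuming the input has already been stripped of telnet commands and option negotiation (which this sketch does not handle at all). NvtEol and decode are made-up names: CR LF becomes a single '\n', and CR NUL becomes a bare '\r'.

```java
// Hypothetical decoder for the NVT end-of-line convention only.
final class NvtEol {

    /** Maps CR LF to '\n' and CR NUL to '\r'; every other character passes through. */
    static String decode(String nvt) {
        StringBuilder out = new StringBuilder(nvt.length());
        for (int i = 0; i < nvt.length(); i++) {
            char c = nvt.charAt(i);
            if (c == '\r' && i + 1 < nvt.length()) {
                char next = nvt.charAt(i + 1);
                if (next == '\n') {
                    out.append('\n'); // CR LF: new line
                    i++;
                    continue;
                }
                if (next == '\0') {
                    out.append('\r'); // CR NUL: carriage return only
                    i++;
                    continue;
                }
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String decoded = decode("ab\r\ncd\r\0ef");
        System.out.println(decoded.equals("ab\ncd\ref")); // true
    }
}
```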

    In addition, telnet can carry escape sequences that do things
    like turn echo on/off and query the terminal type. So your echo
    server will probably work in the sense that people would see what
    they typed, but it would not be a "proper" telnet implementation.


    >> You're right that doing writeLine()s on a Mac expecting to be able to read
    >> the result with readLine()s on Windows won't work.

    > Why? Seems like it should, otherwise wouldn't it break the WORA
    > principle if the code has to be platform specific?
    >
    > I think I figured out what java is doing with readLine().
    >
    > Case 1 - Line ends with a \n only.
    > Keep reading characters from network and when you see a \n, return the
    > buffered characters as a string.
    >
    > Case2 - Line ends with a \r only.
    > Keep reading characters from network and when you see a \r, return the
    > buffered characters as a string.
    >
    > Case3 - Line ends with a \r\n.
    > Keep reading characters from network and when you see a \r, return the
    > buffered characters as a string. If the next character read from the
    > network in \n, discard it.
    >

    I think you are right - this describes readLine(). Case 3 is
    really case 2 in disguise. All you need is a rule that says to
    drop a '\n' if it immediately follows a '\r'.

    Steve
    Steve Horsley, Sep 17, 2005
    #10
  11. kahiga wrote:
    >
    > This might be fine for files and for the local cases it may also use
    > the info from the <line.separator> to determine the EOL format.
    > However, in the case of the network, the server cannot "look ahead" to
    > see the next character. It has to wait for the client to send it. My
    > original case was this; if the client so far has sent, for example,
    > "Hello world\r" and the server is blocking on the readLine() method.
    > How does it know to return the current string and not keep waiting to
    > receive the next "\n".
    >


    It can afford to return as soon as it sees the '\r'. It just has
    to make a note that next time it is called, if the first
    character out is a '\n' then this should be dropped.
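
    One way to check this without a network, sketched under the assumption that the JDK in use defers the '\n' check to the next call: a Reader that delivers "Hello world\r" once and then throws where a socket would block. OneShotReader, NoLookAheadDemo and firstLine are hypothetical names. If readLine() needed to look ahead, the second read would be attempted and the exception would surface.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical test harness: proves readLine() returns without asking for more data.
final class NoLookAheadDemo {

    /** Hands out its text once; a second read stands in for "would block on the socket". */
    static final class OneShotReader extends Reader {
        private final char[] data;
        private boolean delivered;

        OneShotReader(String text) {
            this.data = text.toCharArray();
        }

        @Override
        public int read(char[] buf, int off, int len) {
            if (delivered) {
                throw new IllegalStateException("would block: no more data has arrived");
            }
            delivered = true;
            int n = Math.min(len, data.length);
            System.arraycopy(data, 0, buf, off, n);
            return n;
        }

        @Override
        public void close() {
            // Nothing to release in this in-memory stand-in.
        }
    }

    /** Returns the first line of whatever has "arrived" so far. */
    static String firstLine(String arrived) {
        try {
            // Deliberately not closed: the simulated connection stays open.
            BufferedReader reader = new BufferedReader(new OneShotReader(arrived));
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstLine("Hello world\r")); // Hello world
    }
}
```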

    Steve
    Steve Horsley, Sep 17, 2005
    #11
  12. Roedy Green

    Roedy Green Guest

    On Sat, 17 Sep 2005 13:08:44 +0100, Steve Horsley
    <> wrote or quoted :

    >The telnet RFC specifically says that Carriage Return '\r' MUST
    >be followed by either NewLine '\n' or Null 0x00, depending on
    >whether a line feed action is required in addition to the
    >carriage return. A CR-NULL implies that the current line will be
    >overwritten by the following line (or overtyped if printing).


    That shows you how old the protocol must be. The null gives
    additional time for the mechanical tty head to return to the left hand
    side of the page.
    --
    Canadian Mind Products, Roedy Green.
    http://mindprod.com Again taking new Java programming contracts.
    Roedy Green, Sep 17, 2005
    #12
  13. Steve Horsley wrote:
    > kahiga wrote:
    >
    >>
    >> This might be fine for files and for the local cases it may also use
    >> the info from the <line.separator> to determine the EOL format.
    >> However, in the case of the network, the server cannot "look ahead" to
    >> see the next character. It has to wait for the client to send it. My
    >> original case was this; if the client so far has sent, for example,
    >> "Hello world\r" and the server is blocking on the readLine() method.
    >> How does it know to return the current string and not keep waiting to
    >> receive the next "\n".
    >>

    >
    > It can afford to return as soon as it sees the '\r'. It just has to make
    > a note that next time it is called, if the first character out is a '\n'
    > then this should be dropped.
    >


    Exactly. There is not actually a need to "look ahead" at all.


    Ray

    --
    XML is the programmer's duct tape.
    Raymond DeCampo, Sep 17, 2005
    #13
  14. Pete Barrett

    Pete Barrett Guest

    On 17 Sep 2005 03:04:44 -0700, "kahiga" <> wrote:

    >This might be fine for files and for the local cases it may also use
    >the info from the <line.separator> to determine the EOL format.
    >However, in the case of the network, the server cannot "look ahead" to
    >see the next character. It has to wait for the client to send it. My
    >original case was this; if the client so far has sent, for example,
    >"Hello world\r" and the server is blocking on the readLine() method.
    >How does it know to return the current string and not keep waiting to
    >receive the next "\n".


    I don't think there's anything in the documentation to say that
    readLine MUST return as soon as the \r character is received. It
    *could* wait until it can be sure, either because the next character
    has actually been received or because the socket has closed, whether
    there's a \n to follow the \r. But that would be an implementation
    detail, and others have suggested a better way of dealing with it.

    As far as I can see, a worse problem arises in BufferedReaders if the
    buffer is full and doesn't contain either \r or \n: what on earth
    does readLine do then? I don't see anything in the documentation to
    define what it does. If it doesn't expand the buffer, it can only
    return the contents of the buffer as a String, which would hardly be
    right.


    Pete Barrett
    Pete Barrett, Sep 18, 2005
    #14
  15. Pete Barrett wrote:
    > On 17 Sep 2005 03:04:44 -0700, "kahiga" <> wrote:
    >
    >
    >>This might be fine for files and for the local cases it may also use
    >>the info from the <line.separator> to determine the EOL format.
    >>However, in the case of the network, the server cannot "look ahead" to
    >>see the next character. It has to wait for the client to send it. My
    >>original case was this; if the client so far has sent, for example,
    >>"Hello world\r" and the server is blocking on the readLine() method.
    >>How does it know to return the current string and not keep waiting to
    >>receive the next "\n".

    >
    >
    > I don't think there's anything in the documentation to say that
    > readLine MUST return as soons as the \r character is received? It
    > *could* wait until it can be sure, either because the next character
    > has been actually been received or the socket has closed, whether
    > there's a \n to follow the \r. But that would be an implementation
    > detail, and others have suggested a better way of dealing with it.
    >
    > As far as I can see, a worse problem arises in BufferedReaderS if the
    > buffer is full and doesn't contain either \r or \n - what on earth
    > does readLine do then? I don't see anything in the documentation to
    > define what it does. If it doesn't expand the buffer, it can only
    > return the contents of the buffer as a String, which would hardly be
    > right.
    >


    There's no need to expand the buffer, as in the buffer holding
    characters yet to be read. BufferedReader can simply treat itself as a
    client; once the readLine() method reads a character from the buffer
    that space is available to receive characters from the underlying
    stream. This means that there is a second buffer, in the form of a
    StringBuffer or StringBuilder, which is local to readLine() and is
    creating the String to be returned.

    Note: This is all speculation, I haven't looked at the implementation of
    readLine().
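
    A quick way to test the speculation, assuming a deliberately tiny buffer is a fair stand-in for "the buffer is full": build the BufferedReader with a 4-character buffer and read a line much longer than that. TinyBufferDemo and readFirstLine are invented names; the two-argument BufferedReader constructor is the standard one that takes a buffer size.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo: a line longer than the internal buffer still comes back whole.
final class TinyBufferDemo {

    /** Reads the first line of the text through a BufferedReader of the given size. */
    static String readFirstLine(String text, int bufferSize) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text), bufferSize)) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "abcdefghijklmnopqrstuvwxyz\nnext";
        // 26 characters pass through a 4-character buffer in several refills.
        System.out.println(readFirstLine(text, 4)); // abcdefghijklmnopqrstuvwxyz
    }
}
```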

    Ray

    --
    XML is the programmer's duct tape.
    Raymond DeCampo, Sep 18, 2005
    #15
  16. Roedy Green

    Roedy Green Guest

    On Sun, 18 Sep 2005 13:17:34 GMT, Raymond DeCampo
    <> wrote or quoted :

    >Note: This is all speculation, I haven't looked at the implementation of
    >readLine().


    Here is the main method behind BufferedReader.readLine. It does not
    return until it has hit EOL.

    /**
     * Read a line of text. A line is considered to be terminated by any one
     * of a line feed ('\n'), a carriage return ('\r'), or a carriage return
     * followed immediately by a linefeed.
     *
     * @param ignoreLF If true, the next '\n' will be skipped
     *
     * @return A String containing the contents of the line, not including
     *         any line-termination characters, or null if the end of the
     *         stream has been reached
     *
     * @see java.io.LineNumberReader#readLine()
     *
     * @exception IOException If an I/O error occurs
     */
    String readLine(boolean ignoreLF) throws IOException {
        StringBuffer s = null;
        int startChar;
        boolean omitLF = ignoreLF || skipLF;

        synchronized (lock) {
            ensureOpen();

        bufferLoop:
            for (;;) {

                if (nextChar >= nChars)
                    fill();
                if (nextChar >= nChars) { /* EOF */
                    if (s != null && s.length() > 0)
                        return s.toString();
                    else
                        return null;
                }
                boolean eol = false;
                char c = 0;
                int i;

                /* Skip a leftover '\n', if necessary */
                if (omitLF && (cb[nextChar] == '\n'))
                    nextChar++;
                skipLF = false;
                omitLF = false;

            charLoop:
                for (i = nextChar; i < nChars; i++) {
                    c = cb[i];
                    if ((c == '\n') || (c == '\r')) {
                        eol = true;
                        break charLoop;
                    }
                }

                startChar = nextChar;
                nextChar = i;

                if (eol) {
                    String str;
                    if (s == null) {
                        str = new String(cb, startChar, i - startChar);
                    } else {
                        s.append(cb, startChar, i - startChar);
                        str = s.toString();
                    }
                    nextChar++;
                    if (c == '\r') {
                        skipLF = true;
                    }
                    return str;
                }

                if (s == null)
                    s = new StringBuffer(defaultExpectedLineLength);
                s.append(cb, startChar, i - startChar);
            }
        }
    }


    /**
     * Fill the input buffer, taking the mark into account if it is valid.
     */
    private void fill() throws IOException {
        int dst;
        if (markedChar <= UNMARKED) {
            /* No mark */
            dst = 0;
        } else {
            /* Marked */
            int delta = nextChar - markedChar;
            if (delta >= readAheadLimit) {
                /* Gone past read-ahead limit: Invalidate mark */
                markedChar = INVALIDATED;
                readAheadLimit = 0;
                dst = 0;
            } else {
                if (readAheadLimit <= cb.length) {
                    /* Shuffle in the current buffer */
                    System.arraycopy(cb, markedChar, cb, 0, delta);
                    markedChar = 0;
                    dst = delta;
                } else {
                    /* Reallocate buffer to accommodate read-ahead limit */
                    char ncb[] = new char[readAheadLimit];
                    System.arraycopy(cb, markedChar, ncb, 0, delta);
                    cb = ncb;
                    markedChar = 0;
                    dst = delta;
                }
                nextChar = nChars = delta;
            }
        }

        int n;
        do {
            n = in.read(cb, dst, cb.length - dst);
        } while (n == 0);
        if (n > 0) {
            nChars = dst + n;
            nextChar = dst;
        }
    }

    --
    Canadian Mind Products, Roedy Green.
    http://mindprod.com Again taking new Java programming contracts.
    Roedy Green, Sep 18, 2005
    #16
  17. In article <dgh0ve$ipp$2surf.net>,
    Steve Horsley <> wrote:

    > kahiga wrote:
    > >
    > > This might be fine for files and for the local cases it may also use
    > > the info from the <line.separator> to determine the EOL format.
    > > However, in the case of the network, the server cannot "look ahead" to
    > > see the next character. It has to wait for the client to send it. My
    > > original case was this; if the client so far has sent, for example,
    > > "Hello world\r" and the server is blocking on the readLine() method.
    > > How does it know to return the current string and not keep waiting to
    > > receive the next "\n".
    > >

    >
    > It can afford to return as soon as it sees the '\r'. It just has
    > to make a note that next time it is called, if the first
    > character out is a '\n' then this should be dropped.


    This is certainly one way it _could_ be implemented, but earlier
    versions of Java were not implemented this way. It was quite common to
    see server software written that did a readLine() on a socket that
    failed when run on a Mac, but that worked great on Windows.

    Scott

    --
    Scott Ellsworth

    Java and database consulting for the life sciences
    Scott Ellsworth, Sep 19, 2005
    #17
  18. "kahiga" <> wrote in message
    news:...
    >I see you're point about using a predefined protocol communication but
    > I was thinking at a simpler level e.g. a simple java server with a
    > telnet client whose only purpose (the server) is to echo any line it
    > receives from the client. Obviously this is possible and it shouldn't
    > matter what platform the client is running on.
    >
    >>You're right that doing writeLine()s on a Mac expecting to be able to read
    >>the result with readLine()s on Windows won't work.

    > Why? Seems like it should, otherwise wouldn't it break the WORA
    > principle if the code has to be platform specific?


    I think it's been explained why it won't. The WORA principle is an ideal,
    not an absolute. Java creates an abstraction layer, and so long as you can
    stay within that layer, WORA works reasonably well. Reading bytes from a
    socket lives outside that layer, just as reading raw bytes from a disk
    would.
    Mike Schilling, Sep 19, 2005
    #18
  19. "Scott Ellsworth" <> wrote in message
    news:...
    > In article <dgh0ve$ipp$2surf.net>,
    > Steve Horsley <> wrote:
    >
    >> kahiga wrote:
    >> >
    >> > This might be fine for files and for the local cases it may also use
    >> > the info from the <line.separator> to determine the EOL format.
    >> > However, in the case of the network, the server cannot "look ahead" to
    >> > see the next character. It has to wait for the client to send it. My
    >> > original case was this; if the client so far has sent, for example,
    >> > "Hello world\r" and the server is blocking on the readLine() method.
    >> > How does it know to return the current string and not keep waiting to
    >> > receive the next "\n".
    >> >

    >>
    >> It can afford to return as soon as it sees the '\r'. It just has
    >> to make a note that next time it is called, if the first
    >> character out is a '\n' then this should be dropped.

    >
    > This is certainly one way it _could_ be implemented, but earlier
    > versions of Java were not implemented this way. It was quite common to
    > see server software written that did a readLine() on a socket that
    > failed when run on a Mac, but that worked great on Windows.


    I'm sorry to hear it was common to see server software written that did a
    readLine().

    One of the drawbacks of Java is that it provides a surface simplicity that
    can disguise complex issues. It can fool people into thinking that building
    a multi-threaded server is as simple as scattering some 'synchronized's
    around, or that persistence can be addressed merely by declaring that some
    classes implement Serializable.
    Mike Schilling, Sep 22, 2005
    #19