Re: Java I/O for Networking

Discussion in 'Java' started by Robert Olofsson, Aug 22, 2003.

  1. Erwin () wrote:
    : When trying to communicate with a Web server, using a socket to send and
    : receive data, I am confused about which method is best for sending and
    : receiving data, as there are plenty of methods and classes to choose
    : from.

    Talking to a web server is done with HTTP, so I would suggest that you
    start by reading RFC 2616 so you know what you need.
    Basically, HTTP uses an 8-bit character set. This means that if you send
    data with a Writer you will (or may) send Unicode to the server, and
    some characters may take several bytes, so the server may get confused.
    This is why I suggest that you use byte buffers and write bytes to the
    server (that is, do _not_ use the print/println methods).
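A minimal sketch of that advice, assuming a plain HTTP/1.1 GET: the request header is assembled as text, turned into bytes explicitly with ISO-8859-1 (one byte per char, so nothing is silently expanded into multibyte sequences), and written to the socket's OutputStream with write() rather than print()/println(). The class and method names (RawHttpRequest, buildGet, send) are hypothetical, and StandardCharsets needs Java 7 or later.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class RawHttpRequest {

    // Builds a minimal HTTP/1.1 GET request (hypothetical helper). Every
    // header line ends in CRLF and an empty line ends the header block.
    static byte[] buildGet(String host, String path) {
        String request = "GET " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Connection: close\r\n"
                + "\r\n";
        // ISO-8859-1 maps each char 0-255 to exactly one byte; chars above
        // 255 would be replaced with '?', so keep the header plain ASCII.
        return request.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Writes the request bytes to an already-connected socket.
    static void send(Socket socket, String host, String path) throws IOException {
        OutputStream out = socket.getOutputStream();
        out.write(buildGet(host, path));
        out.flush();
    }
}
```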

    As for the body of the request or response, it can be treated as
    binary unless you plan to interpret it (in which case you already know
    how to parse it).

    If all you want to do is talk to a web server, then starting with
    java.net.URLConnection may be enough. Beyond that, there is some code
    in the Apache Jakarta Commons package. I have also written a web proxy
    that supports full HTTP/1.1, including caching, which can be found at
    http://www.khelekore.org/rabbit/.
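If java.net.URLConnection is enough, fetching a resource can be as short as the sketch below, which opens the connection and reads the body as raw bytes, in keeping with treating it as binary. The class name UrlFetch is hypothetical. For an http: URL the connection object is an HttpURLConnection, but the same code works for any protocol URLConnection supports (file: URLs included).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

class UrlFetch {

    // Reads the whole body behind the given URL as raw bytes.
    static byte[] fetch(URL url) throws IOException {
        URLConnection connection = url.openConnection();
        try (InputStream in = connection.getInputStream()) {
            ByteArrayOutputStream body = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                body.write(buffer, 0, n); // keep only the bytes actually read
            }
            return body.toByteArray();
        }
    }
}
```

Usage would be along the lines of `UrlFetch.fetch(new URL("http://example.com/"))`; decoding the bytes into text is left to the caller, who knows the charset from the response headers.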

    /robo
    Robert Olofsson, Aug 22, 2003
    #1

  2. Jon A. Cruz Guest

    Robert Olofsson wrote:

    > Basically HTTP uses 8-bit character set.


    No.

    HTTP itself uses a 7-bit character set for the HTTP-specific parts
    (the headers).

    The body, on the other hand, carries 8-bit bytes.

    So, two things. First, HTTP itself deals only with 7-bit characters.
    Second, a "byte" is not the same as a "char". They are two different
    concepts that are sometimes mapped so that they overlap, but they are
    still different things.
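The byte-versus-char distinction is easy to see in Java: a String is a sequence of chars, and how many bytes it becomes depends on the encoding chosen. The sketch below (class and method names hypothetical) compares a string's encoded length under different charsets and checks whether text stays within 7-bit ASCII, as header text should.

```java
import java.nio.charset.Charset;

class CharsVersusBytes {

    // Number of bytes the string occupies once encoded with the charset.
    // Unmappable chars are replaced (e.g. with '?'), not reported.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    // True if every char fits in 7 bits, i.e. the text is plain US-ASCII.
    static boolean isSevenBit(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) > 0x7F) {
                return false;
            }
        }
        return true;
    }
}
```

For "caf\u00e9" ("café"), length() is 4 chars; encoded as ISO-8859-1 it is 4 bytes, but as UTF-8 it is 5 bytes, because é needs two bytes. Encoded as US-ASCII it is also 4 bytes, but the é is silently replaced by '?'.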


    > This means that if you send
    > data with a writer you will(/may) send unicode to the server and that
    > may be data that takes several bytes => server may get confused...
    > This means that I have to suggest that you use byte buffers and write
    > to the server (that is do _not_ use the print/println methods).


    There is a better way.

    When sending text types (any MIME type starting with "text/"), use an
    explicit character encoding when creating the OutputStreamWriter, and
    be sure to name that encoding, using its HTTP charset name, in the HTTP
    header that gets sent out.

    // Name the same charset ("UTF-8") in the Content-Type header.
    PrintWriter sending = new PrintWriter(new OutputStreamWriter(
            connection.getOutputStream(), "UTF-8"));
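A sketch of the whole pattern, assuming a simple POST with a text/plain body (the class, method and parameter names are hypothetical): one Charset object both encodes the body and is named in the Content-Type header, and Content-Length is computed from the encoded bytes rather than from String.length(). The header goes through a PrintWriter as suggested; the already-encoded body is then written as bytes.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class TextPost {

    // Writes a POST request whose text body is encoded in UTF-8 and
    // labelled as such in the Content-Type header.
    static void writePost(OutputStream out, String host, String path,
                          String body) throws IOException {
        Charset charset = StandardCharsets.UTF_8;
        byte[] bodyBytes = body.getBytes(charset);

        // Header lines are plain ASCII, which UTF-8 encodes unchanged.
        // The writer is not closed, since that would close 'out' as well.
        PrintWriter header = new PrintWriter(new OutputStreamWriter(out, charset));
        header.print("POST " + path + " HTTP/1.1\r\n");
        header.print("Host: " + host + "\r\n");
        header.print("Content-Type: text/plain; charset=" + charset.name() + "\r\n");
        header.print("Content-Length: " + bodyBytes.length + "\r\n");
        header.print("\r\n");
        header.flush(); // push the header bytes out before the body

        out.write(bodyBytes);
        out.flush();
    }
}
```

Note that PrintWriter swallows IOExceptions; a real client would call checkError() after flushing the header.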

    Do not use println(), as it uses the platform's default line ending
    (that of the client computer). Use print() and write the appropriate
    line ending yourself (usually either "\r\n" or "\n").
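PrintWriter.println() appends the line.separator system property, which is "\n" on Unix-like systems and "\r\n" on Windows, whereas print() followed by an explicit "\r\n" produces the same output everywhere. A small sketch of a header writer that does this (the class and method names are hypothetical):

```java
import java.io.PrintWriter;

class HeaderLines {

    private static final String CRLF = "\r\n";

    // Writes each header line followed by an explicit CRLF, then the blank
    // line that ends the header block. println() is deliberately avoided.
    static void writeHeaders(PrintWriter out, String... lines) {
        for (String line : lines) {
            out.print(line);
            out.print(CRLF);
        }
        out.print(CRLF);
        out.flush();
    }
}
```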


    For binary (non-text) types, do not use a Writer; write the bytes
    directly to the OutputStream with write().
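For a binary body the data never passes through a Writer. A minimal copy loop (class and method names hypothetical) reads into a byte array and hands exactly the bytes read to write(buffer, 0, n), so the octets arrive unchanged:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class BinaryCopy {

    // Copies all bytes from in to out unchanged and returns the byte count.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n); // write only the bytes actually read
            total += n;
        }
        out.flush();
        return total;
    }
}
```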
    Jon A. Cruz, Aug 23, 2003
    #2

  3. Jon A. Cruz () wrote:
    : Robert Olofsson wrote:

    : > Basically HTTP uses 8-bit character set.
    : No.
    : HTTP itself uses a 7-bit character set for the HTTP specific stuff (headers)

    Check RFC 2616 (http://www.ietf.org/rfc/rfc2616.txt) and you will find
    this (only a few lines; read the whole text for the missing pieces):

    OCTET = <any 8-bit sequence of data>
    TEXT = <any OCTET except CTLs, but including LWS>
    quoted-string = ( <"> *(qdtext | quoted-pair ) <"> )
    qdtext = <any TEXT except <">>
    message-header = field-name ":" [ field-value ]
    field-name = token
    field-value = *( field-content | LWS )
    field-content = <the OCTETs making up the field-value
                    and consisting of either *TEXT or combinations
                    of token, separators, and quoted-string>


    So I would say that HTTP uses an 8-bit character set. All commonly
    used headers are in 7-bit ASCII, but extensions may use 8-bit
    characters and are allowed to do so.
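If header fields may carry arbitrary 8-bit OCTETs, as the grammar above allows, one way to read them without losing anything is to decode the raw header bytes as ISO-8859-1, which maps each of the 256 byte values to exactly one char. The sketch below (class and method names hypothetical) splits one header line at the first colon into field-name and field-value; it does not handle the folded continuation lines that RFC 2616 also permits.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.Map;

class HeaderField {

    // Parses one unfolded header line (without its CRLF) given as raw
    // octets. ISO-8859-1 keeps every octet as one char, so bytes above
    // 0x7F survive and can be re-encoded unchanged.
    static Map.Entry<String, String> parse(byte[] line) {
        String text = new String(line, StandardCharsets.ISO_8859_1);
        int colon = text.indexOf(':');
        if (colon <= 0) {
            throw new IllegalArgumentException("not a header line: " + text);
        }
        String name = text.substring(0, colon);
        // Leading and trailing whitespace is not part of the field-content.
        String value = text.substring(colon + 1).trim();
        return new AbstractMap.SimpleImmutableEntry<>(name, value);
    }
}
```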

    : Then it transports 8-bit bytes in the body.

    : So two things. One is that HTTP itself deals with 7-bit characters only.
    : The other that a "byte" is not identical to "char". Those are two
    : different concepts that are sometimes mapped so that they overlap, but
    : are still different things.

    Correct. A byte is the smallest addressable unit and may be 8, 12, 16
    or even 36 bits depending on the hardware; in Java a byte is always 8
    bits.
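In Java the byte type is always 8 bits, but it is signed (-128 to 127), so an octet held in a byte must be masked with 0xFF to get its unsigned value 0 to 255. (InputStream.read() already returns the unsigned value as an int, or -1 at end of stream.) A small illustration with hypothetical helper names:

```java
class Octets {

    // Converts a signed Java byte to the unsigned octet value 0..255.
    static int unsigned(byte b) {
        return b & 0xFF;
    }

    // Formats bytes as space-separated two-digit hex octets.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", unsigned(b)));
        }
        return sb.toString();
    }
}
```

For example, (byte) 0xE9 is -23 as a Java byte, while Octets.unsigned((byte) 0xE9) is 233, and Octets.hex applied to the bytes of "GET" gives "47 45 54".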

    : Do not use println, as it will use a default platform (client computer)
    : line ending. Use print() and use the appropriate line ending (usually
    : either "\r\n" or "\n" ).

    This is correct. Many web servers accept headers terminated by only \n
    or only \r, since there are a lot of broken clients. One should still
    try to be correct, though, and send "\r\n" (CRLF), which is what
    RFC 2616 specifies.
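A common way to follow both halves of that advice is to always send CRLF but to accept CRLF, a bare LF or a bare CR when reading. A sketch of such a lenient header-line reader over raw bytes (class and method names hypothetical), using java.io.PushbackInputStream to look one byte past a CR:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

class LenientLineReader {

    // Reads one line terminated by CRLF, LF or a bare CR and returns it
    // without the terminator, or null if the stream is already at its end.
    static String readLine(PushbackInputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b = in.read();
        if (b == -1) {
            return null;
        }
        while (b != -1) {
            if (b == '\n') {
                break;                  // bare LF
            }
            if (b == '\r') {
                int next = in.read();
                if (next != '\n' && next != -1) {
                    in.unread(next);    // bare CR: keep the following byte
                }
                break;                  // CRLF or bare CR
            }
            line.write(b);
            b = in.read();
        }
        // ISO-8859-1 keeps each octet as exactly one char.
        return new String(line.toByteArray(), StandardCharsets.ISO_8859_1);
    }
}
```

An empty string marks the blank line that ends the header block.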

    /robo
    Robert Olofsson, Aug 24, 2003
    #3

Similar Threads
  1. Is there Java.Networking group (Rahul Sharma, Jul 23, 2003, in forum: Java)
     Replies: 6, Views: 397, last post: Jezuch, Jul 23, 2003
  2. java to java networking (VisionSet, Nov 19, 2003, in forum: Java)
     Replies: 0, Views: 273, last post: VisionSet, Nov 19, 2003
  3. Need help w. Java Networking. (Steve R. Burrus, Jun 5, 2004, in forum: Java)
     Replies: 12, Views: 753, last post: Bryce, Jun 7, 2004
  4. John Galt
     Replies: 1, Views: 686, last post: zzyzx, Jul 15, 2004
  5. Java Networking (Winston, Mar 10, 2005, in forum: Java)
     Replies: 1, Views: 314, last post: iNFiDEL, Mar 16, 2005