How to write Unicode

Discussion in 'Java' started by Stefan Ram, Jul 5, 2007.

  1. Stefan Ram

    Stefan Ram Guest

    When writing into a Unicode text file, given that the Stream
    encoding was set to »UTF-8«, what is the proper, best or
    canonical way to terminate a line?

    Some possibilities are given on the following lines.

    printStream.printf( "\n" );
    printStream.printf( "%n" );
    printStream.print(( char )0x000A );
    printStream.print(( char )0x000D );
    printStream.print(( char )0x000D ); printStream.print(( char )0x000A );
    printStream.print(( char )0x0085 ); // 0x0085 is Unicode »NEL - next line«
    printStream.print(( char )0x2028 ); // 0x2028 is Unicode »line separator«
     
    Stefan Ram, Jul 5, 2007
    #1

  2. Stefan Ram wrote:
    > When writing into a Unicode text file, given that the Stream
    > encoding was set to »UTF-8«, what is the proper, best or
    > canonical way to terminate a line?
    >
    > Some possibilities are given on the following lines.
    >
    > printStream.printf( "\n" );
    > printStream.printf( "%n" );
    > printStream.print(( char )0x000A );
    > printStream.print(( char )0x000D );
    > printStream.print(( char )0x000D ); printStream.print(( char )0x000A );
    > printStream.print(( char )0x0085 ); // 0x0085 is Unicode »NEL - next line«
    > printStream.print(( char )0x2028 ); // 0x2028 is Unicode »line separator«


    For a disk file in UTF-8 I can not really see any reason not to use
    System.getProperty("line.separator").
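
    A minimal sketch of that suggestion, assuming the text goes through a
    Writer that encodes as UTF-8. The class name LineSeparatorDemo and the
    helper writeLines are hypothetical, and an in-memory buffer stands in
    for the disk file.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not taken from the thread.
class LineSeparatorDemo {

    // Encodes the lines as UTF-8, ending each with the platform line separator.
    static byte[] writeLines(String... lines) {
        String separator = System.getProperty("line.separator");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // The charset is given explicitly, so the platform default is not used.
        try (PrintWriter writer =
                 new PrintWriter(new OutputStreamWriter(bytes, StandardCharsets.UTF_8))) {
            for (String line : lines) {
                writer.print(line);
                writer.print(separator);
            }
        } // closing the writer flushes the encoded bytes into the buffer
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = writeLines("first line", "second line");
        System.out.println(data.length + " bytes written");
    }
}
```

    The byte count differs between platforms because the separator is one
    character on Unix/Linux and two on Windows.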

    Arne
     
    Arne Vajhøj, Jul 5, 2007
    #2

  3. Stefan Ram wrote:
    > When writing into a Unicode text file, given that the Stream
    > encoding was set to »UTF-8«, what is the proper, best or
    > canonical way to terminate a line?
    >
    > Some possibilities are given on the following lines.
    >
    > printStream.printf( "\n" );
    > printStream.printf( "%n" );
    > printStream.print(( char )0x000A );
    > printStream.print(( char )0x000D );
    > printStream.print(( char )0x000D ); printStream.print(( char )0x000A );
    > printStream.print(( char )0x0085 ); // 0x0085 is Unicode »NEL - next line«
    > printStream.print(( char )0x2028 ); // 0x2028 is Unicode »line separator«
    >

    Not to forget
    printStream.println();
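
    As a rough check, println() and printf( "%n" ) should emit the same
    terminator. The sketch below assumes a PrintStream built with the
    three-argument constructor that names its encoding; PrintlnDemo and
    capture are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Hypothetical demo class, not taken from the thread.
class PrintlnDemo {

    // Runs the action against a UTF-8 PrintStream and returns what it printed.
    static String capture(Consumer<PrintStream> action) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(bytes, true, "UTF-8")) {
            action.accept(out);
        } catch (UnsupportedEncodingException e) {
            // Not expected: every Java platform must support UTF-8.
            throw new IllegalStateException(e);
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String viaPrintln = capture(out -> out.println("x"));
        String viaFormat = capture(out -> out.printf("x%n"));
        // Both should end with the platform line separator.
        System.out.println(viaPrintln.equals(viaFormat));
        System.out.println(viaPrintln.equals("x" + System.lineSeparator()));
    }
}
```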

    --
    Thomas
     
    Thomas Fritsch, Jul 5, 2007
    #3
  4. Lew

    Lew Guest

    Stefan Ram wrote:
    >> When writing into a Unicode text file, given that the Stream
    >> encoding was set to »UTF-8«, what is the proper, best or
    >> canonical way to terminate a line?
    >> Some possibilities are given on the following lines.
    >>
    >> printStream.printf( "\n" );
    >> printStream.printf( "%n" );
    >> printStream.print(( char )0x000A );
    >> printStream.print(( char )0x000D );
    >> printStream.print(( char )0x000D ); printStream.print(( char )0x000A );
    >> printStream.print(( char )0x0085 ); // 0x0085 is Unicode »NEL - next line«
    >> printStream.print(( char )0x2028 ); // 0x2028 is Unicode »line separator«


    Thomas Fritsch wrote:
    > Not to forget
    > printStream.println();


    Lest we forget:
    > All characters printed by a PrintStream are converted into bytes using the platform's default character encoding. The PrintWriter class should be used in situations that require writing characters rather than bytes.


    Assuming that your variable "printStream" is of type "PrintStream", which you
    did not aver.

    I cringe when I see "the Stream encoding was set": Java IO Streams don't
    have encodings. The PrintStream methods use encodings, but the Stream doesn't.

    To answer your question, printf()'s "%n" specifies "the platform-specific line
    separator", but that has nothing to do with encodings.
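
    To illustrate that the encoding and the line terminator are separate
    choices, here is a small sketch that assumes a PrintWriter wrapped around
    an OutputStreamWriter with an explicit charset. The names
    PrintWriterEncodingDemo and encode are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not taken from the thread.
class PrintWriterEncodingDemo {

    // Writes text followed by "%n" through a PrintWriter using the given charset.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (PrintWriter writer = new PrintWriter(new OutputStreamWriter(bytes, charset))) {
            writer.printf("%s%n", text);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String text = "caf" + (char) 0x00E9; // "café"
        int separatorLength = System.lineSeparator().length();
        byte[] utf8 = encode(text, StandardCharsets.UTF_8);
        byte[] latin1 = encode(text, StandardCharsets.ISO_8859_1);
        // The accented letter takes two bytes in UTF-8 and one in ISO-8859-1;
        // the separator written by "%n" has the same length in both.
        System.out.println("UTF-8 text bytes:      " + (utf8.length - separatorLength));
        System.out.println("ISO-8859-1 text bytes: " + (latin1.length - separatorLength));
    }
}
```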

    --
    Lew
     
    Lew, Jul 7, 2007
    #4
  5. Roedy Green

    Roedy Green Guest

    On Sat, 07 Jul 2007 11:17:16 -0400, Lew wrote,
    quoted or indirectly quoted someone who said:

    >
    >I cringe when I see "the Stream encoding was set": Java IO Streams don't
    >have encodings. The PrintStream methods use encodings, but the Stream doesn't.


    When you are playing with encodings you use a Reader/Writer.

    see http://mindprod.com/applet/fileio.html
    for sample code.
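
    For readers without access to that page, a minimal sketch of the
    Reader/Writer approach follows. It is not the code from the linked page:
    the class ReaderWriterDemo and its methods are hypothetical, and
    in-memory streams stand in for a file.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not taken from the thread or the linked page.
class ReaderWriterDemo {

    // Encodes the lines as UTF-8; newLine() writes the platform line separator.
    static byte[] write(List<String> lines) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (BufferedWriter writer = new BufferedWriter(
                 new OutputStreamWriter(bytes, StandardCharsets.UTF_8))) {
            for (String line : lines) {
                writer.write(line);
                writer.newLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Decodes UTF-8 bytes; readLine() accepts \n, \r and \r\n as terminators.
    static List<String> read(byte[] data) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                 new ByteArrayInputStream(data), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> original = Arrays.asList("caf" + (char) 0x00E9, "second line");
        System.out.println(read(write(original)).equals(original)); // prints true
    }
}
```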
    --
    Roedy Green Canadian Mind Products
    The Java Glossary
    http://mindprod.com
     
    Roedy Green, Jul 12, 2007
    #5
  6. Rob

    Rob Guest

    On Jul 4, 8:59 pm, Arne Vajhøj wrote:
    > Stefan Ram wrote:
    > > When writing into a Unicode text file, given that the Stream
    > > encoding was set to »UTF-8«, what is the proper, best or
    > > canonical way to terminate a line?

    >
    > > Some possibilities are given on the following lines.

    >
    > > printStream.printf( "\n" );
    > > printStream.printf( "%n" );
    > > printStream.print(( char )0x000A );
    > > printStream.print(( char )0x000D );
    > > printStream.print(( char )0x000D ); printStream.print(( char )0x000A );
    > > printStream.print(( char )0x0085 ); // 0x0085 is Unicode »NEL - next line«
    > > printStream.print(( char )0x2028 ); // 0x2028 is Unicode »line separator«

    >
    > For a disk file in UTF-8 I can not really see any reason not to use
    > System.getProperty("line.separator").
    >
    > Arne


    If you're trying to get from a Java String to UTF-8 bytes, you could
    try using String.getBytes("UTF-8"). The JDK will take care of
    converting for you. If your Java String contains \n I'd expect it to
    be converted to UTF-8 properly. Once you have the byte array you can
    write the bytes directly to the file.
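
    A small sketch of that conversion, with one swap: it uses the
    getBytes(Charset) overload instead of getBytes("UTF-8"), which avoids
    the checked UnsupportedEncodingException. GetBytesDemo, toUtf8 and hex
    are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not taken from the thread.
class GetBytesDemo {

    // Converts a Java String to its UTF-8 byte sequence.
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Formats bytes as space-separated uppercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = (char) 0x00E9 + "\n"; // "é" followed by a line feed
        System.out.println(hex(toUtf8(text))); // prints C3 A9 0A
    }
}
```

    The resulting array can then be passed to any OutputStream's
    write(byte[]) method.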
     
    Rob, Jul 12, 2007
    #6
  7. Rob wrote:
    > On Jul 4, 8:59 pm, Arne Vajhøj wrote:
    >> Stefan Ram wrote:
    >>> When writing into a Unicode text file, given that the Stream
    >>> encoding was set to »UTF-8«, what is the proper, best or
    >>> canonical way to terminate a line?
    >>> Some possibilities are given on the following lines.
    >>> printStream.printf( "\n" );
    >>> printStream.printf( "%n" );
    >>> printStream.print(( char )0x000A );
    >>> printStream.print(( char )0x000D );
    >>> printStream.print(( char )0x000D ); printStream.print(( char )0x000A );
    >>> printStream.print(( char )0x0085 ); // 0x0085 is Unicode »NEL - next line«
    >>> printStream.print(( char )0x2028 ); // 0x2028 is Unicode »line separator«

    >> For a disk file in UTF-8 I can not really see any reason not to use
    >> System.getProperty("line.separator").

    >
    > If you're trying to get from a Java String to UTF-8 bytes, you could
    > try using String.getBytes("UTF-8"). The JDK will take care of
    > converting for you. If your Java String contains \n I'd expect it to
    > be converted to UTF-8 properly. Once you have the byte array you can
    > write the bytes directly to the file.


    1) \n is the line separator on Unix/Linux; it is not the line separator
    on all platforms.

    2) \n (and \r) are encoded identically in ASCII, ISO-8859-1, UTF-8, etc.
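
    The second point can be checked with a short sketch that prints the
    encoded bytes of each candidate. LineBreakBytesDemo and hex are
    hypothetical names. The NEL and line separator characters from the
    original question are included for contrast, since they take more than
    one byte in UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not taken from the thread.
class LineBreakBytesDemo {

    // Encodes text with the given charset and formats the bytes as hex pairs.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset[] charsets = {
            StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8
        };
        // LF is 0A and CR is 0D in all three charsets.
        for (Charset charset : charsets) {
            System.out.println(charset + ": LF=" + hex("\n", charset)
                    + " CR=" + hex("\r", charset));
        }
        String nel = String.valueOf((char) 0x0085);
        String lineSeparator = String.valueOf((char) 0x2028);
        System.out.println("NEL in UTF-8: " + hex(nel, StandardCharsets.UTF_8)); // C2 85
        System.out.println("LS in UTF-8:  " + hex(lineSeparator, StandardCharsets.UTF_8)); // E2 80 A8
    }
}
```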

    Arne
     
    Arne Vajhøj, Jul 14, 2007
    #7
