How to write Unicode

Stefan Ram

When writing into a Unicode text file, given that the Stream
encoding was set to »UTF-8«, what is the proper, best or
canonical way to terminate a line?

Some possibilities are given on the following lines.

printStream.printf( "\n" );
printStream.printf( "%n" );
printStream.print(( char )0x000A );
printStream.print(( char )0x000D );
printStream.print(( char )0x000D ); printStream.print(( char )0x000A );
printStream.print(( char )0x0085 ); // 0x0085 is Unicode »NEL - next line«
printStream.print(( char )0x2028 ); // 0x2028 is Unicode »line separator«
 

Arne Vajhøj

Stefan said:
When writing into a Unicode text file, given that the Stream
encoding was set to »UTF-8«, what is the proper, best or
canonical way to terminate a line?

Some possibilities are given on the following lines.

printStream.printf( "\n" );
printStream.printf( "%n" );
printStream.print(( char )0x000A );
printStream.print(( char )0x000D );
printStream.print(( char )0x000D ); printStream.print(( char )0x000A );
printStream.print(( char )0x0085 ); // 0x0085 is Unicode »NEL - next line«
printStream.print(( char )0x2028 ); // 0x2028 is Unicode »line separator«

For a disk file in UTF-8 I cannot really see any reason not to use
System.getProperty("line.separator").

Arne
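
A minimal sketch of that suggestion, assuming Java 7 or later. The class name `LineSeparatorDemo`, the helper `writeLines`, and the temporary file are made up for illustration and do not come from the thread. The encoding is fixed on the writer that wraps the stream, and every line is terminated with the value of the `line.separator` property.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class LineSeparatorDemo {
    // Hypothetical helper: writes each line as UTF-8, followed by the
    // platform line separator. The caller keeps ownership of the stream.
    static void writeLines(OutputStream sink, String... lines) {
        String eol = System.getProperty("line.separator");
        // The charset belongs to the Writer, not to the underlying stream.
        PrintWriter out = new PrintWriter(
                new OutputStreamWriter(sink, StandardCharsets.UTF_8));
        for (String line : lines) {
            out.print(line);
            out.print(eol);
        }
        // PrintWriter does not throw IOException; flush so the bytes reach the sink.
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("demo", ".txt");
        try (OutputStream sink = Files.newOutputStream(file)) {
            writeLines(sink, "first line", "caf\u00e9");
        }
        // readAllLines accepts \n, \r and \r\n as line terminators.
        System.out.println(Files.readAllLines(file, StandardCharsets.UTF_8));
        Files.delete(file);
    }
}
```

The resulting file ends its lines with CR LF on Windows and with LF on Unix-like systems. Whether that is what you want depends on who reads the file.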
 

Thomas Fritsch

Stefan said:
When writing into a Unicode text file, given that the Stream
encoding was set to »UTF-8«, what is the proper, best or
canonical way to terminate a line?

Some possibilities are given on the following lines.

printStream.printf( "\n" );
printStream.printf( "%n" );
printStream.print(( char )0x000A );
printStream.print(( char )0x000D );
printStream.print(( char )0x000D ); printStream.print(( char )0x000A );
printStream.print(( char )0x0085 ); // 0x0085 is Unicode »NEL - next line«
printStream.print(( char )0x2028 ); // 0x2028 is Unicode »line separator«
Not to forget
printStream.println();
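
A small check of what println() actually emits, sketched under the assumption that the stream was created with the three-argument PrintStream constructor that takes an encoding name. The class and method names are illustrative only. The expectation is that println() writes the same bytes as the `line.separator` property encoded in UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class PrintlnDemo {
    // Returns the bytes that a bare println() produces on a UTF-8 PrintStream.
    static byte[] printlnBytes() {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream printStream = new PrintStream(sink, true, "UTF-8");
            printStream.println();
            printStream.flush();
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory for every Java platform.
            throw new AssertionError(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] expected = System.getProperty("line.separator")
                .getBytes(StandardCharsets.UTF_8);
        // Compares println() output with the encoded platform separator.
        System.out.println(Arrays.equals(printlnBytes(), expected));
    }
}
```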
 

Lew

Thomas said:
Not to forget
printStream.println();

Lest we forget, from the PrintStream Javadoc:
"All characters printed by a PrintStream are converted into bytes using the platform's default character encoding. The PrintWriter class should be used in situations that require writing characters rather than bytes."

Assuming that your variable "printStream" is of type "PrintStream", which you
did not aver.

I get a cringe seeing "the Stream encoding was set" - Java IO Streams don't
have encodings. The PrintStream methods use encodings, but the Stream doesn't.

To answer your question, printf()'s "%n" specifies "the platform-specific line
separator", but that has nothing to do with encodings.
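
To illustrate that the line terminator and the encoding are separate choices, here is a hedged sketch comparing "\n" and "%n" on a PrintStream that was given an explicit encoding through its (OutputStream, boolean, String) constructor. The class name `PercentNDemo` and the helper `format` are invented for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.util.Arrays;

class PercentNDemo {
    // Formats fmt through a UTF-8 PrintStream and returns the raw bytes.
    static byte[] format(String fmt) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream ps = new PrintStream(sink, true, "UTF-8");
            ps.printf(fmt);
            ps.flush();
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory for every Java platform.
            throw new AssertionError(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // U+00F8 takes two bytes in UTF-8; "\n" is always the single byte 10.
        System.out.println(Arrays.toString(format("\u00f8\n"))); // [-61, -72, 10]
        // "%n" expands to the platform line separator, so this output
        // differs between operating systems.
        System.out.println(Arrays.toString(format("\u00f8%n")));
    }
}
```

The encoding decides how U+00F8 is written; the choice between "\n" and "%n" decides which characters terminate the line.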
 

Rob

Arne said:
For a disk file in UTF-8 I cannot really see any reason not to use
System.getProperty("line.separator").

Arne

If you're trying to get from a Java String to UTF-8 bytes, you could
try using String.getBytes("UTF-8"). The JDK will take care of
converting for you. If your Java String contains \n I'd expect it to
be converted to UTF-8 properly. Once you have the byte array you can
write the bytes directly to the file.
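
A rough sketch of that approach, assuming Java 7 or later so that `StandardCharsets.UTF_8` can replace the "UTF-8" string and no checked exception is involved. The class name, the helper `encode`, and the temporary file are illustrative. Note that getBytes() only encodes the characters it is given; it does not translate "\n" into the platform line separator.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class GetBytesDemo {
    // Encodes the text as UTF-8 without touching its line terminators.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = encode("caf\u00e9\n");
        Path file = Files.createTempFile("bytes", ".txt");
        // Write the already encoded bytes straight to the file.
        Files.write(file, bytes);
        System.out.println(Files.size(file) + " bytes written");
        Files.delete(file);
    }
}
```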
 

Arne Vajhøj

Rob said:
If you're trying to get from a Java String to UTF-8 bytes, you could
try using String.getBytes("UTF-8"). The JDK will take care of
converting for you. If your Java String contains \n I'd expect it to
be converted to UTF-8 properly. Once you have the byte array you can
write the bytes directly to the file.

1) \n is the line separator on Unix/Linux - it is not the line separator
on all platforms.

2) \n (and \r) are encoded as the same bytes in ASCII, ISO-8859-1, UTF-8 etc.

Arne
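
Point 2 can be checked with a short sketch. The class name `NewlineBytesDemo` and the helper `hex` are made up for illustration. It prints the encoded bytes of LF and CR in three charsets and, for contrast, the UTF-8 bytes of the two Unicode-specific candidates from the original question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class NewlineBytesDemo {
    // Renders the encoded form of s as space-separated hex bytes.
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        Charset[] charsets = {
            StandardCharsets.US_ASCII,
            StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8
        };
        for (Charset cs : charsets) {
            // LF is 0A and CR is 0D in all three charsets.
            System.out.println(cs + ": LF=" + hex("\n", cs) + " CR=" + hex("\r", cs));
        }
        // NEL and LINE SEPARATOR need more than one byte in UTF-8.
        String nel = String.valueOf((char) 0x0085);
        String ls = String.valueOf((char) 0x2028);
        System.out.println("NEL=" + hex(nel, StandardCharsets.UTF_8));   // C2 85
        System.out.println("LS=" + hex(ls, StandardCharsets.UTF_8));     // E2 80 A8
    }
}
```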
 
