Don't do this at home

Discussion in 'Java' started by Roedy Green, Apr 29, 2004.

  1. Roedy Green

    Roedy Green Guest

    Sun set a very bad example by naming a utility
    nativetoascii.

    It will convert various encodings to Unicode and back.

    It should have been a pair of utilities called something like:

    toUnicode and toNative

    or NativeToUnicode and UnicodeToNative

    ASCII is NOT Unicode!


    --
    Canadian Mind Products, Roedy Green.
    Coaching, problem solving, economical contract programming.
    See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
     
    Roedy Green, Apr 29, 2004
    #1

  2. arne thormodsen <> scribbled the following:
    > "Roedy Green" <> wrote in message
    > news:...
    >> Sun set a very bad example by naming a utility
    >> nativetoascii.
    >>
    >> It will convert various encodings to Unicode and back.
    >>
    >> It should have been a pair of utilities called something like:
    >>
    >> toUnicode and toNative
    >>
    >> or NativeToUnicode and UnicodeToNative
    >>
    >> ASCII is NOT Unicode!


    > Reminds me of when I used to work on I18N and someone would ask me a
    > question like "Why is the ASCII code for O-umlaut different on a Mac
    > and a Windows system?". If I was feeling grumpy I'd say "it isn't"
    > and see if they could figure out the point I was making.


    I've still not yet quite forgiven the incident (on a non-technical
    newsgroup) where someone wrote the UTF-8 rendition of the copyright
    symbol and added "(that's a copyright symbol for the ASCII-impaired)",
    as if ASCII was a generic name for the entire concept of an ordered
    set of character glyphs. What's really infuriating is that the guy
    flamed me for correcting him.

    --
    /-- Joona Palaste () ------------- Finland --------\
    \-- http://www.helsinki.fi/~palaste --------------------- rules! --------/
    "Life without ostriches is like coffee with milk."
    - Mika P. Nieminen
     
    Joona I Palaste, Apr 29, 2004
    #2

  3. "Roedy Green" <> wrote in message
    news:...
    > Sun set a very bad example by naming a utility
    > nativetoascii.
    >
    > It will convert various encodings to Unicode and back.
    >
    > It should have been a pair of utilities called something like:
    >
    > toUnicode and toNative
    >
    > or NativeToUnicode and UnicodeToNative
    >
    > ASCII is NOT Unicode!
    >


    Reminds me of when I used to work on I18N and someone would ask me a
    question like "Why is the ASCII code for O-umlaut different on a Mac
    and a Windows system?". If I was feeling grumpy I'd say "it isn't"
    and see if they could figure out the point I was making.

    --arne
     
    arne thormodsen, Apr 29, 2004
    #3
  4. Roedy Green wrote:

    > Sun set a very bad example by naming a utility
    > nativetoascii.
    >
    > It will convert various encodings to Unicode and back.


    No it won't. It will convert various encodings to ASCII with non-ascii
    characters represented as ASCII escape sequences representing their Unicode
    code.

    > It should have been a pair of utilities called something like:
    >
    > toUnicode and toNative
    >
    > or NativeToUnicode and UnicodeToNative


    That would misrepresent what is actually done even worse.


    > ASCII is NOT Unicode!


    "Unicode" isn't a text encoding at all.
     
    Michael Borgwardt, Apr 30, 2004
    #4
  5. Roedy Green

    Roedy Green Guest

    On Fri, 30 Apr 2004 11:06:28 +0200, Michael Borgwardt
    <> wrote or quoted :

    >No it won't. It will convert various encodings to ASCII with non-ascii
    >characters represented as ASCII escape sequences representing their Unicode
    >code.


    Do you mean unicode-8 or some ad hoc representation?

     
    Roedy Green, Apr 30, 2004
    #5
  6. Roedy Green wrote:
    >>No it won't. It will convert various encodings to ASCII with non-ascii
    >>characters represented as ASCII escape sequences representing their Unicode
    >>code.

    >
    >
    > Do you mean unicode-8


    If you meant UTF-8, no. UTF-8 is not confined to ASCII.

    > or some ad hoc representation?


    I mean the unicode escapes defined in the Java Language Specification:
    http://java.sun.com/docs/books/jls/second_edition/html/lexical.doc.html#44591
     
    Michael Borgwardt, May 3, 2004
    #6
  7. Roedy Green

    Roedy Green Guest

    On Mon, 03 May 2004 10:39:02 +0200, Michael Borgwardt
    <> wrote or quoted :

    >If you meant UTF-8, no. UTF-8 is not confined to ASCII.
    >
    >> or some ad hoc representation?


    It seem to use that when you convert its ascii to "ASCII" encoding
    too. I wonder how it would escape accidental \uxxxx if you used it on
    a java program for example.


     
    Roedy Green, May 4, 2004
    #7
  8. Roedy Green wrote:
    > It seem to use that when you convert its ascii to "ASCII" encoding
    > too.


    I don't understand this sentence.

    > I wonder how it would escape accidental \uxxxx if you used it on
    > a java program for example.


    Since they are ASCII, native2ascii leaves them unchanged. The compiler
    will turn them into the corresponding Unicode character. It's really
    just a special case of character escapes, which all begin with \.
    If you want a literal \u0046 in your program, you have to type
    \\u0046, just like you have to type \\n to get \n and not a linefeed.
    The difference is that the Unicode escapes are processed before
    lexical analysis, so a Unicode-escaped linefeed is equivalent to an
    actual line break in the source code, not to a linefeed character
    inside a literal.
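
    A minimal sketch of the point about escapes in source code, assuming any
    standard Java compiler. The class and method names are made up for
    illustration. The first literal is translated before the lexer sees it;
    the second keeps its backslash because the backslash is doubled.

```java
// Hypothetical demo class; the names are illustrative only.
class UnicodeEscapeDemo {
    // The escape is translated early, so this literal holds one character: F.
    static String translated() {
        return "\u0046";
    }

    // The doubled backslash blocks the translation, so this literal holds
    // six characters: a backslash, the letter u, and four digits.
    static String doubled() {
        return "\\u0046";
    }

    public static void main(String[] args) {
        System.out.println(translated().length()); // 1
        System.out.println(doubled().length());    // 6
        System.out.println(translated().equals("F")); // true
    }
}
```

    Note that the same early translation applies inside comments, which is
    why the comments above spell out "backslash" instead of writing the
    escape itself.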
     
    Michael Borgwardt, May 4, 2004
    #8
  9. /Roedy Green/:

    > It seem to use that when you convert its ascii to "ASCII" encoding
    > too.


    It will convert "ANSI" to ASCII. That's it - it will take a text
    file, decode it using the system/native encoding and produce plain
    ASCII encoded file where characters outside the ASCII repertoire are
    replaced with \uXXXX escapes.
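
    The replacement step described above can be sketched roughly as follows.
    This is not Sun's implementation: it assumes the file has already been
    decoded into a String, the class and method names are hypothetical, and
    the choice of lowercase hex digits is arbitrary.

```java
// Hypothetical sketch; not the actual native2ascii source.
class Native2AsciiSketch {
    // Copies ASCII chars unchanged and writes every other char as a
    // backslash, the letter u, and four hex digits.
    static String toAsciiWithEscapes(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c <= 0x7F) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The input is written with an escape so this file stays pure ASCII.
        System.out.println(toAsciiWithEscapes("caf\u00e9"));
    }
}
```

    Run as written, this prints the nine ASCII characters caf\u00e9. Because
    it works one char at a time, a character outside the 16-bit range would
    come out as two escapes, one per surrogate.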

    --
    Stanimir
     
    Stanimir Stamenkov, May 4, 2004
    #9
  10. Roedy Green

    Roedy Green Guest

    On Tue, 04 May 2004 10:00:14 +0200, Michael Borgwardt
    <> wrote or quoted :

    >> It seem to use that when you convert its ascii to "ASCII" encoding
    >> too.

    >
    >I don't understand this sentence.


    ascii is also the name of an encoding. So you can convert from its
    intermediate format to the official ASCII encoding, which seems to use
    these \uxxxx things too. That was a surprise. I would have expected ?
    or SUB for any exotic character.
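
    For comparison, the substitution behaviour expected here is what the
    JDK's own US-ASCII charset does when a String is encoded with
    String.getBytes. A small sketch (the class and method names are
    hypothetical):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
class AsciiReplacementDemo {
    // Encodes to US-ASCII and decodes again. String.getBytes replaces
    // unmappable characters with the charset's replacement byte.
    static String roundTripThroughAscii(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // The accented letter is lost and comes back as a question mark.
        System.out.println(roundTripThroughAscii("caf\u00e9")); // caf?
    }
}
```

    The difference is that this conversion is lossy, while the escape form
    keeps enough information to restore the original character.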

     
    Roedy Green, May 4, 2004
    #10
  11. Roedy Green wrote:
    >>>It seem to use that when you convert its ascii to "ASCII" encoding
    >>>too.

    >>
    >>I don't understand this sentence.

    >
    >
    > ascii is also the name of an encoding.


    It's nothing BUT the name of an encoding.

    > So you can convert from its
    > intermediate format


    Which intermediate format and what "it" would that be?

    > to the official ASCII encoding, which seems to use
    > these \uxxxx things too.


    No, those escape sequences have nothing to do with the ASCII
    encoding except being composed entirely of ASCII characters.
    They are defined in the Java Language Specification and
    all specification-compliant compilers will interpret them.

    The point of these escape sequences and native2ascii is to
    create a "normalized" format for Java source code that everyone
    can deal with but still allows the full range of Unicode
    characters to be used (well, the full range of UCS-2 anyway).
     
    Michael Borgwardt, May 5, 2004
    #11
  12. Roedy Green

    Roedy Green Guest

    On Wed, 05 May 2004 11:15:02 +0200, Michael Borgwardt
    <> wrote or quoted :

    >The point of these escape sequences and native2ascii is to
    >create a "normalized" format for Java source code that everyone
    >can deal with but still allows the full range of Unicode
    >characters to be used (well, the full range of UCS-2 anyway).


    Fine, but that is not what you normally expect from a translation to
    ASCII. You would expect exotic characters to translate to ? or SUB
    the way they do for all the other encodings.

    This ASCII is not your father's ASCII.

     
    Roedy Green, May 6, 2004
    #12
  13. Roedy Green <> wrote in message news:<>...
    > Fine, but that is not what you normally expect from a translation to
    > ASCII. You would expect exotic characters to translate to ? or SUB
    > the way they do for all the other encodings.
    >
    > This ASCII is not your father's ASCII.


    I agree that the name of the tool does not adequately describe its
    purpose and is in fact somewhat misleading.
     
    Michael Borgwardt, May 6, 2004
    #13
  14. Roedy Green

    Dale King Guest

    "Roedy Green" <> wrote in
    message news:...
    > On Wed, 05 May 2004 11:15:02 +0200, Michael Borgwardt
    > <> wrote or quoted :
    >
    > >The point of these escape sequences and native2ascii is to
    > >create a "normalized" format for Java source code that everyone
    > >can deal with but still allows the full range of Unicode
    > >characters to be used (well, the full range of UCS-2 anyway).

    >
    > Fine, but that is not what you normally expect from a translation to
    > ASCII. You would expect exotic characters to translate to ? or SUB
    > the way they do for all the other encodings.
    >
    > This ASCII is not your father's ASCII.


    Have you bothered to read the documentation for the native2ascii tool?

    http://java.sun.com/j2se/1.4.2/docs/tooldocs/windows/native2ascii.html

    It makes it very clear what it does. It says it converts it to
    Unicode-encoded characters. The only place the word ASCII appears is in the
    name of the tool. The output of the tool (or the input if you put it into
    reverse) is ASCII only. So there is nothing misleading about saying it
    converts to ASCII.

    I fail to see what you are complaining about. They cannot put every bit of
    information about what it does into the name of the tool. Calling it
    native2unicode would have been very much incorrect. It seems you would only
    be happy if they called it native2ascii_with_unicode_escape_sequences, which
    I'm afraid is a bit too much to type.

    The moral here is that you cannot assume everything about how a tool works
    just by the name. Sometimes you have to read the documentation.

    > --
    > Canadian Mind Products, Roedy Green.


    Still refuse to put the space after the dashes?

    --
    Dale King
     
    Dale King, May 7, 2004
    #14
  15. Roedy Green

    Roedy Green Guest

    On Fri, 7 May 2004 14:45:23 -0500, "Dale King" <>
    wrote or quoted :

    >
    >It makes it very clear what it does. It says it converts it to
    >Unicode-encoded characters. The only place the word ASCII appears is in the
    >name of the tool.


    Which is my only complaint, other than the goofy -reverse option.


     
    Roedy Green, May 7, 2004
    #15
  16. Roedy Green

    Roedy Green Guest

    On Sat, 8 May 2004 08:54:54 GMT-5, (Dale
    King) wrote or quoted :

    >And I still fail to see what there is to complain about. The
    >tool takes a file in a native encoding and converts it to ASCII.
    >I don't see what you find wrong with the name since it accurately
    >reflects what it does.


    the name nativeToAscii fails in two respects.

    1. The utility does not do a standard "conversion to ASCII". It
    converts to a Sun-invented encoding scheme described with ASCII
    characters. It is not ASCII any more than Base64 is. Ascii only
    represents 128 chars. Sun's encoding represents 64K.

    2. nativeToAscii sometimes converts "ascii" to native, the reverse of
    what its name implies.


     
    Roedy Green, May 11, 2004
    #16
  17. Roedy Green

    P.Hill Guest

    Roedy Green wrote:

    > 1. The utility does not do a standard "conversion to ASCII".


    No, agreed, but it does produce a file which contains only ASCII characters,
    regardless of what certain sequences of ASCII characters are intended to
    represent in the particular "language"/"encoding".
    These ASCII files can be safely processed by utilities which only work with
    ASCII.

    > described with ASCII
    > characters. It is not ASCII any more than Base64 is.


    This is an alternative interpretation of what an "ASCII file" should
    contain.

    RFC 1642 which uses Base64 encoding
    http://www.faqs.org/rfcs/rfc1642.html
    "Internet mail (STD 11, RFC 822) currently supports only 7-
    bit US ASCII as a character set."
    [...]
    "This document describes a new transformation format of Unicode that
    contains only 7-bit ASCII characters"
    [...]
    "UTF-7 encodes Unicode characters as US-ASCII"

    That is the similar usage as the name nativeToAscii suggests.

    > Ascii only
    > represents 128 chars. Sun's encoding represents 64K.


    The characters in the file are pure ASCII, thus the name.

    Sorry that you expect the name to be nativeToAsciiWithOtherCharactersEscaped

    -Paul
     
    P.Hill, May 11, 2004
    #17
  18. Roedy Green

    Roedy Green Guest

    On Tue, 11 May 2004 16:25:58 -0600, "P.Hill" <>
    wrote or quoted :

    >
    >Sorry that you expect the name to be nativeToAsciiWithOtherCharactersEscaped

    I would call them toNative and fromNative with an implied interchange
    format. Don't confuse the issue by calling it ASCII. It is not. This
    is NOT the way you encode those characters in ASCII. All the weird
    ones should be SUB or ?

    ASCII files are human readable things with chars 0..127. I don't
    count mime, base64 or sun's Unicode encoding as ASCII even though it
    uses the ASCII set. It is no more Ascii than Indonesian is English
    because they use the same alphabet.

     
    Roedy Green, May 12, 2004
    #18
  19. Roedy Green

    Dale King Guest

    Hello, Roedy Green !
    You wrote:

    > On Sat, 8 May 2004 08:54:54 GMT-5, (Dale
    > King) wrote or quoted :
    >
    > >And I still fail to see what there is to complain about. The
    > >tool takes a file in a native encoding and converts it to ASCII.
    > >I don't see what you find wrong with the name since it accurately
    > >reflects what it does.
    >
    > the name nativeToAscii fails in two respects.
    >
    > 1. The utility does not do a standard "conversion to ASCII". It
    > converts to a Sun-invented encoding scheme described with ASCII
    > characters. It is not ASCII any more than Base64 is.


    No, that is not a valid comparison. In base64 the value
    0x41 is merely encoding some combination of bits and does not
    signify the letter A as it does in ASCII. The same is not true of
    the output of native2ascii. Every numeric value actually means
    the same thing as it does in ASCII.
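
    A short illustration of the Base64 half of that argument, using the
    java.util.Base64 class from later JDKs (Java 8 and up). The class and
    method names in the sketch are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class; the names are illustrative only.
class Base64VersusAscii {
    static String base64Of(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        // Three zero bytes: each 'A' in the output stands for the 6-bit
        // value 0, not for the letter A.
        System.out.println(base64Of(new byte[] {0, 0, 0})); // AAAA

        // The ASCII text "A" (byte 0x41) encodes to something else entirely.
        System.out.println(base64Of("A".getBytes(StandardCharsets.US_ASCII))); // QQ==
    }
}
```

    In the escape form, by contrast, an A in the output is still the letter
    A, and only the non-ASCII characters are spelled out with several ASCII
    characters.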

    > Ascii only
    > represents 128 chars. Sun's encoding represents 64K.


    Not a valid criticism. Each of its 128 values maps to one
    abstract symbol, but people have been finding ways to use
    combinations of those symbols to represent other things for years
    now. For example they might use e^ to signify the letter ê.
    Saying that is not ASCII is like saying it isn't ASCII because
    you can combine the letters to form words. ASCII only specifies
    the meaning of individual symbols and does not limit what meaning
    you apply to the combinations of those symbols.

    > 2. nativeToAscii sometimes converts "ascii" to native, the reverse of
    > what its name implies.


    In which case the command would be native2ascii -reverse which
    seems fairly self explanatory to me. They could have split that
    into 2 programs, but that would be inferior to a single program
    in my mind. And remember that it is usually much rarer that the
    reverse direction is used. I'm not even sure that the reverse
    option was in the original version.
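
    A rough sketch of what the reverse direction involves, assuming the
    input is an ASCII String containing escapes of exactly four hex digits.
    This is not the tool's code: the names are hypothetical, and the sketch
    deliberately ignores the finer rules of the language specification,
    such as doubled backslashes and repeated u letters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the actual native2ascii -reverse source.
class ReverseSketch {
    // Matches a backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE =
            Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replaces each matched escape with the char it names and copies
    // everything else through unchanged.
    static String fromAsciiWithEscapes(String ascii) {
        Matcher m = ESCAPE.matcher(ascii);
        StringBuilder out = new StringBuilder(ascii.length());
        int last = 0;
        while (m.find()) {
            out.append(ascii, last, m.start());
            out.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        out.append(ascii, last, ascii.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String restored = fromAsciiWithEscapes("caf\\u00e9");
        System.out.println(restored.length()); // 4
    }
}
```

    The result would still have to be written out in the native encoding,
    which is the part a real tool handles through its encoding option.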

    > ASCII files are human readable things with chars 0..127.


    So is the output of native2ascii.

    > I don't
    > count mime, base64 or sun's Unicode encoding as ASCII even though it
    > uses the ASCII set.


    You have to separate the numeric values from the meaning assigned
    to specific values. I can't speak for MIME but you are correct
    that base64 is not ASCII. Even though it uses the same range of
    values it does not assign the same meaning as does ASCII. The
    output of native2ascii uses the same numeric range and assigns
    the same meaning to those values and is therefore ASCII.

    > It is no more Ascii than Indonesian is English
    > because they use the same alphabet.


    I don't believe Indonesian actually uses the same alphabet as
    English, but that is beside the point.

    Once again the issue is not the bit patterns, but the meanings
    assigned to those bit patterns. Indonesian is not English because
    they do not ascribe the same meanings to the letters. But that
    is not the case with native2ascii since it assigns the same
    meaning as does ASCII.

    A better analogy would be an English text that contained a
    foreign word or phrase. Does the text cease to be English because
    of this?

    > This
    > is NOT the way you encode those characters in ASCII.


    No you don't encode them at all in ASCII. You have to encode them
    in some other convention on top of ASCII, by using one or more
    ASCII characters. However that is still ASCII because the meaning
    of each code unit is the same.

    > All the weird
    > ones should be SUB or ?


    I don't see why you think that. Clearly they have to be dropped
    or replaced by something else. I don't see what makes one
    replacement more natural than another. If you replace them with ?
    that is not truly correct since the original file did not have a
    ? there. Let's say instead that the replacement was the
    string "{non-ASCII character}", would that make it non ASCII? The
    fact that they chose a replacement that differs by character
    and is reversible seems to have no bearing on whether it is
    ASCII.
    --
    Dale King
    My Blog: http://daleking.homedns.org/Blog
     
    Dale King, Apr 15, 2006
    #19