Unicode characters in ASCII file

Discussion in 'HTML' started by WindAndWaves, Nov 22, 2004.

  1. WindAndWaves

    WindAndWaves Guest

    Hi Gurus

    Do you know if it is possible to display Unicode characters (e.g. Japanese
    ones) in an ASCII-based file?

    TIA

    - Nicolaas
     
    WindAndWaves, Nov 22, 2004
    #1

  2. Philip Ronan

    Philip Ronan Guest

    WindAndWaves wrote:

    > Do you know if it is possible to display Unicode characters (e.g. Japanese
    > ones) in an ASCII-based file?


    I'm not sure I understand what you mean.

    The ASCII character set contains 128 characters: the uppercase and
    lowercase Latin letters, the digits 0-9, various punctuation characters,
    the space, and 33 control codes (0-31 plus DEL). No Japanese characters
    there at all.

    The Unicode standard contains thousands of characters, but it isn't the same
    thing as ASCII.

    If you want to include Japanese characters in an *HTML* file, then you
    should use Unicode character entities. For example, the characters for
    "Japan" are &#x65E5;&#x672C; (日本).

    Is that what you wanted?
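
    As a minimal sketch of the idea above, the hexadecimal references for
    the two characters can be generated from their code points. The helper
    name to_hex_refs is made up for this illustration; html.unescape is from
    the Python standard library and is used only to check the round trip.

```python
import html


def to_hex_refs(text):
    """Replace every non-ASCII character with a hexadecimal character reference."""
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in text)


japan = "\u65e5\u672c"  # the two characters for "Japan"
encoded = to_hex_refs(japan)

print(encoded)                          # &#x65E5;&#x672C;
print(html.unescape(encoded) == japan)  # True: the references decode back
```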

    --
    Philip Ronan

    (Please remove the "z"s if replying by email)
     
    Philip Ronan, Nov 22, 2004
    #2

  3. Philip Ronan <> wrote:

    > If you want to include Japanese characters in an *HTML* file, then you
    > should use Unicode character entities.


    Make it "could". Why not use UTF-8? But technically it is indeed possible
    to write an HTML document that is ASCII encoded, yet contains any
    characters you want.

    And they are character references, not entities. See
    http://www.cs.tut.fi/~jkorpela/chars/ref.html

    > For example, the characters for
    > "Japan" are &#x65E5;&#x672C; (日本).


    Using decimal notation works more often, though the difference is getting
    more and more marginal.
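
    A short sketch of the decimal form, assuming Python is at hand: the
    standard "xmlcharrefreplace" error handler turns any character that does
    not fit in ASCII into a decimal reference, so the result is a pure ASCII
    byte string. The sample text is only an illustration.

```python
import html

text = "Japan: \u65e5\u672c"

# Characters outside ASCII become decimal references such as &#26085;
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")
print(ascii_bytes)  # b'Japan: &#26085;&#26412;'

# Decimal and hexadecimal notation refer to the same character.
print(html.unescape("&#26085;") == html.unescape("&#x65E5;"))  # True
```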

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
    Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html
     
    Jukka K. Korpela, Nov 22, 2004
    #3
  4. Philip Ronan <> wrote:
    > [...] you should use Unicode character entities.


    Jukka K. Korpela replied:
    > Make it "could". Why not use UTF-8?


    UTF-8 *is* Unicode. It's just an encoding. Philip didn't specify any
    encoding - OP might as well use UCS, although it's a large one.

    Sybren
    --
    The problem with the world is stupidity. Not saying there should be a
    capital punishment for stupidity, but why don't we just take the
    safety labels off of everything and let the problem solve itself?
     
    Sybren Stuvel, Nov 22, 2004
    #4
  5. Steve Pugh

    Steve Pugh Guest

    On Mon, 22 Nov 2004 15:28:52 +0100, Sybren Stuvel
    <> wrote:

    > Philip Ronan <> wrote:
    >> [...] you should use Unicode character entities.

    >
    > Jukka K. Korpela replied:
    >> Make it "could". Why not use UTF-8?

    >
    > UTF-8 *is* unicode. It's just an encoding. Philip didn't specify any
    > encoding - OP might as well use UCS, although it's a large.


    Philip said to use "Unicode character entities" and from his example it is
    clear that he was talking about "Numeric character references".
    As such, Philip didn't need to specify any encoding - in HTML
    all character references are always to Unicode so the encoding used would
    be irrelevant.

    Jukka was pointing out that instead of the character references the OP
    could use UTF-8 and include the characters directly in the page.
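
    A rough sketch of that alternative, assuming the page is written out by
    a script: the characters go into the file directly and the file is saved
    as UTF-8, with the encoding declared in the markup. The file name, link
    target and page content are invented for the example.

```python
import tempfile
from pathlib import Path

# Hypothetical index page with a link to a Japanese translation.
page = """<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Index</title>
</head>
<body><p><a href="ja/">\u65e5\u672c\u8a9e</a></p></body>
</html>
"""

path = Path(tempfile.mkdtemp()) / "index.html"
path.write_bytes(page.encode("utf-8"))  # the characters are stored directly

raw = path.read_bytes()
print(raw.decode("utf-8") == page)  # True: nothing was lost
print(b"&#" in raw)                 # False: no character references needed
```

    The server should ideally also send charset=utf-8 in the Content-Type
    HTTP header, since that takes precedence over the meta element.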

    Steve
     
    Steve Pugh, Nov 22, 2004
    #5
  6. WindAndWaves

    WindAndWaves Guest

    "Steve Pugh" <> wrote in message
    news:opshve0zgo06el5p@stevepughlaptop...
    > On Mon, 22 Nov 2004 15:28:52 +0100, Sybren Stuvel
    > <> wrote:
    >
    > > Philip Ronan <> wrote:
    > >> [...] you should use Unicode character entities.

    > >
    > > Jukka K. Korpela replied:
    > >> Make it "could". Why not use UTF-8?

    > >
    > > UTF-8 *is* unicode. It's just an encoding. Philip didn't specify any
    > > encoding - OP might as well use UCS, although it's a large.

    >
    > Philip said to use "Unicode character entities" and from his example it is
    > clear that he was talking about "Numeric character references" -
    > As such Philip didn't need to specify any encoding - in HTML
    > all character references are always to Unicode so the encoding used would
    > be irrelevant.
    >
    > Jukka was pointing out that instead of the character references the OP
    > could use UTF-8 and include the characters directly in the page.
    >
    > Steve


    Thank you all for your replies. I now understand that it is indeed
    possible to have 'funny' characters in an ASCII file. You see, I have an
    index file, which I would like to load quickly, but which also contains some
    Japanese, Russian, Chinese, etc. characters (links pointing to translations
    of the page). Now, I could either double the file in size by saving it as
    Unicode or I could use the codes to specify the characters that I
    need.

    Can someone please confirm that I understood this correctly.

    Thank you


    - Nicolaas

    PS does anyone know of any programs / online applications that can translate
    characters into these codes?
     
    WindAndWaves, Nov 22, 2004
    #6
  7. "WindAndWaves" <> wrote:

    > I have an index file, which I would like to load quickly, but also
    > contains some Japanese, Russian, Chinese, etc.. characters (links
    > pointing to translations of the page).


    Ideally, we would use language negotiation (a protocol for selecting
    content based on the language preferences in the browser and information
    on existing versions in the server) for sending the user the best
    alternative available. But this is unreliable since most people have
    wrong language settings in their browsers, so a multilingual index file
    is indeed needed for a multilingual site.
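
    To make the idea of language negotiation concrete, here is a simplified
    sketch of what a server does with the Accept-Language header a browser
    sends. The function names and the set of available languages are
    assumptions for this example, and details such as the "*" wildcard are
    left out.

```python
def parse_accept_language(header):
    """Return (language tag, quality) pairs, highest quality first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # a tag without q= counts as fully preferred
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        prefs.append((tag.strip().lower(), quality))
    # sorted() is stable, so equal qualities keep the browser's order
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def choose_language(header, available, default="en"):
    """Pick the best available language, or the default if none matches."""
    for tag, quality in parse_accept_language(header):
        if quality <= 0:
            continue
        primary = tag.split("-")[0]  # "en-us" falls back to "en"
        if tag in available:
            return tag
        if primary in available:
            return primary
    return default


available = {"en", "ja", "ru"}
print(choose_language("ja,en-US;q=0.8,en;q=0.5", available))  # ja
print(choose_language("fr;q=0.9, ru;q=0.4", available))       # ru
print(choose_language("", available))                         # en
```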

    > Now, I could either double
    > the file in size by saving it as unicode or I could use the
    > codes to specify the characters that I need.


    You can use either of the methods, but please note that using Unicode
    does not double the file size. Well, sometimes it might, but normally it
    won't. In UTF-8, each Ascii character takes just one octet (byte), just
    as in a pure Ascii file. Other characters take two or more octets each,
    but if your document (including HTML markup, which uses Ascii only) is
    dominantly Ascii characters, the increase in file size won't be big, and
    it'll probably be a little smaller than the size of a version that uses
    references. (After all, &#1234; is seven octets.)
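
    The size argument can be checked with a few lines of Python. This is
    only a sketch: the list item is a made-up sample, and "utf-16-le" stands
    in for the "two octets per character" kind of Unicode file.

```python
# Hypothetical list item linking to a Japanese translation.
item = '<li><a href="ja/index.html">\u65e5\u672c\u8a9e</a></li>'

as_utf8 = item.encode("utf-8")
as_refs = item.encode("ascii", errors="xmlcharrefreplace")
as_utf16 = item.encode("utf-16-le")  # two octets per character here

for label, data in [("UTF-8", as_utf8),
                    ("ASCII + references", as_refs),
                    ("UTF-16", as_utf16)]:
    print(f"{label:>18}: {len(data)} octets")

# The single-character case from the text above.
print(len("&#1234;".encode("ascii")))  # 7
print(len("\u04d2".encode("utf-8")))   # 2
```

    For markup-heavy text like this, UTF-8 comes out smallest, the version
    with references next, and the two-octet encoding largest.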

    > PS does anyone know of any programs / online applications that can
    > translate characters into these codes?


    There are many of them, for different platforms. See
    http://www.alanwood.net/unicode/utilities_editors.html
    (which is about Unicode editors, which let you work with UTF-8 in
    general, but they often have an output mode that uses numeric character
    references).

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
    Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html
     
    Jukka K. Korpela, Nov 22, 2004
    #7
  8. Steve Pugh enlightened us with:
    > Jukka was pointing out that instead of the character references the
    > OP could use UTF-8 and include the characters directly in the page.


    Ah, ok! Indeed, that's very possible, and I do it often.

    Sybren
    --
    The problem with the world is stupidity. Not saying there should be a
    capital punishment for stupidity, but why don't we just take the
    safety labels off of everything and let the problem solve itself?
     
    Sybren Stuvel, Nov 23, 2004
    #8
  9. WindAndWaves enlightened us with:
    > I now understand that it is indeed possible to have 'funny'
    > characters in an ASCII file.


    Strictly speaking, it's not. You have references to 'funny'
    characters, but the references themselves are ASCII again, so no
    'funny' characters are actually in the file. Or you have UTF-8 'funny'
    characters in the file, but then the file isn't ASCII any more.
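
    The distinction is easy to test, as in this small sketch. The helper
    is_pure_ascii is written just for this example; it checks that every
    octet is below 128.

```python
def is_pure_ascii(data):
    """True if every octet in the byte string is in the ASCII range 0-127."""
    return all(octet < 128 for octet in data)


with_refs = "&#26085;&#26412;".encode("ascii")  # references to the characters
with_utf8 = "\u65e5\u672c".encode("utf-8")      # the characters themselves

print(is_pure_ascii(with_refs))  # True: only '&', '#', digits and ';'
print(is_pure_ascii(with_utf8))  # False: every octet is 128 or above
```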

    > You see, I have an index file, which I would like to load quickly,
    > but also contains some Japanese, Russian, Chinese, etc.. characters
    > (links pointing to translations of the page). Now, I could either
    > double the file in size by saving it as unicode or I could use the
    > codes to specify the characters that I need.


    You understood it incorrectly. If you were to use UCS to store the
    unicode, you'd be right. If you use UTF-8 to store the unicode, the
    ASCII characters would still take a single byte, and the others two or
    more.
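
    As a quick illustration of the per-character cost, the following sketch
    compares UTF-8 with a fixed two-octet encoding ("utf-16-le" is used here
    as a stand-in for UCS-2, which is an assumption about what is meant by
    UCS above). The sample characters are arbitrary.

```python
samples = {
    "A": "Latin (ASCII)",
    "\u0416": "Cyrillic",
    "\u65e5": "Japanese/Chinese",
}

for ch, script in samples.items():
    utf8_len = len(ch.encode("utf-8"))
    two_octet_len = len(ch.encode("utf-16-le"))
    print(f"{script:<18} UTF-8: {utf8_len}  two-octet: {two_octet_len}")

# Latin (ASCII)      UTF-8: 1  two-octet: 2
# Cyrillic           UTF-8: 2  two-octet: 2
# Japanese/Chinese   UTF-8: 3  two-octet: 2
```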

    > PS does anyone know of any programs / online applications that can
    > translate characters into these codes?


    I think HTML tidy can do that.

    Sybren
    --
    The problem with the world is stupidity. Not saying there should be a
    capital punishment for stupidity, but why don't we just take the
    safety labels off of everything and let the problem solve itself?
     
    Sybren Stuvel, Nov 23, 2004
    #9