Saving a UTF-8 file

Discussion in 'Ruby' started by Miquel Oliete, Nov 12, 2006.

  1. Hi All

    I have a problem (newbie problem).

    I don't know how to write a file using UTF-8 encoding. Can you help
    me?

    Thanks in advance

    Kind regards

    --

    Miquel (a.k.a. Ton)
    Linux User #286784
    GPG Key : 4D91EF7F
    Debian GNU/Linux (Linux Wolverine 2.6.14)

    Welcome to the jungle, we got fun and games
    Guns n' Roses


     
    Miquel Oliete, Nov 12, 2006

  2. Paul Lutus wrote:
    > It isn't something you can specify in a
    > plain-text file.


    Byte order mark?

    A specification it is not, but generally a good hint. There are gotchas
    though if you process it with software that isn't Unicode-aware.

    David Vallner
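
    A minimal sketch of the BOM-as-hint idea in modern Ruby (assumes Ruby 2.0 or later; the 1.8 series discussed in this thread has neither String#b nor String#encode). It writes the three-byte UTF-8 BOM, EF BB BF, in front of the text and looks for it when reading back. UTF8_BOM, write_with_bom and utf8_bom? are hypothetical names chosen for this example, not part of any library.

    ```ruby
    require "tmpdir"

    # Assumes Ruby 2.0+. Helper names below are hypothetical.
    UTF8_BOM = "\xEF\xBB\xBF".b  # U+FEFF encoded as UTF-8, as raw bytes

    # Write +text+ as UTF-8, prefixed with the BOM.
    def write_with_bom(path, text)
      File.open(path, "wb") do |f|
        f.write(UTF8_BOM)
        f.write(text.encode("UTF-8"))
      end
    end

    # True if the file's first three bytes are the UTF-8 BOM. Only a hint:
    # nothing stops a non-UTF-8 file from starting with these bytes.
    def utf8_bom?(path)
      File.open(path, "rb") { |f| f.read(3) } == UTF8_BOM
    end

    Dir.mktmpdir do |dir|
      path = File.join(dir, "hello.txt")
      write_with_bom(path, "Olé\n")
      p utf8_bom?(path)  # → true
    end
    ```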


     
    David Vallner, Nov 12, 2006

  3. On 11/12/06, David Vallner <> wrote:
    > Paul Lutus wrote:
    > > It isn't something you can specify in a
    > > plain-text file.

    > Byte order mark?


    Not meaningful in UTF-8, since it's all a defined series of bytes
    (it's always the same order on all platforms).

    -austin
    --
    Austin Ziegler
    http://www.halostatue.ca/
    http://www.halostatue.ca/feed/
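
    To make the point concrete: UTF-8 is defined byte by byte, so a character encodes to the same bytes everywhere, while UTF-16 stores 16-bit units whose byte order can differ, which is what a BOM was designed to disambiguate. A short sketch, assuming Ruby 1.9 or later for String#encode:

    ```ruby
    # Assumes Ruby 1.9+. Prints the bytes each encoding produces for one
    # ordinary character (U+00E9, "é") and for U+FEFF, the byte order mark.
    ["\u00E9", "\uFEFF"].each do |ch|
      %w[UTF-8 UTF-16LE UTF-16BE].each do |enc|
        puts format("U+%04X %-8s %s", ch.ord, enc, ch.encode(enc).bytes.inspect)
      end
    end
    ```

    U+00E9 comes out as [195, 169] in UTF-8, [233, 0] in UTF-16LE and [0, 233] in UTF-16BE; U+FEFF comes out as [239, 187, 191] in UTF-8 and [255, 254] / [254, 255] in the two UTF-16 variants. Only the UTF-16 rows depend on byte order.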
     
    Austin Ziegler, Nov 12, 2006
  4. Austin Ziegler wrote:
    > On 11/12/06, David Vallner <> wrote:
    >> Paul Lutus wrote:
    >> > It isn't something you can specify in a
    >> > plain-text file.

    >> Byte order mark?

    >
    > Not meaningful in UTF-8, since it's all a defined series of bytes
    > (it's always the same order on all platforms).
    >
    > -austin


    Yes, but it can be used as a "this file is UTF-8" marker by convention.
    And cause problems in software that doesn't recognize the convention,
    for added hilarity.

    David Vallner
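
    An illustration of the kind of breakage meant here, assuming a current Ruby (2.x or 3.x): a reader that doesn't expect a BOM gets U+FEFF glued to the start of its first line, while Ruby's "BOM|UTF-8" open mode strips the mark when present and otherwise reads plain UTF-8. The file name is hypothetical.

    ```ruby
    require "tmpdir"

    # Assumes a current Ruby; the 1.8 series has no encoding-aware IO.
    Dir.mktmpdir do |dir|
      path = File.join(dir, "names.csv")  # hypothetical file name
      File.binwrite(path, "\xEF\xBB\xBFname,age\n")

      # A reader that ignores the convention keeps U+FEFF on the first
      # line, so a plain comparison against the expected header fails.
      naive = File.open(path, "r:UTF-8") { |f| f.gets.chomp }
      p naive == "name,age"          # → false
      p naive.start_with?("\uFEFF")  # → true

      # "BOM|UTF-8" drops a leading BOM if there is one.
      aware = File.open(path, "r:BOM|UTF-8") { |f| f.gets.chomp }
      p aware == "name,age"          # → true
    end
    ```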


     
    David Vallner, Nov 12, 2006
  5. On 11/12/06, David Vallner <> wrote:
    > Austin Ziegler wrote:
    > > On 11/12/06, David Vallner <> wrote:
    > >> Paul Lutus wrote:
    > >> > It isn't something you can specify in a
    > >> > plain-text file.
    > >> Byte order mark?

    > > Not meaningful in UTF-8, since it's all a defined series of bytes
    > > (it's always the same order on all platforms).

    > Yes, but it can be used as a "this file is UTF-8" marker by convention.
    > And cause problems in software that doesn't recognize the convention,
    > for added hilarity.


    It's a bad convention, because it adds meaningless bytes to the
    beginning of a file. I'm not saying that an unadorned document is
    better, but it's better to do something that has actual meaning than
    to add a pointless BOM.

    -austin
     
    Austin Ziegler, Nov 12, 2006
  6. On 11/12/06, Miquel Oliete <> wrote:
    > Hi All
    >
    > I have a problem (newbie problem).
    >
    > I don't know how to write a file using utf-8 encoding. Can you help
    > me.


    Well, how are you storing the Unicode characters you're using
    internally? If your Unicode string within Ruby is stored as an array
    of integers (code points), then

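    # "w" opens the file for writing; File.open defaults to read-only "r".
    # pack("U*") turns an array of code points into UTF-8 bytes.
    File.open("output_file.utf8", "w") do |fp|
      fp.puts(data.pack("U*"))
    end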

    should be sufficient. If you have a Ruby string that uses some other
    encoding (e.g. ISO-8859-1), then you must use the iconv library to
    convert the string to UTF-8:

    require 'iconv'   # standard library in Ruby 1.8

    # Iconv.new(to, from): convert from ISO-8859-1 to UTF-8.
    cd = Iconv.new('utf-8', 'iso-8859-1')
    File.open("output_file.utf8", "w") do |fp|
      fp.puts(cd.iconv(data))
    end
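
    Iconv was part of the standard library in Ruby 1.8, but it was deprecated in 1.9.3 and removed in Ruby 2.0 (it survives as a separate gem). A sketch of both cases above in Ruby 1.9+ terms, using small hypothetical sample data in place of the poster's data variable: pack("U*") still works for arrays of code points, and String#encode, or an encoding in the File.open mode string, takes over from Iconv.

    ```ruby
    require "tmpdir"

    # Assumes Ruby 1.9+. codepoints and latin1 are hypothetical stand-ins
    # for the poster's data.
    codepoints = [0x4F, 0x6C, 0xE9]          # "Olé" as an array of integers
    latin1     = "Olé".encode("ISO-8859-1")  # "Olé" as ISO-8859-1 bytes

    Dir.mktmpdir do |dir|
      # Case 1: code points -> UTF-8 bytes with pack("U*"), as in the post.
      path1 = File.join(dir, "from_codepoints.utf8")
      File.open(path1, "w") { |fp| fp.puts(codepoints.pack("U*")) }

      # Case 2: String#encode in place of Iconv.
      path2 = File.join(dir, "from_latin1.utf8")
      File.open(path2, "w") { |fp| fp.puts(latin1.encode("UTF-8")) }

      # Case 2, alternative: give the file an external encoding and let
      # Ruby transcode each string as it is written.
      path3 = File.join(dir, "via_io.utf8")
      File.open(path3, "w:UTF-8") { |fp| fp.puts(latin1) }

      # On Unix-like systems each file holds 79 108 195 169 10 ("Olé\n").
      [path1, path2, path3].each { |path| p File.binread(path).bytes }
    end
    ```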

    When you do i18n, l10n, and m17n, strings become meaningless unless
    they have an attached encoding.
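
    In Ruby 1.9 and later every String carries an encoding, and the remark above can be seen directly: the same two bytes mean different text depending on the label attached to them. A minimal sketch:

    ```ruby
    # Two raw bytes with no meaningful encoding yet (ASCII-8BIT).
    raw = [0xC3, 0xA9].pack("C*")

    # The same bytes under two different labels.
    as_utf8   = raw.dup.force_encoding("UTF-8")       # one character: é
    as_latin1 = raw.dup.force_encoding("ISO-8859-1")  # two characters: Ã and ©

    p as_utf8.length        # → 1
    p as_latin1.length      # → 2
    p as_utf8 == as_latin1  # → false (same bytes, different text)

    # Transcoding the mislabelled string "correctly" produces mojibake.
    puts as_latin1.encode("UTF-8")  # prints Ã© on a UTF-8 terminal
    ```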
     
    Dido Sevilla, Nov 12, 2006
