Replacing diacritics with simple characters

Discussion in 'Ruby' started by Une Bévue, Sep 25, 2007.

  1. Une Bévue

    Une Bévue Guest

    Do you know of a way to replace diacritics with the plain base character
    (e.g. é -> e)?

    The same for ligatures (e.g. Æ -> AE)?

    Using tables?



    --
    Une Bévue
    Une Bévue, Sep 25, 2007
    #1
  2. Une Bévue

    F. Senault Guest

    On September 25 at 18:25, Une Bévue wrote:

    (Hello again... :) )

    > do u know of a way to replace diacritics by simple character (ie. : é
    > -o-> e)
    >
    > the same with ligatures (ie. : Æ -o-> AE )
    >
    > using tables ?


    Iconv can do that for you:

    >> require "iconv"
    => true
    >> i = Iconv.new("ASCII//TRANSLIT", "ISO-8859-15")
    => #<Iconv:0x84d4448>
    >> i.iconv("aéouï Æ")
    => "a'eou\"i AE"
    >> i.iconv("aéouï Æ").gsub(/[^a-zA-Z0-9 ]/, '')
    => "aeoui AE"

    Fred
    --
    I've found an axe can do a lot for a paper-mangling printer. Especially
    if you shout for one at the top of your voice, and then a cow orker
    brings you said instrument. Suddenly, no more paper jams.
    (Kai Henningsen in the SDM)
    F. Senault, Sep 25, 2007
    #2
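Note for readers on current Ruby: Iconv was removed from the standard library in Ruby 2.0, so the session in this post will not run there without the separate gem. The following is only a minimal sketch of the same idea, assuming Ruby 2.2+ and UTF-8 input. `strip_diacritics` and the `LIGATURES` table are hypothetical names, not something from the thread, and the table is far from complete.

```ruby
# Hypothetical helper (not from the thread): strip diacritics without Iconv.
# Assumes Ruby 2.2+ for String#unicode_normalize and UTF-8 input.

# Ligatures and similar letters have no canonical decomposition,
# so they need an explicit table. Illustrative entries only.
LIGATURES = {
  "Æ" => "AE", "æ" => "ae",
  "Œ" => "OE", "œ" => "oe",
  "ß" => "ss"
}.freeze

def strip_diacritics(text)
  text
    .unicode_normalize(:nfd)                        # "é" becomes "e" + U+0301
    .gsub(/\p{Mn}/, "")                             # drop the combining marks
    .gsub(Regexp.union(LIGATURES.keys), LIGATURES)  # expand ligatures via the table
end

p strip_diacritics("aéouï Æ")   # => "aeoui AE"
```

Unlike the `//TRANSLIT` route, this leaves no stray `'` or `"` behind, so the cleanup `gsub` from the post is not needed.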
  3. > --
    > I've found an axe can do a lot for a paper-mangling printer. Especially
    > if you shout for one at the top of your voice, and then a cow orker

    --------------------------------------------------------------------------------------^
    ???
    > brings you said instrument. Suddenly, no more paper jams.
    > (Kai Henningsen in the SDM)
    >
    >


    :D
    Michal Suchanek, Sep 25, 2007
    #3
  4. Une Bévue

    Une Bévue Guest

    F. Senault <> wrote:

    > IConv can do that for you :
    >
    > >> require "iconv"

    > => true
    > >> i = Iconv.new("ASCII//TRANSLIT", "ISO-8859-15")

    > => #<Iconv:0x84d4448>
    > >> i.iconv("aéouï Æ")

    > => "a'eou"i AE"
    > >> i.iconv("aéouï Æ").gsub(/[^a-zA-Z0-9 ]/, '')

    > => "aeoui AE"


    Fine, thanks a lot for now, Fred ;-)

    Have a good wine cellar ;-)

    It also works with UTF-8.
    --
    Une Bévue
    Une Bévue, Sep 25, 2007
    #4
  5. Une Bévue

    PA Guest

    On Sep 25, 2007, at 18:55, F. Senault wrote:

    >> do u know of a way to replace diacritics by simple character (ie. : é
    >> -o-> e)
    >>
    >> the same with ligatures (ie. : Æ -o-> AE )
    >>
    >> using tables ?

    >
    > IConv can do that for you :


    An alternative approach is something like Sean M. Burke's
    Text::Unidecode:

    http://interglacial.com/~sburke/tpj/as_html/tpj22.html
    http://search.cpan.org/~sburke/Text-Unidecode-0.04/lib/Text/Unidecode.pm


    Here is an example of an implementation of Unidecode in Lua [1]:

    local Unidecode = require( 'Unidecode' )

    print( Unidecode( 'Москва́' ) )
    print( Unidecode( '北京' ) )
    print( Unidecode( 'Ἀθηνᾶ' ) )
    print( Unidecode( '서울' ) )
    print( Unidecode( '東京' ) )
    print( Unidecode( '京都市' ) )
    print( Unidecode( 'नेपाल' ) )
    print( Unidecode( 'תֵּל־אָבִיב-יָפוֹ' ) )
    print( Unidecode( 'تَلْ أَبِيبْ يَافَا' ) )
    print( Unidecode( 'تهران' ) )
    print( Unidecode( 'Géometrie Différentielle' ) )

    > Moskva
    > beijing
    > Athena
    > seoul
    > dongjing
    > jingdushi
    > nepaal
    > te'labiyb-yapvo
    > tal 'abiyb yaafaa
    > thran
    > Geometrie Differentielle


    Cheers,

    PA.

    [1] http://dev.alt.textdrive.com/browser/HTTP/Unidecode.lua
    PA, Sep 25, 2007
    #5
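Since the thread is about Ruby, here is a rough sketch of what a table-driven, Unidecode-style lookup might look like there. It is not a port of Text::Unidecode or of the Lua module: `TABLE` and `unidecode_sketch` are invented for this illustration, and the table holds only the handful of code points needed for two of the inputs in the post.

```ruby
# Illustrative only: a tiny code point => ASCII table.
# A real Unidecode data set covers most of Unicode; this one does not.
TABLE = {
  0x00C6 => "AE",  # Æ
  0x00E9 => "e",   # é
  0x00EF => "i",   # ï
  0x041C => "M",   # М
  0x043E => "o",   # о
  0x0441 => "s",   # с
  0x043A => "k",   # к
  0x0432 => "v",   # в
  0x0430 => "a",   # а
  0x0301 => ""     # combining acute accent: simply dropped
}.freeze

# ASCII passes through untouched; everything else is looked up,
# and characters missing from the table become `unknown`.
def unidecode_sketch(text, unknown: "?")
  text.each_char.map { |ch|
    ch.ascii_only? ? ch : TABLE.fetch(ch.ord, unknown)
  }.join
end

# "Москва" followed by a combining acute accent, written with escapes
# so the combining mark is visible in the source.
p unidecode_sketch("\u041C\u043E\u0441\u043A\u0432\u0430\u0301")  # => "Moskva"
p unidecode_sketch("aéouï Æ")                                     # => "aeoui AE"
```

The point of the table approach is that the result depends only on the data shipped with the program, not on the platform's iconv.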
  6. Une Bévue

    F. Senault Guest

    On September 25 at 20:12, Michal Suchanek wrote:

    >> --
    >> I've found an axe can do a lot for a paper-mangling printer. Especially
    >> if you shout for one at the top of your voice, and then a cow orker

    > --------------------------------------------------------------------------------------^
    > ???


    It's intentional. "Cow orker" was probably a typo in the olden times, but
    has entered the mainstream since then. Just ask Google: "Results 1 -
    10 of about 37,200 for 'cow orker'. (0.19 seconds)" :)

    Fred
    --
    I feel it move across my skin. I'm reaching up and reaching out, I'm
    reaching for the random or what ever will bewilder me. And following
    our will and wind we may just go where no one's been. We'll ride the
    spiral to the end and may just go where no one's been. (Tool, Lateralus)
    F. Senault, Sep 25, 2007
    #6
  7. F. Senault wrote:
    > IConv can do that for you :
    >
    >>> require "iconv"

    > => true
    >>> i = Iconv.new("ASCII//TRANSLIT", "ISO-8859-15")

    > => #<Iconv:0x84d4448>
    >>> i.iconv("aéouï Æ")

    > => "a'eou"i AE"
    >>> i.iconv("aéouï Æ").gsub(/[^a-zA-Z0-9 ]/, '')

    > => "aeoui AE"


    That doesn't work on all platforms. For me:

    >> require "iconv"
    => true
    >> i = Iconv.new("ASCII//TRANSLIT", "UTF-8")
    => #<Iconv:0xb7cf28e0>
    >> i.iconv("aéouï Æ")
    => "a?ou? AE"

    :-(
    Daniel DeLorme, Sep 25, 2007
    #7
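The "?" placeholders in this post stand in for characters that were not transliterated. One way to get the same answer on every platform is to keep the lookup table in Ruby itself. This is a sketch, assuming Ruby 1.9 or later, where `String#encode` accepts a `fallback:` option. `ASCII_FALLBACK` and `to_ascii` are made-up names, and the table is deliberately tiny.

```ruby
# Illustrative table; extend it with whatever characters your data contains.
ASCII_FALLBACK = {
  "é" => "e",
  "ï" => "i",
  "Æ" => "AE"
}.freeze

def to_ascii(text)
  # The lambda is called once for each character US-ASCII cannot represent.
  # Characters missing from the table become "?" instead of raising.
  text.encode("US-ASCII", fallback: ->(ch) { ASCII_FALLBACK.fetch(ch, "?") })
end

p to_ascii("aéouï Æ")   # => "aeoui AE"
p to_ascii("señor")     # => "se?or"  (ñ is not in the table)
```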
  8. Une Bévue

    Une Bévue Guest

    Daniel DeLorme <> wrote:

    >
    > That doesn't work on all platforms. For me:
    >
    > >> require "iconv"

    > => true
    > >> i = Iconv.new("ASCII//TRANSLIT", "UTF-8")

    > => #<Iconv:0xb7cf28e0>
    > >> i.iconv("aéouï Æ")

    > => "a?ou? AE"
    >
    > :-(


    Are you sure about the encoding of "aéouï Æ"?

    Because I did it with UTF-8, and it works:

    -- the script ----------------------------------------------------------
    #! /usr/bin/env ruby

    require "iconv"

    i = Iconv.new("ASCII//TRANSLIT", "UTF-8")

    p i.iconv("aéouï Æ")
    # => "a'eou\"i AE"

    p i.iconv("aéouï Æ").gsub(/[^a-zA-Z0-9 ]/, '')
    # => "aeoui AE"

    p i.iconv("Être ou ne pas être, c'est la question. aéouï Æ, wie heiß du ?").
      gsub(/[^a-zA-Z0-9' ]/, '').gsub(/[' ]/, '_').gsub(/(.*)_$/, '\1')
    # => "Etre_ou_ne_pas_etre_c_est_la_question_a_eoui_AE_wie_heiss_du"

    p i.iconv("Être ou ne pas être, c'est la question. aéouï Æ, wie heiß du?").
      gsub(/[^a-zA-Z0-9' ]/, '').gsub(/[' ]/, '_').gsub(/(.*)_$/, '\1')
    # => "Etre_ou_ne_pas_etre_c_est_la_question_a_eoui_AE_wie_heiss_du"
    ------------------------------------------------------------------------
    --
    Une Bévue
    Une Bévue, Sep 25, 2007
    #8
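The `gsub` chain in this script is effectively building a filename-safe slug. Here is a hedged sketch of the same pipeline without Iconv, assuming Ruby 2.2+ and UTF-8 input. `slugify` is a hypothetical name, and only the two non-decomposable letters that occur in the sample sentence are handled.

```ruby
# Hypothetical slug helper modelled on the gsub chain in the post.
def slugify(text)
  text
    .unicode_normalize(:nfd)        # split letters from their accents
    .gsub(/\p{Mn}/, "")             # remove the accents
    .gsub("Æ", "AE")                # letters with no decomposition need
    .gsub("ß", "ss")                # explicit replacements
    .gsub(/[^a-zA-Z0-9]+/, "_")     # every other run of characters => "_"
    .gsub(/\A_+|_+\z/, "")          # trim leading/trailing underscores
end

p slugify("Être ou ne pas être, c'est la question. aéouï Æ, wie heiß du ?")
# => "Etre_ou_ne_pas_etre_c_est_la_question_aeoui_AE_wie_heiss_du"
```

The result reads `aeoui` rather than `a_eoui` because normalization never produces the stray apostrophe that `//TRANSLIT` emitted for "é" in the script.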
  9. Une Bévue wrote:
    > Daniel DeLorme <> wrote:
    >
    >> That doesn't work on all platforms. For me:
    >>
    >> >> require "iconv"

    >> => true
    >> >> i = Iconv.new("ASCII//TRANSLIT", "UTF-8")

    >> => #<Iconv:0xb7cf28e0>
    >> >> i.iconv("aéouï Æ")

    >> => "a?ou? AE"
    >>
    >> :-(

    >
    > Are u sure about the encoding of "aéouï Æ" ?


    Yep.

    >> str = "aéouï Æ"
    => "a\303\251ou\303\257 \303\206" # (that's UTF-8 all right)
    >> i.iconv(str)
    => "a?ou? AE"

    But like I said, translit doesn't work the same on all platforms (I'm on
    Ubuntu, btw).

    Daniel
    Daniel DeLorme, Sep 25, 2007
    #9
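The octal escapes in this post are how older Ruby versions displayed non-ASCII bytes, and they are a handy way to settle which encoding a string really uses. A small sketch for newer Rubies (1.9+), where `inspect` no longer shows raw bytes; `octal_dump` is a hypothetical helper written for this note.

```ruby
# Hypothetical helper: show non-ASCII bytes as octal escapes,
# similar to what the irb session in the post displays.
def octal_dump(str)
  str.bytes.map { |b| b < 0x80 ? b.chr : format("\\%03o", b) }.join
end

utf8   = "aéouï Æ"                    # a UTF-8 literal in a UTF-8 source file
latin9 = utf8.encode("ISO-8859-15")   # same text, one byte per character

puts utf8.encoding          # UTF-8
puts utf8.valid_encoding?   # true
puts octal_dump(utf8)       # a\303\251ou\303\257 \303\206
puts octal_dump(latin9)     # a\351ou\357 \306
```

Two bytes per accented letter, each pair starting with `\303`, matches the UTF-8 dump shown in the post; a single byte such as `\351` would indicate ISO-8859-15 (or Latin-1) input instead.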
  10. Une Bévue

    Une Bévue Guest

    Daniel DeLorme <> wrote:

    > but like I said, translit doesn't work the same on all platforms (I'm on
    > ubuntu btw)


    I'm running Mac OS X 10.4.10...
    --
    Une Bévue
    Une Bévue, Sep 26, 2007
    #10