A question about Iconv arguments

Discussion in 'Ruby' started by Axel Etzold, Jun 9, 2007.

  1. Axel Etzold

    Axel Etzold Guest

    Dear all,

    I need to convert some accented text, and I would like to know
    what arguments I have to give Iconv to produce the desired output.
    E.g., in Italian, the word for Friday is "venerdì", where the
    final "i" carries an accent (small i with grave accent).
    If you type this into Wikipedia search in Italian
    (which I believed to be in utf-8 encoding),
    it will load:

    http://it.wikipedia.org/wiki/Venerdì ,

    yet this syntax:

    converted_doc = Iconv.new(output_encoding, input_encoding).iconv(doc)

    gives me "venerd\303\254" when I convert from latin1 encoding.

    What arguments do I have to use?

    Thank you,

    Best regards,

    Axel




    Axel Etzold, Jun 9, 2007
    #1

  2. Alex Young

    Alex Young Guest

    Axel Etzold wrote:
    > Dear all,
    >
    > I need to convert some accented text, and I would like to know
    > what arguments I have to give Iconv to produce the desired output.
    > E.g., in Italian, the word for Friday is "venerdì", where the
    > final "i" carries an accent (small i with grave accent).
    > If you type this into Wikipedia search in Italian
    > (which I believed to be in utf-8 encoding),
    > it will load:
    >
    > http://it.wikipedia.org/wiki/Venerdì ,
    >
    > yet this syntax:
    >
    > converted_doc = Iconv.new(output_encoding, input_encoding).iconv(doc)
    >
    > gives me "venerd\303\254" when I convert from latin1 encoding.

    That looks right to me - if I write that into a UTF-8 HTML document, it
    displays correctly. What are you expecting?

    --
    Alex
     
    Alex Young, Jun 9, 2007
    #2
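
For readers puzzled by the escapes in the exchange above: "\303\254" is how Ruby 1.8 prints the two bytes 0xC3 0xAC, which is the UTF-8 encoding of ì, so the conversion had in fact worked. A minimal sketch that checks this, assuming Ruby 1.9 or later, where String#encode stands in for the Iconv call used in the thread (the variable names are illustrative only):

```ruby
# Assumption: Ruby 1.9+; String#encode replaces the thread's Iconv call.
# In Latin-1 the accented i is the single byte 0xEC.
latin1 = "venerd\xEC".dup.force_encoding("ISO-8859-1")

# Converting to UTF-8 turns that one byte into two.
utf8 = latin1.encode("UTF-8")

# Show the last two bytes in octal, the notation Ruby 1.8 used when inspecting strings.
utf8_octal = utf8.bytes.last(2).map { |b| b.to_s(8) }
puts utf8_octal.inspect          # prints ["303", "254"]
puts utf8 == "venerd\u00EC"      # prints true
```

So the encoding arguments were fine; what the URL needs on top of that is a separate percent-encoding step.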

  3. Axel Etzold

    Axel Etzold Guest

    Dear Alex,

    thank you for responding.
    If I try to get a webpage that has accents in its address,
    like

    > require "rubygems"
    > require "rio"
    > require 'iconv'
    > output_encoding = 'utf-8'
    > doc="Venerdì"
    > converted_doc = Iconv.new(output_encoding, 'latin1').iconv(doc)
    > rio("http://www.wikipedia.org/wiki/" + converted_doc)>rio("a.html")


    I get an error message:

    /usr/local/lib/ruby/1.8/uri/common.rb:436:in `split': bad URI(is not URI?): http://www.wikipedia.org/wiki/venerdì (URI::InvalidURIError)
    from /usr/local/lib/ruby/1.8/uri/common.rb:485:in `parse'
    from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/rl/withpath.rb:285:in `uri_from_string_'
    from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/rl/uri.rb:74:in `arg0_info_'
    from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/rl/uri.rb:83:in `init_from_args_'
    from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/rl/uri.rb:56:in `initialize'
    from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/rl/base.rb:80:in `new'
    from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/rl/base.rb:80:in `parse'
    from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/rl/builder.rb:111:in `build'
    from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/factory.rb:412:in `create_state'
    from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio.rb:65:in `initialize'
    from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio.rb:76:in `new'
    from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio.rb:76:in `rio'
    from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/kernel.rb:42:in `rio'


    This doesn't happen if I type in:

    rio("http://www.wikipedia.org/wiki/Venerd%C3%AC")>rio("a.html")

    So I need to know what conversion arguments I need to give Iconv to
    turn "Venerdì" into "Venerd%C3%AC".

    Best regards,

    Axel
    Axel Etzold, Jun 10, 2007
    #3
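
The traceback in the post above originates in Ruby's URI parser rather than in Iconv: a URI may contain only ASCII, so the UTF-8 bytes have to be percent-encoded before the string is parsed. A small sketch using the standard uri library, assuming a current Ruby (the error text differs from the 1.8 message quoted above, and the variable names are illustrative):

```ruby
require "uri"

raw     = "http://it.wikipedia.org/wiki/Venerd\u00EC"  # non-ASCII character in the path
escaped = "http://it.wikipedia.org/wiki/Venerd%C3%AC"  # same path, percent-encoded

# The parser rejects the raw form and accepts the escaped one.
raw_rejected =
  begin
    URI.parse(raw)
    false
  rescue URI::InvalidURIError
    true
  end

parsed_path = URI.parse(escaped).path
puts raw_rejected   # prints true
puts parsed_path    # prints /wiki/Venerd%C3%AC
```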
  4. Axel Etzold

    Axel Etzold Guest

    I've managed to solve this problem like this:

    require "rubygems"
    require "rio"
    require 'iconv'


    def to_hex(number)
      number=number.abs
      binary=''
      while number>0
        digit=number%16
        if digit<10
          binary<<digit.to_s
        elsif digit==10
          binary<<'A%'
        elsif digit==11
          binary<<'B%'
        elsif digit==12
          binary<<'C%'
        elsif digit==13
          binary<<'D%'
        elsif digit==14
          binary<<'E%'
        elsif digit==15
          binary<<'F%'
        end
        number=(number-digit)/16
      end
      return binary.reverse.gsub(/%([A-F])%([A-F])/,'%\1\2')
    end

    class String
      def wiki_addr
        converted_doc = Iconv.new('utf-8', 'latin1').iconv(self)
        res=''
        converted_doc.split(//).each{|x|
          if /[a-zA-Z0-9\_ ]/.match(x)
            res<<x
          else
            res<<to_hex(x[0])
          end
        }
        return res
      end
    end


    doc ="venerdì"
    doc.wiki_addr
    rio("http://it.wikipedia.org/wiki/"+ doc.wiki_addr)>rio("a.html")

    Best regards,

    Axel
    Axel Etzold, Jun 10, 2007
    #4
  5. Stefan Rusterholz

    Axel Etzold wrote:
    > I've managed to solve this problem like this:
    >
    > require "rubygems"
    > require "rio"
    > require 'iconv'
    >
    >
    > def to_hex(number)
    > number=number.abs
    > binary=''
    > while number>0
    > digit=number%16
    > if digit<10
    > binary<<digit.to_s
    > elsif digit==10
    > ...


    I guess you're not aware of either:
    1234.to_s(16)
    nor:
    "%x" % 1234

    For situations like the above, even a lookup-array or a case/when would
    be better.

    Regards
    Stefan

    Stefan Rusterholz, Jun 10, 2007
    #5
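
Stefan's two hints can be folded into a short percent-encoder. The following is only a sketch, assuming Ruby 2.4 or later and input that is already UTF-8; `percent_encode` is a hypothetical helper name, not something from the thread or from a library. Unlike the code in post #4, it escapes the space as %20 and always emits two hex digits after a single "%":

```ruby
# Both forms mentioned above give lowercase hex without padding.
puts 1234.to_s(16)   # prints 4d2
puts "%x" % 1234     # prints 4d2

# Hypothetical helper: percent-encode every byte that is not an
# ASCII letter, digit, or underscore. "%%%02X" means: a literal "%"
# followed by the byte as two uppercase hex digits.
def percent_encode(utf8_string)
  encoded = utf8_string.bytes.map do |byte|
    char = byte.chr
    char.match?(/\A[A-Za-z0-9_]\z/) ? char : format("%%%02X", byte)
  end
  encoded.join
end

puts percent_encode("venerd\u00EC")   # prints venerd%C3%AC
```

Working byte by byte with a fixed-width format avoids the digit-by-digit bookkeeping entirely, which is why the padding in "%02X" matters: a byte such as 0x0A must come out as %0A, not %A.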
  6. Axel Etzold

    Axel Etzold Guest

    Dear Stefan,

    thank you for bringing this to my attention!
    (Slightly varying Voltaire, I might
    have been able to write a shorter
    program had I had more leisure and
    more knowledge).
    I'll try your suggestion.
    Best regards,

    Axel
    Axel Etzold, Jun 10, 2007
    #6
  7. Nobuyoshi Nakada

    Hi,

    At Sun, 10 Jun 2007 18:05:49 +0900,
    Axel Etzold wrote in [ruby-talk:254981]:
    > I've managed to solve this problem like this:


    $ ruby -riconv -rcgi -e 'puts CGI.escape(Iconv.conv("utf-8", "latin1", "venerd\354"))'
    venerd%C3%AC

    --
    Nobu Nakada
     
    Nobuyoshi Nakada, Jun 11, 2007
    #7
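
Iconv was removed from Ruby's standard library in Ruby 2.0, so the one-liner above no longer runs on a current interpreter without an extra gem. A rough equivalent, assuming Ruby 1.9 or later and using String#encode in place of Iconv.conv (the variable names are illustrative):

```ruby
require "cgi"

# "\xEC" is the same Latin-1 byte that the one-liner writes as "\354".
latin1 = "venerd\xEC".dup.force_encoding("ISO-8859-1")

# Step 1: transcode Latin-1 to UTF-8. Step 2: percent-encode for use in a URL.
escaped = CGI.escape(latin1.encode("UTF-8"))
puts escaped   # prints venerd%C3%AC
```

Note that CGI.escape follows form-encoding rules and turns a space into "+", which may not be what a wiki page title with spaces needs; for this one-word example it makes no difference.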