character encoding question

Discussion in 'Ruby' started by Amishera Amishera, Mar 26, 2010.

  1. I have an HTML file encoded in UTF-8. The file contains the
    following text:

    It&#39;s a wonderful life

    Character code 39 is the apostrophe in UTF-8. Suppose I extract
    the 39 from the text using:

    s="It&#39;s a wonderful life"

    s.gsub(/&#(\d+);/, '\1')

    The output is

    It39s a wonderful life

    First, I am having trouble turning this into

    It\39s a wonderful life

    Second, I manually put this in test_utf8.rb:

    puts "It\39s a wonderful life"

    and ran it

    ruby test_utf8.rb > utf8.txt

    but when I open the file in OpenOffice with the encoding set to UTF-8,
    the output is

    It#9s a wonderful life

    So how do I correctly parse and convert HTML character references
    to UTF-8 encoded characters and then save the file?

    Thanks.
    --
    Posted via http://www.ruby-forum.com/.
    Amishera Amishera, Mar 26, 2010
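
Editorial sketch for the two problems described in this post. With a string replacement such as '\1', gsub can only paste the captured digits back in, which is why the result is It39s; passing a block instead lets the captured number be converted to the character it names. Separately, "\39" in a double-quoted Ruby literal is read as the octal escape \3 followed by the digit 9, so it never yields an apostrophe. The helper name decode_decimal_refs is hypothetical, and only decimal references are handled here.

```ruby
# Hypothetical helper (not from the thread): replace decimal character
# references such as &#39; with the UTF-8 character they name.
def decode_decimal_refs(text)
  # The block form of gsub lets us compute the replacement from the capture.
  # pack('U') turns an integer code point into a UTF-8 encoded string.
  text.gsub(/&#(\d+);/) { [Regexp.last_match(1).to_i].pack('U') }
end

s = 'It&#39;s a wonderful life'
decoded = decode_decimal_refs(s)
puts decoded                      # prints: It's a wonderful life

# "\39" is the octal escape \3 (byte value 3) followed by the character "9".
p "It\39s".bytes                  # prints: [73, 116, 3, 57, 115]

# Decimal 39 is octal 47, so this literal does contain an apostrophe.
p "It\047s" == "It's"             # prints: true
```

As far as I know, pack('U') raises a RangeError for numbers outside the Unicode range, so untrusted input may need a rescue around the conversion.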


  2. > s="It&#39;s a wonderful life"


    I stumbled across this:
    -----------------------

    require 'cgi'
    s=CGI.unescapeHTML("It&#39;s a wonderful life")


    -----------------------
    David
    David Springer, Mar 26, 2010
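
Editorial sketch extending the CGI.unescapeHTML suggestion to the file-saving step asked about in the original question. It assumes the source file is UTF-8 (the default for Ruby 2.0 and later) and writes to a temporary directory; the file name utf8.txt is taken from the question, while the temporary location is only an assumption to keep the example harmless to run.

```ruby
require 'cgi'
require 'tmpdir'

# Decode the character reference using the standard library.
s = CGI.unescapeHTML('It&#39;s a wonderful life')
puts s                            # prints: It's a wonderful life

# Write the decoded text with an explicit UTF-8 external encoding,
# then read it back to confirm nothing changed on the way to disk.
round_trip = Dir.mktmpdir do |dir|
  path = File.join(dir, 'utf8.txt')
  File.write(path, s, encoding: 'UTF-8')
  File.read(path, encoding: 'UTF-8')
end

puts round_trip == s              # prints: true
```

Stating the encoding explicitly on both the write and the read avoids relying on the locale's default external encoding, which can differ between machines.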

  3. Try something like this:
    -------------------------------------
    require 'cgi'
    s="UPPERCASE Russian Alphabet\n".encode('utf-8')
    s+=CGI.unescapeHTML("АБВГ".encode('utf-8'))
    s+=CGI.unescapeHTML("ДЕЖЗ".encode('utf-8'))
    s+=CGI.unescapeHTML("ИЙКЛ".encode('utf-8'))
    s+=CGI.unescapeHTML("МНОП".encode('utf-8'))
    s+=CGI.unescapeHTML("РСТУ".encode('utf-8'))
    s+=CGI.unescapeHTML("ФХЦЧ".encode('utf-8'))
    s+=CGI.unescapeHTML("ШЩЪЫ".encode('utf-8'))
    s+=CGI.unescapeHTML("ЬЭЮЯ".encode('utf-8'))
    puts s
    -------------------------------------
    David
    David Springer, Mar 26, 2010
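
Editorial sketch of the same idea applied to Cyrillic input written as numeric character references. The reference values below (decimal 1040 to 1043, hex 410 to 413, the code points of А, Б, В, Г) are chosen for illustration and are not taken from the post. It assumes the input string is UTF-8, which is what lets CGI.unescapeHTML produce characters beyond the ASCII range.

```ruby
require 'cgi'

# Decimal and hexadecimal references for the first four uppercase
# Russian letters.
decimal = CGI.unescapeHTML('&#1040;&#1041;&#1042;&#1043;')
hex     = CGI.unescapeHTML('&#x410;&#x411;&#x412;&#x413;')

puts decimal                      # prints: АБВГ
puts decimal == hex               # prints: true
puts decimal.encoding             # prints: UTF-8

# The basic named references are handled as well.
puts CGI.unescapeHTML('&lt;b&gt; &amp; &quot;')   # prints: <b> & "
```

To my knowledge CGI.unescapeHTML covers numeric references plus only a handful of named ones, so a document that uses names such as &nbsp; would likely need an HTML parser instead.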
