Discussion in 'Ruby' started by Miquel Oliete, Nov 13, 2006.

  1. Hi all

    How can I convert from UTF-8 to HTML ampersand entities, and from HTML
    ampersand entities back to UTF-8? (I have searched a lot but can't find it.)

    Thanks in advance


    Miquel Oliete, Nov 13, 2006
  2. iconv from UTF8 to UTF16, combine the two bytes, pray it wasn't a
    surrogate pair, and convert to hex?

    David Vallner

    David Vallner, Nov 13, 2006
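
The UTF-16 route suggested above can be sketched roughly as follows. This is only an illustration under a few assumptions: it uses `String#encode` in place of the iconv library, the helper name `utf16_to_html_entities` is made up for this example, it handles surrogate pairs instead of praying, and it also escapes `&` so the output stays unambiguous.

```ruby
# Hypothetical helper (not from the thread): escape text via UTF-16 code units.
def utf16_to_html_entities(text)
  units = text.encode("UTF-16BE").unpack("n*") # big-endian 16-bit code units
  parts = []
  i = 0
  while i < units.length
    code = units[i]
    if code.between?(0xD800, 0xDBFF) && i + 1 < units.length
      # High surrogate: merge it with the low surrogate that follows.
      code = 0x10000 + ((code - 0xD800) << 10) + (units[i + 1] - 0xDC00)
      i += 1
    end
    piece = if code == 0x26
              "&amp;"                  # escape the ampersand itself
            elsif code < 0x80
              [code].pack("U")         # plain ASCII passes through
            else
              format("&#x%x;", code)   # everything else becomes a hex reference
            end
    parts << piece
    i += 1
  end
  parts.join
end
```

Calling `utf16_to_html_entities("caf\u00e9 & b")` should give `caf&#xe9; &amp; b`. Going through UTF-16 is a detour when the input is already UTF-8, since `unpack("U*")` yields the code points directly.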
  3. UTF-8 to HTML conversion is trivial.
    HTML to UTF-8 is almost trivial, you just need to decide the set of
    supported &-entities. This example handles only &#xHHHH; and &amp;,
    but it should be obvious how to extend it to other entities (if you
    want to do so).

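    class String
      # Escape "&" and every non-ASCII character as a hex character reference.
      def utf8_to_html
        gsub(/([^\000-\177])|(&)/u) {
          if $2 then "&amp;" else sprintf("&#x%x;", $1.unpack("U")[0]) end
        }
      end

      # Decode hex character references and &amp; back to UTF-8.
      def html_to_utf8
        gsub(/&(?:#x([0-9a-fA-F]+)|(amp));/) {
          if $2 then "&" else [$1.hex].pack("U") end
        }
      end
    end
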
    Tomasz Wegrzanowski, Nov 14, 2006
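
One way the decoding side might be extended to other entities, as the post above suggests, is sketched here. The function name `decode_entities` and the tiny `NAMED_ENTITIES` table are assumptions made for illustration; a real table would carry many more names. The sketch does not validate code points, so an out-of-range reference such as `&#x110000;` would raise from `pack`.

```ruby
# Hypothetical extension (not from the thread): a few named entities plus
# decimal and hexadecimal character references.
NAMED_ENTITIES = {
  "amp" => 0x26, "lt" => 0x3C, "gt" => 0x3E, "quot" => 0x22, "nbsp" => 0xA0
}.freeze

def decode_entities(html)
  html.gsub(/&(?:#x([0-9a-fA-F]+)|#([0-9]+)|([A-Za-z][A-Za-z0-9]*));/) do
    if $1
      [$1.hex].pack("U")               # hexadecimal reference, e.g. &#xe9;
    elsif $2
      [$2.to_i].pack("U")              # decimal reference, e.g. &#233;
    elsif NAMED_ENTITIES.key?($3)
      [NAMED_ENTITIES[$3]].pack("U")   # known named entity
    else
      $&                               # unknown name: leave the text as it is
    end
  end
end
```

Unknown names are passed through unchanged rather than dropped, so `decode_entities("&bogus;")` returns the input as is.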
  4. Thanks to everybody. I will try your answers tonight at home.


    Miquel, Nov 14, 2006
