how to remove strange characters

Discussion in 'Ruby' started by Li Chen, Oct 7, 2008.

  1. Li Chen

    Li Chen Guest

    Hi all,

    I grab some info from a webpage. Sometimes I get some strange
    characters as follows (by p):
    To depart in a hurry; abscond: \342\200\234Your horse
    has\nabsquatulated!\342\200\235 (Robert M. Bird) To die.

    or (by print):
    To depart in a hurry; abscond: â€œYour horse has absquatulated!â€
    (Robert M. Bird) To die.

    Any idea how to get rid of them?


    Thanks,

    Li
    --
    Posted via http://www.ruby-forum.com/.
     
    Li Chen, Oct 7, 2008
    #1

  2. Li Chen

    Li Chen Guest

    Stephen Celis wrote:

    > Those are multi-byte characters (curly quotes, in this case). You
    > probably don't want to get rid of them, but you can use the iconv
    > library to transliterate them back to their ASCII almost-equivalents:
    >
    > >> string = "To depart in a hurry; abscond: \342\200\234Your horse has\nabsquatulated!\342\200\235 (Robert M. Bird) To die."
    > => "To depart in a hurry; abscond: \342\200\234Your horse
    > has\nabsquatulated!\342\200\235 (Robert M. Bird) To die."
    > >> require 'iconv'
    > => true
    > >> puts Iconv.iconv('ascii//translit', 'utf-8', string).to_s
    > To depart in a hurry; abscond: "Your horse has
    > absquatulated!" (Robert M. Bird) To die.
    > => nil
    >
    > Stephen


    Thank you,

    Li
     
    Li Chen, Oct 8, 2008
    #2
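For anyone reading this on a current Ruby: Iconv is no longer part of the standard library (it was removed in Ruby 2.0), so the session quoted above will not run as-is. A rough equivalent can be sketched with `String#encode` and its `:fallback` option. The mapping table and the `to_ascii` helper below are made up for illustration and are not from the thread; the sketch also assumes the input is valid UTF-8.

```ruby
# Hypothetical mapping (not from the thread): extend it for whatever
# punctuation your pages actually contain.
ASCII_FALLBACKS = {
  "\u201C" => '"',   # left double quotation mark
  "\u201D" => '"',   # right double quotation mark
  "\u2018" => "'",   # left single quotation mark
  "\u2019" => "'",   # right single quotation mark
  "\u2026" => "..."  # horizontal ellipsis
}.freeze

# Transliterates a valid UTF-8 string to ASCII. Characters that are
# missing from the mapping become "?".
def to_ascii(text)
  text.encode("US-ASCII", fallback: ->(char) { ASCII_FALLBACKS.fetch(char, "?") })
end

string = "To depart in a hurry; abscond: \u201CYour horse has\nabsquatulated!\u201D (Robert M. Bird) To die."
puts to_ascii(string)
```

The lambda is only called for characters that have no ASCII equivalent, so plain ASCII text passes through untouched.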

  3. Li Chen

    Li Chen Guest

    Hi Stephen and others,

    Iconv only works for some characters. It doesn't work for the following
    string.

    Any idea?

    Thanks,

    Li


    C:\Users\Alex>irb
    irb(main):001:0> require 'iconv'
    => true
    irb(main):002:0> string1="Fatal injury or ruin:\223Hath some fond lover tic'd thee to thy bane?\224\342\200\246"
    => "Fatal injury or ruin:\223Hath some fond lover tic'd thee to thy bane?\224\342\200\246"
    irb(main):003:0> puts Iconv.iconv('ASCII//TRANSLIT','utf-8',string1).to_s
    Iconv::IllegalSequence: "\223Hath some fond "...
    from (irb):3:in `iconv'
    from (irb):3
    irb(main):004:0>





     
    Li Chen, Oct 8, 2008
    #3
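A likely reason for the `IllegalSequence` error: the bytes `\223` and `\224` (0x93 and 0x94) are the curly quotes in Windows-1252, not valid UTF-8, while `\342\200\246` is a UTF-8 ellipsis. The string therefore mixes two encodings, and a strict UTF-8 decoder rejects it. One possible repair on Ruby 2.1 or later is to keep the valid UTF-8 and reinterpret only the invalid bytes as Windows-1252. The helper name `repair_mixed_encoding` is invented for this sketch, and guessing Windows-1252 for the stray bytes is an assumption about the source page.

```ruby
# Keeps valid UTF-8 as-is and decodes every invalid byte as Windows-1252.
# Assumption: stray high bytes in the scraped text came from Windows-1252.
def repair_mixed_encoding(raw)
  raw.dup.force_encoding("UTF-8").scrub do |bad_bytes|
    bad_bytes.dup
             .force_encoding("Windows-1252")
             .encode("UTF-8", invalid: :replace, undef: :replace)
  end
end

# Same bytes as in the irb session above; .b yields a binary copy.
raw = "Fatal injury or ruin:\223Hath some fond lover tic'd thee to thy bane?\224\342\200\246".b
fixed = repair_mixed_encoding(raw)
puts fixed.valid_encoding?
```

Once the string is valid UTF-8, it can be transliterated or filtered like any other text.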
  4. Pablo Q.

    Pablo Q. Guest

    [Note: parts of this message were removed to make it a legal post.]

    What do you think about doing something like this?

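    class String
      # Bytes rejected in addition to anything outside the 32..124 range:
      # " # % * + , - / < = > ? [ \ ] ^ ` {
      REJECTED_BYTES = [34,35,37,42,43,44,45,47,60,61,62,63,91,92,93,94,96,123].freeze

      # Replaces each rejected byte with +replacement+, modifying the
      # receiver in place. It works byte by byte, so a multi-byte UTF-8
      # character turns into several replacements.
      def remove_nonascii(replacement)
        cleaned = bytes.map do |b|
          rejected = b < 32 || b > 124 || REJECTED_BYTES.include?(b)
          rejected ? replacement : b.chr
        end.join
        replace(cleaned)
      end
    end

    "Fatal injury or ruin:\223Hath some fond lover tic'd thee to thybane?\224\342\200\246".remove_nonascii('+')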

    => "Fatal injury or ruin:+Hath some fond lover tic'd thee to thybane+++++"

    As you can see, it replaced those characters with '+'.





    --
    Pablo Q.
     
    Pablo Q., Oct 8, 2008
    #4
  5. Nit Khair

    Nit Khair Guest

    Li Chen wrote:
    > Hi all,
    >
    > I grab some info from a webpage. Sometimes I get some strange
    > characters as follows (by p):
    > To depart in a hurry; abscond: \342\200\234Your horse
    > has\nabsquatulated!\342\200\235 (Robert M. Bird) To die.


    Here's a quick hack I used recently. The characters were messing up my
    display in ncurses, and I did not need them.

    dataitem.gsub!(/[^[:space:][:print:]]/,'')

    I found this while googling; IIRC, it's used somewhere in RoR.
     
    Nit Khair, Oct 9, 2008
    #5
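A caveat for newer Rubies: as far as I know, from 1.9 onward `[[:print:]]` is Unicode-aware on UTF-8 strings, so curly quotes count as printable and are kept, and `gsub` raises `ArgumentError` when the string contains invalid bytes. If the goal is to keep only printable ASCII, one option is to drop everything non-ASCII with `String#encode` first and then apply the same regex. The helper name `strip_to_printable_ascii` is made up for this sketch, and it assumes the text is meant to be UTF-8.

```ruby
# Drops invalid bytes and non-ASCII characters, then removes ASCII
# control characters (other than whitespace) with the regex from the post.
# Assumption: the incoming text is supposed to be UTF-8.
def strip_to_printable_ascii(text)
  ascii = text.dup.force_encoding("UTF-8")
              .encode("US-ASCII", invalid: :replace, undef: :replace, replace: "")
  ascii.gsub(/[^[:space:][:print:]]/, "")
end

dataitem = "Fatal injury or ruin:\223Hath some fond lover tic'd thee to thy bane?\224\342\200\246".b
puts strip_to_printable_ascii(dataitem)
```

Unlike transliteration, this discards the quotes and the ellipsis entirely, which matches what the quick hack was used for.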
  6. Li Chen

    Li Chen Guest

    Nit Khair wrote:
    > Here's a quick hack I used recently. The characters were messing up my
    > display in ncurses, and I did not need them.
    >
    > dataitem.gsub!(/[^[:space:][:print:]]/,'')
    >
    > I found this while googling; IIRC, it's used somewhere in RoR.


    It works in the scenario where iconv doesn't work. Good job!

    Li

     
    Li Chen, Oct 9, 2008
    #6
  7. Bilyk, Alex

    Bilyk, Alex Guest

    Installing 1.9 from binaries on Windows Qs

    There is no one-click installer for 1.9 on Windows as far as I can tell.
    Downloading and unpacking the zipped binaries didn't get me very far, as
    both ruby and irb complain that something is missing. Does the binary
    distribution require me to install anything else, such as libraries? If
    so, what additional stuff do I need to make 1.9 work, and where can I
    get it?

    Thanks,
    Alex
     
    Bilyk, Alex, Oct 10, 2008
    #7
