Saving the web: charset problems and symbol problems

Discussion in 'Ruby' started by Sak Na rede, Jan 30, 2009.

    Hi all!

    I think a lot of Ruby scripts are written for web crawling, web scraping
    and many other web applications. I'm working with the web too: I am
    trying to save the text of many different websites. At the moment I'm
    trying to solve two problems:

    1 - How to standardize the charset of the pages. There are a lot of
    different charsets, and I think there must be a solution other than
    checking every charset and converting to the proper one each time.
    (By the way, what is the best method to detect the charset of a file?
    The file command is not very good, I think.)

    2 - How to convert HTML to plain text. I use Hpricot, but a lot of very
    rare symbols remain, like "€" or "”". Which is the most commonly
    used method?
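
To make the two questions concrete, here is a minimal sketch of one possible approach, not a definitive answer. It uses only core Ruby string methods. All helper names (`to_utf8`, `decode_entities`, `html_to_text`, `page_to_text`) are hypothetical, the fallback order of charsets is an assumption, and the entity table is deliberately tiny. The `declared` argument is assumed to come from wherever the page states its charset, such as an HTTP Content-Type header or a meta tag. Symbols like "€" are consistent with UTF-8 bytes having been decoded with a single-byte charset such as Windows-1252, which is why the sketch tries UTF-8 before any single-byte fallback.

```ruby
# Hypothetical helpers; only core Ruby String/Encoding calls are used.

# Charsets tried after the declared one. This order is an assumption.
FALLBACK_CHARSETS = ['UTF-8', 'Windows-1252'].freeze

# A tiny illustrative subset of named entities, not the full HTML list.
NAMED_ENTITIES = {
  'amp' => '&', 'lt' => '<', 'gt' => '>', 'quot' => '"', 'apos' => "'",
  'nbsp' => ' ', # mapped to a plain space so whitespace collapsing catches it
  'euro' => '€', 'ldquo' => '“', 'rdquo' => '”'
}.freeze

# Normalize raw page bytes to UTF-8, trying the declared charset first.
def to_utf8(raw, declared = nil)
  bytes = raw.dup.force_encoding(Encoding::BINARY)
  ([declared] + FALLBACK_CHARSETS).compact.each do |name|
    begin
      text = bytes.dup.force_encoding(name)
      return text.encode('UTF-8') if text.valid_encoding?
    rescue ArgumentError, EncodingError
      next # unknown charset name or unconvertible byte: try the next candidate
    end
  end
  # Last resort: ISO-8859-1 assigns a character to every byte value.
  bytes.force_encoding('ISO-8859-1').encode('UTF-8')
end

# Replace numeric and (a few) named character references with real characters.
def decode_entities(text)
  text.gsub(/&(?:#(\d+)|#[xX](\h+)|(\w+));/) do |match|
    dec, hex, name = $1, $2, $3
    if name
      NAMED_ENTITIES.fetch(name, match) # unknown names are left untouched
    else
      code_point = dec ? dec.to_i : hex.to_i(16)
      code_point <= 0x10FFFF ? [code_point].pack('U') : match
    end
  end
end

# Very rough HTML-to-text conversion based on regular expressions.
def html_to_text(html)
  text = html.gsub(%r{<(script|style)\b.*?</\1>}mi, ' ') # drop script/style bodies
  text = text.gsub(/<[^>]*>/, ' ')                        # drop remaining tags
  decode_entities(text).gsub(/\s+/, ' ').strip            # decode, then tidy spaces
end

# Both steps together: bytes in, UTF-8 plain text out.
def page_to_text(raw, declared = nil)
  html_to_text(to_utf8(raw, declared))
end
```

For example, `page_to_text(File.binread('page.html'), 'ISO-8859-1')` would read a saved page (the file name is made up) and return UTF-8 text. Stripping tags with regular expressions is fragile on real-world markup, so a proper HTML parser such as the Hpricot already mentioned is likely the safer choice for tag removal; the charset normalization and entity decoding steps apply either way.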

    Thanks a lot
    --
    Posted via http://www.ruby-forum.com/.
     
    Sak Na rede, Jan 30, 2009