Best practice for translating web page character data so that page will be scrapable/e-mailable

Discussion in 'ASP .Net' started by Guest, Aug 17, 2007.

  1. Guest

    Guest Guest

    I have web pages that I periodically want to a) programmatically "scrape"
    and b) programmatically send in e-mail. These web pages are built via
    content management systems and occasionally have Word "curly quotation
    marks" and other weird entities embedded in them.

    If you fail to translate characters properly, you get the familiar problem
    of some characters turning into question marks when the page is sent in
    e-mail and/or scraped. You see this problem all the time in web-based
    newsletters and the like.

    When I was working in classic ASP, I wrote "translate" functions that would
    render weird characters into their safe equivalents using a simple string
    "replace". This was a limited solution because it was premised on my ability
    to identify all of the problematic characters myself and translate them.
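
    To make the limitation concrete, here is a minimal sketch of that kind of
    replace-based "translate" function. It is written in Python rather than
    classic ASP, and the table contents, the name translate, and the sample
    string are all made up for illustration; it is not the original code.

```python
# Hypothetical replacement table: it only covers the characters
# that someone has already noticed and added by hand.
SAFE_EQUIVALENTS = {
    "\u2018": "'",    # left single curly quote
    "\u2019": "'",    # right single curly quote
    "\u201c": '"',    # left double curly quote
    "\u201d": '"',    # right double curly quote
    "\u2013": "-",    # en dash
    "\u2026": "...",  # horizontal ellipsis
}


def translate(text: str) -> str:
    """Replace each known problem character with its ASCII stand-in."""
    for weird, safe in SAFE_EQUIVALENTS.items():
        text = text.replace(weird, safe)
    return text


# Assumed sample input: curly quotes, an en dash, and an accented letter.
sample = "\u201cIt\u2019s here\u201d \u2013 caf\u00e9"
result = translate(sample)
```

    The accented letter in the sample passes through untouched because it is
    not in the table, which is exactly the weakness described above.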

    I am wondering if there is an all-in-one solution to this problem inside or
    outside of the .NET framework. I have read a bit about the character
    encoding classes and I'm hoping that one of them represents a complete
    solution to my problem.
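
    For reference, the question-mark symptom can be reproduced in a few lines.
    This is a language-neutral sketch in Python, not .NET code, and it assumes
    the cause is text being encoded into a character set that lacks the
    character; the sample string is invented. Whether the .NET encoding
    classes behave the same way is an assumption I have not verified.

```python
# Assumed sample: a word wrapped in Word-style curly double quotes.
text = "\u201cquoted\u201d"

# ASCII has no curly quotes, so with errors="replace" each one
# is substituted with a literal question mark.
lossy = text.encode("ascii", errors="replace")

# UTF-8 can represent every Unicode character, so encoding and
# decoding with the same charset returns the original text.
utf8_bytes = text.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")
```

    If that assumption holds, the fix is less about translating individual
    characters and more about using one encoding, such as UTF-8, consistently
    when reading the page and when sending it.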

    Can anyone offer any guidance?

    Thanks,
    -KF
    Guest, Aug 17, 2007
