Mike
Hi, I am using Python to scrape web pages and I have no problems
unless I run into a site that is utf-8. It seems & is changed to &amp;
when the site is utf-8.
If I try to replace it with .replace('&amp;','&'), for some reason
it does not replace it.
For example: http://today.reuters.co.uk/news/default.aspx
The url in the page looks like this
http://today.reuters.co.uk/news/New...423599_RTRUKOC_0_UK-BRITAIN-CONSERVATIVES.xml
However when I pull it into python the URL ends up looking like this
(notice the &amp; instead of just & in the URL)
http://today.reuters.co.uk/news/new...11_RTRUKOC_0_UK-CONSTRUCTION-BPB-STGOBAIN.xml
Any ideas?
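
To make the symptom concrete, here is a minimal sketch of what might be going on, assuming Python 3 and assuming the raw HTML source holds the ampersand as the entity &amp; inside the link. The URL and its query parameters below are invented for illustration and are not the real Reuters link. One possible reason the replace appears to do nothing is that str.replace returns a new string rather than changing the original, so the result has to be assigned.

```python
from html import unescape

# Hypothetical href value as it might appear in raw HTML source;
# the host, path and query parameters are made up for this example.
raw_href = "http://example.com/news/article.aspx?type=topNews&amp;storyID=123"

# str.replace returns a new string and leaves the original untouched,
# so calling it without assigning the result changes nothing.
raw_href.replace("&amp;", "&")

# Assigning the result keeps the replaced version.
fixed = raw_href.replace("&amp;", "&")

# html.unescape decodes &amp; along with any other HTML entities.
decoded = unescape(raw_href)

print(fixed)
print(decoded)
```

If that assumption holds, either fixed or decoded should be usable as the request URL, and unescape has the advantage of also handling other entities that may appear in links.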