How do I correctly download Wikipedia pages?

Steven D'Aprano

I'm trying to scrape a Wikipedia page from Python. Following instructions
here:

http://en.wikipedia.org/wiki/Wikipedia:Database_download
http://en.wikipedia.org/wiki/Special:Export

I use the URL "http://en.wikipedia.org/wiki/Special:Export/Train" instead
of just "http://en.wikipedia.org/wiki/Train". But instead of getting the
page I expect, and can see in my browser, I get an error page:

....
Our servers are currently experiencing a technical problem. This is
probably temporary and should be fixed soon
....


(Output is obviously truncated for your sanity and mine.)


Is there a trick to downloading from Wikipedia with urllib?
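
The usual culprit, for what it's worth: Wikipedia's servers reject
requests that arrive with urllib's default "Python-urllib/x.y"
User-Agent header, so scripts get an error page that a browser never
sees. Sending a descriptive User-Agent of your own is normally enough.
A minimal sketch with Python 3's urllib.request, assuming that is the
cause here (the agent string below is only a placeholder):

    import urllib.request

    url = "http://en.wikipedia.org/wiki/Special:Export/Train"

    # Wikipedia blocks the default "Python-urllib/x.y" agent, so
    # identify the script explicitly (placeholder name/contact).
    req = urllib.request.Request(
        url,
        headers={"User-Agent": "MyWikiScraper/0.1 (contact: me@example.com)"},
    )

    with urllib.request.urlopen(req) as resp:
        xml_text = resp.read().decode("utf-8")

    print(xml_text[:200])  # start of the exported XML

On the Python 2 urllib2 of the era, passing the same headers argument
to urllib2.Request does the job.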
 
Cousin Stanley

> I'm trying to scrape a Wikipedia page from Python.
> ....

On occasion I use a program under Debian Linux
called wikipedia2text that is very handy
for downloading Wikipedia pages as plain text files ....

Description: displays Wikipedia articles on the command line

This script fetches Wikipedia articles (currently supports
around 30 Wikipedia languages) and displays them as plain text
in a pager or just sends the text to standard out. Alternatively
it opens the Wikipedia article in a (possibly GUI) web browser
or just shows the URL of the appropriate Wikipedia article.

Example directed through the lynx browser ....

wp2t -b lynx gorilla > gorilla.txt
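
If you want the plain text back inside Python rather than in a shell
pipeline, a small subprocess wrapper will do. A sketch, assuming the
Debian wikipedia2text package (which provides wp2t) is installed and
that wp2t writes to standard out when not attached to a terminal:

    import subprocess

    # Fetch the "gorilla" article as plain text via wikipedia2text.
    result = subprocess.run(
        ["wp2t", "gorilla"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit
    )
    print(result.stdout)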
 
