Wikipedia - converting data stored in an SQL database to HTML


Claudio Grondi

Is there an already available script/tool able to
extract records and generate proper HTML
code out of the data stored in the Wikipedia
SQL database?
e.g.
converting all occurrences of
[[xxx|yyy]] to <a href=xxx>yyy</a>
etc.
Or, even better, a script/tool able to generate
and write to disk all the HTML files,
given the YYYYMMDD_cur_table.sql
data, so that the Wikipedia content
becomes available on a local computer
without running a server?
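A minimal sketch of the [[xxx|yyy]] rewriting in Python (a hypothetical helper, not part of any existing tool; real MediaWiki markup has many more cases, such as namespaces, image links, and nested markup):

```python
import re

# [[Target|Label]] -> <a href="Target">Label</a>
# [[Target]]       -> <a href="Target">Target</a>
# Simplified sketch: no HTML escaping, no namespace or anchor handling.
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")

def wikilink_to_html(text):
    def repl(m):
        target = m.group(1).strip()
        label = m.group(2) or target          # no pipe: label = target
        href = target.replace(' ', '_')       # Wikipedia page-name convention
        return '<a href="%s">%s</a>' % (href, label)
    return LINK.sub(repl, text)
```

This only covers the plain pipe-link form asked about above; a real converter would also have to deal with templates, tables, and the rest of the markup.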

By the way:
has anyone succeeded in installing
a local Wikipedia server? As I remember, the
problem that caused me to fail was that the
MySQL server was not able to handle a
database larger than 2 GByte
(the English part of the current data and
usually the ..._old_table.sql exceed
this size).

Claudio
 

Leif K-Brooks

Claudio said:
Is there an already available script/tool able to extract records and
generate proper HTML code out of the data stored in the Wikipedia SQL
data base?

They're not in Python, but there are a couple of tools available here: <http://tinyurl.com/692pt>
By the way: has someone succeeded in installation of a local
Wikipedia server?

I loaded all of the Wikipedia data into a local MySQL server a while
back without any problems. I haven't attempted to run Mediawiki on top
of that, but I don't see why that wouldn't work.
 

Claudio Grondi

<http://tinyurl.com/692pt> redirects (if not just down) to
http://en.wikipedia.org/wiki/Wikipe...L_tree_dumps_for_mirroring_or_CD_distribution

From this page I see only one tool (not a couple) which is
available to download and use:

http://www.tommasoconforti.com/ - the home of Wiki2static

Wiki2static (version 0.61, 2nd Aug 2004)
http://www.tommasoconforti.com/wiki/wiki2static.tar.gz
is a Perl script to convert a Wikipedia SQL dump
into an HTML tree suitable for offline browsing or CD distribution.

I failed to find any documentation, so I was forced to play
with the script settings directly myself:

$main_prefix = "u:/WikiMedia-Static-HTML/";
$wiki_language = "pl";

and running (in the script's current directory):
> wiki2static.pl Q:\WikiMedia-MySQL-Dump\pl\20040727_cur_table.sql
to test the script on a small (112 MByte)
SQL dump.

The script has now been running for over half an hour
and has so far created 1,555 folders and
generated 527 files with a total size of 6 MBytes,
consuming only 16 seconds of CPU time.
I estimate the time until the script finishes at approx.
6 hours for a 100 MByte file, which implies 120 hours
for a 2 GByte file of the English dump ...

Any further hints? What am I doing wrong?

(There are now 1,627 folders and 1,307 files with
a total size of 15.6 MB after one hour of runtime
and 20 seconds of CPU time consumed, even though
I increased the priority of the process to high
on my W2K box running Perl 5.8.3 half an hour
ago.)

Claudio
P.S. What was the size of the dump file you imported
into the MySQL database? Importing only the current
version (skipping the history dump), which "a while
back" was smaller than 2 GByte, causes no
problems with MySQL.
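Given the 2 GByte problem mentioned above, one alternative is to skip the MySQL import entirely and stream rows straight out of the dump file. A minimal Python sketch, assuming the mysqldump extended-INSERT format of those dumps (`split_rows` is a hypothetical helper, not part of any existing tool; string escapes are kept verbatim rather than decoded):

```python
def split_rows(values):
    """Split the VALUES part of an extended INSERT, e.g.
    "(1,'Title','Text'),(2,...)", into lists of raw field strings.
    Handles single-quoted strings with backslash escapes; escape
    sequences are preserved verbatim in the output."""
    rows, row, field = [], [], []
    in_str = in_row = False
    i, n = 0, len(values)
    while i < n:
        c = values[i]
        if in_str:
            if c == '\\' and i + 1 < n:       # keep escape pair as-is
                field.append(values[i:i + 2])
                i += 2
                continue
            if c == "'":
                in_str = False
            else:
                field.append(c)
        elif c == "'":
            in_str = True
        elif c == '(':                         # start of a row
            in_row, row, field = True, [], []
        elif c == ')':                         # end of a row
            row.append(''.join(field))
            rows.append(row)
            in_row = False
        elif c == ',' and in_row:              # field separator
            row.append(''.join(field))
            field = []
        elif in_row:                           # unquoted data (numbers etc.)
            field.append(c)
        i += 1
    return rows
```

With something like this, each row of the cur table could be fed directly to an HTML generator, which sidesteps both the server and the database-size limit.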
 

Claudio Grondi

$main_prefix = "u:/WikiMedia-Static-HTML/";
$wiki_language = "pl";
The script has now been running for over half an hour
and has so far created 1,555 folders and
generated 527 files with a total size of 6 MBytes,
consuming only 16 seconds of CPU time.
I estimate the time until the script finishes at approx.
6 hours for a 100 MByte file, which implies 120 hours
for a 2 GByte file of the English dump ...
Any further hints? What am I doing wrong?

In the meantime I have noticed that the script started to
download media files from the Internet. Setting
$include_media = 2;
in the script solved the problem.

Thank you Leif for pointing me to
http://tinyurl.com/692pt

What I am still missing is a binary of texvc for
Windows. Does anyone have a ready-to-use
compiled version, or can someone point me to one?

Automatic conversion from Perl to Python seems
not (yet) to be available (except as a
service provided by
http://www.crazy-compilers.com/bridgekeeper/),
and Perl syntax is so far from what I already
know that I currently see no chance of
coming up with a Python version of
http://www.tommasoconforti.com/wiki/wiki2static.tar.gz

Claudio
 
