Wikipedia - converting data stored in the SQL database to HTML

Discussion in 'Python' started by Claudio Grondi, Mar 21, 2005.

  1. Is there an already available script/tool that can
    extract records and generate proper HTML
    code from the data stored in the Wikipedia
    SQL database?
    e.g.
    converting all occurrences of
    [[xxx|yyy]] to <a href="xxx">yyy</a>
    etc.
    Or, even better, a script/tool able to generate
    and write to disk all the HTML files,
    given the YYYYMMDD_cur_table.sql
    data, so that the Wikipedia content
    becomes available on the local computer
    without running a server?
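    A minimal Python sketch of the link conversion asked about above (a simplification: a real converter would also have to handle namespaces, nested markup, piped tricks, and URL escaping, which this regex ignores):

```python
import re

def wikilinks_to_html(text):
    """Convert [[target|label]] and [[target]] wiki links to HTML anchors.

    Deliberately naive: the href is used verbatim, with no namespace
    resolution or percent-encoding as real Wikipedia markup would need.
    """
    def repl(match):
        target = match.group(1)
        label = match.group(2) or target  # [[target]] uses the target as label
        return '<a href="%s">%s</a>' % (target, label)

    # Match [[target]] or [[target|label]]; target may not contain ] or |
    return re.sub(r'\[\[([^\]|]+)(?:\|([^\]]+))?\]\]', repl, text)

print(wikilinks_to_html("See [[Python|the Python article]] and [[Perl]]."))
# -> See <a href="Python">the Python article</a> and <a href="Perl">Perl</a>.
```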

    By the way:
    has anyone succeeded in installing a
    local Wikipedia server? As I remember, what
    made me fail at this was that the
    MySQL server was not able to handle a
    database larger than 2 GByte
    (the English part of the current data and
    usually the ..._old_table.sql exceed
    this size).

    Claudio
     
    Claudio Grondi, Mar 21, 2005
    #1

  2. Claudio Grondi wrote:
    > Is there an already available script/tool able to extract records and
    > generate proper HTML code out of the data stored in the Wikipedia SQL
    > data base?


    They're not in Python, but there are a couple of tools available here:
    <http://tinyurl.com/692pt>.

    > By the way: has someone succeeded in installation of a local
    > Wikipedia server?


    I loaded all of the Wikipedia data into a local MySQL server a while
    back without any problems. I haven't attempted to run Mediawiki on top
    of that, but I don't see why that wouldn't work.
     
    Leif K-Brooks, Mar 21, 2005
    #2

  3. <http://tinyurl.com/692pt> redirects (if not just down) to
    http://en.wikipedia.org/wiki/Wikipe...L_tree_dumps_for_mirroring_or_CD_distribution

    On this page I see only one tool (not a couple) that is
    available to download and use:

    http://www.tommasoconforti.com/ the home of Wiki2static

    Wiki2static (version 0.61, 2nd Aug 2004)
    http://www.tommasoconforti.com/wiki/wiki2static.tar.gz
    is a Perl script to convert a Wikipedia SQL dump
    into an html tree suitable for offline browsing or CD distribution.

    I failed to find any documentation, so I had to
    play with the script settings directly myself:

    $main_prefix = "u:/WikiMedia-Static-HTML/";
    $wiki_language = "pl";

    and running (in the script's current directory):
    \> wiki2static.pl Q:\WikiMedia-MySQL-Dump\pl\20040727_cur_table.sql
    to test the script on a small (112 MByte)
    SQL dump.

    The script has now been running for over half an hour
    and has so far created 1,555 folders and
    generated 527 files with a total size of 6 MBytes,
    consuming only 16 seconds of CPU time.
    I estimate the time until the script finishes at approx.
    6 hours for a 100 MByte file, which gives 120 hours
    for the 2 GByte file of the English dump ...
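    The extrapolation above, made explicit (a linear-scaling assumption; the script's actual runtime may not grow linearly with dump size):

```python
# Linear extrapolation of the observed conversion rate.
# The 2 GByte figure is taken here as 2000 MByte, matching the
# rough estimate in the text; actual behaviour may differ.
hours_per_100_mb = 6.0
dump_size_mb = 2000.0
estimated_hours = dump_size_mb / 100.0 * hours_per_100_mb
print(estimated_hours)  # 120.0 hours, i.e. roughly five days
```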

    Any further hints? What am I doing wrong?

    (There are now 1,627 folders and 1,307 files with
    a total size of 15.6 MB after one hour of runtime
    and 20 seconds of CPU time consumed, even though
    I raised the priority of the process to high
    on my W2K box running Perl 5.8.3 half an hour
    ago.)

    Claudio
    P.S.
    >> I loaded all of the Wikipedia data into a local MySQL server a while
    >> back without any problems.

    What was the size of the dump file you imported into
    the MySQL database? Importing only the current
    version, which "a while back" was smaller
    than 2 GByte (skipping the history dump),
    causes no problems with MySQL.

    "Leif K-Brooks" <> schrieb im Newsbeitrag
    news:...
    > Claudio Grondi wrote:
    > > Is there an already available script/tool able to extract records and
    > > generate proper HTML code out of the data stored in the Wikipedia SQL
    > > data base?

    >
    > They're not in Python, but there are a couple of tools available here:
    > <http://tinyurl.com/692pt>.
    >
    > > By the way: has someone succeeded in installation of a local
    > > Wikipedia server?

    >
    > I loaded all of the Wikipedia data into a local MySQL server a while
    > back without any problems. I haven't attempted to run Mediawiki on top
    > of that, but I don't see why that wouldn't work.
     
    Claudio Grondi, Mar 21, 2005
    #3
  4. > $main_prefix = "u:/WikiMedia-Static-HTML/";
    > $wiki_language = "pl";
    > The script is running now for over half an hour
    > and has created yet 1.555 folders and
    > generated 527 files with a total size of 6 MBytes
    > consuming only 16 seconds of CPU time.
    > I estimate the time until the script is ready to appr.
    > 6 hours for a 100 MByte file, which gives 120 hours
    > for a 2 GByte file of the english dump ...


    > Any further hints? What am I doing wrong?


    In the meantime I noticed that the script had started
    to download media files from the Internet. Setting
    $include_media = 2;
    in the script solved the problem.

    Thank you Leif for pointing me to
    http://tinyurl.com/692pt

    What I am still missing is a binary of texvc for
    Windows. Does someone perhaps have a ready-to-use
    compiled version, or can you point me to one?

    Automatic conversion from Perl to Python seems
    not (yet) to be available (apart from a
    service provided by
    http://www.crazy-compilers.com/bridgekeeper/ ),
    and Perl syntax is so far from what I
    already know that I currently see no
    chance of coming up with
    a Python version of
    http://www.tommasoconforti.com/wiki/wiki2static.tar.gz

    Claudio
     
    Claudio Grondi, Mar 22, 2005
    #4
