MediaWiki to RTF/Word/PDF

Josh English

I have several pages exported from a private MediaWiki that I need to
convert to a PDF document, or an RTF document, or even a Word
document.

So far, the only Python module I have found that parses MediaWiki
files is mwlib, which only runs on Unix, as far as I can tell. I'm
working on Windows here.

Has anyone heard of a module that parses wiki markup and transforms
it? Or am I looking at XSLT?

Josh
 
Diez B. Roggisch

On 17.02.10 22:00, Josh English wrote:
I have several pages exported from a private MediaWiki that I need to
convert to a PDF document, or an RTF document, or even a Word
document.

So far, the only Python module I have found that parses MediaWiki
files is mwlib, which only runs on Unix, as far as I can tell. I'm
working on Windows here.

I think you stand a chance of making it run under Windows using MinGW.
Might be a bit daunting though.

Other than that, yep, XSLT is your friend.

Diez
 
John Bokma

Josh English said:
I have several pages exported from a private MediaWiki that I need to
convert to a PDF document, or an RTF document, or even a Word
document.

So far, the only Python module I have found that parses MediaWiki
files is mwlib, which only runs on Unix, as far as I can tell. I'm
working on Windows here.

Has anyone heard of a module that parses wiki markup and transforms
it? Or am I looking at XSLT?

One option might be to install a printer driver that prints to PDF and
just print the web pages.

Using Saxon or AltovaXML and a suitable stylesheet might give you the
nicest looking result though (but quite some work).
 
Paul Rubin

Josh English said:
Has anyone heard of a module that parses wiki markup and transforms
it? Or am I looking at XSLT?

MediaWiki markup is quite messy and unless MediaWiki has an XML export
feature that I don't know about, I don't see what good XSLT can do you.
(The regular MediaWiki API generates XML results with wiki markup
embedded in them.) It looks like MediaWiki itself can create PDFs (see
any page on en.wikibooks.org for example), but the rendered PDF is not
that attractive.
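
To illustrate the point about wiki markup being embedded in the XML, here is a minimal sketch in Python that pulls the raw markup out of an exported file. It assumes the export uses the usual page/title/revision/text element layout; the sample document, its namespace version, and the helper names `local_name` and `extract_pages` are made up for illustration. What comes out is still unparsed wiki markup, which is why XSLT alone does not get you far.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of an exported page; a real export will have more
# elements (ids, timestamps, contributors) and possibly another namespace.
SAMPLE_EXPORT = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <page>
    <title>Example page</title>
    <revision>
      <text>== Heading ==
Some '''bold''' text.</text>
    </revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_pages(xml_text):
    """Return a dict mapping page title to raw wiki markup.

    If a page holds several revisions, the last one in document
    order wins.
    """
    root = ET.fromstring(xml_text)
    pages = {}
    for elem in root.iter():
        if local_name(elem.tag) != "page":
            continue
        title = None
        text = ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        if title is not None:
            pages[title] = text
    return pages


pages = extract_pages(SAMPLE_EXPORT)
for title, markup in pages.items():
    print(title)
    print(markup)
```

Matching on the local tag name rather than a fixed namespace is a deliberate shortcut here, so the sketch does not depend on which export schema version the wiki emits.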

I remember something about a Haskell module to parse MediaWiki markup,
if that is of any use.
 
Snaky Love

Hi,

I checked some ways of doing this, and starting over with something
new will give you a lot of headaches. All XSLT processors have one
problem or another, and success depends very much on how you were
using MediaWiki (plugins?). Expect a lot of poking around with
details, and still not being happy with the solutions available.
There are many really crazy approaches out there for generating PDF
from MediaWiki "markup": many tried, not many succeeded, and most of
them stop at the "good enough for me" level. So it might be a good
idea not to stray too far from what they are doing at
http://code.pediapress.com; you will spend much less time installing
Ubuntu in a VirtualBox.

However, there is one quite impressive tool that does PDF conversion
via CSS and is good for getting the job done quickly and not too dirty:
http://www.princexml.com/samples/ - scroll down to the MediaWiki
examples - they offer a free license for non-commercial projects.
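
If you save the rendered wiki pages as HTML, driving Prince from Python could look roughly like the sketch below. It assumes the `prince` executable is installed and on your PATH; the file names and the helper `prince_command` are hypothetical, and the optional stylesheet is just a CSS file of your own.

```python
import subprocess


def prince_command(html_path, pdf_path, stylesheet=None):
    """Build the argument list for a Prince run (hypothetical helper)."""
    cmd = ["prince", html_path, "-o", pdf_path]
    if stylesheet:
        # -s applies an extra CSS stylesheet to the input document.
        cmd += ["-s", stylesheet]
    return cmd


def render_pdf(html_path, pdf_path, stylesheet=None):
    """Run Prince; raises CalledProcessError if the conversion fails."""
    subprocess.run(prince_command(html_path, pdf_path, stylesheet), check=True)


# Example call, assuming page.html was saved from the wiki beforehand:
# render_pdf("page.html", "page.pdf", stylesheet="print.css")
```

Keeping the command construction separate from the call makes it easy to check what would be run before Prince is actually installed.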

Good luck!

Have a nice day,
Snaky
 
