DocBook to PDF


Hal Fulton

I'm wanting to do some docbook to pdf conversion.

I assume the "right" way would be some combination
of REXML and PDF::Writer.

Anyone done anything like this?

Ideally I'd like to be able to tweak a few things
such as margins and page size without changing
anything else...

Ideas welcome.


Thanks,
Hal
 

James Britt

Hal said:
I'm wanting to do some docbook to pdf conversion.

I assume the "right" way would be some combination
of REXML and PDF::Writer.

Anyone done anything like this?

Please, nobody shoot me, but I have to believe this can easily be done
with Java tools.

Now, if the tweak-factor is such that actually handling Java-matter is
too painful, then perhaps a Ruby solution is better. But from casual
following of the xml-dev mailing list, it seems that this sort of thing
is a well-solved matter in Java.


Just a thought.


James

--

http://www.ruby-doc.org - Ruby Help & Documentation
http://www.artima.com/rubycs/ - Ruby Code & Style: Writers wanted
http://www.rubystuff.com - The Ruby Store for Ruby Stuff
http://www.jamesbritt.com - Playing with Better Toys
http://www.30secondrule.com - Building Better Tools
 

Steven Jenkins

Hal said:
I'm wanting to do some docbook to pdf conversion.

I assume the "right" way would be some combination
of REXML and PDF::Writer.

Anyone done anything like this?

I have Ruby code that converts a usable subset of DocBook to LaTeX. It's
a prototype, but it works. It'd be pretty easy to extend it for elements
it doesn't handle, modulo difficult things like bibliographic references
and citations.

Ideally I'd like to be able to tweak a few things
such as margins and page size without changing
anything else...

All that's in a separate style file.
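As a rough illustration of keeping layout out of the converter, a style file that holds nothing but page size and margins could be generated like this. The package name `pagesetup` and the helper are hypothetical, and the sketch assumes the standard LaTeX `geometry` package; the converted document would then load it with `\usepackage{pagesetup}`.

```ruby
# Hypothetical helper: emits a tiny LaTeX package whose only job is page
# geometry, so page size and margins can change without touching the body.
# Assumes the standard "geometry" package is available in the TeX install.
def geometry_style(paper: "a4paper", margin: "2.5cm")
  <<~STY
    \\NeedsTeXFormat{LaTeX2e}
    \\ProvidesPackage{pagesetup}
    \\RequirePackage[#{paper},margin=#{margin}]{geometry}
  STY
end

# Save the output as pagesetup.sty; regenerate with other values to re-flow.
puts geometry_style(paper: "a5paper", margin: "1.5cm")
```

Because the document body only says `\usepackage{pagesetup}`, switching from A4 to A5 means regenerating one small file and re-running LaTeX.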

I developed this for a specific application: producing formal
specification documents by extracting functional models and requirements
from a database. I turn the XML into HTML with XSLT and into PDF using
LaTeX and dvipdf.

I'll find the latest version tonight and email it to you. If it's
useful, maybe I'll put it on Rubyforge.

Steve
 

Keith Fahlgren

Hal said:
I'm wanting to do some docbook to pdf conversion.

Hi,

Here's a suggestion from a coworker of mine:

Does this need to be a pure Ruby solution? If not, I'd suggest using (or
adapting) the standard DocBook stylesheets
(http://wiki.docbook.org/topic/DocBookXslStylesheets), and using those
(perhaps via the Ruby libxslt bindings) to generate XSL-FO, which you can
then feed into FOP (http://xmlgraphics.apache.org/fop/) or the FO formatter
of your choice.

Alternately, OpenOffice.org can open (and reasonably format) DocBook
files, has a built-in PDF printer, and is scriptable via Ruby (though I
wouldn't necessarily recommend it).


HTH,
Keith
 

Lyle Johnson

James said:
Please, nobody shoot me, but I have to believe this can easily be done
with Java tools.

Now, if the tweak-factor is such that actually handling Java-matter is
too painful, then perhaps a Ruby solution is better. But from casual
following of the xml-dev mailing list, it seems that this sort of thing
is a well-solved matter in Java.

It is fairly straightforward to convert DocBook/XML to PDF using a
combination of Java tools. I am using the Saxon XSLT processor
(http://saxon.sourceforge.net/), the standard DocBook XSL stylesheets
(http://docbook.sourceforge.net/projects/xsl/), and FOP
(http://xmlgraphics.apache.org/fop/) to accomplish this.

I think that, at some point in the past, I considered using Ruby tools
to do this but they either didn't exist or weren't quite up to snuff
(especially with regard to an XSLT processor).
 

Bob Hutchison

Hal said:
I'm wanting to do some docbook to pdf conversion.

I assume the "right" way would be some combination
of REXML and PDF::Writer.

Anyone done anything like this?

Ideally I'd like to be able to tweak a few things
such as margins and page size without changing
anything else...

Ideas welcome.

My first thought is Holy Crap! Really, I've got a kid about that age :)

The typical XML way to do this is DocBook -> XSL:FO -> pdf because
you'll be able to use Norm Walsh's stuff. If you do this and you have
input documents bigger than, say, two pages, you'll be wanting the
*fastest* XML processors you can get. I did this a couple of years
ago and found libxml2 and libxslt the way to go. Getting a good
XSL:FO processor was quite a trick then, and I have not been keeping
up to date, so keep that in mind. The only free one that I could find
was the Apache FOP processor, and it wasn't all that good
(incomplete). There are a couple of commercial processors that are
apparently very good, but I wasn't willing to spend a couple hundred
bucks on them.

An alternative that worked quite well was to use the SGML
processors for DocBook.

As I said, I've not been keeping up, but apparently both the Apache
XML processors and Saxon have become much faster in the last couple
of years. Both of those are Java. Unfortunately I don't think Ruby
has a remote chance of being useful as an XML processor for this kind
of application until Ruby gets much faster.

Putting Ruby into that pipeline for processing the stream sounds more
reasonable, but still, that stream is going to get *very* long once
you've got XSL:FO.

If you really want to do this yourself, brace yourself for a lot of
work. Maybe you should choose a subset of DocBook (isn't there a
small(ish) subset already defined?)
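One practical way to pick such a subset is to inventory the elements a real document actually uses. A minimal sketch with REXML (the helper name `element_inventory` is made up) that counts element names, most frequent first:

```ruby
require "rexml/document"

# Hypothetical helper: count how often each element name occurs in a DocBook
# document, most frequent first (ties broken alphabetically).
def element_inventory(xml)
  doc = REXML::Document.new(xml)
  counts = Hash.new(0)
  REXML::XPath.each(doc, "//*") { |el| counts[el.name] += 1 }
  counts.sort_by { |name, n| [-n, name] }.to_h
end

sample = <<~XML
  <article>
    <title>Demo</title>
    <section><title>One</title><para>Hi <emphasis>there</emphasis>.</para></section>
  </article>
XML

element_inventory(sample).each { |name, n| puts "#{n}\t#{name}" }
```

Run against a real file, e.g. `element_inventory(File.read("book.xml"))`, the resulting list is roughly the set of elements a hand-written converter would have to handle.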

Are you committed to DocBook? If not, you should have a look at DITA
(from IBM), and consider LaTeX/ConTeXt or one of the groff macro
packages (like om (mom)). I'm having some fun with Publicon from
Wolfram these days (it reminds me of FrameMaker but runs on OS X (and
Windows, Linux coming, maybe) and can generate HTML, XML, and LaTeX
output). If Publicon works out (I'm using it to document a Ruby
project I'll be open sourcing soon, and a couple of things for work)
I'll be sticking with that.

Cheers,
Bob
 

Steven Jenkins

Bob said:
The typical XML way to do this is DocBook -> XSL:FO -> pdf because
you'll be able to use Norm Walsh's stuff. If you do this and you have
input documents bigger than, say, two pages, you'll be wanting the
*fastest* XML processors you can get. I did this a couple of years ago
and found libxml2 and libxslt the way to go. Getting a good XSL:FO
processor was quite a trick then, and I have not been keeping up to
date, so keep that in mind. The only free one that I could find was the
Apache FOP processor, and it wasn't all that good (incomplete). There
are a couple of commercial processors that are apparently very good,
but I wasn't willing to spend a couple hundred bucks on them.

I eventually went with LaTeX because I'm not just trying to make marks
on paper, but trying to make beautiful documents that measure up to high
standards of typesetting. All the out-of-the-box DocBook/XSL stuff I
tried produced ugly output. Maybe things have gotten better.

The real appeal of LaTeX for me is that it operates on document objects
at approximately the same level of abstraction as DocBook itself. It's
fairly straightforward to translate between the two, and then use styles
and macros to control the output formatting.

An alternative that worked quite well was to use the SGML processors
for DocBook.

As I said, I've not been keeping up, but apparently both the Apache XML
processors and Saxon have become much faster in the last couple of
years. Both of those are Java. Unfortunately I don't think Ruby has a
remote chance of being useful as an XML processor for this kind of
application until Ruby gets much faster.

It depends on the application. I'm using a brute force REXML parser and
it's plenty fast enough for what I need to do. My test data is about
120k (37 pages typeset), and is fairly complex structurally: nested
sections, variablelists, EPS figures, etc. I can convert it on an old P3
in about 10 seconds. LaTeX is blazingly fast, so I can afford a little
slowness upstream.

Putting Ruby into that pipeline for processing the stream sounds more
reasonable, but still, that stream is going to get *very* long once
you've got XSL:FO.

If you really want to do this yourself, brace yourself for a lot of
work. Maybe you should choose a subset of DocBook (isn't there a
small(ish) subset already defined?)

You don't need a very big subset for many documents. Mine handles

appendix
article
articleinfo
biblioid
blockquote
caption
colspec
emphasis
entry
figure
formalpara
imagedata
itemizedlist
listitem
mediaobject
orderedlist
para
pubdate
row
section
simpara
table
term
tgroup
thead
title
variablelist
xref

in about 350 lines of Ruby. It also handles profiles.
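Steve's converter isn't shown in the thread, but a REXML-based DocBook-to-LaTeX translator generally takes the form of a recursive walk that maps each element to a LaTeX construct. The following is a minimal, hypothetical sketch covering only a few of the elements listed above; the class name and mappings are assumptions, unknown elements simply pass their text through, and the output is body text meant to sit between `\begin{document}` and `\end{document}`.

```ruby
require "rexml/document"

# Toy recursive DocBook-to-LaTeX converter for a handful of elements.
# Illustrative only: the mappings below are assumptions, not Steve's code.
class DocBookToLatex
  SECTION_CMDS = %w[section subsection subsubsection].freeze
  LATEX_ESCAPES = {
    "\\" => "\\textbackslash{}", "{" => "\\{", "}" => "\\}", "$" => "\\$",
    "&" => "\\&", "#" => "\\#", "_" => "\\_", "%" => "\\%",
    "^" => "\\textasciicircum{}", "~" => "\\textasciitilde{}"
  }.freeze

  def convert(xml)
    visit(REXML::Document.new(xml).root, 0)
  end

  private

  def visit(node, depth)
    case node
    when REXML::Text    then escape(node.value) # value has entities resolved
    when REXML::Element then element(node, depth)
    else ""                                      # comments, PIs, etc.
    end
  end

  # Convert all children of el, optionally skipping one (an already-used title).
  def children(el, depth, skip = nil)
    el.children.reject { |c| c.equal?(skip) }.map { |c| visit(c, depth) }.join
  end

  def element(el, depth)
    case el.name
    when "article"
      title = el.elements["title"]
      head = title ? "\\title{#{children(title, depth)}}\n\\maketitle\n" : ""
      head + children(el, depth, title)
    when "section"
      title = el.elements["title"]
      cmd = SECTION_CMDS[[depth, SECTION_CMDS.size - 1].min]
      head = title ? "\\#{cmd}{#{children(title, depth)}}\n" : ""
      head + children(el, depth + 1, title) # nested sections go one level down
    when "para", "simpara" then "#{children(el, depth).strip}\n\n"
    when "emphasis"        then "\\emph{#{children(el, depth)}}"
    when "itemizedlist"    then "\\begin{itemize}\n#{children(el, depth)}\\end{itemize}\n"
    when "listitem"        then "\\item #{children(el, depth).strip}\n"
    else children(el, depth) # unknown element: keep its text
    end
  end

  # Escape characters that are special to LaTeX.
  def escape(text)
    text.gsub(/[\\{}$&\#_%^~]/, LATEX_ESCAPES)
  end
end

puts DocBookToLatex.new.convert(<<~XML)
  <article><title>Spec</title>
    <section><title>Scope &amp; Goals</title>
      <para>Use <emphasis>care</emphasis> with 100% of inputs.</para>
    </section>
  </article>
XML
```

Page size and margins never appear in the converter; they belong in the LaTeX style file, which is what makes re-flowing cheap.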

Steve
 

Hal Fulton

Keith said:
On Sat, 12 Nov 2005, Hal Fulton wrote:


Does this need to be a pure Ruby solution? If not, I'd suggest using (or
adapting) the standard DocBook stylesheets
(http://wiki.docbook.org/topic/DocBookXslStylesheets), and using those
(perhaps via the Ruby libxslt bindings) to generate XSL-FO, which you can
then feed into FOP (http://xmlgraphics.apache.org/fop/) or the FO formatter
of your choice.

That sounds tedious and error-prone to me -- I don't normally use any
of this stuff. And I don't even know what FO is.

Alternately, OpenOffice.org can open (and reasonably format) DocBook
files, has a built-in PDF printer, and is scriptable via Ruby (though I
wouldn't necessarily recommend it).

There doesn't really *need* to be any scripting necessarily. I just want
to be able to reformat the doc in a couple of different ways.

My OOo, however, doesn't seem to know DocBook. It's probably old (1.1.0) --
what is the newest one? Or is this some separate plugin?


Thanks,
Hal
 

Hal Fulton

Lyle said:
It is fairly straightforward to convert DocBook/XML to PDF using a
combination of Java tools. I am using the Saxon XSLT processor
(http://saxon.sourceforge.net/), the standard DocBook XSL stylesheets
(http://docbook.sourceforge.net/projects/xsl/), and FOP
(http://xmlgraphics.apache.org/fop/) to accomplish this.

I think that, at some point in the past, I considered using Ruby tools
to do this but they either didn't exist or weren't quite up to snuff
(especially with regard to an XSLT processor).

Wow, Lyle. You continue to amaze me. :)

What might be straightforward for you might not be for me.

I don't have any of these tools installed, and I've never heard of
FOP or FO.

Still my best shot?


Thanks,
Hal
 

Hal Fulton

Bob said:
My first thought is Holy Crap! Really, I've got a kid about that age :)

Huh? Smiley or not, I don't get this remark. And I so hate to be
humor-impaired. ;)

The typical XML way to do this is DocBook -> XSL:FO -> pdf because
you'll be able to use Norm Walsh's stuff. If you do this and you have
input documents bigger than, say, two pages, you'll be wanting the
*fastest* XML processors you can get.

This sounds similar to others' advice, so I will be looking into it.

As for speed, I do have a large doc -- 200 pages or so -- but as I
will only reformat it once or twice, I'm not sure I care much
about speed. Unless it's a "cyclic" thing where I have to tweak it
and look at the results and tweak again.

Are you committed to DocBook? If not, you should have a look at DITA
(from IBM), and consider LaTeX/ConTeXt or one of the groff macro
packages (like om (mom)). I'm having some fun with Publicon from
Wolfram these days (it reminds me of FrameMaker but runs on OS X (and
Windows, Linux coming, maybe) and can generate HTML, XML, and LaTeX
output). If Publicon works out (I'm using it to document a Ruby project
I'll be open sourcing soon, and a couple of things for work) I'll be
sticking with that.

For this particular project, the source is in DocBook. It's been
transformed into other forms -- RTF, PDF, TeX, HTML.

There might be other ways to do this -- all I really want is to
change the page size (and possibly margins) and re-flow. But the
original source is DocBook.


Thanks,
Hal
 

Lyle Johnson

Hal said:
What might be straightforward for you might not be for me.

I don't have any of these tools installed, and I've never heard of
FOP or FO.

Still my best shot?

I don't remember exactly how I put all the pieces together back when I
was starting to look at this, but it's certainly not because I
actually understand how it all works. ;)

FO stands for "formatting object", and that is literally everything
that I know about it. Seriously. I know that when I process my DocBook
XML documents with Saxon (which is a standalone executable-type
program), it uses some XSL instructions in the DocBook XSL stylesheets
to produce an FO file. Oh, I don't know anything about XSL either, by
the way. I just know that it's a piece of the puzzle. Anyways, I then
can use Apache's FOP program (also a command-line program) to spit out
a PDF.

We can talk about it more offline if you do decide to go this route.
It's mostly a job of downloading the stuff and installing it, though.
 

Devin Mullins

Hal said:
That sounds tedious and error-prone to me -- I don't normally use any
of this stuff. And I don't even know what FO is.

XSL-FO is a W3C standard, a product of the XSL working group (the same
people who put out XSLT, the language for converting XML documents).
XSL-FO is an intermediary XML language for representing "formatting
objects" (rich text documents). Many tools exist to convert XSL-FO files
(documents) into PDF, PNG, etc. Apache FOP is the popular Java one, but
you don't need to know Java to use it. It has a command line interface
whereby you feed it the filename of an XSL-FO file and the filename of
your desired PDF file.

Capisce?

Devin
(Tedious? Error-prone? It's a W3C XML standard, so, probably. But,
luckily, you don't have to write any of that crap -- just use it.)
 

David A. Black

Hi --

Hal said:
That sounds tedious and error-prone to me -- I don't normally use any
of this stuff. And I don't even know what FO is.

There's a bit of setup involved, but I've got scripts for Saxon (XSLT
processor) and FOP, and it's all pretty streamlined. I've always tried
to bundle as much of this stuff as possible into a single directory;
and though I haven't upgraded it lately, I can share it as a bundle if
needed.


David
 

Devin Mullins

Devin said:
XSL-FO is a W3C standard, a product of the XSL working group (the same
people who put out XSLT, the language for converting XML documents).
XSL-FO is an intermediary XML language for representing "formatting
objects" (rich text documents). Many tools exist to convert XSL-FO
files (documents) into PDF, PNG, etc. Apache FOP is the popular Java
one, but you don't need to know Java to use it. It has a command line
interface whereby you feed it the filename of an XSL-FO file and the
filename of your desired PDF file.

To drill it in:

DocBook --(XSLT)--> XSL-FO --(FOP)--> PDF

The XSLT pictured is a specific XSLT stylesheet for converting DocBook
to XSL-FO. XSLT is a general XML language specification for XML file
conversion, as mentioned before, and according to Keith, there exists a
specific XSLT "script" for DocBook --> XSL-FO. You will need an XSLT
"interpreter" to run it. Xalan is one. (Google it.)

FOP is FOP. It needs no specific thing. XSL-FO and PDF are both fairly
standard.
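Devin's two-stage pipeline can be driven from a short Ruby script. The sketch below assumes `xsltproc` (the libxslt command-line tool) and Apache FOP's `fop` launcher are on the PATH; `FO_XSL` is a placeholder for wherever `fo/docbook.xsl` from the DocBook XSL stylesheets is installed, and the helper names are hypothetical. Page size and margins go to the XSLT stage as DocBook XSL parameters (`paper.type`, `page.margin.inner`, `page.margin.outer`), so the FOP step never changes.

```ruby
# Hypothetical driver for DocBook --(XSLT)--> XSL-FO --(FOP)--> PDF.
# Assumes xsltproc and fop are installed; FO_XSL is a placeholder path to
# fo/docbook.xsl from the DocBook XSL stylesheets.
FO_XSL = "/path/to/docbook-xsl/fo/docbook.xsl"

# Stage 1: DocBook -> XSL-FO. Each stylesheet parameter becomes --stringparam.
def xslt_command(src, fo, params = {})
  cmd = ["xsltproc", "--output", fo]
  params.each { |name, value| cmd.push("--stringparam", name.to_s, value.to_s) }
  cmd.push(FO_XSL, src)
end

# Stage 2: XSL-FO -> PDF.
def fop_command(fo, pdf)
  ["fop", "-fo", fo, "-pdf", pdf]
end

# Run both stages, writing the intermediate .fo file next to the PDF.
def build_pdf(src, pdf, params = {})
  fo = File.join(File.dirname(pdf), File.basename(pdf, ".*") + ".fo")
  [xslt_command(src, fo, params), fop_command(fo, pdf)].each do |cmd|
    system(*cmd) or raise "command failed: #{cmd.join(' ')}"
  end
end

# Only the parameters change between layouts; the DocBook source is untouched.
a5_layout = {
  "paper.type"        => "A5",
  "page.margin.inner" => "0.75in",
  "page.margin.outer" => "0.5in"
}
p xslt_command("book.xml", "book.fo", a5_layout)
# build_pdf("book.xml", "book.pdf", a5_layout)  # runs both stages
```

Passing argument arrays to `system` avoids shell quoting issues; the same layout hash could be swapped for another one to produce a second page size from the same source.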

Devin
 

Rich Morin

It would be WONDERFUL if someone would create a ReadMe that gives
explicit instructions for setting up a minimalist DocBook tool
chain. I've asked for such a thing on the DocBook mailing list, but
nobody there has been willing or able to come up with one.

-r
 

Lyle Johnson

Rich said:
It would be WONDERFUL if someone would create a ReadMe that gives
explicit instructions for setting up a minimalist DocBook tool
chain.

Well, point them to the instructions in this thread and they're most
of the way there. ;)

Seriously, though, I remember that when I was getting started with
DocBook someone had written up a document like that, one which covered
things from the DocBook/SGML perspective. He included stuff about an
SGML mode for Emacs, and a lot of the complications that come along
for the ride when you're dealing with SGML processing. In contrast,
I've found that things are remarkably straightforward when dealing
with DocBook/XML. Devin summed it up really well in one of his
previous posts in this thread: there are really just two "tools" that
you'll need to get your hands on (an XSLT processor, and FOP).
 

Nicholas Van Weerdenburg

Hal said:
That sounds tedious and error-prone to me -- I don't normally use any
of this stuff. And I don't even know what FO is.


There doesn't really *need* to be any scripting necessarily. I just want
to be able to reformat the doc in a couple of different ways.

My OOo, however, doesn't seem to know DocBook. It's probably old (1.1.0) --
what is the newest one? Or is this some separate plugin?


Thanks,
Hal

It's a bit tedious to install, but the toolchain is mostly automatic.

docbook -> docbook xsl -> xsl-fo -> pdf

All you really do is execute the toolchain against the docbook. So, you
don't really need to know xslt, xsl-fo, or pdf.

Compared to REXML and PDF::Writer, this is a whole lot simpler. Spend
a couple of hours configuring tools, create a batch file or ant
script, and you are rolling.
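The batch file or Ant script could just as well be a short Ruby script. For the tweak-and-look cycle Hal described earlier, it helps to skip any step whose output is already newer than its inputs, as make or Rake would. A minimal sketch using `FileUtils.uptodate?` from Ruby's standard library; `run_if_stale` and the `layout.xsl` customization file in the comments are hypothetical.

```ruby
require "fileutils"

# Hypothetical make-style helper: run the block only when `output` is missing
# or older than any of `inputs`. Returns true if the block ran.
def run_if_stale(output, inputs)
  return false if FileUtils.uptodate?(output, inputs)
  yield
  true
end

# Example chain (layout.xsl is a hypothetical customization stylesheet):
# run_if_stale("book.fo", ["book.xml", "layout.xsl"]) do
#   system("xsltproc", "--output", "book.fo", "layout.xsl", "book.xml")
# end
# run_if_stale("book.pdf", ["book.fo"]) do
#   system("fop", "-fo", "book.fo", "-pdf", "book.pdf")
# end
```

Editing only the stylesheet makes both steps stale; re-running after nothing changed does no work at all.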

Sun had created an Ant tool called pipeline for handling processes
much like this. When I looked at it, it was mostly a concept. Not sure
where it went.

Interestingly, xsl-fo was the first concept behind xslt
(t=transformations), which is used to mangle xml docs from one format
to another. During the production of "xsl", the need for general
transformations became apparent, so they split xsl into xsl-t
(transformations) and xsl-fo (formatting objects). Now everyone uses
xslt, and xsl-fo is languishing in near obscurity, even though it was
the initial impetus behind the whole xsl thing.

Regards,
Nick
 

James Britt

Nicholas Van Weerdenburg wrote:
...
It's a bit tedious to install, but the toolchain is mostly automatic.

docbook -> docbook xsl -> xsl-fo -> pdf

It's been a while, but that's my recollection of using it.

Mind you, I was also doing steady Java/J2EE coding, and deploying
applications to iPlanet, so notions of simple or hard are *very* relative.

But grep around the xml-dev list archives, because this is, I think, a
common question.

http://lists.xml.org/archives/xml-dev/

James

 

Hal Fulton

Nicholas said:
It's a bit tedious to install, but the toolchain is mostly automatic.

docbook -> docbook xsl -> xsl-fo -> pdf

All you really do is execute the toolchain against the docbook. So, you
don't really need to know xslt, xsl-fo, or pdf.

Well, I have to know something at some point if I am going to change
the page size, which is the whole point of the exercise.

At what point would I make that change? (I won't ask "how" until I
actually have the tools installed.)

Compared to REXML and PDF::Writer, this is a whole lot simpler. Spend
a couple of hours configuring tools, create a batch file or ant
script, and you are rolling.

OK, that sounds good.

I wonder if there is a need for any of these tools in Ruby? Would
we gain anything or not? I'm asking partly just to stay more
on-topic. ;)


Hal
 
