> If I get you right, it's better to do a direct XML to XSL-FO
> conversion with XSLT? But I don't quite understand how I can transform
> my server XHTML pages to PDF without losing the markup/layout that
> I've created with XSL and CSS. Isn't there some kind of program that
> will accept my XML + XSL + CSS as input and render an XSL-FO file for
> me? I know some programs that will do an XML + XSL to XSL-FO
> conversion, but then my CSS isn't read, which produces very ugly
> PDF documents.
There may be, but I'm not familiar with it; it hasn't come up in my
processing because my HTML and PDF documents are structured differently
enough that I've always had two channels for processing.
If I recall your original question, you wanted to build PDF based on the
XHTML. In my experience, the formatting differences between web and
print media are usually great enough to warrant different formatting.
For instance, on screen your horizontal space is very limited -- there's
only so much information you can present on a screen. On a page,
though, you have much more space (you can fit more legible text across a
page than across a screen), and that changes certain formatting
decisions. Footnotes are a pain in the ass on a web site; on paper they
can be very effective. Floats[1] are meaningless on a web page, but
very useful on paper. Two-column display doesn't generally look good on
a web page but again can be very effective on paper.
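To make those print-only features concrete, here is a minimal sketch that emits a small XSL-FO document with a two-column page layout and a real footnote. It uses only Python's standard-library ElementTree; in practice this output would more likely come from an XSLT stylesheet and be rendered to PDF by an XSL-FO processor such as Apache FOP. The helper names `fo` and `build_print_layout`, the page size, and the spacing values are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)  # serialize with the usual "fo:" prefix


def fo(tag, parent=None, text=None, **attrs):
    """Create an XSL-FO element; '_' in keyword names becomes '-' (column_count -> column-count)."""
    attrs = {k.replace("_", "-"): v for k, v in attrs.items()}
    qname = f"{{{FO_NS}}}{tag}"
    el = ET.Element(qname, attrs) if parent is None else ET.SubElement(parent, qname, attrs)
    if text is not None:
        el.text = text
    return el


def build_print_layout(paragraphs, note_text):
    """Return an XSL-FO string laid out for paper rather than for a screen."""
    root = fo("root")
    masters = fo("layout-master-set", root)
    page = fo("simple-page-master", masters, master_name="page",
              page_width="210mm", page_height="297mm", margin="20mm")
    # Two columns per page: rarely pleasant on screen, often effective on paper.
    fo("region-body", page, column_count="2", column_gap="8mm")

    seq = fo("page-sequence", root, master_reference="page")
    flow = fo("flow", seq, flow_name="xsl-region-body")
    for i, text in enumerate(paragraphs):
        block = fo("block", flow, text=text, space_after="6pt")
        if i == 0:
            # The processor moves the footnote body to the foot of the page.
            footnote = fo("footnote", block)
            fo("inline", footnote, text="1", baseline_shift="super", font_size="70%")
            body = fo("footnote-body", footnote)
            fo("block", body, text="1. " + note_text, font_size="8pt")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_print_layout(["First paragraph.", "Second paragraph."],
                             "A note that would be awkward on a web page."))
```

The same content rendered as HTML would simply drop the column layout and turn the footnote into an inline aside or a link; that is the kind of divergence that argues for separate presentation steps.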
I have built some very good-looking documents, both in HTML and PDF.
I've written a number of reference documents for work where my 'base'
XML is highly domain specific. I run it through a transformation that
generates DocBook XML (which takes me from 'domain specific' to 'document
specific'); that output then gets run through other transformations. The
HTML version links to a CSS stylesheet (giving pretty web pages), while
the PDF might be generated through a transformation to XSL-FO (which,
from what I can see, has a superset of CSS capabilities) or through
LaTeX, which has very impressive formatting capabilities.
[1] Remember high school textbooks, where there would be a reference to
a table, but that table didn't appear until the top of the next
page? The table 'floated' to the next good place to put it. I've
had web pages print very oddly because a table split unnecessarily
or -- almost *worse* -- didn't quite fit and the entire thing got
moved to the next page, leaving a huge white area on the page where
it 'started'.
> Maybe I'm too lazy and should create an XSL file to convert my XML into
> a well-formatted XSL-FO file, but something is telling me such a
> possibility is out there...
I'm sure it is. If nothing else, you can load the HTML pages into a
browser and print to PDF (or print to PostScript and convert with
Distiller or another tool). If you do that, though, you limit yourself
to the formatting of the web pages, unless you embed enough content indicators
(class attributes, etc.) into the HTML. In this case you're trying to
extract domain-specific information from the HTML to reconstruct the
initial state... a challenging task, to be sure, and fragile -- changes
to the HTML formatting might break everything you've done to convert
from HTML to PDF.
In my experience, it's much easier to run both series of transformations
up to the presentational transformation, then split at that point. In
both cases you have the richest, most domain-specific information
available and insulate yourself from changes to presentation, in either
direction.
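Reduced to its bare shape, that "split late" arrangement looks something like the sketch below. It uses Python's standard-library ElementTree in place of the XSLT stylesheets such a pipeline would normally use, and the domain vocabulary (`command`, `option`), the tiny DocBook-like subset, and `reference.css` are all hypothetical. The only point it makes is that both outputs are generated from the same document-level tree, so changing the HTML markup never touches the PDF path.

```python
import xml.etree.ElementTree as ET

FO = "{http://www.w3.org/1999/XSL/Format}"
ET.register_namespace("fo", FO[1:-1])

# Hypothetical domain-specific input: one entry of a command reference.
DOMAIN_XML = """
<commands>
  <command name="backup">
    <summary>Copies the data directory to the archive.</summary>
    <option flag="-v">Print each file as it is copied.</option>
  </command>
</commands>
"""


def to_document(domain):
    """Domain specific -> document specific (a tiny DocBook-like subset)."""
    article = ET.Element("article")
    for cmd in domain.iter("command"):
        sect = ET.SubElement(article, "section")
        ET.SubElement(sect, "title").text = cmd.get("name")
        ET.SubElement(sect, "para").text = cmd.findtext("summary")
        for opt in cmd.iter("option"):
            ET.SubElement(sect, "para", role="option").text = f"{opt.get('flag')}: {opt.text}"
    return article


def to_html(article):
    """Presentational branch 1: HTML with class hooks, styling left to CSS."""
    html = ET.Element("html")
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "link", rel="stylesheet", href="reference.css")
    body = ET.SubElement(html, "body")
    for sect in article.iter("section"):
        ET.SubElement(body, "h2").text = sect.findtext("title")
        for para in sect.iter("para"):
            ET.SubElement(body, "p", {"class": para.get("role", "body")}).text = para.text
    return html


def to_fo(article):
    """Presentational branch 2: XSL-FO with print-specific layout."""
    root = ET.Element(FO + "root")
    masters = ET.SubElement(root, FO + "layout-master-set")
    page = ET.SubElement(masters, FO + "simple-page-master", {"master-name": "page"})
    ET.SubElement(page, FO + "region-body", {"column-count": "2"})
    seq = ET.SubElement(root, FO + "page-sequence", {"master-reference": "page"})
    flow = ET.SubElement(seq, FO + "flow", {"flow-name": "xsl-region-body"})
    for sect in article.iter("section"):
        # Keep each heading on the same page as the text that follows it.
        title = ET.SubElement(flow, FO + "block",
                              {"font-weight": "bold", "keep-with-next": "always"})
        title.text = sect.findtext("title")
        for para in sect.iter("para"):
            ET.SubElement(flow, FO + "block").text = para.text
    return root


if __name__ == "__main__":
    article = to_document(ET.fromstring(DOMAIN_XML))  # shared up to this point
    print(ET.tostring(to_html(article), encoding="unicode"))
    print(ET.tostring(to_fo(article), encoding="unicode"))
```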
Keith