html to pdf

Discussion in 'XML' started by Surbhi, May 13, 2008.

  1. Surbhi

    Surbhi Guest

    Hi
    We have HTML datasheets, but now we want them in PDF format because
    the page layout is very bad when the HTML is printed.

    I am through with the XML and XSLT part, but I do not have any idea
    of XSL-FO.
    I guess I would need to use "fo" tags in the XSLT. Could someone
    suggest some good reference material, or pointers for this?

    It's a bit urgent, please do help.

    Best Regards
    Surbhi
     
    Surbhi, May 13, 2008
    #1

  2. Surbhi

    Manuel Collado Guest

    Surbhi wrote:
    > We have HTML datasheets, but now we want them in PDF format because
    > the page layout is very bad when the HTML is printed.
    >
    > I am through with the XML and XSLT part, but I do not have any idea
    > of XSL-FO. I guess I would need to use "fo" tags in the XSLT. Could
    > someone suggest some good reference material, or pointers for
    > this?
    >
    > It's a bit urgent, please do help.


    http://www.antennahouse.com/XSLsample/sample-xsl-xhtml2fo.zip

    --
    Manuel Collado - http://lml.ls.fi.upm.es/~mcollado
     
    Manuel Collado, May 13, 2008
    #2

  3. Surbhi

    Andy Dingley Guest

    On 13 May, 10:23, Surbhi wrote:

    > We have HTML datasheets, but now we want them in PDF format because
    > the page layout is very bad when the HTML is printed.


    I wouldn't give up on printed HTML, but I can understand how you're
    thinking here.

    You have two options. The first is to "print HTML to PDF", using tools
    such as Adobe's (expensive) or Foxit (simpler, free).

    Otherwise go down the XSLT, XSL-FO, PDF route. You'll probably find
    Apache FOP to be the easiest route from FO to PDF. I do this a lot (a
    quarter of my working day) and host it all within Java and Ant as a
    "make" framework to glue it all together. For bigger systems, Cocoon
    or Apache Forrest are worth looking at too.

    Learning to code the XSL-FO is painful; the rest is well-established
    pipeline tools that just get on with it and work. You'll find that
    good HTML + CSS knowledge is a good starting point for understanding
    XSL-FO properties and rendering. If you have that, then just the W3C
    specs for XSL-FO are enough to work with. If you want a CSS
    background, read Lie & Bos, "Cascading Style Sheets". The Usenet group
    comp.infosystems.www.authoring.stylesheets (c.i.w.a.s) is good too.

    This stuff isn't an easy bit of knowledge to learn, so start simple
    and get _something_ working first, then look to expand it. It's very
    useful long-term though, so it does repay the effort.
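
    As one way of getting _something_ working first, here is a minimal
    sketch in Python that writes out the smallest useful XSL-FO document:
    one page master, one page sequence and one block of text. The A4 page
    size, the 20mm margin and the function names are illustrative
    assumptions, not anything taken from the thread. The result can be
    saved to a file and handed to a formatter such as Apache FOP.

```python
import xml.etree.ElementTree as ET

# The XSL-FO namespace defined by the W3C XSL specification.
FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag):
    """Return the namespace-qualified name for an fo: element."""
    return f"{{{FO_NS}}}{tag}"


def build_hello_fo(text):
    """Build a one-page XSL-FO tree containing a single block of text."""
    root = ET.Element(fo("root"))

    # Page geometry lives in the layout-master-set (A4 is an arbitrary choice).
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(
        masters,
        fo("simple-page-master"),
        {
            "master-name": "A4",
            "page-width": "210mm",
            "page-height": "297mm",
            "margin": "20mm",
        },
    )
    ET.SubElement(page, fo("region-body"))

    # Content flows into the body region, whose default name is xsl-region-body.
    sequence = ET.SubElement(root, fo("page-sequence"), {"master-reference": "A4"})
    flow = ET.SubElement(sequence, fo("flow"), {"flow-name": "xsl-region-body"})
    block = ET.SubElement(flow, fo("block"), {"font-size": "12pt"})
    block.text = text
    return root


if __name__ == "__main__":
    print(ET.tostring(build_hello_fo("Hello, datasheet"), encoding="unicode"))
```

    Once this renders, the same skeleton can be grown one property or one
    element at a time.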
     
    Andy Dingley, May 13, 2008
    #3
  4. Surbhi

    Ken Starks Guest

    Surbhi wrote:
    > Hi
    > We have HTML datasheets, but now we want them in PDF format because
    > the page layout is very bad when the HTML is printed.
    >
    > I am through with the XML and XSLT part, but I do not have any idea
    > of XSL-FO.
    > I guess I would need to use "fo" tags in the XSLT. Could someone
    > suggest some good reference material, or pointers for this?
    >
    > It's a bit urgent, please do help.
    >
    > Best Regards
    > Surbhi


    Whether you would be better
    to transform your XML directly to 'Formatting Objects' or to
    transform it indirectly ( to 'Docbook' or something similar with an
    off-the-shelf transformation to xsl-fo ) is a moot point.

    There are also a few off-the-shelf stylesheets that convert HTML
    directly to XSL-FO (and so to PDF), but the typographic quality varies.


    Whatever, you still need a 'serialiser' to convert the xsl-fo into pdf.
    Apache FOP is a popular open source one.
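
    For reference, a hedged sketch of driving FOP from a script. It
    assumes the `fop` launcher script from an Apache FOP distribution is
    on the PATH; the `-fo`, `-xml`, `-xsl` and `-pdf` options are the
    ones FOP documents for its command line, while the Python function
    names and any file names passed in are hypothetical.

```python
import subprocess


def fo_to_pdf_command(fo_path, pdf_path, fop="fop"):
    """Command line that renders an existing XSL-FO file to PDF."""
    return [fop, "-fo", fo_path, "-pdf", pdf_path]


def xml_to_pdf_command(xml_path, xsl_path, pdf_path, fop="fop"):
    """Command line that runs the XSLT step and the PDF rendering in one go."""
    return [fop, "-xml", xml_path, "-xsl", xsl_path, "-pdf", pdf_path]


def run(command):
    """Run the command and raise CalledProcessError if FOP exits with an error."""
    subprocess.run(command, check=True)
```

    With FOP installed, `run(fo_to_pdf_command("datasheet.fo",
    "datasheet.pdf"))` would perform the conversion.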

    Wikipedia is a good place to start:

    http://en.wikipedia.org/wiki/XSL_Formatting_Objects


    If you want a complete system, so that you can concentrate on learning
    one bit at a time (such as XSL-FO),
    you could try Apache Cocoon, and use the 'Hello World' example
    where the same XML is converted to MANY output formats, including
    pdf, xhtml, svg, postscript, flash, open document, Excel ...

    There is an example of the kind of stuff you need to do at:

    http://cocoon.apache.org/2.1/howto/howto-html-pdf-publishing.html


    Some of the major software vendors have tutorials on the use
    of XSL-FO, for example RenderX and Antenna House; most of
    the material is just as relevant to a free serialiser such as
    Apache FOP. (If graphics are important, you may need a 'try and
    see' approach, particularly with vector graphics and
    transparency, which are degraded or lost by some serialisers.)

    On the other hand, at least one serialiser now goes beyond the xsl-fo
    specification, allowing rudimentary interactive forms in the pdf.

    Finally, there are systems which you can use to convert your XML
    to LaTeX, and from there you will get very high quality output. But
    LaTeX is yet another massive learning task if none of your team already
    knows it.
     
    Ken Starks, May 13, 2008
    #4
  5. Surbhi

    Andy Dingley Guest

    On 13 May, 15:24, Ken Starks wrote:

    > Whether you would be better
    > to transform your XML directly to 'Formatting Objects' or to
    > transform it indirectly ( to 'Docbook' or something similar with an
    > off-the-shelf transformation to xsl-fo ) is a moot point.


    I wouldn't go down that route, via DocBook.

    Of course this all depends a _lot_ on the quality of the HTML. HTML
    3.2 with presentation guff all over it is a lot more trouble to work
    with than pure-semantics HTML 4 + CSS. This is true for any processing
    toolset. HTML 4 with a bad case of "divitis" is actually one of the
    easiest targets for conversion to XSL-FO. That is bad practice for coding
    semantic HTML, but a closer match to your target here.

    HTML is somewhat more generalised than DocBook, so converting
    "upwards" to DocBook is unlikely to have any more structure implied in
    it than is simply inferred automatically from the HTML. DocBook isn't
    some fantastic panacea anyway - I've rarely used it in practice, as
    its minor advantages over HTML are all too often outweighed by being
    yet another format. Unless you need book-level structuring, if all you
    need is inline markup, paragraphs and headings, then HTML 4 gives you
    nearly as much anyway.

    I'd consider going from HTML to DocBook if I was concatenating a
    number of pages to make one single DocBook representing the whole set
    as a site, but very rarely for single page stuff.

    As to the use of pre-existing transforms for DocBook to XSL-FO,
    these are certainly available and well done, but they're not as useful
    as one might think. This is for two reasons: they're not as necessary
    as one might think, and it's not so hard to do without them.

    The off-the-shelf DocBook stylesheets have a big advantage in that
    they're competent, full implementations of all DocBook elements. Most
    of us just don't need that, because we only author a tiny subset
    of DocBook anyway. I've never used the <kitchen-sink> element,
    although I'm sure DocBook has one somewhere. This is particularly the
    case for auto-generated DocBook out of HTML. Secondly, it's not that
    hard to write a minimal XSLT to make simple (i.e. little formatting
    subtlety) XSL-FO. Thirdly, it's harder to make XSL-FO with complex
    formatting. If you don't need this, either use the pre-existing
    stylesheet or write your own; neither is impossibly hard. If you _do_
    need complex formatting, you probably have to write your own XSLT
    whether you like it or not.
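
    To give a feel for how small a "minimal" transform can be, here is a
    sketch in Python rather than XSLT (a deliberate substitution, so it
    runs with only the standard library). It assumes well-formed XHTML
    input and handles only headings, paragraphs and simple inline
    emphasis; tables, lists and images are not treated specially. The
    element-to-property tables and function names are illustrative
    assumptions.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)

# Hypothetical styling choices: each HTML element maps to FO properties.
BLOCKS = {
    "h1": {"font-size": "18pt", "font-weight": "bold", "space-after": "6pt"},
    "h2": {"font-size": "14pt", "font-weight": "bold", "space-after": "4pt"},
    "p": {"font-size": "10pt", "space-after": "4pt"},
}
INLINES = {
    "b": {"font-weight": "bold"},
    "strong": {"font-weight": "bold"},
    "i": {"font-style": "italic"},
    "em": {"font-style": "italic"},
}


def fo(tag):
    """Return the namespace-qualified name for an fo: element."""
    return f"{{{FO_NS}}}{tag}"


def local_name(node):
    """Strip any XML namespace so 'p' and '{xhtml-ns}p' are treated alike."""
    return node.tag.rsplit("}", 1)[-1]


def convert(node, parent):
    """Copy one HTML element (and its children) into the FO tree."""
    name = local_name(node)
    if name in INLINES:
        out = ET.SubElement(parent, fo("inline"), INLINES[name])
    else:
        # Anything that is not known inline markup becomes a block;
        # unknown elements get a block with no properties.
        out = ET.SubElement(parent, fo("block"), BLOCKS.get(name, {}))
    out.text = node.text
    out.tail = node.tail
    for child in node:
        convert(child, out)
    return out


def html_to_fo(xhtml):
    """Turn a well-formed XHTML string into an XSL-FO element tree."""
    html = ET.fromstring(xhtml)
    body = next(el for el in html.iter() if local_name(el) == "body")

    root = ET.Element(fo("root"))
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(
        masters,
        fo("simple-page-master"),
        {"master-name": "page", "page-width": "210mm",
         "page-height": "297mm", "margin": "20mm"},
    )
    ET.SubElement(page, fo("region-body"))
    sequence = ET.SubElement(root, fo("page-sequence"), {"master-reference": "page"})
    flow = ET.SubElement(sequence, fo("flow"), {"flow-name": "xsl-region-body"})

    for child in body:
        # fo:flow may only contain block-level children, so drop stray text.
        convert(child, flow).tail = None
    return root
```

    Calling `ET.tostring(html_to_fo(source), encoding="unicode")` gives
    the FO as text. Complex formatting is exactly where a table-driven
    mapping like this stops being enough.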
     
    Andy Dingley, May 14, 2008
    #5
  6. Surbhi

    Ken Starks Guest

    I agree with you, Andy. DocBook is a poor example, being far too
    heavy. I think I was really thinking of something more lightweight
    such as LinuxDoc (which you can take into LyX for tweaking). I have
    also recently given DITA a quick spin, but it also seems to
    be yet another format. (It too has many more elements than HTML,
    by the way.)

    Yours,

    Ken.

    Andy Dingley wrote:
    > On 13 May, 15:24, Ken Starks wrote:
    >
    >> Whether you would be better
    >> to transform your XML directly to 'Formatting Objects' or to
    >> transform it indirectly ( to 'Docbook' or something similar with an
    >> off-the-shelf transformation to xsl-fo ) is a moot point.

    >
    > I wouldn't go down that route, via DocBook.
    >
    > ... <snip> many good points.
    >
     
    Ken Starks, May 16, 2008
    #6