export sites/pages to PDF

Discussion in 'Python' started by jvdb, Aug 12, 2008.

  1. jvdb

    jvdb Guest

    Hi all,

    My employer is asking for a solution that outputs the content of urls
    to pdf. It must be the content as seen within the browser.
    Can someone help me with this? It must be able to export several kinds of
    pages with all kinds of content (javascript, etc.)
    jvdb, Aug 12, 2008
    #1

  2. Stef Mientki

    Stef Mientki Guest

    jvdb wrote:
    > Hi all,
    >
    > My employer is asking for a solution that outputs the content of urls
    > to pdf. It must be the content as seen within the browser.
    > Can someone help me on this? It must be able to export several kind of
    > pages with all kind of content (javascript, etc.)

    pdfCreator does the job.

    cheers,
    Stef
    Stef Mientki, Aug 12, 2008
    #2

  3. jvdb

    jvdb Guest

    Hi Stef!

    Thanks for your answer, but I forgot to mention that I have to run
    this on unix/linux.


    On Aug 12, 9:06 pm, Stef Mientki wrote:
    > jvdb wrote:
    > > Hi all,
    > >
    > > My employer is asking for a solution that outputs the content of urls
    > > to pdf. It must be the content as seen within the browser.
    > > Can someone help me on this? It must be able to export several kind of
    > > pages with all kind of content (javascript, etc.)
    >
    > pdfCreator does the job.
    >
    > cheers,
    > Stef
    jvdb, Aug 12, 2008
    #3
  4. norseman

    norseman Guest

    Nick Craig-Wood wrote:
    > jvdb wrote:
    >> My employer is asking for a solution that outputs the content of urls
    >> to pdf. It must be the content as seen within the browser.
    >> Can someone help me on this? It must be able to export several kind of
    >> pages with all kind of content (javascript, etc.)
    >
    > Sounds like you'd be best off scripting a browser.
    >
    > Eg under KDE you can print to PDF from Konqueror using dcop to remote
    > control it.
    >
    > Here is a demo... start Konqueror, select the PDF printer manually
    > before you start. (You can automate this I expect!)
    >
    > Run
    >
    > dcop konq*
    >
    > to find the id of the running konqueror (in my case
    > "konqueror-18286"), then open a URL
    >
    > dcop konqueror-18286 konqueror-mainwindow#1 openURL http://www.google.com
    >
    > To print to a PDF file
    >
    > dcop konqueror-18286 html-widget2 print 1
    >
    > Web site converted to PDF in ~/print.pdf ;-)
    >
    > Easy enough to script that with python.
    >
    > See here for some more info on dcop :-
    >
    > http://www.ibm.com/developerworks/linux/library/l-dcop/
    >


    =========================================
    If you are running KDE, go with Nick's method.

    If the project is what it sounds like, an in-house thing where
    the web pages are created by "you", then:

    IF (BIG IF) you have a limited number of URLs to deal with
    AND
    the pages are NOT going to change shape when printed
    (some sites use one .css for screen and another for print)
    AND
    you are using UNIX of some sort:

    Open each page in the browser and print it to a PostScript file,
    one file per page.

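    Then put this in a script:
    >>>>>>>>>>>>>>>>

    #!/bin/sh
    # ps2pdf.scr
    # converts a single ps file to a pdf file
    # april 2000
    # SLT
    #
    # usage: ps2pdf.scr file.ps   (writes file.pdf in the current directory)
    ofil=`basename "$1" .ps`
    # pdfwrite produces the PDF; -q, -dBATCH and -dNOPAUSE keep gs non-interactive
    gs -sDEVICE=pdfwrite -q \
    -dBATCH -dNOPAUSE -r300 \
    -sOutputFile="$ofil.pdf" "$1"
    >>>>>>>>>>>>>>>>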


    Do:
    ps2pdf.scr file.ps


    If you have a number of .ps files to convert:

    for f in *.ps; do ps2pdf.scr "$f"; done
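
    Since this is the Python list, the same batch conversion could also be
    driven from Python. A minimal sketch, assuming gs (Ghostscript) is on the
    PATH; gs_command and convert_all are names made up for this example, and
    the -r300 option from the shell script is left out here.

```python
import subprocess
from pathlib import Path


def gs_command(ps_path):
    """Build the Ghostscript command that turns one .ps file into a .pdf."""
    ps_path = Path(ps_path)
    pdf_path = ps_path.with_suffix(".pdf")
    return [
        "gs", "-sDEVICE=pdfwrite", "-q", "-dBATCH", "-dNOPAUSE",
        f"-sOutputFile={pdf_path}", str(ps_path),
    ]


def convert_all(directory="."):
    """Convert every .ps file in `directory`; return the PDF paths written."""
    converted = []
    for ps_path in sorted(Path(directory).glob("*.ps")):
        # check=True raises CalledProcessError if gs fails on a file.
        subprocess.run(gs_command(ps_path), check=True)
        converted.append(ps_path.with_suffix(".pdf"))
    return converted
```

    Passing the arguments as a list avoids shell quoting problems with file
    names that contain spaces.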


    In Windows, set the default printer to a PDF-to-file printer and just print.
    Don't expect to concatenate the PDFs into a single "book"
    without a third-party program.


    NOTE:
    If (in UNIX) you want the whole lot in one file, set up the
    printer command to append (">>") each output to that single file.
    Depending on the browser, you may need to do some header cleaning.
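
    As an alternative to appending PostScript output, Ghostscript itself can
    write several input files into one PDF. A rough sketch, again assuming gs
    is on the PATH; merge_command and merge are hypothetical helper names, and
    this is a different route from the ">>" approach described in the note.

```python
import subprocess


def merge_command(output_pdf, input_files):
    """Build a gs command that writes all inputs, in order, into one PDF."""
    if not input_files:
        raise ValueError("need at least one input file")
    return [
        "gs", "-sDEVICE=pdfwrite", "-q", "-dBATCH", "-dNOPAUSE",
        f"-sOutputFile={output_pdf}", *[str(f) for f in input_files],
    ]


def merge(output_pdf, input_files):
    """Run the merge; raises CalledProcessError if gs reports a failure."""
    subprocess.run(merge_command(output_pdf, input_files), check=True)
```

    For example, merge("book.pdf", ["page1.ps", "page2.ps"]) would produce a
    single book.pdf with the pages in the order given.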



    Steve
    norseman, Aug 13, 2008
    #4
  5. Tim Roberts

    Tim Roberts Guest

    jvdb wrote:
    >
    >My employer is asking for a solution that outputs the content of urls
    >to pdf. It must be the content as seen within the browser.
    >Can someone help me on this? It must be able to export several kind of
    >pages with all kind of content (javascript, etc.)


    There are a number of obstacles to this. Printer pages are a different
    size from screen windows, so the browser does the layout differently.
    Further, many style sheets have rules that are "screen only" or "print
    only".

    If you really want an image of exactly what's on the screen, then I don't
    think you have any option other than a screen capture utility, like "xwd".
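
    The screen-capture route might look something like this from Python. It
    is a sketch under assumptions: xwd is used with its -root, -id and -out
    options, ImageMagick's convert is assumed to be installed for the
    XWD-to-PDF step (it is not mentioned above), and capture_commands and
    capture are made-up names.

```python
import subprocess


def capture_commands(basename, window_id=None):
    """Build the xwd dump command and the (assumed) ImageMagick conversion."""
    dump = f"{basename}.xwd"
    # Capture one window if an id is given, otherwise the whole root window.
    target = ["-id", window_id] if window_id else ["-root"]
    return [
        ["xwd", *target, "-out", dump],
        ["convert", dump, f"{basename}.pdf"],
    ]


def capture(basename, window_id=None):
    """Run both steps; needs a running X display."""
    for cmd in capture_commands(basename, window_id):
        subprocess.run(cmd, check=True)
```

    Note that this only captures what is visible in the window, so pages
    longer than one screen would need scrolling and several captures.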
    --
    Tim Roberts,
    Providenza & Boekelheide, Inc.
    Tim Roberts, Aug 18, 2008
    #5
