export sites/pages to PDF

jvdb

Hi all,

My employer is asking for a solution that outputs the content of URLs
to PDF. It must be the content as seen within the browser.
Can someone help me with this? It must be able to export several kinds of
pages with all kinds of content (JavaScript, etc.).
 
Stef Mientki

jvdb said:
Hi all,

My employer is asking for a solution that outputs the content of URLs
to PDF. It must be the content as seen within the browser.
Can someone help me with this? It must be able to export several kinds of
pages with all kinds of content (JavaScript, etc.).
PDFCreator does the job.

cheers,
Stef
 
jvdb

Hi Stef!

Thanks for your answer, but I forgot to mention that I have to run
this on Unix/Linux.
 
norseman

Nick said:
Sounds like you'd be best off scripting a browser.

E.g. under KDE you can print to PDF from Konqueror, using dcop to
remote-control it.

Here is a demo... start Konqueror, select the PDF printer manually
before you start. (You can automate this I expect!)

Run

dcop konq*

to find the id of the running konqueror (in my case
"konqueror-18286"), then open a URL

dcop konqueror-18286 konqueror-mainwindow#1 openURL http://www.google.com

To print to a PDF file

dcop konqueror-18286 html-widget2 print 1

Web site converted to PDF in ~/print.pdf ;-)

Easy enough to script that with Python.

See here for some more info on dcop :-

http://www.ibm.com/developerworks/linux/library/l-dcop/
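Nick's three dcop commands translate almost line-for-line into Python via subprocess. A minimal sketch, assuming dcop is on PATH, a Konqueror instance is already running, and the PDF printer has been selected manually as described above (the fixed sleep and the widget name html-widget2 come straight from the example and may need adjusting for your pages):

```python
import subprocess
import time

def find_konqueror_id(dcop_listing):
    """Pick the first running Konqueror instance out of `dcop` output."""
    for line in dcop_listing.splitlines():
        if line.strip().startswith("konqueror-"):
            return line.strip()
    return None

def url_to_pdf(url, wait=5):
    # Drive the recipe: find the instance, open the URL, print.
    # Assumes the PDF printer was selected beforehand; the output
    # lands in ~/print.pdf as in the manual version.
    konq = find_konqueror_id(subprocess.check_output(["dcop"]).decode())
    subprocess.check_call(
        ["dcop", konq, "konqueror-mainwindow#1", "openURL", url])
    time.sleep(wait)  # crude wait for the page to finish loading
    subprocess.check_call(["dcop", konq, "html-widget2", "print", "1"])
```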

=========================================
If you are running KDE - go with Nick's method.

If the project is as it sounds - an in-house thing,
meaning the web stuff is created by "you":

IF (BIG IF) you have a limited number of URLs to deal with
AND
the pages are NOT going to change shape via the print command
(some use one .css for screen and another for print)
AND
you are using UNIX of some sort:

Open the page and print the postscript output to a file.
One file per page.

Then:

with this in a script:

#!/bin/sh
# ps2pdf.scr
# converts a single ps file to a pdf file
# april 2000
# SLT
#
ofil=`basename "$1" .ps`
gs -sDEVICE=pdfwrite -q \
   -dBATCH -dNOPAUSE -r300 \
   -sOutputFile="$ofil.pdf" "$1"

do:

ps2pdf.scr file.ps


If you have a number of .ps files to convert:

for f in *.ps; do ps2pdf.scr "$f"; done
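The same batch loop is easy in Python if you'd rather drive Ghostscript directly; a sketch assuming gs is on PATH, using the same flags as the script above (pdf_name is a hypothetical helper just for deriving the output name):

```python
import glob
import os
import subprocess

def pdf_name(ps_path):
    """Derive the output name: report.ps -> report.pdf."""
    return os.path.splitext(ps_path)[0] + ".pdf"

def convert_all(directory="."):
    # Run Ghostscript once per .ps file, mirroring ps2pdf.scr.
    for ps in glob.glob(os.path.join(directory, "*.ps")):
        subprocess.check_call([
            "gs", "-sDEVICE=pdfwrite", "-q",
            "-dBATCH", "-dNOPAUSE", "-r300",
            "-sOutputFile=" + pdf_name(ps), ps,
        ])
```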


In Windows - set the default printer to a print-to-PDF driver and just
print. Don't expect to concatenate the PDFs into a single "book"
without a third-party program.


NOTE:
If (in UNIX) you want the whole batch in one file, set up the
printer section to ">>" (append) each output to the single file.
Depending on the browser you may need to do some header cleaning.



Steve
 
Tim Roberts

jvdb said:
My employer is asking for a solution that outputs the content of URLs
to PDF. It must be the content as seen within the browser.
Can someone help me with this? It must be able to export several kinds of
pages with all kinds of content (JavaScript, etc.).

There are a number of obstacles to this. Printer pages are a different
size from screen windows, so the browser does the layout differently.
Further, many style sheets have rules that are "screen only" or "print
only".

If you really want an image of exactly what's on the screen, then I don't
think you have any option other than a screen capture utility, like "xwd".
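The screen-capture route can also be scripted; a sketch assuming X11's xwd and ImageMagick's convert are installed (the command lists are built by a helper so the plumbing stays obvious - convert reads the xwd dump from stdin and writes a single-page PDF):

```python
import subprocess

def capture_cmds(outfile):
    """Build the two halves of the pipeline: dump the root window
    with xwd, then let ImageMagick turn the dump into a PDF."""
    return ["xwd", "-root"], ["convert", "xwd:-", outfile]

def screen_to_pdf(outfile="screen.pdf"):
    # Capture whatever is currently on screen, pixels and all.
    dump_cmd, convert_cmd = capture_cmds(outfile)
    dump = subprocess.Popen(dump_cmd, stdout=subprocess.PIPE)
    subprocess.check_call(convert_cmd, stdin=dump.stdout)
    dump.stdout.close()
    dump.wait()
```

Note this gives an image of the visible window only, not the full page, so long pages would need scrolling and stitching.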
 
