convert pdf to png

Carl K

I need to take the pdf output from reportlab and create a preview image
for a web page. so png or something. I am sure ghostscript will be involved.
I am guessing PIL or ImageMagick?

all suggestions welcome.

Carl K
 
Ramsey Nasser

I need to take the pdf output from reportlab and create a preview image
for a web page. so png or something. I am sure ghostscript will be involved.
I am guessing PIL or ImageMagick?

all suggestions welcome.

Carl K

PIL's support for pdf files is write-only, so that's out of the question.

I just tried ImageMagick from the console and it converted a pdf into
a png in a snap, so that seems to be your best bet.
 
Jaap Spies

Carl said:
I need to take the pdf output from reportlab and create a
preview image for a web page. so png or something. I am sure
ghostscript will be involved. I am guessing PIL or ImageMagick?

all suggestions welcome.

If it is a multi-page pdf, ImageMagick will do:

convert file.pdf page-%03d.png

Jaap
 
Carl K

Jaap said:
If it is a multi-page pdf, ImageMagick will do:

convert file.pdf page-%03d.png

I need python code to do this. It is going to be run on someone else's shared
host web server, where security and performance are an issue. So I would rather not
run stuff via popen. I also need something that is easy to install. (either
easy_install or a distro package)

I just looked at what it takes to install PythonMagick:

Requirements for installation are:
boost
boost-python
python 2.5
Magick++ (>= 6.2)
and for building:
pkg-config
libtool
make

That is looking like maybe "not easy to install"

so I need to make an easy_install-able .tgz or some other way of making the image.

Carl K
 
Grant Edwards

I need python code to do this. It is going to be run on
someone else's shared host web server, where security and
performance are an issue. So I would rather not run stuff via
popen.

Use subprocess.

Trying to eliminate popen because of the overhead when running
ghostscript to render PDF (I assume convert uses gs?) is about
like trimming an elephant's toenails to save weight.

I also need something that is easy to install. (either
easy_install or a distro package)

That's a problem.
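
A minimal sketch of what the subprocess route might look like, wrapping the `convert` command Jaap posted. The function names `convert_command` and `pdf_to_pngs` and the 60-second timeout are illustrative assumptions, not anything given in the thread. Passing the arguments as a list means no shell is involved, which is one way to limit the security exposure Carl is worried about.

```python
import subprocess


def convert_command(pdf_path, output_pattern="page-%03d.png"):
    """Build the argument list for ImageMagick's convert (hypothetical helper)."""
    # A list of arguments is passed to the program directly, without a shell,
    # so odd characters in file names are not interpreted as shell syntax.
    return ["convert", pdf_path, output_pattern]


def pdf_to_pngs(pdf_path, output_pattern="page-%03d.png", timeout=60):
    """Run convert and raise if it fails or takes longer than `timeout` seconds."""
    subprocess.run(
        convert_command(pdf_path, output_pattern),
        check=True,       # raise CalledProcessError on a non-zero exit status
        timeout=timeout,  # assumption: a preview should never take this long
    )
```

On ImageMagick 7 the main executable is `magick` rather than `convert`, so the first list element may need adjusting for the installed version.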
 
Carl K

Grant said:
Use subprocess.

Trying to eliminate popen because of the overhead when running
ghostscript to render PDF (I assume convert uses gs?) is about
like trimming an elephant's toenails to save weight.

maybe, but I wouldn't be so sure.

currently the pdf is created in a python StringIO buffer and returned to the
browser; so it never becomes a file. using convert means I have to first save
it as a file, convert from file to file, read the file, delete the 2 files. so 6
file operations where before there were none. That may be more of a load than
the ghostscript part.

Carl K
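
The file operations Carl lists (save the pdf, convert file to file, read the result, delete both files) could be sketched roughly as follows. This is only an illustration under assumptions: `pdf_bytes_to_png_bytes`, `make_command` and `imagemagick_first_page` are hypothetical names, and the converter is passed in as a function so that any file-based tool can be plugged in.

```python
import os
import shutil
import subprocess
import tempfile


def pdf_bytes_to_png_bytes(pdf_bytes, make_command):
    """Round-trip in-memory pdf data through a file-based converter.

    make_command(pdf_path, png_path) must return the argument list to run.
    """
    workdir = tempfile.mkdtemp(prefix="preview-")
    try:
        pdf_path = os.path.join(workdir, "in.pdf")
        png_path = os.path.join(workdir, "out.png")
        with open(pdf_path, "wb") as f:          # write the pdf
            f.write(pdf_bytes)
        subprocess.run(make_command(pdf_path, png_path), check=True)
        with open(png_path, "rb") as f:          # read the png back
            return f.read()
    finally:
        # Remove both files and the directory, even if the converter failed.
        shutil.rmtree(workdir, ignore_errors=True)


def imagemagick_first_page(pdf_path, png_path):
    """Command for ImageMagick; the "[0]" suffix selects the first page."""
    return ["convert", pdf_path + "[0]", png_path]
```

A call would then look like `png = pdf_bytes_to_png_bytes(pdf, imagemagick_first_page)`, where `pdf` is the string returned from the StringIO buffer.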
 
Jaap Spies

Carl said:
maybe, but I wouldn't be so sure.

currently the pdf is created in a python StringIO buffer and returned to
the browser; so it never becomes a file. using convert means I have to
first save it as a file, convert from file to file, read the file,
delete the 2 files. so 6 file operations where before there were none.
That may be more of a load than the ghostscript part.

Carl K

Are you clever and am I stupid? I did not read this in your original post!

Jaap
 
Andrew MacIntyre

Grant said:
Use subprocess.

Trying to eliminate popen because of the overhead when running
ghostscript to render PDF (I assume convert uses gs?) is about
like trimming an elephant's toenails to save weight.

Using ctypes to call Ghostscript's API also works well. I've only done
this on Windows, but it should also work on other systems with ctypes
support.

 
Carl K

Andrew said:
Using ctypes to call Ghostscript's API also works well. I've only done
this on Windows, but it should also work on other systems with ctypes
support.

sounds good, but I have 0.0 clue what that actually means.

Can you give me what you did with windows in hopes that I can figure out how to
do it in Linux? I am guessing it shouldn't be too different. (well, hoping...)

Carl K
 
Carl K

Jaap said:
Are you clever and am I stupid? I did not read this in your original post!

Here is what the code looks like that generates the pdf:

buffer = StringIO()
rw = dReportWriter(OutputFile=buffer, ReportFormFile=xmlfile, Cursor=ds)
rw.write()
pdf = buffer.getvalue()
return pdf

Carl K
 
Diez B. Roggisch

Carl said:
maybe, but I wouldn't be so sure.

currently the pdf is created in a python StringIO buffer and returned to
the browser; so it never becomes a file. using convert means I have to
first save it as a file, convert from file to file, read the file,
delete the 2 files. so 6 file operations where before there were none.
That may be more of a load than the ghostscript part.

So what? I'm not sure about current HD speeds, but a couple of years ago
these were about 30MByte/s - and should be faster today. Which equals
240MBit/s, much more than your user's internet connection. and this is
raw IO speed, not counting disk caches.

In other words: given the overall latency of a network connection, your
file operations shouldn't shave off more than a split-second. So if
you _can_ go the subprocess-road, do it. It's the easiest way. And
without further knowledge of the GS-library (that you lack, as do I) -
how do you know that it works "in memory", and doesn't actually expect a
file-name or pointer?

Diez
 
Rob Wolfe

Carl K said:
I need to take the pdf output from reportlab and create a
preview image for a web page. so png or something. I am sure
ghostscript will be involved. I am guessing PIL or ImageMagick?

all suggestions welcome.

Did you try to use `renderPM` from rl_addons [1]_?
This is an extension of the reportlab package.

PIL is also needed, and on my linux box
I needed some additional fonts [2]_.

And then I could create PNG directly from reportlab, e.g:

<code>
from reportlab.graphics.shapes import Drawing, String
from reportlab.graphics import renderPM

d = Drawing(400, 200)
d.add(String(150, 100, 'Hello World', fontSize=18))
renderPM.drawToFile(d, 'test.png', 'PNG')
</code>

.. [1] http://www.reportlab.co.uk/svn/public/reportlab/trunk/rl_addons/
.. [2] http://www.reportlab.com/ftp/fonts/pfbfer.zip

HTH,
Rob
 
Carl K

Diez said:
So what? I'm not sure about current HD speeds, but a couple of years ago
these were about 30MByte/s - and should be faster today. Which equals
240MBit/s, much more than your user's internet connection. and this is
raw IO speed, not counting disk caches.

server is doing a ton of SQL queries (yes, moving to a 2nd box would be nice.
might happen mid 2008) so adding HD is an issue. not sure how much, but enough
to try to avoid it.

In other words: given the overall latency of a network connection, your
file operations shouldn't shave off more than a split-second.

those split seconds can add up. The server is already overloaded, so adding more
is a big no no.

So if you _can_ go the subprocess-road, do it. It's the easiest way. And without
further knowledge of the GS-library (that you lack, as do I) - how do
you know that it works "in memory", and doesn't actually expect a
file-name or pointer?

I am willing to take that chance. much better than the 6 hits I know would
happen using

I have a feeling if I have to create a file, we will go with plan B: send the
client a pdf and let the user deal with it. Not as nice and slick, but won't
bog the server.

Carl K
 
Carl K

Rob said:
Carl K said:
I need to take the pdf output from reportlab and create a
preview image for a web page. so png or something. I am sure
ghostscript will be involved. I am guessing PIL or ImageMagick?

all suggestions welcome.

Did you try to use `renderPM` from rl_addons [1]_?
This is an extension of the reportlab package.

There is also PIL needed and on my linux box
I needed some additional fonts [2]_.

And then I could create PNG directly from reportlab, e.g:

<code>
from reportlab.graphics.shapes import Drawing, String
from reportlab.graphics import renderPM

d = Drawing(400, 200)
d.add(String(150, 100, 'Hello World', fontSize=18))
renderPM.drawToFile(d, 'test.png', 'PNG')
</code>

.. [1] http://www.reportlab.co.uk/svn/public/reportlab/trunk/rl_addons/
.. [2] http://www.reportlab.com/ftp/fonts/pfbfer.zip

This sounds like what I was looking for. somehow this got missed when I poked
around reportlab land.

Thanks much.

Carl K
 
Grant Edwards

So what? I'm not sure about current HD speeds, but a couple of years ago
these were about 30MByte/s - and should be faster today. Which equals
240MBit/s, much more than your user's internet connection. and this is
raw IO speed, not counting disk caches.

Unless the file is really huge (or the server is overloaded),
the bytes will probably never even hit a platter. If you're
using any even remotely modern OS, short-lived tempfiles used
as you describe are basically just memory-buffers with a
filesystem API.
 
Andrew MacIntyre

Carl said:
sounds good, but I have 0.0 clue what that actually means.

Can you give me what you did with windows in hopes that I can figure out how to
do it in Linux? I am guessing it shouldn't be too different. (well, hoping...)

ctypes is a foreign function interface (FFI) extension that became part
of the standard library with Python 2.5 (& is available for 2.3 & 2.4).
It is supported on Linux, *BSD & Solaris (I think) in addition to Windows.

Ghostscript for quite some time has had support for being used as a
library (DLL on Windows). There are only a small number of API functions
exported, and there is information around the net for calling these API
functions from Visual Basic. I wrote a wrapper module using ctypes for
the API based on the C header and the VB information.

To get the best rendering, some understanding of Ghostscript options is
required particularly for image format outputs (eg for anti-aliasing text).
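
As a rough illustration of the approach described above (not Andrew's actual wrapper, which was never posted), the sketch below loads the Ghostscript shared library with ctypes and drives it through the `gsapi_*` functions declared in Ghostscript's `iapi.h`. It assumes a Linux-style `libgs` that `ctypes.util.find_library` can locate; on Windows the library is a DLL with a different name and calling convention. The helper names `build_gs_args` and `render_with_libgs`, the 72 dpi default and the choice of options are assumptions made for the example.

```python
import ctypes
import ctypes.util
import os


def build_gs_args(pdf_path, png_path, dpi=72):
    """Argument list for Ghostscript; the API ignores the first element."""
    return [
        "gs", "-q", "-dNOPAUSE", "-dBATCH", "-dSAFER",
        "-sDEVICE=png16m",
        "-r%d" % dpi,
        "-dTextAlphaBits=4", "-dGraphicsAlphaBits=4",  # anti-aliasing
        "-dFirstPage=1", "-dLastPage=1",               # preview: first page only
        "-sOutputFile=" + png_path,
        pdf_path,
    ]


def render_with_libgs(args, libname=None):
    """Run Ghostscript in-process with the given argument list."""
    libname = libname or ctypes.util.find_library("gs")
    if libname is None:
        raise OSError("Ghostscript shared library not found")
    gs = ctypes.CDLL(libname)

    # Declare signatures so pointers are not truncated on 64-bit systems.
    gs.gsapi_new_instance.argtypes = [ctypes.POINTER(ctypes.c_void_p), ctypes.c_void_p]
    gs.gsapi_new_instance.restype = ctypes.c_int
    gs.gsapi_init_with_args.argtypes = [
        ctypes.c_void_p, ctypes.c_int, ctypes.POINTER(ctypes.c_char_p)]
    gs.gsapi_init_with_args.restype = ctypes.c_int
    gs.gsapi_exit.argtypes = [ctypes.c_void_p]
    gs.gsapi_exit.restype = ctypes.c_int
    gs.gsapi_delete_instance.argtypes = [ctypes.c_void_p]
    gs.gsapi_delete_instance.restype = None

    instance = ctypes.c_void_p()
    if gs.gsapi_new_instance(ctypes.byref(instance), None) < 0:
        raise RuntimeError("could not create a Ghostscript instance")
    try:
        encoded = [os.fsencode(a) for a in args]
        argv = (ctypes.c_char_p * len(encoded))(*encoded)
        code = gs.gsapi_init_with_args(instance, len(encoded), argv)
        gs.gsapi_exit(instance)
    finally:
        gs.gsapi_delete_instance(instance)
    # -101 is gs_error_Quit, which Ghostscript documents as a normal exit.
    if code not in (0, -101):
        raise RuntimeError("Ghostscript failed with code %d" % code)
```

Note that this variant still reads the pdf from a file name and writes the png to one; it only avoids starting a separate process.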

 
Diez B. Roggisch

Carl said:
server is doing a ton of SQL queries (yes, moving to a 2nd box would be
nice. might happen mid 2008) so adding HD is an issue. not sure how
much, but enough to try to avoid it.

Keeping stuff in memory, provoking paging, isn't?

those split seconds can add up. The server is already overloaded, so
adding more is a big no no.

I am willing to take that chance. much better than the 6 hits I know
would happen using

I have a feeling if I have to create a file, we will go with plan B:
send the client a pdf and let the user deal with it. Not as nice and
slick, but won't bog the server.

I have the feeling you just go by your feelings. Which is always a bad
idea regarding performance bottlenecks.

http://en.wikipedia.org/wiki/Optimization_(computer_science)

So instead of jumping through hoops getting something done the hard way
without knowing how the easy solution affects performance, implement the
feature the easiest way. And SEE if it causes trouble.

Diez
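
One way to "see if it causes trouble" is simply to time the temp-file round trip on the server in question. The sketch below is an assumption-laden illustration: the names `tempfile_round_trip` and `average_seconds`, the repeat count and the 200 kB stand-in payload are all made up for the example, and real numbers depend entirely on the machine and its load.

```python
import os
import tempfile
import time


def tempfile_round_trip(data):
    """Write data to a temp file, read it back, then delete the file."""
    fd, path = tempfile.mkstemp(suffix=".pdf")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


def average_seconds(func, arg, repeats=100):
    """Average wall-clock time of func(arg) over `repeats` calls."""
    start = time.perf_counter()
    for _ in range(repeats):
        func(arg)
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    payload = b"%PDF-1.4\n" + b"x" * 200000  # stand-in for a report of ~200 kB
    seconds = average_seconds(tempfile_round_trip, payload)
    print("average seconds per round trip: %.6f" % seconds)
```

Comparing this figure with the time the conversion itself takes would show which of the two actually dominates.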
 
Boris Borcic

Carl said:
Rob said:
Carl K said:
I need to take the pdf output from reportlab and create a
preview image for a web page. so png or something. I am sure
ghostscript will be involved. I am guessing PIL or ImageMagick?

all suggestions welcome.

Did you try to use `renderPM` from rl_addons [1]_? This is an
extension of the reportlab package.

There is also PIL needed and on my linux box
I needed some additional fonts [2]_.

And then I could create PNG directly from reportlab, e.g:

<code>
from reportlab.graphics.shapes import Drawing, String
from reportlab.graphics import renderPM

d = Drawing(400, 200)
d.add(String(150, 100, 'Hello World', fontSize=18))
renderPM.drawToFile(d, 'test.png', 'PNG')
</code>

.. [1] http://www.reportlab.co.uk/svn/public/reportlab/trunk/rl_addons/
.. [2] http://www.reportlab.com/ftp/fonts/pfbfer.zip

This sounds like what I was looking for. somehow this got missed when
I poked around reportlab land.

Thanks much.

Carl K

Beware... AFAIK this is only a backend for reportlab graphics drawings, IOW it
will render drawings and charts from the reportlab.graphics package but will not
render reportlab pdf canvas.
 
Carl K

Grant said:
Unless the file is really huge (or the server is overloaded),

The server is already overloaded.

the bytes will probably never even hit a platter. If you're
using any even remotely modern OS, short-lived tempfiles used
as you describe are basically just memory-buffers with a
filesystem API.

Good point. Not that I am willing to risk it (just using the pdf is not such a
bad option) but I am wondering if it would make sense to create a ramdrive for
something like this. if memory is needed, swap would happen, which should be
better than creating files.

Carl K
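
On most Linux systems a memory-backed filesystem is already mounted at `/dev/shm`, so the ramdrive idea may need no extra setup. The sketch below is a guess at how that could be used: `scratch_dir` and `write_scratch_file` are hypothetical helpers, and whether `/dev/shm` exists and is writable on a given shared host is an assumption that has to be checked.

```python
import os
import tempfile


def scratch_dir(preferred="/dev/shm"):
    """Return `preferred` if it is a usable directory, else the default temp dir."""
    if os.path.isdir(preferred) and os.access(preferred, os.W_OK | os.X_OK):
        return preferred
    return tempfile.gettempdir()


def write_scratch_file(data, suffix=".pdf"):
    """Write data to a new file in the scratch directory and return its path."""
    fd, path = tempfile.mkstemp(suffix=suffix, dir=scratch_dir())
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path  # the caller is responsible for deleting the file
```

The returned path can be handed to a file-based converter and removed with `os.remove` afterwards.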
 
Piet van Oostrum

Carl K said:
CK> Here is what the code looks like that generates the pdf:
CK> buffer = StringIO()
CK> rw = dReportWriter(OutputFile=buffer, ReportFormFile=xmlfile, Cursor=ds)
CK> rw.write()
CK> pdf = buffer.getvalue()
CK> return pdf

You can pipe the pdf through ghostscript and read the png back from
ghostscript's stdout. Like:

gs -q -sDEVICE=png16m -sOutputFile=- -

Use that command in subprocess with the stdin/stdout as pipes, send
your PDF data to the process and read the PNG output back.

However you must be aware that this can deadlock if the output is large
enough. So putting the input or the output in a real file is probably safer
anyway.
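
A sketch of the piping approach, under a few assumptions. `subprocess.run` with `input=` uses `communicate()` internally, which feeds stdin and drains stdout at the same time, so the deadlock described above (writing all the input before reading any output) should not occur. The name `pipe_through`, the timeout, and the extra `-dNOPAUSE -dBATCH` switches are additions for the example and not part of the command as posted; whether a given Ghostscript version accepts a pdf on stdin is also worth verifying.

```python
import subprocess

# The command from the post, plus -dNOPAUSE and -dBATCH (an addition here)
# so that Ghostscript neither prompts between pages nor waits after the last.
GS_COMMAND = ["gs", "-q", "-dNOPAUSE", "-dBATCH",
              "-sDEVICE=png16m", "-sOutputFile=-", "-"]


def pipe_through(command, data, timeout=60):
    """Send `data` to the command's stdin and return everything on its stdout."""
    result = subprocess.run(
        command,
        input=data,               # written to stdin while stdout is being read
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,   # keep error messages out of the image data
        timeout=timeout,
        check=True,               # raise CalledProcessError on failure
    )
    return result.stdout
```

With the pdf string from the StringIO buffer, the call would be `png = pipe_through(GS_COMMAND, pdf)`.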
 
