Thoughts on speeding up PDF::API2

Discussion in 'Perl Misc' started by Bill H, Sep 12, 2008.

  1. Bill H

    Bill H Guest

    In a recent post I asked about speeding up a perl script that uses
    PDF::API2. I did some profiling of the code and found that the vast
    majority of the time (about 90%) is spent loading all the .pm files
    in the PDF::API2 library. Once it gets past all of the
    initialization, my code that uses the API runs very fast, creating a
    20+ page PDF document with separate image thumbnail files of each
    page (via ImageMagick) in less than 2 seconds.

    In a meeting we were having tonight we were tossing around the idea
    of having the program go through its initial setup and then "pause"
    to wait for a signal to create a PDF file, then create the PDF and
    images, and then go back to the pause. Basically running all the
    time as a service. Anyone see any reason why this would be a bad
    idea?

    We further started wondering: instead of pausing, running on a
    signal, and then going back to the pause for the next signal, would
    it be possible to fork off a child at that point and have the child
    create the PDF / images and exit, while the parent stayed at the
    pause position waiting for another signal to fork off a child? If we
    forked off a child, would it start from the beginning of the script,
    or would it start at the same place (probably the next line) in the
    perl script it was forked off of?

    Any thoughts?

    Bill H
     
    Bill H, Sep 12, 2008
    #1

  2. Ben Morrow

    Ben Morrow Guest

    Quoth Bill H <>:
    > In a recent post I asked about speeding up a perl script that uses
    > PDF::API2. I did some profiling of the code and found that the vast
    > majority of the time (about 90%) is spent loading all the .pm files
    > in the PDF::API2 library. Once it gets past all of the
    > initialization, my code that uses the API runs very fast, creating a
    > 20+ page PDF document with separate image thumbnail files of each
    > page (via ImageMagick) in less than 2 seconds.
    >
    > In a meeting we were having tonight we were tossing around the idea
    > of having the program go through its initial setup and then "pause"
    > to wait for a signal to create a PDF file, then create the PDF and
    > images, and then go back to the pause. Basically running all the
    > time as a service. Anyone see any reason why this would be a bad
    > idea?


    No, it's a very good idea. This is exactly what systems like mod_perl
    and FastCGI do to speed things up. You do have to be careful to clear
    everything out between one run and the next...
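    A minimal skeleton of that model might look like the following. (This
    is just a sketch: PDF::API2 and the real work are left out so it
    stands alone, and SIGUSR1 is only one possible choice of "signal".)

```perl
#!/usr/bin/perl
# Preload-then-pause skeleton: compile everything once, then block
# until a signal says "build a PDF". In real use, `use PDF::API2;`
# would go up here so its compile cost is paid a single time.
use strict;
use warnings;

my $pending = 0;
my $built   = 0;
$SIG{USR1} = sub { $pending = 1 };    # the "make a pdf" signal

kill 'USR1', $$;    # stand-in for the front end running: kill -USR1 <pid>

# A real service would loop here with while (1); one pass so the sketch exits.
sleep 1 unless $pending;    # the "pause"; sleep() returns early on a signal
if ($pending) {
    print "building pdf and thumbnails...\n";    # per-request work goes here
    $built   = 1;
    $pending = 0;    # clear per-job state before the next request
}
```

    The per-request block is also where you would reset any globals, per
    the caveat above about clearing everything out between runs.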

    > We further started wondering, instead of pausing, then running on a
    > signal and then going back to pause for next signal to make a pdf,
    > would it be possible to fork off a child at that point and have the
    > child create the pdf / images and end, while the parent stayed at the
    > pause position waiting for another signal to fork off a child.


    ...which is something fork allows you to avoid :). fork does have some
    overhead, which is why programs like Apache go to some trouble to avoid
    forking a new process as each request comes in, but since your previous
    model was a whole new perl process for each run this probably isn't
    significant.

    If anyone suggests using threads from perl on a system that has a real
    fork, laugh :).

    > If we forked off a child, would it start from the beginning of the
    > script or would it start at the same place (probably the next line)
    > in the perl script it was forked off of?


    perldoc -f fork
    man 2 fork

    Basically, both old and new processes will return from the fork call,
    the only difference between them at that point being what is returned.

    Ben

    --
    Every twenty-four hours about 34k children die from the effects of poverty.
    Meanwhile, the latest estimate is that 2800 people died on 9/11, so it's like
    that image, that ghastly, grey-billowing, double-barrelled fall, repeated
    twelve times every day. Full of children. [Iain Banks]
     
    Ben Morrow, Sep 12, 2008
    #2

  3. Xho

    Xho Guest

    Bill H <> wrote:
    > In a recent post I asked about speeding up a perl script that uses
    > PDF::API2. I did some profiling of the code and found that the vast
    > majority of the time (about 90%) is spent loading all the .pm files
    > in the PDF::API2 library. Once it gets past all of the
    > initialization, my code that uses the API runs very fast, creating a
    > 20+ page PDF document with separate image thumbnail files of each
    > page (via ImageMagick) in less than 2 seconds.


    If 10% of the time is spent doing something that takes 2 seconds,
    then 100% of the time is 20 seconds, and the module loading must be
    taking almost 18 seconds. That is outrageous on any modestly recent
    computer. On my machine, loading PDF::API2 takes ~0.5 seconds.

    One possible problem is if the PDF::API2 install location shows up
    late in @INC, and the entries earlier in @INC are on slow network
    drives. For each of the files it opens as part of loading PDF::API2,
    perl has to "stat" its way through the entire @INC list before
    finally finding it.
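    You can measure this directly by timing a bare require and dumping
    @INC. (List::Util stands in for PDF::API2 below so the snippet runs
    anywhere; substitute the real module on the affected machine.)

```perl
#!/usr/bin/perl
# Quick check of where compile time goes: time one require and list
# the @INC directories perl searches, in order.
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

my $t0 = [gettimeofday];
require List::Util;    # on the real system: require PDF::API2;
my $elapsed = tv_interval($t0);
printf "load took %.4f seconds\n", $elapsed;

# Each module file is searched for in @INC order, so slow network
# mounts early in this list get stat()ed before every hit:
print "  $_\n" for @INC;
```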


    > In a meeting we were having tonight we were tossing around the idea
    > of having the program go through its initial setup and then "pause"
    > to wait for a signal to create a PDF file, then create the PDF and
    > images, and then go back to the pause. Basically running all the
    > time as a service. Anyone see any reason why this would be a bad
    > idea?


    Nope. Sounds like a good idea. Working out the "signal" could be tricky.

    > We further started wondering: instead of pausing, running on a
    > signal, and then going back to the pause for the next signal, would
    > it be possible to fork off a child at that point and have the child
    > create the PDF / images and exit, while the parent stayed at the
    > pause position waiting for another signal to fork off a child?


    Yes, you can do that, but it probably wouldn't be worthwhile. Since
    the make-a-PDF part is fast, what is the point of parallelizing it?
    It would add complexity for probably little to no benefit.


    > If we
    > forked off a child, would it start from the beginning of the script
    > or would it start at the same place (probably the next line) in the
    > perl script it was forked off of?


    The new process and the old process start/continue at the same
    place. It isn't the next line, it is the "returning" of the fork:

        $x = fork();

    The fork itself only happens in the parent, but the assignment to $x
    happens in both the parent and the child.
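    Spelled out as a runnable sketch (the child's exit-immediately body
    stands in for the PDF/image work):

```perl
#!/usr/bin/perl
# Both processes return from fork(): the child sees 0, the parent sees
# the child's PID. Neither restarts from the top of the script.
use strict;
use warnings;

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: execution continues here; this is where the PDF/image
    # work would go before the child exits.
    exit 0;
}

# Parent: reap the child (avoids zombies), then go back to waiting.
waitpid($pid, 0);
my $status = $?;
print "child $pid exited with status $status\n";
```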

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    The costs of publication of this article were defrayed in part by the
    payment of page charges. This article must therefore be hereby marked
    advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
    this fact.
     
    Xho, Sep 12, 2008
    #3
  4. Bill H

    Bill H Guest

    On Sep 11, 11:27 pm, wrote:
    > [...]
    > Nope. Sounds like a good idea. Working out the "signal" could be
    > tricky.
    > [...]
    > The new process and the old process start/continue at the same
    > place. It isn't the next line, it is the "returning" of the fork.

    Thanks Ben, Xho for your comments. I am glad to see the idea we had
    wasn't that far-fetched.

    On the signal to do something: the part of the website that calls
    the perl program using PDF::API2 is written in PHP, and the two
    sides use PHP sessions to talk to each other. I saw a perl module
    that lets you access PHP sessions and wondered about using that
    method to send the signal. Has anyone had any experience using PHP
    sessions from perl? Are they continuously updated? Or can anyone
    think of a better way of signaling the perl script from another
    program?
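    One alternative we have been kicking around, instead of reading the
    sessions: have the PHP side drop a small job file into a spool
    directory and let the preloaded perl service poll for it. A rough
    sketch (the spool path and the "order=NNN" format are made up, and a
    temp directory plus a fake request stand in for the PHP side):

```perl
#!/usr/bin/perl
# File-based handoff sketch: the front end writes one job file per
# request; the long-running service picks them up and clears them.
use strict;
use warnings;
use File::Temp qw(tempdir);

my $spool = tempdir(CLEANUP => 1);    # e.g. /var/spool/pdf-jobs in real use

# Pretend the PHP front end just dropped a request:
open my $out, '>', "$spool/job-1001.req" or die "write: $!";
print $out "order=1001\n";
close $out;

# Service side: pick up and remove any pending requests.
my $picked = 0;
for my $req (sort glob "$spool/*.req") {
    open my $in, '<', $req or die "read: $!";
    chomp(my $line = <$in>);
    close $in;
    print "picked up: $line\n";    # the PDF::API2 work would run here
    unlink $req;
    $picked++;
}
```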

    Bill H
     
    Bill H, Sep 12, 2008
    #4
