More general programming than perl...

Discussion in 'Perl Misc' started by Justin C, Apr 30, 2014.

  1. Justin C

    Justin C Guest

    I will be coding this in perl, but I can't yet get my head
    around how I'm going to achieve what I want, maybe people
    here can offer suggestions on how I might proceed -
    obviously in broad terms, code is a way off at the moment
    I think.

    I need to prepare a "Latest Products" document, the
    contents are coming from a database, I've got to fill the
    document with the latest and stop when the document is 24
    pages, however, the document runs chronologically from
    oldest to newest. I'm trying to work out how I can decide
    which item/date to put at the start of the document, so I
    don't run out of data before 24 pages, or over-run 24 pages.

    Information is broken up into date sections (listing new
    products for that day), there are varying amounts of data
    for each date, from a few lines to more than a page. There
    is a section heading which is larger than a line of data,
    and there is a vertical space between sections, so how
    many lines I can fit on a page depends on how many
    sections there will be.

    Every page starts with a section/date heading regardless
    of whether it's a continuation of the section on the
    previous page or not.

    Any suggestions on how I might, programmatically, decide
    where I should begin my document?

    Justin C, Apr 30, 2014

  2. [...]
    [formatting details]
    Start with the last entry supposed to appear on the last page, ie, the
    most recent one, and work backwards from that until you either run out
    of data or have produced 24 pages.
    Rainer Weikusat, Apr 30, 2014

  3. This is not sufficient on its own in case there are fewer than 24 pages'
    worth of data, assuming that a partially filled page may appear at the
    end and must not appear at the beginning. This can be solved with a
    2-pass algorithm: First, move backward through the data (recording the
    space needed for each entry and meta-entry, ie, section header) until 24
    pages have been accumulated or there's no more data. Then, move forward
    through the entries in order to produce actual pages. This step can be
    avoided if there are 24 pages but that's probably not worth the effort.

    Possible gotcha: A situation where a lone 'date section heading' appears
    at the bottom of a page, followed by the first entry for that day should
    probably be avoided.
    Rainer Weikusat, Apr 30, 2014
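
The backward pass described above can be sketched roughly as follows. This is a minimal illustration, not anyone's actual implementation: the layout numbers (`PAGE_LINES`, `HEAD_LINES`, `GAP_LINES`), the one-line-per-record rule, the record fields and the names `lines_needed` and `find_start` are all assumptions made for the example.

```perl
use strict;
use warnings;

# Assumed layout model (not from the thread): a page holds PAGE_LINES lines,
# a date heading takes HEAD_LINES, the gap between two sections GAP_LINES,
# and every record takes exactly one line.
use constant { PAGE_LINES => 10, HEAD_LINES => 2, GAP_LINES => 1 };

# Extra lines needed to place a record dated $date in front of a page whose
# current first record is dated $first_date (undef means the page is empty).
sub lines_needed {
    my ($date, $first_date) = @_;
    return HEAD_LINES + 1 if !defined $first_date;   # page heading + record
    return 1              if $date eq $first_date;   # joins the first section
    return 1 + HEAD_LINES + GAP_LINES;               # opens a new section
}

# Walk from the newest record towards the oldest and return the index of the
# oldest record that still fits into $max_pages pages.
sub find_start {
    my ($records, $max_pages) = @_;
    my ($pages, $used, $first_date) = (1, 0, undef);
    my $start = scalar @$records;                    # nothing selected yet
    for (my $i = $#$records; $i >= 0; $i--) {
        my $date = $records->[$i]{date};
        my $need = lines_needed($date, $first_date);
        if ($used + $need > PAGE_LINES) {            # current page is full
            last if $pages == $max_pages;
            $pages++;
            ($used, $need) = (0, HEAD_LINES + 1);    # record opens a new page
        }
        $used += $need;
        $first_date = $date;
        $start = $i;
    }
    return $start;
}

# Hypothetical data: 30 records, three per day, oldest first.
my @records = map { +{ date => sprintf('2014-04-%02d', 1 + int($_ / 3)),
                       name => "item $_" } } 0 .. 29;
my $start = find_start(\@records, 2);
print "start at record $start ($records[$start]{date})\n";
```

With these made-up numbers the script prints `start at record 21 (2014-04-08)`. The forward pass would then lay out `@records[$start .. $#records]` from the top of page one, so any partially filled page ends up last.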
  4. Justin C

    gamo Guest

    On 30/04/14 13:29, Justin C wrote:
    I assume you can produce 100 pages.

    Just produce 25 pages and then do 'intelligent' cuts, like
    based on section length, recentness, etc. until it fits in
    24 pages.
    gamo, Apr 30, 2014
  5. It does not look like a very difficult task.
    You need some SQL queries through DBI, and then create the documents
    using, e.g., an HTML template.
    George Mpouras, Apr 30, 2014
  6. *SKIP*
    Please, define "document".

    Eric Pozharski, May 1, 2014
  7. Justin C

    Justin C Guest

    Thank you for not answering the question. Telling me what I already
    know is a very profitable use of both your time and mine. Thank you
    for your lack of assistance in this matter.

    Justin C, May 1, 2014
  8. Justin C

    Justin C Guest

    Yes, I can see that's an option. It doesn't seem very
    economical though, I can foresee a time when I could
    produce 10,000 pages and my CPU and RAM would be occupied
    for an unnecessary length of time.

    I could run it this way once, and then record where the
    cut falls, and refer to that position as my start point
    for next time. Then remove the oldest and record a new
    cut/starting point.

    Thank you for the suggestion.

    Justin C, May 1, 2014
  9. Justin C

    Justin C Guest

    I am not sure it is relevant to the question I asked, but my
    document is actually an Excel spreadsheet which will be printed
    as a PDF.

    The reason is that historically this document has always come
    from an Excel file, and now we no longer wish to maintain the
    document and instead are putting the data into a DB and intend
    to work from that instead. Maybe, at a later date, I'll bypass
    the Excel file if it turns out no one actually wants it in that
    format, and go straight to PDF.

    Justin C, May 1, 2014
  10. Justin C

    Steve May Guest

    Seems like this approach might work:

    Fill a list with records (hash-refs) sorted new to old. Limit to some
    reasonable number knowing that you can always add a few more if needed.

    Trial print the list while tracking the record count and page count to
    determine how many records you can print from the list.

    Take a slice from the list based on how many records needed (above), and
    reverse it.



    Steve May, May 1, 2014
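
One way to read the trial-print approach above, as a hedged sketch: the layout constants, the record fields and the names `count_pages` and `pick_records` are hypothetical, and the "trial print" is reduced to counting lines rather than producing real output.

```perl
use strict;
use warnings;

# Assumed layout model: lines per page, per date heading, per section gap.
use constant { PAGE_LINES => 10, HEAD_LINES => 2, GAP_LINES => 1 };

# Trial "print": lay the records out oldest-first and return the page count.
# A new section only starts on a page if its heading and first record fit.
sub count_pages {
    my @records = @_;
    my ($pages, $used, $prev_date) = (0, 0, '');
    for my $rec (@records) {
        my $need = $rec->{date} eq $prev_date ? 1
                 :                              GAP_LINES + HEAD_LINES + 1;
        if (!$pages || $used + $need > PAGE_LINES) {
            $pages++;                            # every page opens with a heading
            ($used, $need) = (0, HEAD_LINES + 1);
        }
        $used += $need;
        $prev_date = $rec->{date};
    }
    return $pages;
}

# Grow the slice of the newest-first list until the trial layout would exceed
# $max_pages, then return the kept records reversed into oldest-first order.
sub pick_records {
    my ($newest_first, $max_pages) = @_;
    my $keep = 0;
    for my $n (1 .. scalar @$newest_first) {
        last if count_pages(reverse @{$newest_first}[0 .. $n - 1]) > $max_pages;
        $keep = $n;
    }
    return reverse @{$newest_first}[0 .. $keep - 1];
}

# Hypothetical data: 30 records, three per day, sorted new to old.
my @newest_first = reverse map { +{ date => sprintf('2014-04-%02d', 1 + int($_ / 3)),
                                    name => "item $_" } } 0 .. 29;
my @picked = pick_records(\@newest_first, 2);
printf "%d records on %d pages, starting with %s\n",
    scalar @picked, count_pages(@picked), $picked[0]{name};
```

Re-running the layout for every candidate slice is quadratic in the number of records, which seems acceptable for a 24-page document; the slice could also be grown in larger steps if that ever mattered.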
  11. [...]
    As I already wrote: The pages can be built backwards while accumulating
    records. That is, for each new record, the size including a possible
    leading 'day sections header' is calculated. If it still fits in front
    of the most recently added record on the current page, it is added to
    that, otherwise, it becomes the first entry on the next page. This
    process is repeated until 24 pages of output have been accumulated.

    In case there are fewer than 24 pages, a 2nd pass can be used to pull
    'entries to preceding pages' so that the last page ends up being
    'partially filled' instead of the first page.

    This is not really difficult provided one can overcome the notion that
    'stuff has to happen forwards' (something even "well known OSS
    celebrities" reportedly find difficult :->) and that 'the size' must be
    calculated in one go instead of incrementally.
    Rainer Weikusat, May 1, 2014
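
A rough sketch of building the pages backwards and then pulling entries onto preceding pages, under assumed layout numbers. The names `page_lines`, `build_backwards` and `pull_forward` are made up for this example, and for brevity the page size is recomputed from the page contents each time instead of being tracked incrementally as the post suggests.

```perl
use strict;
use warnings;

# Assumed layout model: lines per page, per date heading, per section gap.
use constant { PAGE_LINES => 10, HEAD_LINES => 2, GAP_LINES => 1 };

# Lines used by one page: a heading per date section, a gap between
# sections, and one line per record.
sub page_lines {
    my @recs = @_;
    return 0 unless @recs;
    my $sections = 1;
    for my $i (1 .. $#recs) {
        $sections++ if $recs[$i]{date} ne $recs[$i - 1]{date};
    }
    return scalar(@recs) + $sections * HEAD_LINES + ($sections - 1) * GAP_LINES;
}

# Pass 1: add records newest-first, each in front of the earliest page so far;
# open a new earliest page when the record no longer fits.
sub build_backwards {
    my ($records, $max_pages) = @_;
    my @pages = ([]);
    for my $rec (reverse @$records) {
        if (page_lines($rec, @{ $pages[0] }) > PAGE_LINES) {
            last if @pages == $max_pages;
            unshift @pages, [];
        }
        unshift @{ $pages[0] }, $rec;
    }
    return @pages;
}

# Pass 2: move records from the front of each page to the end of the page
# before it while they fit, so that any slack ends up on the last page.
sub pull_forward {
    my @pages = @_;
    for my $p (0 .. $#pages - 1) {
        my ($cur, $next) = @pages[$p, $p + 1];
        while (@$next && page_lines(@$cur, $next->[0]) <= PAGE_LINES) {
            push @$cur, shift @$next;
        }
    }
    return grep { @$_ } @pages;                  # drop a page left empty
}

# Hypothetical data: 30 records, three per day, oldest first.
my @records = map { +{ date => sprintf('2014-04-%02d', 1 + int($_ / 3)),
                       name => "item $_" } } 0 .. 29;
my @backward = build_backwards(\@records, 24);
my $first_before = scalar @{ $backward[0] };     # size of page 1 after pass 1
my @pages = pull_forward(@backward);
printf "page %d: %d records, %d lines\n",
    $_ + 1, scalar @{ $pages[$_] }, page_lines(@{ $pages[$_] }) for 0 .. $#pages;
```

In this sample there is less than 24 pages' worth of data, so pass 1 leaves the short page at the front and pass 2 shifts the slack to the end.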
  12. Justin C

    Justin C Guest

    That sounds quite reasonable. Yes, I like that. Thank you Steve.

    Justin C, May 1, 2014
  13. Justin C

    Steve May Guest

    Yes, though it was not immediately clear to me what you were suggesting
    (sometimes I'm a bit thick).

    Too, differing explanations/perspectives on solutions are often useful.

    At any rate, it seems the OP has some ideas to play with now.

    Steve May, May 1, 2014
  14. Justin C

    Steve May Guest


    Forgot one thing: depending on formatting and exactly how the line
    breaks work out, it IS possible that 24 pages forward will turn into 25
    (or 23) pages reversed. You might want to double check the final output
    page count.

    Steve May, May 1, 2014
  15. "Document", being defined, defines "page". From your conversations with
    others I'm glad to find out that your "document" looks more like
    text/plain (wrapped in application/pdf) than application/pdf itself.
    You don't need Excel for this. Or perl -- Excel would be closer to
    pages than perl.

    Eric Pozharski, May 2, 2014
  16. Justin C

    Justin C Guest

    Though we print the document for customers the original Excel
    file is also available to download from our web-site, and
    some customers are used to it that way, I wouldn't want to
    take that away from those that like it that way - who may
    have programs that read the file and 'do stuff' with the data.

    My preference would be for LaTeX to produce it, because I
    know it can do better typesetting than anything else in my
    toolbox, but that doesn't get around the desire to still
    support those who like the Excel file... maybe two files, a
    'pretty' PDF and a plain Excel file. Hmmm.

    I'm going to enjoy this!

    Justin C, May 2, 2014
  17. Justin C

    Justin C Guest


    Thank you to all who have made suggestions. I have some
    interesting things to think about, and you have all helped
    me better understand what I have to achieve.

    Justin C, May 2, 2014
  18. *SKIP*
    No, you won't. It will be terrible pain.

    p.s. I'm doing typesetting for a living. And yes, it's texlive and
    Eric Pozharski, May 3, 2014
  19. Justin C

    ccc31807 Guest

    Is 'page' a physical page or a logical page?
    What format is your output?

    To begin with, I assume that you have your data in some kind of nested hash. I assume that your keys might be some kind of date that could be sorted, like maybe a Julian date. I also assume that your SQL orders by date and returns a limit of 24. In that case, you would do this:

    foreach my $product (sort keys %products)

    This would work fine if your 'pages' were logical pages.

    If your pages were physical pages, and if PDF output was acceptable, it would be easy to use PDF::API2 or similar, which requires you to issue literal line feeds. Count down the lines until you reach the bottom, and start a new page. Quit when you hit 24.

    I've had great success recently using Perl to emit TeX source code and then calling

    `pdflatex source.tex`;

    If you feel comfortable using LaTeX I think you will be pleasantly surprised at how easily Perl can produce TeX, and this gives you exquisite control over the presentation of your output.

    Perl is a great tool for this kind of job. For me, as long as you get your initial data structure right, this ought to be trivial.

    ccc31807, May 12, 2014
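
As a small illustration of emitting TeX from Perl, here is a sketch that turns date-ordered records into LaTeX source. The record fields and the names `tex_escape` and `latex_source` are assumptions for the example, as is the choice of `\section*` and `itemize` for the layout. LaTeX does its own page breaking, so the resulting page count would still have to be checked after the run.

```perl
use strict;
use warnings;

# Escape some of the characters LaTeX treats specially in ordinary text.
# This is a partial list: backslash, ~ and ^ are not handled here.
sub tex_escape {
    my ($text) = @_;
    $text =~ s/([#\$%&_{}])/\\$1/g;
    return $text;
}

# Build a complete LaTeX document with one unnumbered section per date.
# Records are expected oldest first, as hash refs with 'date' and 'name'.
sub latex_source {
    my @records = @_;
    my $tex = "\\documentclass{article}\n\\begin{document}\n";
    my $current = '';
    for my $rec (@records) {
        if ($rec->{date} ne $current) {
            $tex .= "\\end{itemize}\n" if $current ne '';   # close previous day
            $current = $rec->{date};
            $tex .= "\\section*{" . tex_escape($current) . "}\n\\begin{itemize}\n";
        }
        $tex .= "\\item " . tex_escape($rec->{name}) . "\n";
    }
    $tex .= "\\end{itemize}\n" if $current ne '';
    return $tex . "\\end{document}\n";
}

# Hypothetical sample data.
my @records = (
    { date => '2014-04-29', name => 'Widget A & B' },
    { date => '2014-04-30', name => '50% off_cuts' },
    { date => '2014-04-30', name => 'Gadget #7' },
);
my $tex = latex_source(@records);
print $tex;

# To produce the PDF, write $tex to source.tex and run pdflatex on it, e.g.:
#   system('pdflatex', 'source.tex') == 0 or die "pdflatex failed";
```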