Re: Fast forward-backward (write-read)

Discussion in 'Python' started by David Hutto, Oct 23, 2012.

  1. David Hutto (Guest)

    On Tue, Oct 23, 2012 at 10:31 AM, Virgil Stokes <> wrote:
    > I am working with some rather large data files (>100GB) that contain time
    > series data. The data (t_k,y(t_k)), k = 0,1,...,N are stored in ASCII
    > format. I perform various types of processing on these data (e.g. moving
    > median, moving average, and Kalman-filter, Kalman-smoother) in a sequential
    > manner and only a small number of these data need be stored in RAM when
    > being processed. When performing Kalman-filtering (forward in time pass, k =
    > 0,1,...,N) I need to save to an external file several variables (e.g. 11*32
    > bytes) for each (t_k, y(t_k)). These are inputs to the Kalman-smoother
    > (backward in time pass, k = N,N-1,...,0). Thus, I will need to input these
    > variables saved to an external file from the forward pass, in reverse order
    > --- from last written to first written.
    >
    > Finally, to my question --- What is a fast way to write these variables to
    > an external file and then read them in backwards?


    Don't forget to use timeit to get an average timing.

    I'd suggest two list comprehensions for now, until I've reviewed it some more:

    # Printable ASCII codes 33-126, listed forwards and then backwards.
    forward = ["%i = %s" % (i, chr(i)) for i in range(33, 127)]
    backward = ["%i = %s" % (i, chr(i)) for i in range(126, 32, -1)]

    for var in forward:
        print(var)

    for var in backward:
        print(var)

    You could also use a dict, built in a single loop, that assigns a front and
    a back value to each key, e.g. dict_one = {0: [0.100, 1.99]}, and then
    iterate through the loop again, picking the first or second entry in the
    dict's value list for the forward or backward pass.
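
    A rough sketch of that dict idea, extending the made-up numbers above:

    # Each key maps to a two-element list: [forward_value, backward_value].
    dict_one = {0: [0.100, 1.99], 1: [0.250, 1.75]}

    def values(direction="forward"):
        idx = 0 if direction == "forward" else 1
        return [pair[idx] for key, pair in sorted(dict_one.items())]

    print(values("forward"))   # [0.1, 0.25]
    print(values("backward"))  # [1.99, 1.75]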


    But there might be faster implementations, depending on how other functions
    use certain lower-level calls.
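
    For instance, since each saved record has a fixed size, one faster route is
    to write packed binary records during the forward pass and then walk the
    file backwards with seek() during the smoothing pass. A rough sketch only
    (the 11-double record layout and the file name are assumptions for
    illustration):

    import os
    import struct

    RECORD_FMT = "<11d"                        # 11 doubles per time step (assumed)
    RECORD_SIZE = struct.calcsize(RECORD_FMT)  # 88 bytes

    def forward_pass(records, path="filter_state.bin"):
        # Write one fixed-size record per time step, k = 0, 1, ..., N.
        with open(path, "wb") as f:
            for rec in records:                # rec is a tuple of 11 floats
                f.write(struct.pack(RECORD_FMT, *rec))

    def backward_pass(path="filter_state.bin"):
        # Yield records from last written to first written, k = N, N-1, ..., 0.
        with open(path, "rb") as f:
            f.seek(0, os.SEEK_END)
            pos = f.tell()
            while pos >= RECORD_SIZE:
                pos -= RECORD_SIZE
                f.seek(pos)
                yield struct.unpack(RECORD_FMT, f.read(RECORD_SIZE))

    Seeking back one record at a time keeps the code simple but issues one small
    read per record; reading a larger block at a time and unpacking it in
    reverse would cut down the number of system calls.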


    --
    Best Regards,
    David Hutto
    CEO: http://www.hitwebdevelopment.com
     
    David Hutto, Oct 23, 2012
    #1

  2. On Tue, 23 Oct 2012 17:50:55 -0400, David Hutto wrote:

    > On Tue, Oct 23, 2012 at 10:31 AM, Virgil Stokes <> wrote:
    >> I am working with some rather large data files (>100GB)

    [...]
    >> Finally, to my question --- What is a fast way to write these variables
    >> to an external file and then read them in backwards?

    >
    > Don't forget to use timeit to get an average timing.


    Given that the data files are larger than 100 gigabytes, the time
    required to process each file is likely to be in hours, not microseconds.
    That being the case, timeit is the wrong tool for the job; it is
    optimized for timing tiny code snippets. You could use it, of course,
    but the added inconvenience doesn't gain you any added accuracy.

    Here's a neat context manager that makes timing long-running code simple:


    http://code.activestate.com/recipes/577896
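
    A minimal timer in the same spirit (just a sketch, not the recipe's code):

    from contextlib import contextmanager
    import time

    @contextmanager
    def timed(label="elapsed"):
        # Print wall-clock time for the enclosed block when it finishes.
        start = time.time()
        try:
            yield
        finally:
            print("%s: %.2f s" % (label, time.time() - start))

    # Usage:
    # with timed("forward pass"):
    #     run_forward_pass()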



    > I'd suggest two list comprehensions for now, until I've reviewed it some
    > more:


    I would be very surprised if the poster will be able to fit 100 gigabytes
    of data into even a single list comprehension, let alone two.

    This is a classic example of why the old external processing algorithms
    of the 1960s and 70s will never be obsolete. No matter how much memory
    you have, there will always be times when you want to process more data
    than you can fit into memory.
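
    For the moving-average part of the original problem, the streaming idea
    looks roughly like this (the function name is just for illustration, and it
    assumes one whitespace-separated t y pair per line with a made-up window
    size):

    from collections import deque

    def moving_average(path, window=100):
        # Only `window` samples are held in memory, regardless of file size.
        buf = deque(maxlen=window)
        with open(path) as f:
            for line in f:
                t, y = map(float, line.split())
                buf.append(y)
                yield t, sum(buf) / len(buf)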



    --
    Steven
     
    Steven D'Aprano, Oct 23, 2012
    #2

  3. > This is a classic example of why the old external processing algorithms
    > of the 1960s and 70s will never be obsolete. No matter how much memory
    > you have, there will always be times when you want to process more data
    > than you can fit into memory.



    But surely nobody will *ever* need more than 640k…

    Right?

    Demian Brecht
    @demianbrecht
    http://demianbrecht.github.com
     
    Demian Brecht, Oct 23, 2012
    #3
  4. David Hutto (Guest)

    On Tue, Oct 23, 2012 at 6:53 PM, Steven D'Aprano
    <> wrote:
    > On Tue, 23 Oct 2012 17:50:55 -0400, David Hutto wrote:
    >
    >> On Tue, Oct 23, 2012 at 10:31 AM, Virgil Stokes <> wrote:
    >>> I am working with some rather large data files (>100GB)

    > [...]
    >>> Finally, to my question --- What is a fast way to write these variables
    >>> to an external file and then read them in backwards?

    >>
    >> Don't forget to use timeit to get an average timing.

    >
    > Given that the data files are larger than 100 gigabytes, the time
    > required to process each file is likely to be in hours, not microseconds.
    > That being the case, timeit is the wrong tool for the job; it is
    > optimized for timing tiny code snippets. You could use it, of course,
    > but the added inconvenience doesn't gain you any added accuracy.


    It depends on the end result. If the iterations each take about the same
    time, you could time just a segment of them and scale the estimate up; a
    full timed run might still be worth it if you have a second computer
    available to run the optimization.

    >
    > Here's a neat context manager that makes timing long-running code simple:
    >
    >
    > http://code.activestate.com/recipes/577896



    I'll test this out for big O notation later. For the OP:

    http://en.wikipedia.org/wiki/Big_O_notation





    >
    >
    >
    >> I'd suggest two list comprehensions for now, until I've reviewed it some
    >> more:

    >
    > I would be very surprised if the poster will be able to fit 100 gigabytes
    > of data into even a single list comprehension, let alone two.

    Again, these can be scaled depending on the operations of the function in
    question and the average running time of the aforementioned function(s).
    >
    > This is a classic example of why the old external processing algorithms
    > of the 1960s and 70s will never be obsolete. No matter how much memory
    > you have, there will always be times when you want to process more data
    > than you can fit into memory.


    This is a common misconception. You can engineer a device that
    accommodates this if it's a direct experimental necessity.


    --
    Best Regards,
    David Hutto
    CEO: http://www.hitwebdevelopment.com
     
    David Hutto, Oct 24, 2012
    #4
  5. On 24-Oct-2012 00:57, Demian Brecht wrote:
    >> This is a classic example of why the old external processing algorithms
    >> of the 1960s and 70s will never be obsolete. No matter how much memory
    >> you have, there will always be times when you want to process more data
    >> than you can fit into memory.

    >
    > But surely nobody will *ever* need more than 640k…
    >
    > Right?
    >
    > Demian Brecht
    > @demianbrecht
    > http://demianbrecht.github.com

    Yes, I can still remember such quotes --- thanks for jogging my memory, Demian :)
     
    Virgil Stokes, Oct 24, 2012
    #5
  6. On 24-Oct-2012 00:53, Steven D'Aprano wrote:
    > On Tue, 23 Oct 2012 17:50:55 -0400, David Hutto wrote:
    >
    >> On Tue, Oct 23, 2012 at 10:31 AM, Virgil Stokes <> wrote:
    >>> I am working with some rather large data files (>100GB)

    > [...]
    >>> Finally, to my question --- What is a fast way to write these variables
    >>> to an external file and then read them in backwards?

    >> Don't forget to use timeit to get an average timing.

    > Given that the data files are larger than 100 gigabytes, the time
    > required to process each file is likely to be in hours, not microseconds.
    > That being the case, timeit is the wrong tool for the job; it is
    > optimized for timing tiny code snippets. You could use it, of course,
    > but the added inconvenience doesn't gain you any added accuracy.
    >
    > Here's a neat context manager that makes timing long-running code simple:
    >
    >
    > http://code.activestate.com/recipes/577896

    Thanks for this link
    >
    >
    >
    >> I'd suggest two list comprehensions for now, until I've reviewed it some
    >> more:

    > I would be very surprised if the poster will be able to fit 100 gigabytes
    > of data into even a single list comprehension, let alone two.

    You are correct and I have been looking at working with blocks that are sized to
    the RAM available for processing.
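
    A rough sketch of that kind of block-wise reading (the block size and the
    function name are placeholders):

    from itertools import islice

    def blocks(path, lines_per_block=1000000):
        # Yield successive blocks of lines; only one block is in memory at a time.
        with open(path) as f:
            while True:
                block = list(islice(f, lines_per_block))
                if not block:
                    break
                yield block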
    >
    > This is a classic example of why the old external processing algorithms
    > of the 1960s and 70s will never be obsolete. No matter how much memory
    > you have, there will always be times when you want to process more data
    > than you can fit into memory.
    >
    >
    >

    Thanks for your insights :)
     
    Virgil Stokes, Oct 24, 2012
    #6
  7. David Hutto (Guest)

    On Wed, Oct 24, 2012 at 3:17 AM, Virgil Stokes <> wrote:
    > On 24-Oct-2012 00:57, Demian Brecht wrote:
    >>>
    >>> This is a classic example of why the old external processing algorithms
    >>> of the 1960s and 70s will never be obsolete. No matter how much memory
    >>> you have, there will always be times when you want to process more data
    >>> than you can fit into memory.

    >>
    >>
    >> But surely nobody will *ever* need more than 640k…
    >>
    >> Right?
    >>
    >> Demian Brecht
    >> @demianbrecht
    >> http://demianbrecht.github.com

    > Yes, I can still remember such quotes --- thanks for jogging my memory,
    > Demian :)



    That is only true of equipment designed by others; otherwise, you could
    engineer the hardware yourself to perform just certain functions for you
    (RISC) and pass the results back to the CISC side (from a PCB design).


    --
    Best Regards,
    David Hutto
    CEO: http://www.hitwebdevelopment.com
     
    David Hutto, Oct 24, 2012
    #7
  8. On 2012-10-23, Steven D'Aprano <> wrote:

    > I would be very surprised if the poster will be able to fit 100
    > gigabytes of data into even a single list comprehension, let alone
    > two.
    >
    > This is a classic example of why the old external processing
    > algorithms of the 1960s and 70s will never be obsolete. No matter how
    > much memory you have, there will always be times when you want to
    > process more data than you can fit into memory.


    Too true. One of the projects I did in grad school about 20 years ago
    was a plugin for some fancy data visualization software (I think it
    was DX: http://www.research.ibm.com/dx/). My plugin would subsample
    "on the fly" a selected section of a huge 2D array of data in a file.
    IBM and SGI had all sorts of widgets you could use to sample,
    transform and visualize data, but they all assumed that the input data
    would fit into virtual memory.

    --
    Grant Edwards               grant.b.edwards at gmail.com
    Yow! I Know A Joke!!
     
    Grant Edwards, Oct 24, 2012
    #8
