Large data array (operations) via disk files

Discussion in 'C Programming' started by geerrxin@gmail.com, Nov 13, 2006.

  1. Guest

    Hi,

    I have a need to manipulate a large matrix, say, A(N,N) (of real),
    larger than 8 GB, which can't fit in physical memory (2 GB). But the
    nature of the computation requires operating on only a portion of the
    data, e.g. 500 MB (0.5 GB), at a time.

    The procedure is as follows:

    1. Generate data and store the data in array A(N,N), N is HUGE.

    2. Do computation on A in loops, e.g.

    for i = 1, N
        for j = 1, N
            compute something using A (a portion)
        end
    end

    How can I implement the procedure to accommodate the large memory
    needs?

    Thanks,
    Zin
     
    , Nov 13, 2006
    #1

  2. Ian Collins Guest

    wrote:
    > I have a need to manipulate a large matrix, say, A(N,N) (of real),
    > larger than 8 GB, which can't fit in physical memory (2 GB). But the
    > nature of the computation requires operating on only a portion of the
    > data, e.g. 500 MB (0.5 GB), at a time.
    > [rest of original post snipped]

    Two solutions:

    If performance is an issue, use a box with enough memory.

    Otherwise use the memory mapped file support provided by your operating
    environment and map the required portion of the matrix into memory. How
    you do this will be OS specific and best asked on an OS group.
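    For example, on a POSIX system this could look roughly like the
    following sketch, which maps one band of rows (about 0.5 GB) at a
    time from a pre-sized, row-major binary file. The file name and the
    sizes are illustrative assumptions, not part of the original post.

    /* Sketch: process an N x N matrix of doubles stored in "matrix.dat",
     * mapping BLOCK_ROWS rows at a time. N*N doubles = 8 GB here;
     * BLOCK_ROWS*N doubles = 512 MB per mapping. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>

    #define N          32768UL   /* matrix is N x N doubles    */
    #define BLOCK_ROWS 2048UL    /* rows per mapping (~0.5 GB) */

    int main(void)
    {
        int fd = open("matrix.dat", O_RDWR);  /* file already holds N*N doubles */
        if (fd == -1) { perror("open"); return EXIT_FAILURE; }

        for (size_t r0 = 0; r0 < N; r0 += BLOCK_ROWS) {
            size_t rows = (r0 + BLOCK_ROWS <= N) ? BLOCK_ROWS : N - r0;
            /* Row offsets are page-aligned here because N * sizeof(double)
             * (256 KB) is a multiple of the page size. */
            off_t offset = (off_t)r0 * N * sizeof(double);

            double *block = mmap(NULL, rows * N * sizeof(double),
                                 PROT_READ | PROT_WRITE, MAP_SHARED,
                                 fd, offset);
            if (block == MAP_FAILED) { perror("mmap"); return EXIT_FAILURE; }

            for (size_t i = 0; i < rows; i++)      /* row within the band  */
                for (size_t j = 0; j < N; j++)
                    block[i * N + j] *= 2.0;       /* "compute something"  */

            munmap(block, rows * N * sizeof(double));  /* changes hit disk */
        }
        close(fd);
        return EXIT_SUCCESS;
    }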

    --
    Ian Collins.
     
    Ian Collins, Nov 13, 2006
    #2

  3. Terence Guest

    wrote:
    > I have a need to manipulate a large matrix, say, A(N,N) (of real),
    > larger than 8 GB, which can't fit in physical memory (2 GB). But the
    > nature of the computation requires operating on only a portion of the
    > data, e.g. 500 MB (0.5 GB), at a time.
    > [rest of original post snipped]


    Two possibilities:

    1) If the data has a lot of missing elements or inferred constants
    (like zero) as elements, you could use sparse matrix processing,
    where you keep a marked link to each next element. Marking an element
    means identifying its value with either the coordinates of the
    previous and next elements, or just the element's own row and column
    number. This needs the value of the cell plus the coordinates of the
    cell (three values per cell).
    Sometimes you can get by with just the column number and process by
    row, so you only need to note when a column index of "1" appears for
    the next row. This problem is often attacked with generalised
    linked-list processing routines.

    These methods use less memory if the needed elements occupy less than
    one third of the theoretical maximum (N*N) (or one half in the linear
    case), but they will only be really useful if the proportion is far
    lower, say one fifth or less. A minimal sketch follows this paragraph.
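    A rough C version of that coordinate-list ("triplet") storage, with
    made-up entries just to show the shape of the data structure:

    /* Sketch: store only the non-zero cells, each as (row, col, value). */
    #include <stdio.h>
    #include <stdlib.h>

    struct cell {
        size_t row, col;   /* coordinates of the cell */
        double value;      /* value of the cell       */
    };

    int main(void)
    {
        size_t capacity = 1024, count = 0;
        struct cell *A = malloc(capacity * sizeof *A);
        if (A == NULL) return EXIT_FAILURE;

        /* Insert a few scattered non-zero entries of a notionally huge matrix. */
        for (size_t i = 0; i < 1000; i++) {
            if (count == capacity) {               /* grow the list on demand */
                capacity *= 2;
                struct cell *tmp = realloc(A, capacity * sizeof *A);
                if (tmp == NULL) { free(A); return EXIT_FAILURE; }
                A = tmp;
            }
            A[count++] = (struct cell){ i * 37, i * 91, (double)i };
        }

        printf("stored %zu non-zero cells\n", count);
        free(A);
        return EXIT_SUCCESS;
    }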

    2) Rework the algorithm you wish to use so that it needs fewer
    elements in memory at one time than the available memory for the
    operation to proceed.

    If that doesn't work, then use virtual memory by treating the disk as
    a random-access file organised by row, with all of a row in each
    "record", and try to use an algorithm that processes on a
    column-within-row basis (a sketch of this follows below).
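    Something like the following sketch, assuming the matrix is stored
    row-major in one big binary file ("matrix.dat" and the sizes are
    illustrative); fseeko() is POSIX and is used here because the offsets
    exceed 2 GB:

    /* Sketch: read one row (one "record") at a time, process it by
     * column, and write it back. One row of 32768 doubles is 256 KB. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>

    #define N 32768UL          /* N*N doubles = 8 GB on disk */

    int main(void)
    {
        FILE *fp = fopen("matrix.dat", "r+b");
        if (fp == NULL) { perror("fopen"); return EXIT_FAILURE; }

        double *row = malloc(N * sizeof *row);     /* buffer for one row */
        if (row == NULL) { fclose(fp); return EXIT_FAILURE; }

        for (size_t i = 0; i < N; i++) {
            off_t pos = (off_t)i * N * sizeof(double);
            if (fseeko(fp, pos, SEEK_SET) != 0) break;
            if (fread(row, sizeof *row, N, fp) != N) break;

            for (size_t j = 0; j < N; j++)         /* column within row   */
                row[j] *= 2.0;                     /* "compute something" */

            fseeko(fp, pos, SEEK_SET);             /* reposition, then    */
            fwrite(row, sizeof *row, N, fp);       /* write the row back  */
        }

        free(row);
        fclose(fp);
        return EXIT_SUCCESS;
    }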
     
    Terence, Nov 13, 2006
    #3
  4. > 2. Do computation on A in loops, e.g.
    >
    > for i = 1, N
    >     for j = 1, N
    >         compute something using A (a portion)
    >     end
    > end
    >
    > How can I implement the procedure to accommodate the large memory
    > needs?


    I suspect you need to look up the keyword "blocking", perhaps together
    with the words "array" or "matrix". That does require breaking up the
    naive sequence of operations of your double loop above, and depending
    on the nature of your "compute something", it might require some
    careful thinking to get the reordered sequence to do the correct
    thing.
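    As an illustration, a small self-contained C sketch of a blocked
    (tiled) traversal; the sizes are deliberately tiny and the "compute
    something" is a placeholder. The point is the loop order: each B x B
    tile is finished before the next one is touched, so an out-of-core
    version only ever needs one tile resident at a time.

    #include <stdio.h>
    #include <stdlib.h>

    #define N 1024UL    /* small demo size; the real N would be huge */
    #define B  128UL    /* tile edge; N is a multiple of B here      */

    int main(void)
    {
        double *A = calloc(N * N, sizeof *A);
        if (A == NULL) return EXIT_FAILURE;

        for (size_t ii = 0; ii < N; ii += B)            /* tile row    */
            for (size_t jj = 0; jj < N; jj += B)        /* tile column */
                for (size_t i = ii; i < ii + B; i++)
                    for (size_t j = jj; j < jj + B; j++)
                        A[i * N + j] += 1.0;            /* "compute something" */

        printf("A[0][0] = %f\n", A[0]);
        free(A);
        return EXIT_SUCCESS;
    }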

    Jan
     
    Jan Vorbrüggen, Nov 14, 2006
    #4
  5. Guest

    >
    > for i = 1, N
    >     for j = 1, N
    >         compute something using A (a portion)
    >     end
    > end

    You need to say more about how exactly you want to "manipulate" your
    matrix --- the simplest manipulations (e.g. scaling) require exactly
    one matrix element at a time.

    If your algorithm really requires more than one matrix block at a
    time, split your matrix into manageable blocks that can be addressed
    separately (swapped in or out). These blocks don't need to reside
    within a single file, of course.
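    A sketch of that "separate file per block" layout, where each B x B
    block lives in its own file and is read in, processed, and written
    back on demand (the block_XXXX_YYYY.bin naming and the sizes are
    made up):

    #include <stdio.h>
    #include <stdlib.h>

    #define B 2048UL    /* block edge: B*B doubles = 32 MB per block file */

    /* Read block (bi, bj) from its own file into buf; 0 on success. */
    static int load_block(size_t bi, size_t bj, double *buf)
    {
        char name[64];
        snprintf(name, sizeof name, "block_%04zu_%04zu.bin", bi, bj);
        FILE *fp = fopen(name, "rb");
        if (fp == NULL) return -1;
        size_t got = fread(buf, sizeof(double), B * B, fp);
        fclose(fp);
        return got == B * B ? 0 : -1;
    }

    /* Write buf back out as block (bi, bj); 0 on success. */
    static int store_block(size_t bi, size_t bj, const double *buf)
    {
        char name[64];
        snprintf(name, sizeof name, "block_%04zu_%04zu.bin", bi, bj);
        FILE *fp = fopen(name, "wb");
        if (fp == NULL) return -1;
        size_t put = fwrite(buf, sizeof(double), B * B, fp);
        fclose(fp);
        return put == B * B ? 0 : -1;
    }

    int main(void)
    {
        double *block = malloc(B * B * sizeof *block);
        if (block == NULL) return EXIT_FAILURE;

        if (load_block(0, 0, block) == 0) {          /* swap one block in   */
            for (size_t k = 0; k < B * B; k++)
                block[k] *= 2.0;                     /* "compute something" */
            store_block(0, 0, block);                /* swap it back out    */
        }
        free(block);
        return EXIT_SUCCESS;
    }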

    Alexei
     
    , Nov 14, 2006
    #5
  6. <> wrote in message
    news:...
    > [original post snipped: an ~8 GB matrix A(N,N), 2 GB of RAM, only
    > ~500 MB needed at a time]


    Use a Windows's 98 or XP's virtual memory/pagefile of 10 GB or so ;)

    Windows will do the rest.

    Bye,
    Skybuck.
     
    Skybuck Flying, Nov 14, 2006
    #6
  7. Skybuck Flying wrote:
    > <> wrote in message
    > news:...
    >
    >>[original post snipped]
    >
    > Use a Windows's 98 or XP's virtual memory/pagefile of 10 GB or so ;)
    >
    > Windows will do the rest.
    >
    > Bye,
    > Skybuck.
    >
    >

    That is a spectacularly useless suggestion if the program is not
    intended to run on a Windows machine... The OP doesn't say either way,
    so you have a 50/50 chance at best.

    Jim
     
    J. F. Cornwall, Nov 14, 2006
    #7
  8. santosh Guest

    J. F. Cornwall wrote:
    > Skybuck Flying wrote:
    > > [original post and pagefile suggestion snipped]
    >
    > That is a spectacularly useless suggestion if the program is not
    > intended to run on a Windows machine... The OP doesn't say either way,
    > so you have a 50/50 chance at best.


    Besides, AFAIK, 32 bit versions of Windows don't support a 10 GB paging
    file. Also only Windows has access to it.
     
    santosh, Nov 14, 2006
    #8
  9. "santosh" <> wrote in message
    news:...
    > J. F. Cornwall wrote:
    >> [earlier quotes snipped]
    >>
    >> That is a spectacularly useless suggestion if the program is not
    >> intended to run on a Windows machine... The OP doesn't say either way,
    >> so you have a 50/50 chance at best.


    Neh, more like a 99.9 percent chance ;)

    Windows dominates >D =D

    >
    > Besides, AFAIK, 32 bit versions of Windows don't support a 10 GB paging
    > file. Also only Windows has access to it.


    Get a new PC.

    I am using Windows XP 64 bit <- it totally sucks but hey it's the future ;)

    Strangely enough I have two pagefiles, each 8 GB.

    One on each harddisk. (I have two harddisks.)

    I think Windows XP moved the pagefile to one of the disks.. or maybe it was
    me lol.

    So I allocated two pagefiles one on each disk just in case ;)

    If I can allocate 8 GB I can probably allocate 10 GB as well, or even more
    ;)

    Surely other operating systems have the same virtual memory feature? <- if
    not, geez, get a decent OS lol.

    Bye,
    Skybuck.
     
    Skybuck Flying, Nov 15, 2006
    #9
  10. Randy Howard Guest

    Skybuck Flying wrote
    (in article <ejdtut$4gk$1.ov.home.nl>):

    > Get a new PC.


    A Mac Pro would be a nice choice.

    > I am using Windows XP 64 bit <- it totally sucks but hey it's the future ;)


    One of the rarest of all Skybuck statements, a correct one.
    There are very few of these in the wild. You are correct, the
    future for Windows users really does suck.


    --
    Randy Howard (2reply remove FOOBAR)
    "The power of accurate observation is called cynicism by those
    who have not got it." - George Bernard Shaw
     
    Randy Howard, Nov 15, 2006
    #10
  11. "Randy Howard" <> wrote in message
    news:...
    > Skybuck Flying wrote
    > (in article <ejdtut$4gk$1.ov.home.nl>):
    >
    >> Get a new PC.

    >
    > A Mac Pro would be a nice choice.
    >
    >> I am using Windows XP 64 bit <- it totally sucks but hey it's the future
    >> ;)

    >
    > One of the rarest of all Skybuck statements, a correct one.
    > There are very few of these in the wild. You are correct, the
    > future for Windows users really does suck.


    It sucks now, it will improve in the future :D

    Such is the way of Microsoft, software and hardware manufacturers, and their
    new, still buggy, Windows drivers and components ;)

    Bye,
    Skybuck.
     
    Skybuck Flying, Nov 15, 2006
    #11
  12. Guest

    Specifically, the problem needs to be addressed on Linux, 64-bit. It
    looks to me that one needs to deal with virtual memory directly.

    Ideally, the solution will look like the following:

    1. Map A to an addressable space.

    2. Generate entries of A(i,j), i = 1..N, j = 1..N.

    3. Compute something in the loops with reference to A(i,j), i = 1..N, j
    = 1..N.

    The fact that only a portion of A is used at a time during the
    computation makes the use of a memory-mapped file a sound solution.
    But one seems to have to keep track of the offset into the file in
    subsequent calls to mmap() when accessing different portions of A,
    which doesn't sound convenient, no?

    Is there any better solution, such that one can do something as simple
    as

    A = vm_create(...)

    in 1 above, without worrying about the memory limit as in a call to
    malloc(), so that the rest of the code can be implemented without
    extra programming effort in memory manipulation?

    Thanks,
    Zin

    Ian Collins wrote:
    > [original post snipped]
    >
    > Two solutions:
    >
    > If performance is an issue, use a box with enough memory.
    >
    > Otherwise use the memory mapped file support provided by your operating
    > environment and map the required portion of the matrix into memory. How
    > you do this will be OS specific and best asked on an OS group.
    >
    > --
    > Ian Collins.
     
    , Nov 15, 2006
    #12
  13. santosh Guest

    wrote:
    > Specifically, the problem needs to be addressed on Linux, 64-bit. It
    > looks to me that one needs to deal with virtual memory directly.


    What is the value of SIZE_MAX for your C implementation?

    > Ideally, the solution will look like the following:
    >
    > 1. Map A to an addressable space.
    >
    > 2. Generate entries of A(i,j), i = 1..N, j = 1..N.
    >
    > 3. Compute something in the loops with reference to A(i,j), i = 1..N,
    > j = 1..N.
    >
    > The fact that only a portion of A is used at a time during the
    > computation makes the use of a memory-mapped file a sound solution.
    > But one seems to have to keep track of the offset into the file in
    > subsequent calls to mmap() when accessing different portions of A,
    > which doesn't sound convenient, no?
    >
    > Is there any better solution, such that one can do something as simple
    > as
    >
    > A = vm_create(...)
    >
    > in 1 above, without worrying about the memory limit as in a call to
    > malloc(), so that the rest of the code can be implemented without
    > extra programming effort in memory manipulation?


    On a properly implemented C compiler for a 64-bit environment,
    SIZE_MAX should be sufficient for your needs. Have you tried using
    plain malloc()?

    Generally, if your working set will fit within physical memory, then
    just malloc() the whole amount and let the OS do the grunt work. Often
    under Linux, memory won't actually be allocated until you write to it.
    Also, having a fast HDD for swap space will improve matters somewhat.
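    A minimal sketch of that (the size is illustrative); on 64-bit Linux
    the 8 GB request will usually succeed because of overcommit, and
    pages are only backed by RAM or swap once they are written:

    #include <stdio.h>
    #include <stdlib.h>

    #define N 32768UL   /* N*N doubles = 8 GB */

    int main(void)
    {
        double *A = malloc(N * N * sizeof *A);
        if (A == NULL) {
            fprintf(stderr, "malloc of %zu bytes failed\n",
                    (size_t)(N * N * sizeof *A));
            return EXIT_FAILURE;
        }
        /* Touch only the working set you need; untouched pages cost
         * (almost) nothing. */
        for (size_t j = 0; j < N; j++)
            A[j] = (double)j;          /* fill the first row as an example */
        free(A);
        return EXIT_SUCCESS;
    }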

    But if you don't mind somewhat more involvement, mmap() would probably
    be a better-suited solution, since with it you can provide the OS with
    more information about your actual memory needs and let it optimise
    accordingly.
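    For instance, the single-mapping approach on 64-bit Linux sidesteps
    the offset bookkeeping mentioned above: create and map the whole file
    once, then index it like an ordinary array and let the kernel page
    the ~500 MB working set in and out. The vm_create() name is borrowed
    from the earlier post purely as an illustration; "matrix.dat" and N
    are made up.

    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>

    #define N 32768UL                   /* N*N doubles = 8 GB */

    /* Create (or open) a file of the given size and map it read/write. */
    static double *vm_create(const char *path, size_t bytes, int *fd_out)
    {
        int fd = open(path, O_RDWR | O_CREAT, 0644);
        if (fd == -1) return NULL;
        if (ftruncate(fd, (off_t)bytes) == -1) { close(fd); return NULL; }
        void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { close(fd); return NULL; }
        *fd_out = fd;
        return p;
    }

    int main(void)
    {
        int fd;
        double *A = vm_create("matrix.dat", N * N * sizeof(double), &fd);
        if (A == NULL) { perror("vm_create"); return EXIT_FAILURE; }

        /* Generate and use entries exactly as if A were an in-memory array. */
        for (size_t i = 0; i < 4; i++)          /* just a tiny portion here */
            for (size_t j = 0; j < N; j++)
                A[i * N + j] = (double)(i + j);

        munmap(A, N * N * sizeof(double));
        close(fd);
        return EXIT_SUCCESS;
    }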
     
    santosh, Nov 15, 2006
    #13
