dealing with huge data

Discussion in 'C Programming' started by pereges, Apr 23, 2008.

  1. pereges

    pereges Guest

    Ok, so I have written a program in C where I am dealing with huge
    data (millions, and lots of iterations involved), and for some reason
    the screen tends to freeze and I get no output every time I execute
    it. However, when I reduce the amount of data, the program runs fine.
    What could possibly be done to resolve this?
     
    pereges, Apr 23, 2008
    #1

  2. pereges

    pereges Guest

    I forgot to mention this happened while I was trying to print data.

    I have seen it can't work for extremely huge data.
     
    pereges, Apr 23, 2008
    #2

  3. pereges wrote:
    >
    > Ok, so I have written a program in C where I am dealing with huge
    > data (millions, and lots of iterations involved), and for some reason
    > the screen tends to freeze and I get no output every time I execute
    > it. However, when I reduce the amount of data, the program runs fine.
    > What could possibly be done to resolve this?


    There's a bug on line 42.

    --
    +-------------------------+--------------------+-----------------------+
    | Kenneth J. Brody | www.hvcomputer.com | #include |
    | kenbrody/at\spamcop.net | www.fptech.com | <std_disclaimer.h> |
    +-------------------------+--------------------+-----------------------+
    Don't e-mail me at: <mailto:>
     
    Kenneth Brody, Apr 23, 2008
    #3
  4. pereges

    santosh Guest

    pereges wrote:

    <program "freezing" on "huge data" and millions of iterations>

    > I forgot to mention this happened while I was trying to print data.


    Print where? To a disk file? To a flash drive? To a screen? Some other
    device? To memory? What's the code for the print function? What are
    the data structures involved? Did you try compiler optimisations? Did
    you try implementation specific I/O routines (which are sometimes
    faster than standard C ones)? Did you profile the program?

    > I have seen it can't work for extremely huge data.


    Can't work or works too slowly for your taste?

    Unless you show us your current code and where exactly its performance
    is not meeting your expectations, there's absolutely nothing that can
    be said other than the generic advice to buy faster storage devices and
    faster, more powerful hardware.
     
    santosh, Apr 23, 2008
    #4
  5. pereges

    user923005 Guest

    On Apr 23, 5:00 pm, Richard Heathfield <> wrote:
    > CBFalconer said:
    >
    > > pereges wrote:

    >
    > >> Ok, so I have written a program in C where I am dealing with huge
    > >> data (millions, and lots of iterations involved), and for some
    > >> reason the screen tends to freeze and I get no output every time
    > >> I execute it. However, when I reduce the amount of data, the
    > >> program runs fine.

    >
    > >> What could possibly be done to resolve this?

    >
    > > On the information supplied, I suspect that simply reducing the
    > > amount of data will fix the problem.  I am unable to estimate how
    > > much it should be reduced.

    >
    > In a similar vein, it was reported a few years ago that a computer program,
    > on being told that 90% of accidents in the home involved either the top
    > stair or the bottom stair and being asked what to do to reduce accidents,
    > suggested removing the top and bottom stairs.
    >
    > C programs regularly have to deal with very large amounts of data, and many
    > of them do so with admirable efficiency. The large amount of data, then,
    > is *not* the cause of the problem. Rather, it is when large amounts of
    > data are being processed that the problem manifests itself. Therefore,
    > reducing the amount of data will not only *not* fix the problem, but will
    > actually hide it, making it *harder* to fix.
    >
    > The proper solution is to find and fix the bug that is causing the problem.
    > The way to do /that/ is to reduce, not the amount of *data*, but the
    > amount of *code* - until the OP has the smallest compilable program that
    > reproduces the problem. It is often the case that, in preparing such a
    > program, the author of the code will find the problem. But if not, at
    > least he or she now has a minimal program that can be presented for
    > analysis by C experts, such as those who regularly haunt the corridors of
    > comp.lang.c. I commend this strategy to the OP.


    I don't think we can give good advice until the OP actually states
    what his exact problem is.
    This:
    > >> Ok, so I have written a program in C where I am dealing with huge
    > >> data (millions, and lots of iterations involved), and for some
    > >> reason the screen tends to freeze and I get no output every time
    > >> I execute it. However, when I reduce the amount of data, the
    > >> program runs fine.


    Does not really tell us anything.

    Millions of records? In what format? What operations are performed
    against the data? What is the actual underlying problem that is being
    solved?

    Probably there is a good, inexpensive, and compact solution, and likely
    there are prebuilt tools that will already accomplish the job (or get
    most of the way there).

    "Big data" that "seems to freeze" doesn't mean anything.
     
    user923005, Apr 24, 2008
    #5
  6. pereges

    pereges Guest

    On Apr 23, 10:25 pm, santosh <> wrote:
    > pereges wrote:
    >
    > <program "freezing" on "huge data" and millions of iterations>
    >
    > > I forgot to mention this happened while I was trying to print data.

    >
    > Print where? To a disk file? To a flash drive? To a screen? Some other
    > device? To memory? What's the code for the print function? What are
    > the data structures involved? Did you try compiler optimisations? Did
    > you try implementation specific I/O routines (which are sometimes
    > faster than standard C ones)? Did you profile the program?
    >
    > > I have seen it can't work for extremely huge data.

    >
    > Can't work or works too slowly for your taste?
    >
    > Unless you show us your current code and where exactly its performance
    > is not meeting your expectations, there's absolutely nothing that can
    > be said other than the generic advice to buy faster storage devices and
    > faster, more powerful hardware.



    There are ~500 lines in the code. If you don't mind reading it, I will
    definitely post it.
    I didn't post it for a reason.
     
    pereges, Apr 24, 2008
    #6
  7. pereges

    Bartc Guest

    "pereges" <> wrote in message
    news:...
    > Ok, so I have written a program in C where I am dealing with huge
    > data (millions, and lots of iterations involved), and for some reason
    > the screen tends to freeze and I get no output every time I execute
    > it. However, when I reduce the amount of data, the program runs fine.
    > What could possibly be done to resolve this?


    Do you expect the execution time to increase in proportion to the amount of
    data?

    What are the timings for N=10 (where N is some measure of the amount of
    data)? N=100, 1000, 10K, 1M, etc.? What do you mean by huge, anyway? How
    much data are we talking about?

    At what level of N does it stop working? What did you expect the execution
    time to be? Does the machine make noises like lots of disk activity
    (assuming you are not dealing with disk i/o anyway)? Sometimes when you
    exceed machine memory everything gets a lot slower.

    Can you measure what resources are being used at each point, like memory?

    Your code is only 500 lines. Can you put print statements in to show what's
    happening? Not for every iteration, but maybe only when N>X, some limit
    above which you know it fails. Or after 100ms have passed since the last
    output, etc.

    (You mentioned you are printing to the screen anyway; so maybe you can tell
    from the output, what point in the execution it has reached and can put in
    extra debug output.)

    It sounds like above a certain level of data, some limit or resource is
    being exceeded, causing it to hang, or perhaps it is entering an endless
    loop (those are a little different, I think).

    --
    Bartc
     
    Bartc, Apr 24, 2008
    #7
  8. On 24 Apr, 06:11, (Gordon Burditt) wrote:

    > >Ok, so I have written a program in C where I am dealing with huge
    > >data (millions, and lots of iterations involved), and for some reason
    > >the screen tends to freeze and I get no output every time I execute
    > >it. However, when I reduce the amount of data, the program runs fine.
    > >What could possibly be done to resolve this?


    > Are you SURE that the screen freezes, and it's not just taking
    > a long time?  (When in doubt, let it run over a weekend.)


    sounds like it's just very slow


    > You don't give a very good idea of what your program is doing, but
    > some hints that might apply:
    >
    > Your program almost certainly has at least one bug.


    just on the principle that all programs have at least one bug?


    > Make sure that every call to malloc() is checked, and that you
    > report any calls that run out of memory.  Also check if the behavior
    > changes if you change limits on the amount of memory the process
    > can allocate (e.g. 'ulimit').
    >
    > Use any tools (like 'ps') you might have to see how large the program
    > is and whether it's swapping so much little CPU gets used but much
    > swapping is done.
    >
    > If it's a multi-process program, you might be deadlocking on
    > allocation of swap/page space.
    >
    > Make sure that you do not use more memory than you allocated (often
    > called "buffer overflow", although this problem is a bit more general
    > than a buffer overflow).  This can be difficult to find.  If you
    > corrupt the data malloc() uses to keep track of free memory,
    > subsequent calls to malloc() or free() might infinite loop.
    >
    > Add some output statements to the program so you can see how far
    > it gets.  Include something at the start of the program, and, say,
    > after you have read all the input but before you begin processing it.


    maybe even consider a profiler


    --
    Nick Keighley

    I'd rather write programs to write programs than write programs
     
    Nick Keighley, Apr 24, 2008
    #8
  9. pereges

    arnuld Guest

    > On Thu, 24 Apr 2008 00:00:04 +0000, Richard Heathfield wrote:


    > In a similar vein, it was reported a few years ago that a computer
    > program, on being told that 90% of accidents in the home involved either
    > the top stair or the bottom stair and being asked what to do to reduce
    > accidents, suggested removing the top and bottom stairs.
    >
    > C programs regularly have to deal with very large amounts of data, and
    > many of them do so with admirable efficiency. The large amount of data,
    > then, is *not* the cause of the problem. Rather, it is when large
    > amounts of data are being processed that the problem manifests itself.
    > Therefore, reducing the amount of data will not only *not* fix the
    > problem, but will actually hide it, making it *harder* to fix.
    >
    > The proper solution is to find and fix the bug that is causing the
    > problem. The way to do /that/ is to reduce, not the amount of *data*,
    > but the amount of *code* - until the OP has the smallest compilable
    > program that reproduces the problem. It is often the case that, in
    > preparing such a program, the author of the code will find the problem.
    > But if not, at least he or she now has a minimal program that can be
    > presented for analysis by C experts, such as those who regularly haunt
    > the corridors of comp.lang.c. I commend this strategy to the OP.



    OMG, I am sure this is one of the best pieces of advice on software
    construction.


    --
    http://lispmachine.wordpress.com/
    my email ID is at the above address
     
    arnuld, Apr 24, 2008
    #9
  10. pereges

    arnuld Guest

    > On Wed, 23 Apr 2008 20:16:25 -0700, pereges wrote:


    > There are ~500 lines in the code. If you don't mind reading it, I will
    > definitely post it.


    > I didn't post it for a reason.



    I know that. As Richard Heathfield said, find and post the smallest
    compilable unit.





    --
    http://lispmachine.wordpress.com/
    my email ID is at the above address
     
    arnuld, Apr 24, 2008
    #10
  11. pereges

    pereges Guest

    Freeing (using free()) the memory allocated (using malloc()) has
    certainly improved the performance of my program, and it now gives
    output for even larger data. But there are still issues. I will post a
    minimal version of my code later.
     
    pereges, Apr 24, 2008
    #11
  12. pereges

    santosh Guest

    pereges wrote:

    > Freeing (using free()) the memory allocated (using malloc()) has
    > certainly improved the performance of my program, and it now gives
    > output for even larger data. But there are still issues. I will post
    > a minimal version of my code later.


    This suggests that the slowdown was due to insufficient free memory and
    the consequent "thrashing" that most OSes suffer under such conditions.
    It may be that you could improve overall efficiency by using mmap
    instead of malloc for your data file. Note that mmap is not part of
    standard C (though it's functionally implemented under most of the
    major mainstream OSes). For help with it please ask in a system
    specific group like comp.unix.programmer.
     
    santosh, Apr 24, 2008
    #12
  13. pereges

    user923005 Guest

    On Apr 24, 7:30 am, santosh <> wrote:
    > pereges wrote:
    > > freeing (using free) the memory allocated(using malloc()) has
    > > certainly improved the performance of my program and now gives output
    > > for even larger data. but still there are issues. i will post a
    > > minimal version of my code later.

    >
    > This suggests that the slowdown was due to insufficient free memory and
    > the consequent "thrashing" that most OSes suffer under such conditions.
    > It may be that you could improve overall efficiency by using mmap
    > instead of malloc for your data file. Note that mmap is not part of
    > standard C (though it's functionally implemented under most of the
    > major mainstream OSes). For help with it please ask in a system
    > specific group like comp.unix.programmer.


    I think it is a mistake to offer advice before clearly understanding
    the problem.

    There may be a triply nested loop that makes the problem O(N^3) in
    which case it is scale of calculation that is the problem and almost
    certainly the solution will be to modify the algorithm.

    Besides, mmap() will not make any real difference if the file is
    already completely loaded into memory. It will only be a convenience
    if we need to page portions of it. If we are just reading a file
    serially, the operating system buffers (assuming buffered I/O) will
    have the same effect as paging through a memory map with less fuss.
    If random access is needed in blocky chunks, then mmap() is ideal, but
    we don't know that yet.

    IMO-YMMV.
     
    user923005, Apr 24, 2008
    #13
