dealing with huge data

pereges

OK, so I have written a program in C where I am dealing with huge
data (millions, and lots of iterations involved), and for some reason the
screen tends to freeze and I get no output every time I execute it.
However, when I reduce the amount of data, the program runs fine.
What could possibly be done to resolve this?
 
pereges

I forgot to mention that this happened while I was trying to print data.

I have seen that it can't work for extremely huge data.
 
Kenneth Brody

pereges said:
OK, so I have written a program in C where I am dealing with huge
data (millions, and lots of iterations involved), and for some reason the
screen tends to freeze and I get no output every time I execute it.
However, when I reduce the amount of data, the program runs fine.
What could possibly be done to resolve this?

There's a bug on line 42.

--
+-------------------------+--------------------+-----------------------+
| Kenneth J. Brody        | www.hvcomputer.com | #include              |
| kenbrody/at\spamcop.net | www.fptech.com     | <std_disclaimer.h>    |
+-------------------------+--------------------+-----------------------+
Don't e-mail me at: <mailto:[email protected]>
 
santosh

pereges wrote:

I forgot to mention that this happened while I was trying to print data.

Print where? To a disk file? To a flash drive? To a screen? Some other
device? To memory? What's the code for the print function? What are
the data structures involved? Did you try compiler optimisations? Did
you try implementation specific I/O routines (which are sometimes
faster than standard C ones)? Did you profile the program?
I have seen that it can't work for extremely huge data.

Can't work or works too slowly for your taste?

Unless you show us your current code and where exactly its performance
is not meeting your expectations, there's absolutely nothing that can
be said other than the generic advice to buy faster storage devices and
faster, more powerful hardware.
 
user923005

CBFalconer said:



In a similar vein, it was reported a few years ago that a computer program,
on being told that 90% of accidents in the home involved either the top
stair or the bottom stair and being asked what to do to reduce accidents,
suggested removing the top and bottom stairs.

C programs regularly have to deal with very large amounts of data, and many
of them do so with admirable efficiency. The large amount of data, then,
is *not* the cause of the problem. Rather, it is when large amounts of
data are being processed that the problem manifests itself. Therefore,
reducing the amount of data will not only *not* fix the problem, but will
actually hide it, making it *harder* to fix.

The proper solution is to find and fix the bug that is causing the problem.
The way to do /that/ is to reduce, not the amount of *data*, but the
amount of *code* - until the OP has the smallest compilable program that
reproduces the problem. It is often the case that, in preparing such a
program, the author of the code will find the problem. But if not, at
least he or she now has a minimal program that can be presented for
analysis by C experts, such as those who regularly haunt the corridors of
comp.lang.c. I commend this strategy to the OP.

I don't think we can give good advice until the OP actually states
what his exact problem is.
This (the original post) does not really tell us anything.

Millions of records? In what format? What operations are performed
against the data? What is the actual underlying problem that is being
solved?

Probably, there is a good, inexpensive and compact solution and likely
there are prebuilt tools that will already accomplish the job (or get
most of the way there).

"Big data" that "seems to freeze" doesn't mean anything.
 
pereges

pereges wrote:

I forgot to mention that this happened while I was trying to print data.

Print where? To a disk file? To a flash drive? To a screen? Some other
device? To memory? What's the code for the print function? What are
the data structures involved? Did you try compiler optimisations? Did
you try implementation specific I/O routines (which are sometimes
faster than standard C ones)? Did you profile the program?
I have seen that it can't work for extremely huge data.

Can't work or works too slowly for your taste?

Unless you show us your current code and where exactly its performance
is not meeting your expectations, there's absolutely nothing that can
be said other than the generic advice to buy faster storage devices and
faster, more powerful hardware.



There are ~500 lines in the code. If you don't mind reading it, I will
definitely post it.
I didn't post it for a reason.
 
Bartc

pereges said:
OK, so I have written a program in C where I am dealing with huge
data (millions, and lots of iterations involved), and for some reason the
screen tends to freeze and I get no output every time I execute it.
However, when I reduce the amount of data, the program runs fine.
What could possibly be done to resolve this?

Do you expect the execution time to increase in proportion to the amount of
data?

What are the timings for N=10 (where N is some measure of the amount of
data)? N=100, 1000, 10K, 1M, etc.? What do you mean by huge anyway; how
much data are we talking about?

At what level of N does it stop working? What did you expect the execution
time to be? Does the machine make noises like lots of disk activity
(assuming you are not dealing with disk i/o anyway)? Sometimes when you
exceed machine memory everything gets a lot slower.

Can you measure what resources are being used at each point, like memory?

Your code is only 500 lines. Can you put print statements in to show what's
happening? Not for every iteration, but maybe only when N>X, some limit
above which you know it fails. Or after 100ms have passed since the last
output, etc.
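
For example (again only a sketch; long_running_loop() stands in for your main
loop), something like this prints progress at most roughly every 100 ms of CPU
time instead of on every iteration:

#include <stdio.h>
#include <time.h>

void long_running_loop(long n)
{
    clock_t last = clock();
    long i;

    for (i = 0; i < n; i++) {
        /* ... the real work on element i goes here ... */

        if (clock() - last >= CLOCKS_PER_SEC / 10) {  /* roughly 100 ms */
            fprintf(stderr, "iteration %ld of %ld\n", i, n);
            last = clock();
        }
    }
}

That way you can see how far it gets without the printing itself dominating
the run.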

(You mentioned you are printing to the screen anyway; so maybe you can tell
from the output, what point in the execution it has reached and can put in
extra debug output.)

It sounds like above a certain level of data, some limit or resource is
being exceeded, causing it to hang, or perhaps entering an endless loop
(those are a little different, I think).
 
Nick Keighley

Are you SURE that the screen freezes, and it's not just taking
a long time?  (When in doubt, let it run over a weekend.)

sounds like it's just very slow

You don't give a very good idea of what your program is doing, but
some hints that might apply:

Your program almost certainly has at least one bug.

just on the principle that all programs have at least one bug?

Make sure that every call to malloc() is checked, and that you
report any calls that run out of memory.  Also check if the behavior
changes if you change limits on the amount of memory the process
can allocate (e.g. 'ulimit').
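
The usual way to do that is a small checked wrapper (a sketch only; xmalloc is
just a conventional name, nothing from the posted program):

#include <stdio.h>
#include <stdlib.h>

/* like malloc(), but reports failure and exits instead of returning NULL */
void *xmalloc(size_t size)
{
    void *p = malloc(size);
    if (p == NULL) {
        fprintf(stderr, "out of memory: requested %lu bytes\n",
                (unsigned long)size);
        exit(EXIT_FAILURE);
    }
    return p;
}

/* usage:  double *v = xmalloc(n * sizeof *v); */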

Use any tools (like 'ps') you might have to see how large the program
is and whether it's swapping so heavily that little CPU gets used and
most of the time goes to swapping.
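
If you would rather ask from inside the program, getrusage() can report the
peak resident size. Note this is POSIX, not standard C, and the units of
ru_maxrss vary between systems (kilobytes on Linux):

#include <stdio.h>
#include <sys/resource.h>

/* print the process's peak resident set size so far (POSIX, not standard C) */
void report_memory(const char *label)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) == 0)
        fprintf(stderr, "%s: max RSS %ld (kB on Linux)\n",
                label, (long)ru.ru_maxrss);
}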

If it's a multi-process program, you might be deadlocking on
allocation of swap/page space.

Make sure that you do not use more memory than you allocated (often
called "buffer overflow", although this problem is a bit more general
than a buffer overflow).  This can be difficult to find.  If you
corrupt the data malloc() uses to keep track of free memory,
subsequent calls to malloc() or free() might go into an infinite loop.
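
A classic instance of the kind of overrun meant here (purely illustrative, not
taken from the posted code):

#include <stdlib.h>

void broken(size_t n)
{
    double *v = malloc(n * sizeof *v);
    size_t i;

    if (v == NULL)
        return;

    for (i = 0; i <= n; i++)   /* BUG: writes one past the end; should be i < n */
        v[i] = 0.0;

    free(v);   /* after the overrun, free() or a later malloc() may crash or hang */
}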

Add some output statements to the program so you can see how far
it gets.  Include something at the start of the program, and, say,
after you have read all the input but before you begin processing it.

maybe even consider a profiler
 
arnuld

In a similar vein, it was reported a few years ago that a computer
program, on being told that 90% of accidents in the home involved either
the top stair or the bottom stair and being asked what to do to reduce
accidents, suggested removing the top and bottom stairs.

C programs regularly have to deal with very large amounts of data, and
many of them do so with admirable efficiency. The large amount of data,
then, is *not* the cause of the problem. Rather, it is when large
amounts of data are being processed that the problem manifests itself.
Therefore, reducing the amount of data will not only *not* fix the
problem, but will actually hide it, making it *harder* to fix.

The proper solution is to find and fix the bug that is causing the
problem. The way to do /that/ is to reduce, not the amount of *data*,
but the amount of *code* - until the OP has the smallest compilable
program that reproduces the problem. It is often the case that, in
preparing such a program, the author of the code will find the problem.
But if not, at least he or she now has a minimal program that can be
presented for analysis by C experts, such as those who regularly haunt
the corridors of comp.lang.c. I commend this strategy to the OP.


OMG, I am sure this is one of the best pieces of advice on
software construction.
 
pereges

Freeing (using free()) the memory allocated (using malloc()) has
certainly improved the performance of my program, and it now gives output
for even larger data. But there are still issues. I will post a
minimal version of my code later.
 
santosh

pereges said:
Freeing (using free()) the memory allocated (using malloc()) has
certainly improved the performance of my program, and it now gives output
for even larger data. But there are still issues. I will post a
minimal version of my code later.

This suggests that the slowdown was due to insufficient free memory and
the consequent "thrashing" that most OSes suffer under such conditions.
It may be that you could improve overall efficiency by using mmap
instead of malloc for your data file. Note that mmap is not part of
standard C (though it's functionally implemented under most of the
major mainstream OSes). For help with it please ask in a system
specific group like comp.unix.programmer.
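
A bare-bones sketch of that approach (POSIX, not standard C; error handling
kept minimal, and again comp.unix.programmer is the place for the details):

#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

/* map a whole file read-only and walk it in place instead of malloc()+read() */
int process_file(const char *path)
{
    struct stat st;
    char *data;
    int fd = open(path, O_RDONLY);

    if (fd == -1)
        return -1;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }

    data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* ... scan the st.st_size bytes starting at data ... */

    munmap(data, st.st_size);
    close(fd);
    return 0;
}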
 
user923005

This suggests that the slowdown was due to insufficient free memory and
the consequent "thrashing" that most OSes suffer under such conditions.
It may be that you could improve overall efficiency by using mmap
instead of malloc for your data file. Note that mmap is not part of
standard C (though it's functionally implemented under most of the
major mainstream OSes). For help with it please ask in a system
specific group like comp.unix.programmer.

I think it is a mistake to offer advice before clearly understanding
the problem.

There may be a triply nested loop that makes the problem O(N^3), in
which case it is the scale of calculation that is the problem and almost
certainly the solution will be to modify the algorithm.
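
Purely for illustration (nothing to do with the OP's actual code), a loop nest
like this is O(N^3): going from N = 1,000 to N = 10,000 multiplies the work by
a thousand, easily enough to turn "runs fine" into "looks frozen" with no bug
at all:

/* counts triples whose sum is below a limit -- O(N^3) by construction */
long count_small_triples(const double *a, long n, double limit)
{
    long i, j, k, count = 0;

    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++)
            for (k = j + 1; k < n; k++)
                if (a[i] + a[j] + a[k] < limit)
                    count++;
    return count;
}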

Besides, mmap() will not make any real difference if the file is
already completely loaded into memory. It will only be a convenience
if we need to page portions of it. If we are just reading a file
serially, the operating system buffers (assuming buffered I/O) will
have the same effect as paging through a memory map with less fuss.
If random access is needed in blocky chunks, then mmap() is ideal, but
we don't know that yet.

IMO-YMMV.
 
