100k X 100k data processing summary

a

From previous replies, it seems that the following method somehow solves the
problem up to 1000 * 1000 2D data, but when I try 10k * 10k, the
segmentation fault appears again.

Richard Tobin told me there is a system limit that can be changed, but I
don't know which file needs to be changed.

I have modified it again and again and hope to find a solution that can
handle 100k * 100k data.

float** array_to_matrix(float* m, int rows, int cols) {
    int i, j;
    float** r;

    /* one pointer per row */
    r = (float**)calloc(rows, sizeof(float*));

    for (i = 0; i < rows; i++)
    {
        /* one row of floats, copied from the flat input array */
        r[i] = (float*)calloc(cols, sizeof(float));

        for (j = 0; j < cols; j++)
            r[i][j] = m[i*cols+j];
    }
    return r;
}
 
Ian Collins

a said:
From previous replies, it seems that the following method somehow solves the
problem up to 1000 * 1000 2D data, but when I try 10k * 10k, the
segmentation fault appears again.
System memory is finite; if you attempt to allocate more than is
available, you will fail.
Richard Tobin told me there is a system limit that can be changed, but I
don't know which file needs to be changed.

I have modified it again and again and hope to find a solution that can
handle 100k * 100k data.
Which is 10G * sizeof(float) bytes; do you have that much (virtual)
memory to play with?

It sounds like you have more of an algorithm problem than a C one.
float** array_to_matrix(float* m, int rows, int cols) {
    int i, j;
    float** r;

    r = (float**)calloc(rows, sizeof(float*));
Drop the cast.
 
Ian Collins

Ian said:
System memory is finite; if you attempt to allocate more than is
available, you will fail.

Which is 10G * sizeof(float) bytes; do you have that much (virtual)
memory to play with?

It sounds like you have more of an algorithm problem than a C one.

Drop the cast.
And always check the return of [mc]alloc isn't null.
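
A minimal sketch of what that advice amounts to, with the casts dropped
and every calloc return checked; the cleanup-on-failure policy here is
one possible choice, not something from the original post (assumes
<stdlib.h> is included):

float** array_to_matrix(float* m, int rows, int cols) {
    int i, j;
    float** r;

    r = calloc(rows, sizeof *r);           /* no cast needed in C */
    if (r == NULL)
        return NULL;

    for (i = 0; i < rows; i++) {
        r[i] = calloc(cols, sizeof *r[i]);
        if (r[i] == NULL) {                /* free the rows already made */
            while (i-- > 0)
                free(r[i]);
            free(r);
            return NULL;
        }
        for (j = 0; j < cols; j++)
            r[i][j] = m[i*cols + j];
    }
    return r;
}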
 
Richard Tobin

Richard Tobin told me there is a system limit that can be changed, but I
don't know which file needs to be changed.

As I said, ask your system administrator. Or read the manual.
We can't tell you, because you haven't even told us what system
you're using.
I have modified it again and again and hope to find a solution that can
handle 100k * 100k data.

100k * 100k is 10G, and if floats are 4 bytes that's 40 gigabytes.
You really will need a supercomputer for that. Perhaps you should
reconsider your algorithm, or wait several years.

-- Richard
 
Ian Collins

100k * 100k is 10G, and if floats are 4 bytes that's 40 gigabytes.
You really will need a supercomputer for that. Perhaps you should
reconsider your algorithm, or wait several years.
<OT> Workstation boards with support for 64GB of RAM are available from
several vendors! All it takes is a spare $10K for the RAM... </OT>
 
Johannes Bauer

Richard said:
100k * 100k is 10G, and if floats are 4 bytes that's 40 gigabytes.
You really will need a supercomputer for that. Perhaps you should
reconsider your algorithm, or wait several years.

It might work if enough virtual memory is available, with serious
thrashing implied. Suitable for *some* problems, however, if the access
patterns to this matrix are not arbitrary (which they aren't for most
algorithms).

Usually the data in such huge matrices is sparse anyway, so I fully
second your statement that the OP should reconsider his algorithm. If it
fails at the early stage of memory allocation he/she probably hasn't
thought about it at all.

Greetings,
Johannes
 
Gordon Burditt

100k * 100k is 10G, and if floats are 4 bytes that's 40 gigabytes.
It might work if enough virtual memory is available, with serious
thrashing implied.

On a 32-bit machine (say, Pentium with PAE36) you could have 64G
of physical memory, and a lot more physical swap/page space, but
with a 32-bit address space for an individual process, you're limited
to 4G (and sometimes much less). So you need a machine with 64-bit
addressing and an OS that supports it for individual processes.
Simply adding lots of memory and swap/page space isn't enough.
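
One way to see the addressing problem before any allocation is even
attempted is to check whether the byte count is representable in size_t
at all; a small sketch (the helper name is made up for illustration,
and "representable" does not mean "allocatable"):

#include <stdint.h>
#include <stdio.h>

/* 1 if rows*cols floats can be expressed as a size_t byte count */
static int byte_count_fits(size_t rows, size_t cols) {
    if (cols != 0 && rows > SIZE_MAX / cols)
        return 0;                      /* rows*cols overflows size_t */
    if (rows * cols > SIZE_MAX / sizeof(float))
        return 0;                      /* byte count overflows size_t */
    return 1;
}

int main(void) {
    /* with a 32-bit size_t, 100k*100k*4 bytes cannot even be expressed */
    printf("100k x 100k floats: %s\n",
           byte_count_fits(100000, 100000) ? "representable"
                                           : "not representable");
    return 0;
}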
 
a

Thanks Richard. It's Red Hat Enterprise, or Fedora related. The biggest
problem I'm having is that I don't know the right keywords; searches like
"compilation memory" and "program memory" don't give me good results in a
Google search.
 
a

Modifying /etc/security/limits.conf doesn't seem to help solve the problem:

* soft data 100000
* soft stack 100000
 
a

a said:
Modifying /etc/security/limits.conf doesn't seem to help solve the
problem:

* soft data 100000
* soft stack 100000

Furthermore, running as root should be unlimited....
 
Ian Collins

a said:
Furthermore, running as root should be unlimited....
That is seldom a good idea.

As everyone who has responded points out, you should reconsider your
algorithm if you are attempting to allocate more memory than your system
can provide.
 
Johannes Bauer

Gordon said:
So you need a machine with 64-bit
addressing and an OS that supports it for individual processes.
Simply adding lots of memory and swap/page space isn't enough.

Absolutely a prerequisite.

joe joe [~]: uname -a
Linux joeserver 2.6.22.2 #2 PREEMPT Thu Sep 27 14:06:16 CEST 2007 x86_64
AMD Athlon(tm) 64 Processor 3700+ AuthenticAMD GNU/Linux

Took that one for granted :)

Greetings,
Johannes
 
Johannes Bauer

a said:
Furthermore, running as root should be unlimited....

"My car ran out of gas so it won't drive. I've already tried to turn up
the volume of the radio, but it didn't do the trick. Has anybody a clue?"

Did you actually understand what the real problem is? Of *course*
fiddling with the system limits won't work, because with the insane
amount of memory your application requires you hit a *hard* limit.
You're modifying the *soft* limits, however.
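
A quick way to see the soft and hard limits the process actually runs
with, instead of guessing from config files, is POSIX getrlimit; a
minimal sketch assuming a Linux-style system (RLIM_INFINITY will print
as a very large number):

#include <stdio.h>
#include <sys/resource.h>

int main(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_DATA, &rl) == 0)   /* data segment limit */
        printf("data: soft=%lu hard=%lu\n",
               (unsigned long)rl.rlim_cur, (unsigned long)rl.rlim_max);
    if (getrlimit(RLIMIT_AS, &rl) == 0)     /* whole address space limit */
        printf("as:   soft=%lu hard=%lu\n",
               (unsigned long)rl.rlim_cur, (unsigned long)rl.rlim_max);
    return 0;
}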

Greetings,
Johannes
 
a

The problem is I don't know where the algorithm goes wrong. What's more
unfortunate is that I have to put other algorithms into my program, and
I've changed a lot, from 2D array to 1D array to pointer. There is no
technical support, no sys admin.

My strategy right now is to work with 1k * 1k first. Yet I find that when
there is only one such 1D array, it's fine. When more are declared, the
seg. fault appears again. I just don't know why allocating 1MB of memory
to an array will create such a problem, when we are talking about >500MB
of memory today.

Anyway, thanks for your advice.
 
cr88192

a said:
The problem is I don't know where the algorithm goes wrong. What's more
unfortunate is that I have to put other algorithms into my program, and
I've changed a lot, from 2D array to 1D array to pointer. There is no
technical support, no sys admin.

My strategy right now is to work with 1k * 1k first. Yet I find that when
there is only one such 1D array, it's fine. When more are declared, the
seg. fault appears again. I just don't know why allocating 1MB of memory
to an array will create such a problem, when we are talking about >500MB
of memory today.

Anyway, thanks for your advice.

1000*1000 is about 4MB of floats.
10000*10000 is about 400MB of floats.

consider:
10000*10000 = 100,000,000 elements, *4 bytes = 400,000,000 bytes (approx 400MB)
100000*100000 = 10,000,000,000 elements, *4 bytes = 40,000,000,000 bytes (approx 40GB)

now, you can only allocate a few such arrays before you are out of address
space. on a 32 bit arch, the address space is 4GB (of which 2 or 3GB is
usually available to the app).


now, here are a few possible solutions:
go over to a 64 bit linux (an x86-64 install, not x86), in which case you
have a far larger address space.

or, better yet, don't allocate such huge things in memory.
what do you need that is so huge anyways?...
are you sure it is not something better done with a much more compact
representation, such as a sparse array, or a key/value mapping system?...

failing that, have you considered using files?...

even then, one is still far better off finding a more compact representation
if possible...

for example, for many uses it is very effective to RLE-compress the arrays
and design the algos to work on the compressed forms (slightly more
complicated, but not impossible, if the data is mutable...).
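
As a concrete illustration of the file-backed route: on a POSIX system
the matrix can live in a file and be mapped into the address space, so
the OS pages it in and out on demand. A sketch only, assuming POSIX mmap
and a caller who has already checked that the byte count doesn't
overflow; it still needs a 64-bit address space for anything near 40GB:

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* map a rows*cols float matrix onto a file; element (i,j) is p[i*cols+j] */
float *map_matrix(const char *path, size_t rows, size_t cols) {
    size_t bytes = rows * cols * sizeof(float);
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd == -1)
        return NULL;
    if (ftruncate(fd, (off_t)bytes) == -1) {   /* size the backing file */
        close(fd);
        return NULL;
    }
    float *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                                 /* the mapping survives the close */
    return p == MAP_FAILED ? NULL : p;
}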
 
CBFalconer

a wrote: *** and top-posted. Fixed ***
The problem is I don't know where the algorithm goes wrong. What's more
unfortunate is that I have to put other algorithms into my program, and
I've changed a lot, from 2D array to 1D array to pointer. There is no
technical support, no sys admin.

There is NO algorithm involved. You are asking for a memory block
of (1e5 * 1e5 * sizeof item) bytes. It doesn't exist. If it did
exist your OS couldn't arrange to point within it.

Please do not top-post. Your answer belongs after (or intermixed
with) the quoted material to which you reply, after snipping all
irrelevant material. I fixed this one. See the following links:

--
<http://www.catb.org/~esr/faqs/smart-questions.html>
<http://www.caliburn.nl/topposting.html>
<http://www.netmeister.org/news/learn2quote.html>
<http://cfaj.freeshell.org/google/> (taming google)
<http://members.fortunecity.com/nnqweb/> (newusers)
 
Barry Schwarz

From previous replies, it seems that the following method somehow solves the
problem up to 1000 * 1000 2D data, but when I try 10k * 10k, the
segmentation fault appears again.

Richard Tobin told me there is a system limit that can be changed, but I
don't know which file needs to be changed.

I have modified it again and again and hope to find a solution that can
handle 100k * 100k data.

float** array_to_matrix(float* m, int rows, int cols) {
    int i, j;
    float** r;

    r = (float**)calloc(rows, sizeof(float*));

The cast is worse than useless. It can actually have a negative
impact on your development process.

You are aware that calloc sets the allocated area to all bits zero and
this need not be suitable for pointers.
    for (i = 0; i < rows; i++)
    {
        r[i] = (float*)calloc(cols, sizeof(float));


All bits zero need not be suitable for float either.
        for (j = 0; j < cols; j++)
            r[i][j] = m[i*cols+j];


Since whatever m points to takes up the same amount of space as
whatever r and the r[i] point to, how did you get m to work?
    }
    return r;
}


Remove del for email
 
Richard

Barry Schwarz said:
The cast is worse than useless. It can actually have a negative
impact on your development process.

How? Certainly during debugging it makes perfect sense to zero a new
block, if for nothing else than examining the memory. In the real
world, that is.
You are aware that calloc sets the allocated area to all bits zero and
this need not be suitable for pointers.

I need this explained once again.

ptr = (float*) *fltPointer++;

If it's all bits 0 then surely assignment of 0 will cast to the "real
null for that pointer type".

Or would you actually advocate writing your own loop applying a "null"
for float * to the memory block?
 
santosh

Richard said:
How? Certainly during debugging it makes perfect sense to zero a new
block, if for nothing else than examining the memory. In the real
world, that is.


I need this explained once again.

ptr = (float*) *fltPointer++;

If it's all bits 0 then surely assignment of 0 will cast to the "real
null for that pointer type".

A runtime value of all bits zero need not necessarily be translated to a
null pointer value when written to pointers. A source code literal zero
must however be interpreted in a pointer context as a null pointer
value, implicitly converted to the appropriate type.
Or would you actually advocate writing your own loop applying a "null"
for float * to the memory block?

Yes. This is the only way.
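
In other words, something along these lines: allocate with malloc and
store explicit null pointers, rather than trusting that all-bits-zero is
a valid null pointer representation. A minimal sketch (the function name
is made up; assumes <stdlib.h>):

float **make_row_table(int rows) {
    int i;
    float **r = malloc(rows * sizeof *r);
    if (r == NULL)
        return NULL;
    for (i = 0; i < rows; i++)
        r[i] = NULL;    /* a genuine null pointer on every implementation */
    return r;
}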
 
