large and sparse matrices

M

mediratta

Hi,

I want to allocate memory for a large matrix, whose size will be
around 2.5 million x 17000. Three fourth of its rows will have all
zeroes, but it is not known which will be those rows. If I try to
allocate memory for this huge array, then I get a segmentation fault
saying:

Program received signal SIGSEGV, Segmentation fault.
0xb7dd5226 in mallopt () from /lib/tls/i686/cmov/libc.so.6

I have not given any compiler options. I think that the error is
because I am allocating too big a size ?

Can anyone please, suggest me how to store/manage such a matrix in C/C+
+ ? Can mmap be useful ?

I am using linux on i386 with gcc (GCC) 4.1.2 20060928 (prerelease)
(Ubuntu 4.1.1-13ubuntu5)

thanks
anupam
 
J

jacob navia

(e-mail address removed) a écrit :
Hi,

I want to allocate memory for a large matrix, whose size will be
around 2.5 million x 17000. Three fourth of its rows will have all
zeroes, but it is not known which will be those rows. If I try to
allocate memory for this huge array, then I get a segmentation fault
saying:

Program received signal SIGSEGV, Segmentation fault.
0xb7dd5226 in mallopt () from /lib/tls/i686/cmov/libc.so.6

I have not given any compiler options. I think that the error is
because I am allocating too big a size ?

Can anyone please, suggest me how to store/manage such a matrix in C/C+
+ ? Can mmap be useful ?

I am using linux on i386 with gcc (GCC) 4.1.2 20060928 (prerelease)
(Ubuntu 4.1.1-13ubuntu5)

thanks
anupam

You want memory of 2 500 000 * 17 000 * sizeof(double). This is
340 000 000 000 bytes, around 316.6 GB.

Even if you do not use double but just a char for your data you need
more than 42 GB.

There are several alternatives. For instance,
typedef struct matRow {
int index; // 0 to 2 500 000
struct matRow *NextRow; // Pointer to next row
double data[17000];
} MATROW;

Then, you store just the rows with some data in it. When you
want to index the matrix, if the row is not in the list of rows
the dta is zero. Of course this is naive and very inefficient.
This is just the idea. To see more advanced sparse matrix software
google for "sparse matrix implementation" or similar.
 
E

Eric Sosman

Hi,

I want to allocate memory for a large matrix, whose size will be
around 2.5 million x 17000. Three fourth of its rows will have all
zeroes, but it is not known which will be those rows. If I try to
allocate memory for this huge array, then I get a segmentation fault
saying:

Program received signal SIGSEGV, Segmentation fault.
0xb7dd5226 in mallopt () from /lib/tls/i686/cmov/libc.so.6

I have not given any compiler options. I think that the error is
because I am allocating too big a size ?

Quite likely: As Jacob Navia points out, the number of
cells is huge. But there may be other causes; you didn't
show any of your code, so we can only guess what it's doing.
(And things like `mallopt', which is not a C library function,
suggest that it's doing something out of the ordinary.)
Can anyone please, suggest me how to store/manage such a matrix in C/C+
+ ?

The basic idea is to store only the cells that contain
non-zero values and simply infer that the value of any cell
not stored is zero. (More generally, change "zero" to "special.")

Beyond that, the choice of representation should be driven
by the operations you intend to perform on the matrix. If you
just want to let it sit there in its pristine glory, the data
structure doesn't matter much. If you want to bounce around in
a "random access" mode, a hash table keyed on row and column
indices might work. If you want to make organized row-by-row
sweeps, a sorted singly-linked list of rows (as Jacob showed)
might work. If both row and column traversals are important,
consider a doubly-linked structure where each node holds a
value, its row and column indices, and two links pointing down
and to the right. Or dream up something else that suits your
access patterns.
Can mmap be useful ?

Not in comp.lang.c, because it isn't part of C.

<off-topic>

Probably not, or at any rate not in support of a plain two-
dimensional array. The disks where this 316GB (?) file lives
will thrash until their bearings crack into smoking ruin. Let's
see: If you can arrange things on disk very cleverly so there's
essentially no seek time, and if you can sustain a 100MB/s data
rate, reading once through the file will take just under an hour.
Use the general-purpose virtual memory paging mechanism instead
of a highly-optimized purpose-built raw-device custom-filesystem
solution, and it'll probably take a wee bit longer ...

</off-topic>

Where is this colossal load of data coming from, where will
it go, and what do you intend to do with it in the meantime?
That is, instead of "How can I manage a gigondous array," which
is a tactical question, try asking the strategic question "How
do I accomplish ____," where ____ is your ultimate purpose. Maybe
someone will have a better idea.
 
K

Kenneth Brody

Hi,

I want to allocate memory for a large matrix, whose size will be
around 2.5 million x 17000. Three fourth of its rows will have all
zeroes, but it is not known which will be those rows. If I try to
allocate memory for this huge array, then I get a segmentation fault
saying:

Program received signal SIGSEGV, Segmentation fault.
0xb7dd5226 in mallopt () from /lib/tls/i686/cmov/libc.so.6

I have not given any compiler options. I think that the error is
because I am allocating too big a size ?
[...]

Rather than attempting to allocate one huge array, allocate an
array of pointers to each row. (Are there 2.5 million rows, or
17,000 rows?) Initialize this array with NULL pointers. Then
allocate each row only as they're used.

However, even then, how much memory is needed for one fourth of
the rows? You are talking about 42.5 billion entries, so one
fourth of them occupied is still over 10 billion entries. Given
that your system (i386 Linux) has only 32 bits of address, that
means 4 billion addresses. Obviously, you cannot address 10
billion entries in 32 bits.

It sounds like you'll need to do some bookkeeping on your end to
keep rows on disk (in separate files -- again for the 32-bit
limit on your system) and swap them in and out as needed.

And you better hope that you have enough disk space for all of
this data as well.

--
+-------------------------+--------------------+-----------------------+
| Kenneth J. Brody | www.hvcomputer.com | #include |
| kenbrody/at\spamcop.net | www.fptech.com | <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------+
Don't e-mail me at: <mailto:[email protected]>
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,577
Members
45,054
Latest member
LucyCarper

Latest Threads

Top