100k X 100k data processing summary

Discussion in 'C Programming' started by a, Nov 24, 2007.

  1. a

    From the previous replies, it seems that the following method somehow
    solves the problem for up to 1000 * 1000 2D data, but when I try
    10k * 10k, the segmentation fault appears again.

    Richard Tobin told me there is a system limit that can be changed. But I
    don't know which file is to be changed.

    I have modified again and again and hope to find out a solution that can
    handle 100k * 100k data.

    float** array_to_matrix(float* m, int rows, int cols) {
        int i, j;

        float** r;

        r = (float**)calloc(rows, sizeof(float*));

        for (i = 0; i < rows; i++)
        {
            r[i] = (float*)calloc(cols, sizeof(float));

            for (j = 0; j < cols; j++)
                r[i][j] = m[i*cols + j];
        }
        return r;
    }
    a, Nov 24, 2007
    #1

  2. Ian Collins

    a wrote:
    > By previous replies, it seems that the following method somehow solves the
    > problem up to 1000 * 1000 2D data, but when I try 10k * 10k, the
    > segmentation fault problem appears again.
    >

    System memory is finite; if you attempt to allocate more than is
    available, you will fail.

    > Richard Tobin told me there is a system limit that can be changed. But I
    > don't know which file is to be changed.
    >
    > I have modified again and again and hope to find out a solution that can
    > handle 100k * 100k data.
    >

    Which is 10G * sizeof(float) bytes, do you have that much (virtual)
    memory to play with?

    It sounds like you have more of an algorithm problem than a C one.

    > float** array_to_matrix(float* m, int rows, int cols) {
    > int i,j;
    >
    > float** r;
    >
    > r = (float**)calloc(rows,sizeof(float*));
    >

    Drop the cast.

    --
    Ian Collins.
    Ian Collins, Nov 24, 2007
    #2

  3. Ian Collins

    Ian Collins wrote:
    > a wrote:
    >> By previous replies, it seems that the following method somehow solves the
    >> problem up to 1000 * 1000 2D data, but when I try 10k * 10k, the
    >> segmentation fault problem appears again.
    >>

    > System memory is finite, if you attempt to allocate more than there is
    > available, you will fail.
    >
    >> Richard Tobin told me there is a system limit that can be changed. But I
    >> don't know which file is to be changed.
    >>
    >> I have modified again and again and hope to find out a solution that can
    >> handle 100k * 100k data.
    >>

    > Which is 10G * sizeof(float) bytes, do you have that much (virtual)
    > memory to play with?
    >
    > It sounds like you have more of an algorithm problem than a C one.
    >
    >> float** array_to_matrix(float* m, int rows, int cols) {
    >> int i,j;
    >>
    >> float** r;
    >>
    >> r = (float**)calloc(rows,sizeof(float*));
    >>

    > Drop the cast.
    >

    And always check the return of [mc]alloc isn't null.
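    A minimal sketch of an allocator that follows both points (no casts on
    calloc, every return value checked, partial work freed on failure); the
    function names here are illustrative, not from the thread:

```c
#include <stdlib.h>

/* Allocate a rows x cols matrix as an array of row pointers.
 * Returns NULL on any allocation failure, freeing partial work. */
float **alloc_matrix(size_t rows, size_t cols)
{
    float **r = calloc(rows, sizeof *r);   /* no cast needed in C */
    if (r == NULL)
        return NULL;

    for (size_t i = 0; i < rows; i++) {
        r[i] = calloc(cols, sizeof *r[i]);
        if (r[i] == NULL) {                /* undo the rows already done */
            while (i > 0)
                free(r[--i]);
            free(r);
            return NULL;
        }
    }
    return r;
}

void free_matrix(float **r, size_t rows)
{
    if (r == NULL)
        return;
    for (size_t i = 0; i < rows; i++)
        free(r[i]);
    free(r);
}
```

    With this shape the caller can test the result against NULL instead of
    crashing with a segmentation fault on the first dereference.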

    --
    Ian Collins.
    Ian Collins, Nov 24, 2007
    #3
  4. In article <fia513$rvc$>, a <> wrote:

    >Richard Tobin told me there is a system limit that can be changed. But I
    >don't know which file is to be changed.


    As I said, ask your system administrator. Or read the manual.
    We can't tell you, because you haven't even told us what system
    you're using.

    >I have modified again and again and hope to find out a solution that can
    >handle 100k * 100k data.


    100k * 100k is 10g, and if floats are 4 bytes that's 40 gigabytes.
    You really will need a supercomputer for that. Perhaps you should
    reconsider your algorithm, or wait several years.

    -- Richard
    --
    "Consideration shall be given to the need for as many as 32 characters
    in some alphabets" - X3.4, 1963.
    Richard Tobin, Nov 24, 2007
    #4
  5. Ian Collins


    >
    > 100k * 100k is 10g, and if floats are 4 bytes that's 40 gigabytes.
    > You really will need a supercomputer for that. Perhaps you should
    > reconsider your algorithm, or wait several years.
    >

    <OT> Workstation boards with support for 64GB of RAM are available from
    several vendors! All it takes is a spare $10K for the RAM... </OT>

    --
    Ian Collins.
    Ian Collins, Nov 24, 2007
    #5
  6. Richard Tobin schrieb:

    > 100k * 100k is 10g, and if floats are 4 bytes that's 40 gigabytes.
    > You really will need a supercomputer for that. Perhaps you should
    > reconsider your algorithm, or wait several years.


    It might work if enough virtual memory is available - serious thrashing
    implied. Suitable for *some* problems, however, if the access patterns
    to this matrix are not arbitrary (which they aren't for most
    algorithms).

    Usually the data in such huge matrices is sparse anyway - so I fully
    second your statement that the OP should reconsider his algorithm. If it
    fails in the early stage of memory allocation, he/she probably hasn't
    thought about it at all.

    Greetings,
    Johannes

    --
    "Viele der Theorien der Mathematiker sind falsch und klar
    Gotteslästerlich. Ich vermute, dass diese falschen Theorien genau
    deshalb so geliebt werden." -- Prophet und Visionär Hans Joss aka
    HJP in de.sci.mathematik <4740ad67$0$3811$>
    Johannes Bauer, Nov 24, 2007
    #6
  7. >> 100k * 100k is 10g, and if floats are 4 bytes that's 40 gigabytes.
    >> You really will need a supercomputer for that. Perhaps you should
    >> reconsider your algorithm, or wait several years.

    >
    >It might work if enough virtual memory is available - serious thrashing
    >implied.


    On a 32-bit machine (say, Pentium with PAE36) you could have 64G
    of physical memory, and a lot more physical swap/page space, but
    with a 32-bit address space for an individual process, you're limited
    to 4G (and sometimes much less). So you need a machine with 64-bit
    addressing and an OS that supports it for individual processes.
    Simply adding lots of memory and swap/page space isn't enough.
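    The arithmetic is worth doing explicitly before any allocator is
    called: on a 32-bit system the byte count for 100k * 100k floats cannot
    even be represented in size_t, so the multiplication silently wraps. A
    sketch of an overflow-checked size computation (the helper name is
    illustrative):

```c
#include <stdint.h>
#include <stdlib.h>

/* Return the number of bytes needed for rows*cols floats,
 * or 0 if the count cannot be represented in size_t. */
size_t matrix_bytes(size_t rows, size_t cols)
{
    if (cols != 0 && rows > SIZE_MAX / cols)
        return 0;                          /* rows*cols overflows */
    size_t elems = rows * cols;
    if (elems > SIZE_MAX / sizeof(float))
        return 0;                          /* byte count overflows */
    return elems * sizeof(float);
}
```

    A zero result tells the caller the request is hopeless on this
    platform, before malloc or calloc is ever reached.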

    >Suitable for *some* problems, however, if access pattern to
    >this matrix are not arbitrary (which they aren't for most algorithms).
    >
    >Usually data in such huge matrices is sparse anyways - so, I fully
    >second your statement the OP should reconsider his algorithm. If it
    >fails in the early stage of memory allocation he/she probably hasn't
    >thought about it at all.
    Gordon Burditt, Nov 25, 2007
    #7
  8. a

    Thanks Richard. It's Red Hat Enterprise, or Fedora related. The biggest
    problem I'm having is that I don't know the right keyword: searches
    like "compilation memory" or "program memory" don't give me good
    results on Google.


    "Richard Tobin" <> wrote in message
    news:fiaav3$973$...
    > In article <fia513$rvc$>, a <> wrote:
    >
    >>Richard Tobin told me there is a system limit that can be changed. But I
    >>don't know which file is to be changed.

    >
    > As I said, ask your system administrator. Or read the manual.
    > We can't tell you, because you haven't even told us what system
    > you're using.
    >
    > -- Richard
    a, Nov 25, 2007
    #8
  9. a

    Modifying /etc/security/limits.conf doesn't seem to help solve the
    problem:

    * soft data 100000
    * soft stack 100000
    a, Nov 25, 2007
    #9
  10. a

    "a" <> wrote in message news:fiaj1m$ue6$...
    > Modifying in /etc/security/limits.conf seems doesn't help solve the
    > problem
    >
    > * soft data 100000
    > * soft stack 100000
    >


    Furthermore, running as root should be unlimited....
    a, Nov 25, 2007
    #10
  11. Ian Collins

    a wrote:
    > "a" <> wrote in message news:fiaj1m$ue6$...
    >> Modifying in /etc/security/limits.conf seems doesn't help solve the
    >> problem
    >>
    >> * soft data 100000
    >> * soft stack 100000
    >>

    >
    > Furthermore, running as root should be unlimited....
    >

    That is seldom a good idea.

    As everyone who has responded points out, you should reconsider your
    algorithm if you are attempting to allocate more memory than your system
    can provide.

    --
    Ian Collins.
    Ian Collins, Nov 25, 2007
    #11
  12. Gordon Burditt schrieb:

    > So you need a machine with 64-bit
    > addressing and an OS that supports it for individual processes.
    > Simply adding lots of memory and swap/page space isn't enough.


    Absolutely a prerequisite.

    joe joe [~]: uname -a
    Linux joeserver 2.6.22.2 #2 PREEMPT Thu Sep 27 14:06:16 CEST 2007 x86_64
    AMD Athlon(tm) 64 Processor 3700+ AuthenticAMD GNU/Linux

    Took that one for granted :)

    Greetings,
    Johannes

    Johannes Bauer, Nov 25, 2007
    #12
  13. a schrieb:
    > "a" <> wrote in message news:fiaj1m$ue6$...
    >> Modifying in /etc/security/limits.conf seems doesn't help solve the
    >> problem
    >>
    >> * soft data 100000
    >> * soft stack 100000
    >>

    > Furthermore, running as root should be unlimited....


    "My car ran out of gas so it won't drive. I've already tried to turn up
    the volume of the radio, but it didn't do the trick. Has anybody a clue?"

    Did you actually understand what the real problem is? Of *course*
    fiddling with the system limits won't work, because with that insane
    amount of memory your application requires you hit a *hard* limit.
    You're modifying the *soft* limits, however.

    Greetings,
    Johannes

    Johannes Bauer, Nov 25, 2007
    #13
  14. a

    The problem is I don't know where the algorithm goes wrong. More
    unfortunately, I have to use other algorithms in my program, and I've
    already changed a lot, from 2D array to 1D array to pointer. There is
    no technical support, no sys admin.

    My strategy right now is to work with 1k * 1k first. Yet I find that
    when there is only one such 1D array, it's fine. When more are
    declared, the seg fault appears again. I just don't know why allocating
    1MB of memory to an array creates such a problem, when we are talking
    about >500MB of memory today.

    Anyway, thanks for your advice.

    Anyway, thanks for your advice.



    "Johannes Bauer" <> wrote in message
    news:...
    >a schrieb:
    >> "a" <> wrote in message
    >> news:fiaj1m$ue6$...
    >>> Modifying in /etc/security/limits.conf seems doesn't help solve the
    >>> problem
    >>>
    >>> * soft data 100000
    >>> * soft stack 100000
    >>>

    >> Furthermore, running as root should be unlimited....

    >
    > "My car ran out of gas so it won't drive. I've already tried to turn up
    > the volume of the radio, but it didn't do the trick. Has anybody a clue?"
    >
    > Did you actually understand what the real problem is? Of *course*
    > fiddling with the system limits won't work, because with that insane
    > amount of memory your application requires you hit a *hard* limit.
    > You're modifying the *soft* limits, however.
    >
    > Greetings,
    > Johannes
    >
    a, Nov 25, 2007
    #14
  15. cr88192

    "a" <> wrote in message news:fialru$uv2$...
    > The problem is I don't know where the algorithm gets wrong. And more
    > unfortunate is, i have to use other algorithms to put into my program and
    > I've changed a lot from 2D array to 1D array to pointer. There is no
    > technical support, no sys admin.
    >
    > My strategy right now is to work with 1k * 1k first. Yet, I find when
    > there is only one such 1D array, it's fine. When more is declared, seg.
    > fault appears again. I just don't know why allocating 1MB memory to an
    > array will create such a problem, when we are talking about >500M memory
    > today.
    >
    > Anyway, thanks for your advice.
    >
    >


    1000*1000 is about 4MB of floats.
    10000*10000 is about 400MB of floats.

    consider:
    10000*10000, 100000000, *4 = 400000000 (approx 400MB)
    100000*100000, 10000000000, *4 = 40000000000 (approx 40GB)

    now, you can only allocate a few such arrays, before you are out of address
    space.
    on a 32 bit arch, this is 4GB (of which 2 or 3GB is usually available for
    the app).


    now, here are a few possible solutions:
    move to a 64-bit Linux (an x86-64 install, not x86), in which case you
    have a far larger address space.

    or, better yet, don't allocate such huge things in memory.
    what do you need that is so huge anyway?
    are you sure it is not something better done with a much more compact
    representation, such as a sparse array, or a key/value mapping system?

    failing that, have you considered using files?

    even then, one is still far better off finding a more compact
    representation if possible.

    for example, for many uses it is a very effective approach to
    RLE-compress the arrays and design the algorithms to work on the
    compressed forms (slightly more complicated, but not impossible, if the
    data is mutable).
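    As a sketch of the compact-representation idea, a coordinate-list
    sparse store keeps only the nonzero entries, so memory scales with the
    data rather than with rows * cols; the struct and function names below
    are illustrative, not from the thread:

```c
#include <stdlib.h>

/* Coordinate-list sparse matrix: only nonzero entries are stored. */
typedef struct {
    size_t row, col;
    float  val;
} entry;

typedef struct {
    entry  *e;
    size_t  n, cap;
} sparse;

/* Append a nonzero entry; returns 0 on success, -1 on allocation failure. */
int sparse_set(sparse *s, size_t row, size_t col, float val)
{
    if (s->n == s->cap) {                  /* grow geometrically */
        size_t ncap = s->cap ? s->cap * 2 : 16;
        entry *ne = realloc(s->e, ncap * sizeof *ne);
        if (ne == NULL)
            return -1;
        s->e = ne;
        s->cap = ncap;
    }
    s->e[s->n].row = row;
    s->e[s->n].col = col;
    s->e[s->n].val = val;
    s->n++;
    return 0;
}

/* Linear lookup; absent entries read as 0.0f. */
float sparse_get(const sparse *s, size_t row, size_t col)
{
    for (size_t i = 0; i < s->n; i++)
        if (s->e[i].row == row && s->e[i].col == col)
            return s->e[i].val;
    return 0.0f;
}
```

    A real implementation would sort or hash the entries for fast lookup;
    the point is that a 100k * 100k matrix with a few million nonzeros
    fits comfortably where the dense form never could.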
    cr88192, Nov 25, 2007
    #15
  16. CBFalconer

    a wrote: *** and top-posted. Fixed ***
    > "Johannes Bauer" <> wrote in message
    >> a schrieb:
    >>> "a" <> wrote:
    >>>
    >>>> Modifying in /etc/security/limits.conf seems doesn't help solve
    >>>> the problem
    >>>>
    >>>> * soft data 100000
    >>>> * soft stack 100000
    >>>
    >>> Furthermore, running as root should be unlimited....

    >>
    >> "My car ran out of gas so it won't drive. I've already tried to
    >> turn up the volume of the radio, but it didn't do the trick. Has
    >> anybody a clue?"
    >>
    >> Did you actually understand what the real problem is? Of *course*
    >> fiddling with the system limits won't work, because with that
    >> insane amount of memory your application requires you hit a
    >> *hard* limit. You're modifying the *soft* limits, however.

    >
    > The problem is I don't know where the algorithm gets wrong. And
    > more unfortunate is, i have to use other algorithms to put into
    > my program and I've changed a lot from 2D array to 1D array to
    > pointer. There is no technical support, no sys admin.


    There is NO algorithm involved. You are asking for a memory block
    of (1e5 * 1e5 * sizeof item) bytes. It doesn't exist. If it did
    exist, your OS couldn't arrange to point within it.

    Please do not top-post. Your answer belongs after (or intermixed
    with) the quoted material to which you reply, after snipping all
    irrelevant material. I fixed this one. See the following links:

    --
    <http://www.catb.org/~esr/faqs/smart-questions.html>
    <http://www.caliburn.nl/topposting.html>
    <http://www.netmeister.org/news/learn2quote.html>
    <http://cfaj.freeshell.org/google/> (taming google)
    <http://members.fortunecity.com/nnqweb/> (newusers)



    --
    Posted via a free Usenet account from http://www.teranews.com
    CBFalconer, Nov 25, 2007
    #16
  17. Default User

    Re: 100k X 100k data processing summary -TPA

    a wrote:

    > The problem is I don't know



    Please don't top-post. Your replies belong following or interspersed
    with properly trimmed quotes. See the majority of other posts in the
    newsgroup, or:
    <http://www.caliburn.nl/topposting.html>
    Default User, Nov 25, 2007
    #17
  18. On Sun, 25 Nov 2007 05:28:00 +0800, "a" <> wrote:

    >By previous replies, it seems that the following method somehow solves the
    >problem up to 1000 * 1000 2D data, but when I try 10k * 10k, the
    >segmentation fault problem appears again.
    >
    >Richard Tobin told me there is a system limit that can be changed. But I
    >don't know which file is to be changed.
    >
    >I have modified again and again and hope to find out a solution that can
    >handle 100k * 100k data.
    >
    >float** array_to_matrix(float* m, int rows, int cols) {
    > int i,j;
    >
    > float** r;
    >
    > r = (float**)calloc(rows,sizeof(float*));


    The cast is worse than useless. It can actually have a negative
    impact on your development process.

    You are aware that calloc sets the allocated area to all bits zero and
    this need not be suitable for pointers.

    >
    > for(i=0;i<rows;i++)
    > {
    > r[i] = (float*)calloc(cols,sizeof(float));


    All bits zero need not be suitable for float either.

    >
    > for(j=0;j<cols;j++)
    > r[i][j] = m[i*cols+j];


    Since whatever m points to takes up the same amount of space as
    whatever r and the r[i] point to, how did you get m to work?

    > }
    > return r;
    >
    >}
    >



    Remove del for email
    Barry Schwarz, Nov 25, 2007
    #18
  19. Richard

    Barry Schwarz <> writes:

    > On Sun, 25 Nov 2007 05:28:00 +0800, "a" <> wrote:
    >
    >>By previous replies, it seems that the following method somehow solves the
    >>problem up to 1000 * 1000 2D data, but when I try 10k * 10k, the
    >>segmentation fault problem appears again.
    >>
    >>Richard Tobin told me there is a system limit that can be changed. But I
    >>don't know which file is to be changed.
    >>
    >>I have modified again and again and hope to find out a solution that can
    >>handle 100k * 100k data.
    >>
    >>float** array_to_matrix(float* m, int rows, int cols) {
    >> int i,j;
    >>
    >> float** r;
    >>
    >> r = (float**)calloc(rows,sizeof(float*));

    >
    > The cast is worse than useless. It can actually have a negative
    > impact on your development process


    How? Certainly during debugging it makes perfect sense to zero a new
    block, if for nothing else than examining the memory. In the real
    world, that is.

    >
    > You are aware that calloc sets the allocated area to all bits zero and
    > this need not be suitable for pointers.


    I need this explained once again.

    ptr = (float*) *fltPointer++;

    If it's all bits zero then surely assignment of 0 will cast to the
    "real null for that pointer type".

    Or would you actually advocate writing your own loop applying a "null"
    for float * to the memory block?
    Richard, Nov 25, 2007
    #19
  20. santosh

    In article <>, Richard
    <> wrote on Monday 26 Nov 2007 5:04 am:

    > Barry Schwarz <> writes:
    >
    >> On Sun, 25 Nov 2007 05:28:00 +0800, "a" <> wrote:
    >>
    >>>By previous replies, it seems that the following method somehow
    >>>solves the problem up to 1000 * 1000 2D data, but when I try 10k *
    >>>10k, the segmentation fault problem appears again.
    >>>
    >>>Richard Tobin told me there is a system limit that can be changed.
    >>>But I don't know which file is to be changed.
    >>>
    >>>I have modified again and again and hope to find out a solution that
    >>>can handle 100k * 100k data.
    >>>
    >>>float** array_to_matrix(float* m, int rows, int cols) {
    >>> int i,j;
    >>>
    >>> float** r;
    >>>
    >>> r = (float**)calloc(rows,sizeof(float*));

    >>
    >> The cast is worse than useless. It can actually have a negative
    >> impact on your development process

    >
    > How? Certainly during debugging it makes perfect sense to zero a new
    > block if for nothing else than examining the memory. In the real world
    > that is.
    >
    >>
    >> You are aware that calloc sets the allocated area to all bits zero
    >> and this need not be suitable for pointers.

    >
    > I need this explaining once again.
    >
    > ptr = (float*) *fltPointer++;
    >
    > If its all bit 0s then surely assignment of 0 will cast to the "real
    > null for that pointer type".


    A runtime value of all bits zero need not necessarily be translated to a
    null pointer value when written to pointers. A source code literal zero
    must however be interpreted in a pointer context as a null pointer
    value, implicitly converted to the appropriate type.

    > Or would you actually advocate writing your own loop applying a "null"
    > for float * to the memory block?


    Yes. This is the only way.
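    Such a loop might look like the following sketch: allocate with malloc
    and store the source-code literals NULL and 0.0f, which the compiler
    converts to the correct representation whatever the underlying bit
    pattern (the function names are illustrative):

```c
#include <stdlib.h>

/* Portably null-initialize an array of float pointers: the literal
 * NULL converts to a true null pointer, whatever its bit pattern. */
float **alloc_null_rows(size_t rows)
{
    float **r = malloc(rows * sizeof *r);
    if (r == NULL)
        return NULL;
    for (size_t i = 0; i < rows; i++)
        r[i] = NULL;        /* guaranteed null, unlike calloc's zero bits */
    return r;
}

/* Likewise for floats: 0.0f is a real zero regardless of representation. */
float *alloc_zero_floats(size_t n)
{
    float *p = malloc(n * sizeof *p);
    if (p == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        p[i] = 0.0f;
    return p;
}
```

    On mainstream platforms calloc happens to give the same result, which
    is why the shortcut usually goes unnoticed; the loops are what the
    standard actually guarantees.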
    santosh, Nov 26, 2007
    #20
