Good or best way to allocate a large array

Ralph Shnelvar · Nov 10, 2009

Florian,

FG> The thing is that an Array of size (300_000*20) already has as many
FG> slots for references. So it's not "initializing references". It just
FG> makes every field of the Array referencing that one BigNum. There is
FG> no allocation or initialization happening in this step.

There has to be initialization, doesn't there?

irb(main):001:0> a=Array.new(10)
=> [nil, nil, nil, nil, nil, nil, nil, nil, nil, nil]
irb(main):002:0> a[5]
=> nil

Florian Gilcher · Nov 10, 2009

Well, but this is expected: the second example has more need for
memory (every time a new BigNum must be created), and if memory is
needed it is reasonable to first start GC before asking the OS for
more memory. GC cannot know that it will not find anything
collectible.

Kind regards

robert

It was not meant as a complaint, more as an explanation why the behavior
in my first mail happens (~~40% performance gain by switching off the GC
before allocating the array). It is just a sample where the heuristic
applied by the GC somewhat fails.

The GC behavior is reasonable, but not efficient in that case. In most
cases,
this one second does not really matter, so I would recommend against
tampering with it. Thats why I wrote that you should handle this with
care.

It is, in the end, a nice example illustrating how the GC behaves and
that's what I used it for.

Regards,
Florian

Jonathan Schmidt-Dominé - Developer · Nov 10, 2009

What's about using C or C++ to allocate and deallocate the array? I do not
think Ruby was made to create such Arrays.

Marnen Laibow-Koser · Nov 10, 2009

Jonathan said:
What's about using C or C++ to allocate and deallocate the array?

Unnecessary. Just use a database.

I do
not
think Ruby was made to create such Arrays.

What gives you that idea?

Best,

Justin Collins · Nov 10, 2009

Is there any way to tell what architecture Ruby thinks I'm using?

RK> Cheers

RK> robert

Check the size of an integer.

$ ruby -v
ruby 1.8.7 (2008-08-11 patchlevel 72) [x86_64-linux]
$ irb
irb(main):001:0> 1.size
=> 8

$ ruby -v
ruby 1.8.7 (2008-08-11 patchlevel 72) [i686-linux]
$ irb
irb(main):001:0> 1.size
=> 4

-Justin

Florian Gilcher · Nov 10, 2009

What's about using C or C++ to allocate and deallocate the array? I =20=

do not
think Ruby was made to create such Arrays.

All basic Methods in Array (especially #new) are implemented in C, so
I don't think you will gain a lot. Implementing the whole algorithm in
C might be worthwhile, because you would bypass using a Ruby object for
every slot, but the Array is not the issue here.

As others have shown, it works reasonably quick.

Regards,
Florian=

Jonathan Schmidt-Dominé - Developer · Nov 10, 2009

Hi!

I thaught about allocation and deallocation from C. And I think there you have
more control over GC and the heap.

The User

Florian Gilcher · Nov 10, 2009

Unnecessary. Just use a database.

Why are you so insisting on a database? In contrast to your opinion,
there are valid cases where an in-memory array is just what you want.

Varnish for example is an interesting example that uses data structures
in that magnitude.

Regards,
Florian Gilcher

Ralph Shnelvar · Nov 10, 2009

Jonathan,

JSDD> What's about using C or C++ to allocate and deallocate the array? I do not
JSDD> think Ruby was made to create such Arrays.

Yes, I am beginning to come to such a conclusion. Now I have to
figure out how to interface the two. The Ruby book is good ... but a
Windows-based example of adding two numbers together and returning the
result would be even better.

Ralph Shnelvar · Nov 10, 2009

Justin,

JC> Check the size of an integer.

JC> $ ruby -v
JC> ruby 1.8.7 (2008-08-11 patchlevel 72) [x86_64-linux]
JC> $ irb
irb(main):001:0>> 1.size
=>> 8

JC> $ ruby -v
JC> ruby 1.8.7 (2008-08-11 patchlevel 72) [i686-linux]
JC> $ irb
irb(main):001:0>> 1.size
=>> 4

What a lovely and elegant solution!

Thanks!

Ralph

Marnen Laibow-Koser · Nov 10, 2009

Florian said:
Why are you so insisting on a database?

Because almost any array that large is better dealt with through a DB
interface.

In contrast to your opinion,
there are valid cases where an in-memory array is just what you want.

How about an in-memory DB? SQLite can function that way.

Varnish for example is an interesting example that uses data structures
in that magnitude.

Varnish is a caching package, not what most people will want to
implement. Besides, if the OP were implementing something like Varnish,
I suspect he'd be knowledgeable enough that he wouldn't have asked the
question in the first place.

Best,

Simon Krahnke · Nov 12, 2009

* Rick DeNatale said:
Mark/Sweep is a fairly primitive GC technique, it was probably the
second technique applied in the long history of GC. There are more
recent techniques such as generation scavenging which makes use of the
observance that most objects in a uniformly object-oriented language
like Ruby either live a very short, or a reasonably long life time
with the preponderance being short-lived. Generation scavenging
collects the short-lived objects very efficiently, and typically uses
mark-sweep less frequently to clean up the older ones.

In my understanding these are orthogonal concepts. How to find out what
to collect, and how to do the collection.

Mark and sweep is a technique to find collectable objects just like
reference counting. And a copying gc (like a generational) is a
way of collecting these objects, more efficient than calling free().

mfg, simon .... l

Rick DeNatale · Nov 12, 2009

In my understanding these are orthogonal concepts. How to find out what
to collect, and how to do the collection.

Mark and sweep is a technique to find collectable objects just like
reference counting. And a copying gc (like a generational) is a
way of collecting these objects, more efficient than calling free().

No, a copying GC doesn't substitute for the sweep phase of mark and sweep.

Instead it works by tracking references to new objects from old
objects, and copying live objects to a new generation space, leaving
new objects which are not reachable behind. After an object has
survived a number of such scavenging cycles it gets moved to old space
which is then usually gc'd using another method such as mark and
sweep. A copying GC does a bit of marking via a write barrier which
notes when a reference to a 'young' object has been stored in an 'old'
object, and combines the traversal of the young object space with
collection by moving surviving objects.

It's efficient because the overhead of eliminating young dead objects
is quite a bit lower, and typically there is a lot of infant mortality
in a highly object oriented system.
--=20
Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Twitter: http://twitter.com/RickDeNatale
WWR: http://www.workingwithrails.com/person/9021-rick-denatale
LinkedIn: http://www.linkedin.com/in/rickdenatale

Caleb Clausen · Nov 12, 2009

Newbie here:

I will want/need to create an array of approximately 300K rows and 20
columns. The array will be an array of either 32-bit quantities or 64-bit
quantities.

Does Ruby support native 64-bit quantities?

What's a good/best way to to create such an array. Is there a way to
tell Ruby, "OK, I'm done with this array."

Can one create such an array such that the garbage collector can
free the memory in one fast operation?

I'm surprised that no-one has yet mentioned narray; I'd think it's the
ideal data structure for this kind of problem. (I've never used it,
tho.)

I'd like to point out that while the memory cost of a Fixnum is small
(4 or 8 bytes), the memory cost of a Bignum is considerably larger....
something like 32 bytes or so, even if it's only storing a 64bit
value. If you expect a significant number of Bignums to be stored in
this array, you should be aware of this. Yet another reason to
consider narray, to my mind.

Robert Klemme · Nov 13, 2009

I'm surprised that no-one has yet mentioned narray; I'd think it's the
ideal data structure for this kind of problem. (I've never used it,
tho.)

Actually everybody waited for you to finally come up with your reply.
Shame on you that it took you so long. ;-)

I'd like to point out that while the memory cost of a Fixnum is small
(4 or 8 bytes), the memory cost of a Bignum is considerably larger....
something like 32 bytes or so, even if it's only storing a 64bit
value. If you expect a significant number of Bignums to be stored in
this array, you should be aware of this. Yet another reason to
consider narray, to my mind.

Certainly. However, if Bignums work as well, why worry to use another
lib? Whether the allocation overhead is OK or not depends of course on
the use case. Btw, did we see a more concrete description of the
problem? I cannot remember having seen it.

Kind regards

robert

Ralph Shnelvar · Nov 13, 2009

CC> I'm surprised that no-one has yet mentioned narray; I'd think it's the
CC> ideal data structure for this kind of problem. (I've never used it,
CC> tho.)

CC> I'd like to point out that while the memory cost of a Fixnum is small
CC> (4 or 8 bytes), the memory cost of a Bignum is considerably larger....
CC> something like 32 bytes or so, even if it's only storing a 64bit
CC> value. If you expect a significant number of Bignums to be stored in
CC> this array, you should be aware of this. Yet another reason to
CC> consider narray, to my mind.

Could someone point me at documentation on narray, please.

Marnen Laibow-Koser · Nov 14, 2009

Ralph Shnelvar wrote:
[...]

Could someone point me at documentation on narray, please.

You could have found it in about 5 seconds with Google. Please try that
before posting in future.

Best,

Caleb Clausen · Nov 14, 2009

Could someone point me at documentation on narray, please.

http://narray.rubyforge.org/

Now that I'm looking at it, I see that it says:

Supported element types are 1/2/4-byte Integer

No 8-byte integers... maybe you could use two parallel arrays, one for
low half and one for high half of your data. Depends on what you're
using this for.

Simon Krahnke · Nov 14, 2009

* Caleb Clausen said:
http://narray.rubyforge.org/

Now that I'm looking at it, I see that it says:

No 8-byte integers... maybe you could use two parallel arrays, one for
low half and one for high half of your data. Depends on what you're
using this for.

Or look into the source, and eventually patch it. It might already
support 8 bytes on 64bit, and just not document that, or it might
be an easy change.

mfg, simon .... l

Ralph Shnelvar · Nov 14, 2009

Friday, November 13, 2009, 9:45:10 AM, you wrote:

RK> Actually everybody waited for you to finally come up with your reply.
RK> Shame on you that it took you so long. ;-)

RK> Certainly. However, if Bignums work as well, why worry to use another
RK> lib? Whether the allocation overhead is OK or not depends of course on
RK> the use case. Btw, did we see a more concrete description of the
RK> problem? I cannot remember having seen it.

I am the OP and there really isn't much more to the problem.

Basically, each row of the array is a list of integers. The first
element of the array is an id. The remaining elements of the array are
a list of ids.

Thus, if
MyMatrix = [ [100, [200, 300, 400]],
[200, [300, 400, 500, 600]],
[300, [400, 600, 555, 1902, 1708]],
[400, 101, 200, 100] ]

If I am processing the last vector (i.e. [400, 101, 200, 100]) then I
want to be able to note that

101 has no corresponding element
200 was found and its remaining content is [300, 400, 500, 600]
100 was found and its remaining content is [200, 300, 400]

Speed is critical. The array is more-or-less static. That is, access
to this array swamps any changes to the array.

The order of the rows is irrelevant ... and for speed reasons the
Matrix will be sorted by the first column so that I can do a binary
search.

Of course, I could use a hash ... but I doubt that a hash would beat
my binary search ... I could be wrong about this.

RK> Kind regards

RK> robert

Is this a good way to learn programming?	2	Sep 16, 2018
best way to distribute?	22	Dec 19, 2007
Fast way to move C array to ruby (rb_ary_new4)?	6	Apr 27, 2011
Best way to search through an array for an item matching a s	6	Nov 23, 2009
[ANN] Multidimensional Array - MDArray 0.5.5	0	Nov 19, 2013
'failed to allocate memory' with Mp3Info module	0	Mar 22, 2006
efficient access to a large int32 array	6	Jan 16, 2007
What's the best way to extract 2 values from a CSV file from each row systematically?	6	Sep 23, 2013

Good or best way to allocate a large array

Ralph Shnelvar

Florian Gilcher

Jonathan Schmidt-Dominé - Developer

Marnen Laibow-Koser

Justin Collins

Florian Gilcher

Jonathan Schmidt-Dominé - Developer

Florian Gilcher

Ralph Shnelvar

Ralph Shnelvar

Marnen Laibow-Koser

Simon Krahnke

Rick DeNatale

Caleb Clausen

Robert Klemme

Ralph Shnelvar

Marnen Laibow-Koser

Caleb Clausen

Simon Krahnke

Ralph Shnelvar

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads