Good or best way to allocate a large array

R

Ralph Shnelvar

Florian,

FG> The thing is that an Array of size (300_000*20) already has as many
FG> slots for references. So it's not "initializing references". It just
FG> makes every field of the Array referencing that one BigNum. There is
FG> no allocation or initialization happening in this step.

There has to be initialization, doesn't there?

irb(main):001:0> a=Array.new(10)
=> [nil, nil, nil, nil, nil, nil, nil, nil, nil, nil]
irb(main):002:0> a[5]
=> nil
 
F

Florian Gilcher

Well, but this is expected: the second example has more need for
memory (every time a new BigNum must be created), and if memory is
needed it is reasonable to first start GC before asking the OS for
more memory. GC cannot know that it will not find anything
collectible.

Kind regards

robert


It was not meant as a complaint, more as an explanation why the behavior
in my first mail happens (~~40% performance gain by switching off the GC
before allocating the array). It is just a sample where the heuristic
applied by the GC somewhat fails.

The GC behavior is reasonable, but not efficient in that case. In most
cases,
this one second does not really matter, so I would recommend against
tampering with it. Thats why I wrote that you should handle this with
care.

It is, in the end, a nice example illustrating how the GC behaves and
that's what I used it for.

Regards,
Florian
 
J

Jonathan Schmidt-Dominé - Developer

What's about using C or C++ to allocate and deallocate the array? I do not
think Ruby was made to create such Arrays.
 
J

Justin Collins

Is there any way to tell what architecture Ruby thinks I'm using?

RK> Cheers

RK> robert

Check the size of an integer.

$ ruby -v
ruby 1.8.7 (2008-08-11 patchlevel 72) [x86_64-linux]
$ irb
irb(main):001:0> 1.size
=> 8


$ ruby -v
ruby 1.8.7 (2008-08-11 patchlevel 72) [i686-linux]
$ irb
irb(main):001:0> 1.size
=> 4


-Justin
 
F

Florian Gilcher

What's about using C or C++ to allocate and deallocate the array? I =20=
do not
think Ruby was made to create such Arrays.


All basic Methods in Array (especially #new) are implemented in C, so
I don't think you will gain a lot. Implementing the whole algorithm in
C might be worthwhile, because you would bypass using a Ruby object for
every slot, but the Array is not the issue here.

As others have shown, it works reasonably quick.

Regards,
Florian=
 
J

Jonathan Schmidt-Dominé - Developer

Hi!

I thaught about allocation and deallocation from C. And I think there you have
more control over GC and the heap.

The User
 
F

Florian Gilcher

Unnecessary. Just use a database.


Why are you so insisting on a database? In contrast to your opinion,
there are valid cases where an in-memory array is just what you want.

Varnish for example is an interesting example that uses data structures
in that magnitude.

Regards,
Florian Gilcher
 
R

Ralph Shnelvar

Jonathan,

JSDD> What's about using C or C++ to allocate and deallocate the array? I do not
JSDD> think Ruby was made to create such Arrays.

Yes, I am beginning to come to such a conclusion. Now I have to
figure out how to interface the two. The Ruby book is good ... but a
Windows-based example of adding two numbers together and returning the
result would be even better. :)
 
R

Ralph Shnelvar

Justin,


JC> Check the size of an integer.

JC> $ ruby -v
JC> ruby 1.8.7 (2008-08-11 patchlevel 72) [x86_64-linux]
JC> $ irb
irb(main):001:0>> 1.size
=>> 8


JC> $ ruby -v
JC> ruby 1.8.7 (2008-08-11 patchlevel 72) [i686-linux]
JC> $ irb
irb(main):001:0>> 1.size
=>> 4

What a lovely and elegant solution!

Thanks!

Ralph
 
M

Marnen Laibow-Koser

Florian said:
Why are you so insisting on a database?

Because almost any array that large is better dealt with through a DB
interface.
In contrast to your opinion,
there are valid cases where an in-memory array is just what you want.

How about an in-memory DB? SQLite can function that way.
Varnish for example is an interesting example that uses data structures
in that magnitude.

Varnish is a caching package, not what most people will want to
implement. Besides, if the OP were implementing something like Varnish,
I suspect he'd be knowledgeable enough that he wouldn't have asked the
question in the first place.

Best,
 
S

Simon Krahnke

* Rick DeNatale said:
Mark/Sweep is a fairly primitive GC technique, it was probably the
second technique applied in the long history of GC. There are more
recent techniques such as generation scavenging which makes use of the
observance that most objects in a uniformly object-oriented language
like Ruby either live a very short, or a reasonably long life time
with the preponderance being short-lived. Generation scavenging
collects the short-lived objects very efficiently, and typically uses
mark-sweep less frequently to clean up the older ones.

In my understanding these are orthogonal concepts. How to find out what
to collect, and how to do the collection.

Mark and sweep is a technique to find collectable objects just like
reference counting. And a copying gc (like a generational) is a
way of collecting these objects, more efficient than calling free().

mfg, simon .... l
 
R

Rick DeNatale

In my understanding these are orthogonal concepts. How to find out what
to collect, and how to do the collection.

Mark and sweep is a technique to find collectable objects just like
reference counting. And a copying gc (like a generational) is a
way of collecting these objects, more efficient than calling free().

No, a copying GC doesn't substitute for the sweep phase of mark and sweep.

Instead it works by tracking references to new objects from old
objects, and copying live objects to a new generation space, leaving
new objects which are not reachable behind. After an object has
survived a number of such scavenging cycles it gets moved to old space
which is then usually gc'd using another method such as mark and
sweep. A copying GC does a bit of marking via a write barrier which
notes when a reference to a 'young' object has been stored in an 'old'
object, and combines the traversal of the young object space with
collection by moving surviving objects.

It's efficient because the overhead of eliminating young dead objects
is quite a bit lower, and typically there is a lot of infant mortality
in a highly object oriented system.
--=20
Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Twitter: http://twitter.com/RickDeNatale
WWR: http://www.workingwithrails.com/person/9021-rick-denatale
LinkedIn: http://www.linkedin.com/in/rickdenatale
 
C

Caleb Clausen

Newbie here:

I will want/need to create an array of approximately 300K rows and 20
columns. The array will be an array of either 32-bit quantities or 64-bit
quantities.

Does Ruby support native 64-bit quantities?

What's a good/best way to to create such an array. Is there a way to
tell Ruby, "OK, I'm done with this array."

Can one create such an array such that the garbage collector can
free the memory in one fast operation?

I'm surprised that no-one has yet mentioned narray; I'd think it's the
ideal data structure for this kind of problem. (I've never used it,
tho.)

I'd like to point out that while the memory cost of a Fixnum is small
(4 or 8 bytes), the memory cost of a Bignum is considerably larger....
something like 32 bytes or so, even if it's only storing a 64bit
value. If you expect a significant number of Bignums to be stored in
this array, you should be aware of this. Yet another reason to
consider narray, to my mind.
 
R

Robert Klemme

I'm surprised that no-one has yet mentioned narray; I'd think it's the
ideal data structure for this kind of problem. (I've never used it,
tho.)

Actually everybody waited for you to finally come up with your reply.
Shame on you that it took you so long. ;-)
I'd like to point out that while the memory cost of a Fixnum is small
(4 or 8 bytes), the memory cost of a Bignum is considerably larger....
something like 32 bytes or so, even if it's only storing a 64bit
value. If you expect a significant number of Bignums to be stored in
this array, you should be aware of this. Yet another reason to
consider narray, to my mind.

Certainly. However, if Bignums work as well, why worry to use another
lib? Whether the allocation overhead is OK or not depends of course on
the use case. Btw, did we see a more concrete description of the
problem? I cannot remember having seen it.

Kind regards

robert
 
R

Ralph Shnelvar

CC> I'm surprised that no-one has yet mentioned narray; I'd think it's the
CC> ideal data structure for this kind of problem. (I've never used it,
CC> tho.)

CC> I'd like to point out that while the memory cost of a Fixnum is small
CC> (4 or 8 bytes), the memory cost of a Bignum is considerably larger....
CC> something like 32 bytes or so, even if it's only storing a 64bit
CC> value. If you expect a significant number of Bignums to be stored in
CC> this array, you should be aware of this. Yet another reason to
CC> consider narray, to my mind.

Could someone point me at documentation on narray, please.
 
M

Marnen Laibow-Koser

Ralph Shnelvar wrote:
[...]
Could someone point me at documentation on narray, please.

You could have found it in about 5 seconds with Google. Please try that
before posting in future.

Best,
 
S

Simon Krahnke

* Caleb Clausen said:
http://narray.rubyforge.org/

Now that I'm looking at it, I see that it says:


No 8-byte integers... maybe you could use two parallel arrays, one for
low half and one for high half of your data. Depends on what you're
using this for.

Or look into the source, and eventually patch it. It might already
support 8 bytes on 64bit, and just not document that, or it might
be an easy change.

mfg, simon .... l
 
R

Ralph Shnelvar

Friday, November 13, 2009, 9:45:10 AM, you wrote:


RK> Actually everybody waited for you to finally come up with your reply.
RK> Shame on you that it took you so long. ;-)

RK> Certainly. However, if Bignums work as well, why worry to use another
RK> lib? Whether the allocation overhead is OK or not depends of course on
RK> the use case. Btw, did we see a more concrete description of the
RK> problem? I cannot remember having seen it.

I am the OP and there really isn't much more to the problem.

Basically, each row of the array is a list of integers. The first
element of the array is an id. The remaining elements of the array are
a list of ids.

Thus, if
MyMatrix = [ [100, [200, 300, 400]],
[200, [300, 400, 500, 600]],
[300, [400, 600, 555, 1902, 1708]],
[400, 101, 200, 100] ]

If I am processing the last vector (i.e. [400, 101, 200, 100]) then I
want to be able to note that

101 has no corresponding element
200 was found and its remaining content is [300, 400, 500, 600]
100 was found and its remaining content is [200, 300, 400]

Speed is critical. The array is more-or-less static. That is, access
to this array swamps any changes to the array.

The order of the rows is irrelevant ... and for speed reasons the
Matrix will be sorted by the first column so that I can do a binary
search.

Of course, I could use a hash ... but I doubt that a hash would beat
my binary search ... I could be wrong about this.

RK> Kind regards

RK> robert
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,582
Members
45,060
Latest member
BuyKetozenseACV

Latest Threads

Top