Good or best way to allocate a large array

R

Ralph Shnelvar

Newbie here:

I will want/need to create an array of approximately 300K rows and 20
columns. The array will be an array of either 32-bit quantities or 64-bit quantities.

Does Ruby support native 64-bit quantities?

What's a good/best way to to create such an array. Is there a way to
tell Ruby, "OK, I'm done with this array."

Can one create such an array such that the garbage collector can
free the memory in one fast operation?

Ralph
 
A

Aldric Giacomoni

Ralph said:
Newbie here:

I will want/need to create an array of approximately 300K rows and 20
columns. The array will be an array of either 32-bit quantities or
64-bit quantities.

Does Ruby support native 64-bit quantities?

What's a good/best way to to create such an array. Is there a way to
tell Ruby, "OK, I'm done with this array."

Can one create such an array such that the garbage collector can
free the memory in one fast operation?

Ralph

I don't know when arrays stop being as performant as one could wish. For
a problem of this size, it seems to me that one may want instead to use
a database, such as sqlite, or a stronger beast? I expect it'll make
your code more readable, too.
Is there a particular reason to not make the jump to a database?
 
M

Marnen Laibow-Koser

Ralph said:
Newbie here:

I will want/need to create an array of approximately 300K rows and 20
columns. The array will be an array of either 32-bit quantities or
64-bit quantities.

Dude, you want a database. There's little reason in any language to
have an array that big.
Does Ruby support native 64-bit quantities?

I'm not sure, but with transparent Bignum support, it doesn't matter
that much.
What's a good/best way to to create such an array. Is there a way to
tell Ruby, "OK, I'm done with this array."

Can one create such an array such that the garbage collector can
free the memory in one fast operation?

The answer to both questions: as far as I know, it will be deallocated
when it goes out of scope.

Best,
 
R

Ralph Shnelvar

Marnen,

I am a Ruby newbie but as far as I know, memory gets freed when the
garbage collector gets around to doing it.

Ralph


MLK> The answer to both questions: as far as I know, it will be deallocated
MLK> when it goes out of scope.
 
M

Marnen Laibow-Koser

Ralph said:
Marnen,

I am a Ruby newbie but as far as I know, memory gets freed when the
garbage collector gets around to doing it.

Right. And as far as I know, it will do that when the object goes out
of scope.
Ralph


MLK> The answer to both questions: as far as I know, it will be
deallocated
MLK> when it goes out of scope.
Best,
 
R

Ralph Shnelvar

Marnen,

Unless "scope" means something different in Ruby than it does in other
languages, it would appear that the garbage collector cannot possibly
free memory when something goes out of scope but, instead, when the
UseCount goes to zero.

Ralph



Monday, November 9, 2009, 12:31:21 PM, you wrote:


MLK> Right. And as far as I know, it will do that when the object goes out
MLK> of scope.
MLK> Best,
MLK> --
MLK> Marnen Laibow-Koser
MLK> http://www.marnen.org
MLK> (e-mail address removed)
 
R

Robert Klemme

Newbie here:

I will want/need to create an array of approximately 300K rows and 20
columns. The array will be an array of either 32-bit quantities or 64-bit quantities.

So we are talking about 300,000 * 20 * 8 ~ 48MB which does not sound too
large.

robert@fussel:~$ time ruby1.9 -e 'x=1<<63;Array.new(300_000*20) {x}'

real 0m1.922s
user 0m1.288s
sys 0m0.032s
robert@fussel:~$ time ruby1.9 -e 'x=1<<63;Array.new(300_000*20) {1<<63}'

real 0m7.143s
user 0m6.668s
sys 0m0.204s

Does Ruby support native 64-bit quantities?

Yes. You can try it out

ruby1.9 -e '100.times {|i| x = 1 << i;p x, x.class}'
What's a good/best way to to create such an array. Is there a way to
tell Ruby, "OK, I'm done with this array."

No need to: if there are no more live references to that Array it will
get collected eventually.
Can one create such an array such that the garbage collector can
free the memory in one fast operation?

Probably not because of the number of objects in the Array (if they are
all Fixnums it will probably a bit faster.

The question is, what do you do with that huge Array? Chances are that
you need to allocate more memory during processing. What's your use
case? A database might be an option or a flat file - it all depends.

Kind regards

robert
 
F

Florian Frank

Ralph said:
Marnen,

Unless "scope" means something different in Ruby than it does in other
languages, it would appear that the garbage collector cannot possibly
free memory when something goes out of scope but, instead, when the
UseCount goes to zero.
Except that uses aren't counted in Ruby. Ruby has a mark-and-sweep
garbage collector. When Ruby runs out of memory a garbage collection run
is triggered. If there aren't any references to the objects in question
left, they aren't reachable from the root set and thus not marked in the
first phase of GC, in the second phase they will then (too be
conservative: most likely) be freed.
 
R

Ralph Shnelvar

Robert,

Doesn't that just move things into a BiNum ... which is not native 64?

Ralph


RK> Yes. You can try it out

RK> ruby1.9 -e '100.times {|i| x = 1 << i;p x, x.class}'
 
R

Ralph Shnelvar

Florian,

If there is no use count, how does Ruby know that there are no
references to the object that should be freed?

Ralph

Monday, November 9, 2009, 3:02:37 PM, you wrote:

FF> Except that uses aren't counted in Ruby. Ruby has a mark-and-sweep
FF> garbage collector. When Ruby runs out of memory a garbage collection run
FF> is triggered. If there aren't any references to the objects in question
FF> left, they aren't reachable from the root set and thus not marked in the
FF> first phase of GC, in the second phase they will then (too be
FF> conservative: most likely) be freed.
 
R

Rick DeNatale

Florian,

If there is no use count, how does Ruby know that there are no
references to the object that should be freed?

As Florian said, MRI ruby uses a mark-sweep GC algorithm, when it
needs to GC it traces out references from a set of root objects, and
marks any objects which are reachable recursively.

Then it frees any objects which aren't marked.

Reference counting might seem simpler, but it is expensive because the
count needs to be maintained everytime a reference changes, and it
can't reclaim cyclic garbage which can't be reached from a root:


a -> b -> c
^ |
*------------+

Mark/Sweep is a fairly primitive GC technique, it was probably the
second technique applied in the long history of GC. There are more
recent techniques such as generation scavenging which makes use of the
observance that most objects in a uniformly object-oriented language
like Ruby either live a very short, or a reasonably long life time
with the preponderance being short-lived. Generation scavenging
collects the short-lived objects very efficiently, and typically uses
mark-sweep less frequently to clean up the older ones.

--
Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Twitter: http://twitter.com/RickDeNatale
WWR: http://www.workingwithrails.com/person/9021-rick-denatale
LinkedIn: http://www.linkedin.com/in/rickdenatale
 
R

Ralph Shnelvar

Rick,

Ok ... but the point is that the objects are not gc'd when things go
out of scope but, instead, when Ruby needs memory, right?

And if you have a huge array with gobs of objects, the gc is gonna
take a while?

Ralph


Monday, November 9, 2009, 4:06:23 PM, you wrote:


RD> As Florian said, MRI ruby uses a mark-sweep GC algorithm, when it
RD> needs to GC it traces out references from a set of root objects, and
RD> marks any objects which are reachable recursively.

RD> Then it frees any objects which aren't marked.

RD> Reference counting might seem simpler, but it is expensive because the
RD> count needs to be maintained everytime a reference changes, and it
RD> can't reclaim cyclic garbage which can't be reached from a root:


RD> a -> b -> c
RD> ^ |
RD> *------------+

RD> Mark/Sweep is a fairly primitive GC technique, it was probably the
RD> second technique applied in the long history of GC. There are more
RD> recent techniques such as generation scavenging which makes use of the
RD> observance that most objects in a uniformly object-oriented language
RD> like Ruby either live a very short, or a reasonably long life time
RD> with the preponderance being short-lived. Generation scavenging
RD> collects the short-lived objects very efficiently, and typically uses
RD> mark-sweep less frequently to clean up the older ones.
 
F

Florian Gilcher

Hi,

You can manually kickstart the GC by using GC.start.

http://ruby-doc.org/core/classes/GC.html#M003682

Otherwise, AFAIK, the MRI GCs before breaking certain memory barriers,
but
don't quote me on that. (I think it does that by counting mallocs and
starting the GC once a certain number is hit)

This actually explains this behavior:

$ time ruby19 -e 'x=1<<63;Array.new(300_000*20) {1<<63}'
real 0m4.091s
user 0m3.430s
sys 0m0.361s

$ time ruby19 -e 'GC.disable; x=1<<63;Array.new(300_000*20) {1<<63}'

real 0m2.489s
user 0m1.845s
sys 0m0.565s

But be aware though, that this is not caused by the allocation
of the array:

$ time ruby19 -e 'x=1<<63;Array.new(300_000*20) {x}'

real 0m0.846s
user 0m0.735s
sys 0m0.049s

$ time ruby19 -e 'GC.disable; x=1<<63;Array.new(300_000*20) {x}'

real 0m0.778s
user 0m0.710s
sys 0m0.052s


So, before (knowingly) breaking those limits, it might be an option
to disable the GC. Handle with care, though.

Regards,
Florian
 
M

Marnen Laibow-Koser

Ralph said:
Rick,

Ok ... but the point is that the objects are not gc'd when things go
out of scope but, instead, when Ruby needs memory, right?

And if you have a huge array with gobs of objects, the gc is gonna
take a while?

Perhaps. But there's no reason to do this. Use a database. Even a
file-based DB like SQLite will make your life easier.

Best,
 
R

Ralph Shnelvar

Florian,

Again, I am a newbie but ...

What I _think_ you are saying is that the difference between:


$ time ruby19 -e 'x=1<<63;Array.new(300_000*20) {1<<63}'
real 0m4.091s

and

$ time ruby19 -e 'x=1<<63;Array.new(300_000*20) {x}'
real 0m0.846s


is that the first initializes 300_000*20 BigNums and the second initializes 300_000*20 references to a BigNum.

Ralph




Monday, November 9, 2009, 7:32:06 PM, you wrote:

FG> Hi,

FG> You can manually kickstart the GC by using GC.start.

FG> http://ruby-doc.org/core/classes/GC.html#M003682

FG> Otherwise, AFAIK, the MRI GCs before breaking certain memory barriers,
FG> but
FG> don't quote me on that. (I think it does that by counting mallocs and
FG> starting the GC once a certain number is hit)

FG> This actually explains this behavior:

FG> $ time ruby19 -e 'x=1<<63;Array.new(300_000*20) {1<<63}'
FG> real 0m4.091s
FG> user 0m3.430s
FG> sys 0m0.361s

FG> $ time ruby19 -e 'GC.disable; x=1<<63;Array.new(300_000*20) {1<<63}'

FG> real 0m2.489s
FG> user 0m1.845s
FG> sys 0m0.565s

FG> But be aware though, that this is not caused by the allocation
FG> of the array:

FG> $ time ruby19 -e 'x=1<<63;Array.new(300_000*20) {x}'

FG> real 0m0.846s
FG> user 0m0.735s
FG> sys 0m0.049s

FG> $ time ruby19 -e 'GC.disable; x=1<<63;Array.new(300_000*20) {x}'

FG> real 0m0.778s
FG> user 0m0.710s
FG> sys 0m0.052s


FG> So, before (knowingly) breaking those limits, it might be an option
FG> to disable the GC. Handle with care, though.

FG> Regards,
FG> Florian

 
R

Robert Klemme

Please do not top post.

2009/11/9 Ralph Shnelvar said:
Doesn't that just move things into a BiNum ... which is not native 64?

Well, define "native". I chose to interpret it as "does Ruby come
with built in, C implemented support for 64 integers" - which it does.
:)

Cheers

robert
 
R

Ralph Shnelvar

Robert,

Tuesday, November 10, 2009, 1:23:48 AM, you wrote:

RK> Please do not top post.

Yup, you are right. The culture here is to do it this way.

Thanks for pointing it out.

While I'm at it, does this list like/dislike HTML-based posts?


RK> Well, define "native". I chose to interpret it as "does Ruby come
RK> with built in, C implemented support for 64 integers" - which it does.
RK> :)

There is a footnote on page 829 of the 1.9 Pickaxe book that reads:

"Fixnums are stored as 31-bit numbers ... Or 63 bit on wider CPU
architectures."

Is there any way to tell what architecture Ruby thinks I'm using?

RK> Cheers

RK> robert
 
B

Brian Candler

Ralph said:
I will want/need to create an array of approximately 300K rows and 20
columns. The array will be an array of either 32-bit quantities or
64-bit quantities.

Does Ruby support native 64-bit quantities?

If you want to hold this all in RAM, then I think your most
space-efficient way would be to pack each row of 20 columns into a
String (of 80 or 160 bytes).

You can use Array#pack and String#unpack to convert when required.

val1 = 0x0102030405060708
val2 = 0xf0e0d0c0b0a09080
row = [val1, val2].pack("Q*")
p row.unpack("Q*")

Q* is for unsigned 64-bit. Use q* for signed 64, L* or l* for
unsigned/signed 32 bit. With 32-bit there are also options for using a
fixed byte ordering (little or big endian) rather than native byte
ordering.

Warning: take care if using ruby 1.9, because 1.9 Strings are of
"characters" and not necessarily of "bytes". This may become critical if
you decide to write these structures out to disk and read them back in
again. Details at
http://github.com/candlerb/string19/blob/master/string19.rb
 
F

Florian Gilcher

Ralph,

Well, that's a minor point. I just reused Roberts examples because
I expected that to be rather clear.

The thing is that an Array of size (300_000*20) already has as many
slots for references. So it's not "initializing references". It just
makes every field of the Array referencing that one BigNum. There is
no allocation or initialization happening in this step.

As you correctly stated, the first example allocs (300_000*20 + 1)
objects. The interesting point in this is that it triggers the
GC more than once, although there is nothing to collect.

Regards,
Florian

Florian,

Again, I am a newbie but ...

What I _think_ you are saying is that the difference between:


$ time ruby19 -e 'x=1<<63;Array.new(300_000*20) {1<<63}'
real 0m4.091s

and

$ time ruby19 -e 'x=1<<63;Array.new(300_000*20) {x}'
real 0m0.846s


is that the first initializes 300_000*20 BigNums and the second
initializes 300_000*20 references to a BigNum.

Ralph




Monday, November 9, 2009, 7:32:06 PM, you wrote:

FG> Hi,

FG> You can manually kickstart the GC by using GC.start.

FG> http://ruby-doc.org/core/classes/GC.html#M003682

FG> Otherwise, AFAIK, the MRI GCs before breaking certain memory
barriers,
FG> but
FG> don't quote me on that. (I think it does that by counting
mallocs and
FG> starting the GC once a certain number is hit)

FG> This actually explains this behavior:

FG> $ time ruby19 -e 'x=1<<63;Array.new(300_000*20) {1<<63}'
FG> real 0m4.091s
FG> user 0m3.430s
FG> sys 0m0.361s

FG> $ time ruby19 -e 'GC.disable; x=1<<63;Array.new(300_000*20)
{1<<63}'

FG> real 0m2.489s
FG> user 0m1.845s
FG> sys 0m0.565s

FG> But be aware though, that this is not caused by the allocation
FG> of the array:

FG> $ time ruby19 -e 'x=1<<63;Array.new(300_000*20) {x}'

FG> real 0m0.846s
FG> user 0m0.735s
FG> sys 0m0.049s

FG> $ time ruby19 -e 'GC.disable; x=1<<63;Array.new(300_000*20)
{x}'

FG> real 0m0.778s
FG> user 0m0.710s
FG> sys 0m0.052s


FG> So, before (knowingly) breaking those limits, it might be an
option
FG> to disable the GC. Handle with care, though.

FG> Regards,
FG> Florian

--
Florian Gilcher

smtp: (e-mail address removed)
jabber: (e-mail address removed)
gpg: 533148E2
 
R

Robert Klemme

2009/11/10 Florian Gilcher said:
The thing is that an Array of size (300_000*20) already has as many
slots for references. So it's not "initializing references". It just
makes every field of the Array referencing that one BigNum. There is
no allocation or initialization happening in this step.

Right. Only the single BigNum and the Array need to be allocated.
As you correctly stated, the first example allocs (300_000*20 + 1)
objects. The interesting point in this is that it triggers the
GC more than once, although there is nothing to collect.

Well, but this is expected: the second example has more need for
memory (every time a new BigNum must be created), and if memory is
needed it is reasonable to first start GC before asking the OS for
more memory. GC cannot know that it will not find anything
collectible.

Kind regards

robert
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,904
Latest member
HealthyVisionsCBDPrice

Latest Threads

Top