Manipulating big hunks of data

Seebs

Is there any kind of container-like-thing in Ruby for the special case
of a large number of fixed-type-datums? (As opposed to objects.)

Basically, imagine that I have, for instance, a huge matrix of data.
So far as I can tell, that really means a huge selection of individual
objects, all allocated possibly using separate chunks of memory, because
they all have to be, well, objects -- they can't just be raw data. Obviously,
using any given item in the matrix is convenient if they're all already
objects, but the storage looks like it'd be ridiculously large.

Is this just not idiomatic in Ruby? Is there some base class or object
type I haven't spotted yet which handles cases like this?

-s
 
Bill Kelly

Seebs said:
Is there any kind of container-like-thing in Ruby for the special case
of a large number of fixed-type-datums? (As opposed to objects.)

Basically, imagine that I have, for instance, a huge matrix of data.
So far as I can tell, that really means a huge selection of individual
objects, all allocated possibly using separate chunks of memory, because
they all have to be, well, objects -- they can't just be raw data. Obviously,
using any given item in the matrix is convenient if they're all already
objects, but the storage looks like it'd be ridiculously large.

Is this just not idiomatic in Ruby? Is there some base class or object
type I haven't spotted yet which handles cases like this?

Not sure how closely these fit your requirements, but a couple
seeming possibilities that come to mind are Guy Decoux' mmap
module, and Tokyo Cabinet's array-of-fixed-length-elements database
option:

http://github.com/knu/ruby-mmap

http://1978th.net/tokyocabinet/
http://1978th.net/tokyocabinet/rubydoc/


Hope this helps,

Bill
 
James Edward Gray II

...and Tokyo Cabinet's array-of-fixed-length-elements database
option:

http://github.com/knu/ruby-mmap

http://1978th.net/tokyocabinet/
http://1978th.net/tokyocabinet/rubydoc/

I wrote a bit about Tokyo Cabinet's Fixed-length Database recently, in
case it helps:

http://blog.grayproductions.net/articles/tokyo_cabinets_keyvalue_database_types

I kept thinking of the fantastic NArray library while reading the
initial message, but it's just for in memory work.

James Edward Gray II
 
Seebs

I kept thinking of the fantastic NArray library while reading
the initial message, but it's just for in memory work.

That'd be fine in my case. I'm sort of messing with thoughts about
doing a roguelike game, and somewhere in there, there's nearly always
a level grid, which is typically a fairly large array of something...
But it's large enough, and regenerated/reused/etc. enough, that having
thousands upon thousands of objects created and destroyed when messing
with it feels inefficient to me.

Disclaimer: My sense of what kinds of tasks are "too inefficient" was
developed back when a 5MHz system was intended for time sharing among
many users.

But it's nice to know that a solution for this problem exists -- if I
need something like that, NArray would solve the cases I most often have
to deal with.

-s
 
Robert Klemme

2010/1/17 Seebs said:
Is there any kind of container-like-thing in Ruby for the special case
of a large number of fixed-type-datums? (As opposed to objects.)

Basically, imagine that I have, for instance, a huge matrix of data.
So far as I can tell, that really means a huge selection of individual
objects, all allocated possibly using separate chunks of memory, because
they all have to be, well, objects -- they can't just be raw data. Obviously,
using any given item in the matrix is convenient if they're all already
objects, but the storage looks like it'd be ridiculously large.

Is this just not idiomatic in Ruby? Is there some base class or object
type I haven't spotted yet which handles cases like this?

Since everything is an object in Ruby (well, almost) I am not sure
what to make of your distinction. What I often do in these situations
is this

unifier = Hash.new {|h,k| h[k.freeze] = k}
# read or create lots of stuff
item = unifier[item]

That way you keep at least only one version of a set of equivalent
objects. This helps of course only if you have repetitive values.
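
As a small illustration of the idiom above (the sample strings are made up for this sketch), equal values that were allocated separately end up sharing one frozen object:

```ruby
# Canonicalizing hash: the first time a value is seen it is frozen and stored;
# later lookups with an equal value return that first object.
unifier = Hash.new { |h, k| h[k.freeze] = k }

# Hypothetical input: three separately allocated strings, two of them equal.
raw = ["wall".dup, "floor".dup, "wall".dup]

items = raw.map { |item| unifier[item] }

puts items[0].equal?(items[2])  # identity check, not just equal content
puts unifier.size               # number of distinct values kept
```

The duplicate "wall" string is no longer referenced after the map, so the garbage collector is free to reclaim it.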

Depending on your use case another approach could be to store in a
single (or multiple) Strings and use #pack and #unpack. I guess this
works best if your data is uniformly sized.
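
A minimal sketch of the String-based approach, assuming every element is a native 8-byte double. The `PackedDoubles` class and its method names are invented here for illustration and do not come from any library:

```ruby
# Stores doubles back to back in one binary String instead of keeping one
# Float object per element. PackedDoubles is a hypothetical helper class.
class PackedDoubles
  BYTES = 8  # size of one native double as written by the "d" pack directive

  attr_reader :count

  def initialize(count)
    @count = count
    @data = "\0".b * (count * BYTES)  # all-zero bytes decode as 0.0
  end

  # Decode the 8 bytes at slot i back into a Float.
  def [](i)
    check(i)
    @data.byteslice(i * BYTES, BYTES).unpack("d").first
  end

  # Overwrite the 8 bytes at slot i with the packed value.
  def []=(i, value)
    check(i)
    @data[i * BYTES, BYTES] = [value].pack("d")
  end

  def bytesize
    @data.bytesize
  end

  private

  def check(i)
    raise IndexError, "index #{i} out of range" unless (0...@count).cover?(i)
  end
end

grid = PackedDoubles.new(1_000)
grid[42] = 3.25
puts grid[42]
puts grid.bytesize
```

Each read still creates a temporary substring, so this trades access speed for compact storage; it fits best when the data is large and touched sparingly.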

Btw, what volume of data are we talking about?

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
 
Seebs

Btw, what volume of data are we talking about?

Well, as an example, say I were doing number-crunching, and I wanted to
have a block of, say, twenty million doubles.

It looks like NArray is the right tool for the job -- it can give me
array-like semantics on things which have the behavior of doubles, but
without me having to keep 20M objects wrapping my 20M double values.

-s
 
