Martin Pirker
Hi...
short version:
What's your preferred way to handle large data sets used by Ruby scripts?
long version:
Some time ago I had a problem where Ruby's text munging capabilities
helped a lot. Unfortunately, the data set was quite large: an array
of ~40000 entries, where every entry is itself an array of ~8 data
points (String, Int, ... whatever).
All in all, the final solution ran in ~2 min and iterated several
times over the whole data set: a brute-force approach.
What surprised me: a first "more efficient" implementation, which took
"notes" in temporary structures while munging and so required fewer full
iterations, was much slower.
Now I have to do something similar again, but the data set will have an
entry count in the upper six digits.
Brute force again?
Is the GC the main speed problem with temporary data?
Is the GC suited for hundreds of MB of data at all?
Would "mixing in" an external database be faster?
....
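One rough way to check whether the GC dominates with temporary data is to count GC runs and compare timings with the GC enabled and disabled. Below is a minimal sketch, assuming a hypothetical data set of 100_000 rows with 8 fields and a hypothetical `build_temp_notes` helper that mimics the "take notes while munging" approach; it uses only core `GC` methods (`GC.count`, `GC.disable`, `GC.enable`) and the standard `benchmark` library.

```ruby
require "benchmark"

# Hypothetical helper: group rows by their first field, copying each row.
# row.dup allocates a new array per row; all of it becomes garbage once
# the method returns and the notes hash is no longer referenced.
def build_temp_notes(rows)
  notes = {}
  rows.each do |row|
    (notes[row[0]] ||= []) << row.dup
  end
  notes.size
end

# Hypothetical data set: 100_000 rows of 8 fields (strings and integers).
rows = Array.new(100_000) { |i| ["key#{i % 1000}", i, i * 2, "x" * 8, 1, 2, 3, 4] }

gc_before = GC.count
with_gc = Benchmark.realtime { build_temp_notes(rows) }
gc_runs = GC.count - gc_before

GC.disable
begin
  without_gc = Benchmark.realtime { build_temp_notes(rows) }
ensure
  GC.enable # always re-enable, even if the workload raises
end

puts "GC runs during workload: #{gc_runs}"
puts format("with GC: %.3fs, GC disabled: %.3fs", with_gc, without_gc)
```

The comparison is only indicative: the second run also benefits from warm caches, and disabling the GC is not an option for data sets that do not fit in memory.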
Is there a reference, for those of us who don't dive into the source, on
how Ruby's basic data structures (Array, Hash, String, ...) perform on
basic operations: insert, delete, append, search, ...?
Each could be tested one by one, but if I knew that some of them ran in
O(n^2), I wouldn't even try them.
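Testing "one by one" can be as simple as timing the same operation at growing sizes and watching how the time scales. Below is a minimal sketch for membership search, assuming a hypothetical `time_lookups` helper and arbitrary sizes: `Array#include?` is a linear scan, while `Set#include?` (backed by a Hash) is constant time on average, so the gap should widen roughly in proportion to n.

```ruby
require "benchmark"
require "set"

# Hypothetical helper: time `lookups` membership tests on a collection of size n.
def time_lookups(n, lookups = 200)
  array = (0...n).to_a
  set = array.to_set # Set stores its elements in a Hash
  probes = Array.new(lookups) { |i| (i * 7919) % n } # all probes are present

  t_array = Benchmark.realtime { probes.each { |x| array.include?(x) } } # linear scan
  t_set   = Benchmark.realtime { probes.each { |x| set.include?(x) } }   # hash lookup
  [t_array, t_set]
end

[10_000, 100_000, 1_000_000].each do |n|
  t_array, t_set = time_lookups(n)
  puts format("n=%-9d Array#include?: %.4fs  Set#include?: %.4fs", n, t_array, t_set)
end
```

The same pattern (fixed number of operations, growing n) works for insert, delete, or append; time that grows faster than n hints at a quadratic algorithm hiding in a loop.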
Any experiences to share?
Martin