Gavin Sinclair
Folks,
As an intermediate step in a small software system that performs a large
amount of data gathering, I am using Marshal to store processed results,
so that other programs can use this information.
At first, this was an excellent solution: couldn't be easier, and it
demonstrated good performance. However, as I am collecting more and more
data, the period of time required to serialise the data to disk is
ballooning, seemingly quadratically. The current size of my data is: an
array with 10000 moderately-sized objects. The size on disk is only
around 900K, and it took at least 10 minutes to write it.
Part of the problem is my approach:
- read the marshalled data
- gather some more data into the array
- write the marshalled data
Each iteration produces a few thousand more data elements, but obviously
has to write _all_ of the data back to disk.
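The cycle described above can be sketched roughly as follows. This is a minimal illustration, not the actual program: the Record struct, the file path argument, and the method names are hypothetical placeholders. Files are opened in binary mode, which matters on Cygwin/Windows.

```ruby
# Hypothetical record type standing in for the "moderately-sized objects".
Record = Struct.new(:id, :payload)

# Read the previously marshalled array, or start empty on the first run.
def load_results(path)
  return [] unless File.exist?(path)
  File.open(path, "rb") { |f| Marshal.load(f) }
end

# Rewrite the whole array; the cost grows with the total size,
# not with the size of the newly gathered batch.
def save_results(path, results)
  File.open(path, "wb") { |f| Marshal.dump(results, f) }
end

# One iteration: read everything, append the new batch, write everything back.
# Returns the total number of stored records.
def run_iteration(path, new_records)
  results = load_results(path)
  results.concat(new_records)
  save_results(path, results)
  results.size
end
```

Because every call to run_iteration rewrites the full array, the total work across many iterations grows much faster than the amount of new data gathered.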
Can anyone suggest a better approach to storing the data? I am open to
all suggestions, but I am hoping for a very easy solution. At work, where
the use of Ruby is not admired, I don't want to expand the software
dependencies if I can avoid it. I don't have much time, either, so I need
a simple solution.
This is using ruby 1.6.5 (I know I'm bad...) on Cygwin.
Thanks,
Gavin