Speed Freak

Dave Bee

This is a conceptual question rather than a specific coding one, but
hopefully someone has played around with something similar. In a
nutshell, I have around 10 million information entries, each with lots
of data points. My current script has two stages: the first organises
certain data points into some large (huge) hashes, and the second
forks off lots of children and does the subsequent processing to
produce LDIFs, using the information in the hashes. (Thanks to
copy-on-write, and the fact that the children don't need to update the
hashes, this doesn't use a great deal of memory.)
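
For clarity, the stage-two pattern I mean is roughly the following.
This is a simplified, self-contained sketch rather than my actual
code, with stand-in data and a made-up sharding rule; the point is
that the children inherit the hashes at fork time and only ever read
them:

#!/usr/bin/perl
use strict;
use warnings;

# --- Stage one (single process): build the big lookup hashes ---
# Stand-in data; the real script derives this from ~10M entries.
my %lookup = map { ("id$_" => "data$_") } 1 .. 100_000;

# --- Stage two: fork workers that read the hashes via copy-on-write ---
my $workers = 8;
my @pids;
for my $w (0 .. $workers - 1) {
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {                                  # child
        open my $out, '>', "out-$w.ldif" or die $!;
        for my $id (sort keys %lookup) {
            # Cheap deterministic shard: checksum the key, mod worker count.
            next unless unpack('%32C*', $id) % $workers == $w;
            print {$out} "dn: uid=$id\nnote: $lookup{$id}\n\n";   # fake LDIF
        }
        close $out;
        exit 0;    # child exits without ever writing to %lookup
    }
    push @pids, $pid;
}
waitpid($_, 0) for @pids;    # parent reaps all the children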

My current problem is with stage one. It is, by current necessity, a
single process, since it needs to refer to information already in the
hashes as it builds them, and that single process is the choke point.
I would like to cut down the time the first stage takes (~50 minutes),
and I am at liberty to use any interesting techniques to do so. My
hardware is somewhat above spec (24-CPU 6800, 48G RAM, etc.) and can
be dedicated 100% to the script when it runs, so unusual and
incredibly memory- or CPU-wasteful techniques are more than welcome.

I've thought of threading (no real experience, but I could probably
figure something out), a parent hash-controller with multiple forked
children, etc. (Something like the sketch below is what I have in mind
for the latter.) I'm just curious whether anyone has done something
similar and already knows the most efficient way of doing this.
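
To be concrete about the hash-controller idea, this is roughly what
I'm picturing: shard the input across forked children, let each build
a partial hash, serialise the partials back to the parent (Storable
here), and merge them there. Everything below is stand-in names and
data, and it only helps if the lookups during the build stay within a
shard, which I'd still have to verify for my data:

#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(store retrieve);

my $workers = 8;

# Stand-in input; in reality each child would read its slice of the
# 10M entries straight from disk rather than inheriting an array.
my @entries = map { "entry$_" } 1 .. 80_000;

my @pids;
for my $w (0 .. $workers - 1) {
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {                                  # child
        my %partial;
        for my $i (0 .. $#entries) {
            next unless $i % $workers == $w;          # this child's shard
            $partial{ $entries[$i] } = length $entries[$i];   # stand-in work
        }
        store \%partial, "partial-$w.sto";            # hand results back
        exit 0;
    }
    push @pids, $pid;
}
waitpid($_, 0) for @pids;

# Merge the partial hashes; this becomes the new serial section,
# so the approach only wins if building dominates merging.
my %lookup;
for my $w (0 .. $workers - 1) {
    my $part = retrieve("partial-$w.sto");
    @lookup{ keys %$part } = values %$part;
    unlink "partial-$w.sto";
}
print scalar(keys %lookup), " keys merged\n";

The merge at the end is the new serial step, so I'd only expect a win
if building the partials dominates merging them.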

Dave
 
