multi-threaded access to shared memory space

Discussion in 'Ruby' started by Greg Willits, Jun 30, 2008.

  1. Greg Willits

    Greg Willits Guest

    I have a pure Ruby project (no Rails) where I would like multiple
    "tasks" (ruby processes more or less) to run in parallel (collectively
    taking advantage of multiple CPU cores) while accessing a shared memory
    space of data structures.

    OK, that's a mouthful.

    - single machine, multiple cores (4 or 8)

    - step one: pre-load a number of arrays and hashes (could be a couple GB
    worth in total) into memory

    - step two: launch several independent Ruby scripts to search and read
    from the data pool in order to aggregate data in new sets to be written
    to text files.

    Ruby 1.8's threading would seem poorly suited to this. Can 1.9 run
    multiple threads each accessing the same RAM-space while using all cores
    of the machine?

    I've looked at memcache. It seems it could store and retrieve one of my
    pool's arrays, but it cannot look inside that array and retrieve just a
    single row of it? It would want to return the whole array, yes? (Not
    good if that array is 100MB.)

    -- gw
    --
    Posted via http://www.ruby-forum.com/.
    Greg Willits, Jun 30, 2008
    #1

  2. Eric Hodel

    Eric Hodel Guest

    On Jun 29, 2008, at 23:58 PM, Greg Willits wrote:
    > Ruby 1.8's threading would seem poorly suited to this. Can 1.9 run
    > multiple threads each accessing the same RAM-space while using all
    > cores of the machine?


    At present, 1.9 has a global VM lock, so only one C thread can be
    running Ruby code at a time.

    > I've looked at memcache, but it seems like it could store and retrieve
    > one of my pool's arrays, but it cannot look inside that array and
    > retrieve just a single row of it? It would want to return the whole
    > array, yes? (not good if that array is 100MB).


    memcache is just a cache and is not designed to be used as a persistent
    store. It may lose your data if you are not careful.

    You're probably looking for something like mmap and several forked,
    cooperative processes.
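    A minimal sketch of the forked-processes half of that suggestion, assuming a Unix-like OS (Ruby's `fork` is not available on Windows) and a small in-memory stand-in for the multi-GB pool. `POOL`, `slices` and `run_workers` are hypothetical names, and keeping rows with even ids stands in for the real aggregation. The parent builds the data once and then forks; each child reads the inherited structures through copy-on-write pages instead of receiving a copy, and writes its own output file so nothing large has to travel back to the parent. How much memory really stays shared depends on the interpreter, since a garbage collector that writes into object pages forces those pages to be copied.

```ruby
# Hypothetical stand-in for the preloaded pool: [id, label] rows.
POOL = Array.new(1_000) { |i| [i, "row-#{i}"] }.freeze

# Split the indices 0...size into at most `workers` contiguous ranges.
def slices(size, workers)
  return [] if size.zero?
  per = (size + workers - 1) / workers
  (0...size).step(per).map { |start| start...[start + per, size].min }
end

# Fork one child per slice; each child filters its rows and writes its own file.
# Returns one Process::Status per child.
def run_workers(pool, workers, out_dir)
  pids = slices(pool.size, workers).each_with_index.map do |range, n|
    fork do
      # `pool` was built before the fork, so the child reads the parent's
      # pages (copy-on-write) rather than receiving a copy of the data.
      File.open(File.join(out_dir, "part-#{n}.txt"), "w") do |out|
        range.each { |i| out.puts(pool[i][1]) if pool[i][0].even? }
      end
      exit!(0) # leave without running the parent's at_exit handlers
    end
  end
  # Wait for every child and collect its exit status.
  pids.map { |pid| Process.wait2(pid).last }
end
```

    With a scratch directory from `Dir.mktmpdir` (after `require "tmpdir"`), `run_workers(POOL, 4, dir)` would leave `part-0.txt` through `part-3.txt` in that directory.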
    Eric Hodel, Jun 30, 2008
    #2

  3. Tim Pease

    Tim Pease Guest

    On Jun 30, 2008, at 12:58 AM, Greg Willits wrote:

    > I have a pure Ruby project (no Rails) where I would like multiple
    > "tasks" (ruby processes more or less) to run in parallel (collectively
    > taking advantage of multiple CPU cores) while accessing a shared memory
    > space of data structures.


    Take a look at mmap

    <http://raa.ruby-lang.org/project/mmap/>

    Blessings,
    TwP
    Tim Pease, Jun 30, 2008
    #3
  4. On 30 Jun 2008, at 07:58, Greg Willits wrote:
    > I have a pure Ruby project (no Rails) where I would like multiple
    > "tasks" (ruby processes more or less) to run in parallel (collectively
    > taking advantage of multiple CPU cores) while accessing a shared memory
    > space of data structures.


    If you want to stay in pure Ruby, take a look at DRb and Rinda. Even
    if they are not directly applicable, they should give you some inspiration.
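    A minimal DRb sketch of that idea, assuming the `drb` library distributed with Ruby. `DataPool`, `start_pool_server` and `fetch_row` are hypothetical names. One process owns the preloaded tables and exposes a method that returns a single row, which speaks to the memcache concern in the original post: a client asks for `row(table, index)` and only that row is marshalled back, not the whole array.

```ruby
require "drb/drb"

# Hypothetical front object: owns the preloaded tables and hands out
# single rows, so a client never pulls a whole 100MB array over DRb.
class DataPool
  def initialize(tables)
    @tables = tables # e.g. { "customers" => [[1, "Ann"], [2, "Bob"]] }
  end

  # Return one row; only this row is marshalled back to a remote caller.
  def row(table, index)
    @tables.fetch(table)[index]
  end

  def size(table)
    @tables.fetch(table).size
  end
end

# Server side: publish the pool. Port 0 asks the OS for a free port;
# DRb.uri then reports the URI that clients should connect to.
def start_pool_server(tables, uri = "druby://localhost:0")
  DRb.start_service(uri, DataPool.new(tables))
  DRb.uri
end

# Client side (e.g. in each worker process): ask only for the row you need.
def fetch_row(uri, table, index)
  DRbObject.new_with_uri(uri).row(table, index)
end
```

    In practice the server would run in its own long-lived process on a fixed URI (for example `druby://localhost:8787`, an arbitrarily chosen port), and each worker script would call `fetch_row` with that URI. The server still answers requests inside a single Ruby process, so the parallelism comes from the client processes doing the aggregation.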


    Ellie

    Eleanor McHugh
    Games With Brains
    http://slides.games-with-brains.net
    ----
    raise ArgumentError unless @reality.responds_to? :reason
    Eleanor McHugh, Jun 30, 2008
    #4
  5. Greg Willits wrote:
    > Ruby 1.8's threading would seem poorly suited to this. Can 1.9 run
    > multiple threads each accessing the same RAM-space while using all cores
    > of the machine?


    No, but JRuby's threads can.
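    A sketch of a thread-based version of step two, assuming the data is already loaded into one array of hashes with a hypothetical `:amount` field; `parallel_sum` is likewise a hypothetical name. The code is ordinary Ruby that runs on any implementation: on JRuby the threads can execute on separate cores, while on MRI 1.9 the global VM lock mentioned earlier in this thread lets only one of them run Ruby code at a time.

```ruby
# Split a shared, read-only array into index ranges and let each thread
# aggregate its own range. Each thread returns a partial result.
def parallel_sum(data, threads)
  return 0 if data.empty?
  per = (data.size + threads - 1) / threads
  workers = (0...data.size).step(per).map do |start|
    stop = [start + per, data.size].min
    # Every thread reads a different slice of the same array object.
    Thread.new(start, stop) do |from, to|
      (from...to).sum { |i| data[i][:amount] }
    end
  end
  # Thread#value joins each thread and returns its block's result.
  workers.sum(&:value)
end
```

    Because the threads only read the shared array and each returns its own partial sum, no `Mutex` is needed; writing to shared structures from several threads would need one.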

    - Charlie
    Charles Oliver Nutter, Jul 1, 2008
    #5
  6. ara.t.howard

    ara.t.howard Guest

    On Jun 30, 2008, at 12:58 AM, Greg Willits wrote:

    > I have a pure Ruby project (no Rails) where I would like multiple
    > "tasks" (ruby processes more or less) to run in parallel (collectively
    > taking advantage of multiple CPU cores) while accessing a shared memory
    > space of data structures.


    tim is right i think, mmap is a great approach. i've used the
    following paradigm many times for processing large datasets:

    mmap in the file
    decide the chunk size
    fork n processes working on each chunk

    because the mmap is carried across the fork, you don't do any data
    copying. actually, the memory won't even be paged in until the
    children read it.

    this is really ideal if the children can write the output - in
    other words, if the children don't have to return data to the parent,
    since returning a huge chunk of data can be expensive.
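    A stdlib-only sketch of that paradigm. It swaps the mmap step for plain positioned reads (`File.binread` with a length and an offset), so each child reads only its own byte range of the input instead of sharing a mapping. `line_aligned_chunks` and `process_file` are hypothetical names, and writing one line count per chunk stands in for real work. It assumes a Unix-like OS for `fork` and newline-delimited records, so each chunk boundary is pushed forward to the next newline to keep records whole.

```ruby
# Compute [offset, length] pairs covering the file, each ending on a
# newline so that no record is split between two children.
def line_aligned_chunks(path, n)
  size = File.size(path)
  target = [(size + n - 1) / n, 1].max
  chunks = []
  File.open(path, "rb") do |f|
    start = 0
    while start < size
      stop = start + target
      if stop < size
        f.seek(stop)
        f.gets # move just past the next newline (or to EOF)
        stop = f.pos
      else
        stop = size
      end
      chunks << [start, stop - start]
      start = stop
    end
  end
  chunks
end

# Fork one child per chunk; each writes its result to its own file, so
# nothing large has to be returned to the parent.
def process_file(path, n, out_dir)
  pids = line_aligned_chunks(path, n).each_with_index.map do |(offset, length), i|
    fork do
      lines = File.binread(path, length, offset).lines
      File.write(File.join(out_dir, "chunk-#{i}.txt"), "#{lines.size}\n")
      exit!(0) # leave without running the parent's at_exit handlers
    end
  end
  pids.map { |pid| Process.wait2(pid).last }
end
```

    The chunk size here is simply the file size divided by the number of children; as noted below, if the job turns out to be IO bound, more children will not necessarily help.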

    you might easily end up being IO bound and not CPU bound - in the
    similar processing i've done, i've often found that the work scales
    best with the number of disk controllers, not the number of cpus -
    something worth considering.

    another approach to consider is to put all the input (or pathnames to
    it) into an sqlite database and then launch processes to work on it.
    this may not seem sexy, but it has some huge advantages: namely, you'll
    be able to maintain state across runs, which allows you to make
    programming errors and still make forward progress. this isn't
    glamorous, but it's very powerful, as it allows incremental development
    and even coordination of ruby with other languages - like c.

    one last suggestion, if you have a stack of linux machines available:

    . install rq
    . submit a bunch of jobs that process a chunk of data

    go home for the day ;-)

    with rq you should be able to set up a linux cluster in a few minutes
    and just submit a slow ruby script to 10 machines running 4 jobs each,
    no problem. you could also use rq on an 8-core machine to manage the
    jobs for you.

    food for thought.

    ref:

    http://www.linuxjournal.com/article/7922
    http://codeforpeople.com/lib/ruby/rq/rq-3.1.0/README
    (rq 3.4.0 has a bug in it so use 3.1 if you decide to try that route)

    a @ http://codeforpeople.com/
    --
    we can deny everything, except that we have the possibility of being
    better. simply reflect on that.
    h.h. the 14th dalai lama
    ara.t.howard, Jul 1, 2008
    #6