A Real World example for Ruby to "compiled" version discussion

Discussion in 'Ruby' started by Philip Rhoades, Oct 6, 2010.

  1. People,

    The "how can we make a ruby compiler" thread has been very interesting -
    I really like hearing from serious and competent programmers about the
    theoretical problems involved with this issue.

    William Rutiser asked for an expansion on the details of my C/C++
    population genetics simulation program as a specific example of how one
    might proceed depending on a particular situation. I am happy to
    elaborate - not least because it would be good to get input from
    experienced Ruby programmers before I just try to replicate the same
    program in Ruby - I'm sure there will be more sensible/efficient ways of
    doing things than what I would attempt first off and so comparisons
    between my C/C++ version and a dodgy Ruby program might be even more
    unfair . .

    I will summarise the C/C++ program as it exists now (it has gone through
    a number of versions and has a lot of code that does not need
    replicating for the present comparison) with a general overview (I can
    add more detail later if people are interested):

    - A population is represented by a number of sub-populations which
    occupy cells of a two dimensional array

    - each cell has pointers to three ordered lists - representing the
    parental population, the offspring population and a temporary population
    and each element of a list represents an individual

    - at each generation (of potentially hundreds), the whole array is
    iterated through, new offspring are produced, migrants move to adjacent
    cells, parents die off etc

    As well as this main simulation program, I have already replaced all the
    original shell scripts and some of the statistical processing with Ruby
    scripts but the main simulation program is where I couldn't afford an
    order or two increase in running times by rewriting in Ruby.

    The main problem I see with some of the Ruby conversions that I have
    looked at (eg RubyInline) is that the performance problem comes in in
    repeating the WHOLE simulation with different starting parameters which
    is done thousands of times - so it is not like you have a single
    recursive algorithm which is a bottleneck that can be optimised or
    rewritten in C or something. There are lots of little steps that happen
    millions of times . .

    I hope that is sort of clear?

    Regards,

    Phil.
    --
    Philip Rhoades

    GPO Box 3411
    Sydney NSW 2001
    Australia
    E-mail:
     
    Philip Rhoades, Oct 6, 2010
    #1
    1. Advertising

  2. Philip Rhoades

    Ryan Davis Guest

    On Oct 6, 2010, at 09:55 , Philip Rhoades wrote:

    > The main problem I see with some of the Ruby conversions that I have =

    looked at (eg RubyInline) is that the performance problem comes in in =
    repeating the WHOLE simulation with different starting parameters which =
    is done thousands of times - so it is not like you have a single =
    recursive algorithm which is a bottleneck that can be optimised or =
    rewritten in C or something. There are lots of little steps that happen =
    millions of times . .

    Could you expand on this??
     
    Ryan Davis, Oct 7, 2010
    #2
    1. Advertising

  3. On Thu, Oct 7, 2010 at 2:57 AM, Ryan Davis <> wrote=
    :
    >
    > On Oct 6, 2010, at 09:55 , Philip Rhoades wrote:
    >
    >> The main problem I see with some of the Ruby conversions that I have loo=

    ked at (eg RubyInline) is that the performance problem comes in in repeatin=
    g the WHOLE simulation with different starting parameters which is done tho=
    usands of times - so it is not like you have a single recursive algorithm w=
    hich is a bottleneck that can be optimised or rewritten in C or something. =
    =A0There are lots of little steps that happen millions of times . .
    >
    > Could you expand on this??


    Specifically

    - What properties (conceptually) does an individual / a population have?

    - What calculations are done for each individual / population?

    - What kind of iteration is applied?

    Kind regards

    robert

    --=20
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
     
    Robert Klemme, Oct 7, 2010
    #3
  4. Philip Rhoades wrote:
    > People,
    >
    > The "how can we make a ruby compiler" thread has been very interesting
    > - I really like hearing from serious and competent programmers about
    > the theoretical problems involved with this issue.
    >
    > William Rutiser asked for an expansion on the details of my C/C++
    > population genetics simulation program as a specific example of how
    > one might proceed depending on a particular situation. I am happy to
    > elaborate - not least because it would be good to get input from
    > experienced Ruby programmers before I just try to replicate the same
    > program in Ruby - I'm sure there will be more sensible/efficient ways
    > of doing things than what I would attempt first off and so comparisons
    > between my C/C++ version and a dodgy Ruby program might be even more
    > unfair . .
    >
    > I will summarise the C/C++ program as it exists now (it has gone
    > through a number of versions and has a lot of code that does not need
    > replicating for the present comparison) with a general overview (I can
    > add more detail later if people are interested):
    >
    > - A population is represented by a number of sub-populations which
    > occupy cells of a two dimensional array
    >
    > - each cell has pointers to three ordered lists - representing the
    > parental population, the offspring population and a temporary
    > population and each element of a list represents an individual
    >
    > - at each generation (of potentially hundreds), the whole array is
    > iterated through, new offspring are produced, migrants move to
    > adjacent cells, parents die off etc
    >
    > As well as this main simulation program, I have already replaced all
    > the original shell scripts and some of the statistical processing with
    > Ruby scripts but the main simulation program is where I couldn't
    > afford an order or two increase in running times by rewriting in Ruby.
    >
    > The main problem I see with some of the Ruby conversions that I have
    > looked at (eg RubyInline) is that the performance problem comes in in
    > repeating the WHOLE simulation with different starting parameters
    > which is done thousands of times - so it is not like you have a single
    > recursive algorithm which is a bottleneck that can be optimised or
    > rewritten in C or something. There are lots of little steps that
    > happen millions of times . .
    >
    > I hope that is sort of clear?
    >
    > Regards,
    >
    > Phil.

    Some questions with my guesses at possible answers:

    How much data is carried for each individual? a few alleles? a lot? the
    whole human genome?

    What is done in the innermost loop? I imagine you select a pair of
    individuals, cross them, etc to produce more individuals. Or perhaps an
    individual dies or migrated to another population.

    What do you do to run a new experiment? generate the initial population,
    specify the rules for combination, depth, migration, etc? gather data
    and process it at the end?

    How much code do you need to write or rewrite for an experiment? How
    many parameters are involved?

    Can you give some examples of aspects of the C++ program that you hope
    to improve by moving the sim to Ruby?

    Can you say something about the context of your modeling? How is your
    simulator different from others used in the field?

    -- Bill
     
    William Rutiser, Oct 7, 2010
    #4
  5. People,

    I have accumulated the questions from different people into this response:


    On 2010-10-07 23:12, Robert Klemme wrote:
    > On Thu, Oct 7, 2010 at 2:57 AM, Ryan Davis<>
    > wrote:
    >>
    >> On Oct 6, 2010, at 09:55 , Philip Rhoades wrote:
    >>
    >>> The main problem I see with some of the Ruby conversions that I
    >>> have looked at (eg RubyInline) is that the performance problem
    >>> comes in in repeating the WHOLE simulation with different
    >>> starting parameters which is done thousands of times - so it is
    >>> not like you have a single recursive algorithm which is a
    >>> bottleneck that can be optimised or rewritten in C or something.
    >>> There are lots of little steps that happen millions of times . .

    >>
    >> Could you expand on this??

    >
    > Specifically
    >
    > - What properties (conceptually) does an individual / a population
    > have?



    This is an Individual Based Model (IBM) as opposed to populations that
    can be modelled as a whole with equations etc - so the simulation
    reflects this. In the C/C++ version an individual organism is an object
    with a number of attributes: ID, parent ID(s), (microsatellite) allele
    repeat length, randomizer int etc


    > - What calculations are done for each individual / population?



    The genetics of the population changes as the individuals change because
    of mutations which affect repeat length of microsatellites ie they are
    not selected against/for (neutral mutations).


    > - What kind of iteration is applied?



    For this particular discussion:

    for k in 1..K # generations
    for x in 1..X # array width
    for y in 1..Y # array height
    destroy parent ordered list
    destroy tmp ordered list
    offspring list becomes parent list
    for i in 1..I # organism in parent ordered list
    asexually reproduce offspring with possible mutation
    possibly migrate offspring to neighbouring array cell
    end
    end
    end
    end

    data is written to disk after each generation and genetics statistics
    are done later.


    > How much data is carried for each individual? a few alleles? a lot?
    > the whole human genome?



    For this exercise, one loci and allele (asexual reproduction) - multiple
    loci can be modelled by multiple runs.


    > What is done in the innermost loop? I imagine you select a pair of
    > individuals, cross them, etc to produce more individuals. Or perhaps
    > an individual dies or migrated to another population.



    Asexual reproduction in this case and see above.


    > What do you do to run a new experiment? generate the initial
    > population, specify the rules for combination, depth, migration, etc?
    > gather data and process it at the end?



    A script calls the simulation program hundreds of times with different
    starting populations. These starting populations are the output of the
    same simulation program which has previously been run, starting with
    different random number seeds, until they reach "quasi-equilibrium"
    (typically 500 generations). Also see above.


    > How much code do you need to write or rewrite for an experiment? How
    > many parameters are involved?



    The code doesn't change - a configuration file and the starting
    (initial) generation can be changed.


    > Can you give some examples of aspects of the C++ program that you
    > hope to improve by moving the sim to Ruby?



    Not really - it was a while ago now that I did the actual coding (I
    don't need to touch the code much these days) and I just remember it was
    a major pain trying to learn about the eccentricities of OOP with C++ -
    I'm sure it would have been a much more pleasant experience doing it in
    Ruby . .

    I actually don't need to change to Ruby now - I just need to write a
    thesis . . What would be nice would be to further develop the program,
    make it more generalised, more friendly for other users, be able to
    respond to requests for enhancements/changes more quickly etc.


    > Can you say something about the context of your modeling? How is your
    > simulator different from others used in the field?



    There are a couple of particular issues we wanted to investigate and my
    supervisor thought that the existing stuff that was around then (about
    nine years ago now) didn't conveniently allow us to do what we wanted -
    maybe it is has changed now but I haven't had much exposure to other
    programs. Also, a lot of them then were written in some sort of
    graphical environment for Windows or Macs and I wanted to have a CLI
    interface on Linux and have more direct control over what was happening . .

    Thanks,

    Phil.
    --
    Philip Rhoades

    GPO Box 3411
    Sydney NSW 2001
    Australia
    E-mail:
     
    Philip Rhoades, Oct 8, 2010
    #5
  6. On Fri, Oct 8, 2010 at 10:57 AM, Phillip Gawlowski
    <> wrote:
    > This sounds like something JRuby would be good at it, since it can JIT
    > the loops, and the Java VM reaches nearly-native speeds these days.
    >
    > So, yes, improved technology could make this faster these days.


    Re JRuby: Eventually, yes...the current compiler is largely unchanged
    from JRuby 1.1, however, and so we're not taking as good advantage of
    the JVM as you might hope. Lots of opportunity to improve it, and
    obviously we'd like these loops to be as fast as Java loops if
    possible...

    Soonish, perhaps! Help us fix bugs so I can spend more time on perf :)

    - Charlie
     
    Charles Oliver Nutter, Oct 22, 2010
    #6
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. marcus
    Replies:
    0
    Views:
    374
    marcus
    Jun 6, 2004
  2. V Green
    Replies:
    0
    Views:
    931
    V Green
    Feb 5, 2008
  3. PA Bear [MS MVP]
    Replies:
    0
    Views:
    1,043
    PA Bear [MS MVP]
    Feb 5, 2008
  4. George Moschovitis

    Nitro Real World Example

    George Moschovitis, Dec 21, 2006, in forum: Ruby
    Replies:
    1
    Views:
    105
    Uma Geller
    Dec 21, 2006
  5. mdev
    Replies:
    0
    Views:
    1,845
Loading...

Share This Page