Describing degenerate DNA strings

Discussion in 'Ruby' started by George George, Jan 16, 2009.

  1. I am working with strings over the 4-letter alphabet a, c, t, g that describe
    biological DNA sequences. Sometimes a sequence can be described as
    ac[ta]cct, meaning that at position 3 you can have a 't' or an 'a'
    without changing the biological function of the sequence.

    Given ac[ta]cct as input, I would like to generate the set of all strings
    that the degenerate sequence can represent, e.g.
    1. actcct
    2. acacct

    Both satisfy the above degeneracy.

    Any ideas?
    Thank you
    --
    Posted via http://www.ruby-forum.com/.
    George George, Jan 16, 2009
    #1

  2. > any ideas?

    Here's a simple recursive expansion, with a block callback for each
    sequence found.

    def expand_seq(src, &blk)
      if src =~ /\A(.*?)\[(.*?)\](.*)\z/m
        prefix, chars, suffix = $1, $2, $3
        chars.split(//).each do |ch|
          expand_seq(prefix + ch + suffix, &blk)
        end
      else
        yield src
      end
    end

    expand_seq "ac[ta]cct[gt]c" do |seq|
      puts seq
    end
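
    When an array is more convenient than a block callback, the same
    expansion can be sketched without recursion using Array#product. This is
    only a minimal sketch: the name expand_to_array is hypothetical (it does
    not appear in the thread), and it assumes brackets are balanced and never
    nested.

```ruby
# Hypothetical helper (not from the thread): returns every expansion as an array.
# Assumes well-formed input: no nested or unbalanced brackets.
def expand_to_array(src)
  # Tokenize: each match is either a bracketed group or one literal character.
  choices = src.scan(/\[([^\]]*)\]|(.)/m).map do |group, literal|
    group ? group.chars : [literal]
  end
  return [""] if choices.empty?

  # Cartesian product of the per-position choices, joined back into strings.
  first, *rest = choices
  first.product(*rest).map(&:join)
end

p expand_to_array("ac[ta]cct")
```

    The sequences come out in the same order as the recursive version,
    because Array#product varies the last position fastest.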
    Brian Candler, Jan 16, 2009
    #2

  3. Thank you!


    George George, Jan 16, 2009
    #3
  4. On Fri, Jan 16, 2009 at 7:54 AM, George George wrote:
    > any ideas?


    Hi, this reminded me so much of a Ruby Quiz I solved that I wanted to
    mention it :)

    http://rubyquiz.com/quiz143.html
    http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/274375 (my solution)

    This code generates all strings that match a regexp. So we are left
    with the task of converting your strings to regexps:

    irb(main):010:0> require 'quiz143'
    => true
    irb(main):011:0> def expand a
    irb(main):012:1>   re = Regexp.new(a.gsub(/\[(.*?)\]/) {|m| "(#{$1.split(//).join("|")})"})
    irb(main):013:1>   re.generate
    irb(main):014:1> end
    => nil
    irb(main):015:0> expand "ac[ta]cct"
    => ["actcct", "acacct"]
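
    The conversion step on its own can be tried without the quiz143 library.
    The sketch below only rewrites the bracket groups; the method name
    to_alternation is hypothetical, and it assumes each position inside the
    brackets is a single character.

```ruby
# Hypothetical helper: rewrite each bracket group as an alternation group,
# e.g. "[ta]" becomes "(t|a)". Assumes single-character alternatives.
def to_alternation(src)
  src.gsub(/\[(.*?)\]/) { "(#{$1.chars.join('|')})" }
end

puts to_alternation("ac[ta]cct")   # prints ac(t|a)cct
```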

    It's probably overkill for your needs.

    Jesus.
    Jesús Gabriel y Galán, Jan 16, 2009
    #4

  5. > => nil
    > irb(main):015:0> expand "ac[ta]cct"
    > => ["actcct", "acacct"]
    >
    > It's probably overkill for your needs.
    >
    > Jesus.


    Hi Jesus!
    Thank you for referring me to that quiz; it's nice to study the code.
    That exactly solves one of the problems that I had while looking for DNA
    motifs, which are represented as regular expressions but need to be
    expanded if you are going to use them as possible DNA primers, and then go
    back and see which one gives the best predictive value ... blah blah ...
    Sorry for the bio talk :)

    Thank you so much!!

    GG
    George George, Jan 16, 2009
    #5
  6. On Fri, Jan 16, 2009 at 2:56 PM, George George wrote:
    > hi Jesus!
    > Thank you for referencing me to that quiz, its nice to study the code.
    > That exactly solves one of the problems that i had while looking for dna
    > motifs which are represented as regular expressions, but need to be
    > expanded if you gonna use them as possible dna primers. and then such
    > back and see which one gives the best predictive value ... blah blah ...
    > Sorry for the bio talk :)


    You are welcome. Just a comment on the above: I have realized that if
    each position of the sequence is just one character, then your
    original string is already a valid regexp for the problem. There is no need
    to change [ta] to (t|a) as I was doing, because [ta] is a character
    class with those two possibilities, and character classes work too:

    irb(main):001:0> require 'quiz143'
    => true
    irb(main):002:0> /#{"ac[ta]cc"}/.generate
    => ["actcc", "acacc"]
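
    The point about character classes can be checked with plain Ruby, without
    quiz143: the degenerate string, used directly as an anchored regexp,
    matches exactly the sequences it describes. This is a small sketch that
    assumes Ruby 2.4 or later for Regexp#match?.

```ruby
pattern = "ac[ta]cc"

# Anchor the pattern so the whole sequence must match, not just a substring.
re = Regexp.new("\\A#{pattern}\\z")

%w[actcc acacc acgcc].each do |seq|
  puts "#{seq}: #{re.match?(seq)}"
end
```

    The first two sequences match and the third does not, since 'g' is not
    in the class [ta].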

    :)

    Jesus.
    Jesús Gabriel y Galán, Jan 16, 2009
    #6
  7. On Jan 16, 2009, at 9:10 AM, Jesús Gabriel y Galán wrote:
    > You are welcome. Just a comment on the above: I have realized that if
    > each position of the sequence is just one character, then your
    > original string is already a valid regexp for the problem, so no need
    > to change [ta] to (t|a) as I was doing, cause [ta] is a character
    > class with those two possibilities and those work too:
    >
    > irb(main):001:0> require 'quiz143'
    > => true
    > irb(main):002:0> /#{"ac[ta]cc"}/.generate
    > => ["actcc", "acacc"]


    No need to do the string interpolation there:
    /ac[ta]cc/.generate
    Or if you have that in a string:
    x = "ac[ta]cc"
    Regexp.new(x).generate

    > :)
    >
    > Jesus.



    -Rob

    Rob Biedenharn http://agileconsultingllc.com
    Rob Biedenharn, Jan 16, 2009
    #7
  8. On Fri, Jan 16, 2009 at 5:51 PM, Rob Biedenharn wrote:
    > No need to do the string interpolation there:
    > /ac[ta]cc/.generate
    > Or if you have that in a string:
    > x = "ac[ta]cc"
    > Regexp.new(x).generate


    Good catch !!
    Thanks.

    Jesus.
    Jesús Gabriel y Galán, Jan 16, 2009
    #8
  9. Thank you so much for all the replies. Here is a simple benchmark of
    Brian's and Jesus's approaches. I ran it on Ubuntu 8.04 with 1 GB RAM and
    2 CPUs at 3.40 GHz.

    .....
    ...
    require 'benchmark'
    Benchmark.bm do |bm|

      bm.report("Brian:") do
        expand_seq "t[ac][tc]aaattaag[ga]gaag[ac]ttggtgga" do |seq|
          #puts seq
        end
      end

      bm.report("Jesus:") do
        /t[ac][tc]aaattaag[ga]gaag[ac]ttggtgga/.generate
      end
    end

                user     system      total        real
    Brian:  0.000000   0.000000   0.000000 (  0.000642)
    Jesus:  0.000000   0.000000   0.000000 (  0.003574)
    George George, Jan 17, 2009
    #9
  10. On 17.01.2009 09:10, George George wrote:
    > Thank you so much for all the replies. Here is a simple benchmark for
    > Brian and Jesus approaches. I Run it on ubuntu 8.04, 1GB RAM, 2 CPUs
    > 3.40GHz.

    You probably need to execute each variant in a loop multiple times to
    get meaningful results.

    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    Robert Klemme, Jan 17, 2009
    #10

  11. > You probably need to execute each variant in a loop multiple times to
    > get meaningful results.
    >
    > Kind regards
    >
    > robert


    Thanks Robert, here are the results with each approach run 100000 times:

    require 'benchmark'

    iterations = 100000
    Benchmark.bm do |bm|

      bm.report("Brian:") do
        iterations.times do
          expand_seq "t[ac][tc]aaattaag[ga]gaag[ac]ttggtgga" do |seq|
            # puts seq
          end
        end
      end

      bm.report("Jesus:") do
        iterations.times do
          /t[ac][tc]aaattaag[ga]gaag[ac]ttggtgga/.generate
        end
      end
    end

                  user     system      total        real
    Brian:   36.500000   2.080000  38.580000 ( 38.738666)
    Jesus:  217.180000  30.710000 247.890000 (248.848401)
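
    For anyone repeating the measurement, Benchmark.bmbm from the standard
    library runs a rehearsal pass before the timed pass, which can reduce
    warm-up and garbage-collection noise. The sketch below is self-contained
    and therefore only times the recursive expand_seq (quiz143 is not
    assumed to be installed); the iteration count of 1000 is an arbitrary
    assumption chosen so it finishes quickly.

```ruby
require 'benchmark'

# Brian's recursive expansion, repeated here so the sketch is self-contained.
def expand_seq(src, &blk)
  if src =~ /\A(.*?)\[(.*?)\](.*)\z/m
    prefix, chars, suffix = $1, $2, $3
    chars.split(//).each do |ch|
      expand_seq(prefix + ch + suffix, &blk)
    end
  else
    yield src
  end
end

pattern    = "t[ac][tc]aaattaag[ga]gaag[ac]ttggtgga"
iterations = 1_000  # assumed value, kept small so the sketch runs quickly

# Sanity check: four two-way positions should give 2**4 = 16 sequences.
count = 0
expand_seq(pattern) { count += 1 }
puts "sequences: #{count}"

# bmbm performs a rehearsal run first, then reports the real timings.
Benchmark.bmbm do |bm|
  bm.report("Brian:") do
    iterations.times { expand_seq(pattern) { |seq| seq } }
  end
end
```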



    George George, Jan 17, 2009
    #11
  12. On Sat, Jan 17, 2009 at 2:13 PM, George George wrote:

    > Thanks robert here are the results ran 100000 times for each approach
    >
    > user system total real
    > Brian: 36.500000 2.080000 38.580000 ( 38.738666)
    > Jesus: 217.180000 30.710000 247.890000 (248.848401)


    It shows that a specialized solution could be more streamlined :).
    Anyway, my solution was never optimized for performance. Could be an
    interesting project...

    Jesus.
    Jesús Gabriel y Galán, Jan 17, 2009
    #12