Ardanwen
Hi all.
I wrote a small script to count the number of occurrences of one string
in another string (including partially overlapping occurrences). Then I
found out about the .scan method, which sped some things up but
unfortunately introduced the 'banana problem': "banana".scan("ana")
returns only one "ana", because scan skips overlapping matches.
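For anyone who wants to reproduce the banana problem, here is a minimal sketch. The variable names are only for illustration. It contrasts a plain scan with a zero-width lookahead, which consumes no characters and so lets every start position be tested.

```ruby
# Plain scan resumes after the end of each match, so the second,
# overlapping "ana" (starting at index 3) is never seen.
plain = "banana".scan("ana")

# A lookahead matches the empty string at each position where "ana"
# follows, so overlapping occurrences are all counted.
overlapping = "banana".scan(/(?=ana)/)

puts plain.size        # 1
puts overlapping.size  # 2
```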
This was the script I had before I found out about scan:
------
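$proteasome.each {|x|
  i = VIRUS  # virus length
  begin
    # the prefix of length i + 4 only has room for a 5-character match
    # that starts before position i
    i = virus[0, i + 4].rindex(x)
    count_prot += 1 if i != nil
  end until i == nil
}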
------
$proteasome is an array of 12 strings ("00010", "00100", "10110", etc.),
and virus is a long string of 1's and 0's (~3000 characters in total).
Initially I changed it to the following, which sped up the whole program
by about 25% but suffered from the banana problem (most of the program's
time goes into this counting step):
------
$proteasome.each {|x|
  virus.scan(x) {
    count_prot += 1
  }
}
------
So I changed it to the following, and then optimized a little bit
(I first tried virus.scan(/(?=#{x})/), which is a little slower than what
I'm doing now):
------
$proteasome.each {|x|
  virus.scan(/#{x[0].chr}(?=#{x[1,4]})/) {
    count_prot += 1
  }
}
------
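One thing that might be worth timing: a regex literal containing #{...} is rebuilt every time it is evaluated, so if this loop runs many times the regexes could be built once up front. The sketch below uses made-up sample data and hypothetical names (lookaheads, combined). The single-pass variant relies on the assumption that all patterns are distinct and the same length, so at most one can match at any position. I have not benchmarked either variant.

```ruby
proteasome = ["00010", "00100", "10110"]  # made-up sample, not the real 12 strings
virus = "0001000100101101"                # made-up short sample

# Build each lookahead regex once, outside the hot loop.
lookaheads = proteasome.map { |x| /(?=#{Regexp.escape(x)})/ }

count_prot = 0
lookaheads.each do |re|
  virus.scan(re) { count_prot += 1 }
end

# Single pass over the string. Only valid under the assumption above:
# distinct, equal-length patterns, so each position counts at most once.
combined = /(?=#{Regexp.union(proteasome)})/
count_union = virus.scan(combined).size

puts count_prot
puts count_union
```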
The sad thing is, the above is comparable in speed with the script I
had before I found out about the scan method. Did I miss any obvious
optimizations in the scanning/regexp method?
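A non-regexp alternative that could also be timed is String#index with a start offset, stepping one character past each hit so that overlapping occurrences are still found. This is only a sketch: count_overlapping is a hypothetical helper name and the data is made up, so whether it is actually faster on the real input is untested.

```ruby
# Count occurrences of pattern in text, including overlapping ones.
def count_overlapping(text, pattern)
  count = 0
  pos = text.index(pattern)
  while pos
    count += 1
    # Restart one character after the last hit, not after its end,
    # so a match that overlaps the previous one is not skipped.
    pos = text.index(pattern, pos + 1)
  end
  count
end

proteasome = ["00010", "00100", "10110"]  # made-up sample
virus = "0001000100101101"                # made-up short sample

count_prot = proteasome.sum { |x| count_overlapping(virus, x) }
puts count_prot
```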
Thanks!
Boris