Index of multiple similar strings

Milo Thurston

I'm trying to read through a file like this one:
http://www.genomics.ceh.ac.uk/~milo/example.html
in order to count the number of N tracts and locate their
positions. My code goes like this:

dust_seq = # file in url above
nums = 0
dust_seq.scan(/[N]+/) do |blah|
  nums += 1
  puts "Index #{dust_seq.index(blah.to_s)}"
end
puts "Num of Ns: #{nums}"

In the example, the index for the third of the N groups
is reported as the same as the first, as it's small enough
to fit within it. Is there any way around this?
Thanks.
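
The behaviour described above can be reproduced with a tiny string. The sequence below is invented for illustration and is not the data from the linked file. String#index with a string argument always searches from the start, so a short tract is found inside an earlier, longer one:

```ruby
# Hypothetical sample sequence; not the data from the linked file.
seq = "ACNNNNNGTNNACNNNT"

reported = []
seq.scan(/N+/) do |tract|
  # index starts searching at position 0 every time, so "NN" and "NNN"
  # are both found inside the first, longer tract.
  reported << seq.index(tract)
end

p reported  # => [2, 2, 2], although the tracts really start at 2, 9 and 13
```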
 
Robert Klemme

dust_seq = # file in url above
nums = 0
dust_seq.scan(/N+/) do |blah|
  nums += 1
  puts "Index #{$`.length}"
end
puts "Num of Ns: #{nums}"

Kind regards

robert
 
Carlos

(not tested):

nums = 0
idx = 0

while idx = dust_seq.index /N+/, idx
  nums += 1
  puts "Index #{idx}"
  idx = Regexp.last_match.end(0)+1
end
puts "Num of Ns: #{nums}"
 
Milo Thurston

Carlos said:
while idx = dust_seq.index /N+/, idx

Thanks - the interpreter didn't like this line, though.
However, I got it working and it seems better than the $`
method, which caused some nasty memory hogging problems
(I now regret not compiling in a kernel OOM killer...).
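
The post does not say what change made the line work. One plausible fix, offered here only as an assumption, is to parenthesize the arguments to index so the regexp literal cannot be misread. The sample sequence is made up for illustration:

```ruby
# Hypothetical sample sequence, for illustration only.
dust_seq = "ACNNNNNGTNNACNNNT"

nums = 0
positions = []
idx = 0

# Parentheses around the arguments remove the ambiguity of a bare "/"
# following the method name.
while (idx = dust_seq.index(/N+/, idx))
  nums += 1
  positions << idx
  # Resume the search just past the end of the current tract.
  idx = Regexp.last_match.end(0)
end

puts "Positions: #{positions.inspect}"
puts "Num of Ns: #{nums}"
```

Because the search restarts at the end of each match, no prefix string is ever built, which fits the lower memory use reported above.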
 
Robert Klemme

Milo Thurston said:
Thanks - the interpreter didn't like this line, though.
However, I got it working and it seems better than the $`
method, which caused some nasty memory hogging problems
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
??? Care to explain?

robert
 
ts

R> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
R> ??? Care to explain?

You create a String object for each call, you don't have this problem with

$~.begin(0)


Guy Decoux
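
As a minimal sketch of the suggestion above, the scan loop can read the match offset from $~ instead of measuring the pre-match string. The sample sequence and the starts array are made up for illustration:

```ruby
# Hypothetical sample sequence, for illustration only.
dust_seq = "ACNNNNNGTNNACNNNT"

nums = 0
starts = []
dust_seq.scan(/N+/) do
  nums += 1
  # $~ is the MatchData for the current match; begin(0) is the offset
  # of the whole match, obtained without building the prefix string.
  starts << $~.begin(0)
end

puts "Indexes: #{starts.inspect}"
puts "Num of Ns: #{nums}"
```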
 
Robert Klemme

ts said:
R> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
R> ??? Care to explain?

You create a String object for each call, you don't have this problem with

$~.begin(0)

True. I thought of $~ too, but overlooked this aspect; "$`.length" just
looked cuter. :) Thx.

robert
 
Milo Thurston

ts said:
R> ??? Care to explain?
You create a String object for each call, you don't have this problem with
$~.begin(0)

That would explain it. Some of the strings I'm looking at are several MB
in size. I've been writing out the data to disk and flushing stdout, but
$` seemed to leave each complete sequence in memory, causing it to run
out rather rapidly.
 
Robert Klemme

Milo Thurston said:
That would explain it. Some of the strings I'm looking at are several MB
in size. I've been writing out the data to disk and flushing stdout, but
$` seemed to leave each complete sequence in memory, causing it to run
out rather rapidly.

Yes, that's the reason. I hadn't thought about this, but as you can see,
each reference to $` creates a new string instance:

15:55:29 [robert]: ruby -e '"f".scan(/./) { 5.times{ puts $`.id } }'
134690392
134690368
134690344
134690320
134690296
15:55:58 [robert]:
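
Object#id, used in the session above, was removed in later Ruby versions in favour of object_id. A rough equivalent for current interpreters might look like the following sketch. It keeps the strings in an array (the name pre_matches is made up here) so that none of them can be collected while the ids are compared:

```ruby
pre_matches = []
"f".scan(/./) do
  # Each evaluation of $` returns a freshly allocated String.
  5.times { pre_matches << $` }
end

distinct = pre_matches.map(&:object_id).uniq.size
puts "#{pre_matches.size} references, #{distinct} distinct objects"
```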

Kind regards

robert
 
