Index of multiple similar strings

Milo Thurston · Oct 7, 2004

I'm trying to read through a file like this:
http://www.genomics.ceh.ac.uk/~milo/example.html
in order to count the number of N tracts and locate their
positions. My code goes like this:

dust_seq = # file in url above
nums = 0
dust_seq.scan(/[N]+/) do |blah|
  nums += 1
  puts "Index #{dust_seq.index(blah.to_s)}"
end
puts "Num of Ns: #{nums}"

In the example, the index for the third of the N groups
is reported as the same as the first, as it's small enough
to fit within it. Is there any way around this?
Thanks.
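
For readers following along, here is a minimal sketch of why the lookup above repeats an index. It assumes a made-up sample sequence (not the file from the URL) and a hypothetical helper name, naive_positions. String#index with a string argument always searches from the start, so a short tract is found inside an earlier, longer one.

```ruby
# Hypothetical helper: looks up each matched tract with String#index,
# which always searches from offset 0 of the whole string.
def naive_positions(seq)
  seq.scan(/N+/).map { |tract| seq.index(tract) }
end

# Made-up sample: tracts of 5, 7 and 3 Ns starting at offsets 2, 9 and 18.
sample = "ACNNNNNGTNNNNNNNACNNNGT"

# The third tract ("NNN") is found inside the first one, so its
# reported index is the first tract's offset rather than 18.
p naive_positions(sample)
```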

Robert Klemme · Oct 7, 2004

dust_seq = # file in url above
nums = 0
dust_seq.scan(/N+/) do |blah|
  nums += 1
  puts "Index #{$`.length}"
end
puts "Num of Ns: #{nums}"

Kind regards

robert

Milo Thurston · Oct 7, 2004

Robert Klemme said:
puts "Index #{$`.length}"

Excellent, thanks.
In which book/manual is $` described? I've not seen it before.

Carlos · Oct 7, 2004

(not tested):

nums = 0
idx = 0

while idx = dust_seq.index /N+/, idx
nums += 1
puts "Index #{idx}"
idx = Regexp.last_match.end(0)+1
end
puts "Num of Ns: #{nums}"

Milo Thurston · Oct 7, 2004

Carlos said:
while idx = dust_seq.index /N+/, idx

Thanks - the interpreter didn't like this line, though.
However, I got it working and it seems better than the $`
method, which caused some nasty memory hogging problems
(I now regret not compiling in a kernel OOM killer...).

Robert Klemme · Oct 7, 2004

Milo Thurston said:
Excellent, thanks.
In which book/manual is $` described? I've not seen it before.

It's in the Pickaxe (both versions), although not in the online version of
the first edition AFAIK. You can find out about the other way in the Regexp
doc:
http://www.ruby-doc.org/docs/ProgrammingRuby/html/ref_c_regexp.html
http://www.ruby-doc.org/core/classes/Regexp.html

Kind regards

robert

Robert Klemme · Oct 7, 2004

Milo Thurston said:
Thanks - the interpreter didn't like this line, though.
However, I got it working and it seems better than the $`
method, which caused some nasty memory hogging problems

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
??? Care to explain?

robert

Carlos · Oct 7, 2004

Thanks - the interpreter didn't like this line, though.

You are right, it should have parens:

while idx = dust_seq.index(/N+/, idx)

Strange...
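
A sketch of the whole loop with the parenthesised call, for reference. The method name n_tracts_by_index and the sample string are assumptions for illustration. This version resumes the search at the end of the previous match rather than one past it; either should work here, since the character right after an N tract is never an N.

```ruby
# Hypothetical helper: collects the start offset of every run of Ns
# by repeatedly calling String#index with a start position.
def n_tracts_by_index(seq)
  starts = []
  idx = 0
  # The parentheses around the arguments are what the parser needs.
  while (idx = seq.index(/N+/, idx))
    starts << idx
    # Resume the search where the last match ended.
    idx = Regexp.last_match.end(0)
  end
  starts
end

starts = n_tracts_by_index("ACNNNNNGTNNNNNNNACNNNGT")
starts.each { |pos| puts "Index #{pos}" }
puts "Num of Ns: #{starts.size}"
```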

ts · Oct 7, 2004

R> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
R> ??? Care to explain?

You create a String object for each call, you don't have this problem with

$~.begin(0)

Guy Decoux
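
A minimal sketch of the scan loop using $~.begin(0) in place of $`.length, assuming a made-up sample sequence and a hypothetical method name, n_tract_starts. $~ holds the MatchData for the current match, and begin(0) gives its start offset without building the pre-match string.

```ruby
# Hypothetical helper: returns the start offset of each run of Ns.
def n_tract_starts(seq)
  starts = []
  seq.scan(/N+/) do
    # $~ is the MatchData of the match currently being yielded;
    # begin(0) is the offset where the whole match starts.
    starts << $~.begin(0)
  end
  starts
end

# Made-up sample sequence, not the data from the thread.
dust_seq = "ACGTNNNNNACGTNNACGTNNNACGT"
positions = n_tract_starts(dust_seq)
positions.each { |pos| puts "Index #{pos}" }
puts "Num of Ns: #{positions.size}"
```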

Robert Klemme · Oct 7, 2004

ts said:
R> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
R> ??? Care to explain?

You create a String object for each call, you don't have this problem with

$~.begin(0)

True. I thought of $~ also, but overlooked this aspect - "$`.length" just
looked cuter.

Thx.

robert

Milo Thurston · Oct 7, 2004

ts said:
R> ??? Care to explain?
You create a String object for each call, you don't have this problem with
$~.begin(0)

That would explain it. Some of the strings I'm looking at are several MB
in size. I've been writing out the data to disk and flushing stdout, but
$` seemed to leave each complete sequence in memory, causing it to run
out rather rapidly.

Robert Klemme · Oct 7, 2004

Milo Thurston said:
That would explain it. Some of the strings I'm looking at are several MB
in size. I've been writing out the data to disk and flushing stdout, but
$` seemed to leave each complete sequence in memory, causing it to run
out rather rapidly.

Yes, that's the reason. I hadn't thought about this, but as you can see
each reference to $` creates a new string instance:

15:55:29 [robert]: ruby -e '"f".scan(/./) { 5.times{ puts $`.id } }'
134690392
134690368
134690344
134690320
134690296
15:55:58 [robert]:

Kind regards

robert
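
The one-liner above relies on Object#id, which was removed in later Ruby versions in favour of object_id. A rough equivalent for a current interpreter might look like the sketch below; the method name distinct_pre_match_objects is an assumption for illustration. It keeps every returned string alive so the ids can be compared safely.

```ruby
# Hypothetical helper: references $` several times per match and counts
# how many distinct String objects those references produced.
def distinct_pre_match_objects(str, times)
  kept = []
  str.scan(/./) do
    # Hold on to each result so its object_id cannot be reused.
    times.times { kept << $` }
  end
  kept.map(&:object_id).uniq.size
end

# If every reference to $` builds a fresh String, this prints 5.
puts distinct_pre_match_objects("f", 5)
```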
