identify and extract positions from a string - how to?

Marc Hoeppner

Hi,

I am not quite sure about how to approach the following problem:

I have a long (long long long) string of letters, a genomic sequence
(600k characters+).
Now, what I want to do is to extract certain parts of this string, based
on the position.
So, for example, let's say I want all characters from position 2340 to
5436.

A quick pointer in the right direction would be much appreciated. I have
a vague idea that it could perhaps be done with count? Like "puts string
where string.count("actg")=2340 until string.count("actg")=5436"... ?
Not sure, though, and there are probably better ways.



Cheers,

Marc
 
Thomas Worm

> Hi,
>
> I am not quite sure about how to approach the following problem:
>
> I have a long (long long long) string of letters, a genomic sequence
> (600k characters+).
> Now, what I want to do is to extract certain parts of this string, based
> on the position.
> So, for example, let's say I want all characters from position 2340 to
> 5436.

What about

puts "My String"[5..7]
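
Applied to the numbers from the question, the same idea might look like the sketch below. The `sequence` variable and its contents are made up for illustration; the real data would come from the sequence file. One thing to keep in mind is that Ruby counts string positions from 0, so if the coordinates are meant as 1-based positions (an assumption, the question does not say), both ends need to be shifted down by one.

```ruby
# Hypothetical stand-in for the 600k-character genomic sequence.
sequence = "acgt" * 150_000

# Ruby indexes strings from 0, so this takes indices 2340 through 5436 inclusive.
fragment = sequence[2340..5436]

# If the coordinates are 1-based and inclusive (an assumption),
# shift both ends down by one before slicing.
one_based = sequence[(2340 - 1)..(5436 - 1)]

puts fragment.length   # number of characters extracted
puts one_based.length  # same length, shifted one character to the left
```

Both slices have the same length; they just start one character apart.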

Thomas
 
F. Senault

On 19 July at 12:31, Marc Hoeppner wrote:
> Hi,
>
> I am not quite sure about how to approach the following problem:
>
> I have a long (long long long) string of letters, a genomic sequence
> (600k characters+).
> Now, what I want to do is to extract certain parts of this string, based
> on the position.
> So, for example, let's say I want all characters from position 2340 to
> 5436.

For example:
str = "abcdefghijklmnopqrstuvwxyz"
=> "abcdefghijklmnopqrstuvwxyz"

The simplest way to answer your question is:
=> "fghijkl"

You may want to try the other variants :
=> "fghijkl"
=> "fghijkl"
=> "fghijkl"
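
A result like "fghijkl" can be reached in several ways with the standard String methods. The sketch below is only one possible set of variants, chosen for illustration rather than taken from the post: a range, a start/length pair, `slice`, and a regular expression.

```ruby
str = "abcdefghijklmnopqrstuvwxyz"

by_range  = str[5..11]        # start index .. end index (inclusive)
by_length = str[5, 7]         # start index, number of characters
by_slice  = str.slice(5..11)  # slice is an alias for []
by_regex  = str[/f.*l/]       # first match of the pattern

puts by_range
puts by_length
puts by_slice
puts by_regex
```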

If you need to parse it char per char, you can use a multitude of
methods :
str[5..10].each_byte { |b| puts b.chr }
f
g
h
i
j
k
=> "fghijk"
str[5..10].split(//)
=> ["f", "g", "h", "i", "j", "k"]
str[5..10].split(//).each { |c| puts c }
f
g
h
i
j
k
=> ["f", "g", "h", "i", "j", "k"]

Etc.

I didn't try with very long strings, but I don't see why the range-based
access methods wouldn't be acceptable. (Of course, the regular
expression will be slower.)
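
For anyone who wants to check this on their own machine, a rough timing sketch with the standard `benchmark` library could look like this. The sequence, the coordinates, and the particular regular expression are all assumptions made for the sake of the example; timings will vary, so none are shown.

```ruby
require "benchmark"

# Hypothetical stand-in for the real sequence (600,000 characters).
sequence = "acgt" * 150_000

first  = 2340
last   = 5436
length = last - first + 1

by_range = nil
by_regex = nil

# Plain range access.
range_time = Benchmark.realtime do
  1_000.times { by_range = sequence[first..last] }
end

# Regex access: skip `first` characters, then capture the next `length`.
pattern = /\A.{#{first}}(.{#{length}})/
regex_time = Benchmark.realtime do
  1_000.times { by_regex = sequence[pattern, 1] }
end

puts "same result: #{by_range == by_regex}"
puts format("range: %.4fs, regex: %.4fs", range_time, regex_time)
```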

Fred
 
Marc Hoeppner

Thanks a lot, don't know how I missed that in the string chapter.

Anyhow, another thing came up:

while string[1..10] is pretty much what I was looking for - is there any
way that I can substitute the numbers (or the whole content of the
square brackets for that matter) with variables?

As it is now I have a file that contains coordinates and a second file
that contains the string that I want to extract from.

So ideally the script would read each line of the coordinate file

45..78
90..120
etc

and use it in the extraction method

file.readlines.each do |l|
  puts string[l]
end

Doesn't work, though. Any suggestions on how to pipe each line of the
coordinate file to the string method? I know, I know, probably simple,
but I am still learning ;)

Cheers,

Marc
 
Thomas Worm

> As it is now I have a file that contains coordinates and a second file
> that contains the string that I want to extract from.
>
> So ideally the script would read each line of the coordinate file
>
> 45..78
> 90..120
> etc

Those ..-things are called ranges, which, no wonder, are a class in
Ruby. Have a look at http://corelib.rubyonrails.org/ for the class Range.

Another way to express str[45..78] is str[45,78] or str.slice(45,78) or
str.slice(45..78), where the numbers can be replaced by variables:
str[fr..to], str[fr,to], str.slice(fr,to), str.slice(fr..to)

This information can be found at the same webpage, just look for the
class String ;-)

> and use it in the extraction method
>
> file.readlines.each do |l|
>   puts string[l]
> end
>
> Doesn't work, though. Any suggestions on how to pipe each line of the
> coordinate file to the string method? I know, I know, probably simple,
> but I am still learning ;)

l is a String-object, not a Range-object.

file.readlines.each do |l|
  # "45..78" becomes the two integers 45 and 78
  fr, to = l.split(/\.\./).map { |s| s.to_i }
  puts string[fr..to]
end

should do the job.

The thingy with the slashes in the split-method is a regular expression.
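
If a real Range object is wanted instead of two separate numbers, the split can be wrapped in a small helper. This is only a sketch; `parse_range` is a made-up name, not something from the standard library, and it assumes every line has the form "start..end".

```ruby
# Hypothetical helper: turns a line such as "45..78\n" into the Range 45..78.
# Integer() raises an error on malformed input instead of silently returning 0.
def parse_range(line)
  first, last = line.strip.split("..").map { |s| Integer(s) }
  first..last
end

str = "abcdefghijklmnopqrstuvwxyz"
range = parse_range("5..11\n")

puts range.inspect
puts str[range]
```

Using `Integer(s)` rather than `s.to_i` is a design choice here: `to_i` quietly turns garbage into 0, which would extract the wrong region without any warning.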

Regards
Thomas
 
F. Senault

On 19 July at 13:16, Marc Hoeppner wrote:
> and use it in the extraction method
>
> file.readlines.each do |l|
>   puts string[l]
> end

The other solutions in the thread are the ones to use, but I feel the
need to suggest the very dirty / insecure / bad one:

File.readlines(filepath).each do |l|
  puts string[eval(l)]
end

Don't try this at home, etc... :)

(But, in a controlled environment, it may be useful since it allows for
all the variations that can be evaluated in one line of ruby code...)

Fred
 
Thomas Worm

> On 19 July at 13:54, Thomas Worm wrote:
>> Those ..-things are called ranges, which, no wonder, are a class in
>> Ruby. Have a look at http://corelib.rubyonrails.org/ for the class
>> Range.
>>
>> Another way to express str[45..78] is str[45,78]
>
> Nope:
> str[45..78].length
> => 34
> str[45,78].length
> => 78
>
> (IOW start_position..end_position versus start_position,length.)

I guess you are right. I misinterpreted the documentation, which says in a
number of examples:

a = "hello there"
a[1,3] #=> "ell"
a[1..3] #=> "ell"

I should have taken the time to read the text instead.
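
The documentation example happens to give the same result for both forms only because it starts at index 1 and takes 3 characters. A small sketch with a different start index shows the two forms diverging, along with the conversion between them (the variable names `from` and `to` are just for illustration):

```ruby
a = "hello there"

# Same result by coincidence: indices 1..3 cover exactly 3 characters.
same_a = a[1, 3]    # start 1, length 3
same_b = a[1..3]    # start 1, end 3

# Different results once the start index changes.
by_length = a[2, 3]   # start 2, length 3
by_range  = a[2..3]   # start 2, end 3

# Converting an inclusive start..end pair into start,length.
from, to = 2, 3
converted = a[from, to - from + 1]

puts same_a, same_b
puts by_length, by_range
puts converted == by_range
```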

Thomas
 
Robert Klemme

2007/7/19 said:
> On 19 July at 13:16, Marc Hoeppner wrote:
>> and use it in the extraction method
>>
>> file.readlines.each do |l|
>>   puts string[l]
>> end
>
> The other solutions in the thread are the ones to use, but I feel the
> need to suggest the very dirty / insecure / bad one:
>
> File.readlines(filepath).each do |l|
>   puts string[eval(l)]
> end
>
> Don't try this at home, etc... :)
>
> (But, in a controlled environment, it may be useful since it allows for
> all the variations that can be evaluated in one line of ruby code...)

A safer variant:

file.each do |line|
  if /^(\d+)\.\.(\d+)$/ =~ line
    puts string[ $1.to_i .. $2.to_i ]
  end
end

Note that file.each is more efficient than file.readlines.each
because it does not need to read the whole file into memory.
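
Put together, a self-contained version of this approach might look like the sketch below. A StringIO stands in for the coordinate file so the example runs without any files on disk; with a real file, `File.foreach(path)` reads line by line in the same way. The sequence and the coordinate lines are invented for illustration.

```ruby
require "stringio"

# Hypothetical stand-ins for the sequence file and the coordinate file.
sequence    = "abcdefghijklmnopqrstuvwxyz" * 10
coordinates = StringIO.new("5..11\n30..33\nnot a range\n")

fragments = []

# each_line yields one line at a time instead of loading everything first.
coordinates.each_line do |line|
  # Only lines of the exact form "digits..digits" are used; others are skipped.
  if line =~ /\A(\d+)\.\.(\d+)\s*\z/
    fragments << sequence[$1.to_i..$2.to_i]
  end
end

puts fragments
```

Lines that do not match the pattern are simply ignored, which is what makes this safer than passing the line to eval.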

Kind regards

robert
 
