I need a string#all_indices method--is there such a thing?

timr

In Ruby you can use String#index as follows:
str = "some text"
str.index(/t/)
=> 5

But what if I want to get all the indices for a regex in the string?
Is there a String#all_indices method?

I wrote the following, which works, but there must be a more elegant
way:

class String
  def all_indices(regex)
    indices = []
    index = 0
    # index will be nil upon first match failure; otherwise quit the loop
    # when index is equal to the string length
    while index && index < self.length
      index = self.index(regex, index)
      if index.is_a? Numeric # avoids getting a nil into the indices array
        indices << index
        index += 1
      end
    end
    indices
  end
end
p "this is a test string for the ts in the worldt".all_indices(/t/)
p "what is up with all the twitter hype".all_indices(/w/)
# >> [0, 10, 13, 16, 26, 30, 36, 45]
# >> [0, 11, 25]
 
timr

Scan gives you the matches, not the indices (which is what I need).
"this is a test for scan".scan(/t/)
=> ["t", "t", "t"]
 
Robert Dober

What about
class String
  def indices rgx, idx=0
    [].tap{ |r|
      loop do
        idx = index rgx, idx
        break unless idx
        r << idx
        idx += 1
      end
    }
  end
end

p "baaababbabbbba".indices( /a/ )



--
If you tell the truth you don't have to remember anything.
 
Bertram Scharpf

Hi,

On Friday, 28 Aug 2009, 17:40:05 +0900, timr wrote:
Scan gives you the matches, not the indices (which is what I need).
"this is a test for scan".scan(/t/)
=> ["t", "t", "t"]

There's a trick to do it with String#scan:

a = []
"this is a test for scan".scan( /t/) { a.push $`.length }
a

This does not work when the matches overlap.

"banana".scan /ana/ #=> ["ana"]

Bertram
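
For reference, a minimal sketch of the trick above wrapped in a helper. The name scan_indices is made up for illustration and is not part of Ruby's String API. Inside the scan block, $` holds the text before the current match, so its length is the start index of that match.

```ruby
# Hypothetical helper (not a built-in): collects the start index of every
# non-overlapping match by measuring the pre-match text.
def scan_indices(str, regex)
  indices = []
  # Inside the scan block, $` is the part of str before the current match.
  str.scan(regex) { indices << $`.length }
  indices
end

p scan_indices("this is a test for scan", /t/)
p scan_indices("banana", /ana/)
```

As noted above, the second call reports only the first "ana", because scan resumes searching after the end of each match.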
 
Harry Kakueki

Does this do what you want?

class String
  def all_indices(reg)
    tmp,idx = [],[]
    (0...self.length).each{|x| tmp[x] = self[x..-1]}
    tmp.each_with_index{|y,i| idx << i if y =~ /\A#{reg}/}
    idx
  end
end

p "this is a test string for the ts in the worldt".all_indices(/th/)
#> [0, 26, 36]


It may not be very fast for very long strings (I didn't check).
But for strings like your example it seems OK.


Harry
 
Harry Kakueki

Sorry, it looks like I had an unnecessary line in there.

class String
  def all_indices(reg)
    idx = []
    (0...self.length).each{|x| idx << x if self[x..-1] =~ /\A#{reg}/}
    idx
  end
end

p "this is a test string for the ts in the worldt".all_indices(/th/)
#> [0, 26, 36]
p "banana".all_indices(/ana/) #> [1, 3]


Harry
 
7rans

Facets has:

def index_all(s, reuse=false)
  s = Regexp.new(Regexp.escape(s)) unless Regexp===s
  ia = []; i = 0
  while (i = index(s,i))
    ia << i
    i += (reuse ? 1 : $~[0].size)
  end
  ia
end
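
A sketch of how the reuse flag changes the result, using a standalone copy of the logic quoted above so that it runs without Facets installed. The function name index_all_sketch is hypothetical, and the behaviour of an actual Facets release may differ.

```ruby
# Standalone copy of the quoted logic as a plain function, so it does not
# depend on Facets. `index_all_sketch` is a made-up name.
def index_all_sketch(str, pattern, reuse = false)
  pattern = Regexp.new(Regexp.escape(pattern)) unless Regexp === pattern
  found = []
  i = 0
  while (i = str.index(pattern, i))
    found << i
    # reuse = true steps one character, so overlapping matches are found;
    # otherwise skip past the whole match ($~ is set by String#index).
    i += (reuse ? 1 : $~[0].size)
  end
  found
end

p index_all_sketch("banana", "ana")        # non-overlapping matches only
p index_all_sketch("banana", "ana", true)  # overlapping matches as well
```

One caveat with this sketch: when reuse is false, a pattern that can match the empty string never advances i, so the loop would not terminate.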
 
timr

What about
class String
  def indices rgx, idx=0
    [].tap{ |r|
      loop do
        idx = index rgx, idx
        break unless idx
        r << idx
        idx += 1
      end
    }
  end
end

[].tap?
You must have defined a tap method for Array somewhere, but not in the
code you showed. I can't run the code without a definition for tap.
Thanks,
Tim
 
timr

Oh, tap is new in 1.9. Sorry, I hadn't come across it before and was
in 1.8.6 so it wasn't running. Got it now.
 
timr

This works, and the code is more concise than what I had, but it is a
brute-force approach that tests for a match at every possible
substring. That would be a bit slow.
 
Joel VanderWerf

Bertram said:
There's a trick to do it with String#scan:

a = []
"this is a test for scan".scan( /t/) { a.push $`.length }
a

This does not work when the matches overlap.

"banana".scan /ana/ #=> ["ana"]

Bertram

Same difficulty with overlap, but for variety:

class String
def all_indexes re
a=[];scan(re) {a<<$~.begin(0)};a
end
end

p "foo bar baz".all_indexes(/.../)
p "banana".all_indexes(/ana/)

__END__

Output:

[0, 3, 6]
[1]
 
Joel VanderWerf

Joel said:
class String
def all_indexes re
a=[];scan(re) {a<<$~.begin(0)};a
end
end

p "foo bar baz".all_indexes(/.../)
p "banana".all_indexes(/ana/)

and this variant counts overlaps:

p "banana".all_indexes(/(?=ana)/)
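
The lookahead works because (?=ana) matches an empty string at each position where "ana" follows, so scan moves on by only one character between matches. A small sketch as a standalone function; the name overlapping_indices is hypothetical:

```ruby
# Hypothetical helper: wraps the pattern in a zero-width lookahead so that
# scan reports every position where the pattern matches, overlaps included.
def overlapping_indices(str, regex)
  lookahead = /(?=#{regex})/
  indices = []
  str.scan(lookahead) { indices << $~.begin(0) }
  indices
end

p overlapping_indices("banana", /ana/)
p overlapping_indices("baaababbabbbba", /a/)
```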
 
Robert Dober

Sorry I am an unconditional one-niner. I really should be more careful
to mark 1.9 only features with comments. At least for some more weeks
;-)



 
Harry Kakueki

This works, and the code is more concise than what I had, but it is a
brute-force approach that tests for a match at every possible
substring. That would be a bit slow.

This is not fast enough?

class String
def all_indices(reg)
idx = []
(0...self.length).each{|x| idx << x if self[x..-1] =~ /\A#{reg}/}
idx
end
end

p ("this is a test string for the ts in the worldt"*1000).all_indices(/th/)

I guess you are processing some big strings.
Speed is not what you asked for.
Well, until now :)


Harry
 
David A. Black

Hi --

Same difficulty with overlap, but for variety:

class String
def all_indexes re
a=[];scan(re) {a<<$~.begin(0)};a
end
end

Just to add to the collection: there's also $~.offset(0)[0]


David
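
A quick sketch of how MatchData#begin and MatchData#offset relate, on a pattern with one capture group (the string and pattern are chosen just for illustration). Index 0 refers to the whole match, index 1 to the first parenthesised group.

```ruby
md = "banana".match(/(an)a/)

p md.begin(0)   # start of the whole match
p md.offset(0)  # [start, end] pair for the whole match
p md.offset(1)  # [start, end] pair for the first capture group
```

On a pattern without any capture group, only index 0 is available.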

--
David A. Black / Ruby Power and Light, LLC / http://www.rubypal.com
Ruby/Rails training, mentoring, consulting, code-review
Latest book: The Well-Grounded Rubyist (http://www.manning.com/black2)

September Ruby training in NJ has been POSTPONED. Details to follow.
 
David A. Black

Hi --

class String
def all_indices(reg)
idx = []
(0...self.length).each{|x| idx << x if self[x..-1] =~ /\A#{reg}/}
idx
end
end

Might as well let #select do the choosing:

def all_indices(re)
(0...size).select {|i| self[i..-1][/\A#{re}/] }
end

And maybe better to create the regex only once:

def all_indices(re)
re = /\A#{re}/
(0...size).select {|i| self[i..-1][re] }
end


David

 
7rans

A lot of solutions have been given here. It would be nice to see a
test/benchmark matrix to compare them, if anyone is up to it.
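
A possible starting point for such a comparison, using the Benchmark module from the standard library. Three of the approaches from this thread are rewritten as plain functions so they can sit side by side; the function names, the test string, and the repeat counts are all assumptions made for this sketch, and no timings are claimed here.

```ruby
require 'benchmark'

# All names below are made up for this sketch.

# Repeated String#index, stepping one character (finds overlaps).
def by_index(str, re)
  out = []
  i = 0
  while (i = str.index(re, i))
    out << i
    i += 1
  end
  out
end

# String#scan with a zero-width lookahead (finds overlaps).
def by_scan(str, re)
  out = []
  str.scan(/(?=#{re})/) { out << $~.begin(0) }
  out
end

# Brute force: try an anchored match at every offset (finds overlaps).
def by_select(str, re)
  anchored = /\A#{re}/
  (0...str.size).select { |i| str[i..-1][anchored] }
end

text = "this is a test string for the ts in the worldt" * 100
re = /th/

Benchmark.bm(10) do |bm|
  bm.report("index")  { 5.times { by_index(text, re) } }
  bm.report("scan")   { 5.times { by_scan(text, re) } }
  bm.report("select") { 5.times { by_select(text, re) } }
end
```

Before comparing times, it is worth checking that the candidates return the same indices for the input in question, since patterns with anchors or lookbehind can behave differently when matched against a substring.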
 
Bertram Scharpf

Hi,

On Saturday, 29 Aug 2009, 11:38:19 +0900, 7rans wrote:
A lot of solutions have been given here. It would be nice to see a
test/benchmark matrix to compare them, if anyone is up to it.

Sure, I agree. But my solution was only meant to show one aspect of
String#scan; it is not of much practical use.

Bertram


--
Bertram Scharpf
Stuttgart, Deutschland/Germany
http://www.bertram-scharpf.de
 
