No way of looking for a regexp match starting from a particular point in a string?

Kenneth McDonald · Jun 3, 2007

I'm probably just missing something obvious, but I haven't found a way
to match a regular expression against only part of a string, in
particular only past a certain point of a string, as a way of finding
successive matches. Of course, one could do a match against a string,
take the substring past that match and do a match against the substring,
and so on, to find all of the matches for the string, but that could be
very expensive for very large strings.

I'm aware of the String.scan method, but that doesn't work for me
because it doesn't return MatchData instances.

What I want is just something like regexp.match(string, n), where the
regexp starts looking for a match at or after position n in the string.

Thanks,
Ken

Nobuyoshi Nakada · Jun 3, 2007

Hi,

At Sun, 3 Jun 2007 12:59:24 +0900,
Kenneth McDonald wrote in [ruby-talk:254054]:

What I want is just something like regexp.match(string, n), where the
regexp starts looking for a match at or after position n in the string.

string.index(regexp, n)

Harry Kakueki · Jun 3, 2007

What I want is just something like regexp.match(string, n), where the
regexp starts looking for a match at or after position n in the string.

You could match the string but ignore the first part of the match.

str = "abcdefghabcehjjjuabcfjkiabcgdfg"
str =~ /(abc.)/
p $1 # abcd
str =~ /a.*ju(abc.)/
p $1 #abcf

Harry

Patrick Hurley · Jun 3, 2007

I don't know of anything obvious, but I would probably do something a
little more like:

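class String
  # Yields a MatchData for each successive match of exp.
  # Offsets in each MatchData (md.begin etc.) are relative to the
  # remaining substring, not to the original string, and the loop
  # never ends if exp can match the empty string.
  def match_each(exp)
    str = self
    while (md = str.match(exp))
      yield md
      str = md.post_match
    end
  end
end

foo = "foo bar foo bar foo"
foo.match_each(/[oa][or]/) do |md|
  puts "Found: #{md}"
end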
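A variation on the loop above, sketched on the assumption that the goal is to avoid building a new substring on every step: it walks the string with String#index(re, pos) and reads the MatchData from Regexp.last_match, so offsets stay relative to the original string and an empty match cannot loop forever. The method name each_match_from is hypothetical, not part of Ruby.

```ruby
class String
  # Hypothetical helper: yields a MatchData for every match of re that
  # starts at or after position start, without slicing the string.
  def each_match_from(re, start = 0)
    return enum_for(:each_match_from, re, start) unless block_given?
    pos = start
    while pos <= length && index(re, pos)
      md = Regexp.last_match # $~ was just set by String#index
      yield md
      # Move past the match; on an empty match step forward one
      # character so the loop always makes progress.
      pos = md.end(0) == md.begin(0) ? md.end(0) + 1 : md.end(0)
    end
    self
  end
end

"foo bar foo bar foo".each_match_from(/[oa][or]/) do |md|
  puts "Found: #{md} at #{md.begin(0)}"
end
```

Because index takes a start position, nothing is copied between iterations, and md.begin(0) can be used directly as an index into the original string.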

# pth

Patrick Hurley · Jun 3, 2007

Hi,

At Sun, 3 Jun 2007 12:59:24 +0900,
Kenneth McDonald wrote in [ruby-talk:254054]:

What I want is just something like regexp.match(string, n), where the
regexp starts looking for a match at or after position n in the string.

string.index(regexp, n)

I think he wanted MatchData objects. The String#index method returns
the index (numeric position of the match). But if all you want are
captures, then index is a good solution.

pth

Edwin Fine · Jun 3, 2007

How about this?

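def match(s, re, n)
  # Skip exactly n characters (the m flag lets . skip newlines too),
  # then capture the first match of re at or after position n as group 1.
  /(?:.{#{n}})(#{re})/m.match(s)
end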

irb(main):043:0> p s
"abdefgh abdefgh abdefgh abdefgh abdefgh abdefgh abdefgh abdefgh abdefgh abdefgh "
irb(main):044:0> p match(s, /abd/, 10).begin(1)
16
irb(main):045:0> p match(s, /abd/, 20).begin(1)
24

Harry Kakueki · Jun 3, 2007

If you want to specify the point in the string by number, you could do this.

str = "abcdefghabcehjjjuabcfjkiabcgdfg"
str =~ /.{10}(abc.).*(abc.)/
p $1 #abcf
p $2 #abcg

Harry

Kenneth McDonald · Jun 3, 2007

Edwin said:
How about this?

def match(s, re, n)
/(?:.{#{n}})(#{re})/.match(s)
end

That's clever. Obscure, but clever. I wonder if the regexp engine is
clever enough to turn a match like .{n} into a constant-time operation?

Thanks,
Ken

Nobuyoshi Nakada · Jun 3, 2007

Hi,

At Sun, 3 Jun 2007 13:56:05 +0900,
Patrick Hurley wrote in [ruby-talk:254059]:

I think he wanted MatchData objects. The String#index method returns
the index (numeric position of the match). But if all you want are
captures, then index is a good solution.

String#index also sets $~.
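A small sketch of what that means in practice (the sample string and offset are made up): after a successful String#index(regexp, n), Regexp.last_match, which is the same object as $~, holds the full MatchData, with offsets counted from the start of the whole string.

```ruby
s = "foo bar foo bar"
n = 5

# String#index searches from position n and, as a side effect, sets $~.
if (pos = s.index(/f(o+)/, n))
  md = Regexp.last_match # same object as $~
  puts pos               # 8: index of the match in the full string
  puts md[0]             # foo: the whole match
  puts md[1]             # oo: the first capture group
  puts md.begin(1)       # 9: capture offset, also in full-string terms
end
```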

Patrick Hurley · Jun 3, 2007

Hi,

At Sun, 3 Jun 2007 13:56:05 +0900,
Patrick Hurley wrote in [ruby-talk:254059]:

I think he wanted MatchData objects. The String#index method returns
the index (numeric position of the match). But if all you want are
captures, then index is a good solution.

String#index also sets $~.

I should have known never to question Nobu Nakada; I always forget
about those variables.

Thanks
pth

Robert Klemme · Jun 3, 2007

Hi,

At Sun, 3 Jun 2007 13:56:05 +0900,
Patrick Hurley wrote in [ruby-talk:254059]:

I think he wanted MatchData objects. The String#index method returns
the index (numeric position of the match). But if all you want are
captures, then index is a good solution.

String#index also sets $~.

But then you can also use String#scan:

irb(main):002:0> "ababb".scan(/(a)b+/) {p $~}
#<MatchData:0x7ff94618>
#<MatchData:0x7ff94578>
=> "ababb"
irb(main):003:0> "ababb".scan(/(a)b+/) {p $~.to_a}
["ab", "a"]
["abb", "a"]
=> "ababb"
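Along the same lines, a minimal sketch that keeps every MatchData produced by scan by reading $~ inside the block, which yields the list of MatchData objects Ken was after. The variable names are just for illustration.

```ruby
# Collect one MatchData per match by reading $~ inside the scan block.
matches = []
"ababb".scan(/(a)b+/) { matches << $~ }

matches.each do |md|
  puts "#{md[0]} (capture #{md[1]}) at #{md.begin(0)}...#{md.end(0)}"
end
```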

Ken, why do you need MatchData objects?

Kind regards

robert

Devin Mullins · Jun 3, 2007

Nobuyoshi said:
String#index also sets $~.

For that matter, so does String#scan.

Logan Capaldo · Jun 3, 2007

What I want is just something like regexp.match(string, n), where the
regexp starts looking for a match at or after position n in the string.

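require 'strscan'
scanner = StringScanner.new(string)
scanner.pos = n
# scan_until searches forward from pos; #scan would only match at pos itself.
if scanner.scan_until(regexp)
  p scanner[1]      # first capture group
  p scanner.matched # text matched by regexp
  p scanner.pos     # offset just past the match
end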

It's in the stdlib. (Note: it doesn't actually give you a MatchData or
set $~, but off the top of my head I can't think of anything that a
MatchData can do that the StringScanner can't.)
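A sketch of walking all successive matches from position n with StringScanner, assuming the pattern cannot match the empty string (otherwise scan_until would not advance and the loop would spin). It uses scan_until rather than scan because scan only succeeds when the match starts exactly at the scan pointer. StringScanner#pos counts bytes, so it agrees with character indices only for single-byte text like the made-up sample here.

```ruby
require 'strscan'

string = "x=1, y=22, z=333"
regexp = /(\w)=(\d+)/ # assumed never to match the empty string
n = 3

scanner = StringScanner.new(string)
scanner.pos = n
found = []
until scanner.eos?
  break unless scanner.scan_until(regexp)
  # scanner[1] and scanner[2] are the capture groups of the last match;
  # scanner.pos is now the byte offset just past that match.
  found << [scanner.matched, scanner[1], scanner[2], scanner.pos]
end
found.each { |m| p m }
```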

Rick DeNatale · Jun 3, 2007

For that matter, so does String#scan.

Hence:
irb(main):001:0> "abcdefabc".scan(/abc/) {puts "#{$~.inspect}, #{$~}"}
#<MatchData:0xb7b0220c>, abc
#<MatchData:0xb7b021e4>, abc
=> "abcdefabc"

Kenneth McDonald · Jun 3, 2007

Is $~ thread safe?

Too bad it has to be done this way (though my library will hide it). I
first looked at Ruby several years ago, and at that time, didn't go
further with it because it was too PERLish for me. (PERL was great for
its time, but speaking as someone who actually had to maintain a lot of
PERL code, it's actually a pretty grotty language). One of the things
that brought me back to Ruby was the fact that an effort was being made
to move Ruby away from its PERLisms. But I guess it'll take a while
longer...

Thanks everyone,
Ken

Joel VanderWerf · Jun 3, 2007

Kenneth said:
Is $~ thread safe?

Yes. All the regex match "global" variables are actually per-thread. See
p.319 of Pick Axe 2nd ed.
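A quick sketch to check that claim on a current Ruby (the strings are arbitrary): a match performed inside a new thread does not disturb $~ in the thread that spawned it.

```ruby
# Set $~ in the main thread.
"xyz" =~ /y/

# A match performed inside another thread sets that thread's own $~.
inner = Thread.new do
  "abc" =~ /c/
  $~[0]
end.value

outer = $~[0]
puts inner # c
puts outer # y: unaffected by the other thread's match
```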

Robert Klemme · Jun 4, 2007

Is $~ thread safe?
Yes.

Ken, I still don't understand why exactly you need MatchData objects.
What are you trying to achieve?

Kind regards

robert

Robert Dober · Jun 4, 2007

What I want is just something like regexp.match(string, n),

Hmm, apart from using #scan and #index with $~ as indicated, I do not
think there is a performance penalty if you do

rg.match(string[n..-1])
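To make the trade-off concrete (sample values are made up): slicing does work, but the slice is a new String, and the MatchData offsets it produces are relative to the slice, so n has to be added back to get positions in the original string.

```ruby
s = "foo bar foo"
n = 5

# s[n..-1] is "ar foo"; offsets in md refer to that slice, not to s.
md = /foo/.match(s[n..-1])
puts md.begin(0)     # 3: position inside the slice
puts md.begin(0) + n # 8: the same position as an index into s
```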

Cheers
Robert

Robert Dober · Jun 4, 2007

rg.match(string[n..-1])

My bad, how stupid. Am I thinking in C????
Robert

Trans · Jun 4, 2007

Robert said:
Hmm apart of using #scan and #index with $~ as indicated, I do not
think that there is a performance penalty if you do

rg.match(string[n..-1])

How can that be? You have to create a whole new String. If that can be
avoided in the internal implementation then adding an optional offset
index to #match is not an unreasonable idea.
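For readers on newer Rubies: Ruby 1.9 added essentially this, an optional start position as the second argument to Regexp#match and String#match, returning an ordinary MatchData whose offsets refer to the whole string. A minimal sketch with a made-up string; it will not run on the 1.8 series that was current when this thread was written.

```ruby
s = "foo bar foo bar"

# Start searching at position 5; offsets are still relative to all of s.
md = /f(o+)/.match(s, 5)
puts md.begin(0) # 8
puts md[1]       # oo

md2 = s.match(/bar/, 5)
puts md2.begin(0) # 12
```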

T.

How to get education and coding job coming from abroad starting new in the US? Advice of courses or places to look?	2	May 18, 2023
Copy string from 2D array to a 1D array in C	1	Nov 1, 2023
Looking to change programming direction	1	Aug 10, 2022
String#match vs. Regexp#match - confused	1	Sep 4, 2008
Match a pattern multiple times, returning matches, captures andoffset?	9	Apr 5, 2011
A website that I couldn't make a screenshot of it nor save any page from.	1	Oct 29, 2023
Looking for a match	2	Jun 25, 2007
Search for a string in another string allowing mismatches	3	Sep 21, 2010
