Regexes: Forcing the LAST Match

Hal Vaughan

I'm not sure what terms to look for here. I want to use a regex that will
match as little as possible in a string from the *end* of the string. For
example, if I have "OneTwoThreeTwoOne", I want to know how I can match
only "TwoOne" at the end and not "TwoThreeTwoOne".

I've been experimenting with quantifiers, but something like "Two.*?$" will
grab everything from the first occurrence of "Two". How can I make it grab
from only the last occurrence?
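
A minimal sketch of the behavior described above, assuming Java's java.util.regex (the class and method names are made up for illustration). The engine reports the leftmost match, and a reluctant quantifier only affects how far a match extends once it has started, so both variants begin at the first "Two":

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LeftmostMatchDemo {
    // Returns the first match of regex in text, or null if nothing matches.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "OneTwoThreeTwoOne";
        System.out.println(firstMatch("Two.*?$", text)); // TwoThreeTwoOne
        System.out.println(firstMatch("Two.*$", text));  // TwoThreeTwoOne
    }
}
```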

Thanks!

Hal
 
Stefan Ram

Hal Vaughan said:
I'm not sure what terms to look for here. I want to use a regex that will
match as little as possible in a string from the *end* of the string.

»As little as possible from the end of the string« would be "".

Hal Vaughan said:
For example, if I have "OneTwoThreeTwoOne", I want to know how
I can match only "TwoOne" at the end and not "TwoThreeTwoOne".

When I tell you that this can be done by »TwoOne$«, you will
not be satisfied, I guess. But it is the actual answer to the
preceding English sentence.

It might help to try to express what you want in English,
in a manner that does not need examples to be understood.

»As little as possible from the end« does not seem to be that
expression, because it contradicts the example given.
 
Ronny Schuetz

I guess you want to look up the difference between greedy and reluctant
(non-greedy) quantifiers. However, see Stefan's reply.

Ronny
 
Hal Vaughan

Stefan said:
»As little as possible from the end of the string« would be "".


When I tell you that this can be done by »TwoOne$«, you will
not be satisfied, I guess. But it is the actual answer to the
preceding English sentence.

It might help to try to express what you want in English,
in a manner that does not need examples to be understood.

Okay.

I want to be able to specify a phrase with a regex and remove from the last
occurrence of that phrase to the end of the original string.

Examples:

Full String: "The date 2008-01-07 is one day before Elvis' birthday on
2008-01-08, which is tomorrow."
Match: "[0-9]{2,4}-[0-9]{1,2}-[0-9]{1,2}"
Desired Result: "The date 2008-01-07 is one day before Elvis' birthday on "

It matches the LAST date that fits the format and removes from the last
match on. If I used "[0-9]{2,4}-[0-9]{1,2}-[0-9]{1,2}.*?" it'll match from
the first date on.

Full String: "One Two Three Two One"
Match: "Two"
Desired Result: "One Two Three "


I've tried using negative lookaheads, but, as best I can guess, in that last
example, if I use any kind of quantifier, then the first match I get
is "Two Three Two One" and it seems to not see it can match just "Two One".
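
For what it's worth, one way a negative lookahead can express "the last occurrence" is sketched below, assuming Java's java.util.regex. The lookahead goes after the phrase and rejects any occurrence that is followed by another one. The helper name removeFromLast is hypothetical, and the phrase is spliced into the pattern as-is, so it must be a valid regex on its own.

```java
import java.util.regex.Pattern;

class LastByLookahead {
    // Removes everything from the last match of phraseRegex to the end of text.
    // (?s) lets '.' cross line breaks; (?!.*phrase) rejects any occurrence
    // that still has another occurrence somewhere after it.
    static String removeFromLast(String text, String phraseRegex) {
        String p = "(?:" + phraseRegex + ")";
        return Pattern.compile("(?s)" + p + "(?!.*" + p + ").*$")
                      .matcher(text)
                      .replaceFirst("");
    }

    public static void main(String[] args) {
        // Brackets make the trailing space visible.
        System.out.println("[" + removeFromLast("One Two Three Two One", "Two") + "]");
        // prints [One Two Three ]
    }
}
```

If the phrase does not occur at all, replaceFirst leaves the text unchanged.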

I have used Matcher.find() in a loop to find the last match, then used
Matcher.start() to get its position, then taken a substring of the
original string, using that start position as where the substring ends, but
if I want to delete from just after the match, then I have to check to make
sure I'm not out of bounds and so on. I would think there would be an easy
way to match the last occurrence of a phrase to the end of a string instead
of having to loop through it.
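
The loop described above can be sketched as follows, assuming Java's java.util.regex; the helper name cutAtLastMatch and its keepMatch flag are made up for illustration. Matcher.start() and Matcher.end() always fall inside the string, so no extra bounds check should be needed for either variant.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastByLoop {
    // Cuts text at the last match of regex. If keepMatch is true, the match
    // itself is kept and only the text after it is removed.
    static String cutAtLastMatch(String text, String regex, boolean keepMatch) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int cut = -1;
        while (m.find()) {
            // start() and end() always lie within 0..text.length(),
            // so the substring call below cannot go out of bounds.
            cut = keepMatch ? m.end() : m.start();
        }
        return cut < 0 ? text : text.substring(0, cut);
    }

    public static void main(String[] args) {
        String s = "One Two Three Two One";
        System.out.println("[" + cutAtLastMatch(s, "Two", false) + "]"); // [One Two Three ]
        System.out.println("[" + cutAtLastMatch(s, "Two", true) + "]");  // [One Two Three Two]
    }
}
```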

Thanks!

Hal
 
Robert Klemme

Hal Vaughan said:
I want to be able to specify a phrase with a regex and remove from the last
occurrence of that phrase to the end of the original string.

Full String: "One Two Three Two One"
Match: "Two"
Desired Result: "One Two Three "

In your case you can use this

http://java.sun.com/javase/6/docs/api/java/lang/String.html#lastIndexOf(java.lang.String)
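
A minimal sketch of that suggestion (the class and method names are hypothetical). String.lastIndexOf takes a literal string rather than a pattern, so this covers the "Two" example but not the date example:

```java
class LastByIndexOf {
    // Removes everything from the last occurrence of the literal phrase on.
    static String removeFromLast(String text, String phrase) {
        int i = text.lastIndexOf(phrase); // -1 when the phrase is absent
        return i < 0 ? text : text.substring(0, i);
    }

    public static void main(String[] args) {
        // Brackets make the trailing space visible.
        System.out.println("[" + removeFromLast("One Two Three Two One", "Two") + "]");
        // prints [One Two Three ]
    }
}
```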

robert
 
Stefan Ram

Hal Vaughan said:
I want to be able to specify a phrase with a regex and remove from the last
occurrence of that phrase to the end of the original string.

Thus, for the phrase »alpha« and the text
»alpha beta alpha gamma alpha delta alpha epsilon«,
the text to be removed is »alpha epsilon«.

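public class Main
{
  public static void test( final java.lang.String text )
  {
    // The greedy (.*) first consumes the whole text, then backtracks until
    // "alpha" can match, so group 1 ends just before the LAST "alpha".
    final java.util.regex.Matcher matcher =
      java.util.regex.Pattern.compile( "^(.*)alpha.*$" ).matcher( text );

    // The pattern is anchored at both ends, so there is at most one match.
    if( matcher.find() )
      java.lang.System.out.println( matcher.group( 1 )); }

  public static void main( final java.lang.String[] args )
  { test( "alpha beta alpha gamma alpha delta alpha epsilon alpha zeta" ); }}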

alpha beta alpha gamma alpha delta alpha epsilon
 
Hal Vaughan

Robert said:
In your case you can use this
http://java.sun.com/javase/6/docs/api/java/lang/String.html#lastIndexOf(java.lang.String)

But that takes a String, not a regex. I tried it.

Thanks, though.

Hal
 
Hal Vaughan

Stefan said:
java.util.regex.Pattern.compile
( "^(.*)alpha.*$" ).
matcher( text );

Then what you're doing, in essence, is using a greedy quantifier at the
start to gobble up as much as possible so it only finds the last occurrence
and then just using capture to get the text that's to be kept and using
that as the replacement.
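
A sketch of that idea as a reusable helper, assuming Java's java.util.regex; the name keepBeforeLast and the \b in the last call are additions for illustration, not part of the code above. One caveat: the greedy prefix gives back as little as it can, so when the phrase can start at several positions (as with [0-9]{2,4}) it may swallow the first digits of the last match unless the phrase is anchored, for example with a word boundary.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastByGreedyPrefix {
    // Returns the text before the last place where phraseRegex can match,
    // or the whole text if it never matches.
    static String keepBeforeLast(String text, String phraseRegex) {
        // (?s) lets '.' cross line breaks; the greedy (.*) grabs everything
        // and backtracks only as far as needed for the phrase to match.
        Matcher m = Pattern.compile("(?s)^(.*)(?:" + phraseRegex + ").*$").matcher(text);
        return m.matches() ? m.group(1) : text;
    }

    public static void main(String[] args) {
        String date = "[0-9]{2,4}-[0-9]{1,2}-[0-9]{1,2}";
        String s = "The date 2008-01-07 is one day before Elvis' birthday on "
                 + "2008-01-08, which is tomorrow.";
        System.out.println("[" + keepBeforeLast("One Two Three Two One", "Two") + "]");
        System.out.println("[" + keepBeforeLast(s, date) + "]");         // ends with "on 20"
        System.out.println("[" + keepBeforeLast(s, "\\b" + date) + "]"); // ends with "on "
    }
}
```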

Am I right in how this is working? I see it works, I just want to be sure I
understand it clearly.

Thanks!

Hal
 
Stefan Ram

Hal Vaughan said:
Then what you're doing, in essence, is using a greedy
quantifier at the start to gobble up as much as possible so it
only finds the last occurrence and then just using capture to
get the text that's to be kept and using that as the
replacement.

I believe so.
 
Stefan Ram

Hal Vaughan said:
Okay. I got it.

One might still ask whether there is a way to inspect just
the end of the string. I am not absolutely sure whether the
following code really does that, but I would try it this way:

public class Main
{
  // Returns the text from the last "alpha" to the end of the string, or "".
  public static java.lang.String pos( final java.lang.String text )
  {
    // The explicit upper bound presumably stands in for ".*?": java.util.regex
    // wants a lookbehind with an obvious maximum length, and
    // 5 + 2147483642 = Integer.MAX_VALUE.
    final java.util.regex.Matcher matcher =
      java.util.regex.Pattern.compile
      ( "(?<=(alpha.{0,2147483642}?)$)" ).matcher( text );

    return matcher.find() ? matcher.group( 1 ) : ""; }

  public static void main( final java.lang.String[] args )
  {
    final java.lang.String source = "alpha beta alpha gamma alpha delta";

    final java.lang.String stringToBeRemoved = pos( source );
    java.lang.System.out.println( stringToBeRemoved );

    // Keep everything before the part that is to be removed.
    final java.lang.String result = source.substring
      ( 0, source.length() - stringToBeRemoved.length() );
    java.lang.System.out.println( result ); }}

alpha delta
alpha beta alpha gamma
 
