boolean endsWith(String s, Pattern pattern)

L

lepikhin

Anybody hear about the

boolean endsWith(String s, Pattern pattern)

implementations?
 
R

Roedy Green

boolean endsWith(String s, Pattern pattern)
The source code for all the regex stuff is there for you to look at in
src.zip.

IIRC it compiles Patterns to commands to a state machine interpreter,
not byte code.
 
R

Rhino

Anybody hear about the

boolean endsWith(String s, Pattern pattern)

implementations?
Since a boolean can only contain the boolean values 'true' and 'false', how
could it possibly contain a String? Why would anyone write a method to
search for Strings within a boolean?

Rhino
 
R

Roedy Green

Since a boolean can only contain the boolean values 'true' and 'false', how
could it possibly contain a String? Why would anyone write a method to
search for Strings within a boolean?
huh?

I think he is asking does string s end with given pattern yes or no.

he can't make it an instance method of String since String is final.
He left out the word "static", though I suppose it could be
implemented on some unrelated object.
 
L

lepikhin

I think he is asking does string s end with given pattern yes or no.
Yes. Sorry for ambiguous problem statement.

Probmlem with "something$"-like regexps is, that you need to look
through the whole string. It's very slow.
 
R

Roedy Green

Probmlem with "something$"-like regexps is, that you need to look
through the whole string. It's very slow.

it depends on how clever the Pattern compiler is. A smart one might
scan the string backwards.
 
L

lepikhin

it depends on how clever the Pattern compiler is. A smart one might
scan the string backwards.

Unfortunately Java Pattern compiler (java.util.regex.Pattern) is
stupid. I've checked it.
 
R

Roedy Green

Unfortunately Java Pattern compiler (java.util.regex.Pattern) is
stupid. I've checked it.

In that case, if this were a bottleneck, you could help the regex
along by chopping off all but the last N characters.
 
C

Chris Uppal

Unfortunately Java Pattern compiler (java.util.regex.Pattern) is
stupid. I've checked it.

Construct your regexp backwards, reverse the string, and look for a match at
the beginning of that...

-- chris
 
J

John C. Bollinger

In said:
Unfortunately Java Pattern compiler (java.util.regex.Pattern) is
stupid. I've checked it.

And in a different one:
It's difficult, for some reasons.

My apologies, but does that mean you're stupid, too? I mean, you
criticize the regex compiler as stupid for not being able to do a job
equivalent to the one you say is too difficult for *you*. Regexes do
not easily reverse; as far as I know, there is no general algorithm for
"reversing" a regex, so it is unsurprising that Pattern doesn't know how
to do it.

You started out asking how to determine whether a string ends with a
match for a particular pattern. The usual approach is to prepend ".*"
to the pattern and match the whole string with it.
 
C

Chris Uppal

John said:
Regexes do
not easily reverse; as far as I know, there is no general algorithm for
"reversing" a regex, so it is unsurprising that Pattern doesn't know how
to do it.

Why do you say that ? Am I missing something, or are you talking about the
difficulty of reversing (some of) the Perl-ish extensions to the classic regexp
concept ?

It seems (to me) that reversing a classic regexp is straightforward using the
following (rather redundant) transformations on the parsed AST representation
of the regexp:

reverse( c ) -> c
reverse( . ) -> .
reverse( [a-b] ) -> [a-b]
reverse( R1 | R2 ) -> reverse(R1) | reverse(R2)
reverse( R1R2 ) -> reverse(R2) reverse(R1)
reverse( ( R ) ) -> ( reverse(R) )
reverse( R * ) -> reverse(R) *
reverse( R + ) -> reverse(R) +
reverse( ^ R) -> reverse(R) $
reverse( R $ ) -> ^ reverse(R)

(where a, b, and c are characters and R, R1, and R1 are regexps). I have no
idea what some of the Perl-ish extensions would even mean if reversed, but
"classic" regexps are good enough for me, and perhaps for most (legitimate)
applications.

-- chris
 
J

John C. Bollinger

Chris said:
John C. Bollinger wrote:




Why do you say that ? Am I missing something, or are you talking about the
difficulty of reversing (some of) the Perl-ish extensions to the classic regexp
concept ?

I am talking about reversing regexes expressed in the language supported
by java.util.regex.Pattern, a language even more feature-laden than
Perl's regex language.
It seems (to me) that reversing a classic regexp is straightforward using the
following (rather redundant) transformations on the parsed AST representation
of the regexp:

reverse( c ) -> c
reverse( . ) -> .
reverse( [a-b] ) -> [a-b]
reverse( R1 | R2 ) -> reverse(R1) | reverse(R2)
reverse( R1R2 ) -> reverse(R2) reverse(R1)
reverse( ( R ) ) -> ( reverse(R) )
reverse( R * ) -> reverse(R) *
reverse( R + ) -> reverse(R) +
reverse( ^ R) -> reverse(R) $
reverse( R $ ) -> ^ reverse(R)

(where a, b, and c are characters and R, R1, and R1 are regexps). I have no
idea what some of the Perl-ish extensions would even mean if reversed, but
"classic" regexps are good enough for me, and perhaps for most (legitimate)
applications.

Your argument for how that particular regex language can be reversed is
convincing, but you don't have to add much to the language to make the
task a lot harder. How about reluctant quantifiers, for instance? We
had a different thread in the past week where the regex engine's
leftmost matching behavior with respect to reluctant quantifiers
produced a result that it would be tricky to reproduce working from the
other end. (This is a regex feature also supported by Perl, and one
with some good uses.)

It might be possible (though nontrivial) to solve the problem with
reluctant quantifiers, but what about greedy ones? Those are too
closely tied to details of the engine's matching behavior to admit
reversal, I think.
 
R

Roedy Green

I am talking about reversing regexes expressed in the language supported
by java.util.regex.Pattern, a language even more feature-laden than
Perl's regex language.

A optimisation does not need to handle all cases. It is perfectly
legit to cherry pick the easy ones to optimise.
 
C

Chris Uppal

John said:
I am talking about reversing regexes expressed in the language supported
by java.util.regex.Pattern, a language even more feature-laden than
Perl's regex language.

Right. I tend to ignore the non-classical extensions, myself.

Your argument for how that particular regex language can be reversed is
convincing, but you don't have to add much to the language to make the
task a lot harder. How about reluctant quantifiers, for instance?

It seems to me that the greedy/reluctant distinction is only about handling
ambiguous "parses". That's to say that a switching between greedy and
reluctant never changes the language that the regexp recognises, but only
changes the way that sub-regexps are assigned to substrings in the input
sequence. (I'm assuming that the regexp is being used to make a simple Boolean
judgement "does this string match?", the streaming mode is obviously
different). For instance, given the pattern:
/a*a*/
and the input sequence:
aaaa
there are various possible parses:
()(aaaa)
(a)(aaa)
(aa)(aa)
(aaa)(a)
()(aaaa)
and the choice amongst them is determined by the greediness of the matches. I
haven't been able to prove it (possibly because I lack a precise definition of
"reluctant", possibly because I can't think straight), but I conjecture that
inverting the greediness of each subexpression while reversing the pattern
would produce the "same" parse of the reversed string.

I have no idea about the possessive qualifier. But then I don't care either --
it's an aesthetic disaster, a theoretic horror, and I hope (and expect) never
to see any legitimate use for it.

Beyond those cases, the only non-classical features of the Java regexp package
(I think) are: The wide palette of character classes, which can be seen as
syntactic sugar for character ranges, or even for "raw" alternation, and which
reverse trivially. The various end-of-word, end-of-line, etc,
pseudo-characters. The pseudo-characters present a problem if the set is not
closed under reversal, I don't know whether it is (CR-LF, anyone ?). And the
backref feature, which obviously doesn't reverse.

-- chris
 
J

John C. Bollinger

Chris said:
It seems to me that the greedy/reluctant distinction is only about handling
ambiguous "parses". That's to say that a switching between greedy and
reluctant never changes the language that the regexp recognises, but only
changes the way that sub-regexps are assigned to substrings in the input
sequence. (I'm assuming that the regexp is being used to make a simple Boolean
judgement "does this string match?", the streaming mode is obviously
different).

You're right. I was thinking more about assigning parts of the matched
string to groups the same way, and not as much about the overall "does
this string match?", which is the most important aspect of the question
the OP asked (somewhere back there in the distance...).

[...]
I conjecture that
inverting the greediness of each subexpression while reversing the pattern
would produce the "same" parse of the reversed string.

Now that would be a tidy result. A little prodding at it didn't make a
counterexample fall out, but I'm too lazy to try to really prove it.
I have no idea about the possessive qualifier. But then I don't care either --
it's an aesthetic disaster, a theoretic horror, and I hope (and expect) never
to see any legitimate use for it.

Do I sense a mild distaste for the feature? :^)
In truth, I agree with you. I raised the issue only because I think
they're a certain killer of general-purpose reversal of arbitrary Java
regex patterns. Perhaps in itself that is reflective of the possessive
quantifiers' perversion.
Beyond those cases, the only non-classical features of the Java regexp package
(I think) are: The wide palette of character classes, which can be seen as
syntactic sugar for character ranges, or even for "raw" alternation, and which
reverse trivially. The various end-of-word, end-of-line, etc,
pseudo-characters. The pseudo-characters present a problem if the set is not
closed under reversal, I don't know whether it is (CR-LF, anyone ?). And the
backref feature, which obviously doesn't reverse.

As I consider it, I think perhaps even a backreference is reversible by
exchanging the places of the last reference to each group and its
referrent, and fixing up the group indices in the references. I don't
see a pseudo-character for a CR/LF pair, but such a combination is
recognized as a single line terminator when not in UNIX_LINES mode. I
think that could be handled by use of the available individual
metacharacters for CR and LF. If I'm right (now) then that leaves the
possessive quantifiers as the only non-reversible feature. With that,
you have established your point very well indeed.
 
C

Chris Uppal

John said:
Do I sense a mild distaste for the feature? :^)
In truth, I agree with you. I raised the issue only because I think
they're a certain killer of general-purpose reversal of arbitrary Java
regex patterns. Perhaps in itself that is reflective of the possessive
quantifiers' perversion.

That's a good way to think of it. Should I ever write a regexp reverser, I
shall throw a FeatureInsupportableException in such cases ;-)

BTW, there's also the zero-width-look{ahead/behind} operator. That, if I've
understood it correctly (Sun, understandably don't care to admit what it
actually does) can be handled trivially in the case where the lookahead/behind
is a fixed string, otherwise it's another job for FeatureInsupportableException
I fear.

-- chris
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Members online

No members online now.

Forum statistics

Threads
473,756
Messages
2,569,534
Members
45,007
Latest member
OrderFitnessKetoCapsules

Latest Threads

Top