regular expression reverse match?

J

Jay Dorsey

Ron said:
Is it possible to match a string to regular expression pattern instead
of the other way around?

For example, instead of finding a match within a string, I want to
find out, (pass or fail), if a string is a partial match to an re.

Given an re of 'abcd and a bunch of other stuff'

This is what i'm looking for:

string / result
'a' / pass
'ab' / pass
'abc' / pass
'abd' / fail
'aaaa' / fail
'abcd and a bunch of other stuff and then some' / fail

How about:
>>> matcher = "abcd and a bunch of other stuff"
>>> phrases = ["a", "ab", "abc", "abd", "aaaa", "abcd and a bunch of other stuff and then some"]
>>> for phrase in phrases:
.... if phrase == matcher[:len(phrase)]: print "pass"
.... else: print "fail"
....
pass
pass
pass
fail
fail
fail


Jay
 
R

Ron Adam

Is it possible to match a string to regular expression pattern instead
of the other way around?

For example, instead of finding a match within a string, I want to
find out, (pass or fail), if a string is a partial match to an re.

Given an re of 'abcd and a bunch of other stuff'

This is what i'm looking for:

string / result
'a' / pass
'ab' / pass
'abc' / pass
'abd' / fail
'aaaa' / fail
'abcd and a bunch of other stuff and then some' / fail

Is there a way to form a regular expression that will do this?

I'm hoping to use more complex regular expressions than this. But
the left to right precedence will still be the same.

_Ron Adam
 
E

Emile van Sebille

Ron Adam said:
Is it possible to match a string to regular expression pattern instead
of the other way around?

You can abuse the implmentation details to discover the original search
string.
For example, instead of finding a match within a string, I want to
find out, (pass or fail), if a string is a partial match to an re.

Given an re of 'abcd and a bunch of other stuff'

I'll assume you mean comething like:

x = re.compile('abcd and a bunch of other stuff')

This is what i'm looking for:

string / result
'a' / pass
'ab' / pass
'abc' / pass
'abd' / fail
'aaaa' / fail
'abcd and a bunch of other stuff and then some' / fail

Is there a way to form a regular expression that will do this?

for k,v in re._cache.items():
if v == x:
ss=k[0]
break

Then it's a normal:


Not sure that's what you're looking for, but reasonably sure it won't work
in all cases.

HTH,

Emile van Sebille
(e-mail address removed)
 
R

Ron Adam

You can abuse the implmentation details to discover the original search
string.


I already know the search string... Or will once I understand the how
to form them to do what I want. <hopefully> What I don't know is
the completed string that will match to it beforehand. Or at least
that's the idea.

I'm writing an interactive keyboard input routine and want to use re
to specify the allowed input. Some things are easy like only allowing
digits or characters.

So far I can check for %i and %f first to specify simple ints and
floats. A test cast operation with an exception handles those
perfectly.

As it is, the program checks the input buffer after each keystroke and
determines if the buffer is acceptable against a pattern as the string
is being built. A reverse match. It allows new keystrokes to be
added to the buffer as long as it's within the pattern.

I was hoping to use regular expression patterns as a function argument
to specify a wide range of input patterns.

I'm still a little new to Python and a lot new to the regular
expressions although I've used variations of them before.


So how do I say.... accept only 1 character out of set [YyNn] but
only one character and do not match Ye, yy nn etc...

.... accept only 10 of any type characters but not 11

.... accept only the letters of "a particular string" sequentially and
then no more.

.... accept a number in the form of "nnn-nnn-nnn" sequentially with
required dashes.

..... accept any sequence of characters and numbers as long as they
contain at least 1 of each type, cap letter, small letter, and digit
and be a minimum of 6 characters long. ie.... a password check.

It's probably easy to do all of these given a completed string first.

This started out as a python learning project.. <grin> but it's sort
turned into a real interesting coding situation.




Thanks for the reply.

_Ron Adam



For example, instead of finding a match within a string, I want to
find out, (pass or fail), if a string is a partial match to an re.

Given an re of 'abcd and a bunch of other stuff'

I'll assume you mean comething like:

x = re.compile('abcd and a bunch of other stuff')

This is what i'm looking for:

string / result
'a' / pass
'ab' / pass
'abc' / pass
'abd' / fail
'aaaa' / fail
'abcd and a bunch of other stuff and then some' / fail

Is there a way to form a regular expression that will do this?

for k,v in re._cache.items():
if v == x:
ss=k[0]
break

Then it's a normal:

I'll give it a try... not sure either. Like I said, this is kind of
new to me. :)
 
R

Ron Adam

How about:
matcher = "abcd and a bunch of other stuff"
phrases = ["a", "ab", "abc", "abd", "aaaa", "abcd and a bunch of other stuff and then some"]
for phrase in phrases:
... if phrase == matcher[:len(phrase)]: print "pass"
... else: print "fail"
...
pass
pass
pass
fail
fail
fail


Jay

Hi Jay, That was an overly simple example I gave I think. The
'pattern' will in most case not be just a simple string. And I'll
only be testing on one string at a time as it forms.

It's an interactive input routine. So as the person types, I want to
test the buffer and accept or reject each key according to the
pattern. Pass or fail.


This method would work though if it's possible to expand a regular
expression out to a string of single matching re characters I think.
Then it would be a matter of doing re.match('ibuffer',
pattern[:len(ibuffer)].

I just don't know enough yet, but am learning quickly though,
thanks.

_Ron
 
R

Ron Adam

I'll assume you mean comething like:

x = re.compile('abcd and a bunch of other stuff')
This is what i'm looking for:

string / result
'a' / pass
'ab' / pass
'abc' / pass
'abd' / fail
'aaaa' / fail
'abcd and a bunch of other stuff and then some' / fail

Is there a way to form a regular expression that will do this?

for k,v in re._cache.items():
if v == x:
ss=k[0]
break

Then it's a normal:


Hi again Emile,

I tried it and get an error.
Here's my result. Am I missing something?
if v==x:
ss=k[0]
break


Traceback (most recent call last):
File "<pyshell#21>", line 1, in ?
for k,v in re._cache.items():
AttributeError: 'module' object has no attribute '_cache'

With a trial and error method, (works for me eventually), this is what
I've been able to get to work so far. I found the '$' is what I
needed to limit the buffer length to the pattern.

e = kb_input('Enter "Y" or "N" please: ', '[YyNn]$')
f = kb_input('Enter "yes" please: ', 'y$|ye$|yes$')
g = kb_input('Enter "yes" or "no": ', '(y$|ye$|yes$)|(n$|no$)')
h = kb_input('Enter a 5 digit number:','\d$|\d{2}$\d{3}$\d{4}$\d{5}$')

New problem: Is there a way to expand an re from:

'yes$' to 'y$|ye$|yes$'

and

'(yes$)|(no$)' to '(y$|ye$|yes$)|(n$|no$)'

and

'\d{30}$' to '\d{1}$|\d{2}$|\d{3}$|\d{4}$|\d{5}$| .......'


Other expressions that I might use would be:

'\d{3}-\d{3}-d\{4}$' to match a phone number

or '\c{40}' to specify a character string 40 characters long.

but if I have to manually expend these to the above formats they can
get pretty long.

''abcd and a bunch of other stuff' becomes...
'a$|ab$|abc$|abcd$|abcd $|abcd a$|abc... etc... etc... ....stuff$'

Well at least I know what the re's look like now. Time to sleep on it
and see what tomorrow brings.

_Ron
 
E

Emile van Sebille

Ron Adam said:
I'll assume you mean comething like:

x = re.compile('abcd and a bunch of other stuff')
This is what i'm looking for:

string / result
'a' / pass
'ab' / pass
'abc' / pass
'abd' / fail
'aaaa' / fail
'abcd and a bunch of other stuff and then some' / fail

Is there a way to form a regular expression that will do this?

for k,v in re._cache.items():
if v == x:
ss=k[0]
break

Then it's a normal:
re.match('a',ss)
re.match('abd',ss)


Hi again Emile,

I tried it and get an error.
Here's my result. Am I missing something?

See, I told you it wouldn't work ;-) I abused an implementation detail
that apparently doesn't exist on the version you've got.

[snip]
New problem: Is there a way to expand an re from:

'yes$' to 'y$|ye$|yes$'

and

'(yes$)|(no$)' to '(y$|ye$|yes$)|(n$|no$)'

and

'\d{30}$' to '\d{1}$|\d{2}$|\d{3}$|\d{4}$|\d{5}$| .......'


Other expressions that I might use would be:

'\d{3}-\d{3}-d\{4}$' to match a phone number

You'll probably want to build a validation routine. Here's one quick idea:

import re

def oksofar(pattern, test, sample):
teststr = "".join([test,sample[len(test):]])
return not not re.match(pattern, teststr)

for p,t,s in [
(r'^(\d{5})$', '123456', '56789'),
(r'^([A-Z]{2}\d(2))$', 'AB4x', 'XY12'),
(r'^(\d{5})-(\d{4})$', '55555-1234', '55555-1212')
]:
print p,t,s,not not re.match(p, s)
for ii in range(len(t)):
print ii,t[:ii+1], oksofar(p, t[:ii+1], s)


HTH,

Emile van Sebille
(e-mail address removed)
 
D

Diez B. Roggisch

Hi,
As it is, the program checks the input buffer after each keystroke and
determines if the buffer is acceptable against a pattern as the string
is being built. A reverse match. It allows new keystrokes to be
added to the buffer as long as it's within the pattern.

I don't think its possible with regular expressions, or at least the way you
are using them right now, that is to come from a "in the end, I want to
look it like this, but it shall accept everything that prefixes a valid
result".

That won't work - the reason is simply that regular expressions are
equivalent to finite state automata. I don't know if you are familiar with
these, but the consist of states, which are nodes, and transitions between
these, which are edges/arrows between the states.

Now ususally there is one special starting state S, and there should be at
least on reachable end-state. As the names suggest, recoginizing a
specified String starts at S, and then every read character advances the
internal state until one of the end-states is reached _and_ there is no
more input.

Now lets see at a simple example:

"abc"

Here every character becomes a state, and the automata looks like this:

S-a->(a)-b->(b)-c->[c]

The -*-> means that this transition is taken when * is matched by the
current input. So -a-> is taken when an "a" is input.

[c] is an end-state.

Now what you want is, that _all_ states that are legal are also end-states.
That is very easy to accomplish when you implement the automata directly
like this:

S-a->[a]-b->-c->[c]

However, its way more complicated to create this as a rex:

"a(b(c)?)?"

This looks straightforward, so you might be successful to create rexes like
this using a preprocessing step - but the more complicated your rex gets,
this approach will be hard to follow. Actually, I have currently no idea.
Which doesn't mean its not doable, I just don't have enough time to think
about it right now :)

It looks to me, that if you need this feature not on a per-case base where
you can think about your rexes more thoroughly, you have to sort of roll
out your own rex-implmenetation. Which isn't too hard, but definitely
something thats more than just a couple of lines.

Regards,

Diez
 
R

Ron Adam

On Wed, 29 Oct 2003 00:48:46 -0800, "Emile van Sebille"


[snip]
See, I told you it wouldn't work ;-) I abused an implementation detail
that apparently doesn't exist on the version you've got.

It's not the first time something doesn't work on a winxp system. :)


[snip]
You'll probably want to build a validation routine. Here's one quick idea:

import re

def oksofar(pattern, test, sample):
teststr = "".join([test,sample[len(test):]])
return not not re.match(pattern, teststr)

for p,t,s in [
(r'^(\d{5})$', '123456', '56789'),
(r'^([A-Z]{2}\d(2))$', 'AB4x', 'XY12'),
(r'^(\d{5})-(\d{4})$', '55555-1234', '55555-1212')
]:
print p,t,s,not not re.match(p, s)
for ii in range(len(t)):
print ii,t[:ii+1], oksofar(p, t[:ii+1], s)


HTH,

Emile van Sebille
(e-mail address removed)


Ok, it took me a while to see what you did. Using a sample to
complete the partial input is a good Idea, Thanks! :)

This works in the case of regular expressions that are linear and only
have one possible path.

Doesn't work with expressions that have more than one possible path,
such as r'yes$|no$' . I think that's what Diez was trying to explain
to me. That this could become rather complex when grouping and
alternate cases are present.



It seems there are three possible routs to take on this.

1. Process the input so it will match the re. This will require
generating a set of samples, 1 for each re branch. Then matching with
each sample. Using the method above.

So now the problem becomes.... <starting to feel as though I fell in
a rabbit hole> .... is there a way to generate samples where there
is one of each for each re branch?

2. Process the re so it will match a subset of all possible final end
points. ie... a less than operation.

if this were a math problem it might look something like this.

string <= yes$ | no$
string = (y | ye | yes)$ | (n | no)$
string = (y$ | ye$ | yes$) | (n$ | no$)
string = y$ | ye$ | yes$ | n$ | no$

I was hoping there was a way to do this already. Since there isn't,
it would require building a parser to convert strings of 'abc' to
'a|ab|abc' and handle distributed operations with ^,$ and other
special characters. And there are probably exceptions that will
complicate the process. :-/

This is method is probably only worth doing if it had wider
applications. <shrug> I don't know enough (yet) to know.


3. Process both the input and the re.

1. string <= yes$|no$

2. string <= yes$
string <= no$

3. generate a sample list, 1 sample for each re
4. test the input using each sample for a possible match

This might be doable.... will have to think on it and do a little
experimenting.
 
R

Ron Adam

Hi,


I don't think its possible with regular expressions, or at least the way you
are using them right now, that is to come from a "in the end, I want to
look it like this, but it shall accept everything that prefixes a valid
result".

That won't work - the reason is simply that regular expressions are
equivalent to finite state automata. I don't know if you are familiar with
these, but the consist of states, which are nodes, and transitions between
these, which are edges/arrows between the states.

I'm not familiar with the terms, but I am familiar with tree data
structures.


[clipped good explanation]

However, its way more complicated to create this as a rex:

"a(b(c)?)?"

This looks straightforward, so you might be successful to create rexes like
this using a preprocessing step - but the more complicated your rex gets,
this approach will be hard to follow. Actually, I have currently no idea.
Which doesn't mean its not doable, I just don't have enough time to think
about it right now :)

I'm still definitely learning this, and appreciate the help.

In reply to Emily, I compared rex to simplifying a math problem like
this. I'm hoping this will lead to a method that will work.
if this were a math problem it might look something like this.

string <= yes$ | no$
string = (y | ye | yes)$ | (n | no)$
string = (y$ | ye$ | yes$) | (n$ | no$)
string = y$ | ye$ | yes$ | n$ | no$

Using the same approach for the example above might be something like:

s <= a( b (c)? )?
s <= a( b | bc )?
s <= a | ab | abc
s = a | (a | ab) | ( a | ab | abc)
s = a | a | ab | a | ab | abc
s = a | ab | abc

I have no idea if what I'm doing here is actually valid. I'm sort of
thinking it out and learning as I go. Are there rules and methods to
manipulating regular expressions in this manner, rex algebra? Or is
this a subset of set theory? I think my statistics instructor did
something similar to this. It was a few years ago.

It looks to me, that if you need this feature not on a per-case base where
you can think about your rexes more thoroughly, you have to sort of roll
out your own rex-implmenetation. Which isn't too hard, but definitely
something thats more than just a couple of lines.

Regards,

Diez

Looks like I only <obvious understatement> need to create a few
functions to manipulate rex expressions. Simpler than a full
rex-emplementations, but still definitely more than a few lines of
code.

_Ron Adam
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,767
Messages
2,569,572
Members
45,046
Latest member
Gavizuho

Latest Threads

Top