regex question

proctor

hello,

i hope this is the correct place...

i have an issue with some regex code i wonder if you have any insight:

================

import re, sys

def makeRE(w):
    print w + " length = " + str(len(w))
    reString = "r'" + w[:1]
    w = w[1:]
    if len(w) > 0:
        for c in (w):
            reString += "|" + c
    reString += "'"
    print "reString = " + reString
    return reString

test = sys.argv[1]
stg = sys.argv[2]

while test:
    print "test = ", test
    print "stg = ", stg
    rx_a = re.compile(makeRE(test))
    i = rx_a.search(stg).start()
    print "i = " + str(i)
    id = test.find(stg[i])  # look up the matched character, not the whole string
    test = test[:id] + test[id+1:]
    print "test == ", test
    stg = stg[:i] + stg[i+1:]
    print

================

i get the following output:

================

test = abc
stg = defabc
abc length = 3
reString = r'a|b|c'
i = 4
test == ac

test = ac
stg = defac
ac length = 2
reString = r'a|c'
Traceback (most recent call last):
  File "aaaa.py", line 21, in ?
    i = rx_a.search(stg).start()
AttributeError: 'NoneType' object has no attribute 'start'

=================

i am fairly new to this, and can't see the reason for the error. what
am i missing?

btw, i think there are simpler ways to go about this, but i am doing it
this way (regexs) for a bit of a challenge and learning experience.

thanks to all!

sincerely,
proctor
 
Paul McGuire

proctor said:
hello,

i hope this is the correct place...

i have an issue with some regex code i wonder if you have any insight:

================

There's nothing actually *wrong* with your regex. The problem is your
misunderstanding of raw string notation. In building up your regex, do not
start the string with "r'" and end it with a "'".

def makeRE(w):
    print w + " length = " + str(len(w))
    # reString = "r'" + w[:1]
    reString = w[:1]
    w = w[1:]
    if len(w) > 0:
        for c in (w):
            reString += "|" + c
    # reString += "'"
    print "reString = " + reString
    return reString

Or even better:

def makeRE(w):
    print w + " length = " + str(len(w))
    # a string iterates character by character, so list(w) is not needed
    reString = "|".join(w)
    return reString
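
To see why the extra characters lead to the AttributeError, it can help to look at what the regex engine is actually handed. The following is only a rough sketch: show_alternatives is a made-up helper for illustration, and the print calls are written so that they run under both Python 2 and Python 3.

```python
import re

def show_alternatives(pattern):
    # Hypothetical helper: split on "|" to list the alternatives the engine tries.
    return pattern.split("|")

bad = "r'" + "a|b|c" + "'"    # what the original makeRE returned
good = "a|b|c"                # what it was meant to return

print(show_alternatives(bad))     # ["r'a", 'b', "c'"]
print(show_alternatives(good))    # ['a', 'b', 'c']

# With the bad pattern only the middle alternative "b" can match, hence i = 4.
print(re.search(bad, "defabc").start())     # 4
print(re.search(good, "defabc").start())    # 3

# Once "b" is used up, the pattern is "r'a|c'", which matches nothing in "defac".
# search() then returns None, and calling .start() on None raises AttributeError.
print(re.search("r'a|c'", "defac"))         # None
```

In other words, the first pass only appeared to work because the middle alternative happened to be free of the stray quote characters.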

Raw string notation is intended to be used when the string literal is in
your Python code itself, for example, this is a typical use for raw strings:

ipAddrRe = r'\d{1,3}(\.\d{1,3}){3}'

If I didn't have raw string notation to use, I'd have to double up all the
backslashes, as:

ipAddrRe = '\\d{1,3}(\\.\\d{1,3}){3}'

But no matter which way I create the string, it does not actually start with
"r'" and end with "'"; those are just notations for literals that are part
of your Python source.
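
A short check of that equivalence, as a sketch only (the variable names raw and doubled and the sample address are chosen here for illustration):

```python
import re

raw = r'\d{1,3}(\.\d{1,3}){3}'
doubled = '\\d{1,3}(\\.\\d{1,3}){3}'

# Both literals describe exactly the same sequence of characters.
print(raw == doubled)       # True
print(len(r'\d'))           # 2: one backslash and the letter d
print(raw[0] == '\\')       # True: the string starts with a backslash, not with "r"

# Either spelling therefore gives the same compiled pattern.
print(bool(re.match(raw, "192.168.0.1")))       # True
print(bool(re.match(doubled, "192.168.0.1")))   # True
```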

Does this give you a better idea of what is happening?

-- Paul
 
proctor

yes! thanks so much.

it does work now...however, one more question: when i type:

rx_a = re.compile(r'a|b|c')
it works correctly!

shouldn't:
rx_a = re.compile(makeRE(test))
give the same result, since makeRE(test) returns the string "r'a|b|c'"?

are you saying that the "r'" and "'" are being interpreted differently
in the second case than in the first? if so, how would i go about
using raw string notation in such a circumstance (perhaps if i need to
escape "\b" or the like)? do i have to double up in this case?

proctor.
 
Steven D'Aprano

it does work now...however, one more question: when i type:

rx_a = re.compile(r'a|b|c')
it works correctly!

shouldn't:
rx_a = re.compile(makeRE(test))
give the same result, since makeRE(test) returns the string "r'a|b|c'"?

Those two strings are NOT the same.
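The literal r'a|b|c' is a 5-character string, while the string "r'a|b|c'"
that makeRE returns has 8 characters.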

A string with a leading r *outside* the quotation marks is a raw-string.
The r is not part of the string, but part of the delimiter.

A string with a leading r *inside* the quotation marks is just a string
with a leading r. It has no special meaning.
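
The difference is easy to confirm at the interpreter. A minimal sketch, with the names literal and built picked here just for illustration:

```python
literal = r'a|b|c'       # r sits outside the quotes: part of the literal syntax
built = "r'a|b|c'"       # r and ' sit inside the quotes: ordinary characters

print(len(literal))      # 5
print(len(built))        # 8
print(literal == built)  # False
print(built[:2])         # r'

# Dropping the three extra characters recovers the intended pattern.
print(built[2:-1] == literal)   # True
```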
 
proctor

thanks steven,

is there any way i would be successful then, in using raw string inside
my makeRE() function?

proctor.
 
Mark Peters

is there any way i would be successful then, in using raw string inside
my makeRE() function?

Why do you think you even need a raw string?

Just build and return the string 'a|b|c' (NOTE: DON'T add the quotes to
the string)
 
Paul McGuire

proctor said:
it does work now...however, one more question: when i type:

rx_a = re.compile(r'a|b|c')
it works correctly!

Do you see the difference between:

rx_a = re.compile(r'a|b|c')

and

rx_a = re.compile("r'a|b|c'")

There is no difference in the variable datatype between "string" and "raw
string". Raw strings are just a notational helper when creating string
literals that have lots of backslashes in them (as happens a lot with
regexps).

r'a|b|c' is the same as 'a|b|c'
r'\d' is the same as '\\d'

There is no reason to "add raw strings" to your makeRE method, since you
don't have a single backslash anywhere. And even if there were a backslash
in the 'w' argument, it is just a string - no need to treat it differently.
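
One related point, separate from raw strings: if w could contain characters that are special to the regex engine (a backslash, "|", "." and so on), joining them with "|" as-is would change the meaning of the pattern, and a lone trailing backslash would make re.compile raise an error. A possible way around that, sketched here under the assumption that every character of w should be matched literally, is to pass each character through re.escape. The name make_escaped_re is hypothetical and not part of the original code.

```python
import re

def make_escaped_re(w):
    # Hypothetical variant of makeRE: escape each character so that regex
    # metacharacters such as "\", "|" or "." are matched literally.
    return "|".join(re.escape(c) for c in w)

pattern = make_escaped_re("a.\\")     # three characters: a, dot, backslash
rx = re.compile(pattern)

print(rx.search("xyz.").start())      # 3: the literal dot, not "any character"
print(rx.search("xyz"))               # None: no a, dot or backslash present
print(rx.search("x\\y").start())      # 1: the literal backslash
```

No raw string is needed inside the function; raw notation only matters when the pattern is typed as a literal in the source.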

-- Paul
 
proctor

Mark said:
Why do you think you even need a raw string?

Just build and return the string 'a|b|c' (NOTE: DON'T add the quotes to
the string)

yes, i suppose you are right. i can't think of a reason i would NEED a
raw string in this situation.

all very helpful! thanks very much.

sincerely,
proctor.
 
proctor

thanks paul. this helps.

proctor.
 
