regular expression back references

Matthew

Greetings, I am having a problem using back references in my regex and
I am having a difficult time figuring out what I am doing wrong. My
regex works fine without the back refs, but when I try to use them it
won't match my sample. It looks to me like I am using them no
differently than my examples and documentation, but to no avail.

Here is my pattern:

macExpression = "^[0-9A-F]{1,2}(\:|\.|\-)[0-9A-F]{1,2}\1[0-9A-F]{1,2}\1[0-9A-F]{1,2}\1[0-9A-F]{1,2}\1[0-9A-F]{1,2}$:

And this is how I am using it:

matched = re.match(macExpression, macAddress)

I am trying to match MAC addresses in the following formats:
0:a0:c9:ee:b2:c0, 0-a0-c9-ee-b2-c0 & 0.a0.c9.ee.b2.c0 etc.

I wasn't sure how to do it, but then I read about back references and I
thought that all was well... Alas, it was not. If anyone could lend a
hand I would appreciate it very much.

-matthew
 
John Machin

Here is my pattern:

macExpression = "^[0-9A-F]{1,2}(\:|\.|\-)[0-9A-F]{1,2}\1[0-9A-F]{1,2}\1[0-9A-F]{1,2}\1[0-9A-F]{1,2}\1[0-9A-F]{1,2}$:

And this is how I am using it:

matched = re.match(macExpression, macAddress)

I am trying to match mac addresses in the following formats
0:a0:c9:ee:b2:c0, 0-a0-c9-ee-b2-c0 & 0.a0.c9.ee.b2.c0 etc.

Four problems: (1) your pattern has 5 occurrences of [0-9A-F] but your
data has 6; (2) your pattern has uppercase hex digits but your data has
lowercase; (3) you need to double some backslashes or (preferably) use
the r"..." notation; (4) your pattern is missing the trailing " -- it
helps if you cut and paste when posting rather than re-typing stuff.

And one superfluity: the "^" at the start is redundant, since re.match()
only ever matches at the beginning of the string.
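
Points (2) and (3) can be checked in isolation. The following is only a sketch, written in Python 3 syntax; the names plain, raw, broken, fixed and sample are made up for illustration, and the pattern is shortened to three fields to keep the effect visible. In an ordinary string literal "\1" is an octal escape for the single character chr(1), so the regex engine never sees a back reference; a raw string keeps the backslash.

```python
import re

# In a normal string literal, "\1" is an octal escape: one character, chr(1).
plain = "\1"
# In a raw string the backslash survives, so re sees a back reference.
raw = r"\1"

print(len(plain), len(raw))   # 1 2
print(plain == chr(1))        # True

# Shortened three-field patterns (illustrative only, not the full MAC pattern).
broken = "^[0-9A-F]{1,2}([:.-])[0-9A-F]{1,2}\1[0-9A-F]{1,2}$"
fixed = r"^[0-9A-F]{1,2}([:.-])[0-9A-F]{1,2}\1[0-9A-F]{1,2}$"

sample = "0:a0:c9"
print(re.match(broken, sample, re.I))        # None: chr(1) is not in the sample
print(re.match(fixed, sample))               # None: class is uppercase-only
print(bool(re.match(fixed, sample, re.I)))   # True
```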

The following appears to work:

>>> import re
>>> macExpression = r"[0-9A-F]{1,2}(\:|\.|\-)([0-9A-F]{1,2}\1){4,4}[0-9A-F]{1,2}$"
>>> for macAddr in ["0:a0:c9:ee:b2:c0", "0-a0-c9-ee-b2-c0",
...                 "0.a0.c9.ee.b2.c0", "0:a0-c9:ee:b2:c0"]:
...     print re.match(macExpression, macAddr, re.I)
...
<_sre.SRE_Match object at 0x007C8818>
<_sre.SRE_Match object at 0x007C8818>
<_sre.SRE_Match object at 0x007C8818>
None
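
For readers on Python 3, the same check might be wrapped up roughly as follows. This is a sketch that reuses the pattern from the post unchanged; the names MAC_RE and is_mac are hypothetical and do not appear anywhere in the thread.

```python
import re

# The pattern from the post, compiled once. re.IGNORECASE lets the
# uppercase character classes accept lowercase hex digits as well.
MAC_RE = re.compile(
    r"[0-9A-F]{1,2}(\:|\.|\-)([0-9A-F]{1,2}\1){4,4}[0-9A-F]{1,2}$",
    re.IGNORECASE,
)

def is_mac(address):
    """Return True for six hex fields joined by one consistent separator."""
    # match() anchors at the start of the string, so no leading "^" is needed.
    return MAC_RE.match(address) is not None

for addr in ["0:a0:c9:ee:b2:c0", "0-a0-c9-ee-b2-c0",
             "0.a0.c9.ee.b2.c0", "0:a0-c9:ee:b2:c0"]:
    print(addr, is_mac(addr))   # True, True, True, then False
```

The last address mixes ":" and "-", so the back reference \1 fails to match and the address is rejected.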
 
Clay Shirky

Here is my pattern:

macExpression = "^[0-9A-F]{1,2}(\:|\.|\-)[0-9A-F]{1,2}\1[0-9A-F]{1,2}\1[0-9A-F]{1,2}\1[0-9A-F]{1,2}\1[0-9A-F]{1,2}$:

good lord, that looks like perl.

that sort of thing is miserable to write and miserable to maintain. it
makes more sense to treat MAC addresses as numbers than strings (and
saves you the horror of upper/lower case and "is it 0 or 00?" issues
as well)

use the re module to figure out what to split on, then convert
everything to numeric comparisons. here's an example, more readable
than the macExpression above:

# Python 2 syntax (print statement, long())
import re

orig_list = [0, 160, 201, 238, 178, 192]  # test MAC as numbers

new_addresses = ["00:30:65:01:dc:9f",  # various formats...
                 "00-03-93-52-0c-c6",
                 "00.A0.C9.EE.B2.C0"]

for new_address in new_addresses:

    test_list = []

    # use regexes to see what to split on
    if re.search(":", new_address):
        new_list = new_address.split(":")
    elif re.search("-", new_address):
        new_list = new_address.split("-")
    elif re.search(r"\.", new_address):  # escaped: a bare "." matches any character
        new_list = new_address.split(".")

    # convert alphanumeric hex strings to numbers
    # via a long() cast, in base 16
    for two_byte in new_list:
        test_list.append(long(two_byte, 16))  # make a test list

    if test_list == orig_list:  # check for numeric matches
        print new_address, "matches..."
    else:
        print new_address, "doesn't match..."
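
The point about case and "is it 0 or 00?" can be seen in a few lines. A small sketch in Python 3 syntax, where int() replaces long(); the helper name to_numbers is hypothetical and assumes the separator is already known.

```python
# Hex fields compare equal as integers regardless of case or zero padding.
print(int("0", 16) == int("00", 16))    # True
print(int("A0", 16) == int("a0", 16))   # True

def to_numbers(address, sep):
    """Split on a known separator and convert each hex field to an int."""
    return [int(field, 16) for field in address.split(sep)]

print(to_numbers("00.A0.C9.EE.B2.C0", "."))   # [0, 160, 201, 238, 178, 192]

# Two spellings of the same address compare equal once they are numbers.
print(to_numbers("0:a0:c9:ee:b2:c0", ":") == to_numbers("00-A0-C9-EE-B2-C0", "-"))  # True
```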
 
Andrew Dalke

Clay Shirky
    # use regexes to see what to split on
    if re.search(":", new_address):

or use
    if ":" in new_address:

    elif re.search("-", new_address):
        new_list = new_address.split("-")
    elif re.search(r"\.", new_address):
        new_list = new_address.split(".")

and include a
    else:
        raise Exception("I have no idea what you're asking for")

and maybe some ValueError catching in the int call.

Andrew
 
John Machin

Clay Shirky

or use
    if ":" in new_address:

and include a
    else:
        raise Exception("I have no idea what you're asking for")

and maybe some ValueError catching in the int call.

instead maybe something like

    new_list = []
    for sep in '-.:':
        if sep in new_address:
            new_list = new_address.split(sep)
            break
    if len(new_list) != 6:
        raise .......

plus also a test that each octet is in range(256) ....
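
Put together, the suggestions above (separator loop, field count, ValueError, range test) might look something like this. It is a sketch in Python 3 syntax; the function name parse_mac and the choice of ValueError for every failure are assumptions, since the post leaves the raise unspecified.

```python
def parse_mac(address):
    """Return the six octets of a MAC address string as ints.

    Raises ValueError (an assumed choice) if the string does not split
    into six hex fields on a single separator, or if a field is not in
    range(256).
    """
    fields = []
    for sep in "-.:":
        if sep in address:
            fields = address.split(sep)
            break
    if len(fields) != 6:
        raise ValueError("expected 6 fields in %r" % (address,))
    # int() itself raises ValueError for text that is not valid hex.
    octets = [int(field, 16) for field in fields]
    for octet in octets:
        if octet not in range(256):
            raise ValueError("octet out of range in %r" % (address,))
    return octets

print(parse_mac("0:a0:c9:ee:b2:c0"))   # [0, 160, 201, 238, 178, 192]
```

An address with mixed separators such as "0:a0-c9:ee:b2:c0" is split on "-" only, yields two fields, and is rejected by the length check.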
 
Peter Abel

To match even the silliest format of MAC address, the
following works for me:

>>> import re
>>> silly_adrs = ['00-30:65:01.dC:9f', '00:03-93.52.0C-c6', '00-A0:C9.eE:b2.C0']
>>> def mac_adr(any_adr):
...     return '.'.join(map(lambda x: str(int(x, 16)),
...                         re.split(r'[\:\-\.]', any_adr)))
...
>>> for adress in silly_adrs:
...     print mac_adr(adress)
...
0.48.101.1.220.159
0.3.147.82.12.198
0.160.201.238.178.192
Regards
Peter
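
The same idea in Python 3 syntax, as a sketch: a generator expression stands in for map and lambda, and the function and variable names are kept from the post.

```python
import re

def mac_adr(any_adr):
    """Split on ':', '-' or '.' and join the fields as decimal numbers."""
    fields = re.split(r"[:.\-]", any_adr)
    return ".".join(str(int(field, 16)) for field in fields)

for adress in ["00-30:65:01.dC:9f", "00:03-93.52.0C-c6", "00-A0:C9.eE:b2.C0"]:
    print(mac_adr(adress))
```

Unlike the back reference pattern, this accepts mixed separators on purpose and does not check the number of fields.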
 
Matthew

macExpression = r"[0-9A-F]{1,2}(\:|\.|\-)([0-9A-F]{1,2}\1){4,4}[0-9A-F]{1,2}$"


Thanks very much, that will work a lot better. I am going to save the
other code you (and the others) offered; it may be useful later on in
the program.

Thanks
-matthew
 
