Using target words from arrays in a regex: Python's version of Perl's 'map'


Lance Hoffmeyer

Hey all, in Perl I was able to use the same regular expression multiple times, changing one part of it
via a previously defined array, and put the results into a new array.

IN PERL:

my @targets = ('OVERALL RATING',
               'CLOTHING',
               'ITEMS',
               'ACCESSORIES',
               'SHOES',
               'FINE JEWELRY');

my @JA13 = map {
$file2 =~/$_.*?(?:(\d{1,3}\.\d)\s+){3}/s;
} @targets;



So, in python instead of

match2 = re.search('OVERALL RATING.*?(?:(\d{1,3}\.\d)\s+){3} ', file2);m01 = match2.group(1) ;print m01
match2 = re.search('CLOTHING.*?(?:(\d{1,3}\.\d)\s+){3} ', file2); m02 = match2.group(1) ;print m02
match2 = re.search('ITEMS.*?(?:(\d{1,3}\.\d)\s+){3} ', file2); m03 = match2.group(1) ;print m03
match2 = re.search('ACCESSORIES.*?(?:(\d{1,3}\.\d)\s+){3} ', file2); m04 = match2.group(1) ;print m04
match2 = re.search('SHOES.*?(?:(\d{1,3}\.\d)\s+){3} ', file2); m05 = match2.group(1) ;print m05
match2 = re.search('FINE JEWELRY.*?(?:(\d{1,3}\.\d)\s+){3} ', file2); m06 = match2.group(1) ;print m06


I would have something similar to perl above:


targets = ['OVERALL RATING',
           'CLOTHING', 'ITEMS',
           'ACCESSORIES',
           'SHOES',
           'FINE JEWELRY']

PROPOSED CODE:

match2 = re.search(targets.*?(?:(\d{1,3}\.\d)\s+){3} ', file2);m = match2.group(1)




Lance
 

Dennis Lee Bieber

Lance Hoffmeyer said:
I would have something similar to perl above:


targets = ['OVERALL RATING',
'CLOTHING', 'ITEMS',
'ACCESSORIES',
'SHOES',
'FINE JEWELRY']

PROPOSED CODE:

match2 = re.search(targets.*?(?:(\d{1,3}\.\d)\s+){3} ', file2);m = match2.group(1)

I don't do regexes, and I also don't find packing multiple
statements on one line attractive.

However... why don't you basically do in Python what you say you do
in Perl? Substitute your targets into the expression while inside a
loop...

import re

targets = [ "OVERALL RATING",
            "CLOTHING",
            "ITEMS",
            "ACCESSORIES",
            "SHOES",
            "FINE JEWELRY" ]

results = []
for t in targets:
    # substitute each target into the front of the shared pattern
    m2 = re.search(r"%s.*?(?:(\d{1,3}\.\d)\s+){3}" % t,
                   file2)
    results.append(m2.group(1))
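
One detail worth checking when porting the Perl: its pattern ends in /s, which lets "." match newlines, and Python only does that when re.DOTALL is passed. The sketch below is a minimal illustration of the difference. The contents of file2 and the shortened target list are made up for the example, and TAIL is just a hypothetical name for the shared part of the pattern.

```python
import re

# Hypothetical sample standing in for the report text held in file2;
# the SHOES figures sit on the line after the label.
file2 = (
    "OVERALL RATING   4.5  3.2  4.1\n"
    "CLOTHING         3.9  4.0  3.8\n"
    "SHOES\n"
    "                 2.5  2.7  3.0\n"
)
targets = ["OVERALL RATING", "CLOTHING", "SHOES"]

# Shared tail of the pattern; group 1 ends up holding the last of the
# three numbers, because a repeated group keeps only its final capture.
TAIL = r".*?(?:(\d{1,3}\.\d)\s+){3}"

# Without a flag, "." stops at a newline, so SHOES finds nothing.
no_flag = re.search("SHOES" + TAIL, file2)

# re.DOTALL lets "." cross newlines, like /s in the Perl original.
results = []
for t in targets:
    m = re.search(t + TAIL, file2, re.DOTALL)
    results.append(m.group(1))

print(no_flag, results)
```

Whether the flag matters depends on whether a label and its numbers can ever be separated by a line break in the real data.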
--
Wulfraed Dennis Lee Bieber KD6MOG
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/
 

Paddy

I don't like string interpolation within REs, it pops me out of 'RE
mode' as I scan the line.

Maybe creating a dict of matchobjects could be used in the larger
context?:
dict( [(t, re.search(t+regexp_tail, file2)) for t in targets] )

(untested).
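
A rough, self-contained version of that idea might look like the following. The sample text in file2 is invented, regexp_tail is assumed to be the fixed part of the pattern from the original post, and re.DOTALL is added on the assumption that the Perl /s behaviour is wanted. Keeping the match objects means a missing target shows up as None instead of raising an exception.

```python
import re

# Hypothetical sample text standing in for file2.
file2 = "CLOTHING  3.9  4.0  3.8\nSHOES  2.5  2.7  3.0\n"
targets = ["CLOTHING", "SHOES", "FINE JEWELRY"]

# The fixed part of the pattern, kept apart from the targets so the
# regex can be read without any string interpolation in the way.
regexp_tail = r".*?(?:(\d{1,3}\.\d)\s+){3}"

# Map each target to its match object (None when nothing matched).
matches = dict((t, re.search(t + regexp_tail, file2, re.DOTALL))
               for t in targets)

# Pull the captured number out only where a match exists.
values = dict((t, m.group(1)) for t, m in matches.items() if m)
print(values)
```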

- Pad.
 

John Machin

Think about how well the above solutions scale as len(targets)
increases.

1. Make "targets" a set, not a list.
2. Have *ONE* regex which includes a bracketed match for a generic
target e.g. ([A-Z\s]+)
3. Do *ONE* regex search, not 1 per target
4. If successful, check if the bracketed gizmoid is in the set of
targets.

It's not 100% apparent whether it is possible that there can be more
than one target in the inappropriately named file2 (it is a string,
isn't it?). If so, then write your own findall-like loop wrapped
around steps 2 & 3 above. Compile the regex in advance.
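
The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the sample text is invented, row_re is a hypothetical name, and the pattern assumes each label is upper-case and is followed directly by whitespace and its three numbers, which is stricter than the ".*?" in the original pattern. re.finditer stands in for the hand-written findall-like loop.

```python
import re

# Hypothetical sample text standing in for file2.
file2 = (
    "OVERALL RATING   4.5  3.2  4.1\n"
    "CLOTHING         3.9  4.0  3.8\n"
    "HOUSEWARES       1.0  1.1  1.2\n"
    "FINE JEWELRY     2.0  2.1  2.2\n"
)

# Step 1: a set gives constant-time membership tests.
targets = {"OVERALL RATING", "CLOTHING", "FINE JEWELRY"}

# Step 2: one precompiled pattern. Group 1 is a generic upper-case
# label, group 2 is the last of the three numbers that follow it.
row_re = re.compile(r"([A-Z][A-Z ]*[A-Z])\s+(?:(\d{1,3}\.\d)\s+){3}")

# Steps 3 and 4: a single pass over the text, keeping only wanted labels.
results = {}
for m in row_re.finditer(file2):
    label = m.group(1)
    if label in targets:
        results[label] = m.group(2)

print(results)
```

The text is scanned once regardless of how many targets there are, which is the scaling point being made.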

HTH,
John
 

Paul McGuire

Dennis Lee Bieber said:
I would have something similar to perl above:


targets = ['OVERALL RATING',
'CLOTHING', 'ITEMS',
'ACCESSORIES',
'SHOES',
'FINE JEWELRY']

PROPOSED CODE:

match2 = re.search(targets.*?(?:(\d{1,3}\.\d)\s+){3} ', file2);m = match2.group(1)

I don't do regex's, and I also don't find packing multiple statements on
one line attractive.

I concur - this kind of multiple-statements-per-line-ishness feels
gratuitous. Yes, I know they line up nicely when all 6 statements are
printed out together, but the repetition of "mNN = match2.group(1) ;print
mNN" should tell you that this might be better done with a for loop. DRY.

Dennis Lee Bieber said:
However... why don't you basically do in Python what you say you do
in Perl? Substitute your targets into the expression while inside a
loop...

targets = [ "OVERALL RATING",
            "CLOTHING",
            "ITEMS",
            "ACCESSORIES",
            "SHOES",
            "FINE JEWELRY" ]

results = []
for t in targets:
    m2 = re.search(r"%s.*?(?:(\d{1,3}\.\d)\s+){3}" % t, file2)
    results.append(m2.group(1))


# by contrast, here is a reasonably Pythonic one-liner, if one-liner it must be
results = [ re.search(r"%s.*?(?:(\d{1,3}\.\d)\s+){3}" % t, file2).group(1)
            for t in targets ]

# or for improved readability (sometimes 2 lines are better than 1):
reSearchFunc = lambda tt,ff : re.search(tt + r".*?(?:(\d{1,3}\.\d)\s+){3}",
                                        ff).group(1)
results = [ reSearchFunc(t,file2) for t in targets ]
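
Both forms above call .group(1) directly on the result of re.search, so a target that is absent from the text raises AttributeError. One possible variation, shown here as a sketch only, swaps the lambda for a small named function that returns None in that case. The function name last_number, the sample text, and the use of re.escape and re.DOTALL are all assumptions and not part of the original code.

```python
import re

TAIL = r".*?(?:(\d{1,3}\.\d)\s+){3}"

def last_number(target, text):
    """Return the last of the three numbers after target, or None."""
    # re.escape keeps any regex metacharacters in a target literal.
    m = re.search(re.escape(target) + TAIL, text, re.DOTALL)
    return m.group(1) if m else None

# Hypothetical sample text standing in for file2.
file2 = "CLOTHING  3.9  4.0  3.8\nSHOES  2.5  2.7  3.0\n"
targets = ["CLOTHING", "SHOES", "FINE JEWELRY"]

results = [last_number(t, file2) for t in targets]
print(results)
```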


Resisting-the-urge-to-plug-pyparsing-ly yours,
-- Paul
 
