Using target words from arrays in regex: Python's version of Perl's 'map'

Lance Hoffmeyer · May 15, 2006

Hey all, in Perl I was able to use the same regular expression multiple times, changing one part of it
via a previously defined array, and put the results into a new array.

IN PERL:

my @targets = ('OVERALL RATING',
               'CLOTHING',
               'ITEMS',
               'ACCESSORIES',
               'SHOES',
               'FINE JEWELRY');

my @JA13 = map {
    $file2 =~ /$_.*?(?:(\d{1,3}\.\d)\s+){3}/s;
} @targets;

So, in python instead of

match2 = re.search('OVERALL RATING.*?(?:(\d{1,3}\.\d)\s+){3} ', file2);m01 = match2.group(1) ;print m01
match2 = re.search('CLOTHING.*?(?:(\d{1,3}\.\d)\s+){3} ', file2); m02 = match2.group(1) ;print m02
match2 = re.search('ITEMS.*?(?:(\d{1,3}\.\d)\s+){3} ', file2); m03 = match2.group(1) ;print m03
match2 = re.search('ACCESSORIES.*?(?:(\d{1,3}\.\d)\s+){3} ', file2); m04 = match2.group(1) ;print m04
match2 = re.search('SHOES.*?(?:(\d{1,3}\.\d)\s+){3} ', file2); m05 = match2.group(1) ;print m05
match2 = re.search('FINE JEWELRY.*?(?:(\d{1,3}\.\d)\s+){3} ', file2); m06 = match2.group(1) ;print m06

I would have something similar to perl above:

targets = ['OVERALL RATING',
           'CLOTHING',
           'ITEMS',
           'ACCESSORIES',
           'SHOES',
           'FINE JEWELRY']

PROPOSED CODE:

match2 = re.search(targets.*?(?:(\d{1,3}\.\d)\s+){3} ', file2);m = match2.group(1)

Lance

Dennis Lee Bieber · May 16, 2006

Lance Hoffmeyer said:
I would have something similar to perl above:

targets = ['OVERALL RATING',
           'CLOTHING',
           'ITEMS',
           'ACCESSORIES',
           'SHOES',
           'FINE JEWELRY']

PROPOSED CODE:

match2 = re.search(targets.*?(?:(\d{1,3}\.\d)\s+){3} ', file2);m = match2.group(1)

I don't do regex's, and I also don't find packing multiple
statements on one line attractive.

However... Why don't you basically do what you say you do in
Python... Substitute your targets into the expression while inside a
loop...

import re

targets = [ "OVERALL RATING",
            "CLOTHING",
            "ITEMS",
            "ACCESSORIES",
            "SHOES",
            "FINE JEWELRY" ]

results = []
for t in targets:
    # substitute the current target into the shared pattern
    m2 = re.search(r"%s.*?(?:(\d{1,3}\.\d)\s+){3}" % t,
                   file2)
    results.append(m2.group(1))

Paddy · May 16, 2006

I don't like string interpolation within REs; it pops me out of 'RE
mode' as I scan the line.

Maybe creating a dict of match objects could be used in the larger
context:
dict( [(t, re.search(t+regexp_tail, file2)) for t in targets] )

(untested).
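
A minimal sketch of the dict idea above, filled out so it runs. The value of regexp_tail and the sample contents of file2 are assumptions made for illustration (the thread never shows the real data), and re.escape and re.S are additions: the first guards against targets containing regex metacharacters, the second mirrors the /s flag on the Perl pattern.

```python
import re

# Hypothetical stand-ins: the real targets list is longer and the real
# contents of file2 are not shown in the thread.
targets = ["OVERALL RATING", "CLOTHING", "SHOES"]
file2 = "OVERALL RATING 4.5 3.2 88.1\nCLOTHING 1.0 2.0 3.0\n"

# Shared tail of the pattern; only the leading target changes.
regexp_tail = r".*?(?:(\d{1,3}\.\d)\s+){3}"

# One match object (or None) per target.
matches = dict(
    (t, re.search(re.escape(t) + regexp_tail, file2, re.S)) for t in targets
)

# A repeated capture group keeps only its last repetition, so group(1)
# is the third number after the target.
values = dict((t, m.group(1)) for t, m in matches.items() if m is not None)
print(values)
```

Keeping the match objects rather than the extracted strings means a target that is absent shows up as None instead of raising an AttributeError on .group(1).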

- Pad.

John Machin · May 16, 2006

Think about how well the above solutions scale as len(targets)
increases.

1. Make "targets" a set, not a list.
2. Have *ONE* regex which includes a bracketed match for a generic
target e.g. ([A-Z\s]+)
3. Do *ONE* regex search, not 1 per target
4. If successful, check if the bracketed gizmoid is in the set of
targets.

It's not 100% apparent whether it is possible that there can be more
than one target in the inappropriately named file2 (it is a string,
isn't it?). If so, then write your own findall-like loop wrapped
around steps 2 & 3 above. Compile the regex in advance.
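
The numbered steps above could be sketched roughly as follows. This assumes each label sits directly in front of its three numbers, which is stricter than the lazy `.*?` in the original pattern; the sample text and the names row_re and sample are invented for illustration.

```python
import re

# Step 1: targets as a set, so membership tests are cheap.
targets = {"OVERALL RATING", "CLOTHING", "ITEMS",
           "ACCESSORIES", "SHOES", "FINE JEWELRY"}

# Step 2: one precompiled regex with a bracketed generic label.
# Assumption: labels are upper-case words separated by single spaces.
row_re = re.compile(r"([A-Z][A-Z ]*[A-Z])\s+(?:(\d{1,3}\.\d)\s+){3}")

# Hypothetical sample data; HANDBAGS is deliberately not a target.
sample = (
    "OVERALL RATING  4.5  3.2  88.1\n"
    "CLOTHING  1.0  2.0  3.0\n"
    "HANDBAGS 9.9 9.9 9.9\n"
    "SHOES 10.0 20.0 30.5\n"
)

# Steps 3 and 4: a single pass over the text, keeping only labels
# that are in the target set. group(2) is the last of the three numbers.
results = {}
for m in row_re.finditer(sample):
    label = m.group(1)
    if label in targets:
        results[label] = m.group(2)

print(results)
```

The text is scanned once regardless of how many targets there are, which is the scaling point being made.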

HTH,
John

John Machin · May 16, 2006

Would you believe "steps 3 & 4"?

Paul McGuire · May 16, 2006

Dennis Lee Bieber said:
I would have something similar to perl above:

targets = ['OVERALL RATING',
           'CLOTHING',
           'ITEMS',
           'ACCESSORIES',
           'SHOES',
           'FINE JEWELRY']

PROPOSED CODE:

match2 = re.search(targets.*?(?:(\d{1,3}\.\d)\s+){3} ', file2);m = match2.group(1)

I don't do regex's, and I also don't find packing multiple statements on
one line attractive.

I concur - this kind of multiple-statements-per-line-ishness feels
gratuitous. Yes, I know they line up nicely when all 6 statements are
printed out together, but the repetition of "mNN = match2.group(1) ;print
mNN" should tell you that this might be better done with a for loop. DRY.

However... Why don't you basically do what you say you do in
Python... Substitute your targets into the expression while inside a
loop...

targets = [ "OVERALL RATING",
            "CLOTHING",
            "ITEMS",
            "ACCESSORIES",
            "SHOES",
            "FINE JEWELRY" ]

results = []
for t in targets:
    m2 = re.search(r"%s.*?(?:(\d{1,3}\.\d)\s+){3}" % t, file2)
    results.append(m2.group(1))

# by contrast, here is a reasonably Pythonic one-liner, if one-liner it must be
results = [ re.search(r"%s.*?(?:(\d{1,3}\.\d)\s+){3}" % t, file2).group(1)
            for t in targets ]

# or for improved readability (sometimes 2 lines are better than 1):
reSearchFunc = lambda tt,ff : re.search(tt + r".*?(?:(\d{1,3}\.\d)\s+){3}",
                                        ff).group(1)
results = [ reSearchFunc(t,file2) for t in targets ]
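
Both forms above call .group(1) directly, so a target that is missing from the text raises AttributeError. A hedged variant with a named helper that returns None instead might look like this; last_rating, TAIL and the sample string are hypothetical names and data, and re.escape and re.S are additions not present in the posts above.

```python
import re

# Shared tail of the pattern from the thread.
TAIL = r".*?(?:(\d{1,3}\.\d)\s+){3}"

def last_rating(target, text):
    """Return the third number following target, or None if nothing matches."""
    m = re.search(re.escape(target) + TAIL, text, re.S)
    return m.group(1) if m else None

# Hypothetical sample: SHOES is present, ITEMS is not.
sample = "SHOES 10.0 20.0 30.5\n"
results = [last_rating(t, sample) for t in ["SHOES", "ITEMS"]]
print(results)
```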

Resisting-the-urge-to-plug-pyparsing-ly yours,
-- Paul

Edward Elliott · May 16, 2006

John said:
Would you believe "steps 3 & 4"?

How about "two pops and a pass?"

Quick! Lower the cone of silence!
