reusing parts of a string in RE matches?

John Salerno

Ben said:
Yes, and no extra for loops are needed! You can define groups inside
the lookahead assertion:
['aba', 'aba', 'aba', 'aba', 'aba', 'aba', 'aba']

Wow, that was like magic! :)
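
Ben's actual pattern and input string are not preserved in the quote above, so what follows is only a sketch of the technique he describes. The input `text` and the names `overlapping` and `plain` are made up for illustration. A lookahead consumes no characters, so a capturing group placed inside it can report matches that overlap each other:

```python
import re

# Hypothetical input: 'ab' repeated seven times plus a trailing 'a'.
text = 'ab' * 7 + 'a'

# The lookahead (?=...) is zero-width, so the scan advances one
# character at a time; the group inside it still captures 'aba'.
overlapping = re.findall(r'(?=(aba))', text)
print(overlapping)

# Without the lookahead, each match consumes its three characters,
# so occurrences that overlap a previous match are skipped.
plain = re.findall(r'aba', text)
print(plain)
```

The first call finds every occurrence, while the second finds only the non-overlapping ones.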
 
Mirco Wahab

Hi John
rg = r'(\w)(?=(.)\1)'

That would at least isolate the number, although you'd still have to get
it out of the list/tuple.

I have no idea how to do this
in Python in a terse way - but
I'll try ;-)

In Perl, it's easy. Here, the
"match construct" (\w)(?=(.)\1)
returns all captures in a
list (a 1 a 2 a 4 b 7 c 9)
because we capture 2 fields per
comparison:
The first is (\w), needed for
backreference, the second is
the dot (.), which finds the
number in the center (or any-
thing else).

So, in Perl you 'filter' by the
'grep' function on each list
element: grep{ /\d/ } - this
means, only numbers (\d) will
pass through:

$_ = 'a1a2a3Aa4a35a6b7b8c9c';
print grep{/\d/} /(\w)(?=(.)\1)/g;
#prints => 1 2 4 7 9

I'll try to work something out
that works in Python too ...

Regards

M.
 
John Salerno

Mirco said:
I have no idea how to do this
in Python in a terse way - but
I'll try ;-)

In Perl, it's easy. Here, the
"match construct" (\w)(?=(.)\1)
returns all captures in a
list (a 1 a 2 a 4 b 7 c 9)

Ah, I see the difference. In Python you get a list of tuples, so there
seems to be a little extra work to do to get the number out.
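
To make the difference concrete, here is a minimal sketch of what `re.findall` returns for the pattern and test string used in this thread. The names `pairs` and `digits` are made up for the illustration, and the second step is only a rough analogue of the Perl `grep{/\d/}` filter, not a line-for-line translation:

```python
import re

t = 'a1a2a3Aa4a35a6b7b8c9c'

# With two capturing groups, findall returns one 2-tuple per match:
# (group 1, group 2).
pairs = re.findall(r'(\w)(?=(.)\1)', t)
print(pairs)

# Rough analogue of Perl's grep{/\d/}: flatten the tuples and keep
# only the items that contain a digit.
digits = [item for pair in pairs for item in pair if re.search(r'\d', item)]
print(digits)
```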
 
Mirco Wahab

Hi John
> Ah, I see the difference. In Python you get a list of tuples, so there
> seems to be a little extra work to do to get the number out.

Dohh, after two cups of coffee
and several bars of chocolate
I eventually mad(e) it ;-)

In Python, you have to deconstruct
the 2D-lists (here: long list of
short lists [a,2] ...) by
'slicing the slice':

char,num = list[:][:]

in a loop and using the appropriate element then:

import re

t = 'a1a2a3Aa4a35a6b7b8c9c'
r = r'(\w)(?=(.)\1)'
l = re.findall(r, t)

for a, b in (l[:][:]): print(b)
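
A note on the `l[:][:]` expression, sketched with the same pattern and string: a full slice of a list is just a shallow copy, so slicing twice yields another copy with the same tuples in it. The splitting into two names is done by the `for` statement itself. The names `copy`, `from_copy`, and `from_list` are made up for this illustration:

```python
import re

t = 'a1a2a3Aa4a35a6b7b8c9c'
l = re.findall(r'(\w)(?=(.)\1)', t)

# l[:] copies the list; l[:][:] copies that copy.
copy = l[:][:]
print(copy == l)   # same contents
print(copy is l)   # but a different list object

# Looping over the copy or over the list itself gives the same values.
from_copy = [b for a, b in copy]
from_list = [b for a, b in l]
print(from_copy == from_list)
```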

At the moment, I find this syntax
awkward and arbitrary, but my mind
should change once I've adapted
to it in the end ;-)

Regards,

M.
 
Mirco Wahab

Hi John
> Ah, I see the difference. In Python you get a list of
> tuples, so there seems to be a little extra work to do
> to get the number out.

Dohh, after two cups of coffee
and several bars of chocolate
I eventually mad(e) it ;-)

In Python, you have to deconstruct
the 2D-lists (here: long list of
short lists [a,2] ...) by
'slicing the slice':

char,num = list[:][:]

in a loop and using the appropriate element then:

import re

t = 'a1a2a3Aa4a35a6b7b8c9c'
r = r'(\w)(?=(.)\1)'
l = re.findall(r, t)

for a, b in l: print(b)

(l should implicitly be decoded
sequentially as l[:][->a : ->b]
in the loop context.)
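
A small sketch of what the `for a, b in l` loop does, using the same pattern and string: on each iteration the `for` statement takes one tuple from the list and unpacks it into the two names. The names `pair`, `unpacked`, and `indexed` are made up for this illustration:

```python
import re

t = 'a1a2a3Aa4a35a6b7b8c9c'
l = re.findall(r'(\w)(?=(.)\1)', t)

# Tuple unpacking in the loop header ...
unpacked = []
for a, b in l:
    unpacked.append(b)

# ... is shorthand for taking each tuple and indexing into it.
indexed = []
for pair in l:
    indexed.append(pair[1])

print(unpacked == indexed)
```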

At the moment, I find this syntax
somewhat hard to remember, but my
mind should change once I've adapted
to it in the end ;-)

Regards,

M.
 
Fredrik Lundh

Mirco said:
In Python, you have to deconstruct
the 2D-lists (here: long list of
short lists [a,2] ...) by
'slicing the slice':

char,num = list[:][:]

in a loop and using the appropriate element then:

import re

t = 'a1a2a3Aa4a35a6b7b8c9c'
r = r'(\w)(?=(.)\1)'
l = re.findall(r, t)

for a, b in (l[:][:]): print(b)

At the moment, I find this syntax
awkward and arbitrary, but my mind
should change once I've adapted
to it in the end ;-)

in contemporary Python, this is best done by a list comprehension:

l = [m[1] for m in re.findall(r, t)]

or, depending on what you want to do with the result, a generator
expression:

g = (m[1] for m in re.findall(r, t))

or

process(m[1] for m in re.findall(r, t))
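
As a hedged illustration of that last form: `process` above is a placeholder, so this sketch substitutes the built-in `sum` for it and converts each captured digit to an integer first. The name `total` is made up. The generator expression is passed straight to the function without building an intermediate list of digits:

```python
import re

t = 'a1a2a3Aa4a35a6b7b8c9c'
r = r'(\w)(?=(.)\1)'

# sum() stands in for the hypothetical process(); it consumes the
# generator expression one item at a time.
total = sum(int(m[1]) for m in re.findall(r, t))
print(total)
```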

if you want to avoid creating the tuples, you can use finditer instead:

l = [m.group(2) for m in re.finditer(r, t)]
g = (m.group(2) for m in re.finditer(r, t))
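
A short sketch comparing the two approaches on the thread's test string. Note the numbering: `findall` yields plain tuples indexed from 0, so the second group is `m[1]`, while match objects number groups from 1, so the same value is `m.group(2)`. The names `from_findall`, `from_finditer`, and `starts` are made up for this illustration:

```python
import re

t = 'a1a2a3Aa4a35a6b7b8c9c'
r = r'(\w)(?=(.)\1)'

# Tuples from findall: index 1 is the second capturing group.
from_findall = [m[1] for m in re.findall(r, t)]

# Match objects from finditer: group(2) is the second capturing group.
from_finditer = [m.group(2) for m in re.finditer(r, t)]
print(from_findall == from_finditer)

# Match objects also carry extra information, such as the position
# where each match begins.
starts = [m.start() for m in re.finditer(r, t)]
print(starts)
```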

finditer is also a good tool to use if you need to do more things with
each match:

for m in re.finditer(r, t):
    s = m.group(2)
    # ... process s in some way ...

the code body will be executed every time the RE engine finds a match,
which can be useful if you're working on large target strings, and only
want to process the first few matches.

for m in re.finditer(r, t):
    s = m.group(2)
    if s == something:
        break
    # ... process s in some way ...
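
A runnable sketch of the early-exit loop, with assumptions filled in for the placeholders: `stop_at` is a made-up stand-in for `something`, and "processing" here just means collecting the values in a list called `seen`:

```python
import re

t = 'a1a2a3Aa4a35a6b7b8c9c'
r = r'(\w)(?=(.)\1)'

stop_at = '4'   # hypothetical stand-in for "something"
seen = []       # hypothetical stand-in for "process s in some way"

for m in re.finditer(r, t):
    s = m.group(2)
    if s == stop_at:
        break   # matches after this point are never searched for
    seen.append(s)

print(seen)
```

Because `finditer` produces matches lazily, the rest of the string is not scanned once the loop breaks.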

</F>
 
Mirco Wahab

Hi Fredrik

you brought up some terse and
somehow expressive lines with
their own beauty ...
[this] is best done by a list comprehension:
l = [m[1] for m in re.findall(r, t)]

or, [...] a generator expression:
g = (m[1] for m in re.findall(r, t))

or
process(m[1] for m in re.findall(r, t))

... avoid creating the tuples, ... finditer instead:
l = [m.group(2) for m in re.finditer(r, t)]
g = (m.group(2) for m in re.finditer(r, t))

finditer is also a good tool to use
for m in re.finditer(r, t):
    s = m.group(2)
    # ... process s in some way ...

.... which made me wish to internalize such wisdom too ;-)

This looks almost beautiful, it made me stand up and
go to some large book stores in order to grab a good
book on python.

Sadly, there were none (except one small 'dictionary',
ISBN: 3826615123). I live in a fairly large city
in Germany w/three large bookstores in the center,
where one can get loads of PHP and Java books, lots
of C/C++ and the like - even some Ruby books (some
"Rails" too) on display (WTF).

Not that I wouldn't order books (I do that all the
time for 'original versions') but it makes one
sad-faced to see the small impact of the Python
language here today on bookstore turnover ...

Thanks & regards

Mirco
 
