a list/re problem


Ed Keith

I have a problem and I am trying to find a solution to it that is both
efficient and elegant.

I have a list, call it 'l':

l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']

Notice that some of the items in the list start and end with an '*'. I wish to construct a new list, call it 'n', which contains all the members of l that start and end with '*', with the '*'s removed.

So in the case above n would be ['nbh', 'jkjsdfjasd']

The following works:

import re

r = re.compile(r'\*(.+)\*')

def f(s):
    m = r.match(s)
    if m:
        return m.group(1)
    else:
        return ''

n = [f(x) for x in l if r.match(x)]

But it is inefficient, because it matches the regex twice for each item, and it is a bit ugly.

I could use:


n = []
for x in l:
    m = r.match(x)
    if m:
        n.append(m.group(1))


It is more efficient, but much uglier.

Does anyone have a better solution?

Thanks,

-EdK


Ed Keith

Blog: edkeith.blogspot.com
 

Grant Edwards

I have a problem and I am trying to find a solution to it that is both
efficient and elegant.

I have a list, call it 'l':

l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']
Notice that some of the items in the list start and end with
an '*'. I wish to construct a new list, call it 'n', which
contains all the members of l that start and end with '*',
with the '*'s removed.

So in the case above n would be ['nbh', 'jkjsdfjasd']

[s[1:-1] for s in l if (s[0] == s[-1] == '*')]
 

Neil Cerutti

[s[1:-1] for s in l if (s[0] == s[-1] == '*')]

That last bit doesn't work right, does it, since an == expression
evaluates to True or False, not the true or false value itself?
 

Peter Otten

Ed said:
I have a problem and I am trying to find a solution to it that is both
efficient and elegant.

I have a list, call it 'l':

l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']

Notice that some of the items in the list start and end with an '*'. I
wish to construct a new list, call it 'n', which contains all the members
of l that start and end with '*', with the '*'s removed.

So in the case above n would be ['nbh', 'jkjsdfjasd']

The following works:

r = re.compile(r'\*(.+)\*')

def f(s):
    m = r.match(s)
    if m:
        return m.group(1)
    else:
        return ''

n = [f(x) for x in l if r.match(x)]



But it is inefficient, because it matches the regex twice for each
item, and it is a bit ugly.

I could use:


n = []
for x in l:
    m = r.match(x)
    if m:
        n.append(m.group(1))


It is more efficient, but much uglier.

It's efficient and easy to understand; maybe you have to readjust your
taste.
Does anyone have a better solution?

In this case an approach based on string slicing is probably best. When the
regular expression gets more complex, you can use a nested generator
expression:
>>> import re
>>> items = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']
>>> match = re.compile(r"\*(.+)\*").match
>>> [m.group(1) for m in (match(s) for s in items) if m is not None]
['nbh', 'jkjsdfjasd']

Peter
 

Grant Edwards

[s[1:-1] for s in l if (s[0] == s[-1] == '*')]

That last bit doesn't work right, does it, since an == expression
evaluates to True or False, not the true or false value itself?

It works for me. Doesn't it work for you?

From the fine manual (section 5.9. Comparisons):

Comparisons can be chained arbitrarily, e.g., x < y <= z is
equivalent to x < y and y <= z, except that y is evaluated
only once (but in both cases z is not evaluated at all when
x < y is found to be false).
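A quick check of that rule as it applies to Grant's filter. The helper names `starred` and `starred_explicit` are made up for this sketch; the second spells out the `and` expansion the manual describes, and the final assertion shows why the parenthesized form Neil had in mind would never match anything.

```python
def starred(s):
    # Grant's test. Chained, so it means s[0] == s[-1] and s[-1] == '*'.
    # Like Grant's version, it raises IndexError for '' (see Matt's reply).
    return s[0] == s[-1] == '*'


def starred_explicit(s):
    # The expansion given in the manual excerpt above.
    return s[0] == s[-1] and s[-1] == '*'


samples = ['asc', '*nbh*', '*x', 'x*', '*jkjsdfjasd*']

for s in samples:
    assert starred(s) == starred_explicit(s)

# The parenthesized form compares a bool with '*', so it is always False.
assert not any((s[0] == s[-1]) == '*' for s in samples)

print([s[1:-1] for s in samples if starred(s)])  # ['nbh', 'jkjsdfjasd']
```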
 

Matt Nordhoff

Grant said:
I have a problem and I am trying to find a solution to it that is both
efficient and elegant.

I have a list, call it 'l':

l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']
Notice that some of the items in the list start and end with
an '*'. I wish to construct a new list, call it 'n', which
contains all the members of l that start and end with '*',
with the '*'s removed.

So in the case above n would be ['nbh', 'jkjsdfjasd']

[s[1:-1] for s in l if (s[0] == s[-1] == '*')]

s[0] and s[-1] raise an IndexError if l contains an empty string.

Better would be something like:
[s[1:-1] for s in l if (s[:1] == s[-1:] == '*')]

Or just the slightly more verbose startswith/endswith version.
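A small comparison on edge cases that are not in Ed's list (the empty string, a lone '*', and '**'), assuming such strings can occur in real data. The slice test and the startswith/endswith test accept exactly the same strings, but both differ from Ed's regex, which requires at least one character between the asterisks:

```python
import re

items = ['asc', '*nbh*', '', '*', '**', '*jkjsdfjasd*']

# Matt's slice-based test: s[:1] and s[-1:] are '' for an empty string,
# so no IndexError is raised.
by_slice = [s[1:-1] for s in items if s[:1] == s[-1:] == '*']

# The startswith/endswith spelling accepts the same strings.
by_methods = [s[1:-1] for s in items if s.startswith('*') and s.endswith('*')]

# Ed's regex: '.+' needs at least one character between the stars.
r = re.compile(r'\*(.+)\*')
by_regex = [m.group(1) for m in (r.match(s) for s in items) if m]

print(by_slice)    # ['nbh', '', '', 'jkjsdfjasd']
print(by_methods)  # ['nbh', '', '', 'jkjsdfjasd']
print(by_regex)    # ['nbh', 'jkjsdfjasd']
```

Which behaviour is right for '*' and '**' depends on what the data means; the original question doesn't say.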
 

Steven D'Aprano

I have a problem and I am trying to find a solution to it that is both
efficient and elegant.

I have a list, call it 'l':

l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']

Notice that some of the items in the list start and end with an '*'. I
wish to construct a new list, call it 'n', which contains all the members
of l that start and end with '*', with the '*'s removed.

So in the case above n would be ['nbh', 'jkjsdfjasd']

The following works:

r = re.compile(r'\*(.+)\*')
[snip]


Others have suggested using a list comp. Just to be different, here's a
version using filter and map.

l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']
l = map(
    lambda s: s[1:-1] if s.startswith('*') and s.endswith('*') else '', l)
l = list(filter(None, l))  # list() needed in Python 3, where filter is lazy
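One behavioural difference from the list-comprehension versions, sketched below with a hypothetical input that contains '**': `filter(None, ...)` drops every empty string, including the legitimate '' produced by stripping '**'. The helper name `strip_or_empty` is made up here; it is the same conditional expression as the lambda.

```python
l = ['asc', '*nbh*', '**', 'rewr']


def strip_or_empty(s):
    # Same conditional expression as the lambda in the map/filter version.
    return s[1:-1] if s.startswith('*') and s.endswith('*') else ''


# filter(None, ...) removes every falsy result, so '**' -> '' is lost.
via_filter = list(filter(None, map(strip_or_empty, l)))

# The list comprehension filters on the test itself, so '**' -> '' is kept.
via_listcomp = [s[1:-1] for s in l if s.startswith('*') and s.endswith('*')]

print(via_filter)    # ['nbh']
print(via_listcomp)  # ['nbh', '']
```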
 

Lie Ryan

import re
r = re.compile(r'\*(.+)\*')

def f(s):
    m = r.match(s)
    if m:
        return m.group(1)
    # falls through and returns None when there is no match

l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']

n = [y for y in (f(x) for x in l) if y]
 

Lie Ryan

But it is inefficient, because it matches the regex twice for each
item, and it is a bit ugly.

I could use:


n = []
for x in l:
    m = r.match(x)
    if m:
        n.append(m.group(1))


It is more efficient, but much uglier.

It's efficient and easy to understand; maybe you have to readjust your
taste.

I agree, it's easy to understand, but it's also ugly because of the
level of indentation (which is too deep for such a simple problem).

(sorry to ramble around)

A few months ago, I suggested an improvement on the python-ideas list to
add a post-filter to list comprehensions, something along these lines:

a = [f(x) as F for x in l if c(F)]

where f(x) is evaluated and its value bound to F, so F can be used in
the if-expression as a post-filter (complementing the list comprehension's
pre-filter).

Many doubted its usefulness, saying it's easy to wrap in another
list comprehension:
a = [y for y in (f(x) for x in l) if c(y)]
or with map and filter:
a = filter(None, map(f, l))

So far, I still don't really like the alternatives.
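For what it's worth, Python 3.8 later added assignment expressions (PEP 572), which cover much of what this proposal asked for: the match can be bound inside the `if` clause and reused in the output expression. A minimal sketch, assuming Python 3.8 or newer:

```python
import re

r = re.compile(r'\*(.+)\*')
l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']

# Requires Python 3.8+. r.match runs once per item; its result is bound to
# m by the walrus operator and then reused in the output expression.
n = [m.group(1) for x in l if (m := r.match(x))]

print(n)  # ['nbh', 'jkjsdfjasd']
```

The `if` clause is evaluated before the output expression for each item, so `m` is always bound when `m.group(1)` runs. Note that `m` also leaks into the enclosing scope after the comprehension finishes.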
 

Nobody

The following works:

r = re.compile(r'\*(.+)\*')

def f(s):
    m = r.match(s)
    if m:
        return m.group(1)
    else:
        return ''

n = [f(x) for x in l if r.match(x)]



But it is inefficient, because it matches the regex twice for each
item, and it is a bit ugly.
Does anyone have a better solution?

Use a language with *real* list comprehensions?

Flamebait aside, you can use another level of comprehension, i.e.:

n = [m.group(1) for m in (r.match(x) for x in l) if m]
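To check the efficiency concern directly, here is a rough `timeit` sketch comparing the double-match comprehension with the nested generator expression. The list is Ed's example repeated 1000 times and the repeat count is 20, both arbitrary choices; actual timings depend on the machine and Python version, so none are claimed here.

```python
import re
import timeit

r = re.compile(r'\*(.+)\*')
l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr'] * 1000


def double_match():
    # Same shape as Ed's first version: r.match runs in the filter and
    # again to extract the group.
    return [r.match(x).group(1) for x in l if r.match(x)]


def nested_genexp():
    # One r.match per item; the outer comprehension skips the Nones.
    return [m.group(1) for m in (r.match(x) for x in l) if m]


assert double_match() == nested_genexp()

for fn in (double_match, nested_genexp):
    print(fn.__name__, timeit.timeit(fn, number=20))
```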
 

Neil Cerutti

[s[1:-1] for s in l if (s[0] == s[-1] == '*')]

That last bit doesn't work right, does it, since an == expression
evaluates to True or False, not the true or false value itself?

It works for me. Doesn't it work for you?

From the fine manual (section 5.9. Comparisons):

Comparisons can be chained arbitrarily, e.g., x < y <= z is
equivalent to x < y and y <= z, except that y is evaluated
only once (but in both cases z is not evaluated at all when
x < y is found to be false).

I did not know that. Thanks, Grant.
 

Aahz

I have a list, call it 'l':

l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']

Notice that some of the items in the list start and end with an '*'. I
wish to construct a new list, call it 'n', which contains all the members
of l that start and end with '*', with the '*'s removed.

What kind of guarantee do you have that the asterisk will only exist on
the first and last character, if at all?
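This question matters more for the regex than for slicing. `re.match` anchors only at the start of the string, so Ed's pattern accepts '*ab*cd' even though it does not end with '*'; `re.fullmatch` (Python 3.4+) must consume the whole string. Interior asterisks are kept either way, because the greedy `.+` backtracks just far enough to find the last '*'. A sketch with hypothetical test strings:

```python
import re

r = re.compile(r'\*(.+)\*')

# Hypothetical strings with asterisks in other positions.
samples = ['*a*b*', '*ab*cd', 'ab*cd*']

results = {}
for s in samples:
    m = r.match(s)         # anchored at the start only
    full = r.fullmatch(s)  # must match the entire string (Python 3.4+)
    sliced = s[1:-1] if s.startswith('*') and s.endswith('*') else None
    results[s] = (m and m.group(1), full and full.group(1), sliced)
    print(repr(s), results[s])
```

With `match`, '*ab*cd' yields 'ab', while `fullmatch` and the slicing test both reject it; '*a*b*' gives 'a*b' in all three columns.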
 

Steven D'Aprano

Ed said:
I have a list, call it 'l':

l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']

Notice that some of the items in the list start and end with an '*'. I
wish to construct a new list, call it 'n', which contains all the members
of l that start and end with '*', with the '*'s removed.

What kind of guarantee do you have that the asterisk will only exist on
the first and last character, if at all?

Does it matter?



In any case, surely the simplest solution is to eschew regular
expressions and do it the easy way.


result = [s[1:-1] for s in l if s.startswith('*') and s.endswith('*')]


For a more general solution, I'd use a pair of helper functions:

def bracketed_by(s, prefix, suffix=None):
    if suffix is None:
        suffix = prefix
    return s.startswith(prefix) and s.endswith(suffix)

def strip_brackets(s, prefix, suffix=None):
    if suffix is None:
        suffix = prefix
    return s[len(prefix):-len(suffix)]


Note that I haven't tested these two helper functions. The second in
particular may not work correctly in some corner cases (e.g. passing the
empty string as suffix).
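A minimal sketch of one fix for the empty-suffix case: `s[a:-0]` is the same as `s[a:0]`, which is always empty, so computing the end index as `len(s) - len(suffix)` avoids the problem. Only `strip_brackets` changes; `bracketed_by` is repeated so the block runs on its own. It makes no attempt at other corner cases; for example, when the prefix and suffix overlap in a short string such as '*', it simply returns ''.

```python
def bracketed_by(s, prefix, suffix=None):
    # Same as the helper above.
    if suffix is None:
        suffix = prefix
    return s.startswith(prefix) and s.endswith(suffix)


def strip_brackets(s, prefix, suffix=None):
    if suffix is None:
        suffix = prefix
    # Explicit end index instead of -len(suffix): when suffix is '',
    # s[len(prefix):-0] would be s[len(prefix):0], which is always ''.
    return s[len(prefix):len(s) - len(suffix)]


l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']
result = [strip_brackets(s, '*') for s in l if bracketed_by(s, '*')]

print(result)                          # ['nbh', 'jkjsdfjasd']
print(strip_brackets('*ab', '*', ''))  # ab
```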
 
