a list/re problem

Ed Keith · Dec 11, 2009

I have a problem and I am trying to find a solution to it that is both
efficient and elegant.

I have a list call it 'l':

l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']

Notice that some of the items in the list start and end with an '*'. I wish to construct a new list, call it 'n' which is all the members of l that start and end with '*', with the '*'s removed.

So in the case above n would be ['nbh', 'jkjsdfjasd']

the following works:

import re

r = re.compile(r'\*(.+)\*')

def f(s):
    m = r.match(s)
    if m:
        return m.group(1)
    else:
        return ''

n = [f(x) for x in l if r.match(x)]

But it is inefficient, because it is matching the regex twice for each item, and it is a bit ugly.

I could use:

n = []
for x in l:
    m = r.match(x)
    if m:
        n.append(m.group(1))

It is more efficient, but much uglier.

Does anyone have a better solution?

Thanks,

-EdK

Ed Keith
(e-mail address removed)

Blog: edkeith.blogspot.com

Grant Edwards · Dec 11, 2009

I have a problem and I am trying to find a solution to it that is both
efficient and elegant.

I have a list call it 'l':

l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']

Notice that some of the items in the list start and end with
an '*'. I wish to construct a new list, call it 'n' which is
all the members of l that start and end with '*', with the
'*'s removed.

So in the case above n would be ['nbh', 'jkjsdfjasd']

[s[1:-1] for s in l if (s[0] == s[-1] == '*')]

Neil Cerutti · Dec 11, 2009

[s[1:-1] for s in l if (s[0] == s[-1] == '*')]

That last bit doesn't work right, does it, since an == expression
evaluates to True or False, not the true or false value itself?

Peter Otten · Dec 11, 2009

Ed said:
I have a problem and I am trying to find a solution to it that is both
efficient and elegant.

I have a list call it 'l':

l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']

Notice that some of the items in the list start and end with an '*'. I
wish to construct a new list, call it 'n' which is all the members of l
that start and end with '*', with the '*'s removed.

So in the case above n would be ['nbh', 'jkjsdfjasd']

the following works:

r = re.compile('\*(.+)\*')

def f(s):
    m = r.match(s)
    if m:
        return m.group(1)
    else:
        return ''

n = [f(x) for x in l if r.match(x)]

But it is inefficient, because it is matching the regex twice for each
item, and it is a bit ugly.

I could use:

n = []
for x in l:
    m = r.match(x)
    if m:
        n.append(m.group(1))

It is more efficient, but much uglier.

It's efficient and easy to understand; maybe you have to readjust your
taste.

Does anyone have a better solution?

In this case an approach based on string slicing is probably best. When the
regular expression gets more complex you can use a nested generator
expression:

items = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']
match = re.compile(r"\*(.+)\*").match
[m.group(1) for m in (match(s) for s in items) if m is not None]

['nbh', 'jkjsdfjasd']

Peter

Grant Edwards · Dec 11, 2009

[s[1:-1] for s in l if (s[0] == s[-1] == '*')]

That last bit doesn't work right, does it, since an == expression
evaluates to True or False, not the true or false value itself?

It works for me. Doesn't it work for you?

From the fine manual (section 5.9. Comparisons):

Comparisons can be chained arbitrarily, e.g., x < y <= z is
equivalent to x < y and y <= z, except that y is evaluated
only once (but in both cases z is not evaluated at all when x
< y is found to be false).
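
A quick way to convince yourself of the chaining rule quoted above is to compare the chained test with its spelled-out form. This is only a minimal sketch; the helper name is_starred is made up for illustration and assumes every string is non-empty:

```python
def is_starred(s):
    # Chained form: behaves like (s[0] == s[-1]) and (s[-1] == '*'),
    # except that s[-1] is evaluated only once.
    return s[0] == s[-1] == '*'

l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']

chained = [s[1:-1] for s in l if is_starred(s)]
explicit = [s[1:-1] for s in l if s[0] == s[-1] and s[-1] == '*']

print(chained == explicit)
```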

Matt Nordhoff · Dec 12, 2009

Grant said:
I have a problem and I am trying to find a solution to it that is both
efficient and elegant.

I have a list call it 'l':

l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']

Notice that some of the items in the list start and end with
an '*'. I wish to construct a new list, call it 'n' which is
all the members of l that start and end with '*', with the
'*'s removed.

So in the case above n would be ['nbh', 'jkjsdfjasd']

[s[1:-1] for s in l if (s[0] == s[-1] == '*')]

s[0] and s[-1] raise an IndexError if l contains an empty string.

Better something like:

[s[1:-1] for s in l if (s[:1] == s[-1:] == '*')]

Or just the slightly more verbose startswith/endswith version.
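
To make the difference concrete, here is a small sketch of both spellings run against a list that includes an empty string and a lone '*'. The function names and the extra test strings are assumptions for illustration, and the length check in the second function is an added guard, not something proposed in the thread:

```python
items = ['asc', '*nbh*', '', '*', '**', '*jkjsdfjasd*']

def starred_slice(items):
    # Slices never raise IndexError, so '' is simply skipped.
    # Note that a lone '*' passes this test and yields ''.
    return [s[1:-1] for s in items if s[:1] == s[-1:] == '*']

def starred_methods(items):
    # The startswith/endswith spelling. The len() check is an added
    # assumption so that a lone '*' does not count as both ends.
    return [s[1:-1] for s in items
            if len(s) >= 2 and s.startswith('*') and s.endswith('*')]

print(starred_slice(items))
print(starred_methods(items))
```

Whether a lone '*' should produce an empty result or be skipped depends on the data; the original regex, with its (.+) group, would skip both '*' and '**'.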

Steven D'Aprano · Dec 12, 2009

I have a problem and I am trying to find a solution to it that is both
efficient and elegant.

I have a list call it 'l':

l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']

Notice that some of the items in the list start and end with an '*'. I
wish to construct a new list, call it 'n' which is all the members of l
that start and end with '*', with the '*'s removed.

So in the case above n would be ['nbh', 'jkjsdfjasd']

the following works:

r = re.compile('\*(.+)\*')

[snip]

Others have suggested using a list comp. Just to be different, here's a
version using filter and map.

l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']
l = map(
    lambda s: s[1:-1] if s.startswith('*') and s.endswith('*') else '', l)
l = filter(None, l)
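
One caveat if you try this on Python 3: map and filter return lazy iterators there rather than lists, so the result has to be wrapped in list() to get the same thing back. A minimal sketch, assuming Python 3 and using the names stripped and n for illustration:

```python
l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']

# Replace non-matching items with '' so that filter can drop them.
stripped = map(
    lambda s: s[1:-1] if s.startswith('*') and s.endswith('*') else '', l)

# filter(None, ...) keeps only truthy items, i.e. drops the '' placeholders.
n = list(filter(None, stripped))
print(n)
```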

Lie Ryan · Dec 12, 2009

import re
r = re.compile('\*(.+)\*')

def f(s):
    m = r.match(s)
    if m:
        return m.group(1)

l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']

n = [y for y in (f(x) for x in l) if y]

Lie Ryan · Dec 12, 2009

But it is inefficient, because it is matching the regex twice for each
item, and it is a bit ugly.

I could use:

n = []
for x in l:
    m = r.match(x)
    if m:
        n.append(m.group(1))

It is more efficient, but much uglier.

It's efficient and easy to understand; maybe you have to readjust your
taste.

I agree, it's easy to understand, but it's also ugly because of the
level of indentation (which is too deep for such a simple problem).

(sorry to ramble around)

A few months ago, I suggested an improvement on the python-ideas list to
add a post-filter to list comprehensions, something along these lines:

a = [f(x) as F for x in l if c(F)]

where the evaluation of f(x) will be the value of F so F can be used in
the if-expression as a post-filter (complementing list-comps' pre-filter).

Many doubted its usefulness since they say it's easy to wrap in another
list-comp:
a = [y for y in (f(x) for x in l) if c(y)]
or with a map and filter
a = filter(None, map(f, l))

Up till now, I don't really like the alternatives.
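
For readers on newer interpreters: Python 3.8 and later have assignment expressions (PEP 572), which come close to the post-filter idea without a second comprehension. A minimal sketch, assuming Python 3.8+ and reusing the regex and list from the original post:

```python
import re

r = re.compile(r'\*(.+)\*')
l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']

# The condition runs first for each x and binds the match object to m;
# the element expression then reuses m, so the regex is matched only once.
n = [m.group(1) for x in l if (m := r.match(x))]
print(n)
```

Note that the binding happens in the filter rather than in the output expression, so it is not quite the "f(x) as F" syntax sketched above, but it serves the same purpose here.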

Nobody · Dec 12, 2009

the following works:

r = re.compile('\*(.+)\*')

def f(s):
    m = r.match(s)
    if m:
        return m.group(1)
    else:
        return ''

n = [f(x) for x in l if r.match(x)]

But it is inefficient, because it is matching the regex twice for each
item, and it is a bit ugly.

Does anyone have a better solution?

Use a language with *real* list comprehensions?

Flamebait aside, you can use another level of comprehension, i.e.:

n = [m.group(1) for m in (r.match(x) for x in l) if m]

Neil Cerutti · Dec 14, 2009

[s[1:-1] for s in l if (s[0] == s[-1] == '*')]

That last bit doesn't work right, does it, since an == expression
evaluates to True or False, not the true or false value itself?

It works for me. Doesn't it work for you?

From the fine manual (section 5.9. Comparisons):

Comparisons can be chained arbitrarily, e.g., x < y <= z is
equivalent to x < y and y <= z, except that y is evaluated
only once (but in both cases z is not evaluated at all when x
< y is found to be false).

I did not know that. Thanks, Grant.

Aahz · Dec 29, 2009

I have a list call it 'l':

l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']

Notice that some of the items in the list start and end with an '*'. I
wish to construct a new list, call it 'n' which is all the members of l
that start and end with '*', with the '*'s removed.

What kind of guarantee do you have that the asterisk will only exist on
the first and last character, if at all?

Steven D'Aprano · Dec 29, 2009

Ed said:

I have a list call it 'l':

l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']

Notice that some of the items in the list start and end with an '*'. I
wish to construct a new list, call it 'n' which is all the members of l
that start and end with '*', with the '*'s removed.

What kind of guarantee do you have that the asterisk will only exist on
the first and last character, if at all?

Does it matter?

In any case, surely the simplest solution is to eschew regular
expressions and do it the easy way.

result = [s[1:-1] for s in l if s.startswith('*') and s.endswith('*')]

For a more general solution, I'd use a pair of helper functions:

def bracketed_by(s, prefix, suffix=None):
    if suffix is None:
        suffix = prefix
    return s.startswith(prefix) and s.endswith(suffix)

def strip_brackets(s, prefix, suffix=None):
    if suffix is None:
        suffix = prefix
    return s[len(prefix):-len(suffix)]

Note that I haven't tested these two helper functions. The second in
particular may not work correctly in some corner cases (e.g. passing the
empty string as suffix).
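
The empty-suffix corner case mentioned above comes from s[n:-0] being the same as s[n:0], which is always empty. One possible way around it, as a sketch only, is to compute the end index from len(s); the length check in bracketed_by is an extra assumption added here so that the prefix and suffix cannot overlap:

```python
def bracketed_by(s, prefix, suffix=None):
    if suffix is None:
        suffix = prefix
    # Require room for both markers so they cannot overlap (e.g. s == '*').
    return (len(s) >= len(prefix) + len(suffix)
            and s.startswith(prefix) and s.endswith(suffix))

def strip_brackets(s, prefix, suffix=None):
    if suffix is None:
        suffix = prefix
    # An explicit end index avoids the s[n:-0] == '' trap for an empty suffix.
    return s[len(prefix):len(s) - len(suffix)]

l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']
result = [strip_brackets(s, '*') for s in l if bracketed_by(s, '*')]
print(result)
```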
