Extract all words that begin with x

Jimbo

Hello

I am trying to find out whether there is a string or list function that will
search a list of strings for all the strings that start with 'a' and
return a new list containing them.

I have searched the Python site and could not find what I am
looking for. Does a function like this exist?

The only approach that I think could work is to use the string
method .count()

Algorithm: to extract all the words inside a list that start with 'a'
- make sure the list is arranged in alphabetical order
- convert the list to a big string
- use list_string.count(',a') to obtain the index where the last 'a'
word occurs
- convert back into a list (yes this is a REAL hack :p)
- and then create a new list, i.e., new_list = list[0:32]
- & return new_list

OK, that algorithm is terrible, I know, and I just realised that it won't
work for letters after 'a'. So if anyone could suggest a function or
algorithm, it would be extremely helpful.
 
James Mills

Have I missed something, or wouldn't this work just as well:
>>> list_of_strings = ['2', 'awes', '3465sdg', 'dbsdf', 'asdgas']
>>> [word for word in list_of_strings if word[0] == 'a']
['awes', 'asdgas']

I would do this for completeness (just in case):
[word for word in list_of_strings if word and word[0] == 'a']

Just guards against empty strings which may or may not be in the list.

--James
 
Aahz

James Mills wrote:
I would do this for completeness (just in case):
[word for word in list_of_strings if word and word[0] == 'a']

Just guards against empty strings which may or may not be in the list.

No need to do that with startswith():
>>> ''.startswith('a')
False

You would only need to use your code if you suspected that some elements
might be None.
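
To illustrate the None case, here is a minimal sketch. The `items` list is made up for the example, and the explicit `is not None` test is just one way to write the guard:

```python
# Hypothetical sample data: one empty string and one None entry.
items = ['awes', '', 'dbsdf', None, 'asdgas']

# The empty string needs no guard with startswith(), but None has no
# startswith() method, so None entries must be filtered out first.
matches = [s for s in items if s is not None and s.startswith('a')]
print(matches)  # ['awes', 'asdgas']
```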
 
Terry Reedy

James Mills wrote:
I would do this for completeness (just in case):
[word for word in list_of_strings if word and word[0] == 'a']

Just guards against empty strings which may or may not be in the list.

... word[0:1] does the same thing. All Python programmers should learn
to use slicing to extract a char from a string that might be empty.
The method call of .startswith() will be slower, I am sure.
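
A short sketch of the slicing behaviour described here. The sample list is the one from earlier in the thread with an empty string added, which is an assumption made only to exercise the edge case:

```python
# Sample list from the thread, plus an empty string (added for illustration).
list_of_strings = ['2', 'awes', '', '3465sdg', 'dbsdf', 'asdgas']

# Indexing an empty string raises IndexError.
try:
    ''[0]
except IndexError:
    index_raised = True
else:
    index_raised = False

# Slicing never raises; on an empty string it simply yields ''.
first = ''[0:1]

# So the slice can be used in the filter without a separate emptiness check.
result = [word for word in list_of_strings if word[0:1] == 'a']
print(result)  # ['awes', 'asdgas']
```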
 
Tycho Andersen

Terry Reedy wrote:
... word[0:1] does the same thing. All Python programmers should learn to
use slicing to extract a char from a string that might be empty.
The method call of .startswith() will be slower, I am sure.

Why? Isn't slicing just sugar for a method call?
 
Aahz

Terry Reedy wrote:
The method call of .startswith() will be slower, I am sure.

And if it is slower, so what? Using startswith() makes for faster
reading of the code for me, and I'm sure I'm not the only one.
 
python

Terry,
... word[0:1] does the same thing. All Python programmers should learn to use slicing to extract a char from a string that might be empty.

Is there an equivalent way to slice the last char from a string (similar
to an .endswith) that doesn't raise an exception when a string is empty?

Thanks,
Malcolm
 
Jerry Hill

Is there an equivalent way to slice the last char from a string (similar
to an .endswith) that doesn't raise an exception when a string is empty?

If you use negative indexes in the slice, they refer to items from the
end of the sequence instead of the front. So slicing the last
character from the string would be:

word[-1:]
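
As a quick sketch of how that behaves on an empty string (the helper name `last_char` and the sample words are made up for illustration):

```python
def last_char(word):
    """Return the last character of word, or '' if word is empty."""
    return word[-1:]

print(repr(last_char('asdgas')))  # 's'
print(repr(last_char('')))        # ''

# The same slice works as an endswith-style filter without raising.
words = ['awes', '', 'dbsdf', 'asdgas']
ends_with_s = [w for w in words if w[-1:] == 's']
print(ends_with_s)  # ['awes', 'asdgas']
```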
 
python

Jerry,
If you use negative indexes in the slice, they refer to items from the end of the sequence instead of the front. So slicing the last character from the string would be:

word[-1:]

Perfect! Thank you,

Malcolm
 
jim-on-linux

python help,

I'm open for suggestions.

I'm using py2exe to compile a working program.

The program runs and prints fine until I compile it with py2exe.

After compiling the program, it runs fine until it tries to import
the win32ui module, v2.6214.0.

Then, I get a windows error message:

ImportError: Dll load failed:
This application has failed to start because
the application configuration is incorrect.
Reinstalling the application may fix this problem.


Has anyone else had this problem?

jim-on linux
 
Bryan

Terry said:
... word[0:1] does the same thing. All Python programmers should learn to
use slicing to extract a char from a string that might be empty.
The method call of .startswith() will be slower, I am sure.

Tycho said:
Why? Isn't slicing just sugar for a method call?

Yes, but with slice syntax, finding the method doesn't require looking it
up by name at run-time, and startswith() is built to work for prefixes of
any length.

Let's timeit:

# -----
from timeit import Timer
from random import choice
from string import ascii_lowercase as letters

# 5000 random five-letter lowercase strings
strs = [''.join([choice(letters) for _ in range(5)])
        for _ in range(5000)]

way1 = "[s for s in strs if s.startswith('a')]"
way2 = "[s for s in strs if s[:1] == 'a']"

# Both conditions must select the same strings
assert eval(way1) == eval(way2)

for way in [way1, way2]:
    t = Timer(way, 'from __main__ import strs')
    print(way, ' took: ', t.timeit(1000))

# -----

On my particular box, I get:

[s for s in strs if s.startswith('a')] took: 5.43566498797
[s for s in strs if s[:1] == 'a'] took: 3.20704924968

So Terry Reedy was right: startswith() is slower. I would,
nevertheless, use startswith(). Later, if users want my program to run
faster and my profiling shows a lot of the run-time is spent finding
words that start with 'a', I might switch.
 
Terry Reedy

Thank you for that timing report.

My main point is that there are two ways to fetch a char, the difference
being the error return -- exception IndexError versus error value ''.
This is an example of out-of-band versus in-band error/exception
signaling, which programmers, especially of Python, should understand.

The fact that, in Python, syntax tends to be faster than calls was
secondary, though good to know on occasion.

.startswith and .endswith are methods that wrap the special cases of
slice at an end and compare to one value. They are not necessary, and
save no keystrokes, but Guido obviously thought they added enough to
more than balance the slight expansion of the language. They were added
after I learned Python and I thought the tradeoff to be a toss-up, but I
will consider using the methods when writing didactic code meant to be
read by others.

Terry Jan Reedy
 
Aahz

Terry Reedy wrote:
.startswith and .endswith are methods that wrap the special cases of
slice at an end and compare to one value. They are not necessary, and
save no keystrokes, but Guido obviously thought they added enough to
more than balance the slight expansion of the language. They were added
after I learned Python and I thought the tradeoff to be a toss-up, but I
will consider using the methods when writing didactic code meant to be
read by others.

They were also added after I learned Python, and I think they're great!
Using them signals that you're doing simple string checking rather than
some more arcane slicing.
 
Bryan

Terry said:
Thank you for that timing report.

Enjoyed doing it, and more on that below.

Terry said:
My main point is that there are two ways to fetch a char, the difference
being the error return -- exception IndexError versus error value ''.
This is an example of out-of-band versus in-band error/exception
signaling, which programmers, especially of Python, should understand.

Sure. I think your posts and the bits of Python code they contain are
great. Slicing is amazingly useful, and it helps that slices don't
barf just because 'start' or 'stop' falls outside the index range.

My main point was to show off how Python and its standard library make
answering the which-is-faster question easy. I think that's another
thing Python programmers should understand, even though I just learned
how today.

Now back to the arguably-interesting issue of speed in the particular
problem here: 'Superpollo' had suggested another variant, which I
appended to my timeit targets, resulting in:

[s for s in strs if s.startswith('a')] took: 5.68393977159
[s for s in strs if s[:1] == 'a'] took: 3.31676491502
[s for s in strs if s and s[0] == 'a'] took: 2.29392950076

Superpollo's condition -- s and s[0] == 'a' -- is the fastest of the
three.

What's more, in my timeit script the strings in the list are all of
length 5, so the 'and' never gets to short-circuit. If a major portion
of the strings are in fact empty, superpollo's condition should do even
better. But I didn't test and time that. Yet.
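
For anyone who wants to try that case, here is a rough sketch of how the test could be set up. The 50% share of empty strings and the repeat count of 100 are arbitrary assumptions, and no timings are claimed here since they depend on the machine and Python version:

```python
from random import choice, random
from string import ascii_lowercase as letters
from timeit import Timer

# Assumption: about half of the strings are empty (the 0.5 is arbitrary).
strs = [''.join(choice(letters) for _ in range(5)) if random() < 0.5 else ''
        for _ in range(5000)]

ways = [
    "[s for s in strs if s.startswith('a')]",
    "[s for s in strs if s[:1] == 'a']",
    "[s for s in strs if s and s[0] == 'a']",
]

# Evaluate and time each statement against an explicit namespace.
namespace = {'strs': strs}

# All three conditions must select the same strings.
results = [eval(way, namespace) for way in ways]
assert results[0] == results[1] == results[2]

for way in ways:
    t = Timer(way, globals=namespace)
    print(way, ' took: ', t.timeit(100))
```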

-Bryan Olson


# ----- timeit code -----

from random import choice
from string import ascii_lowercase as letters
from timeit import Timer

# 5000 random five-letter lowercase strings
strs = [''.join([choice(letters) for _ in range(5)])
        for _ in range(5000)]

way1 = "[s for s in strs if s.startswith('a')]"
way2 = "[s for s in strs if s[:1] == 'a']"
way3 = "[s for s in strs if s and s[0] == 'a']"

# All three conditions must select the same strings
assert eval(way1) == eval(way2) == eval(way3)

for way in [way1, way2, way3]:
    t = Timer(way, 'from __main__ import strs')
    print(way, ' took: ', t.timeit(1000))
 
Stefan Behnel

Bryan, 12.05.2010 08:55:
Now back to the arguably-interesting issue of speed in the particular
problem here: 'Superpollo' had suggested another variant, which I
appended to my timeit targets, resulting in:

[s for s in strs if s.startswith('a')] took: 5.68393977159
[s for s in strs if s[:1] == 'a'] took: 3.31676491502
[s for s in strs if s and s[0] == 'a'] took: 2.29392950076

Superpollo's condition -- s and s[0] == 'a' -- is the fastest of the
three.

Just out of curiosity, I ran the same code in the latest Cython pre-0.13
and added some optimised Cython implementations. Here's the code:

def cython_way0(l):
    return [ s for s in l if s.startswith(u'a') ]

def cython_way1(list l):
    cdef unicode s
    return [ s for s in l if s.startswith(u'a') ]

def cython_way2(list l):
    cdef unicode s
    return [ s for s in l if s[:1] == u'a' ]

def cython_way3(list l):
    cdef unicode s
    return [ s for s in l if s[0] == u'a' ]

def cython_way4(list l):
    cdef unicode s
    return [ s for s in l if s and s[0] == u'a' ]

def cython_way5(list l):
    cdef unicode s
    return [ s for s in l if (<Py_UNICODE>s[0]) == u'a' ]

def cython_way6(list l):
    cdef unicode s
    return [ s for s in l if s and (<Py_UNICODE>s[0]) == u'a' ]


And here are the numbers (plain Python 2.6.5 first):

[s for s in strs if s.startswith(u'a')] took: 1.04618620872
[s for s in strs if s[:1] == u'a'] took: 0.518909931183
[s for s in strs if s and s[0] == u'a'] took: 0.617404937744

cython_way0(strs) took: 0.769457817078
cython_way1(strs) took: 0.0861849784851
cython_way2(strs) took: 0.208586931229
cython_way3(strs) took: 0.18615603447
cython_way4(strs) took: 0.190477132797
cython_way5(strs) took: 0.0366449356079
cython_way6(strs) took: 0.0368368625641

Personally, I think the cast to Py_UNICODE in the last two implementations
shouldn't be required; that should happen automatically, so that way3/4
run as fast as way5/6. I'll add that when I get to it.

Note that unicode.startswith() is optimised in Cython, so it's a pretty
fast option, too. Also note that the best speed-up here is only a factor of
14, so plain Python is quite competitive, unless the list is huge and this
is really a bottleneck in an application.

Stefan
 
Stefan Behnel

superpollo, 11.05.2010 17:03:
Aahz wrote:
And if it is slower, so what? Using startswith() makes for faster
reading of the code for me, and I'm sure I'm not the only one.

also, what if the OP intended "words that begin with x" with x a string
(as opposed to a single character)?

word[:len(x)] == x

will work in that case.
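
A small sketch of that, next to the equivalent startswith() call. The prefix 'aw' and the extra words in the list are made up for illustration:

```python
# Sample list from the thread, extended with a few hypothetical words.
list_of_strings = ['2', 'awes', 'awful', '3465sdg', 'dbsdf', 'asdgas', 'a']
x = 'aw'  # hypothetical multi-character prefix

# Slice as many characters as the prefix has, then compare.
by_slice = [word for word in list_of_strings if word[:len(x)] == x]

# str.startswith() accepts a prefix of any length and gives the same result.
by_method = [word for word in list_of_strings if word.startswith(x)]

print(by_slice)  # ['awes', 'awful']
```

Words shorter than the prefix (such as 'a' here) are simply not matched, since the slice comes back shorter than x.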

Stefan
 
Aahz

Stefan Behnel wrote:
superpollo, 11.05.2010 17:03:
also, what if the OP intended "words that begin with x" with x a string
(as opposed to a single character)?

word[:len(x)] == x

will work in that case.

But that's now going to be slower. ;-) (Unless one makes the obvious
optimization to hoist len(x) out of the loop.)
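
A minimal sketch of that hoisting, assuming a hypothetical helper named `words_with_prefix`:

```python
def words_with_prefix(words, x):
    """Return the words that begin with the prefix x."""
    n = len(x)  # computed once, outside the loop
    return [word for word in words if word[:n] == x]

print(words_with_prefix(['2', 'awes', 'awful', 'asdgas'], 'aw'))
# ['awes', 'awful']
```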
 
