Extract all words that begin with x

Jimbo · May 10, 2010

Hello

I am trying to find if there is a string OR list function that will
search a list of strings for all the strings that start with 'a' &
return a new list containing all the strings that started with 'a'.

I have searched the Python site & could not find what I am
looking for. Does a function like this exist?

The only approach I think could work is to use the string
method .count()

algorithm: to extract all the words inside a list that start with 'a'
- make sure the list is arranged in alphabetical order
- convert the list to a big string
- use list_string.count(',a') to obtain the index where the last 'a'
word occurs
- convert back into string (yes this is a REAL hack)
- and then create a new list, ie, new_list = list[0:32]
- & return new_list

Ok, that algorithm is terrible, I know, & I just realised that it won't
work for letters after 'a'. So if anyone could suggest a function or
algorithm, it would be extremely helpful.

James Mills · May 10, 2010

Have I missed something, or wouldn't this work just as well:

list_of_strings = ['2', 'awes', '3465sdg', 'dbsdf', 'asdgas']
[word for word in list_of_strings if word[0] == 'a']


['awes', 'asdgas']

I would do this for completeness (just in case):

[word for word in list_of_strings if word and word[0] == 'a']


Just guards against empty strings which may or may not be in the list.

--James

Aahz · May 10, 2010

Have I missed something, or wouldn't this work just as well:

list_of_strings = ['2', 'awes', '3465sdg', 'dbsdf', 'asdgas']
[word for word in list_of_strings if word[0] == 'a']

['awes', 'asdgas']

I would do this for completeness (just in case):

[word for word in list_of_strings if word and word[0] == 'a']

Just guards against empty strings which may or may not be in the list.

No need to do that with startswith():

>>> ''.startswith('a')
False

You would only need to use your code if you suspected that some elements
might be None.
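To make the difference concrete, here is a small sketch comparing startswith() with the truthiness guard. The sample list is made up for illustration and deliberately includes an empty string and a None entry.

```python
# Hypothetical sample data, not from the thread: note the '' and None entries.
items = ['2', 'awes', '', 'dbsdf', None, 'asdgas']

# str.startswith() is safe on empty strings (''.startswith('a') is False),
# but None has no such method, so None entries must be filtered out first.
strings_only = [w for w in items if w is not None]
with_startswith = [w for w in strings_only if w.startswith('a')]

# The truthiness guard skips both '' and None before indexing.
with_guard = [w for w in items if w and w[0] == 'a']

print(with_startswith)  # ['awes', 'asdgas']
print(with_guard)       # ['awes', 'asdgas']
```

Both approaches return the same list here; the guard only matters when the data may contain values that are not strings.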

Terry Reedy · May 11, 2010

Have I missed something, or wouldn't this work just as well:

list_of_strings = ['2', 'awes', '3465sdg', 'dbsdf', 'asdgas']
[word for word in list_of_strings if word[0] == 'a']

['awes', 'asdgas']

I would do this for completeness (just in case):

[word for word in list_of_strings if word and word[0] == 'a']

Just guards against empty strings which may or may not be in the list.

... word[0:1] does the same thing. All Python programmers should learn
to use slicing to extract a char from a string that might be empty.
The method call of .startswith() will be slower, I am sure.

Tycho Andersen · May 11, 2010

Have I missed something, or wouldn't this work just as well:

list_of_strings = ['2', 'awes', '3465sdg', 'dbsdf', 'asdgas']
[word for word in list_of_strings if word[0] == 'a']

['awes', 'asdgas']

I would do this for completeness (just in case):

[word for word in list_of_strings if word and word[0] == 'a']

Just guards against empty strings which may or may not be in the list.

... word[0:1] does the same thing. All Python programmers should learn to
use slicing to extract a char from a string that might be empty.
The method call of .startswith() will be slower, I am sure.

Why? Isn't slicing just sugar for a method call?

Aahz · May 11, 2010

Have I missed something, or wouldn't this work just as well:

list_of_strings = ['2', 'awes', '3465sdg', 'dbsdf', 'asdgas']
[word for word in list_of_strings if word[0] == 'a']
['awes', 'asdgas']

I would do this for completeness (just in case):

[word for word in list_of_strings if word and word[0] == 'a']

Just guards against empty strings which may or may not be in the list.

... word[0:1] does the same thing. All Python programmers should learn
to use slicing to extract a char from a string that might be empty.
The method call of .startswith() will be slower, I am sure.

And if it is slower, so what? Using startswith() makes for faster
reading of the code for me, and I'm sure I'm not the only one.

python · May 11, 2010

Terry,

... word[0:1] does the same thing. All Python programmers should learn to use slicing to extract a char from a string that might be empty.

Is there an equivalent way to slice the last char from a string (similar
to an .endswith) that doesn't raise an exception when a string is empty?

Thanks,
Malcolm

Jerry Hill · May 11, 2010

Is there an equivalent way to slice the last char from a string (similar
to an .endswith) that doesn't raise an exception when a string is empty?

If you use negative indexes in the slice, they refer to items from the
end of the sequence instead of the front. So slicing the last
character from the string would be:

word[-1:]
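As a quick illustration (the word list below is made up), the slice returns an empty string for an empty input, whereas plain indexing with word[-1] raises IndexError:

```python
# Hypothetical sample words, including an empty string.
words = ['awes', '', 'x']

# Slicing never raises: an empty string simply yields ''.
last_chars = [w[-1:] for w in words]
print(last_chars)  # ['s', '', 'x']

# Indexing the last character of an empty string raises IndexError.
try:
    ''[-1]
except IndexError as exc:
    print('indexing failed:', exc)

# str.endswith() is also safe on empty strings.
print([w for w in words if w.endswith('s')])  # ['awes']
```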

python · May 11, 2010

Superpollo,

word[len(word)-1:]

Perfect! Thank you,

Malcolm

python · May 11, 2010

Jerry,

If you use negative indexes in the slice, they refer to items from the end of the sequence instead of the front. So slicing the last character from the string would be:

word[-1:]

Perfect! Thank you,

Malcolm

James Mills · May 11, 2010

word[len(word)-1:]

This works just as well:

word[-1:]

cheers
James

jim-on-linux · May 11, 2010

python help,

I'm open for suggestions.

I'm using py2exe to compile a working program.

The program runs and prints fine until I compile it with py2exe.

After compiling the program, it runs fine until it tries to import
the win32ui module, v2.6214.0.

Then, I get a windows error message:

ImportError: Dll load failed:
This application has failed to start because
the application configuration is incorrect.
Reinstalling the application may fix this problem.

Has anyone had the same problem?

jim-on linux

Bryan · May 11, 2010

Tycho said:
Terry said:

... word[0:1] does the same thing. All Python programmers should learn to
use slicing to extract a char from a string that might be empty.
The method call of .startswith() will be slower, I am sure.

Why? Isn't slicing just sugar for a method call?

Yes, but finding the method doesn't require looking it up by name at
run-time, and startswith is built to work for startings of any length.

Let's timeit:

# -----
from timeit import Timer
from random import choice
from string import ascii_lowercase as letters

strs = [''.join([choice(letters) for _ in range(5)])
        for _ in range(5000)]

way1 = "[s for s in strs if s.startswith('a')]"
way2 = "[s for s in strs if s[:1] == 'a']"

assert eval(way1) == eval(way2)

for way in [way1, way2]:
    t = Timer(way, 'from __main__ import strs')
    print(way, ' took: ', t.timeit(1000))

# -----

On my particular box, I get:

[s for s in strs if s.startswith('a')] took: 5.43566498797
[s for s in strs if s[:1] == 'a'] took: 3.20704924968

So Terry Reedy was right: startswith() is slower. I would,
nevertheless, use startswith(). Later, if users want my program to run
faster and my profiling shows a lot of the run-time is spent finding
words that start with 'a', I might switch.

jim-on-linux · May 12, 2010

I appreciate the help, it's working.

jim-on-linux

Terry Reedy · May 12, 2010

Bryan said:
Tycho said:

Terry said:

... word[0:1] does the same thing. All Python programmers should learn to
use slicing to extract a char from a string that might be empty.
The method call of .startswith() will be slower, I am sure.

Why? Isn't slicing just sugar for a method call?

Yes, but finding the method doesn't require looking it up by name at
run-time, and startswith is built to work for startings of any length.

Let's timeit:

# -----
from timeit import Timer
from random import choice
from string import ascii_lowercase as letters

strs = [''.join([choice(letters) for _ in range(5)])
        for _ in range(5000)]

way1 = "[s for s in strs if s.startswith('a')]"
way2 = "[s for s in strs if s[:1] == 'a']"

assert eval(way1) == eval(way2)

for way in [way1, way2]:
    t = Timer(way, 'from __main__ import strs')
    print(way, ' took: ', t.timeit(1000))

# -----

On my particular box, I get:

[s for s in strs if s.startswith('a')] took: 5.43566498797
[s for s in strs if s[:1] == 'a'] took: 3.20704924968

So Terry Reedy was right: startswith() is slower. I would,
nevertheless, use startswith(). Later, if users want my program to run
faster and my profiling shows a lot of the run-time is spent finding
words that start with 'a', I might switch.

Thank you for that timing report.

My main point is that there are two ways to fetch a char, the difference
being the error return -- exception IndexError versus error value ''.
This is an example of out-of-band versus in-band error/exception
signaling, which programmers, especially of Python, should understand.
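A minimal sketch of the two signaling styles, using hypothetical helper names that do not appear in the thread: indexing reports "no character" out of band by raising, while slicing reports it in band by returning ''.

```python
def first_char_indexed(word):
    # Out-of-band signaling: an empty string raises IndexError.
    return word[0]

def first_char_sliced(word):
    # In-band signaling: an empty string yields '' as the "nothing there" value.
    return word[0:1]

for word in ['abc', '']:
    try:
        print(repr(first_char_indexed(word)))
    except IndexError:
        print('IndexError')
    print(repr(first_char_sliced(word)))
```

With the sliced form the caller compares against a value; with the indexed form the caller must either guard beforehand or catch the exception.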

The fact that, in Python, syntax tends to be faster than calls was
secondary, though good to know on occasion.

.startswith and .endswith are methods that wrap the special cases of
slice at an end and compare to one value. They are not necessary, and
save no keystrokes, but Guido obviously thought they added enough to
more than balance the slight expansion of the language. They were added
after I learned Python and I thought the tradeoff to be a toss-up, but I
will consider using the methods when writing didactic code meant to be
read by others.

Terry Jan Reedy

Aahz · May 12, 2010

.startswith and .endswith are methods that wrap the special cases of
slice at an end and compare to one value. They are not necessary, and
save no keystrokes, but Guido obviously thought they added enough to
more than balance the slight expansion of the language. They were added
after I learned Python and I thought the tradeoff to be a toss-up, but I
will consider using the methods when writing didactic code meant to be
read by others.

They were also added after I learned Python, and I think they're great!
Using them signals that you're doing simple string checking rather than
some more arcane slicing.

Bryan · May 12, 2010

Terry said:
Thank you for that timing report.

Enjoyed doing it, and more on that below.

My main point is that there are two ways to fetch a char, the difference
being the error return -- exception IndexError versus error value ''.
This is an example of out-of-band versus in-band error/exception
signaling, which programmers, especially of Python, should understand.

Sure. I think your posts and the bits of Python code they contain are
great. Slicing is amazingly useful, and it helps that slices don't
barf just because 'start' or 'stop' falls outside the index range.

My main point was to show off how Python and its standard library make
answering the which-is-faster question easy. I think that's another
thing Python programmers should understand, even though I just learned
how today.

Now back to the arguably-interesting issue of speed in the particular
problem here: 'Superpollo' had suggested another variant, which I
appended to my timeit targets, resulting in:

[s for s in strs if s.startswith('a')] took: 5.68393977159
[s for s in strs if s[:1] == 'a'] took: 3.31676491502
[s for s in strs if s and s[0] == 'a'] took: 2.29392950076

Superpollo's condition -- s and s[0] == 'a' -- is the fastest of the
three.

What's more, in my timeit script the strings in the list are all of
length 5, so the 'and' never gets to short-circuit. If a major portion
of the strings are in fact empty, superpollo's condition should do even
better. But I didn't test and time that. Yet.

-Bryan Olson

# ----- timeit code -----

from random import choice
from string import ascii_lowercase as letters
from timeit import Timer

strs = [''.join([choice(letters) for _ in range(5)])
        for _ in range(5000)]

way1 = "[s for s in strs if s.startswith('a')]"
way2 = "[s for s in strs if s[:1] == 'a']"
way3 = "[s for s in strs if s and s[0] == 'a']"

assert eval(way1) == eval(way2) == eval(way3)

for way in [way1, way2, way3]:
    t = Timer(way, 'from __main__ import strs')
    print(way, ' took: ', t.timeit(1000))
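The untested case mentioned above, a list where many strings are empty, could be timed with a variation like the following. This is only a sketch: the 50% share of empty strings and the repeat count of 200 are arbitrary assumptions, and no timings are claimed here since they depend on the machine and Python version.

```python
from random import choice, random
from string import ascii_lowercase as letters
from timeit import timeit

# Assumption: roughly half of the entries are empty; the ratio is arbitrary.
strs = [''.join(choice(letters) for _ in range(5)) if random() < 0.5 else ''
        for _ in range(5000)]

ways = [
    "[s for s in strs if s.startswith('a')]",
    "[s for s in strs if s[:1] == 'a']",
    "[s for s in strs if s and s[0] == 'a']",
]

# All three conditions should select exactly the same strings.
namespace = {'strs': strs}
results = [eval(way, namespace) for way in ways]
assert results[0] == results[1] == results[2]

for way in ways:
    # timeit's globals parameter (Python 3.5+) replaces the import-based setup.
    print(way, 'took:', timeit(way, globals=namespace, number=200))
```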

Stefan Behnel · May 12, 2010

Bryan, 12.05.2010 08:55:

Now back to the arguably-interesting issue of speed in the particular
problem here: 'Superpollo' had suggested another variant, which I
appended to my timeit targets, resulting in:

[s for s in strs if s.startswith('a')] took: 5.68393977159
[s for s in strs if s[:1] == 'a'] took: 3.31676491502
[s for s in strs if s and s[0] == 'a'] took: 2.29392950076

Superpollo's condition -- s and s[0] == 'a' -- is the fastest of the
three.

Just out of curiosity, I ran the same code in the latest Cython pre-0.13
and added some optimised Cython implementations. Here's the code:

def cython_way0(l):
    return [ s for s in l if s.startswith(u'a') ]

def cython_way1(list l):
    cdef unicode s
    return [ s for s in l if s.startswith(u'a') ]

def cython_way2(list l):
    cdef unicode s
    return [ s for s in l if s[:1] == u'a' ]

def cython_way3(list l):
    cdef unicode s
    return [ s for s in l if s[0] == u'a' ]

def cython_way4(list l):
    cdef unicode s
    return [ s for s in l if s and s[0] == u'a' ]

def cython_way5(list l):
    cdef unicode s
    return [ s for s in l if (<Py_UNICODE>s[0]) == u'a' ]

def cython_way6(list l):
    cdef unicode s
    return [ s for s in l if s and (<Py_UNICODE>s[0]) == u'a' ]

And here are the numbers (plain Python 2.6.5 first):

[s for s in strs if s.startswith(u'a')] took: 1.04618620872
[s for s in strs if s[:1] == u'a'] took: 0.518909931183
[s for s in strs if s and s[0] == u'a'] took: 0.617404937744

cython_way0(strs) took: 0.769457817078
cython_way1(strs) took: 0.0861849784851
cython_way2(strs) took: 0.208586931229
cython_way3(strs) took: 0.18615603447
cython_way4(strs) took: 0.190477132797
cython_way5(strs) took: 0.0366449356079
cython_way6(strs) took: 0.0368368625641

Personally, I think the cast to Py_UNICODE in the last two implementations
shouldn't be required; that should happen automatically, so that way3/4
run as fast as way5/6. I'll add that when I get to it.

Note that unicode.startswith() is optimised in Cython, so it's a pretty
fast option, too. Also note that the best speed-up here is only a factor of
14, so plain Python is quite competitive, unless the list is huge and this
is really a bottleneck in an application.

Stefan

Stefan Behnel · May 12, 2010

superpollo, 11.05.2010 17:03:

Aahz ha scritto:

On 5/10/2010 5:35 AM, James Mills wrote:
Have I missed something, or wouldn't this work just as well:

list_of_strings = ['2', 'awes', '3465sdg', 'dbsdf', 'asdgas']
[word for word in list_of_strings if word[0] == 'a']
['awes', 'asdgas']
I would do this for completeness (just in case):

[word for word in list_of_strings if word and word[0] == 'a']
Just guards against empty strings which may or may not be in the list.
... word[0:1] does the same thing. All Python programmers should
learn to use slicing to extract a char from a string that might be
empty.
The method call of .startswith() will be slower, I am sure.

And if it is slower, so what? Using startswith() makes for faster
reading of the code for me, and I'm sure I'm not the only one.

also, what if the OP intended "words that begin with x" with x a string
(as opposed to a single character) ?

word[:len(x)] == x

will work in that case.

Stefan

Aahz · May 12, 2010

superpollo, 11.05.2010 17:03:

Aahz ha scritto:

Have I missed something, or wouldn't this work just as well:

list_of_strings = ['2', 'awes', '3465sdg', 'dbsdf', 'asdgas']
[word for word in list_of_strings if word[0] == 'a']
['awes', 'asdgas']
I would do this for completeness (just in case):

[word for word in list_of_strings if word and word[0] == 'a']
Just guards against empty strings which may or may not be in the list.
... word[0:1] does the same thing. All Python programmers should
learn to use slicing to extract a char from a string that might be
empty.
The method call of .startswith() will be slower, I am sure.

And if it is slower, so what? Using startswith() makes for faster
reading of the code for me, and I'm sure I'm not the only one.

also, what if the OP intended "words that begin with x" with x a string
(as opposed to a single character) ?

word[:len(x)] == x

will work in that case.

But that's now going to be slower. ;-) (Unless one makes the obvious
optimization to hoist len(x) out of the loop.)
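For a multi-character prefix, the hoisting could look like the sketch below. The function names and the sample words are hypothetical; the startswith() version is shown alongside for comparison, since it needs no length computation at all.

```python
def words_with_prefix_sliced(words, prefix):
    # len(prefix) is computed once, outside the comprehension.
    n = len(prefix)
    return [w for w in words if w[:n] == prefix]

def words_with_prefix_startswith(words, prefix):
    # str.startswith() handles prefixes of any length directly.
    return [w for w in words if w.startswith(prefix)]

# Hypothetical sample data, including strings shorter than the prefix.
words = ['xylophone', 'xy', 'x', '', 'axy', 'xyz']
print(words_with_prefix_sliced(words, 'xy'))      # ['xylophone', 'xy', 'xyz']
print(words_with_prefix_startswith(words, 'xy'))  # ['xylophone', 'xy', 'xyz']
```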
