Newbie question - better way to do this?

E

Eric

I have some working code, but I realized it is just the way I would
write it in C, which means there is probably a better (more pythonic)
way of doing it.

Here's the section of code:

accumulate = firstIsCaps = False
accumStart = i = 0
while i < len(words):
firstIsCaps = firstIsCapitalized(words)
if firstIsCaps and not accumulate:
accumStart, accumulate = i, True
elif accumulate and not firstIsCaps:
doSomething(words[accumStart : i])
accumulate = False
i += 1

words is a big long array of strings. What I want to do is find
consecutive sequences of words that have the first letter capitalized,
and then call doSomething on them. (And you can ignore the fact that
it won't find a sequence at the very end of words, that is fine for my
purposes).

Thanks,
Eric
 
G

Gabriel Genellina

I have some working code, but I realized it is just the way I would
write it in C, which means there is probably a better (more pythonic)
way of doing it.

Here's the section of code:

accumulate = firstIsCaps = False
accumStart = i = 0
while i < len(words):
firstIsCaps = firstIsCapitalized(words)
if firstIsCaps and not accumulate:
accumStart, accumulate = i, True
elif accumulate and not firstIsCaps:
doSomething(words[accumStart : i])
accumulate = False
i += 1

words is a big long array of strings. What I want to do is find
consecutive sequences of words that have the first letter capitalized,
and then call doSomething on them. (And you can ignore the fact that
it won't find a sequence at the very end of words, that is fine for my
purposes).


Using groupby:

py> from itertools import groupby
py>
py> words = "Este es un Ejemplo. Los Ejemplos usualmente son tontos. Yo
siempre
escribo tonterias.".split()
py>
py> for upper, group in groupby(words, str.istitle):
.... if upper:
.... print list(group) # doSomething(list(group))
....
['Este']
['Ejemplo.', 'Los', 'Ejemplos']
['Yo']

You could replace your firstIsCapitalized function instead of the string
method istitle(), but I think it's the same. See
http://docs.python.org/lib/itertools-functions.html
 
S

Steven D'Aprano

words is a big long array of strings. What I want to do is find
consecutive sequences of words that have the first letter capitalized,
and then call doSomething on them. (And you can ignore the fact that
it won't find a sequence at the very end of words, that is fine for my
purposes).

Assuming the list of words will fit into memory, and you can probably
expect to fit anything up to millions of words comfortably into memory,
something like this might be suitable:

list_of_words = "lots of words go here".split()

accumulator = []
for word in list_of_words:
if word.istitle():
accumulator.append(word)
else:
doSomething(accumulator)
accumulator = []
 
J

John Machin

words is a big long array of strings. What I want to do is find
consecutive sequences of words that have the first letter capitalized,
and then call doSomething on them. (And you can ignore the fact that
it won't find a sequence at the very end of words, that is fine for my
purposes).

Assuming the list of words will fit into memory, and you can probably
expect to fit anything up to millions of words comfortably into memory,
something like this might be suitable:

list_of_words = "lots of words go here".split()

accumulator = []
for word in list_of_words:
if word.istitle():
accumulator.append(word)
else:
doSomething(accumulator)
accumulator = []
Bzzzt. Needs the following code at the end:
if accumulator:
doSomething(accumulator)
 
S

Steven D'Aprano

words is a big long array of strings. What I want to do is find
consecutive sequences of words that have the first letter capitalized,
and then call doSomething on them. (And you can ignore the fact that
it won't find a sequence at the very end of words, that is fine for my
purposes).

Assuming the list of words will fit into memory, and you can probably
expect to fit anything up to millions of words comfortably into memory,
something like this might be suitable:

list_of_words = "lots of words go here".split()

accumulator = []
for word in list_of_words:
if word.istitle():
accumulator.append(word)
else:
doSomething(accumulator)
accumulator = []
Bzzzt. Needs the following code at the end:
if accumulator:
doSomething(accumulator)


Bzzzt! Somebody didn't read the Original Poster's comment "And you can
ignore the fact that it won't find a sequence at the very end of words,
that is fine for my purposes".

Of course, for somebody whose requirements _aren't_ broken, you would be
completely right. Besides, I'm under no obligation to write all the O.P.'s
code for him, just point him in the right direction.
 
P

Paul Rubin

Eric said:
words is a big long array of strings. What I want to do is find
consecutive sequences of words that have the first letter capitalized,
and then call doSomething on them. (And you can ignore the fact that
it won't find a sequence at the very end of words, that is fine for my
purposes).

As another poster suggested, use itertools.groupby:

for cap,g in groupby(words, firstIsCapitalized):
if cap: doSomething(list(g))

This will handle sequences at the the end words just like other sequences.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,582
Members
45,066
Latest member
VytoKetoReviews

Latest Threads

Top