Good use for itertools.dropwhile and itertools.takewhile

Discussion in 'Python' started by Nick Mellor, Dec 4, 2012.

  1. Nick Mellor

    Nick Mellor Guest

    Hi,

    I came across itertools.dropwhile only today, then shortly afterwards found Raymond Hettinger wondering, in 2007, whether to drop [sic] dropwhile and takewhile from the itertools module.

    Fate of itertools.dropwhile() and itertools.takewhile() - Python
    bytes.com
    http://bit.ly/Vi2PqP

    Almost nobody else of the 18 respondents seemed to be using them.

    And then 2 hours later, a use case came along. I think. Anyone have any better solutions?

    I have a file full of things like this:

    "CAPSICUM RED fresh from Queensland"

    Product names (all caps, at start of string) and descriptions (mixed case, to end of string) all muddled up in the same field. And I need to split them into two fields. Note that if the text had said:

    "CAPSICUM RED fresh from QLD"

    I would want QLD in the description, not shunted forwards and put in the product name. So (uncontrived) list comprehensions and regex's are out.

    I want to split the above into:

    ("CAPSICUM RED", "fresh from QLD")

    Enter dropwhile and takewhile. 6 lines later:

    from itertools import takewhile, dropwhile
    def split_product_itertools(s):
        words = s.split()
        allcaps = lambda word: word == word.upper()
        product, description = takewhile(allcaps, words), dropwhile(allcaps, words)
        return " ".join(product), " ".join(description)
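
A minimal sketch of how the two iterators divide the word list, using the sample string from this post. The standalone `allcaps` function is just the lambda above given a name; nothing else is assumed.

```python
from itertools import takewhile, dropwhile

def allcaps(word):
    # True for words that are unchanged by upper-casing, e.g. "RED" or "QLD"
    return word == word.upper()

words = "CAPSICUM RED fresh from QLD".split()

# takewhile yields items until the predicate fails for the first time...
product = list(takewhile(allcaps, words))
# ...and dropwhile skips exactly that prefix, then yields everything else,
# including later all-caps words such as "QLD".
description = list(dropwhile(allcaps, words))

print(product)      # ['CAPSICUM', 'RED']
print(description)  # ['fresh', 'from', 'QLD']
```

Because the predicate is only consulted until it first fails, "QLD" stays in the description, which is the behaviour asked for above.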


    When I tried to refactor this code to use while or for loops, I couldn't find any way that felt shorter or more pythonic:

    (9 lines: using for)

    def split_product_1(s):
        words = s.split()
        product = []
        for word in words:
            if word == word.upper():
                product.append(word)
            else:
                break
        return " ".join(product), " ".join(words[len(product):])


    (12 lines: using while)

    def split_product_2(s):
        words = s.split()
        i = 0
        product = []
        while 1:
            word = words[i]
            if word == word.upper():
                product.append(word)
                i += 1
            else:
                break
        return " ".join(product), " ".join(words[i:])


    Any thoughts?

    Nick
     
    Nick Mellor, Dec 4, 2012
    #1

  2. Nick Mellor

    Neil Cerutti Guest

    On 2012-12-04, Nick Mellor <> wrote:
    > I have a file full of things like this:
    >
    > "CAPSICUM RED fresh from Queensland"
    >
    > Product names (all caps, at start of string) and descriptions
    > (mixed case, to end of string) all muddled up in the same
    > field. And I need to split them into two fields. Note that if
    > the text had said:
    >
    > "CAPSICUM RED fresh from QLD"
    >
    > I would want QLD in the description, not shunted forwards and
    > put in the product name. So (uncontrived) list comprehensions
    > and regex's are out.
    >
    > I want to split the above into:
    >
    > ("CAPSICUM RED", "fresh from QLD")
    >
    > Enter dropwhile and takewhile. 6 lines later:
    >
    > from itertools import takewhile, dropwhile
    > def split_product_itertools(s):
    >     words = s.split()
    >     allcaps = lambda word: word == word.upper()
    >     product, description = takewhile(allcaps, words), dropwhile(allcaps, words)
    >     return " ".join(product), " ".join(description)
    >
    > When I tried to refactor this code to use while or for loops, I
    > couldn't find any way that felt shorter or more pythonic:


    I'm really tempted to import re, and that means takewhile and
    dropwhile need to stay. ;)

    But seriously, this is a quick implementation of my first thought.

    description = s.lstrip(string.ascii_uppercase + ' ')
    product = s[:-len(description)-1]

    --
    Neil Cerutti
     
    Neil Cerutti, Dec 4, 2012
    #2

  3. Nick Mellor

    Nick Mellor Guest

    Hi Neil,

    Nice! But fails if the first word of the description starts with a capital letter.
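
A short sketch of that failure, assuming Neil's two lines are wrapped in a function (the name `split_lstrip` is made up here for illustration). `lstrip` removes every leading character that appears in its argument, so it cannot tell the capital that begins the description from the capitals of the product name.

```python
import string

def split_lstrip(s):
    # Neil's two-liner, wrapped in a hypothetical helper for testing.
    description = s.lstrip(string.ascii_uppercase + ' ')
    product = s[:-len(description) - 1]
    return product, description

# Lower-case description: works as intended.
print(split_lstrip("CAPSICUM RED fresh from QLD"))
# ('CAPSICUM RED', 'fresh from QLD')

# Capitalised first word: the "F" is stripped along with the product name.
print(split_lstrip("CAPSICUM RED Fresh from QLD"))
# ('CAPSICUM RED ', 'resh from QLD')
```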

    Nick


    On Wednesday, 5 December 2012 01:23:34 UTC+11, Neil Cerutti wrote:
    > On 2012-12-04, Nick Mellor <> wrote:
    > > [...]
    >
    > I'm really tempted to import re, and that means takewhile and
    > dropwhile need to stay. ;)
    >
    > But seriously, this is a quick implementation of my first thought.
    >
    > description = s.lstrip(string.ascii_uppercase + ' ')
    > product = s[:-len(description)-1]
    >
    > --
    > Neil Cerutti
     
    Nick Mellor, Dec 4, 2012
    #3
  4. Nick Mellor

    Neil Cerutti Guest

    On 2012-12-04, Nick Mellor <> wrote:
    > Hi Neil,
    >
    > Nice! But fails if the first word of the description starts
    > with a capital letter.


    Darn edge cases.

    --
    Neil Cerutti
     
    Neil Cerutti, Dec 4, 2012
    #4
  5. Nick Mellor

    Nick Mellor Guest

    I love the way you guys can write a line of code that does the same as 20 of mine :)

    I can turn up the heat on your regex by feeding it a null description or multiple white space (both in the original file.) I'm sure you'd adjust, but at the cost of a more complex regex.

    Meanwhile takewhile and dropwhile are behaving themselves impeccably but my while loop has fallen over.
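
A small sketch of the two inputs mentioned above, run against the regex quoted below and against the itertools function from the first post. The test strings are made up to match the description (an empty description, and a double space before the description); they are not taken from the real file.

```python
import re
from itertools import takewhile, dropwhile

pattern = re.compile(r"(?m)^([A-Z\s]+) (.+)$")

def split_product_itertools(s):
    words = s.split()
    allcaps = lambda word: word == word.upper()
    product, description = takewhile(allcaps, words), dropwhile(allcaps, words)
    return " ".join(product), " ".join(description)

# Empty description: the regex treats the last product word as the description.
print(pattern.findall("CAPSICUM RED"))
# [('CAPSICUM', 'RED')]
print(split_product_itertools("CAPSICUM RED"))
# ('CAPSICUM RED', '')

# Double space: the regex leaves a trailing space on the product name.
print(pattern.findall("CAPSICUM RED  fresh from QLD"))
# [('CAPSICUM RED ', 'fresh from QLD')]
print(split_product_itertools("CAPSICUM RED  fresh from QLD"))
# ('CAPSICUM RED', 'fresh from QLD')
```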

    Best,

    Nick

    On Wednesday, 5 December 2012 01:31:48 UTC+11, Vlastimil Brom wrote:
    > 2012/12/4 Nick Mellor <>:
    > > [...]
    >
    > Hi,
    > the regex approach doesn't actually seem to be very complex, given the
    > mentioned specification, e.g.
    >
    > >>> import re
    > >>> re.findall(r"(?m)^([A-Z\s]+) (.+)$", "CAPSICUM RED fresh from QLD\nCAPSICUM RED fresh from Queensland")
    > [('CAPSICUM RED', 'fresh from QLD'), ('CAPSICUM RED', 'fresh from Queensland')]
    > >>>
    >
    > (It might be necessary to account for some punctuation, whitespace etc. too.)
    >
    > hth,
    > vbr
     
    Nick Mellor, Dec 4, 2012
    #5
  7. Alexander Blinne, Dec 4, 2012
    #7
  8. Nick Mellor

    Neil Cerutti Guest

    On 2012-12-04, Nick Mellor <> wrote:
    > I love the way you guys can write a line of code that does the
    > same as 20 of mine :)
    >
    > I can turn up the heat on your regex by feeding it a null
    > description or multiple white space (both in the original
    > file.) I'm sure you'd adjust, but at the cost of a more complex
    > regex.


    A re.split should be able to handle this without too much hassle.

    The simplicity of my two-line version will evaporate pretty
    quickly to compensate for edge cases.

    Here's one that can handle one of the edge cases you mention, but
    it's hardly any shorter than what you had, and it doesn't
    preserve non-standard white space, like double spaces.

    def prod_desc(s):
        """split s into product name and product description. Product
        name is a series of one or more capitalized words followed
        by white space. Everything after the trailing white space is
        the product description.

        >>> prod_desc("CAR FIFTY TWO Chrysler LeBaron.")
        ['CAR FIFTY TWO', 'Chrysler LeBaron.']
        """
        prod = []
        desc = []
        target = prod
        for word in s.split():
            if target is prod and not word.isupper():
                target = desc
            target.append(word)
        return [' '.join(prod), ' '.join(desc)]
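
One detail worth noting: this version tests `word.isupper()`, while the original post tests `word == word.upper()`. The two agree on ordinary words but differ on words with no cased characters at all. A quick sketch (the sample words are made up for illustration):

```python
words = ["RED", "MR.", "52", "&", "Fresh"]

# Map each word to (isupper() result, == upper() result).
compare = {w: (w.isupper(), w == w.upper()) for w in words}

for w, (by_isupper, by_equality) in compare.items():
    print(w, by_isupper, by_equality)
# RED True True
# MR. True True
# 52 False True
# & False True
# Fresh False False
```

So a purely numeric or punctuation-only word such as "52" would end the product name in the `isupper()` version but would stay in it with the `== upper()` test. Which behaviour is wanted depends on the data.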

    When str methods fail I'll usually write my own parser before
    turning to re. The following is no longer nice looking at all.

    def prod_desc(s):
        """split s into product name and product description. Product
        name is a series of one or more capitalized words followed
        by white space. Everything after the trailing white space is
        the product description.

        >>> prod_desc("CAR FIFTY TWO Chrysler LeBaron.")
        ['CAR FIFTY TWO', 'Chrysler LeBaron.']

        >>> prod_desc("MR. JONESEY Saskatchewan's finest")
        ['MR. JONESEY', "Saskatchewan's finest"]
        """
        i = 0
        while not s[i].islower():
            i += 1
        i -= 1
        while not s[i].isspace():
            i -= 1
        start_desc = i+1
        while s[i].isspace():
            i -= 1
        end_prod = i+1
        return [s[:end_prod], s[start_desc:]]

    --
    Neil Cerutti
     
    Neil Cerutti, Dec 4, 2012
    #8
  9. Nick Mellor

    DJC Guest

    On 04/12/12 17:18, Alexander Blinne wrote:
    > Another neat solution with a little help from
    >
    > http://stackoverflow.com/questions/...st-element-of-a-list-which-makes-a-passed-fun
    >
    > >>> def split_product(p):
    > ...     w = p.split(" ")
    > ...     j = (i for i,v in enumerate(w) if v.upper() != v).next()
    > ...     return " ".join(w[:j]), " ".join(w[j:])
    >

    Python 2.7.3 (default, Sep 26 2012, 21:51:14)
    [GCC 4.7.2] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> w1 = "CAPSICUM RED Fresh from Queensland"
    >>> w1.split()
    ['CAPSICUM', 'RED', 'Fresh', 'from', 'Queensland']
    >>> w = w1.split()
    >>> (i for i,v in enumerate(w) if v.upper() != v)
    <generator object <genexpr> at 0x18b1910>
    >>> (i for i,v in enumerate(w) if v.upper() != v).next()
    2

    Python 3.2.3 (default, Oct 19 2012, 19:53:16)
    >>> (i for i,v in enumerate(w) if v.upper() != v).next()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'generator' object has no attribute 'next'
     
    DJC, Dec 4, 2012
    #9
  10. On 04.12.2012 19:28, DJC wrote:
    > >>> (i for i,v in enumerate(w) if v.upper() != v).next()
    > Traceback (most recent call last):
    >   File "<stdin>", line 1, in <module>
    > AttributeError: 'generator' object has no attribute 'next'


    Yeah, I saw this problem right after I sent the posting. It is now
    supposed to read like this:

    >>> def split_product(p):
    ...     w = p.split(" ")
    ...     j = next(i for i,v in enumerate(w) if v.upper() != v)
    ...     return " ".join(w[:j]), " ".join(w[j:])

    Greetings
     
    Alexander Blinne, Dec 4, 2012
    #10
  11. Nick Mellor

    Ian Kelly Guest

    On Tue, Dec 4, 2012 at 11:48 AM, Alexander Blinne <> wrote:

    > On 04.12.2012 19:28, DJC wrote:
    > > >>> (i for i,v in enumerate(w) if v.upper() != v).next()
    > > Traceback (most recent call last):
    > >   File "<stdin>", line 1, in <module>
    > > AttributeError: 'generator' object has no attribute 'next'
    >
    > Yeah, I saw this problem right after I sent the posting. It is now
    > supposed to read like this:
    >
    > >>> def split_product(p):
    > ...     w = p.split(" ")
    > ...     j = next(i for i,v in enumerate(w) if v.upper() != v)
    > ...     return " ".join(w[:j]), " ".join(w[j:])


    It still fails if the product description is empty.

    >>> split_product("CAPSICUM RED")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 3, in split_product
    StopIteration

    I'm not meaning to pick on you; some of the other solutions in this thread
    also fail in that case.

    >>> re.findall(r"(?m)^([A-Z\s]+) (.+)$", "CAPSICUM RED")
    [('CAPSICUM', 'RED')]

    >>> prod_desc("CAPSICUM RED") # the second version from Neil's post
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 14, in prod_desc
    IndexError: string index out of range
     
    Ian Kelly, Dec 4, 2012
    #11
  12. Nick Mellor

    MRAB Guest

    On 2012-12-04 19:37, Ian Kelly wrote:
    > On Tue, Dec 4, 2012 at 11:48 AM, Alexander Blinne <> wrote:
    >
    > > On 04.12.2012 19:28, DJC wrote:
    > > > >>> (i for i,v in enumerate(w) if v.upper() != v).next()
    > > > Traceback (most recent call last):
    > > >   File "<stdin>", line 1, in <module>
    > > > AttributeError: 'generator' object has no attribute 'next'
    > >
    > > Yeah, I saw this problem right after I sent the posting. It is now
    > > supposed to read like this:
    > >
    > > >>> def split_product(p):
    > > ...     w = p.split(" ")
    > > ...     j = next(i for i,v in enumerate(w) if v.upper() != v)
    > > ...     return " ".join(w[:j]), " ".join(w[j:])
    >
    > It still fails if the product description is empty.
    >
    > >>> split_product("CAPSICUM RED")
    > Traceback (most recent call last):
    >   File "<stdin>", line 1, in <module>
    >   File "<stdin>", line 3, in split_product
    > StopIteration
    >
    > I'm not meaning to pick on you; some of the other solutions in this
    > thread also fail in that case.
    >
    > >>> re.findall(r"(?m)^([A-Z\s]+) (.+)$", "CAPSICUM RED")
    > [('CAPSICUM', 'RED')]

    That's easily fixed:

    >>> re.findall(r"(?m)^([A-Z\s]+)(?: (.*))?$", "CAPSICUM RED")
    [('CAPSICUM RED', '')]

    > >>> prod_desc("CAPSICUM RED") # the second version from Neil's post
    > Traceback (most recent call last):
    >   File "<stdin>", line 1, in <module>
    >   File "<stdin>", line 14, in prod_desc
    > IndexError: string index out of range
    >
     
    MRAB, Dec 4, 2012
    #12
  13. On 04.12.2012 20:37, Ian Kelly wrote:
    > >>> def split_product(p):
    > ...     w = p.split(" ")
    > ...     j = next(i for i,v in enumerate(w) if v.upper() != v)
    > ...     return " ".join(w[:j]), " ".join(w[j:])
    >
    > It still fails if the product description is empty.


    That's true... let's see, next() takes a default value in case the
    iterator is empty, and then we could use some special value and test for
    it. But I think it would be more elegant to just handle the exception
    ourselves, so:

    >>> def split_product(p):
    ...     w = p.split(" ")
    ...     try:
    ...         j = next(i for i,v in enumerate(w) if v.upper() != v)
    ...     except StopIteration:
    ...         return p, ''
    ...     return " ".join(w[:j]), " ".join(w[j:])
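
For comparison, a sketch of the other route mentioned above, passing a default to `next()`. Using `len(w)` as the default is a choice made here for illustration, not something from the post: it means "the split point is past the last word", so no separate test for a special value is needed.

```python
def split_product(p):
    w = p.split(" ")
    # If every word is all caps, the generator is empty and next()
    # falls back to len(w), i.e. the whole string is the product name.
    j = next((i for i, v in enumerate(w) if v.upper() != v), len(w))
    return " ".join(w[:j]), " ".join(w[j:])

print(split_product("CAPSICUM RED fresh from QLD"))
# ('CAPSICUM RED', 'fresh from QLD')
print(split_product("CAPSICUM RED"))
# ('CAPSICUM RED', '')
```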

    > I'm not meaning to pick on you; some of the other solutions in this
    > thread also fail in that case.


    It's ok, opening the eye for edge cases is always a good idea :)

    Greetings
     
    Alexander Blinne, Dec 4, 2012
    #13
  14. Nick Mellor

    Terry Reedy Guest

    On 12/4/2012 8:57 AM, Nick Mellor wrote:

    > I have a file full of things like this:
    >
    > "CAPSICUM RED fresh from Queensland"
    >
    > Product names (all caps, at start of string) and descriptions (mixed
    > case, to end of string) all muddled up in the same field. And I need
    > to split them into two fields. Note that if the text had said:
    >
    > "CAPSICUM RED fresh from QLD"
    >
    > I would want QLD in the description, not shunted forwards and put in
    > the product name. So (uncontrived) list comprehensions and regex's
    > are out.
    >
    > I want to split the above into:
    >
    > ("CAPSICUM RED", "fresh from QLD")
    >
    > Enter dropwhile and takewhile. 6 lines later:
    >
    > from itertools import takewhile, dropwhile
    > def split_product_itertools(s):
    >     words = s.split()
    >     allcaps = lambda word: word == word.upper()
    >     product, description =\
    >         takewhile(allcaps, words), dropwhile(allcaps, words)
    >     return " ".join(product), " ".join(description)


    If the original string has no excess whitespace, description is what
    remains of s after product prefix is omitted. (Py 3 code)

    from itertools import takewhile
    def allcaps(word): return word == word.upper()

    def split_product_itertools(s):
        product = ' '.join(takewhile(allcaps, s.split()))
        return product, s[len(product)+1:]

    print(split_product_itertools("CAPSICUM RED fresh from QLD"))
    >>>
    ('CAPSICUM RED', 'fresh from QLD')

    Without that assumption, the same idea applies to the split list.

    def split_product_itertools(s):
        words = s.split()
        product = list(takewhile(allcaps, words))
        return ' '.join(product), ' '.join(words[len(product):])
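
A sketch of why the whitespace assumption matters. The two versions above are given distinct names here (`split_on_string` and `split_on_words`, both made up for this comparison) so they can be run side by side on a string with a double space.

```python
from itertools import takewhile

def allcaps(word):
    return word == word.upper()

def split_on_string(s):
    # First version: slices the original string by the length of the
    # re-joined product, so extra spaces throw the offset off.
    product = ' '.join(takewhile(allcaps, s.split()))
    return product, s[len(product) + 1:]

def split_on_words(s):
    # Second version: slices the word list, so spacing is irrelevant.
    words = s.split()
    product = list(takewhile(allcaps, words))
    return ' '.join(product), ' '.join(words[len(product):])

s = "CAPSICUM  RED fresh from QLD"   # note the double space
print(split_on_string(s))  # ('CAPSICUM RED', ' fresh from QLD')
print(split_on_words(s))   # ('CAPSICUM RED', 'fresh from QLD')
```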

    --
    Terry Jan Reedy
     
    Terry Reedy, Dec 4, 2012
    #14
  15. 2012/12/4 Nick Mellor <>:
    > I love the way you guys can write a line of code that does the same as 20 of mine :)
    > I can turn up the heat on your regex by feeding it a null description or multiple white space (both in the original file.) I'm sure you'd adjust, but at the cost of a more complex regex.
    > Meanwhile takewhile and dropwhile are behaving themselves impeccably but my while loop has fallen over.
    >
    > Best,
    > Nick
    >> [...]

    > --


    Hi,
    well, for what is it worth, both cases could be addressed quite
    easily, with little added complexity - e.g.: make the description part
    optional, allow multiple whitespace and enforce word boundary after
    the product name in order to get rid of the trailing whitespace in it:

    >>> re.findall(r"(?m)^([A-Z\s]+\b)(?:\s+(.*))?$", "CAPSICUM RED fresh from QLD\nCAPSICUM RED fresh from Queensland\nCAPSICUM RED")
    [('CAPSICUM RED', 'fresh from QLD'), ('CAPSICUM RED', 'fresh from Queensland'), ('CAPSICUM RED', '')]
    >>>


    However, it's certainly preferable to use a solution you are more
    comfortable with, e.g. the itertools one...

    regards,
    vbr
     
    Vlastimil Brom, Dec 4, 2012
    #15
  16. Ian,

    For the sanity of those of us reading this via Usenet using the Pan
    newsreader, could you please turn off HTML emailing? It's very
    distracting.

    Thanks,

    Steven


    On Tue, 04 Dec 2012 12:37:38 -0700, Ian Kelly wrote:

    [...]
    > <div class="gmail_quote">On Tue,
    > Dec 4, 2012 at 11:48 AM, Alexander Blinne <span dir="ltr">&lt;<a
    > href="mailto:"
    > target="_blank"></a>&gt;</span> wrote:<br><blockquote
    > class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc
    > solid;padding-left:1ex">
    >
    > Am 04.12.2012 19:28, schrieb DJC:<br> <div class="im">&gt;&gt;&gt;&gt;
    > (i for i,v in enumerate(w) if v.upper() != v).next()<br> &gt; Traceback
    > (most recent call last):<br> &gt;   File &quot;&lt;stdin&gt;&quot;, line
    > 1, in &lt;module&gt;<br> &gt; AttributeError: 'generator' object
    > has no attribute 'next'<br> <br>
    > </div>Yeah, i saw this problem right after i sent the posting. It now
    > is<br> supposed to read like this<br>
    > <div class="im"><br>
    > &gt;&gt;&gt; def split_product(p):<br> ...     w = p.split(&quot;
    > &quot;)<br> </div>...     j = next(i for i,v in enumerate(w) if
    > v.upper() != v)<br> <div class="im">...     return &quot;
    > &quot;.join(w[:j]), &quot;
    > &quot;.join(w[j:])<br></div></blockquote></div><br>It still fails if the
    > product description is empty.<br><br>&gt;&gt;&gt;
    > split_product(&quot;CAPSICUM RED&quot;)<br>
    >
    > Traceback (most recent call last):<br>  File &quot;&lt;stdin&gt;&quot;,
    > line 1, in &lt;module&gt;<br>  File &quot;&lt;stdin&gt;&quot;, line 3,
    > in split_product<br>StopIteration<br><br>I'm not meaning to pick on
    > you; some of the other solutions in this thread also fail in that
    > case.<br>
    >
    > <br>&gt;&gt;&gt; re.findall(r&quot;(?m)^([A-Z\s]+) (.+)$&quot;,
    > &quot;CAPSICUM RED&quot;)<br>[('CAPSICUM',
    > 'RED')]<br><br>&gt;&gt;&gt; prod_desc(&quot;CAPSICUM RED&quot;) 
    > # the second version from Neil's post<br>
    >
    > Traceback (most recent call last):<br>  File &quot;&lt;stdin&gt;&quot;,
    > line 1, in &lt;module&gt;<br>  File &quot;&lt;stdin&gt;&quot;, line 14,
    > in prod_desc<br>IndexError: string index out of range<br><br>



    --
    Steven
     
    Steven D'Aprano, Dec 4, 2012
    #16
  17. Nick Mellor

    Terry Reedy Guest

    On 12/4/2012 3:44 PM, Terry Reedy wrote:

    > If the original string has no excess whitespace, description is what
    > remains of s after product prefix is omitted. (Py 3 code)
    >
    > from itertools import takewhile
    > def allcaps(word): return word == word.upper()
    >
    > def split_product_itertools(s):
    >     product = ' '.join(takewhile(allcaps, s.split()))
    >     return product, s[len(product)+1:]
    >
    > print(split_product_itertools("CAPSICUM RED fresh from QLD"))
    > >>>
    > ('CAPSICUM RED', 'fresh from QLD')
    >
    > Without that assumption, the same idea applies to the split list.
    >
    > def split_product_itertools(s):
    >     words = s.split()
    >     product = list(takewhile(allcaps, words))
    >     return ' '.join(product), ' '.join(words[len(product):])


    Because these slice rather than index, either works trivially on an
    empty description.

    print(split_product_itertools("CAPSICUM RED"))
    >>>

    ('CAPSICUM RED', '')
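
A minimal sketch of the slice-versus-index point: a slice that starts at or past the end of a sequence gives back an empty result, whereas indexing the same position raises `IndexError`. The variable names are only for illustration.

```python
s = "CAPSICUM RED"
words = s.split()
product = words[:2]          # every word is all caps, nothing is left over

# Slicing past the end is fine and yields an empty list / empty string.
print(words[len(product):])  # []
print(repr(s[len(s) + 1:]))  # ''

# Indexing the same position is what makes the loop-based versions fall over.
try:
    words[len(product)]
    raised = False
except IndexError:
    raised = True
print(raised)                # True
```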



    --
    Terry Jan Reedy
     
    Terry Reedy, Dec 4, 2012
    #17
  18. Nick Mellor

    Nick Mellor Guest

    Hi Terry,

    For my money, and especially in your versions, despite several expert solutions using other features, itertools has it. It seems to me to need less nutting out than the other approaches. It's short, robust, has a minimum of symbols, uses simple expressions and is not overly clever. If we could just get used to using takewhile.

    takewhile mines for gold at the start of a sequence, dropwhile drops the dross at the start of a sequence.
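
One caveat, sketched here as an aside: the functions in this thread pass a list, so takewhile and dropwhile each start from the first word. If the words came from a one-shot iterator instead (an assumption made only for this example), takewhile would consume the first word that fails the test and that word would be lost to whatever reads the iterator next.

```python
from itertools import takewhile

def allcaps(word):
    return word == word.upper()

# A one-shot iterator rather than a list.
words = iter("CAPSICUM RED fresh from QLD".split())

product = list(takewhile(allcaps, words))
# takewhile had to pull "fresh" off the iterator to test it, then discarded it.
rest = list(words)

print(product)  # ['CAPSICUM', 'RED']
print(rest)     # ['from', 'QLD']
```

Calling `s.split()` once and reusing the resulting list, as the versions in this thread do, avoids the problem.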

    Thanks all for your interest and your help,

    Best,

    Nick

    Terry's implementations:

    > from itertools import takewhile
    >
    > def allcaps(word): return word == word.upper()
    >
    > def split_product_itertools(s):
    >     product = ' '.join(takewhile(allcaps, s.split()))
    >     return product, s[len(product)+1:]
    >
    > print(split_product_itertools("CAPSICUM RED fresh from QLD"))
    > >>>
    > ('CAPSICUM RED', 'fresh from QLD')
    >
    > [if there could be surplus whitespace], the same idea applies to the split list.
    >
    > def split_product_itertools(s):
    >     words = s.split()
    >     product = list(takewhile(allcaps, words))
    >     return ' '.join(product), ' '.join(words[len(product):])
     
    Nick Mellor, Dec 5, 2012
    #18
  20. Nick Mellor

    Neil Cerutti Guest

    On 2012-12-05, Nick Mellor <> wrote:
    > Hi Terry,
    >
    > For my money, and especially in your versions, despite several
    > expert solutions using other features, itertools has it. It
    > seems to me to need less nutting out than the other approaches.
    > It's short, robust, has a minimum of symbols, uses simple
    > expressions and is not overly clever. If we could just get used
    > to using takewhile.


    The main reason most of the solutions posted failed is the lack of a
    complete specification to work with, while simultaneously trying
    to make as tiny and simplistic a solution as possible.

    I'm struggling with the empty description bug right now. ;)

    --
    Neil Cerutti
     
    Neil Cerutti, Dec 5, 2012
    #20
