Re: itertools.groupby

Discussion in 'Python' started by Peter Otten, Apr 21, 2013.

    Peter Otten

    Jason Friedman wrote:

    > I have a file such as:
    >
    > $ cat my_data
    > Starting a new group
    > a
    > b
    > c
    > Starting a new group
    > 1
    > 2
    > 3
    > 4
    > Starting a new group
    > X
    > Y
    > Z
    > Starting a new group
    >
    > I am wanting a list of lists:
    > ['a', 'b', 'c']
    > ['1', '2', '3', '4']
    > ['X', 'Y', 'Z']
    > []
    >
    > I wrote this:
    > ------------------------------------
    > #!/usr/bin/python3
    > from itertools import groupby
    >
    > def get_lines_from_file(file_name):
    >     with open(file_name) as reader:
    >         for line in reader.readlines():


    readlines() slurps the whole file into memory! Don't do that; iterate over
    the file directly instead:

    for line in reader:
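    A minimal sketch of the reader generator with that change applied. The temporary file is only a stand-in for my_data (an assumption, so the snippet runs on its own); iterating over the file object hands out one line at a time instead of building a list of every line first.

```python
import os
import tempfile

def get_lines_from_file(file_name):
    """Yield stripped lines lazily; the file object is its own line iterator."""
    with open(file_name) as reader:
        for line in reader:  # no readlines(): the file is never read into one big list
            yield line.strip()

# Hypothetical sample file, written only so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Starting a new group\na\nb\nc\n")
    path = tmp.name

lines = list(get_lines_from_file(path))
os.remove(path)
print(lines)  # ['Starting a new group', 'a', 'b', 'c']
```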

    >             yield(line.strip())
    >
    > counter = 0
    > def key_func(x):
    >     if x.startswith("Starting a new group"):
    >         global counter
    >         counter += 1
    >     return counter
    >
    > for key, group in groupby(get_lines_from_file("my_data"), key_func):
    >     print(list(group)[1:])
    > ------------------------------------
    >
    > I get the output I desire, but I'm wondering if there is a solution
    > without the global counter.


    If you are willing to drop the empty groups, you can simplify it to:

    import itertools

    def is_header(line):
        return line.startswith("Starting a new group")

    with open("my_data") as lines:
        stripped_lines = (line.strip() for line in lines)
        for header, group in itertools.groupby(stripped_lines, key=is_header):
            if not header:
                print(list(group))
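    To see what that version produces for the sample data, here is a sketch that feeds the same lines from an io.StringIO object (a stand-in for the real file) and collects the groups instead of printing them. The helper name non_header_groups is made up for illustration.

```python
import io
import itertools

SAMPLE = (
    "Starting a new group\na\nb\nc\n"
    "Starting a new group\n1\n2\n3\n4\n"
    "Starting a new group\nX\nY\nZ\n"
    "Starting a new group\n"
)

def is_header(line):
    return line.startswith("Starting a new group")

def non_header_groups(lines):
    """Hypothetical helper: return runs of non-header lines, headers dropped."""
    stripped_lines = (line.strip() for line in lines)
    result = []
    for header, group in itertools.groupby(stripped_lines, key=is_header):
        if not header:  # header runs are skipped entirely
            result.append(list(group))
    return result

groups = non_header_groups(io.StringIO(SAMPLE))
print(groups)  # [['a', 'b', 'c'], ['1', '2', '3', '4'], ['X', 'Y', 'Z']]
```

    The trailing [] from the original wish list is gone: the final header is simply a header run with nothing after it. Two consecutive header lines also fall into the same True run, so they never yield an empty group either.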

    And here's a refactoring of your initial code. The main point is the use of
    a nonlocal variable instead of global state, which makes the function
    reentrant: every call to split_groups() gets its own odd flag.

    import itertools

    def split_groups(items, header):
        odd = True
        def group_key(item):
            nonlocal odd
            if header(item):
                odd = not odd  # each header flips the key, starting a new group
            return odd

        for _key, group in itertools.groupby(items, key=group_key):
            yield itertools.islice(group, 1, None)  # skip the header line

    def is_header(line):
        return line.startswith("Starting a new group")

    with open("my_data") as lines:
        stripped_lines = map(str.strip, lines)
        for group in split_groups(stripped_lines, header=is_header):
            print(list(group))
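    A quick check of split_groups against the original sample, again reading from an io.StringIO stand-in instead of my_data (an assumption for self-containment). Each islice group is turned into a list before groupby moves on, which matters because a groupby group is no longer valid once the underlying iterator advances.

```python
import io
import itertools

def split_groups(items, header):
    odd = True
    def group_key(item):
        nonlocal odd
        if header(item):
            odd = not odd  # each header flips the key, starting a new group
        return odd
    for _key, group in itertools.groupby(items, key=group_key):
        yield itertools.islice(group, 1, None)  # skip the header line

def is_header(line):
    return line.startswith("Starting a new group")

SAMPLE = (
    "Starting a new group\na\nb\nc\n"
    "Starting a new group\n1\n2\n3\n4\n"
    "Starting a new group\nX\nY\nZ\n"
    "Starting a new group\n"
)

# list(g) runs for each group before the comprehension asks for the next one.
groups = [list(g) for g in split_groups(map(str.strip, io.StringIO(SAMPLE)), header=is_header)]
print(groups)  # [['a', 'b', 'c'], ['1', '2', '3', '4'], ['X', 'Y', 'Z'], []]
```

    Unlike the simplified version, the trailing empty group survives, matching the list of lists asked for in the original post.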

    One remaining problem with that code is that it will silently drop the first
    line of the file if it doesn't start with a header:

    $ cat my_data
    alpha
    beta
    gamma
    Starting a new group
    a
    b
    c
    Starting a new group
    Starting a new group
    1
    2
    3
    4
    Starting a new group
    X
    Y
    Z
    Starting a new group
    $ python3 group.py
    ['beta', 'gamma'] # where's alpha?
    ['a', 'b', 'c']
    []
    ['1', '2', '3', '4']
    ['X', 'Y', 'Z']
    []

    How do you want to handle that case?
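    One possible answer, sketched here as an illustration rather than anything settled in the thread: keep the key function, but strip the first line of a group only when it really is a header. Lines that appear before the first header then come back as a group of their own. The name split_groups_keep_leading is hypothetical.

```python
import io
import itertools

def is_header(line):
    return line.startswith("Starting a new group")

def split_groups_keep_leading(items, header):
    """Hypothetical variant of split_groups: drop the first line of a group
    only if it is a header, so data before the first header is kept."""
    odd = True
    def group_key(item):
        nonlocal odd
        if header(item):
            odd = not odd
        return odd
    for _key, group in itertools.groupby(items, key=group_key):
        first = next(group)  # groupby never produces an empty group
        if header(first):
            yield list(group)      # normal case: discard the header line
        else:
            yield [first, *group]  # leading data before any header: keep it all

SAMPLE = (
    "alpha\nbeta\ngamma\n"
    "Starting a new group\na\nb\nc\n"
    "Starting a new group\nStarting a new group\n1\n2\n3\n4\n"
    "Starting a new group\nX\nY\nZ\n"
    "Starting a new group\n"
)

for group in split_groups_keep_leading(map(str.strip, io.StringIO(SAMPLE)), header=is_header):
    print(group)
```

    Yielding lists instead of islice objects is a deliberate simplification: the first item has to be inspected anyway, and a list stays valid after groupby advances.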
    Peter Otten, Apr 21, 2013