Problem of function calls from map()

Discussion in 'Python' started by Dasn, Aug 21, 2006.

  1. Dasn

    Dasn Guest

    Hi, there.

    'lines' is a large list of strings each of which is separated by '\t'
    >>> lines = ['bla\tbla\tblah', 'bh\tb\tb', ... ]


    I wanna split each string into a list. For speed, using map() instead
    of 'for' loop. 'map(str.split, lines)' works fine, but...
    when I was trying:

    >>> l = map(str.split('\t'), lines)


    I got "TypeError: 'list' object is not callable".

    To avoid function call overhead, I am not willing to use a lambda function
    either. So how to put '\t' argument to split() in map()?

    Thanks.
    Dasn, Aug 21, 2006
    #1
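Why the second call fails can be shown with a minimal sketch, written for Python 3 rather than the Python 2 of this 2006 thread. The expression str.split('\t') is evaluated before map() ever runs: it calls split with '\t' as the string being split rather than as the separator, which amounts to '\t'.split(), and whitespace-splitting a lone tab yields an empty list. map() is then handed that list instead of a function. The sample lines are illustrative only.

```python
# str.split('\t') is evaluated immediately, before map() runs.
# '\t' becomes the string being split (self), not the separator, so this
# is just '\t'.split(), which whitespace-splits a lone tab into [].
eager = str.split('\t')
print(eager)  # []

lines = ['bla\tbla\tblah', 'bh\tb\tb']

try:
    # map() receives a list instead of a function. Python 3's map() is
    # lazy, so the TypeError shows up once the result is consumed.
    rows = list(map(eager, lines))
except TypeError as exc:
    print(exc)  # 'list' object is not callable
```

The same evaluation order applies in Python 2, where map() was eager and raised the error immediately.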

  2. Tim Lesher

    Tim Lesher Guest

    Dasn wrote:
    > So how to put '\t' argument to split() in map()?


    How much is the lambda costing you, according to your profiler?

    Anyway, what you really want is a list comprehension:

    l = [line.split('\t') for line in lines]
    Tim Lesher, Aug 21, 2006
    #2
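Tim's question about the profiler can be answered with the standard-library cProfile and pstats modules. The following is a minimal Python 3 sketch; the input data and the function names split_with_lambda and split_with_listcomp are hypothetical, and the figures it prints will depend on your machine and Python version.

```python
import cProfile
import pstats

# Hypothetical sample data: 100,000 short tab-separated records.
lines = ['bla\tbla\tblah', 'bh\tb\tb'] * 50000


def split_with_lambda(rows):
    # map() calls the lambda once per row, and the lambda calls split.
    return list(map(lambda line: line.split('\t'), rows))


def split_with_listcomp(rows):
    # The comprehension calls split directly, with no wrapper function.
    return [line.split('\t') for line in rows]


if __name__ == '__main__':
    profiler = cProfile.Profile()
    profiler.enable()
    split_with_lambda(lines)
    split_with_listcomp(lines)
    profiler.disable()
    # Show the functions with the most time spent in their own bodies;
    # compare the ncalls column for <lambda> and split.
    pstats.Stats(profiler).sort_stats('tottime').print_stats(8)
```

In the printed table, ncalls shows how often <lambda> and split were called, and tottime shows where the time actually went.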

  3. Diez B. Roggisch

    Diez B. Roggisch Guest

    Dasn wrote:

    >
    > Hi, there.
    >
    > 'lines' is a large list of strings each of which is separated by '\t'
    >>>> lines = ['bla\tbla\tblah', 'bh\tb\tb', ... ]

    >
    > I wanna split each string into a list. For speed, using map() instead
    > of 'for' loop. 'map(str.split, lines)' works fine, but...
    > when I was trying:
    >
    >>>> l = map(str.split('\t'), lines)

    >
    > I got "TypeError: 'list' object is not callable".
    >
    > To avoid function call overhead, I am not willing to use a lambda function
    > either. So how to put '\t' argument to split() in map()?


    You can't. Use a lambda or a list comprehension.


    map(lambda l: l.split("\t"), lines)

    [l.split("\t") for l in lines]


    Diez
    Diez B. Roggisch, Aug 21, 2006
    #3
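Diez's answer reflects the Python of August 2006. For comparison, here is a minimal Python 3 sketch of his two forms next to operator.methodcaller, which was only added later (in Python 2.6) and builds a callable that invokes a named method with fixed arguments. map() still calls the methodcaller object once per item, and that object then calls split, so this removes the lambda but not the extra call; in CPython the wrapper is simply implemented in C rather than as a Python function.

```python
from operator import methodcaller

lines = ['bla\tbla\tblah', 'bh\tb\tb']

# Diez's two forms. In Python 3, map() returns a lazy iterator, so list()
# is needed to get the list that Python 2's map() returned directly.
via_lambda = list(map(lambda line: line.split("\t"), lines))
via_listcomp = [line.split("\t") for line in lines]

# methodcaller('split', '\t') builds a callable f such that f(obj)
# returns obj.split('\t'); no lambda, but still a wrapper call per item.
via_methodcaller = list(map(methodcaller('split', '\t'), lines))

print(via_lambda == via_listcomp == via_methodcaller)  # True
print(via_listcomp)  # [['bla', 'bla', 'blah'], ['bh', 'b', 'b']]
```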
  4. Paul McGuire

    Paul McGuire Guest

    "Dasn" <> wrote in message
    news:...
    >
    > Hi, there.
    >
    > 'lines' is a large list of strings each of which is separated by '\t'
    > >>> lines = ['bla\tbla\tblah', 'bh\tb\tb', ... ]

    >
    > I wanna split each string into a list. For speed, using map() instead
    > of 'for' loop.


    Try this. Not sure how it stacks up for speed, though. (As others have
    suggested, if 'for' loop is giving you speed heartburn, use a list
    comprehension.)

    In this case, splitUsing is called only once, to create the embedded
    function tmp. tmp is the function that split will call once per list item,
    using whatever characters were specified in the call to splitUsing.

    -- Paul



    data = [
        "sldjflsdfj\tlsjdlj\tlkjsdlkfj",
        "lsdjflsjd\tlsjdlfdj\tlskjdflkj",
        "lskdjfl\tlskdjflj\tlskdlfkjsd",
    ]

    # splitUsing is a factory: it returns a new function, tmp, that
    # remembers the separator passed in as chars.
    def splitUsing(chars):
        def tmp(s):
            return s.split(chars)
        return tmp

    for d in map(splitUsing('\t'), data):
        print(d)
    Paul McGuire, Aug 21, 2006
    #4
  5. Paul McGuire

    Paul McGuire Guest

    >>tmp is the function that split will call once per list item

    should be

    tmp is the function that *map* will call once per list item

    -- Paul
    Paul McGuire, Aug 21, 2006
    #5
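Paul's correction can be checked with a minimal Python 3 sketch that instruments his splitUsing factory with call counters (the calls dictionary is a hypothetical addition for illustration only): the factory runs once, and map() calls the returned tmp once per list item.

```python
# Hypothetical call counters, added only to make the call pattern visible.
calls = {'factory': 0, 'tmp': 0}


def splitUsing(chars):
    calls['factory'] += 1
    def tmp(s):
        calls['tmp'] += 1
        return s.split(chars)  # chars is captured from the enclosing call
    return tmp


data = ["a\tb\tc", "d\te\tf", "g\th\ti"]
rows = list(map(splitUsing('\t'), data))

print(calls)  # {'factory': 1, 'tmp': 3}
print(rows)   # [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
```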
  6. Georg Brandl

    Georg Brandl Guest

    Paul McGuire wrote:
    > "Dasn" <> wrote in message
    > news:...
    >>
    >> Hi, there.
    >>
    >> 'lines' is a large list of strings each of which is separated by '\t'
    >> >>> lines = ['bla\tbla\tblah', 'bh\tb\tb', ... ]

    >>
    >> I wanna split each string into a list. For speed, using map() instead
    >> of 'for' loop.

    >
    > Try this. Not sure how it stacks up for speed, though. (As others have
    > suggested, if 'for' loop is giving you speed heartburn, use a list
    > comprehension.)
    >
    > In this case, splitUsing is called only once, to create the embedded
    > function tmp. tmp is the function that split will call once per list item,
    > using whatever characters were specified in the call to splitUsing.
    >
    > -- Paul
    >
    >
    >
    > data = [
    >     "sldjflsdfj\tlsjdlj\tlkjsdlkfj",
    >     "lsdjflsjd\tlsjdlfdj\tlskjdflkj",
    >     "lskdjfl\tlskdjflj\tlskdlfkjsd",
    > ]
    >
    > def splitUsing(chars):
    >     def tmp(s):
    >         return s.split(chars)
    >     return tmp
    >
    > for d in map(splitUsing('\t'), data):
    >     print(d)


    And why is this better than

    map(lambda t: t.split('\t'), data)

    ?

    Georg
    Georg Brandl, Aug 22, 2006
    #6
  7. Paul McGuire

    Paul McGuire Guest

    "Georg Brandl" <> wrote in message
    news:ecemdl$qd5$...
    > Paul McGuire wrote:
    > > "Dasn" <> wrote in message
    > > news:...
    > >>
    > >> Hi, there.
    > >>
    > >> 'lines' is a large list of strings each of which is separated by '\t'
    > >> >>> lines = ['bla\tbla\tblah', 'bh\tb\tb', ... ]
    > >>
    > >> I wanna split each string into a list. For speed, using map() instead
    > >> of 'for' loop.

    > >

    <snip>
    > >
    > > def splitUsing(chars):
    > >     def tmp(s):
    > >         return s.split(chars)
    > >     return tmp
    > >
    > > for d in map(splitUsing('\t'), data):
    > >     print(d)

    >
    > And why is this better than
    >
    > map(lambda t: t.split('\t'), data)
    >
    > ?
    >
    > Georg


    Hmm, "better" is a funny word. My posting was definitely more verbose, but
    verbosity isn't always bad.

    In defense of brevity:
    - often (but not always) runs faster
    - usually easier to understand as a single gestalt (i.e., you don't have to
    jump around in the code, or grasp the intent of a dozen or more lines, when
    one or a few lines do all the work), but this can be overdone

    In defense of verbosity:
    - usually more explicit, as individual bits of logic are exposed as separate
    functions or statements, and anonymous functions can be given more
    descriptive names
    - usually easier to understand, especially for language newcomers
    - separate functions can be compiled by psyco

    Of course, such generalizations invite obvious extremes and counterexamples.
    Prime number algorithms compacted into one-liners are anything but quick to
    understand; conversely, I've seen a 40-line database function exploded into
    more than 100 classes (this was in Java, so each was also a separate file!) in
    pursuit of implementing a developer's favorite GoF pattern.

    This idiom (as used in the splitUsing case) of returning a callable from a
    function whose purpose is to be a factory for callables seems to be a common
    one in Python. I think I've seen it go by different names, currying and
    closures being the most common; decorators are another flavor of the same
    idea. Perhaps these idioms ("idia"?) emerged when "lambda" was on Guido's
    Py3K chopping block.
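The same factory idea is available ready-made in the standard library as functools.partial, a partial-application helper whose final release came with Python 2.5, a few weeks after this thread. A minimal Python 3 sketch comparing it with Paul's closure (renamed split_using here); passing sep as a keyword assumes a Python 3 version whose str.split accepts keyword arguments, which current releases do.

```python
from functools import partial


def split_using(chars):
    # Paul's closure factory: returns a function that remembers chars.
    def tmp(s):
        return s.split(chars)
    return tmp


# partial() pre-binds the keyword argument instead of writing an inner
# function; each call becomes str.split(item, sep='\t').
split_on_tab = partial(str.split, sep='\t')

data = ["a\tb", "c\td e"]
print(list(map(split_using('\t'), data)))  # [['a', 'b'], ['c', 'd e']]
print(list(map(split_on_tab, data)))       # [['a', 'b'], ['c', 'd e']]
```

Either way, map() still makes one call to the wrapper per item, so this is about packaging the idiom, not about avoiding the extra call.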

    So I wouldn't really hold these two routines up for "betterness" - the OP's
    performance test shows them to be about the same. To summarize his
    performance results (times in CPU secs):
    - explicit "for" loop - 20.510 (309130 total function calls; 154563 to
    split and 154563 to append)
    - list comprehension - 12.240 (154567 total function calls; 154563 to split)
    - map+lambda - 20.480 (309130 total function calls; 154563 to <lambda> and
    154563 to split)
    - map+splitUsing - 21.900 (309130 total function calls; 154563 to tmp and
    154563 to split)

    The big winner here is the list comprehension, and it would seem it outdoes
    the others by halving the number of function calls. Unfortunately, most of
    our currying/closure/decorator idioms are implemented using some sort of
    "function-calls-an-embedded-function" form, and function calls are poison to
    performance in Python (and other languages, too, but perhaps easier to
    observe in Python). Even the anonymous lambda implementation has this same
    issue.
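The four variants Paul summarizes can be re-timed with the standard-library timeit module. This is a minimal Python 3 sketch: the input is made up because the OP's data is not in the thread, the function names are hypothetical, and absolute times (and possibly the ranking) will differ from the 2006 figures. The comments note the per-line call pattern each variant implies, which is the point Paul draws from the call counts.

```python
import timeit

# Hypothetical input: two tab-separated records repeated to 50,000 lines.
lines = ['bla\tbla\tblah', 'bh\tb\tb'] * 25000


def explicit_loop():
    result = []
    for line in lines:
        result.append(line.split('\t'))  # two calls per line: split, append
    return result


def list_comprehension():
    return [line.split('\t') for line in lines]  # one call per line: split


def map_lambda():
    # map() calls the lambda, which calls split: two calls per line.
    return list(map(lambda line: line.split('\t'), lines))


def split_using(chars):
    def tmp(s):
        return s.split(chars)
    return tmp


def map_closure():
    # map() calls tmp, which calls split: two calls per line.
    return list(map(split_using('\t'), lines))


if __name__ == '__main__':
    variants = (explicit_loop, list_comprehension, map_lambda, map_closure)
    for fn in variants:
        # Best of 3 repeats, each running the variant 5 times.
        best = min(timeit.repeat(fn, number=5, repeat=3))
        print(f'{fn.__name__:<20} {best:.3f} s per 5 runs')
```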

    So the interesting point here is to go back to the OP's OP, in which he
    states, "For speed, [I'm] using map() instead of 'for' loop." As it turns
    out, map() isn't much of a win in this case. The real, "best" solution is
    the list comprehension, not only for speed, but also for ease of readability
    and understanding. It's tough to beat this:

    return [s.split('\t') for s in lines]

    for clarity, explicitness, brevity, and, as it happens, also for speed.

    -- Paul
    Paul McGuire, Aug 22, 2006
    #7

Similar Threads
  1. Honne Gowda A
    Replies:
    2
    Views:
    857
    Karl Heinz Buchegger
    Oct 31, 2003
  2. amit kumar
    Replies:
    5
    Views:
    6,122
    velthuijsen
    May 18, 2004
  3. Replies:
    2
    Views:
    892
    Bengt Richter
    Aug 1, 2005
  4. Dasn
    Replies:
    9
    Views:
    280
  5. Bob
    Replies:
    5
    Views:
    247