better way for ' '.join(args) + '\n'?

Discussion in 'Python' started by Ulrich Eckhardt, Oct 26, 2012.

  1. Hi!

    General advice when assembling strings is to not concatenate them
    repeatedly but instead use string's join() function, because it avoids
    repeated reallocations and is at least as expressive as any alternative.

    What I have now is a case where I'm assembling lines of text for driving
    a program with a commandline interface. In this scenario, I'm currently
    doing this:

    args = ['foo', 'bar', 'baz']
    line = ' '.join(args) + '\n'

    So, in other words, I'm avoiding all the unnecessary copying, just to
    make another copy to append the final newline.

    The only way around this that I found involves creating an intermediate
    sequence like ['foo', ' ', 'bar', ' ', 'baz', '\n']. This can be done
    rather cleanly with a generator:

    def helper(s):
        for i in s[:-1]:
            yield i
            yield ' '
        yield s[-1]
        yield '\n'
    line = ''.join(helper(args))

    Efficiency-wise, this is satisfactory. However, readability counts, and
    that is where this version fails, which is why I'm writing this
    message. So, dear fellow Pythonistas, any ideas for improving the
    original version's efficiency while preserving its expressiveness?

    Oh, for all those that are tempted to tell me that this is not my
    bottleneck unless it's called in a very tight loop, you're right.
    Indeed, the overhead of the TCP communication channel between the two
    programs by far dwarfs the few microseconds I could save here. I'm
    still interested in learning new and better solutions though.


    Cheers!

    Uli
     
    Ulrich Eckhardt, Oct 26, 2012
    #1

  2. Ulrich Eckhardt wrote:

    > Hi!
    >
    > General advice when assembling strings is to not concatenate them
    > repeatedly but instead use string's join() function, because it avoids
    > repeated reallocations and is at least as expressive as any alternative.
    >
    > What I have now is a case where I'm assembling lines of text for driving
    > a program with a commandline interface. In this scenario, I'm currently
    > doing this:
    >
    > args = ['foo', 'bar', 'baz']
    > line = ' '.join(args) + '\n'
    >
    > So, in other words, I'm avoiding all the unnecessary copying, just to
    > make another copy to append the final newline.
    >
    > The only way around this that I found involves creating an intermediate
    > sequence like ['foo', ' ', 'bar', ' ', 'baz', '\n']. This can be done
    > rather cleanly with a generator:
    >
    > def helper(s):
    >     for i in s[:-1]:
    >         yield i
    >         yield ' '
    >     yield s[-1]
    >     yield '\n'
    > line = ''.join(helper(args))
    >
    > Efficiency-wise, this is satisfactory.


    No, it is not. In a quick timeit test it takes 5 to 10 times as long as the
    original. Remember that function calls are costly, and that with s[:-1] you
    are trading the extra string for an extra list. Also, you are doubling the
    loop implicit in str.join() with the explicit one in your oh-so-efficient
    generator.

    > However, readability counts, and
    > that is where this version fails, which is why I'm writing this
    > message. So, dear fellow Pythonistas, any ideas for improving the
    > original version's efficiency while preserving its expressiveness?
    >
    > Oh, for all those that are tempted to tell me that this is not my
    > bottleneck unless it's called in a very tight loop, you're right.
    > Indeed, the overhead of the TCP communication channel between the two
    > programs by far dwarfs the few microseconds I could save here. I'm
    > still interested in learning new and better solutions though.


    Even if it were the bottleneck the helper generator approach would still be
    unhelpful.
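
For anyone who wants to repeat the measurement, here is a minimal sketch of such a timeit comparison. The function names with_concat and with_generator are made up for this illustration, and the numbers will vary by machine and Python version, so none are claimed here.

```python
import timeit


def helper(s):
    """The generator from the question: items, separating spaces, final newline."""
    for i in s[:-1]:
        yield i
        yield ' '
    yield s[-1]
    yield '\n'


args = ['foo', 'bar', 'baz']


def with_concat():
    # Original form: one join, then one extra concatenation.
    return ' '.join(args) + '\n'


def with_generator():
    # Generator form: a single join over the interleaved sequence.
    return ''.join(helper(args))


# Timing only means something if both variants build the same line.
assert with_concat() == with_generator()

for fn in (with_concat, with_generator):
    seconds = timeit.timeit(fn, number=100_000)
    print(f"{fn.__name__}: {seconds:.3f} s per 100,000 calls")
```

The generator variant pays for a list slice, a generator frame, and one resumption per yielded piece, which is where the extra time is likely to go.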
     
    Peter Otten, Oct 26, 2012
    #2

  3. On Fri, 26 Oct 2012 09:49:50 +0200, Ulrich Eckhardt wrote:

    > Hi!
    >
    > General advice when assembling strings is to not concatenate them
    > repeatedly but instead use string's join() function, because it avoids
    > repeated reallocations and is at least as expressive as any alternative.
    >
    > What I have now is a case where I'm assembling lines of text for driving
    > a program with a commandline interface. In this scenario, I'm currently
    > doing this:
    >
    > args = ['foo', 'bar', 'baz']
    > line = ' '.join(args) + '\n'
    >
    > So, in other words, I'm avoiding all the unnecessary copying, just to
    > make another copy to append the final newline.


    *shrug*

    The difference between ' '.join(sequence) and (' '.join(sequence) + '\n')
    is, in Big Oh analysis, insignificant. The first case does O(N)
    operations, the second does O(N) + O(N) = 2*O(N) operations, which is
    still O(N). In effect, the two differ only by an approximately constant
    factor.

    If you really care, and you don't mind ending your last line with a
    space, just append '\n' to the sequence before calling join.
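
To make the trailing-space caveat concrete, a small sketch comparing the two forms (the variable names are only illustrative):

```python
args = ['foo', 'bar', 'baz']

# Newline joined as one more element: the separator lands before it.
appended = ' '.join(args + ['\n'])

# Original form: newline concatenated after the join.
concatenated = ' '.join(args) + '\n'

print(repr(appended))      # 'foo bar baz \n'
print(repr(concatenated))  # 'foo bar baz\n'
```

Whether the extra space matters depends on how the receiving program parses its input lines.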


    > The only way around this that I found involves creating an intermediate
    > sequence like ['foo', ' ', 'bar', ' ', 'baz', '\n']. This can be done
    > rather cleanly with a generator:
    >
    > def helper(s):
    >     for i in s[:-1]:
    >         yield i
    >         yield ' '
    >     yield s[-1]
    >     yield '\n'
    > line = ''.join(helper(args))
    >
    > Efficiency-wise, this is satisfactory.


    Have you actually tested this? I would not be the least surprised if
    that's actually less efficient than the (' '.join(seq) + '\n') version.


    --
    Steven
     
    Steven D'Aprano, Oct 26, 2012
    #3
  4. Hi Ulrich,

    is this acceptable?

    args = ['foo', 'bar', 'baz']
    args.append('\n')
    line = ' '.join(args)

    Cheers,
    Hubert

    On 10/26/2012 09:49 AM, Ulrich Eckhardt wrote:
    > Hi!
    >
    > General advice when assembling strings is to not concatenate them
    > repeatedly but instead use string's join() function, because it avoids
    > repeated reallocations and is at least as expressive as any alternative.
    >
    > What I have now is a case where I'm assembling lines of text for driving
    > a program with a commandline interface. In this scenario, I'm currently
    > doing this:
    >
    > args = ['foo', 'bar', 'baz']
    > line = ' '.join(args) + '\n'
    >
    > So, in other words, I'm avoiding all the unnecessary copying, just to
    > make another copy to append the final newline.
    >
    > The only way around this that I found involves creating an intermediate
    > sequence like ['foo', ' ', 'bar', ' ', 'baz', '\n']. This can be done
    > rather cleanly with a generator:
    >
    > def helper(s):
    >     for i in s[:-1]:
    >         yield i
    >         yield ' '
    >     yield s[-1]
    >     yield '\n'
    > line = ''.join(helper(args))
    >
    > Efficiency-wise, this is satisfactory. However, readability counts, and
    > that is where this version fails, which is why I'm writing this
    > message. So, dear fellow Pythonistas, any ideas for improving the
    > original version's efficiency while preserving its expressiveness?
    >
    > Oh, for all those that are tempted to tell me that this is not my
    > bottleneck unless it's called in a very tight loop, you're right.
    > Indeed, the overhead of the TCP communication channel between the two
    > programs by far dwarfs the few microseconds I could save here. I'm
    > still interested in learning new and better solutions though.
    >
    >
    > Cheers!
    >
    > Uli
    >
     
    Hubert Grünheidt, Oct 26, 2012
    #4
  5. On Fri, Oct 26, 2012 at 09:49:50AM +0200, Ulrich Eckhardt wrote:
    > Hi!
    >
    > General advice when assembling strings is to not concatenate them
    > repeatedly but instead use string's join() function, because it
    > avoids repeated reallocations and is at least as expressive as any
    > alternative.
    >
    > What I have now is a case where I'm assembling lines of text for
    > driving a program with a commandline interface. In this scenario,
    > I'm currently doing this:
    >
    > args = ['foo', 'bar', 'baz']
    > line = ' '.join(args) + '\n'


    Assuming it's the length of the list that's the problem, not the
    length of the strings in the list...

    args = ['foo', 'bar', 'baz']
    args[-1] = args[-1] + '\n'
    line = ' '.join(args)

    \t
     
    Tycho Andersen, Oct 26, 2012
    #5
  6. On 10/26/2012 05:26 PM, Tycho Andersen wrote:
    > On Fri, Oct 26, 2012 at 09:49:50AM +0200, Ulrich Eckhardt wrote:
    >> Hi!
    >>
    >> General advice when assembling strings is to not concatenate them
    >> repeatedly but instead use string's join() function, because it
    >> avoids repeated reallocations and is at least as expressive as any
    >> alternative.
    >>
    >> What I have now is a case where I'm assembling lines of text for
    >> driving a program with a commandline interface. In this scenario,
    >> I'm currently doing this:
    >>
    >> args = ['foo', 'bar', 'baz']
    >> line = ' '.join(args) + '\n'

    > Assuming it's the length of the list that's the problem, not the
    > length of the strings in the list...
    >
    > args = ['foo', 'bar', 'baz']
    > args[-1] = args[-1] + '\n'
    > line = ' '.join(args)
    >
    > \t


    Main problem with that is the trailing space before the newline. If
    that's not a problem, then fine.

    Not sure why we try so hard to optimize something that's going to take
    negligible time.

    --

    DaveA
     
    Dave Angel, Oct 26, 2012
    #6
  7. On Fri, Oct 26, 2012 at 05:36:50PM -0400, Dave Angel wrote:
    > On 10/26/2012 05:26 PM, Tycho Andersen wrote:
    > > Assuming it's the length of the list that's the problem, not the
    > > length of the strings in the list...
    > >
    > > args = ['foo', 'bar', 'baz']
    > > args[-1] = args[-1] + '\n'
    > > line = ' '.join(args)
    > >
    > > \t

    >
    > Main problem with that is the trailing space before the newline. If
    > that's not a problem, then fine.


    What trailing space before the newline? The other solutions have it,
    the above does not. However, the above does mutate args, which isn't
    all that great. Alas, if you want the performance of mutable
    structures, you're probably going to have to mutate something. (In any
    case, it's easy enough to change it back, though ugly.)
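
A rough sketch of what "changing it back" could look like, next to a non-mutating variant for comparison. The names line_mutating and line_copying are invented for this example, and neither variant is claimed to be faster than the plain concatenation.

```python
args = ['foo', 'bar', 'baz']

# Variant 1: mutate the last element, join, then restore it.
last = args[-1]
args[-1] = last + '\n'
try:
    line_mutating = ' '.join(args)
finally:
    args[-1] = last  # put the original element back, even on error

# Variant 2: leave args alone and pay for a temporary list instead.
line_copying = ' '.join(args[:-1] + [args[-1] + '\n'])

print(repr(line_mutating))  # 'foo bar baz\n'
print(repr(line_copying))   # 'foo bar baz\n'
```

Both variants assume args is non-empty. Note that each still copies the last string to attach the newline, so the saving only applies when the list is long and its final element is short.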

    > Not sure why we try so hard to optimize something that's going to take
    > negligible time.


    The same reason some people enjoy sporting events: it's fun :)

    \t
     
    Tycho Andersen, Oct 26, 2012
    #7
  8. On 26.10.2012 09:49, Ulrich Eckhardt wrote:
    > Hi!
    >
    > General advice when assembling strings is to not concatenate them
    > repeatedly but instead use string's join() function, because it avoids
    > repeated reallocations and is at least as expressive as any alternative.
    >
    > What I have now is a case where I'm assembling lines of text for driving
    > a program with a commandline interface.


    Stop.

    In this case, you are overcomplicating things.

    Just do

    subprocess.Popen(['prog', 'foo', 'bar', 'baz'])

    which is the safest approach for this use case.

    If that is not possible for some reason, be aware of the traps you
    could fall into: for example, if you want to feed your string to a
    Bourne shell, you must escape the strings properly.

    In such cases, I use


    def shellquote(*strs):
        r"""Input: file names, output: ''-enclosed strings where every ' is
        replaced with '\''. Intended for usage with the shell."""
        # Copy every character through unchanged, except ':
        # replace each ' with '\''
        # A literal ''' is written as ''\'''\'''\'''. Ugly, but works.
        return " ".join([
            "'"+st.replace("'","'\\''")+"'"
            for st in strs
        ])


    so I can use

    shellquote('program name', 'argu"ment 1', '$arg 2',
               "even args containing a ' are ok")

    For Windows, you'll have to modify this somehow.
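
To illustrate why the list form needs no quoting at all, here is a small sketch. The child process is a hypothetical stand-in (a Python one-liner that prints its arguments), chosen only so the example runs anywhere; it assumes Python 3.7 or later for capture_output.

```python
import subprocess
import sys

# Hypothetical child program: echoes the arguments it received.
child = [sys.executable, '-c', 'import sys; print(sys.argv[1:])']

# Arguments with spaces, quotes and shell metacharacters.
args = ['foo', 'bar baz', "it's $HOME"]

# No shell is involved, so each list element arrives as exactly one argument.
result = subprocess.run(child + args, capture_output=True, text=True, check=True)
print(result.stdout)
```

This only helps when your program starts the other one itself. If the line travels over an existing channel to a program that is already running, the receiving side's own parsing rules decide what quoting is needed.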


    HTH,

    Thomas
     
    Thomas Rachel, Oct 27, 2012
    #8
  9. Thomas Rachel wrote:

    > On 26.10.2012 09:49, Ulrich Eckhardt wrote:
    > > Hi!
    > >
    > > General advice when assembling strings is to not concatenate them
    > > repeatedly but instead use string's join() function, because it avoids
    > > repeated reallocations and is at least as expressive as any alternative.
    > >
    > > What I have now is a case where I'm assembling lines of text for driving
    > > a program with a commandline interface.
    >
    > Stop.
    >
    > In this case, you are overcomplicating things.
    >
    > Just do
    >
    > subprocess.Popen(['prog', 'foo', 'bar', 'baz'])
    >
    > which is the safest approach for this use case.
    >
    > If that is not possible for some reason, be aware of the traps you
    > could fall into: for example, if you want to feed your string to a
    > Bourne shell, you must escape the strings properly.
    >
    > In such cases, I use
    >
    >
    > def shellquote(*strs):
    >     r"""Input: file names, output: ''-enclosed strings where every ' is
    >     replaced with '\''. Intended for usage with the shell."""
    >     # Copy every character through unchanged, except ':
    >     # replace each ' with '\''
    >     # A literal ''' is written as ''\'''\'''\'''. Ugly, but works.
    >     return " ".join([
    >         "'"+st.replace("'","'\\''")+"'"
    >         for st in strs
    >     ])
    >
    >
    > so I can use
    >
    > shellquote('program name', 'argu"ment 1', '$arg 2',
    >            "even args containing a ' are ok")
    >
    > For Windows, you'll have to modify this somehow.
    >


    The subprocess module suggests using pipes.quote for escaping.


    >>> a
    ('program name', 'argu"ment 1', '$arg 2', "even args containing a ' are ok")
    >>> import pipes
    >>> map(pipes.quote, a)
    ["'program name'", '\'argu"ment 1\'', "'$arg 2'", '\'even args containing a \'"\'"\' are ok\'']
    >>> shellquote(*a)
    '\'program name\' \'argu"ment 1\' \'$arg 2\' \'even args containing a \'\\\'\' are ok\''
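
On Python 3 the same quoting function is available as shlex.quote (the pipes module has since been removed from the standard library). A short sketch, reusing the argument tuple from the session above; the name command_line is just for illustration:

```python
import shlex

args = ('program name', 'argu"ment 1', '$arg 2',
        "even args containing a ' are ok")

# Quote each argument for a POSIX shell and join them into one line.
command_line = ' '.join(shlex.quote(a) for a in args)
print(command_line)

# Round trip: splitting with POSIX rules recovers the original arguments.
assert shlex.split(command_line) == list(args)
```

Like shellquote above, this targets POSIX shells only and does not cover Windows command-line quoting.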


    Ramit Prasad


     
    Prasad, Ramit, Oct 29, 2012
    #9
  10. On Saturday, 27 October 2012 03:12:31 UTC+5:30, Tycho Andersen wrote:
    > On Fri, Oct 26, 2012 at 05:36:50PM -0400, Dave Angel wrote:
    > > On 10/26/2012 05:26 PM, Tycho Andersen wrote:
    > > > Assuming it's the length of the list that's the problem, not the
    > > > length of the strings in the list...
    > > >
    > > > args = ['foo', 'bar', 'baz']
    > > > args[-1] = args[-1] + '\n'
    > > > line = ' '.join(args)
    > > >
    > > > \t
    > >
    > > Main problem with that is the trailing space before the newline. If
    > > that's not a problem, then fine.
    >
    > What trailing space before the newline? The other solutions have it,
    > the above does not. However, the above does mutate args, which isn't
    > all that great. Alas, if you want the performance of mutable
    > structures, you're probably going to have to mutate something. (In any
    > case, it's easy enough to change it back, though ugly.)
    >
    > > Not sure why we try so hard to optimize something that's going to take
    > > negligible time.
    >
    > The same reason some people enjoy sporting events: it's fun :)

    Me too
     
    Ramchandra Apte, Nov 3, 2012
    #10
