Splitting a string into substrings of equal size

Discussion in 'Python' started by candide, Aug 15, 2009.

  1. candide

    candide Guest

    Suppose you need to split a string into substrings of a given size (except
    possibly the last substring). I assume the first slice is taken at the end
    of the string.
    A typical example is formatting a decimal string with a thousands
    separator.


    What is the Pythonic way to do this?


    For my part, I came up with this rather complicated code:


    # ----------------------

    def comaSep(z,k=3, sep=','):
        z=z[::-1]
        x=[z[k*i:k*(i+1)][::-1] for i in range(1+(len(z)-1)/k)][::-1]
        return sep.join(x)

    # Test
    for z in ["75096042068045", "509", "12024", "7", "2009"]:
        print z+" --> ", comaSep(z)

    # ----------------------

    outputting :

    75096042068045 --> 75,096,042,068,045
    509 --> 509
    12024 --> 12,024
    7 --> 7
    2009 --> 2,009


    Thanks
     
    candide, Aug 15, 2009
    #1

  2. On Fri, 14 Aug 2009 21:22:57 -0300, candide <> wrote:

    > Suppose you need to split a string into substrings of a given size
    > (except possibly the last substring). I make the hypothesis the first
    > slice is at the end of the string.
    > A typical example is provided by formatting a decimal string with
    > thousands separator.
    >
    >
    > What is the pythonic way to do this ?


    py> import locale
    py> locale.setlocale(locale.LC_ALL, '')
    'Spanish_Argentina.1252'
    py> locale.format("%d", 75096042068045, True)
    '75.096.042.068.045'

    :)
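
For readers on newer interpreters, here is a minimal sketch of the same idea that leaves the process locale alone: the "," format specifier (available since Python 2.7 and 3.1) groups digits in threes. It assumes the input is an integer rather than a digit string, and the helper name `with_thousands` is hypothetical, not something from this thread.

```python
def with_thousands(n, sep=","):
    """Format an integer with a thousands separator (hypothetical helper)."""
    # The "," format specifier always groups by three and emits commas;
    # swap in another separator afterwards if one is requested.
    grouped = format(n, ",")
    return grouped if sep == "," else grouped.replace(",", sep)


print(with_thousands(75096042068045))       # 75,096,042,068,045
print(with_thousands(75096042068045, "."))  # 75.096.042.068.045
print(with_thousands(-2009))                # -2,009
```

Unlike the locale call, the result does not depend on the machine's regional settings, which may or may not be what is wanted.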

    > For my part, i reach to this rather complicated code:


    Mine isn't very simple either:

    py> def genparts(z):
    ...     n = len(z)
    ...     i = n%3
    ...     if i: yield z[:i]
    ...     for i in xrange(i, n, 3):
    ...         yield z[i:i+3]
    ...
    py> ','.join(genparts("75096042068045"))
    '75,096,042,068,045'

    --
    Gabriel Genellina
     
    Gabriel Genellina, Aug 15, 2009
    #2

  3. 15-08-2009 candide <> wrote:

    > Suppose you need to split a string into substrings of a given size
    > (except possibly the last substring). I make the hypothesis the first
    > slice is at the end of the string.
    > A typical example is provided by formatting a decimal string with
    > thousands separator.


    I'd use iterators, especially for longer strings...


    import itertools

    def separate(text, grouplen=3, sep=','):
        "separate('12345678') -> '123,456,78'"
        repeated_iterator = [iter(text)] * grouplen
        groups = itertools.izip_longest(fillvalue='', *repeated_iterator)
        strings = (''.join(group) for group in groups) # gen. expr.
        return sep.join(strings)

    def back_separate(text, grouplen=3, sep=','):
        "back_separate('12345678') -> '12,345,678'"
        repeated_iterator = [reversed(text)] * grouplen
        groups = itertools.izip_longest(fillvalue='', *repeated_iterator)
        strings = [''.join(reversed(group)) for group in groups] # list compr.
        return sep.join(reversed(strings))

    print separate('12345678')
    print back_separate('12345678')

    # alternate implementation
    # (without "materializing" 'strings' as a list in back_separate):
    def separate(text, grouplen=3, sep=','):
        "separate('12345678') -> '123,456,78'"
        textlen = len(text)
        end = textlen - (textlen % grouplen)
        repeated_iterator = [iter(itertools.islice(text, 0, end))] * grouplen
        strings = itertools.imap(lambda *chars: ''.join(chars),
                                 *repeated_iterator)
        return sep.join(itertools.chain(strings, (text[end:],)))

    def back_separate(text, grouplen=3, sep=','):
        "back_separate('12345678') -> '12,345,678'"
        beg = len(text) % grouplen
        repeated_iterator = [iter(itertools.islice(text, beg, None))] * grouplen
        strings = itertools.imap(lambda *chars: ''.join(chars),
                                 *repeated_iterator)
        return sep.join(itertools.chain((text[:beg],), strings))

    print separate('12345678')
    print back_separate('12345678')


    http://docs.python.org/library/itertools.html#recipes
    was the inspiration for me (especially grouper).
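
For reference, the grouper idea from those recipes can be sketched in Python 3 roughly as follows, where `itertools.izip_longest` is spelled `zip_longest`. The function name `back_separate_py3` is made up for this illustration, and left-padding the text is a substitution for the `reversed()` calls used above, not the original method.

```python
from itertools import chain, zip_longest


def back_separate_py3(text, grouplen=3, sep=","):
    """Group text from the right: '12345678' -> '12,345,678'."""
    # Pad on the left so the total length is a multiple of grouplen; the
    # padding consists of empty strings, which vanish again in ''.join().
    pad = [""] * (-len(text) % grouplen)
    chars = chain(pad, text)
    # Passing the same iterator grouplen times makes zip_longest pull
    # grouplen consecutive items into each tuple (the "grouper" trick).
    groups = zip_longest(*[chars] * grouplen, fillvalue="")
    return sep.join("".join(group) for group in groups)


print(back_separate_py3("12345678"))  # 12,345,678
print(back_separate_py3("123456"))    # 123,456
```

Because the padding makes the length divide evenly, no empty group is produced when the input length is already a multiple of the group size.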

    Cheers,
    *j
    --
    Jan Kaliszewski (zuo) <>
     
    Jan Kaliszewski, Aug 15, 2009
    #3
  4. 15-08-2009 Jan Kaliszewski <> wrote:

    > 15-08-2009 candide <> wrote:
    >
    >> Suppose you need to split a string into substrings of a given size
    >> (except
    >> possibly the last substring). I make the hypothesis the first slice is
    >> at the end of the string.
    >> A typical example is provided by formatting a decimal string with
    >> thousands separator.

    >
    > I'd use iterators, especially for longer strings...
    >
    >
    > import itertools

    [snip]

    Err... It's too late for coding... Now I see an obvious and simpler variant:

    def separate(text, grouplen=3, sep=','):
        "separate('12345678') -> '123,456,78'"
        textlen = len(text)
        end = textlen - (textlen % grouplen)
        strings = (text[i:i+grouplen] for i in xrange(0, end, grouplen))
        return sep.join(itertools.chain(strings, (text[end:],)))

    def back_separate(text, grouplen=3, sep=','):
        "back_separate('12345678') -> '12,345,678'"
        textlen = len(text)
        beg = textlen % grouplen
        strings = (text[i:i+grouplen] for i in xrange(beg, textlen, grouplen))
        return sep.join(itertools.chain((text[:beg],), strings))

    print separate('12345678')
    print back_separate('12345678')

    --
    Jan Kaliszewski (zuo) <>
     
    Jan Kaliszewski, Aug 15, 2009
    #4
  5. candide

    Rascal Guest

    I'm bored for posting this, but here it is:

    def add_commas(str):
        str_list = list(str)
        str_len = len(str)
        for i in range(3, str_len, 3):
            str_list.insert(str_len - i, ',')
        return ''.join(str_list)
     
    Rascal, Aug 15, 2009
    #5
  6. candide

    candide Guest

    Thanks to all for your response. I particularly appreciate Rascal's solution.
     
    candide, Aug 15, 2009
    #6
  7. On 15-08-2009 at 08:08:14 Rascal <> wrote:

    > I'm bored for posting this, but here it is:
    >
    > def add_commas(str):
    >     str_list = list(str)
    >     str_len = len(str)
    >     for i in range(3, str_len, 3):
    >         str_list.insert(str_len - i, ',')
    >     return ''.join(str_list)


    For short strings (surely the most common case) it's fine: simple and clear.
    But for huge ones it's better not to materialize an additional list for the
    string; pure-iterator solutions (like Gabriel's or mine) would be better
    then.

    Cheers,
    *j

    --
    Jan Kaliszewski (zuo) <>
     
    Jan Kaliszewski, Aug 15, 2009
    #7
  8. On 8/14/2009 5:22 PM candide said...
    > Suppose you need to split a string into substrings of a given size (except
    > possibly the last substring). I make the hypothesis the first slice is at the
    > end of the string.
    > A typical example is provided by formatting a decimal string with thousands
    > separator.
    >
    >
    > What is the pythonic way to do this ?


    I like list comps...

    >>> jj = '1234567890123456789'
    >>> ",".join([jj[ii:ii+3] for ii in range(0,len(jj),3)])
    '123,456,789,012,345,678,9'
    >>>


    Emile
     
    Emile van Sebille, Aug 15, 2009
    #8
  9. candide

    Gregor Lingl Guest


    > What is the pythonic way to do this ?
    >
    >
    > For my part, i reach to this rather complicated code:
    >
    >
    > # ----------------------
    >
    > def comaSep(z,k=3, sep=','):
    >     z=z[::-1]
    >     x=[z[k*i:k*(i+1)][::-1] for i in range(1+(len(z)-1)/k)][::-1]
    >     return sep.join(x)
    >
    > # Test
    > for z in ["75096042068045", "509", "12024", "7", "2009"]:
    >     print z+" --> ", comaSep(z)
    >


    Just if you are interested, a recursive solution:

    >>> def comaSep(z,k=3,sep=","):
    ...     return comaSep(z[:-k],k,sep)+sep+z[-k:] if len(z)>k else z
    ...
    >>> comaSep("7")
    '7'
    >>> comaSep("2007")
    '2,007'
    >>> comaSep("12024")
    '12,024'
    >>> comaSep("509")
    '509'
    >>> comaSep("75096042068045")
    '75,096,042,068,045'
    >>>
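
One caveat worth sketching: each recursive call handles one group, so a very long input can exceed CPython's recursion limit (1000 frames by default). The loop below is an illustrative Python 3 rewrite of the same right-to-left peeling; the name `coma_sep_iter` is hypothetical and not part of this thread.

```python
def coma_sep_iter(z, k=3, sep=","):
    """Iterative version of peeling k-character groups off the right end."""
    parts = []
    # Peel groups off the end while more than k characters remain.
    while len(z) > k:
        parts.append(z[-k:])
        z = z[:-k]
    parts.append(z)  # whatever is left is the (possibly shorter) first group
    # Groups were collected right to left, so reverse before joining.
    return sep.join(reversed(parts))


print(coma_sep_iter("75096042068045"))        # 75,096,042,068,045
print(coma_sep_iter("2007"))                  # 2,007
print(coma_sep_iter("9" * 30000).count(","))  # 9999
```

The last call uses a 30000-digit string, which needs 10000 groups and would be well beyond the default recursion depth for the recursive form.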


    Gregor
     
    Gregor Lingl, Aug 15, 2009
    #9
  11. candide

    Gregor Lingl Guest

    Emile van Sebille wrote:
    > On 8/14/2009 5:22 PM candide said...

    ....
    >> What is the pythonic way to do this ?

    >
    > I like list comps...
    >
    > >>> jj = '1234567890123456789'
    > >>> ",".join([jj[ii:ii+3] for ii in range(0,len(jj),3)])

    > '123,456,789,012,345,678,9'
    > >>>

    >
    > Emile
    >


    Less beautiful but more correct:

    >>> ",".join([jj[max(ii-3,0):ii] for ii in
    ...           range(len(jj)%3,len(jj)+3,3)])
    '1,234,567,890,123,456,789'

    Gregor
     
    Gregor Lingl, Aug 15, 2009
    #11
  12. candide

    Mark Tolonen Guest

    "Gregor Lingl" <> wrote in message
    news:4a87036a$0$2292$...
    > Emile van Sebille wrote:
    >> On 8/14/2009 5:22 PM candide said...

    > ...
    >>> What is the pythonic way to do this ?

    >>
    >> I like list comps...
    >>
    >> >>> jj = '1234567890123456789'
    >> >>> ",".join([jj[ii:ii+3] for ii in range(0,len(jj),3)])

    >> '123,456,789,012,345,678,9'
    >> >>>

    >>
    >> Emile
    >>

    >
    > Less beautiful but more correct:
    >
    > >>> ",".join([jj[max(ii-3,0):ii] for ii in
    > ...           range(len(jj)%3,len(jj)+3,3)])
    > '1,234,567,890,123,456,789'
    >
    > Gregor


    Is it?

    >>> jj = '234567890123456789'
    >>> ",".join([jj[max(ii-3,0):ii] for ii in range(len(jj)%3,len(jj)+3,3)])
    ',234,567,890,123,456,789'

    At least one other solution in this thread had the same problem.
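
The failure mode shown here appears whenever the length is an exact multiple of the group size, so it is cheap to test for. A sketch in Python 3 follows; `group_from_right` and `check` are hypothetical names, and the implementation is just one of several ways to avoid the empty leading group.

```python
def group_from_right(text, size=3, sep=","):
    """Split text into size-character groups aligned to the right end."""
    # Length of the first (possibly shorter) group; fall back to a full
    # group when the length divides evenly, so no empty group is produced.
    first = len(text) % size or size
    groups = [text[:first]]
    groups += [text[i:i + size] for i in range(first, len(text), size)]
    return sep.join(groups)


def check(func):
    """Exercise lengths 1..12, which include exact multiples of three."""
    for n in range(1, 13):
        digits = "1234567890123"[:n]
        out = func(digits)
        # No separator may lead or trail, and removing them restores input.
        assert not out.startswith(",") and not out.endswith(","), out
        assert out.replace(",", "") == digits, out
        # Every group after the first must hold exactly three characters.
        assert all(len(g) == 3 for g in out.split(",")[1:]), out


check(group_from_right)
print(group_from_right("234567890123456789"))  # 234,567,890,123,456,789
```

Passing any of the other functions from this thread to `check` (after porting them to Python 3) would be one way to find out which of them share the problem.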

    -Mark
     
    Mark Tolonen, Aug 15, 2009
    #12
  13. candide

    ryles Guest

    On Aug 14, 8:22 pm, candide <> wrote:
    > Suppose you need to split a string into substrings of a given size (except
    > possibly the last substring). I make the hypothesis the first slice is at the
    > end of the string.
    > A typical example is provided by formatting a decimal string with thousands
    > separator.
    >
    > What is the pythonic way to do this ?
    >
    > For my part, i reach to this rather complicated code:
    >
    > # ----------------------
    >
    > def comaSep(z,k=3, sep=','):
    >     z=z[::-1]
    >     x=[z[k*i:k*(i+1)][::-1] for i in range(1+(len(z)-1)/k)][::-1]
    >     return sep.join(x)
    >
    > # Test
    > for z in ["75096042068045", "509", "12024", "7", "2009"]:
    >     print z+" --> ", comaSep(z)
    >
    > # ----------------------
    >
    > outputting :
    >
    > 75096042068045 -->  75,096,042,068,045
    > 509 -->  509
    > 12024 -->  12,024
    > 7 -->  7
    > 2009 -->  2,009
    >
    > Thanks


    py> import re
    py> s='1234567'
    py> ','.join(_[::-1] for _ in re.findall('.{1,3}',s[::-1])[::-1])
    '1,234,567'
    py> # j/k ;)
     
    ryles, Aug 15, 2009
    #13
  14. candide

    MRAB Guest

    ryles wrote:
    > On Aug 14, 8:22 pm, candide <> wrote:
    >> Suppose you need to split a string into substrings of a given size (except
    >> possibly the last substring). I make the hypothesis the first slice is at the
    >> end of the string.
    >> A typical example is provided by formatting a decimal string with thousands
    >> separator.
    >>
    >> What is the pythonic way to do this ?
    >>
    >> For my part, i reach to this rather complicated code:
    >>
    >> # ----------------------
    >>
    >> def comaSep(z,k=3, sep=','):
    >>     z=z[::-1]
    >>     x=[z[k*i:k*(i+1)][::-1] for i in range(1+(len(z)-1)/k)][::-1]
    >>     return sep.join(x)
    >>
    >> # Test
    >> for z in ["75096042068045", "509", "12024", "7", "2009"]:
    >>     print z+" --> ", comaSep(z)
    >>
    >> # ----------------------
    >>
    >> outputting :
    >>
    >> 75096042068045 --> 75,096,042,068,045
    >> 509 --> 509
    >> 12024 --> 12,024
    >> 7 --> 7
    >> 2009 --> 2,009
    >>
    >> Thanks

    >
    > py> s='1234567'
    > py> ','.join(_[::-1] for _ in re.findall('.{1,3}',s[::-1])[::-1])
    > '1,234,567'
    > py> # j/k ;)


    If you're going to use re, then:

    >>> import re
    >>> for z in ["75096042068045", "509", "12024", "7", "2009"]:
    ...     print re.sub(r"(?<=.)(?=(?:...)+$)", ",", z)
    ...
    75,096,042,068,045
    509
    12,024
    7
    2,009
     
    MRAB, Aug 15, 2009
    #14
  15. candide

    MRAB Guest

    Brian wrote:
    >
    >
    > On Sat, Aug 15, 2009 at 4:06 PM, MRAB <> wrote:
    >
    > ryles wrote:
    >
    > On Aug 14, 8:22 pm, candide <> wrote:
    >
    > Suppose you need to split a string into substrings of a
    > given size (except
    > possibly the last substring). I make the hypothesis the
    > first slice is at the
    > end of the string.
    > A typical example is provided by formatting a decimal string
    > with thousands
    > separator.
    >
    > What is the pythonic way to do this ?
    >
    > For my part, i reach to this rather complicated code:
    >
    > # ----------------------
    >
    > def comaSep(z,k=3, sep=','):
    >     z=z[::-1]
    >     x=[z[k*i:k*(i+1)][::-1] for i in range(1+(len(z)-1)/k)][::-1]
    >     return sep.join(x)
    >
    > # Test
    > for z in ["75096042068045", "509", "12024", "7", "2009"]:
    >     print z+" --> ", comaSep(z)
    >
    > # ----------------------
    >
    > outputting :
    >
    > 75096042068045 --> 75,096,042,068,045
    > 509 --> 509
    > 12024 --> 12,024
    > 7 --> 7
    > 2009 --> 2,009
    >
    > Thanks
    >
    >
    > py> s='1234567'
    > py> ','.join(_[::-1] for _ in re.findall('.{1,3}',s[::-1])[::-1])
    > '1,234,567'
    > py> # j/k ;)
    >
    >
    > If you're going to use re, then:
    >
    >
    > >>> for z in ["75096042068045", "509", "12024", "7", "2009"]:
    > ...     print re.sub(r"(?<=.)(?=(?:...)+$)", ",", z)
    >
    >
    >
    > 75,096,042,068,045
    > 509
    > 12,024
    > 7
    > 2,009
    >
    >
    > Can you please break down this regex?
    >

    The call replaces a zero-width match with a comma, i.e. it inserts a comma,
    if certain conditions are met:

    "(?<=.)"
    Look behind for 1 character. There must be at least one previous
    character. This ensures that a comma is never inserted at the start of
    the string. I could also have used "(?<!^)". Actually, it doesn't check
    whether the first character is a "-". That's left as an exercise for the
    reader. :)

    "(?=(?:...)+$)"
    Look ahead for a multiple of 3 characters, followed by the end of
    the string.
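
To make the two assertions easier to see, here is the same pattern written with `re.VERBOSE`, plus one possible answer to the leading-minus exercise. This is a sketch for Python 3; the names `GROUPING`, `DIGIT_GROUPING` and `add_commas_re` are assumptions introduced for illustration, and restricting the lookbehind to a digit is only one way to handle the sign.

```python
import re

# The pattern from the post, spelled out: it matches an empty string at every
# position that passes both lookarounds, and sub() puts the separator there.
GROUPING = re.compile(r"""
    (?<=.)          # at least one character precedes this position
    (?=(?:...)+$)   # what follows is 3, 6, 9, ... characters, then the end
""", re.VERBOSE)

# Variant for signed numbers: require a digit (not just any character)
# before the position, so nothing is inserted right after a leading "-".
DIGIT_GROUPING = re.compile(r"(?<=\d)(?=(?:\d{3})+$)")


def add_commas_re(text, pattern=DIGIT_GROUPING, sep=","):
    """Insert sep at every position matched by the zero-width pattern."""
    return pattern.sub(sep, text)


print(add_commas_re("75096042068045", GROUPING))  # 75,096,042,068,045
print(add_commas_re("-123456", GROUPING))         # -,123,456
print(add_commas_re("-123456"))                   # -123,456
```

The middle call shows the unchecked "-" case mentioned above: the sign counts as a preceding character, so a comma lands directly after it.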
     
    MRAB, Aug 15, 2009
    #15
  16. candide

    ryles Guest

    On Aug 15, 6:28 pm, MRAB <> wrote:

    > >      >>> for z in ["75096042068045", "509", "12024", "7", "2009"]:
    > >            print re.sub(r"(?<=.)(?=(?:...)+$)", ",", z)

    >
    > >     75,096,042,068,045
    > >     509
    > >     12,024
    > >     7
    > >     2,009

    >
    > The call replaces a zero-width match with a comma, ie inserts a comma,
    > if certain conditions are met:
    >
    > "(?<=.)"
    >      Look behind for 1 character. There must be at least one previous
    > character. This ensures that a comma is never inserted at the start of
    > the string. I could also have used "(?<!^)". Actually, it doesn't check
    > whether the first character is a "-". That's left as an exercise for the
    > reader. :)
    >
    > "(?=(?:...)+$)"
    >      Look ahead for a multiple of 3 characters, followed by the end of
    > the string.


    Wow, well done. An exceptional recipe from Python's unofficial regex
    guru. And thanks for sharing the explanation.
     
    ryles, Aug 15, 2009
    #16
  17. candide

    Gregor Lingl Guest

    Mark Tolonen wrote:
    >
    > "Gregor Lingl" <> wrote in message
    > news:4a87036a$0$2292$...
    >> Emile van Sebille wrote:
    >>> On 8/14/2009 5:22 PM candide said...

    >> ...
    >>>> What is the pythonic way to do this ?
    >>>
    >>> I like list comps...
    >>>
    >>> >>> jj = '1234567890123456789'
    >>> >>> ",".join([jj[ii:ii+3] for ii in range(0,len(jj),3)])
    >>> '123,456,789,012,345,678,9'
    >>> >>>
    >>>
    >>> Emile
    >>>

    >>
    >> Less beautiful but more correct:
    >>
    >> >>> ",".join([jj[max(ii-3,0):ii] for ii in
    >> ...           range(len(jj)%3,len(jj)+3,3)])
    >> '1,234,567,890,123,456,789'
    >>
    >> Gregor

    >
    > Is it?
    >
    >>>> jj = '234567890123456789'
    >>>> ",".join([jj[max(ii-3,0):ii] for ii in range(len(jj)%3,len(jj)+3,3)])

    > ',234,567,890,123,456,789'


    Gulp!

    Even uglier:

    >>> ",".join([jj[max(ii-3,0):ii] for ii in
    ...           range(len(jj)%3,len(jj)+3,3)]).strip(",")
    '234,567,890,123,456,789'

    Gregor

    >
    > At least one other solution in this thread had the same problem.
    >
    > -Mark
    >
    >
     
    Gregor Lingl, Aug 16, 2009
    #17
  18. candide

    Simon Forman Guest

    On Aug 14, 8:22 pm, candide <> wrote:
    > Suppose you need to split a string into substrings of a given size (except
    > possibly the last substring). I make the hypothesis the first slice is at the
    > end of the string.
    > A typical example is provided by formatting a decimal string with thousands
    > separator.
    >
    > What is the pythonic way to do this ?
    >
    > For my part, i reach to this rather complicated code:
    >
    > # ----------------------
    >
    > def comaSep(z,k=3, sep=','):
    >     z=z[::-1]
    >     x=[z[k*i:k*(i+1)][::-1] for i in range(1+(len(z)-1)/k)][::-1]
    >     return sep.join(x)
    >
    > # Test
    > for z in ["75096042068045", "509", "12024", "7", "2009"]:
    >     print z+" --> ", comaSep(z)
    >
    > # ----------------------
    >
    > outputting :
    >
    > 75096042068045 -->  75,096,042,068,045
    > 509 -->  509
    > 12024 -->  12,024
    > 7 -->  7
    > 2009 -->  2,009
    >
    > Thanks


    FWIW:

    def chunks(s, length=3):
        stop = len(s)
        start = stop - length
        while start > 0:
            yield s[start:stop]
            stop, start = start, start - length
        yield s[:stop]


    s = '1234567890'
    print ','.join(reversed(list(chunks(s))))
    # prints '1,234,567,890'
     
    Simon Forman, Aug 16, 2009
    #18
  19. candide

    Gregor Lingl Guest

    Simon Forman wrote:
    > On Aug 14, 8:22 pm, candide <> wrote:
    >> Suppose you need to split a string into substrings of a given size (except
    >> possibly the last substring). I make the hypothesis the first slice is at the
    >> end of the string.
    >> A typical example is provided by formatting a decimal string with thousands
    >> separator.
    >>
    >> What is the pythonic way to do this ?
    >>

    ....
    >> Thanks

    >
    > FWIW:
    >
    > def chunks(s, length=3):
    >     stop = len(s)
    >     start = stop - length
    >     while start > 0:
    >         yield s[start:stop]
    >         stop, start = start, start - length
    >     yield s[:stop]
    >
    >
    > s = '1234567890'
    > print ','.join(reversed(list(chunks(s))))
    > # prints '1,234,567,890'


    or:

    >>> def chunks(s, length=3):
    ...     i, j = 0, len(s) % length or length
    ...     while i < len(s):
    ...         yield s[i:j]
    ...         i, j = j, j + length
    ...
    >>> print(','.join(list(chunks(s))))
    1,234,567,890
    >>> print(','.join(list(chunks(s,2))))
    12,34,56,78,90
    >>> print(','.join(list(chunks(s,4))))
    12,3456,7890

    Regards,
    Gregor
     
    Gregor Lingl, Aug 18, 2009
    #19