# Splitting a string into substrings of equal size

Discussion in 'Python' started by candide, Aug 15, 2009.

1. ### candide

Suppose you need to split a string into substrings of a given size (except
possibly the last substring). I assume the first slice is taken at the
end of the string.
A typical example is formatting a decimal string with a thousands
separator.

What is the Pythonic way to do this?

For my part, I arrived at this rather complicated code:

# ----------------------

def comaSep(z,k=3, sep=','):
    z=z[::-1]
    x=[z[k*i:k*(i+1)][::-1] for i in range(1+(len(z)-1)/k)][::-1]
    return sep.join(x)

# Test
for z in ["75096042068045", "509", "12024", "7", "2009"]:
    print z+" --> ", comaSep(z)

# ----------------------

which outputs:

75096042068045 --> 75,096,042,068,045
509 --> 509
12024 --> 12,024
7 --> 7
2009 --> 2,009
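A note for anyone running this function on Python 3 (this port is not part of the original post): there `/` is true division, so `range(1+(len(z)-1)/k)` receives a float and raises `TypeError`. A minimal sketch of the same reverse, slice, reverse approach using floor division `//`:

```python
def comaSep(z, k=3, sep=","):
    """candide's reverse / slice / reverse approach, ported to Python 3."""
    z = z[::-1]  # reverse, so the full-size groups start at the old end
    # Number of groups is ceil(len(z) / k); // keeps it an int (with / this
    # would be a float on Python 3 and range() would raise TypeError).
    ngroups = 1 + (len(z) - 1) // k
    # Slice each group, un-reverse it, then un-reverse the list of groups.
    x = [z[k * i:k * (i + 1)][::-1] for i in range(ngroups)][::-1]
    return sep.join(x)


for z in ["75096042068045", "509", "12024", "7", "2009"]:
    print(z + " --> " + comaSep(z))
```

The empty string also works: `(0 - 1) // k` is `-1`, so no groups are produced and the result is `''`.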

Thanks

candide, Aug 15, 2009

2. ### Gabriel Genellina

On Fri, 14 Aug 2009 21:22:57 -0300, candide wrote:

> Suppose you need to split a string into substrings of a given size (except
> possibly the last substring). I make the hypothesis the first slice is at
> the end of the string.
> A typical example is provided by formatting a decimal string with
> thousands separator.
>
> What is the pythonic way to do this ?

py> import locale
py> locale.setlocale(locale.LC_ALL, '')
'Spanish_Argentina.1252'
py> locale.format("%d", 75096042068045, True)
'75.096.042.068.045'
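For comparison (not from the original post): since Python 2.7/3.1 the format mini-language accepts a `,` option (PEP 378) that groups digits with commas regardless of locale, and in Python 3 `locale.format()` is deprecated in favour of `locale.format_string()`. A minimal Python 3 sketch, assuming the input is a string of decimal digits; the helper names `thousands` and `thousands_locale` are hypothetical:

```python
import locale


def thousands(digits):
    """Group a string of decimal digits with commas, e.g. '2009' -> '2,009'."""
    # The ',' option of the format mini-language (PEP 378) always inserts a
    # comma every three digits, whatever the current locale is.
    return format(int(digits), ",")


def thousands_locale(number):
    """Locale-aware variant: the separator comes from the active LC_NUMERIC."""
    # grouping=True applies the locale's grouping rules; under the "C"
    # locale there are none, so the digits come back unchanged.
    return locale.format_string("%d", number, grouping=True)


for z in ["75096042068045", "509", "12024", "7", "2009"]:
    print(z, "-->", thousands(z))
```

Because the string goes through `int()`, leading zeros are dropped and non-digit input raises `ValueError`; the pure string-slicing answers in this thread keep the input as-is.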

> For my part, i reach to this rather complicated code:

Mine isn't very simple either:

py> def genparts(z):
...     n = len(z)
...     i = n%3
...     if i: yield z[:i]
...     for i in xrange(i, n, 3):
...         yield z[i:i+3]
...
py> ','.join(genparts("75096042068045"))
'75,096,042,068,045'
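A Python 3 sketch of the same generator with the group length and separator as parameters (not from the post; `genparts_k` and `group` are hypothetical names). `range` replaces Python 2's `xrange`:

```python
def genparts_k(z, k=3):
    """Yield k-character pieces of z aligned to its end.

    The first piece holds the len(z) % k leftover characters, if any.
    """
    n = len(z)
    head = n % k
    if head:
        yield z[:head]
    for i in range(head, n, k):
        yield z[i:i + k]


def group(z, k=3, sep=","):
    return sep.join(genparts_k(z, k))


print(group("75096042068045"))            # 75,096,042,068,045
print(group("1234567890", k=4, sep=" "))  # 12 3456 7890
```

Since the short piece is only yielded when `len(z) % k` is non-zero, no empty group (and hence no stray separator) appears when the length is an exact multiple of `k`.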

--
Gabriel Genellina

Gabriel Genellina, Aug 15, 2009

3. ### Jan Kaliszewski

15-08-2009 candide wrote:

> Suppose you need to split a string into substrings of a given size
> (except possibly the last substring). I make the hypothesis the first
> slice is at the end of the string.
> A typical example is provided by formatting a decimal string with
> thousands separator.

I'd use iterators, especially for longer strings...

import itertools

def separate(text, grouplen=3, sep=','):
    "separate('12345678') -> '123,456,78'"
    repeated_iterator = [iter(text)] * grouplen
    groups = itertools.izip_longest(fillvalue='', *repeated_iterator)
    strings = (''.join(group) for group in groups) # gen. expr.
    return sep.join(strings)

def back_separate(text, grouplen=3, sep=','):
    "back_separate('12345678') -> '12,345,678'"
    repeated_iterator = [reversed(text)] * grouplen
    groups = itertools.izip_longest(fillvalue='', *repeated_iterator)
    strings = [''.join(reversed(group)) for group in groups] # list compr.
    return sep.join(reversed(strings))

print separate('12345678')
print back_separate('12345678')

# alternate implementation
# (without "materializing" 'strings' as a list in back_separate):
def separate(text, grouplen=3, sep=','):
    "separate('12345678') -> '123,456,78'"
    textlen = len(text)
    end = textlen - (textlen % grouplen)
    repeated_iterator = [iter(itertools.islice(text, 0, end))] * grouplen
    strings = itertools.imap(lambda *chars: ''.join(chars),
                             *repeated_iterator)
    return sep.join(itertools.chain(strings, (text[end:],)))

def back_separate(text, grouplen=3, sep=','):
    "back_separate('12345678') -> '12,345,678'"
    beg = len(text) % grouplen
    repeated_iterator = [iter(itertools.islice(text, beg, None))] * grouplen
    strings = itertools.imap(lambda *chars: ''.join(chars),
                             *repeated_iterator)
    return sep.join(itertools.chain((text[:beg],), strings))

print separate('12345678')
print back_separate('12345678')

http://docs.python.org/library/itertools.html#recipes
was the inspiration for me (especially grouper).
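On Python 3, `itertools.izip_longest` became `itertools.zip_longest`, and `itertools.imap` was dropped in favour of the built-in `map`. A sketch of the first `back_separate` above ported under that assumption (the port is not part of the original post):

```python
from itertools import zip_longest


def back_separate(text, grouplen=3, sep=","):
    """back_separate('12345678') -> '12,345,678'"""
    # One reversed iterator, repeated grouplen times: zip_longest pulls from
    # the same iterator round-robin, so each tuple holds the next grouplen
    # characters counted from the end (the last, leftmost group is padded
    # with '').
    repeated_iterator = [reversed(text)] * grouplen
    groups = zip_longest(*repeated_iterator, fillvalue="")
    # Put each group back in reading order, then the groups themselves.
    strings = ["".join(reversed(group)) for group in groups]
    return sep.join(reversed(strings))


print(back_separate("12345678"))  # 12,345,678
```

This is the grouper idea from the itertools recipes: because every slot of `repeated_iterator` is the same iterator object, `zip_longest` consumes it `grouplen` characters at a time.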

Cheers,
*j
--
Jan Kaliszewski (zuo)

Jan Kaliszewski, Aug 15, 2009
4. ### Jan Kaliszewski

15-08-2009 Jan Kaliszewski wrote:

> 15-08-2009 candide wrote:
>
>> Suppose you need to split a string into substrings of a given size
>> (except possibly the last substring). I make the hypothesis the first
>> slice is at the end of the string.
>> A typical example is provided by formatting a decimal string with
>> thousands separator.
>
> I'd use iterators, especially for longer strings...
>
> import itertools

[snip]

Err... It's too late for coding... Now I see an obvious and simpler variant:

import itertools

def separate(text, grouplen=3, sep=','):
    "separate('12345678') -> '123,456,78'"
    textlen = len(text)
    end = textlen - (textlen % grouplen)
    strings = (text[i:i+grouplen] for i in xrange(0, end, grouplen))
    return sep.join(itertools.chain(strings, (text[end:],)))

def back_separate(text, grouplen=3, sep=','):
    "back_separate('12345678') -> '12,345,678'"
    textlen = len(text)
    beg = textlen % grouplen
    strings = (text[i:i+grouplen] for i in xrange(beg, textlen, grouplen))
    return sep.join(itertools.chain((text[:beg],), strings))

print separate('12345678')
print back_separate('12345678')
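One caveat with these variants (and with the `islice` versions in the previous post): when `len(text)` is an exact multiple of `grouplen`, the empty slice `text[end:]` or `text[:beg]` is still joined, so `separate('123456')` returns `'123,456,'` and `back_separate('123456')` returns `',123,456'`. A Python 3 sketch (not from the post) that drops empty pieces with `filter(None, ...)`:

```python
from itertools import chain


def separate(text, grouplen=3, sep=","):
    """separate('12345678') -> '123,456,78'"""
    textlen = len(text)
    end = textlen - (textlen % grouplen)
    strings = (text[i:i + grouplen] for i in range(0, end, grouplen))
    # filter(None, ...) drops the empty tail text[end:] that appears when
    # len(text) is an exact multiple of grouplen.
    return sep.join(filter(None, chain(strings, (text[end:],))))


def back_separate(text, grouplen=3, sep=","):
    """back_separate('12345678') -> '12,345,678'"""
    textlen = len(text)
    beg = textlen % grouplen
    strings = (text[i:i + grouplen] for i in range(beg, textlen, grouplen))
    # Likewise drops the empty head text[:0].
    return sep.join(filter(None, chain((text[:beg],), strings)))
```

`filter(None, ...)` works here because empty strings are falsy and every other piece is non-empty, so nothing legitimate is removed.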

--
Jan Kaliszewski (zuo)

Jan Kaliszewski, Aug 15, 2009
5. ### Rascal

I'm bored for posting this, but here it is:

def comaSep(s):
    str_list = list(s)
    str_len = len(s)
    for i in range(3, str_len, 3):
        str_list.insert(str_len - i, ',')
    return ''.join(str_list)

Rascal, Aug 15, 2009
6. ### candide

Thanks to all for your responses. I particularly appreciate Rascal's solution.

candide, Aug 15, 2009
7. ### Jan Kaliszewski

On 15-08-2009 at 08:08:14 Rascal wrote:

> I'm bored for posting this, but here it is:
>
> str_list = list(str)
> str_len = len(str)
> for i in range(3, str_len, 3):
>     str_list.insert(str_len - i, ',')
> return ''.join(str_list)

For short strings (surely the most common case) it's OK: simple and clear.
But for huge ones it's better not to materialize an additional list from
the string; pure-iterator solutions (like Gabriel's or mine) would be
better then.

Cheers,
*j

--
Jan Kaliszewski (zuo)

Jan Kaliszewski, Aug 15, 2009
8. ### Emile van Sebille

On 8/14/2009 5:22 PM candide said...
> Suppose you need to split a string into substrings of a given size (except
> possibly the last substring). I make the hypothesis the first slice is at the
> end of the string.
> A typical example is provided by formatting a decimal string with thousands
> separator.
>
>
> What is the pythonic way to do this ?

I like list comps...

>>> jj = '1234567890123456789'
>>> ",".join([jj[ii:ii+3] for ii in range(0,len(jj),3)])
'123,456,789,012,345,678,9'
>>>

Emile

Emile van Sebille, Aug 15, 2009
9. ### Gregor Lingl

> What is the pythonic way to do this ?
>
> For my part, i reach to this rather complicated code:
>
> # ----------------------
>
> def comaSep(z,k=3, sep=','):
>     z=z[::-1]
>     x=[z[k*i:k*(i+1)][::-1] for i in range(1+(len(z)-1)/k)][::-1]
>     return sep.join(x)
>
> # Test
> for z in ["75096042068045", "509", "12024", "7", "2009"]:
>     print z+" --> ", comaSep(z)
>

In case you are interested, here is a recursive solution:

>>> def comaSep(z,k=3,sep=","):
    return comaSep(z[:-k],k,sep)+sep+z[-k:] if len(z)>k else z

>>> comaSep("7")
'7'
>>> comaSep("2007")
'2,007'
>>> comaSep("12024")
'12,024'
>>> comaSep("509")
'509'
>>> comaSep("75096042068045")
'75,096,042,068,045'
>>>

Gregor

Gregor Lingl, Aug 15, 2009
11. ### Gregor Lingl

Emile van Sebille wrote:
> On 8/14/2009 5:22 PM candide said...
....
>> What is the pythonic way to do this ?
>
> I like list comps...
>
> >>> jj = '1234567890123456789'
> >>> ",".join([jj[ii:ii+3] for ii in range(0,len(jj),3)])
> '123,456,789,012,345,678,9'
> >>>
>
> Emile
>

Less beautiful but more correct:

>>> ",".join([jj[max(ii-3,0):ii] for ii in range(len(jj)%3,len(jj)+3,3)])
'1,234,567,890,123,456,789'

Gregor

Gregor Lingl, Aug 15, 2009
12. ### Mark Tolonen

"Gregor Lingl" <> wrote in message
news:4a87036a$0$2292$...
> Emile van Sebille wrote:
>> On 8/14/2009 5:22 PM candide said...
> ...
>>> What is the pythonic way to do this ?
>>
>> I like list comps...
>>
>> >>> jj = '1234567890123456789'
>> >>> ",".join([jj[ii:ii+3] for ii in range(0,len(jj),3)])
>> '123,456,789,012,345,678,9'
>> >>>
>>
>> Emile
>>
>
> Less beautiful but more correct:
>
> >>> ",".join([jj[max(ii-3,0):ii] for ii in range(len(jj)%3,len(jj)+3,3)])
> '1,234,567,890,123,456,789'
>
> Gregor

Is it?

>>> jj = '234567890123456789'
>>> ",".join([jj[max(ii-3,0):ii] for ii in range(len(jj)%3,len(jj)+3,3)])
',234,567,890,123,456,789'

At least one other solution in this thread had the same problem.

-Mark

Mark Tolonen, Aug 15, 2009
13. ### ryles

On Aug 14, 8:22 pm, candide wrote:
> Suppose you need to split a string into substrings of a given size (except
> possibly the last substring). I make the hypothesis the first slice is at the
> end of the string.
> A typical example is provided by formatting a decimal string with thousands
> separator.
>
> What is the pythonic way to do this ?
>
> For my part, i reach to this rather complicated code:
>
> # ----------------------
>
> def comaSep(z,k=3, sep=','):
>     z=z[::-1]
>     x=[z[k*i:k*(i+1)][::-1] for i in range(1+(len(z)-1)/k)][::-1]
>     return sep.join(x)
>
> # Test
> for z in ["75096042068045", "509", "12024", "7", "2009"]:
>     print z+" --> ", comaSep(z)
>
> # ----------------------
>
> outputting :
>
> 75096042068045 -->  75,096,042,068,045
> 509 -->  509
> 12024 -->  12,024
> 7 -->  7
> 2009 -->  2,009
>
> Thanks

py> import re
py> s='1234567'
py> ','.join(_[::-1] for _ in re.findall('.{1,3}',s[::-1])[::-1])
'1,234,567'
py> # j/k

ryles, Aug 15, 2009
14. ### MRAB

ryles wrote:
> On Aug 14, 8:22 pm, candide wrote:
>> Suppose you need to split a string into substrings of a given size (except
>> possibly the last substring). I make the hypothesis the first slice is at the
>> end of the string.
>> A typical example is provided by formatting a decimal string with thousands
>> separator.
>>
>> What is the pythonic way to do this ?
>>
>> For my part, i reach to this rather complicated code:
>>
>> # ----------------------
>>
>> def comaSep(z,k=3, sep=','):
>>     z=z[::-1]
>>     x=[z[k*i:k*(i+1)][::-1] for i in range(1+(len(z)-1)/k)][::-1]
>>     return sep.join(x)
>>
>> # Test
>> for z in ["75096042068045", "509", "12024", "7", "2009"]:
>>     print z+" --> ", comaSep(z)
>>
>> # ----------------------
>>
>> outputting :
>>
>> 75096042068045 --> 75,096,042,068,045
>> 509 --> 509
>> 12024 --> 12,024
>> 7 --> 7
>> 2009 --> 2,009
>>
>> Thanks
>
> py> s='1234567'
> py> ','.join(_[::-1] for _ in re.findall('.{1,3}',s[::-1])[::-1])
> '1,234,567'
> py> # j/k

If you're going to use re, then:

>>> for z in ["75096042068045", "509", "12024", "7", "2009"]:
    print re.sub(r"(?<=.)(?=(?:...)+$)", ",", z)

75,096,042,068,045
509
12,024
7
2,009

MRAB, Aug 15, 2009
15. ### MRAB

Brian wrote:
>
> On Sat, Aug 15, 2009 at 4:06 PM, MRAB wrote:
>
> ryles wrote:
>
> On Aug 14, 8:22 pm, candide wrote:
>
> Suppose you need to split a string into substrings of a given size (except
> possibly the last substring). I make the hypothesis the first slice is at the
> end of the string.
> A typical example is provided by formatting a decimal string with thousands
> separator.
>
> What is the pythonic way to do this ?
>
> For my part, i reach to this rather complicated code:
>
> # ----------------------
>
> def comaSep(z,k=3, sep=','):
>     z=z[::-1]
>     x=[z[k*i:k*(i+1)][::-1] for i in range(1+(len(z)-1)/k)][::-1]
>     return sep.join(x)
>
> # Test
> for z in ["75096042068045", "509", "12024", "7", "2009"]:
>     print z+" --> ", comaSep(z)
>
> # ----------------------
>
> outputting :
>
> 75096042068045 --> 75,096,042,068,045
> 509 --> 509
> 12024 --> 12,024
> 7 --> 7
> 2009 --> 2,009
>
> Thanks
>
> py> s='1234567'
> py> ','.join(_[::-1] for _ in re.findall('.{1,3}',s[::-1])[::-1])
> '1,234,567'
> py> # j/k
>
> If you're going to use re, then:
>
> >>> for z in ["75096042068045", "509", "12024", "7", "2009"]:
>     print re.sub(r"(?<=.)(?=(?:...)+$)", ",", z)
>
> 75,096,042,068,045
> 509
> 12,024
> 7
> 2,009
>
> Can you please break down this regex?
>

The call replaces a zero-width match with a comma, i.e. inserts a comma,
if certain conditions are met:

"(?<=.)"
Look behind for 1 character. There must be at least one previous
character. This ensures that a comma is never inserted at the start of
the string. I could also have used "(?<!^)". Actually, it doesn't check
whether the first character is a "-". That's left as an exercise for the
reader.

"(?=(?:...)+$)"
Look ahead for one or more groups of 3 characters, followed by the end of
the string.
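One possible answer to the minus-sign exercise, sketched in Python 3 and assuming the input is an optional `-` followed only by digits (this is not from the thread): require digits on both sides, so the lookbehind `(?<=\d)` can never match right after the sign. The names `THOUSANDS` and `comma_sep` are hypothetical.

```python
import re

# Insert a comma at each position that has a digit before it and one or
# more complete groups of three digits between it and the end of the string.
# Because the lookbehind needs a digit, nothing is inserted after a '-'.
THOUSANDS = re.compile(r"(?<=\d)(?=(?:\d{3})+$)")


def comma_sep(z):
    return THOUSANDS.sub(",", z)


for z in ["75096042068045", "509", "-12024", "-7", "-2009"]:
    print(comma_sep(z))
```

Inputs with a decimal part (e.g. `"1234.5"`) are outside this sketch's assumption; the `$` anchor would look at the fractional digits instead.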

MRAB, Aug 15, 2009
16. ### ryles

On Aug 15, 6:28 pm, MRAB wrote:

> >      >>> for z in ["75096042068045", "509", "12024", "7", "2009"]:
> >            print re.sub(r"(?<=.)(?=(?:...)+$)", ",", z)
> >
> >     75,096,042,068,045
> >     509
> >     12,024
> >     7
> >     2,009
>
> The call replaces a zero-width match with a comma, ie inserts a comma,
> if certain conditions are met:
>
> "(?<=.)"
>      Look behind for 1 character. There must be at least one previous
> character. This ensures that a comma is never inserted at the start of
> the string. I could also have used "(?<!^)". Actually, it doesn't check
> whether the first character is a "-". That's left as an exercise for the
> reader.
>
> "(?=(?:...)+$)"
>      Look ahead for a multiple of 3 characters, followed by the end of
> the string.
Wow, well done. An exceptional recipe from Python's unofficial regex
guru. And thanks for sharing the explanation.

ryles, Aug 15, 2009
17. ### Gregor Lingl

Mark Tolonen wrote:
>
> "Gregor Lingl" wrote in message
> news:4a87036a$0$2292$...
>> Emile van Sebille wrote:
>>> On 8/14/2009 5:22 PM candide said...
>> ...
>>>> What is the pythonic way to do this ?
>>>
>>> I like list comps...
>>>
>>> >>> jj = '1234567890123456789'
>>> >>> ",".join([jj[ii:ii+3] for ii in range(0,len(jj),3)])
>>> '123,456,789,012,345,678,9'
>>> >>>
>>>
>>> Emile
>>>
>>
>> Less beautiful but more correct:
>>
>> >>> ",".join([jj[max(ii-3,0):ii] for ii in range(len(jj)%3,len(jj)+3,3)])
>> '1,234,567,890,123,456,789'
>>
>> Gregor
>
> Is it?
>
>>>> jj = '234567890123456789'
>>>> ",".join([jj[max(ii-3,0):ii] for ii in range(len(jj)%3,len(jj)+3,3)])
> ',234,567,890,123,456,789'

Gulp!

Even uglier:

>>> ",".join([jj[max(ii-3,0):ii] for ii in range(len(jj)%3,len(jj)+3,3)]).strip(",")
'234,567,890,123,456,789'
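Rather than stripping the stray comma afterwards, the range can start at `len(jj) % 3 or 3`, the same `or` trick used in the `chunks` generator at the end of this thread, so the first slice is never empty. A Python 3 sketch generalized to a group length `k` (not from the post; the name `comma_sep` is hypothetical):

```python
def comma_sep(jj, k=3, sep=","):
    """Group jj into k-character pieces aligned to its end."""
    # ii runs over the slice *end* positions. The first end is len(jj) % k,
    # or k when that remainder is 0 (the case that produced the stray comma),
    # so the first slice is never empty.
    first_end = len(jj) % k or k
    return sep.join(jj[max(ii - k, 0):ii]
                    for ii in range(first_end, len(jj) + 1, k))
```

Using `len(jj) + 1` as the stop is enough to include the final end position `len(jj)`; for the empty string the range is empty and the result is `''`.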

Gregor

>
> At least one other solution in this thread had the same problem.
>
> -Mark
>
>

Gregor Lingl, Aug 16, 2009
18. ### Simon Forman

On Aug 14, 8:22 pm, candide wrote:
> Suppose you need to split a string into substrings of a given size (except
> possibly the last substring). I make the hypothesis the first slice is at the
> end of the string.
> A typical example is provided by formatting a decimal string with thousands
> separator.
>
> What is the pythonic way to do this ?
>
> For my part, i reach to this rather complicated code:
>
> # ----------------------
>
> def comaSep(z,k=3, sep=','):
>     z=z[::-1]
>     x=[z[k*i:k*(i+1)][::-1] for i in range(1+(len(z)-1)/k)][::-1]
>     return sep.join(x)
>
> # Test
> for z in ["75096042068045", "509", "12024", "7", "2009"]:
>     print z+" --> ", comaSep(z)
>
> # ----------------------
>
> outputting :
>
> 75096042068045 -->  75,096,042,068,045
> 509 -->  509
> 12024 -->  12,024
> 7 -->  7
> 2009 -->  2,009
>
> Thanks

FWIW:

def chunks(s, length=3):
    stop = len(s)
    start = stop - length
    while start > 0:
        yield s[start:stop]
        stop, start = start, start - length
    yield s[:stop]

s = '1234567890'
print ','.join(reversed(list(chunks(s))))
# prints '1,234,567,890'

Simon Forman, Aug 16, 2009
19. ### Gregor Lingl

Simon Forman wrote:
> On Aug 14, 8:22 pm, candide wrote:
>> Suppose you need to split a string into substrings of a given size (except
>> possibly the last substring). I make the hypothesis the first slice is at the
>> end of the string.
>> A typical example is provided by formatting a decimal string with thousands
>> separator.
>>
>> What is the pythonic way to do this ?
>>
....
>> Thanks
>
> FWIW:
>
> def chunks(s, length=3):
>     stop = len(s)
>     start = stop - length
>     while start > 0:
>         yield s[start:stop]
>         stop, start = start, start - length
>     yield s[:stop]
>
>
> s = '1234567890'
> print ','.join(reversed(list(chunks(s))))
> # prints '1,234,567,890'

or:

>>> def chunks(s, length=3):
    i, j = 0, len(s) % length or length
    while i < len(s):
        yield s[i:j]
        i, j = j, j + length

>>> print(','.join(list(chunks(s))))
1,234,567,890
>>> print(','.join(list(chunks(s,2))))
12,34,56,78,90
>>> print(','.join(list(chunks(s,4))))
12,3456,7890

Regards,
Gregor

Gregor Lingl, Aug 18, 2009