Splitting a string into substrings of equal size

candide

Suppose you need to split a string into substrings of a given size (except
possibly the last substring). I assume the slicing starts from the end of the
string, so the short substring, if any, ends up at the front.
A typical example is formatting a decimal string with a thousands separator.


What is the pythonic way to do this?


For my part, I came up with this rather complicated code:


# ----------------------

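def comaSep(z, k=3, sep=','):
    # Reverse so that groups are counted from the end of the string.
    z = z[::-1]
    # Cut k-sized slices, un-reverse each one, then restore the group order.
    x = [z[k*i:k*(i+1)][::-1] for i in range(1 + (len(z)-1)//k)][::-1]
    return sep.join(x)

# Test
for z in ["75096042068045", "509", "12024", "7", "2009"]:
    print z+" --> ", comaSep(z)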

# ----------------------

outputting :

75096042068045 --> 75,096,042,068,045
509 --> 509
12024 --> 12,024
7 --> 7
2009 --> 2,009


Thanks
 
Gabriel Genellina

candide said:
Suppose you need to split a string into substrings of a given size (except
possibly the last substring). I make the hypothesis the first slice is at the
end of the string.
A typical example is provided by formatting a decimal string with thousands
separator.


What is the pythonic way to do this ?

py> import locale
py> locale.setlocale(locale.LC_ALL, '')
'Spanish_Argentina.1252'
py> locale.format("%d", 75096042068045, True)
'75.096.042.068.045'

:)
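
For readers who only need the thousands-separator case and do not want to touch the locale: the format mini-language in Python 2.7 and 3.1+ has a `,` option. A minimal sketch in Python 3 syntax, assuming the input is a string of decimal digits; the helper name `with_commas` is made up for illustration.

```python
def with_commas(digits):
    """Group a decimal digit string by thousands using the ',' format option."""
    # int() drops leading zeros, so this suits numbers, not arbitrary strings.
    return format(int(digits), ",")

for z in ["75096042068045", "509", "12024", "7", "2009"]:
    print(z, "-->", with_commas(z))
```

This always groups by three and always uses a comma, so it does not answer the general "substrings of size k" question. Note also that `locale.format` has been removed from recent Python 3 releases; `locale.format_string("%d", n, grouping=True)` is the locale-aware spelling there.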
For my part, i reach to this rather complicated code:

Mine isn't very simple either:

py> def genparts(z):
...     n = len(z)
...     i = n%3
...     if i: yield z[:i]
...     for i in xrange(i, n, 3):
...         yield z[i:i+3]
...
py> ','.join(genparts("75096042068045"))
'75,096,042,068,045'
 
Jan Kaliszewski

15-08-2009 candide said:
Suppose you need to split a string into substrings of a given size (except
possibly the last substring). I make the hypothesis the first slice is at the
end of the string.
A typical example is provided by formatting a decimal string with thousands
separator.

I'd use iterators, especially for longer strings...


import itertools

def separate(text, grouplen=3, sep=','):
    "separate('12345678') -> '123,456,78'"
    repeated_iterator = [iter(text)] * grouplen
    groups = itertools.izip_longest(fillvalue='', *repeated_iterator)
    strings = (''.join(group) for group in groups)  # gen. expr.
    return sep.join(strings)

def back_separate(text, grouplen=3, sep=','):
    "back_separate('12345678') -> '12,345,678'"
    repeated_iterator = [reversed(text)] * grouplen
    groups = itertools.izip_longest(fillvalue='', *repeated_iterator)
    strings = [''.join(reversed(group)) for group in groups]  # list compr.
    return sep.join(reversed(strings))

print separate('12345678')
print back_separate('12345678')

# alternate implementation
# (without "materializing" 'strings' as a list in back_separate):
def separate(text, grouplen=3, sep=','):
    "separate('12345678') -> '123,456,78'"
    textlen = len(text)
    end = textlen - (textlen % grouplen)
    repeated_iterator = [iter(itertools.islice(text, 0, end))] * grouplen
    strings = itertools.imap(lambda *chars: ''.join(chars),
                             *repeated_iterator)
    # Caveat: text[end:] is '' when textlen is a multiple of grouplen,
    # which leaves a trailing separator.
    return sep.join(itertools.chain(strings, (text[end:],)))

def back_separate(text, grouplen=3, sep=','):
    "back_separate('12345678') -> '12,345,678'"
    beg = len(text) % grouplen
    repeated_iterator = [iter(itertools.islice(text, beg, None))] * grouplen
    strings = itertools.imap(lambda *chars: ''.join(chars),
                             *repeated_iterator)
    # Caveat: text[:beg] is '' when beg == 0, which leaves a leading separator.
    return sep.join(itertools.chain((text[:beg],), strings))

print separate('12345678')
print back_separate('12345678')


http://docs.python.org/library/itertools.html#recipes
was the inspiration for me (especially grouper).

Cheers,
*j
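
For anyone reading this on Python 3: `izip_longest` is spelled `zip_longest` there. Below is a rough port of the grouper-style `back_separate` from this post, as a sketch only; it keeps the same idea of repeating one reversed iterator `grouplen` times.

```python
from itertools import zip_longest

def back_separate(text, grouplen=3, sep=","):
    """back_separate('12345678') -> '12,345,678'"""
    # The same iterator repeated grouplen times: zip_longest pulls grouplen
    # characters per tuple, walking the text from its end.
    chars = [reversed(text)] * grouplen
    groups = zip_longest(*chars, fillvalue="")
    # Each tuple holds its characters in reverse order, so flip it back.
    strings = ["".join(reversed(group)) for group in groups]
    # The groups themselves were produced last-to-first; restore their order.
    return sep.join(reversed(strings))

print(back_separate("12345678"))
```

Because the padding comes from `fillvalue=""`, an input whose length is an exact multiple of `grouplen` does not pick up a stray separator.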
 
Jan Kaliszewski

15-08-2009 Jan Kaliszewski said:
I'd use iterators, especially for longer strings...


import itertools
[snip]

Err... It's too late for coding... Now I see an obvious and simpler variant:

def separate(text, grouplen=3, sep=','):
    "separate('12345678') -> '123,456,78'"
    textlen = len(text)
    end = textlen - (textlen % grouplen)
    strings = (text[i:i+grouplen] for i in xrange(0, end, grouplen))
    # Caveat: text[end:] is '' when textlen is a multiple of grouplen,
    # which leaves a trailing separator.
    return sep.join(itertools.chain(strings, (text[end:],)))

def back_separate(text, grouplen=3, sep=','):
    "back_separate('12345678') -> '12,345,678'"
    textlen = len(text)
    beg = textlen % grouplen
    strings = (text[i:i+grouplen] for i in xrange(beg, textlen, grouplen))
    # Caveat: text[:beg] is '' when beg == 0, which leaves a leading separator.
    return sep.join(itertools.chain((text[:beg],), strings))

print separate('12345678')
print back_separate('12345678')
 
Rascal

I'm bored for posting this, but here it is:

def add_commas(str):
    str_list = list(str)
    str_len = len(str)
    for i in range(3, str_len, 3):
        str_list.insert(str_len - i, ',')
    return ''.join(str_list)
 
Jan Kaliszewski

On 15-08-2009 at 08:08:14 Rascal said:
I'm bored for posting this, but here it is:

def add_commas(str):
    str_list = list(str)
    str_len = len(str)
    for i in range(3, str_len, 3):
        str_list.insert(str_len - i, ',')
    return ''.join(str_list)

For short strings (surely the most common case) it's OK: simple and clear.
But for huge ones, it's better not to materialize an additional list for the
string; pure-iterator solutions (like Gabriel's or mine) would be better
then.

Cheers,
*j
 
Emile van Sebille

On 8/14/2009 5:22 PM candide said...
Suppose you need to split a string into substrings of a given size (except
possibly the last substring). I make the hypothesis the first slice is at the
end of the string.
A typical example is provided by formatting a decimal string with thousands
separator.


What is the pythonic way to do this ?

I like list comps...
>>> jj = '1234567890123456789'
>>> ",".join([jj[ii:ii+3] for ii in range(0,len(jj),3)])
'123,456,789,012,345,678,9'
>>>

Emile
 
Gregor Lingl

What is the pythonic way to do this ?


For my part, i reach to this rather complicated code:


# ----------------------

def comaSep(z,k=3, sep=','):
    z=z[::-1]
    x=[z[k*i:k*(i+1)][::-1] for i in range(1+(len(z)-1)/k)][::-1]
    return sep.join(x)

# Test
for z in ["75096042068045", "509", "12024", "7", "2009"]:
    print z+" --> ", comaSep(z)

In case you are interested, here is a recursive solution:

def comaSep(z, k=3, sep=','):
    return comaSep(z[:-3],k,sep)+sep+z[-3:] if len(z)>3 else z

Gregor
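
The one-liner above hard-codes a group size of 3 even though the signature carries `k`. A small sketch of the same recursion with `k` actually used, in Python 3 syntax; the name `comma_sep` is only for illustration.

```python
def comma_sep(z, k=3, sep=","):
    """Recursively split z into groups of k characters, counted from the end."""
    if len(z) <= k:
        # Base case: what is left fits in one (possibly short) leading group.
        return z
    # Peel the last k characters off and recurse on the rest.
    return comma_sep(z[:-k], k, sep) + sep + z[-k:]

for z in ["75096042068045", "509", "12024", "7", "2009"]:
    print(z, "-->", comma_sep(z))
```

Each call peels off one group, so the recursion depth is about len(z)/k. With CPython's default recursion limit (around 1000 frames) that is fine for numbers but would fail on very long strings.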
 
Gregor Lingl

Emile said:
On 8/14/2009 5:22 PM candide said... ....
What is the pythonic way to do this ?

I like list comps...
jj = '1234567890123456789'
",".join([jj[ii:ii+3] for ii in range(0,len(jj),3)])
'123,456,789,012,345,678,9'

Emile

Less beautiful but more correct:
>>> ",".join([jj[max(ii-3,0):ii] for ii in
...           range(len(jj)%3,len(jj)+3,3)])
'1,234,567,890,123,456,789'

Gregor
 
Mark Tolonen

Gregor Lingl said:
Emile said:
On 8/14/2009 5:22 PM candide said... ...
What is the pythonic way to do this ?

I like list comps...
jj = '1234567890123456789'
",".join([jj[ii:ii+3] for ii in range(0,len(jj),3)])
'123,456,789,012,345,678,9'

Emile

Less beautiful but more correct:
",".join([jj[max(ii-3,0):ii] for ii in
          range(len(jj)%3,len(jj)+3,3)])
'1,234,567,890,123,456,789'

Gregor

Is it?
jj = '234567890123456789'
",".join([jj[max(ii-3,0):ii] for ii in range(len(jj)%3,len(jj)+3,3)])
',234,567,890,123,456,789'

At least one other solution in this thread had the same problem.

-Mark
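
One way to catch this kind of stray comma is to run a candidate over inputs of every length from 0 up, so that exact multiples of the group size are covered. A rough sketch in Python 3 syntax: `gregor` wraps the list comprehension quoted above, and `check` is a hypothetical helper, not something from the thread.

```python
def gregor(jj):
    # The list comprehension quoted above, wrapped in a function.
    return ",".join([jj[max(ii - 3, 0):ii]
                     for ii in range(len(jj) % 3, len(jj) + 3, 3)])

def check(func, maxlen=12):
    """Return the inputs for which func emits a leading or trailing comma."""
    digits = "1234567890"
    bad = []
    for n in range(maxlen + 1):
        s = (digits * 2)[:n]          # test string of length n
        out = func(s)
        if out.startswith(",") or out.endswith(","):
            bad.append(s)
    return bad

print(check(gregor))
```

The same `check` can be pointed at any other function from this thread that takes a string and returns the separated result.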
 
ryles

Suppose you need to split a string into substrings of a given size (except
possibly the last substring). I make the hypothesis the first slice is at the
end of the string.
A typical example is provided by formatting a decimal string with thousands
separator.

What is the pythonic way to do this ?


py> import re
py> s='1234567'
py> ','.join(_[::-1] for _ in re.findall('.{1,3}',s[::-1])[::-1])
'1,234,567'
py> # j/k ;)
 
MRAB

ryles said:
Suppose you need to split a string into substrings of a given size (except
possibly the last substring). I make the hypothesis the first slice is at the
end of the string.
A typical example is provided by formatting a decimal string with thousands
separator.

What is the pythonic way to do this ?


py> s='1234567'
py> ','.join(_[::-1] for _ in re.findall('.{1,3}',s[::-1])[::-1])
'1,234,567'
py> # j/k ;)

If you're going to use re, then:
>>> for z in ["75096042068045", "509", "12024", "7", "2009"]:
...     print re.sub(r"(?<=.)(?=(?:...)+$)", ",", z)
...
75,096,042,068,045
509
12,024
7
2,009
 
MRAB

Brian said:
On Sat, Aug 15, 2009 at 4:06 PM, MRAB wrote:


If you're going to use re, then:

for z in ["75096042068045", "509", "12024", "7", "2009"]:
    print re.sub(r"(?<=.)(?=(?:...)+$)", ",", z)



75,096,042,068,045
509
12,024
7
2,009


Can you please break down this regex?

The call replaces a zero-width match with a comma, i.e. inserts a comma,
if certain conditions are met:

"(?<=.)"
Look behind for 1 character. There must be at least one previous
character. This ensures that a comma is never inserted at the start of
the string. I could also have used "(?<!^)". Actually, it doesn't check
whether the first character is a "-". That's left as an exercise for the
reader. :)

"(?=(?:...)+$)"
Look ahead for a multiple of 3 characters, followed by the end of
the string.
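
As for the exercise about a leading "-": one possible answer, offered only as a sketch, is to require digits on both sides of the insertion point instead of arbitrary characters. The names `GROUPING` and `group_digits` are made up here, and the code is Python 3 syntax.

```python
import re

# (?<=\d)          a digit must precede the insertion point (so not right after '-')
# (?=(?:\d{3})+$)  a positive multiple of three digits must run to the end
GROUPING = re.compile(r"(?<=\d)(?=(?:\d{3})+$)")

def group_digits(z, sep=","):
    """Insert sep between groups of three digits, counted from the end."""
    return GROUPING.sub(sep, z)

for z in ["75096042068045", "-1234567", "-123", "7"]:
    print(z, "-->", group_digits(z))
```

This assumes the string ends with the digits to be grouped; a decimal point or trailing text would need a different lookahead.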
 
ryles

Wow, well done. An exceptional recipe from Python's unofficial regex
guru. And thanks for sharing the explanation.
 
Gregor Lingl

Mark said:
Gregor Lingl said:
Emile said:
On 8/14/2009 5:22 PM candide said... ...
What is the pythonic way to do this ?

I like list comps...

jj = '1234567890123456789'
",".join([jj[ii:ii+3] for ii in range(0,len(jj),3)])
'123,456,789,012,345,678,9'


Emile

Less beautiful but more correct:
",".join([jj[max(ii-3,0):ii] for ii in
range(len(jj)%3,len(jj)+3,3)])
'1,234,567,890,123,456,789'

Gregor

Is it?
jj = '234567890123456789'
",".join([jj[max(ii-3,0):ii] for ii in range(len(jj)%3,len(jj)+3,3)])
',234,567,890,123,456,789'

Gulp!

Even uglier:

",".join([jj[max(ii-3,0):ii] for ii in
range(len(jj)%3,len(jj)+3,3)]).strip(",")
'234,567,890,123,456,789'

Gregor
 
Simon Forman

Suppose you need to split a string into substrings of a given size (except
possibly the last substring). I make the hypothesis the first slice is at the
end of the string.
A typical example is provided by formatting a decimal string with thousands
separator.

What is the pythonic way to do this ?


FWIW:

def chunks(s, length=3):
    stop = len(s)
    start = stop - length
    while start > 0:
        yield s[start:stop]
        stop, start = start, start - length
    yield s[:stop]


s = '1234567890'
print ','.join(reversed(list(chunks(s))))
# prints '1,234,567,890'
 
Gregor Lingl

Simon said:

FWIW:

def chunks(s, length=3):
    stop = len(s)
    start = stop - length
    while start > 0:
        yield s[start:stop]
        stop, start = start, start - length
    yield s[:stop]


s = '1234567890'
print ','.join(reversed(list(chunks(s))))
# prints '1,234,567,890'

or:

def chunks(s, length=3):
    i, j = 0, len(s) % length or length
    while i < len(s):
        yield s[i:j]
        i, j = j, j + length
12,3456,7890

Regards,
Gregor
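
The second variant walks the string left to right, so no reversed() is needed when joining. Here it is as a self-contained sketch in Python 3 syntax (the thread itself is Python 2); the `length=4` call is only a guess at how a result like `12,3456,7890` comes about.

```python
def chunks(s, length=3):
    """Yield chunks left to right; only the first one may be short."""
    # The first chunk takes the remainder, or a full chunk when
    # len(s) divides evenly by length.
    i, j = 0, len(s) % length or length
    while i < len(s):
        yield s[i:j]
        i, j = j, j + length

s = "1234567890"
print(",".join(chunks(s)))
print(",".join(chunks(s, 4)))
```

An empty string yields no chunks at all, so the joined result is simply empty and no stray separator appears.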
 
