Discussion in 'Python' started by Mensanator, Oct 16, 2009.

1. ### MensanatorGuest

All I wanted to do is split a binary number into two lists,
a list of blocks of consecutive ones and another list of
blocks of consecutive zeroes.

But no, you can't do that.

>>> c = '0010000110'
>>> c.split('0')

['', '', '1', '', '', '', '11', '']

Ok, the consecutive delimiters appear as empty strings for
reasons unknown (except for the first one). Except when they
start or end the string in which case the first one is included.

Maybe there's a reason for this inconsistent behaviour but you
won't find it in the documentation.

And the re module doesn't help.

>>> import re
>>> f = '  1 2  3   4    '
>>> re.split(' ',f)

['', '', '1', '2', '', '3', '', '', '4', '', '', '', '']

OTOH, if my digits were separated by whitespace, I could use
str.split(), which behaves differently (but not re.split()
because it requires a string argument).

>>> ' 1  11   111 11    '.split()

['1', '11', '111', '11']

That means I can use re to solve my problem after all.

>>> c = '0010000110'
>>> re.sub('0',' ',c).split()

['1', '11']
>>> re.sub('1',' ',c).split()

['00', '0000', '0']

Would it have been that difficult to show in the documentation
how to do this?
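
A more direct route, not used in the post above, is to ask re for the runs themselves instead of splitting on the other digit. A minimal sketch, assuming Python 3 and an input that contains only '0' and '1' (the variable names are illustrative):

```python
import re

c = '0010000110'

# Each pattern matches one maximal run of a single digit.
ones = re.findall('1+', c)
zeros = re.findall('0+', c)

print(ones)   # ['1', '11']
print(zeros)  # ['00', '0000', '0']
```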

Mensanator, Oct 16, 2009

2. ### MelGuest

Mensanator wrote:

> All I wanted to do is split a binary number into two lists,
> a list of blocks of consecutive ones and another list of
> blocks of consecutive zeroes.
>
> But no, you can't do that.
>
>>>> c = '0010000110'
>>>> c.split('0')

> ['', '', '1', '', '', '', '11', '']

[ ... ]
> OTOH, if my digits were separated by whitespace, I could use
> str.split(), which behaves differently

Hmm. You could.

Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41)
[GCC 4.3.3] on linux2
>>> c = '0010000110'
>>> c1 = c.replace ('01', '0 1')
>>> c1 = c1.replace ('10', '1 0')
>>> c1.split()

['00', '1', '0000', '11', '0']

Mel.

Mel, Oct 16, 2009

3. ### Ishwor GurungGuest

Ishwor Gurung, Oct 16, 2009
4. ### John O'HaganGuest

On Fri, 16 Oct 2009, Mensanator wrote:
> All I wanted to do is split a binary number into two lists,
> a list of blocks of consecutive ones and another list of
> blocks of consecutive zeroes.

[...]
> That means I can use re to solve my problem after all.
>
> >>> c = '0010000110'
> >>> re.sub('0',' ',c).split()

>
> ['1', '11']
>
> >>> re.sub('1',' ',c).split()

>
> ['00', '0000', '0']
>

[...]

Or without resorting to re:

c.replace('0', ' ').split()
c.replace('1', ' ').split()

Three or four times faster, too!
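
The speed claim is easy to check locally. A minimal sketch using the standard timeit module, assuming Python 3; the repeat count is arbitrary and the ratio will vary with machine and Python version, so no timings are shown here:

```python
import re
import timeit

c = '0010000110'

# Both expressions must agree before their speed is compared.
via_re = re.sub('0', ' ', c).split()
via_replace = c.replace('0', ' ').split()
assert via_re == via_replace

n = 100000  # arbitrary repeat count
t_re = timeit.timeit(lambda: re.sub('0', ' ', c).split(), number=n)
t_replace = timeit.timeit(lambda: c.replace('0', ' ').split(), number=n)
print('re.sub:      %.3f s' % t_re)
print('str.replace: %.3f s' % t_replace)
```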

Regards,

John

John O'Hagan, Oct 16, 2009
5. ### Paul RubinGuest

Mensanator <> writes:
> And the re module doesn't help.
>
> >>> f = '  1 2  3   4    '
> >>> re.split(' ',f)

> ['', '', '1', '2', '', '3', '', '', '4', '', '', '', '']

filter(bool, re.split(' ', f))

You might also like:

from itertools import groupby
c = '0010000110'
print list(list(xs) for k,xs in groupby(c))
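
Both snippets above are written for Python 2. On Python 3, filter() returns an iterator and print is a function, so a rough equivalent, assuming a sample string f with runs of spaces, might look like this:

```python
import re
from itertools import groupby

f = '  1 2  3   4    '
# Drop the empty strings that re.split() yields for adjacent spaces.
fields = list(filter(bool, re.split(' ', f)))
print(fields)  # ['1', '2', '3', '4']

c = '0010000110'
# groupby() yields one (key, group) pair per run of equal characters.
runs = [list(xs) for k, xs in groupby(c)]
print(runs)
```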

Paul Rubin, Oct 16, 2009
6. ### Ishwor GurungGuest

2009/10/16 Paul Rubin <http://>:
[...]
> You might also like:
>
>    from itertools import groupby
>    c = '0010000110'
>    print list(list(xs) for k,xs in groupby(c))

Too bad groupby is only available in Python2.6+
Since you're here, any chance of getting your NDK team to look into
getting some small subset of STL, Boost into Android? That'd be
awesome thing you know.
--
Regards,
Ishwor Gurung

Ishwor Gurung, Oct 16, 2009
7. ### Ishwor GurungGuest

2009/10/16 Ishwor Gurung <>:
> 2009/10/16 Paul Rubin <http://>:
> [...]
>> You might also like:
>>
>>    from itertools import groupby
>>    c = '0010000110'
>>    print list(list(xs) for k,xs in groupby(c))

> Too bad groupby is only available in Python2.6+

OK. I stand corrected ;-)

> Since you're here, any chance of getting your NDK team to look into
> getting some small subset of STL, Boost into Android? That'd be
> awesome thing you know.

Yeah? Anything forthcoming in the releases to address this? thanks.
--
Regards,
Ishwor Gurung

Ishwor Gurung, Oct 16, 2009
8. ### Paul RubinGuest

Ishwor Gurung <> writes:
> Since you're here, any chance of getting your NDK team to look into
> getting some small subset of STL, Boost into Android? That'd be
> awesome thing you know.

My what who where? You are confusing me with someone else.

Paul Rubin, Oct 16, 2009
9. ### Ishwor GurungGuest

> My what who where?  You are confusing me with someone else.

Andy Rubin- http://en.wikipedia.org/wiki/Andy_Rubin

Sorry to bother you.
--
Regards,
Ishwor Gurung

Ishwor Gurung, Oct 16, 2009
10. ### ThomasGuest

On Oct 15, 9:18 pm, Mensanator <> wrote:
> All I wanted to do is split a binary number into two lists,
> a list of blocks of consecutive ones and another list of
> blocks of consecutive zeroes.
>
> But no, you can't do that.
>
> >>> c = '0010000110'
> >>> c.split('0')

>
> ['', '', '1', '', '', '', '11', '']
>
> Ok, the consecutive delimiters appear as empty strings for
> reasons unknown (except for the first one). Except when they
> start or end the string in which case the first one is included.
>
> Maybe there's a reason for this inconsistent behaviour but you
> won't find it in the documentation.
>
> And the re module doesn't help.
>
> >>> f = '  1 2  3   4    '
> >>> re.split(' ',f)

>
> ['', '', '1', '2', '', '3', '', '', '4', '', '', '', '']
>
> OTOH, if my digits were separated by whitespace, I could use
> str.split(), which behaves differently (but not re.split()
> because it requires a string argument).
>
> >>> ' 1  11   111 11    '.split()

>
> ['1', '11', '111', '11']
>
> That means I can use re to solve my problem after all.
>
> >>> c = '0010000110'
> >>> re.sub('0',' ',c).split()

> ['1', '11']
> >>> re.sub('1',' ',c).split()

>
> ['00', '0000', '0']
>
> Would it have been that difficult to show in the documentation
> how to do this?

PythonWin 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit
(Intel)] on win32.
>>> list('001010111100101')

['0', '0', '1', '0', '1', '0', '1', '1', '1', '1', '0', '0', '1', '0',
'1']
>>>

TC

Thomas, Oct 17, 2009
11. ### Carl BanksGuest

On Oct 15, 6:57 pm, Ishwor Gurung <> wrote:
> Too bad groupby is only available in Python2.6+
> Since you're here, any chance of getting your NDK team to look into
> getting some small subset of STL, Boost into Android?

Carl Banks

Carl Banks, Oct 17, 2009
12. ### MensanatorGuest

On Oct 16, 8:00 pm, Thomas <> wrote:
> On Oct 15, 9:18 pm, Mensanator <> wrote:
>
> > All I wanted to do is split a binary number into two lists,
> > a list of blocks of consecutive ones and another list of
> > blocks of consecutive zeroes.

>
> > But no, you can't do that.

>
> > >>> c = '0010000110'
> > >>> c.split('0')

>
> > ['', '', '1', '', '', '', '11', '']

>
> > Ok, the consecutive delimiters appear as empty strings for
> > reasons unknown (except for the first one). Except when they
> > start or end the string in which case the first one is included.

>
> > Maybe there's a reason for this inconsistent behaviour but you
> > won't find it in the documentation.

>
> > And the re module doesn't help.

>
> > >>> f = '  1 2  3   4    '
> > >>> re.split(' ',f)

>
> > ['', '', '1', '2', '', '3', '', '', '4', '', '', '', '']

>
> > OTOH, if my digits were separated by whitespace, I could use
> > str.split(), which behaves differently (but not re.split()
> > because it requires a string argument).

>
> > >>> ' 1  11   111 11    '.split()

>
> > ['1', '11', '111', '11']

>
> > That means I can use re to solve my problem after all.

>
> > >>> c = '0010000110'
> > >>> re.sub('0',' ',c).split()

> > ['1', '11']
> > >>> re.sub('1',' ',c).split()

>
> > ['00', '0000', '0']

>
> > Would it have been that difficult to show in the documentation
> > how to do this?

>
> PythonWin 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit
> (Intel)] on win32.
> >>> list('001010111100101')
>
> ['0', '0', '1', '0', '1', '0', '1', '1', '1', '1', '0', '0', '1', '0',
> '1']

Thanks, but what I wanted was
['00','1','0','1','0','1111','00','1','0','1'].

>
>
>
> TC

Mensanator, Oct 17, 2009
13. ### Paul RubinGuest

Mensanator <> writes:
> Thanks, but what I wanted was
> ['00','1','0','1','0','1111','00','1','0','1'].

>>> c = '001010111100101'
>>> list(''.join(g) for k,g in groupby(c))

['00', '1', '0', '1', '0', '1111', '00', '1', '0', '1']

is really not that unnatural.
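
If the ones and zeros are still wanted in separate lists, as in the original question, the groupby key can be used to sort the runs. A minimal sketch, assuming Python 3; the helper name split_runs is made up for illustration:

```python
from itertools import groupby

def split_runs(bits):
    """Return (ones, zeros): the runs of '1' and the runs of '0' in bits."""
    ones, zeros = [], []
    for key, group in groupby(bits):
        run = ''.join(group)
        # key is the character shared by every element of this run.
        (ones if key == '1' else zeros).append(run)
    return ones, zeros

ones, zeros = split_runs('001010111100101')
print(ones)   # ['1', '1', '1111', '1', '1']
print(zeros)  # ['00', '0', '0', '00', '0']
```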

Paul Rubin, Oct 17, 2009
14. ### MensanatorGuest

On Oct 16, 11:41 pm, Paul Rubin <http://> wrote:
> Mensanator <> writes:
> > Thanks, but what I wanted was
> > ['00','1','0','1','0','1111','00','1','0','1'].

>
>     >>> c = '001010111100101'
>     >>> list(''.join(g) for k,g in groupby(c))
>     ['00', '1', '0', '1', '0', '1111', '00', '1', '0', '1']
>
> is really not that unnatural.

I thought someone else had suggested this solution earlier.
Oh yeah, some guy named Paul Rubin. At first, I thought I
needed to keep the 1's and 0's in separate lists, but now that
I see this example, I'm rethinking that.

Thanks, and thanks to Paul Rubin.

Mensanator, Oct 17, 2009
15. ### MensanatorGuest

On Oct 20, 1:51 pm, David C Ullrich <> wrote:
> On Thu, 15 Oct 2009 18:18:09 -0700, Mensanator wrote:
> > All I wanted to do is split a binary number into two lists, a list of
> > blocks of consecutive ones and another list of blocks of consecutive
> > zeroes.

>
> > But no, you can't do that.

>
> >>>> c = '0010000110'
> >>>> c.split('0')

> > ['', '', '1', '', '', '', '11', '']

>
> > Ok, the consecutive delimiters appear as empty strings for reasons
> > unknown (except for the first one). Except when they start or end the
> > string in which case the first one is included.

>
> > Maybe there's a reason for this inconsistent behaviour but you won't
> > find it in the documentation.

>
> Wanna bet? I'm not sure whether you're claiming that the behavior
> is not specified in the docs or the reason for it. The behavior
> certainly is specified. I conjecture you think the behavior itself
> is not specified,

The problem is that the docs give a single example

>>> '1,,2'.split(',')

['1','','2']

ignoring the special case of leading/trailing delimiters. Yes, if you
think it through, ',1,,2,'.split(',') should return ['','1','','2','']
for exactly the reasons you give.

Trouble is, we often find ourselves doing ' 1  2  '.split() which
returns ['1','2'].

I'm not saying either behaviour is wrong, it's just not obvious that
the one behaviour doesn't follow from the other and the documentation
could be a little clearer on this matter. It might make a bit more
sense to actually mention the split(sep) behavior that split() doesn't do.

> because your description of what's happening,
>
> "consecutive delimiters appear as empty strings for reasons
>
> > unknown (except for the first one). Except when they start or end the
> > string in which case the first one is included"

>
> is at best an awkward way to look at it. The delimiters
> are not appearing as empty strings.
>
> You're asking to split  '0010000110' on '0'.
> So you're asking for strings a, b, c, etc such that
>
> (*) '0010000110' = a + '0' + b + '0' + c + '0' + etc
>
> The sequence of strings you're getting as output satisfies
> (*) exactly; the first '' is what appears before the first
> delimiter, the second '' is what's between the first and
> second delimiters, etc.
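
The identity marked (*) can be checked directly. A minimal sketch, assuming Python 3: splitting on a given separator yields one more piece than there are separators, and joining the pieces with the separator restores the input.

```python
s = '0010000110'
sep = '0'

pieces = s.split(sep)
print(pieces)  # ['', '', '1', '', '', '', '11', '']

# One piece per gap: before the first separator, between each pair, after the last.
assert len(pieces) == s.count(sep) + 1
# (*) from the quoted explanation: rejoining on the separator restores the input.
assert sep.join(pieces) == s
```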

Mensanator, Oct 20, 2009
16. ### MensanatorGuest

On Oct 21, 2:46 pm, David C Ullrich <> wrote:
> On Tue, 20 Oct 2009 15:22:55 -0700, Mensanator wrote:
> > On Oct 20, 1:51 pm, David C Ullrich <> wrote:
> >> On Thu, 15 Oct 2009 18:18:09 -0700, Mensanator wrote:
> >> > All I wanted to do is split a binary number into two lists, a list of
> >> > blocks of consecutive ones and another list of blocks of consecutive
> >> > zeroes.

>
> >> > But no, you can't do that.

>
> >> >>>> c = '0010000110'
> >> >>>> c.split('0')
> >> > ['', '', '1', '', '', '', '11', '']

>
> >> > Ok, the consecutive delimiters appear as empty strings for reasons
> >> > unknown (except for the first one). Except when they start or end the
> >> > string in which case the first one is included.

>
> >> > Maybe there's a reason for this inconsistent behaviour but you won't
> >> > find it in the documentation.

>
> >> Wanna bet? I'm not sure whether you're claiming that the behavior is
> >> not specified in the docs or the reason for it. The behavior certainly
> >> is specified. I conjecture you think the behavior itself is not
> >> specified,

>
> > The problem is that the docs give a single example

>
> >>>> '1,,2'.split(',')

> > ['1','','2']

>
> > ignoring the special case of leading/trailing delimiters. Yes, if you
> > think it through, ',1,,2,'.split(',') should return ['','1','','2','']
> > for exactly the reasons you give.

>
> > Trouble is, we often find ourselves doing ' 1  2  '.split() which
> > returns
> > ['1','2'].

>
> > I'm not saying either behaviour is wrong, it's just not obvious that the
> > one behaviour doesn't follow from the other and the documentation could
> > be a little clearer on this matter. It might make a bit more sense to
> > actually mention the split(sep) behavior that split() doesn't do.

>
> Have you _read_ the docs?

Yes.

> They're quite clear on the difference
> between no sep (or sep=None) and sep=something:

I disagree that they are "quite clear". The first paragraph makes no
mention of leading or trailing delimiters and they show no example
of such usage. An example would at least force me to think about it
if it isn't specifically mentioned in the paragraph.

One could infer from the second paragraph that, as it doesn't return
empty strings from leading and trailing whitespace, split(sep) does
for leading/trailing delimiters. Of course, why would I even be
reading this paragraph when I'm trying to understand split(sep)?

The splitting of real strings is just as important, if not more so,
than the behaviour of splitting empty strings. Especially when the

>>> '010000110'.split('0')

['', '1', '', '', '', '11', '']

is a perfect example. It shows the empty strings generated from the
leading and trailing delimiters, and also that you get 3 empty
strings between the '1's, not 4. When creating documentation, it is
always a good idea to document such cases.

And you'll then want to compare this to the equivalent whitespace
case:
>>> ' 1    11 '.split()

['1', '11']

And it wouldn't hurt to point this out:
>>> c = '010000110'.split('0')
>>> '0'.join(c)

'010000110'

and note that it won't work with the whitespace version.

No, I have not submitted a request to change the documentation, I was
looking for some feedback here. And it seems that no one else
considers the documentation wanting.

>
> "If sep is given, consecutive delimiters are not grouped together and are
> deemed to delimit empty strings (for example, '1,,2'.split(',') returns
> ['1', '', '2']). The sep argument may consist of multiple characters (for
> example, '1<>2<>3'.split('<>') returns ['1', '2', '3']). Splitting an
> empty string with a specified separator returns [''].
>
> If sep is not specified or is None, a different splitting algorithm is
> applied: runs of consecutive whitespace are regarded as a single
> separator, and the result will contain no empty strings at the start or
> end if the string has leading or trailing whitespace. Consequently,
> splitting an empty string or a string consisting of just whitespace with
> a None separator returns []."
>
>
>
>
>
> >> because your description of what's happening,

>
> >> "consecutive delimiters appear as empty strings for reasons

>
> >> > unknown (except for the first one). Except when they start or end the
> >> > string in which case the first one is included"

>
> >> is at best an awkward way to look at it. The delimiters are not
> >> appearing as empty strings.

>
> >> You're asking to split  '0010000110' on '0'. So you're asking for
> >> strings a, b, c, etc such that

>
> >> (*) '0010000110' = a + '0' + b + '0' + c + '0' + etc

>
> >> The sequence of strings you're getting as output satisfies (*) exactly;
> >> the first '' is what appears before the first delimiter, the second ''
> >> is what's between the first and second delimiters, etc.

Mensanator, Oct 21, 2009
17. ### John YeungGuest

On Oct 21, 5:43 pm, Mensanator <> wrote:

> >>> '010000110'.split('0')

>
> ['', '1', '', '', '', '11', '']
>
> is a perfect example. It shows the empty strings
> generated from the leading and trailing delimiters,
> and also that you get 3 empty strings between the
> '1's, not 4. When creating documentation, it is
> always a good idea to document such cases.

It's documented. It's even in the example (that you cited yourself):

'1,,2'.split(',') returns ['1', '', '2']

There are two commas between the '1' and the '2', but "only" one empty
string between them. To me, it's obvious that

'1,,2'.split(',')

is equivalent to

'1002'.split('0')

> And you'll then want to compare this to the
> equivalent whitespace case:
>
> >>> ' 1    11 '.split()

> ['1', '11']

The documentation could not be more explicit that when the separator
is not specified or is None, it behaves very differently.

Have you tried to see what happens with

' 1    11 '.split(' ')

(Hint: The separator is (a kind of) white space... yet IS specified.)
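
For anyone who has not run it, the hint plays out as follows. A minimal sketch, assuming Python 3 and a sample string with four spaces between the digits:

```python
s = ' 1    11 '

# Explicit separator: every single space delimits, so empty strings appear.
with_sep = s.split(' ')
print(with_sep)     # ['', '1', '', '', '', '11', '']

# No separator: runs of whitespace count once and the ends are trimmed.
without_sep = s.split()
print(without_sep)  # ['1', '11']
```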

> I was looking for some feedback here.
> And it seems that no one else considers the
> documentation wanting.

This particular section of documentation, no. I have issues with some
of the documentation here and there; this is not one of those areas.

You kept using phrases in your arguments like "Yes, if you
think it through" and "An example would at least force me to think
about it". Um... are we not supposed to think?

John

John Yeung, Oct 22, 2009
18. ### Carl BanksGuest

On Oct 21, 12:46 pm, David C Ullrich <> wrote:
> On Tue, 20 Oct 2009 15:22:55 -0700, Mensanator wrote:
> > On Oct 20, 1:51 pm, David C Ullrich <> wrote:
> > I'm not saying either behaviour is wrong, it's just not obvious that the
> > one behaviour doesn't follow from the other and the documentation could
> > be a little clearer on this matter. It might make a bit more sense to
> > actually mention the split(sep) behavior that split() doesn't do.

>
> Have you _read_ the docs? They're quite clear on the difference
> between no sep (or sep=None) and sep=something:

Even if the docs do describe the behavior adequately, he has a point
that the documents should emphasize the counterintuitive split
personality of the method better.

s.split() and s.split(sep) do different things, and there is no string
sep that can make s.split(sep) behave like s.split(). That's not
unheard of but it does go against our typical expectations. It would
have been a better library design if s.split() and s.split(sep) were
different methods.

That they are the same method isn't the end of the world but the
documentation really ought to emphasize its dual nature.
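
To make the "dual nature" concrete, here is a sketch of how the two behaviours might look as separate functions. The names split_on and split_whitespace are hypothetical and are not part of the standard library; both simply wrap str.split, assuming Python 3:

```python
def split_on(s, sep):
    """Split on every occurrence of sep, keeping empty strings."""
    if sep is None:
        raise ValueError('split_on() needs an explicit separator')
    return s.split(sep)

def split_whitespace(s):
    """Split on runs of whitespace, dropping leading and trailing runs."""
    return s.split()

s = ' 1    11 '
print(split_on(s, ' '))     # ['', '1', '', '', '', '11', '']
print(split_whitespace(s))  # ['1', '11']
```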

Carl Banks

Carl Banks, Oct 22, 2009
19. ### Guest

On 10/21/2009 11:47 PM, Carl Banks wrote:
> On Oct 21, 12:46 pm, David C Ullrich <> wrote:
>> On Tue, 20 Oct 2009 15:22:55 -0700, Mensanator wrote:
>> > On Oct 20, 1:51 pm, David C Ullrich <> wrote:
>> > I'm not saying either behaviour is wrong, it's just not obvious that the
>> > one behaviour doesn't follow from the other and the documentation could
>> > be a little clearer on this matter. It might make a bit more sense to
>> > actually mention the split(sep) behavior that split() doesn't do.

>>
>> Have you _read_ the docs? They're quite clear on the difference
>> between no sep (or sep=None) and sep=something:

>
> Even if the docs do describe the behavior adequately, he has a point
> that the documents should emphasize the counterintuitive split
> personality of the method better.
>
> s.split() and s.split(sep) do different things, and there is no string
> sep that can make s.split(sep) behave like s.split(). That's not
> unheard of but it does go against our typical expectations. It would
> have been a better library design if s.split() and s.split(sep) were
> different methods.
>
> That they are the same method isn't the end of the world but the
> documentation really ought to emphasize its dual nature.

I would also offer that the example

'1,,2'.split(',') returns ['1', '', '2']

could be improved by including a sep instance at the
beginning or end of the string, like

'1,,2,'.split(',') returns ['1', '', '2', '']

since that illustrates another difference between the
sep and non-sep cases.

, Oct 22, 2009
20. ### MensanatorGuest

On Oct 21, 11:21 pm, John Yeung <> wrote:
> On Oct 21, 5:43 pm, Mensanator <> wrote:
>
> > >>> '010000110'.split('0')

>
> > ['', '1', '', '', '', '11', '']

>
> > is a perfect example. It shows the empty strings
> > generated from the leading and trailing delimiters,
> > and also that you get 3 empty strings between the
> > '1's, not 4. When creating documentation, it is
> > always a good idea to document such cases.

>
> It's documented.

What does 'it' refer to? A leading or trailing
delimiter? That's what _I_ was referring to.

> It's even in the example

No, it is not.

> (that you cited yourself):
>
>   '1,,2'.split(',') returns ['1', '', '2']
>
> There are two commas between the '1' and the '2', but "only" one empty
> string between them.  To me, it's obvious that
>
>   '1,,2'.split(',')
>
> is equivalent to
>
>   '1002'.split('0')

That wasn't what I was complaining about.

>
> > And you'll then want to compare this to the
> > equivalent whitespace case:

>
> > >>> ' 1    11 '.split()

> > ['1', '11']

>
> The documentation could not be more explicit that when the separator
> is not specified or is None, it behaves very differently.

I am not complaining that it behaves differently, but
the description of said difference could be better
explained.

>
> Have you tried to see what happens with
>
>   ' 1    11 '.split(' ')

Yes, I actually did that test.

>
> (Hint:  The separator is (a kind of) white space... yet IS specified.)

And yet doesn't behave like .split(). In other words,
when specified, whitespace does not behave like
whitespace. Is it any wonder I have a headache?

>
> > I was looking for some feedback here.
> > And it seems that no one else considers the
> > documentation wanting.

>
> This particular section of documentation, no.  I have issues with some
> of the documentation here and there; this is not one of those areas.
>
> You kept using phrases in your arguments like "Yes, if you
> think it through" and "An example would at least force me to think
> about it".  Um... are we not supposed to think?

No, you are not. Documentation isn't supposed to give
you hints so that you can work out the way things
behave. It should provide adequate explanation along
with unambiguous, complete examples. The thinking part
comes into play as you try to figure out how to apply
what you have just learned.

>
> John

Mensanator, Oct 22, 2009