Robin Munn
How is re.split supposed to work? This wasn't at all what I expected:
[rmunn@localhost ~]$ python
Python 2.2.2 (#1, Jan 12 2003, 12:07:20)
[GCC 3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.split(r'\W+', 'a b c d')
['a', 'b', 'c', 'd']
>>> # Expected result.
...
>>> re.split(r'\b', 'a b c d')
['a b c d']
>>> # Huh?
Since \b matches the empty string, but only at the beginning and end of
a word, I would have expected re.split(r'\b', 'a b c d') to produce
either:
['', 'a', ' ', 'b', ' ', 'c', ' ', 'd', '']
or:
['a', ' ', 'b', ' ', 'c', ' ', 'd']
But I didn't expect that re.split(r'\b', 'a b c d') would yield no splits
whatsoever. The module doc says "split(pattern, string[, maxsplit = 0]):
split string by the occurrences of pattern". re.findall() seems to think
that \b occurs eight times in 'a b c d':
>>> re.findall(r'\b', 'a b c d')
['', '', '', '', '', '', '', '']
So why doesn't re.split() think so? I'm puzzled.
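
To make the expectation concrete, here is a minimal sketch that finds the offsets where \b matches and cuts the string at those offsets by hand. It only illustrates the two results listed above and is not a description of how re.split() is implemented. The helper names boundary_positions and split_at_boundaries, and the keep_empty parameter, are made up for this example.

```python
import re

def boundary_positions(text):
    # Offsets of every zero-width \b match, found with re.finditer.
    return [m.start() for m in re.finditer(r'\b', text)]

def split_at_boundaries(text, keep_empty=True):
    # Hypothetical helper: cut the string at each \b offset by slicing,
    # instead of relying on how re.split() treats empty matches.
    edges = [0] + boundary_positions(text) + [len(text)]
    pieces = [text[a:b] for a, b in zip(edges, edges[1:])]
    if keep_empty:
        return pieces
    return [p for p in pieces if p]

print(boundary_positions('a b c d'))
# [0, 1, 2, 3, 4, 5, 6, 7]
print(split_at_boundaries('a b c d'))
# ['', 'a', ' ', 'b', ' ', 'c', ' ', 'd', '']
print(split_at_boundaries('a b c d', keep_empty=False))
# ['a', ' ', 'b', ' ', 'c', ' ', 'd']
```

The eight offsets agree with the eight empty strings that re.findall() reports, and the two calls reproduce the two lists I would have expected from re.split().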