Or for a slightly less simple-minded splitting you could try
re.split:
>>> re.split(r"(\w+)", "The quick brown fox jumps, and falls over.")[1::2]
['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', 'over']
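In case the [1::2] slice in that re.split call looks like magic: when the pattern has a capturing group, re.split() keeps the matched words in its result, interleaved with the separators. A minimal sketch using the same sentence (the variable names are just for illustration):

```python
import re

s = "The quick brown fox jumps, and falls over."

# With a capturing group, re.split() returns the separators AND the captured
# words, alternating: separator, word, separator, word, ..., separator.
parts = re.split(r"(\w+)", s)
print(parts)
# → ['', 'The', ' ', 'quick', ' ', 'brown', ' ', 'fox', ' ', 'jumps', ', ', 'and', ' ', 'falls', ' ', 'over', '.']

# Separators sit at even indices (possibly empty strings, e.g. at the very
# start), captured words at odd indices, so [1::2] keeps only the words.
words = parts[1::2]
print(words)
# → ['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', 'over']
```

The leading '' appears because the string starts with a word; the separator before it is empty, which keeps the words at odd indices regardless.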
Perhaps I'm missing something, but the above regex does the exact
same thing as line.split() except it is significantly slower and
harder to read.
Neither deals with quoted text, apostrophes, hyphens, punctuation or
any other details of real-world text. That's what I mean by
"simple-minded".
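To make the "neither deals with apostrophes or hyphens" point concrete, here's a quick sketch on a made-up sentence. The last pattern is just one possible tweak I'm assuming for illustration (it isn't from this thread), and it's still nowhere near a real tokenizer:

```python
import re

# Made-up sample sentence with a contraction, a hyphenated word and a quote.
text = "Don't stop: it's a well-known \"quoted\" phrase."

# Whitespace splitting leaves punctuation and quote marks glued to the words.
ws_tokens = text.split()
print(ws_tokens)
# → ["Don't", 'stop:', "it's", 'a', 'well-known', '"quoted"', 'phrase.']

# The \w+ split drops that punctuation, but it also breaks contractions and
# hyphenated words into pieces.
re_tokens = re.split(r"(\w+)", text)[1::2]
print(re_tokens)
# → ['Don', 't', 'stop', 'it', 's', 'a', 'well', 'known', 'quoted', 'phrase']

# Hypothetical tweak: allow a single apostrophe or hyphen between runs of
# word characters so "Don't" and "well-known" stay whole.
tokens = re.findall(r"\w+(?:['-]\w+)*", text)
print(tokens)
# → ["Don't", 'stop', "it's", 'a', 'well-known', 'quoted', 'phrase']
```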
>>> s = "The quick brown fox jumps, and falls over."
>>> import re
>>> re.split(r"(\w+)", s)[1::2]
['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', 'over']
>>> s.split()
['The', 'quick', 'brown', 'fox', 'jumps,', 'and', 'falls', 'over.']
Note the difference in "jumps" vs. "jumps," (extra comma in the
string.split() version) and likewise the period after "over".
Thus not quite "the exact same thing as line.split()".
I think an easier-to-read variant would be
>>> re.findall(r"\w+", s)
['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', 'over']
which just finds words. One could also limit it to letters with
re.findall(r"[a-zA-Z]+", s)
if that's a problem, as "\w" is a little more encompassing (letters,
digits and underscores).
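A quick sketch of where "\w+" and a letters-only class differ, on a made-up sentence with digits, an underscore and an accented letter. This assumes Python 3, where "\w" in a str pattern also matches non-ASCII word characters. Note the "+" on the character class: without it findall returns single letters rather than words.

```python
import re

# Made-up sample with digits, an underscore and a non-ASCII letter.
s = "Version 2 of my_module ships in 2024 with café support."

# \w+ keeps runs of letters, digits and underscores (Unicode-aware in Python 3).
w_tokens = re.findall(r"\w+", s)
print(w_tokens)
# → ['Version', '2', 'of', 'my_module', 'ships', 'in', '2024', 'with', 'café', 'support']

# [a-zA-Z]+ keeps only runs of ASCII letters: numbers vanish, the underscore
# splits "my_module", and the accented letter truncates "café".
letter_tokens = re.findall(r"[a-zA-Z]+", s)
print(letter_tokens)
# → ['Version', 'of', 'my', 'module', 'ships', 'in', 'with', 'caf', 'support']

# Without the "+", the class matches one character at a time.
print(re.findall(r"[a-zA-Z]", "fox"))
# → ['f', 'o', 'x']
```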
-tkc