how to split text into lines?

kj

In Perl, one can break a chunk of text into an array of lines while
preserving the trailing line-termination sequence in each line, if
any, by splitting the text on the regular expression /^/:

    DB<1> x split(/^/, "foo\nbar\nbaz")
    0 'foo
    '
    1 'bar
    '
    2 'baz'

But nothing like this seems to work in Python:

    re.split('^', 'foo\nbar\nbaz')
    ['foo\nbar\nbaz']

(One gets the same result if one adds the re.MULTILINE flag to the
re.split call.)

What's the Python idiom for splitting text into lines, preserving
the end-of-line sequence in each line?


Sorry, I should have googled this first. I just found splitlines()...

Still, for my own edification, is there a way to achieve the same
effect using re.split?

TIA!

kynn
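
On the re question, here is a minimal sketch of two regex-based ways to keep the terminators. The helper names (split_keepends_findall, split_keepends_resplit) are made up for illustration, and only '\n' is treated as a line ending. The second helper assumes Python 3.7 or later, where re.split started splitting on patterns that can match an empty string; the session quoted above appears to come from an older interpreter where it did not.

```python
import re

def split_keepends_findall(text):
    # Hypothetical helper: collect each line together with its newline.
    # First alternative: a (possibly empty) run of non-newlines plus '\n'.
    # Second alternative: a final non-empty run with no terminator.
    return re.findall(r'[^\n]*\n|[^\n]+', text)

def split_keepends_resplit(text):
    # Hypothetical helper, assumes Python 3.7+: split at every position
    # that is immediately preceded by a newline (a zero-width match).
    # A trailing newline yields one empty piece at the end, so drop empties.
    return [part for part in re.split(r'(?<=\n)', text) if part]

sample = 'foo\nbar\nbaz'
print(split_keepends_findall(sample))   # ['foo\n', 'bar\n', 'baz']
print(split_keepends_resplit(sample))   # ['foo\n', 'bar\n', 'baz']
```

Both keep blank lines as '\n' entries, so joining the pieces gives back the original text.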
 
alex23

kj said:
Sorry, I should have googled this first.  I just found splitlines()...

Still, for my own edification, is there a way to achieve the same
effect using re.split?

re.split(os.linesep, <string>) works the same as <string>.splitlines()

Neither retains the EOL for each line, though. The only way I'm aware
of is to re-add it:

[s+os.linesep for s in re.split(os.linesep, <string>)]

Was that what you were after?
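
A quick sketch of the re-add approach, to show where it differs from the input. The function name split_and_readd is hypothetical, and the separator is passed in as '\n' rather than taken from os.linesep, on the assumption that the text was read in text mode and so contains plain '\n' endings on every platform.

```python
import re

def split_and_readd(text, sep='\n'):
    # Hypothetical helper: split on the separator (escaped so it is
    # matched literally), then append it back onto every piece.
    return [piece + sep for piece in re.split(re.escape(sep), text)]

text = 'foo\nbar\nbaz'
readded = split_and_readd(text)
print(readded)            # ['foo\n', 'bar\n', 'baz\n']
print(text.splitlines())  # ['foo', 'bar', 'baz']

# The last line gains a terminator it never had, so the pieces
# no longer join back into the original text.
print(''.join(readded) == text)  # False
```

Text that already ends in a newline is affected too: splitting 'foo\n' this way produces an extra '\n' entry at the end.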
 
Chris

kj said:
Sorry, I should have googled this first. I just found splitlines()...

Still, for my own edification, is there a way to achieve the same
effect using re.split?

alex23 said:
re.split(os.linesep, <string>) works the same as <string>.splitlines()

Neither retains the EOL for each line, though. The only way I'm aware
of is to re-add it:

[s+os.linesep for s in re.split(os.linesep, <string>)]

Was that what you were after?

or what about 'string'.splitlines(True) as that retains newline
characters. ;)
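
For reference, a small example of the splitlines(True) call mentioned above. The sample string is made up, and mixes '\n' and '\r\n' endings to show that each line keeps whatever terminator it had.

```python
text = 'foo\nbar\r\nbaz'

# Passing True (the keepends argument) keeps each line's terminator.
kept = text.splitlines(True)
print(kept)                # ['foo\n', 'bar\r\n', 'baz']

# Without the argument the terminators are dropped.
print(text.splitlines())   # ['foo', 'bar', 'baz']

# With keepends, joining the pieces reproduces the original text.
print(''.join(kept) == text)  # True
```

One thing to keep in mind is that str.splitlines recognises more line boundaries than just '\n' (for example '\r' and form feed), so it is not an exact match for Perl's split(/^/, ...) on every input.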
 
alex23

Chris said:
or what about 'string'.splitlines(True) as that retains newline
characters. ;)

Okay, you win :)

Man, you'd think with the ease of object introspection I'd have at
least looked at its docstring :)

Cheers, Chris!
 
