How to prevent re.split() from removing part of string

Jeremy · Nov 30, 2009

I am using re.split to... well, split a string into sections. I want
to split when, following a new line, there are 4 or fewer spaces. The
pattern I use is:

sections = re.split('\n\s{,4}[^\s]', lineoftext)

This splits appropriately but I lose the character matched by [^\s]. I
know I can put parentheses around [^\s] and keep the matched character,
but the character is placed in its own element of the list instead of
with the rest of the lineoftext.

Does anyone know how I can accomplish this without losing the matched
character?

Thanks,
Jeremy

MRAB · Dec 1, 2009

Jeremy said:
I am using re.split to... well, split a string into sections. I want
to split when, following a new line, there are 4 or fewer spaces. The
pattern I use is:

sections = re.split('\n\s{,4}[^\s]', lineoftext)

This splits appropriately but I lose the character matched by [^\s]. I
know I can put parentheses around [^\s] and keep the matched character,
but the character is placed in its own element of the list instead of
with the rest of the lineoftext.

Does anyone know how I can accomplish this without losing the matched
character?

First of all, \s matches any character that's _whitespace_, such as
space, "\t", "\n", "\r", "\f". There's also \S, which matches any
character that's not whitespace.
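
A quick way to see the difference between the two classes, as a minimal sketch (the sample string and variable names are made up for illustration):

```python
import re

# Made-up sample containing a space, a tab and a newline.
sample = "a b\tc\nd"

# \s matches each whitespace character in turn.
whitespace = re.findall(r'\s', sample)

# \S matches each non-whitespace character.
non_whitespace = re.findall(r'\S', sample)

# The negated class [^\s] matches exactly the same characters as \S.
negated_class = re.findall(r'[^\s]', sample)

print(whitespace)
print(non_whitespace)
print(negated_class == non_whitespace)
```

Note that "\n" is in the whitespace list, so \s{,4} in a split pattern can match newlines as well as spaces.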

But in answer to your question, use a look-ahead:

sections = re.split(r'\n {,4}(?=\S)', lineoftext)
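
Here is a small sketch comparing the three approaches from this thread side by side. The sample text and variable names are invented for the example; it assumes sections start on lines indented by 4 or fewer spaces and that deeper-indented lines are continuations.

```python
import re

# Invented sample: "alpha", "beta" and "gamma" start sections,
# the lines indented by 8 spaces are continuation lines.
text = "alpha\n        more alpha\n  beta\n        more beta\ngamma"

# Original pattern: [^\s] consumes the first character of each new section,
# so that character is dropped from the results.
lost = re.split(r'\n\s{,4}[^\s]', text)

# Capture group: the character is kept, but as its own list element.
separate = re.split(r'\n\s{,4}([^\s])', text)

# Look-ahead: (?=\S) only checks that a non-whitespace character follows;
# it is not part of the match, so it stays with the section text.
kept = re.split(r'\n {,4}(?=\S)', text)

print(lost)
print(separate)
print(kept)
```

The look-ahead version also uses a literal space instead of \s, so the quantifier counts only spaces and cannot swallow tabs or extra newlines.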

Jeremy · Dec 1, 2009

MRAB said:
First of all, \s matches any character that's _whitespace_, such as
space, "\t", "\n", "\r", "\f". There's also \S, which matches any
character that's not whitespace.

Thanks for the reminder. I knew \S existed, but must have forgotten
about it.

MRAB said:
But in answer to your question, use a look-ahead:

sections = re.split('\n {,4}(?=\S)', lineoftext)

Yep, that does the trick. Thanks for the help!

Jeremy
