How to prevent re.split() from removing part of string

Jeremy

I am using re.split to... well, split a string into sections. I want
to split when, following a new line, there are 4 or fewer spaces. The
pattern I use is:

sections = re.split(r'\n\s{,4}[^\s]', lineoftext)

This splits appropriately, but I lose the character matched by [^\s]. I
know I can put parentheses around [^\s] and keep the matched character,
but the character is placed in its own element of the list instead of
with the rest of the lineoftext.

Does anyone know how I can accomplish this without losing the matched
character?

Thanks,
Jeremy
 
MRAB

First of all, \s matches any _whitespace_ character, such as
space, "\t", "\n", "\r" or "\f", so \s{,4} can match tabs and newlines
as well as spaces. There's also \S, which matches any character that's
not whitespace.
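
As a quick illustration of the point above (a sketch only; the sample string is made up and is not from the original post), \s{,4} will consume a second newline, whereas a literal space will not:

```python
import re

# Hypothetical sample: a blank line between two pieces of text.
text = "a\n\nb"

# \s{,4} treats the second newline as whitespace and swallows it.
m_ws = re.search(r'\n\s{,4}', text)

# A literal space matches only spaces, so the match ends after the first newline.
m_sp = re.search(r'\n {,4}', text)

print(repr(m_ws.group()))  # '\n\n'
print(repr(m_sp.group()))  # '\n'
```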

But in answer to your question, use a look-ahead:

sections = re.split(r'\n {,4}(?=\S)', lineoftext)
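
To make the difference concrete, here is a small sketch comparing the three variants discussed in this thread. The sample value of lineoftext is an assumption made up for illustration. The look-ahead (?=\S) only checks that a non-whitespace character follows, without consuming it, so the character stays with its section:

```python
import re

# Hypothetical sample: section starts have at most 4 leading spaces,
# while the continuation line is indented by 8 spaces.
lineoftext = "first\n        continued\nsecond\n  third"

# Original pattern: the first non-whitespace character is part of the
# match, so re.split() discards it.
lossy = re.split(r'\n\s{,4}[^\s]', lineoftext)

# Capturing group: the character is kept, but as its own list element.
captured = re.split(r'\n {,4}(\S)', lineoftext)

# Look-ahead: the character is asserted but not consumed.
sections = re.split(r'\n {,4}(?=\S)', lineoftext)

print(lossy)     # ['first\n        continued', 'econd', 'hird']
print(captured)  # ['first\n        continued', 's', 'econd', 't', 'hird']
print(sections)  # ['first\n        continued', 'second', 'third']
```

Note that the leading spaces matched by ` {,4}` are still removed from each section; only the character after them is preserved.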
 
Jeremy

Thanks for the reminder. I knew \S existed, but must have forgotten
about it.

Yep, that does the trick. Thanks for the help!

Jeremy
 
