How to prevent re.split() from removing part of string

Discussion in 'Python' started by Jeremy, Nov 30, 2009.

  1. Jeremy

    Jeremy Guest

    I am using re.split to... well, split a string into sections. I want
    to split when, following a new line, there are 4 or fewer spaces. The
    pattern I use is:

    sections = re.split('\n\s{,4}[^\s]', lineoftext)

    This splits appropriately, but I lose the character matched by [^\s]. I
    know I can put parentheses around [^\s] and keep the matched character,
    but the character is placed in its own element of the list instead of
    with the rest of the lineoftext.
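
The behaviour described above can be reproduced with a small sketch. The sample string is hypothetical (it is not from the original post), and the patterns are written as raw strings so that `\s` reaches the regex engine unchanged:

```python
import re

# Hypothetical sample text: sections start at lines indented by 4 or fewer spaces.
lineoftext = "alpha\n  beta\n        gamma\ndelta"

# The pattern from the post: the trailing [^\s] is part of the match,
# so re.split() drops the first character of each new section.
lost = re.split(r'\n\s{,4}[^\s]', lineoftext)
print(lost)      # ['alpha', 'eta\n        gamma', 'elta']

# With a capturing group, re.split() returns the captured character,
# but as a separate list element rather than attached to its section.
separate = re.split(r'\n\s{,4}([^\s])', lineoftext)
print(separate)  # ['alpha', 'b', 'eta\n        gamma', 'd', 'elta']
```
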

    Does anyone know how I can accomplish this without losing the matched
    character?

    Thanks,
    Jeremy
     
    Jeremy, Nov 30, 2009
    #1

  2. MRAB

    MRAB Guest


    First of all, \s matches any character that's _whitespace_, such as
    space, "\t", "\n", "\r", "\f". There's also \S, which matches any
    character that's not whitespace.
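
As a rough illustration of that point, here is a minimal sketch (the test strings are made up for the example) showing that `\s` also matches newlines, so `\s{,4}` can run past the line break where a literal space would stop:

```python
import re

# \s matches every one of these whitespace characters.
all_whitespace = re.findall(r'\s', " \t\n\r\f")
print(all_whitespace)       # [' ', '\t', '\n', '\r', '\x0c']

# \S matches only the non-whitespace characters.
non_whitespace = re.findall(r'\S', " a\tb\n")
print(non_whitespace)       # ['a', 'b']

# After the first newline, \s{,4} also consumes the second newline and the
# two spaces, while a literal space quantifier consumes nothing here.
with_backslash_s = re.match(r'\n\s{,4}', "\n\n  x").group()
with_literal_space = re.match(r'\n {,4}', "\n\n  x").group()
print(repr(with_backslash_s))    # '\n\n  '
print(repr(with_literal_space))  # '\n'
```
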

    But in answer to your question, use a look-ahead:

    sections = re.split('\n {,4}(?=\S)', lineoftext)
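
A short sketch of why the look-ahead helps: `(?=\S)` only checks that a non-whitespace character follows, without making it part of the match, so re.split() leaves it in the section. The sample string is hypothetical, and the pattern is written as a raw string, which avoids the invalid-escape warning that newer Python versions give for `\S` in an ordinary string literal:

```python
import re

# Hypothetical sample text: "gamma" is indented by 8 spaces, so it should
# stay inside the "beta" section rather than start a new one.
lineoftext = "alpha\n  beta\n        gamma\ndelta"

# The look-ahead is zero-width: only the newline and up to 4 spaces are
# consumed as the separator, and the first character of each section is kept.
sections = re.split(r'\n {,4}(?=\S)', lineoftext)
print(sections)  # ['alpha', 'beta\n        gamma', 'delta']
```
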
     
    MRAB, Dec 1, 2009
    #2

  3. Jeremy

    Jeremy Guest

    On Nov 30, 5:24 pm, MRAB wrote:
    > First of all, \s matches any character that's _whitespace_, such as
    > space, "\t", "\n", "\r", "\f". There's also \S, which matches any
    > character that's not whitespace.


    Thanks for the reminder. I knew \S existed, but must have forgotten
    about it.
    >
    > But in answer to your question, use a look-ahead:
    >
    >      sections = re.split('\n {,4}(?=\S)', lineoftext)


    Yep, that does the trick. Thanks for the help!

    Jeremy
     
    Jeremy, Dec 1, 2009
    #3
