remove unwanted parts from strings

Discussion in 'Perl Misc' started by bingster, Nov 5, 2004.

  1. bingster

    bingster Guest

    Hello,

    If there is a string like this:

    $test = 'a bc (B, M, D),d e (B, M),lfm (D)'

    how can I remove all the '(*.)' parts to make it something like:

    'a bc,d e,lfm'

    I tried:

    $test =~ s/\(*.\)//g;

    But the result is 'a bc (B, M, ,d e (B, ,lfm '.

    Thanks in advance for any help,

    bingster
    bingster, Nov 5, 2004
    #1

  2. Lars Eighner

    Lars Eighner Guest

    In our last episode,
    <>,
    the lovely and talented bingster
    broadcast on comp.lang.perl.misc:

    > Hello,
    >
    > If there is a string like this:
    >
    > $test = 'a bc (B, M, D),d e (B, M),lfm (D)'
    >
    > how can I remove all the '(*.)' parts to make it something like:
    >
    > 'a bc,d e,lfm'
    >
    > I tried:
    >
    > $test =~ s/\(*.\)//g;

    This doesn't do what you think. I think you have got *. where you
    meant .*, but even correcting that won't do what you think.

    As you have written it above, you are looking to match zero or more
    (s followed by any single character, followed by ). This will remove
    the (D) at the end of your string and all the )s with the character
    that precedes them.

    > But the result is 'a bc (B, M, ,d e (B, ,lfm '.


    Exactly.
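
    To see which substrings the original pattern actually picks up, a small sketch like the following may help. It collects every match of \(*.\) rather than only deleting it. The variable names @matches and $result are illustrative and not part of the original post.

```perl
use strict;
use warnings;

my $test = 'a bc (B, M, D),d e (B, M),lfm (D)';

# Collect every substring the original pattern matches.
my @matches = $test =~ /(\(*.\))/g;
print join('|', @matches), "\n";    # D)|M)|(D)

# Apply the substitution to a copy so $test stays intact.
(my $result = $test) =~ s/\(*.\)//g;
print "'$result'\n";                # 'a bc (B, M, ,d e (B, ,lfm '
```

    Only the last group is removed whole, because it is the only place where an optional ( is followed by exactly one character and then a ).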

    Now I don't know whether when you wrote *. it was a typo for .* or
    whether you are really confused about what * and . mean. But let's
    try it the other way, in case it was a typo.

    You may have meant:

    $test =~ s/\(.*\)//g;

    which says match on ( followed by zero or more of any character
    followed by ).

    But the result in this case would be 'a bc '. You see, REGULAR
    EXPRESSION QUANTIFIERS ARE GREEDY (write this in stone), which means
    that from the leftmost place a match can start, they will take as
    many characters as they can. And the biggest match
    here begins with the first ( and ends with the last ). But that
    is not what you want. You want to match the first ( and everything
    up to and including the first ), and then you want to match the
    second ( and everything up to and including the second ) and so
    forth.
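
    The greedy behaviour is easy to observe by capturing what \(.*\) matches before deleting it. This is only a sketch; $greedy and $result are names chosen for the illustration.

```perl
use strict;
use warnings;

my $test = 'a bc (B, M, D),d e (B, M),lfm (D)';

# The greedy .* runs to the end of the string and backs off only
# as far as the last ), so there is a single, very long match.
my ($greedy) = $test =~ /(\(.*\))/;
print "$greedy\n";       # (B, M, D),d e (B, M),lfm (D)

# Deleting that one match leaves only the text before the first (.
(my $result = $test) =~ s/\(.*\)//g;
print "'$result'\n";     # 'a bc '
```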

    So try this:

    $test =~ s/\([^)]*\)//g;

    This says, match a ( followed by zero or more characters that
    are not ) and then a ). Notice that you do not escape the ) in
    the square brackets because ) is not special in square brackets;
    the characters that are special in square brackets are -]\^$.

    This gives you:

    a bc ,d e ,lfm

    which isn't quite what you want because you want the space before
    each ( taken out too, if there is one, but it is a step in the right direction.
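
    Printing the result between quotes makes the leftover spaces visible, including one at the very end of the string. A minimal sketch, with @groups and $result as illustrative names:

```perl
use strict;
use warnings;

my $test = 'a bc (B, M, D),d e (B, M),lfm (D)';

# Each match now stops at the first ) it meets.
my @groups = $test =~ /(\([^)]*\))/g;
print join('|', @groups), "\n";   # (B, M, D)|(B, M)|(D)

# The space in front of each group is not part of the match,
# so it survives the substitution.
(my $result = $test) =~ s/\([^)]*\)//g;
print "'$result'\n";              # 'a bc ,d e ,lfm '
```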

    In order to remove that white space character if there is one, this
    will work (you may want to adjust it if you have more than one white
    space character or if you really only want to remove space characters
    and not any white space character):

    $test =~ s/\s?\([^)]*\)//g;

    This gives you:

    a bc,d e,lfm

    which is exactly what you asked for:

    > $test = 'a bc,d e,lfm'
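
    The adjustments mentioned above (more than one white space character, or space characters only) could look roughly like this. The input string is a made-up example with two spaces and a tab in front of the parentheses, and the variable names are illustrative.

```perl
use strict;
use warnings;

# Hypothetical input: two spaces before the first group, a tab
# before the second, a single space before the third.
my $input = "a bc  (B, M),d e\t(D),lfm (D)";

# \s? removes at most one white space character of any kind.
(my $one_ws = $input) =~ s/\s?\([^)]*\)//g;

# \s* removes a whole run of white space, whatever its length.
(my $any_ws = $input) =~ s/\s*\([^)]*\)//g;

# A literal space followed by * removes spaces only; the tab stays.
(my $spaces = $input) =~ s/ *\([^)]*\)//g;

print "'$one_ws'\n";   # 'a bc ,d e,lfm'
print "'$any_ws'\n";   # 'a bc,d e,lfm'
print "'$spaces'\n";   # 'a bc,d e<TAB>,lfm' with a real tab character
```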


    I believe there are other ways to make regular expressions less
    greedy, and perhaps some of them are better, but this makes sense
    to me.

    --
    Lars Eighner -finger for geek code- http://www.io.com/~eighner/
    If it wasn't for muscle spasms, I wouldn't get any exercise at all.
    Lars Eighner, Nov 5, 2004
    #2

  3. bingster wrote:
    > If there is a string like this:
    > $test = 'a bc (B, M, D),d e (B, M),lfm (D)'
    > how can I remove all the '(*.)' parts to make it something like:
    > 'a bc,d e,lfm'
    >
    > I tried:
    > $test =~ s/\(*.\)//g;
    > But the result is 'a bc (B, M, ,d e (B, ,lfm '.


    As others have pointed out, you probably don't want /*./ but rather /.*/.
    Add a non-greedy marker to the recipe and you are done:

    $_ = 'a bc (B, M, D),d e (B, M),lfm (D)';
    s/\(.*?\)//g;
    print;
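
    Run against the original string, the non-greedy version leaves the same spaces behind as the negated character class does, so an optional \s? in front is still needed to get exactly 'a bc,d e,lfm'. The sketch below shows both, plus one difference between the two approaches: without the /s modifier, . does not match a newline, while [^)] does. The multi-line string and all variable names are assumptions made for the illustration.

```perl
use strict;
use warnings;

my $test = 'a bc (B, M, D),d e (B, M),lfm (D)';

# Non-greedy: .*? stops at the first ) it can.
(my $lazy = $test) =~ s/\(.*?\)//g;
print "'$lazy'\n";       # 'a bc ,d e ,lfm '

# Same pattern with an optional leading white space character.
(my $lazy_ws = $test) =~ s/\s?\(.*?\)//g;
print "'$lazy_ws'\n";    # 'a bc,d e,lfm'

# Hypothetical input where the parentheses span a line break.
my $multi = "x (a\nb) y";
(my $dot = $multi) =~ s/\(.*?\)//g;     # no match, string unchanged
(my $neg = $multi) =~ s/\([^)]*\)//g;   # group removed across the newline
```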

    For further details please see "perldoc perlre", section "Regular
    Expressions", paragraph starting with
    "By default, a quantified subpattern is "greedy", that is, ... "

    jue
    Jürgen Exner, Nov 6, 2004
    #3
  4. Lars Eighner wrote:
    [convoluted way to create a non-greedy expression snipped]

    > I believe there are other ways to make regular expressions less
    > greedy, and perhaps some of them are better, but this makes sense
    > to me.


    Yep, there are. Just append a "?" to the quantifier, exactly as described in
    the very paragraph you started to quote.

    jue
    Jürgen Exner, Nov 6, 2004
    #4
  5. bingster

    bingster Guest

    Many thanks to all who replied. I've not been programming for a while.
    With Lars's lucid explanation, I started remembering a lot. Yeah, '*.'
    was my typo. With this problem resolved, I can move on now.

    Bing

    bingster, Nov 8, 2004
    #5
