remove unwanted parts from strings


bingster

Hello,

If there is a string like this:

$test = 'a bc (B, M, D),d e (B, M),lfm (D)'

how can I remove all the '(*.)' parts to make it something like:

'a bc,d e,lfm'

I tried:

$test =~ s/\(*.\)//g;

But the result is 'a bc (B, M, ,d e (B, ,lfm '.

Thanks in advance for any help,

bingster
 

Lars Eighner

In our last episode,
the lovely and talented bingster
broadcast on comp.lang.perl.misc:
If there is a string like this:
$test = 'a bc (B, M, D),d e (B, M),lfm (D)'
how can I remove all the '(*.)' parts to make it something like:
'a bc,d e,lfm'
$test =~ s/\(*.\)//g;

This doesn't do what you think. I think you have got *. where you
meant .*, but even correcting that won't do what you think.

As you have written it above, you are looking to match zero or more
(s followed by any single character, followed by ). This will remove
the (D) at the end of your string and all the )s with the character
that precedes them.

bingster said:
But the result is 'a bc (B, M, ,d e (B, ,lfm '.

Exactly.
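
To see which pieces the original pattern actually grabs, here is a small sketch (the variable names @matched and $result are just for illustration) that lists every match of \(*.\) and then applies the substitution:

```perl
use strict;
use warnings;

my $test = 'a bc (B, M, D),d e (B, M),lfm (D)';

# Capture every substring the original pattern matches:
# zero or more "(", then any one character, then ")".
my @matched = $test =~ /(\(*.\))/g;
print "matched: '$_'\n" for @matched;   # 'D)', 'M)', '(D)'

# Work on a copy so $test stays intact.
(my $result = $test) =~ s/\(*.\)//g;
print "result: '$result'\n";            # 'a bc (B, M, ,d e (B, ,lfm '
```

Only the last group is removed whole, because only there does a single character sit between ( and ).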

Now I don't know whether when you wrote *. it was a typo for .* or
whether you are really confused about what * and . mean. But let's
try it the other way, in case it was a typo.

You may have meant:

$test =~ s/\(.*\)//g;

which says match on ( followed by zero or more of any character
followed by ).

But the result in this case would be 'a bc '. You see REGULAR
EXPRESSION QUANTIFIERS ARE GREEDY BY DEFAULT (write this in stone),
which means .* will grab as much as it can while still letting the
whole pattern match. And the biggest match here begins with the
first ( and ends with the last ). But that is not what you want.
You want to match the first ( and everything up to and including
the first ), and then you want to match the second ( and everything
up to and including the second ) and so forth.
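
A quick way to check the greedy behaviour is to capture what \(.*\) matches before substituting. This is only a sketch; $span and $greedy are made-up names:

```perl
use strict;
use warnings;

my $test = 'a bc (B, M, D),d e (B, M),lfm (D)';

# The greedy .* runs from the first "(" to the last ")".
my ($span) = $test =~ /(\(.*\))/;
print "greedy match: '$span'\n";   # '(B, M, D),d e (B, M),lfm (D)'

# Removing that single long match leaves only the prefix.
(my $greedy = $test) =~ s/\(.*\)//g;
print "result: '$greedy'\n";       # 'a bc '
```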

So try this:

$test =~ s/\([^)]*\)//g;

This says, match a ( followed by zero or more characters that
are not ) and then a ). Notice that you do not escape the ) in
the square brackets because ) is not special in square brackets
- the characters that are special in square brackets are - ] \ ^
and [ (and $ or @ can still start a variable interpolation in Perl).

This gives you:

a bc ,d e ,lfm

which isn't quite what you want because you want the space in front
of each ( taken out too, if there is one, but it is a step in the
right direction.

In order to remove that white space character if there is one, this
will work (you may want to adjust it if you have more than one white
space character or if you really only want to remove space characters
and not any white space character):

$test =~ s/\s?\([^)]*\)//g;

This gives you:

a bc,d e,lfm

which is exactly what you asked for:
$test = 'a bc,d e,lfm'
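
For reuse, the final substitution can be wrapped in a small helper. This is a sketch under a couple of assumptions: strip_parens is a hypothetical name, and it uses \s* rather than \s? so that any run of whitespace before the ( is removed, which is one of the adjustments mentioned above. It does not handle nested parentheses.

```perl
use strict;
use warnings;

# Hypothetical helper: remove each "(...)" group plus any whitespace
# directly in front of it. Nested parentheses are not handled.
sub strip_parens {
    my ($text) = @_;
    # [^)]* stops at the first ")", so each group is removed on its own.
    $text =~ s/\s*\([^)]*\)//g;
    return $text;
}

my $test = 'a bc (B, M, D),d e (B, M),lfm (D)';
print strip_parens($test), "\n";   # a bc,d e,lfm
```

With \s? in place of \s* the result is the same for this string, since each group is preceded by exactly one space.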

I believe there are other ways to make regular expressions less
greedy, and perhaps some of them are better, but this makes sense
to me.
 

Jürgen Exner

bingster said:
If there is a string like this:
$test = 'a bc (B, M, D),d e (B, M),lfm (D)'
how can I remove all the '(*.)' parts to make it something like:
'a bc,d e,lfm'

I tried:
$test =~ s/\(*.\)//g;
But the result is 'a bc (B, M, ,d e (B, ,lfm '.

As others have pointed out, you probably don't want /*./ but rather /.*/
Add a non-greedy marker to the recipe and you are nearly done (the space
in front of each parenthesis is left behind):

$_ = 'a bc (B, M, D),d e (B, M),lfm (D)';
s/\(.*?\)//g;
print;

For further details please see "perldoc perlre", section "Regular
Expressions", paragraph starting with
"By default, a quantified subpattern is "greedy", that is, ... "
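
As a sketch of how the non-greedy form compares with the requested output (the names $lazy and $trimmed are only illustrative), the following runs the pattern with and without an optional leading whitespace character:

```perl
use strict;
use warnings;

my $test = 'a bc (B, M, D),d e (B, M),lfm (D)';

# Non-greedy: .*? stops at the first ")" it can, so each group
# is removed separately, but the space before it stays.
(my $lazy = $test) =~ s/\(.*?\)//g;
print "'$lazy'\n";      # 'a bc ,d e ,lfm '

# Adding \s? also removes one whitespace character before each "(".
(my $trimmed = $test) =~ s/\s?\(.*?\)//g;
print "'$trimmed'\n";   # 'a bc,d e,lfm'
```

One difference from the [^)]* form: without the /s modifier, . does not match a newline, so \(.*?\) will not remove a parenthesised group that spans lines, whereas \([^)]*\) will.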

jue
 

Jürgen Exner

Lars Eighner wrote:
[convoluted way to create a non-greedy expression snipped]
I believe there are other ways to make regular expressions less
greedy, and perhaps some of them are better, but this makes sense
to me.

Yep, there are. Just append a "?" to the quantifier, exactly as described in
the perlre paragraph on greedy quantifiers.

jue
 

bingster

Many thanks to all who replied. I've not been programming for a while.
With Lars's lucid explanation, I started remembering a lot. Yeah, '*.'
was my typo. With this problem resolved, I can move on now.

Bing

