group but do not capture

naren

Hi,

I need some help with regular expression parsing.

I have to capture part of a string but want to exclude some characters
from the captured group. For example, I have the string:
gnl|genbank|2398 this is a test gene

and would like to get genbank2398.

I have tried the following regex, but it doesn't work. Can anybody
help?

m/\|(\w+(?:\|)\d+)/

(?:\|), "group but do not capture |", is not working; I am getting
genbank|2398

Thanks in advance,
Naren.
 
Paul Lalli

naren said:

m/\|(\w+(?:\|)\d+)/

(?:\|), group but do not capture | , is not working, I am getting
genbank|2398


You're confused as to what (?:) does. It doesn't exclude whatever's in
the parens from an enclosing capture. It simply means that these
particular parentheses will not themselves capture any text into $1,
$2, $3, etc.

In your example, I would probably break it into two lines:

m/\|(\w+)\|(\d+)/;
$string = $1 . $2;

Paul Lalli
 
David K. Wall

naren said:

m/\|(\w+(?:\|)\d+)/

(?:\|), group but do not capture | , is not working, I am getting
genbank|2398

Actually, it is working, or $2 would be set to '|'.

You could capture only the parts you want and then concatenate them:

my $string = 'gnl|genbank|2398 this is a test gene';
my $result;
if ($string =~ /\w+\|(\w+)\|(\d+)/) {
    $result = $1 . $2;
}


or you could grab everything including the unwanted | and then remove it:

my $string = 'gnl|genbank|2398 this is a test gene';
my $result;
if ($string =~ /^\w+\|(\w+\|\d+)/) {
    ($result = $1) =~ s/\|//;
}

Or you could split() the string on the |s and then modify the pieces.
Whatever is most convenient....
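
For illustration, the split() route might look something like the sketch below. The variable names $db, $rest and $id are made up for this example, and it assumes the ID is the run of digits at the start of the third field:

```perl
use strict;
use warnings;

my $string = 'gnl|genbank|2398 this is a test gene';

# Split on a literal '|' into at most three fields:
# 'gnl', 'genbank', '2398 this is a test gene'. The first is discarded.
my (undef, $db, $rest) = split /\|/, $string, 3;

# Keep only the leading digits of the third field (assumed to be the ID).
my ($id) = $rest =~ /^(\d+)/;

my $result = $db . $id;
print "$result\n";   # genbank2398
```

Like the two approaches above, this still needs a second step to join the pieces, so it does not help if only a single capture is available.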

(and if I were Someone Who Must Not Be Named I'd write it using index() and
substr(), but that's far too painful....)
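
For the curious, a rough sketch of what the index()/substr() version could look like. It assumes both '|' characters and a following space are present (no handling of index() returning -1), and the names $first, $second and $space are only for this example:

```perl
use strict;
use warnings;

my $string = 'gnl|genbank|2398 this is a test gene';

my $first  = index($string, '|');               # position of the first '|'
my $second = index($string, '|', $first + 1);   # position of the second '|'
my $space  = index($string, ' ', $second + 1);  # assumed end of the ID field

# Text between the two '|' characters, then text between the second '|'
# and the following space.
my $db = substr($string, $first + 1, $second - $first - 1);
my $id = substr($string, $second + 1, $space - $second - 1);

my $result = $db . $id;
print "$result\n";   # genbank2398
```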
 
naren

Hi,

Thank you very much!!
I understand that we can get this with $1 and $2,
but the challenge I face is to get it in one step.
Basically, I feed this regex to a configuration file,
which uses it to parse the line; it can
only take $1, and it can't append $1 and $2.
That is why I considered using (?:\|), group but do
not capture. I haven't understood how this works??

But thanks for your feedback,

Naren.
 
Ben Morrow

[don't top-post]

naren said:
I understand that we can get this in $1 and $2, but the challenge I
faced is to get this in one step. Basically, I feed this regex to a
configuration file, which uses it to parse the line; it can only take
$1, and it can't append $1 and $2.

Can't be done. Each $N captures a contiguous sequence of characters
from the target string, so you can't get two sections from different
places into $1.

naren said:
That is why I considered using (?:\|), group but do not capture. I
haven't understood how this works??

No... () captures *everything* inside it. Even if some of the inside
is captured again. If you execute

"abc" =~ /(.(.).)/

then $1="abc" and $2="b": the "b" has been captured twice. If that had
been

"abc" =~ /(.(?:.).)/

then you would have $1="abc" still but no $2 as there's only one set
of capturing parens.
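
A small script along these lines can be used to check the behaviour described above. It is only a sketch: the array names @nested and @single are invented for the example, and it relies on a match in list context returning the captured groups in order:

```perl
use strict;
use warnings;

# In list context a successful match returns the captured groups in order.
my @nested = 'abc' =~ /(.(.).)/;     # two capturing groups
my @single = 'abc' =~ /(.(?:.).)/;   # one capturing group, one non-capturing

print scalar(@nested), " groups: @nested\n";   # 2 groups: abc b
print scalar(@single), " groups: @single\n";   # 1 groups: abc

# The pattern from the original question: (?:\|) still matches the '|',
# and because it sits inside the outer (), the '|' ends up in $1.
my ($id) = 'gnl|genbank|2398 this is a test gene' =~ /\|(\w+(?:\|)\d+)/;
print "$id\n";   # genbank|2398
```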

Ben
 
naren

Thanks!! Ben

 
