group but do not capture

naren · Feb 3, 2004

Hi,

I need some help with a regular expression parsing,

I have to group a string but want to exclude some characters from the
group, for example, I have a string :

gnl|genbank|2398 this is a test gene

would like to get genbank2398

I have tried following reg ex, but it doesn't work, can any body
help??

m/\|(\w+(?:\|)\d+)/

(?:\|), group but do not capture | , is not working, I am getting
genbank|2398

Paul Lalli · Feb 3, 2004

naren said:
Hi,

I need some help with a regular expression parsing,

I have to group a string but want to exclude some characters from the
group, for example, I have a string :

gnl|genbank|2398 this is a test gene

would like to get genbank2398

I have tried following reg ex, but it doesn't work, can any body
help??

m/\|(\w+(?:\|)\d+)/

(?:\|), group but do not capture | , is not working, I am getting
genbank|2398

You're confused as to what (?:) does. It doesn't exclude whatever's in
the parens from an enclosing capture. It simply means that these particular
parentheses will not capture any text for setting in $1, $2, $3, etc.
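
To make that concrete, here is a minimal sketch using the sample string from the original post. The variable names are only for illustration. It shows that the pipe stays in $1 either way, and that (?:...) only decides whether the inner group gets a number of its own.

```perl
use strict;
use warnings;

my $string = 'gnl|genbank|2398 this is a test gene';

# (?:\|) groups the pipe without creating its own capture, but the pipe
# still lies inside the outer capturing parens, so it is part of $1.
my ($with_noncapture) = $string =~ /\|(\w+(?:\|)\d+)/;

# Same pattern with a capturing inner group: $1 is unchanged,
# and the pipe additionally lands in $2.
my ($outer, $inner) = $string =~ /\|(\w+(\|)\d+)/;

print "$with_noncapture\n";   # genbank|2398
print "$outer $inner\n";      # genbank|2398 |
```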

In your example, I would probably break it to two lines:

m/\|(\w+)\|(\d+)/;
$string = $1 . $2;

Paul Lalli

David K. Wall · Feb 3, 2004

naren said:
I need some help with a regular expression parsing,

I have to group a string but want to exclude some characters from the
group, for example, I have a string :

gnl|genbank|2398 this is a test gene

would like to get genbank2398

I have tried following reg ex, but it doesn't work, can any body
help??

m/\|(\w+(?:\|)\d+)/

(?:\|), group but do not capture | , is not working, I am getting
genbank|2398

Actually, it is working, or $2 would be set to '|'.

You could capture only the parts you want and then concatenate them:

my $string = 'gnl|genbank|2398 this is a test gene';
my $result;
if ($string =~ /\w+\|(\w+)\|(\d+)/) {
    $result = $1 . $2;
}

or you could grab everything including the unwanted | and then remove it:

my $string = 'gnl|genbank|2398 this is a test gene';
my $result;
if ($string =~ /^\w+\|(\w+\|\d+)/) {
    ($result = $1) =~ s/\|//;
}

Or you could split() the string on the |s and then modify the pieces.
Whatever is most convenient....

(and if I were Someone Who Must Not Be Named I'd write it using index() and
substr(), but that's far too painful....)
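
The split() option is not spelled out above, so here is one possible sketch of it. It assumes the line always has at least three pipe-separated fields and that the third field starts with the numeric ID. The names $db, $rest and $id are made up for the example.

```perl
use strict;
use warnings;

my $string = 'gnl|genbank|2398 this is a test gene';

# Split on literal pipes; a limit of 3 keeps the rest of the line in one piece.
my (undef, $db, $rest) = split /\|/, $string, 3;

# Take the leading digits of the third field as the ID.
my ($id) = $rest =~ /^(\d+)/;

my $result = $db . $id;
print "$result\n";   # genbank2398
```

Real input would want a check that $db and $id are defined before concatenating them.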

naren · Feb 4, 2004

Hi,

Thank you very much!!
I understand that we can get this in $1 and $2,
but the challenge I face is to get this in one step.
Basically, I feed this regex to a configuration file,
which uses it to parse the line; it can
only take $1, it can't append $1 and $2.
That is why I considered using (?:\|), group but do
not capture. I haven't understood how this works.

But thanks for your feedback,

Naren.

Ben Morrow · Feb 5, 2004

[don't top-post]

naren said:
I understand that we can get this in $1 and $2, but the challenge I
faced is to get this in one step, basically I feed this regex to a
configuration file, which will use this regex to parse the line, it
can only take $1, it can't append $1 and $2.

Can't be done. Each $N captures a contiguous sequence of characters
from the target string, so you can't get two sections from different
places into $1.
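
One way to see the contiguity for yourself is to look at the offsets Perl records for each group in the built-in @- and @+ arrays. A small sketch, reusing the sample string from earlier in the thread (the variable names are illustrative only):

```perl
use strict;
use warnings;

my $string = 'gnl|genbank|2398 this is a test gene';
my ($start, $end, $slice, $captured);

if ($string =~ /\|(\w+\|\d+)/) {
    $captured = $1;
    # $-[1] and $+[1] are the start and end offsets of capture group 1.
    ($start, $end) = ($-[1], $+[1]);
    # $1 is exactly the substring between those two offsets.
    $slice = substr($string, $start, $end - $start);
    print "offsets $start..$end\n";                          # offsets 4..16
    print $slice eq $captured ? "same text\n" : "different\n"; # same text
}
```

Since a group is described by a single start and a single end, there is no way for it to skip the pipe in the middle.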

naren said:
That is why I considered to use (?:\|), group but do not capture, I
haven't understood how this works??

No... () captures *everything* inside it. Even if some of the inside
is captured again. If you execute

"abc" =~ /(.(.).)/

then $1="abc" and $2="b": the "b" has been captured twice. If that had
been

"abc" =~ /(.(?:.).)/

then you would have $1="abc" still but no $2 as there's only one set
of capturing parens.
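
The two matches above can be run as a short script. In list context a match returns its captures, so counting them shows the difference directly (the array names are just for this sketch):

```perl
use strict;
use warnings;

# Nested capturing parens: the inner group captures "b" a second time.
my @nested = "abc" =~ /(.(.).)/;

# Inner group made non-capturing: only one capture comes back.
my @single = "abc" =~ /(.(?:.).)/;

print scalar(@nested), ": @nested\n";   # 2: abc b
print scalar(@single), ": @single\n";   # 1: abc
```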

Ben

naren · Feb 5, 2004

Thanks!! Ben

