group but do not capture

Discussion in 'Perl Misc' started by naren, Feb 3, 2004.

  1. naren

    naren Guest

    Hi,

    I need some help with regular expression parsing.

    I have to group a string but want to exclude some characters from the
    group. For example, I have a string:

    >gnl|genbank|2398 this is a test gene


    I would like to get genbank2398.

    I have tried the following regex, but it doesn't work. Can anybody
    help?

    m/\|(\w+(?:\|)\d+)/

    I used (?:\|) ("group but do not capture") for the |, but it is not
    working; I am still getting genbank|2398.

    Thanks in advance,
    Naren.
    naren, Feb 3, 2004
    #1

  2. Paul Lalli

    Paul Lalli Guest

    On Tue, 3 Feb 2004, naren wrote:

    > Hi,
    >
    > I need some help with regular expression parsing.
    >
    > I have to group a string but want to exclude some characters from the
    > group. For example, I have a string:
    >
    > >gnl|genbank|2398 this is a test gene
    >
    > I would like to get genbank2398.
    >
    > I have tried the following regex, but it doesn't work. Can anybody
    > help?
    >
    > m/\|(\w+(?:\|)\d+)/
    >
    > I used (?:\|) ("group but do not capture") for the |, but it is not
    > working; I am still getting genbank|2398.
    >



    You're confused about what (?:) does. It doesn't exclude whatever's in
    the parens from being captured. It simply means that these particular
    parentheses don't capture any text of their own into $1, $2, $3, etc.
    Whatever they match is still part of any enclosing capturing group.
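    A minimal Perl sketch of this, using the string and regex from the
    question (the variable names are illustrative): the (?:\|) group still
    matches the |, and because it sits inside the capturing parentheses the |
    lands in $1. Turning it into a capturing (\|) only adds a separate $2.

```perl
use strict;
use warnings;

my $line = '>gnl|genbank|2398 this is a test gene';

# Regex from the question: (?:\|) groups the pipe but creates no capture.
my ($with_noncap, $extra_noncap);
if ($line =~ m/\|(\w+(?:\|)\d+)/) {
    $with_noncap  = $1;    # 'genbank|2398': the | is inside the outer (...)
    $extra_noncap = $2;    # undef: (?:...) did not create a second capture
}

# Same regex with a capturing group around the pipe, for comparison.
my ($with_cap, $extra_cap);
if ($line =~ m/\|(\w+(\|)\d+)/) {
    $with_cap  = $1;       # still 'genbank|2398'
    $extra_cap = $2;       # '|': now captured separately as $2
}

print "non-capturing: \$1=$with_noncap, \$2=",
    (defined $extra_noncap ? $extra_noncap : 'undef'), "\n";
print "capturing:     \$1=$with_cap, \$2=$extra_cap\n";
```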

    In your example, I would probably break it into two steps:

    if (m/\|(\w+)\|(\d+)/) {     # matches against $_
        $string = $1 . $2;       # only use $1 and $2 after a successful match
    }

    Paul Lalli
    Paul Lalli, Feb 3, 2004
    #2

  3. David K. Wall

    naren <> wrote:

    > I need some help with regular expression parsing.
    >
    > I have to group a string but want to exclude some characters from the
    > group. For example, I have a string:
    >
    >>gnl|genbank|2398 this is a test gene
    >
    > I would like to get genbank2398.
    >
    > I have tried the following regex, but it doesn't work. Can anybody
    > help?
    >
    > m/\|(\w+(?:\|)\d+)/
    >
    > I used (?:\|) ("group but do not capture") for the |, but it is not
    > working; I am still getting genbank|2398.


    Actually, it is working: if (?:\|) captured anything, $2 would be set to '|'.

    You could capture only the parts you want and then concatenate them:

    my $string = 'gnl|genbank|2398 this is a test gene';
    my $result;
    if ($string =~ /\w+\|(\w+)\|(\d+)/) {
        $result = $1 . $2;    # 'genbank' . '2398'
    }


    or you could grab everything including the unwanted | and then remove it:

    my $string = 'gnl|genbank|2398 this is a test gene';
    my $result;
    if ($string =~ /^\w+\|(\w+\|\d+)/) {
        ($result = $1) =~ s/\|//;    # copy $1, then delete the | from the copy
    }

    Or you could split() the string on the |s and then modify the pieces.
    Whatever is most convenient....
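    A minimal sketch of the split() route, assuming the database name is
    always the second field and the digits start the third (variable names
    are illustrative). split's first argument is a regex, so the | has to be
    escaped.

```perl
use strict;
use warnings;

my $string = 'gnl|genbank|2398 this is a test gene';

# Split into at most 3 fields on a literal |; the first field is discarded.
my (undef, $db, $rest) = split /\|/, $string, 3;

# $rest is '2398 this is a test gene'; keep only the leading digits.
my ($id) = $rest =~ /^(\d+)/;

my $result = $db . $id;
print "$result\n";    # genbank2398
```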

    (and if I were Someone Who Must Not Be Named I'd write it using index() and
    substr(), but that's far too painful....)
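    For the curious, a rough index()/substr() version, assuming the line
    always has two | separators followed by the digits and then a space; it
    dies on anything else rather than guessing (variable names are
    illustrative).

```perl
use strict;
use warnings;

my $string = 'gnl|genbank|2398 this is a test gene';

# Offsets of the two | separators and of the space that ends the digits.
my $p1    = index($string, '|');             # just after 'gnl'
my $p2    = index($string, '|', $p1 + 1);    # just after 'genbank'
my $space = index($string, ' ', $p2 + 1);    # just after '2398'

# index() returns -1 when it finds nothing, so bail out on unexpected input.
die "unexpected format\n" if grep { $_ < 0 } $p1, $p2, $space;

my $db = substr($string, $p1 + 1, $p2 - $p1 - 1);       # 'genbank'
my $id = substr($string, $p2 + 1, $space - $p2 - 1);    # '2398'

my $result = $db . $id;
print "$result\n";    # genbank2398
```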

    --
    David Wall
    David K. Wall, Feb 3, 2004
    #3
  4. naren

    naren Guest

    Hi,

    Thank you very much!
    I understand that we can get this in $1 and $2,
    but the challenge I faced is to get this in one step.
    Basically, I feed this regex to a configuration file,
    which will use this regex to parse the line; it can
    only take $1, it can't append $1 and $2.
    That is why I considered using (?:\|) ("group but do
    not capture"), but I haven't understood how this works.

    But thanks for your feedback,

    Naren.

    "David K. Wall" <> wrote in message news:<Xns948499A38D6AAdkwwashere@216.168.3.30>...
    naren, Feb 4, 2004
    #4
  5. Ben Morrow

    Ben Morrow Guest

    [don't top-post]

    (naren) wrote:
    > I understand that we can get this in $1 and $2, but the challenge I
    > faced is to get this in one step. Basically, I feed this regex to a
    > configuration file, which will use this regex to parse the line; it
    > can only take $1, it can't append $1 and $2.


    Can't be done. Each $N captures a contiguous sequence of characters
    from the target string, so you can't get two sections from different
    places into $1.
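    One way to see this in Perl: after a successful match the arrays @- and
    @+ hold the start and end offsets of each capture, so $1 is always the
    single substring between $-[1] and $+[1]. A small sketch with the
    question's string (variable names are illustrative):

```perl
use strict;
use warnings;

my $line = '>gnl|genbank|2398 this is a test gene';

my ($start, $end, $cap);
if ($line =~ /\|(\w+\|\d+)/) {
    # $-[1] is where capture 1 starts; $+[1] is one past where it ends.
    ($start, $end, $cap) = ($-[1], $+[1], $1);
}
die "no match\n" unless defined $cap;

# $1 is exactly one slice of the original string, with nothing skipped.
my $slice = substr($line, $start, $end - $start);
print "\$1 = '$cap', offsets $start to $end (end exclusive)\n";
print "same as substr: ", ($slice eq $cap ? 'yes' : 'no'), "\n";
```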

    > That is why I considered using (?:\|) ("group but do not capture"),
    > but I haven't understood how this works.


    No... () captures *everything* inside it, even if some of the inside
    is captured again by nested parens. If you execute

    "abc" =~ /(.(.).)/

    then $1="abc" and $2="b": the "b" has been captured twice. If that had
    been

    "abc" =~ /(.(?:.).)/

    then you would have $1="abc" still but no $2 as there's only one set
    of capturing parens.
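    Both one-liners can be checked directly. In list context a match returns
    the captured groups, so the number of elements shows how many captures
    there were (the array names are illustrative):

```perl
use strict;
use warnings;

# Outer parens are capture 1, inner parens are capture 2.
my @nested = ("abc" =~ /(.(.).)/);      # ('abc', 'b')

# Inner group made non-capturing: only capture 1 remains.
my @noncap = ("abc" =~ /(.(?:.).)/);    # ('abc')

print "nested: @nested\n";    # nested: abc b
print "noncap: @noncap\n";    # noncap: abc
```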

    Ben

    --
    Musica Dei donum optimi, trahit homines, trahit deos. |
    Musica truces mollit animos, tristesque mentes erigit. |
    Musica vel ipsas arbores et horridas movet feras. |
    Ben Morrow, Feb 5, 2004
    #5
  6. naren

    naren Guest

    Thanks!! Ben

    Ben Morrow <> wrote in message news:<bvs1n8$cm0$>...
    naren, Feb 5, 2004
    #6

Similar Threads
  1. Max (Replies: 7, Views: 9,076)
  2. Replies: 3, Views: 1,550
  3. Jon Clements (Replies: 3, Views: 285; last post: Jon Clements, Sep 17, 2010)
  4. S. Robert James: Regex group without capture
     S. Robert James, Feb 22, 2007, in forum: Ruby (Replies: 1, Views: 92; last post: Peña, Botp, Feb 22, 2007)
  5. Ersin Er (Replies: 8, Views: 154; last post: Sherm Pendley, Oct 2, 2005)