Regex backref returns extra data

Discussion in 'Perl Misc' started by Dan Rawson, Sep 25, 2003.

  1. Dan Rawson

    Dan Rawson Guest

    I'm attempting to extract a single word from a line which looks like:

    TARGET = file1 file2 file3 .....

    I need 'file1' from this line. Using:

    s/.*?\=\s*([a-zA-Z_\.0-9]+).*/$1/ (Note the "+" following the char. class)

    works fine as long as file1 is present. However, if I have a line which ends after the "=" sign, it returns the entire
    line, instead of an empty string.

    If I change the pattern to:

    s/.*?\=\s*([a-zA-Z_\.0-9]*).*/$1/ (Note the "*" following the char. class)

    it works; I get 'file1' back if it's there, and an empty string if not.

    Questions:
    1. My understanding was that the "+" would cause it to match one or more of the char. class (which would make it a
    legal file name in this case). Why does it return the entire line if there's no match??
    2. Why does the second pattern work differently than the first one if there's nothing after the "=" sign?

    TIA . . . .

    Dan
    Dan Rawson, Sep 25, 2003
    #1

  2. Stefan

    Stefan Guest

    Hi Dan

    I think you are using a substitution (s///) where you actually want a
    match (m//). If your substitution regex doesn't match, nothing will be
    replaced and you will get the whole line back. If you want to extract
    the first file name, you would be better off writing...

    'TARGET = file1 file2 file3' =~ m/.*?=\s*([\w.]+)/;

    ...after which $1 will contain 'file1'.
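
    To make the difference between the two substitutions in the original post concrete, here is a small sketch. The helper name subst_copy and the sample lines are made up for illustration, and the redundant backslashes are dropped from the patterns. With "+", the pattern cannot match a line that ends after the "=", so s/// leaves that line alone. With "*", the pattern matches with an empty capture, and the whole line is replaced by that empty string.

```perl
use strict;
use warnings;

my $plus = qr/.*?=\s*([a-zA-Z_.0-9]+).*/;   # "+": at least one name character required
my $star = qr/.*?=\s*([a-zA-Z_.0-9]*).*/;   # "*": an empty capture is allowed

# Hypothetical helper: run the substitution on a copy and return the result.
sub subst_copy {
    my ($line, $re) = @_;
    $line =~ s/$re/$1/;    # if the pattern fails, $line is left exactly as it was
    return $line;
}

for my $line ('TARGET = file1 file2 file3', 'TARGET =') {
    print "line='$line' plus='", subst_copy($line, $plus),
          "' star='", subst_copy($line, $star), "'\n";
}
```

    For the first line both patterns should give 'file1'. For the second, "+" should give back 'TARGET =' unchanged and "*" should give an empty string.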

    Hope this helps.


    Stefan


    Stefan, Sep 25, 2003
    #2

  3. Dan Rawson

    Dan Rawson Guest

    Christian Winter wrote:
    > "Stefan" <> wrote:
    >
    >>Hi Dan
    >>
    >>I think you are using a substitution (s///) where you actually want a
    >>match (m//). If your substitution regex doesn't match, nothing will be
    >>replaced and you will get the whole line back. If you want to extract
    >>the first file name, you would be better off writing...
    >>
    >>'TARGET = file1 file2 file3' =~ m/.*?=\s*([\w.]+)/;
    >>
    >>...after which $1 will contain 'file1'.

    >
    >
    > But be careful with $1, as it will not
    > be reset after an unsuccessful match.
    > e.g.:
    >
    > my $xx = "a b c";
    > $xx =~ /^(a)/; # matches
    > print "\$1 = $1\n";
    > $xx =~ /^(b)/; # does not match
    > print "\$1 = $1\n";
    >
    > which prints out:
    > $1 = a
    > $1 = a
    > and may not be what one expects.
    >
    > So one should either use capturing (see my reply
    > to the OP) or add a check like
    > if( /.*?=\s*([\w.]+)/ ) { do something with $1... }
    >
    > -Christian
    >
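
    One way to sidestep a stale $1 altogether is to take the captures from the match in list context. The following is only a sketch: the helper name first_target and the sample lines are assumptions, not anything from the posts above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first name after "=", or undef if there is none.
sub first_target {
    my ($line) = @_;
    # In list context a successful match returns its captures; a failed
    # match returns an empty list, so $name stays undef and no old $1 leaks in.
    my ($name) = $line =~ /=\s*([\w.]+)/;
    return $name;
}

for my $line ('TARGET = file1 file2 file3', 'TARGET =') {
    my $name = first_target($line);
    print defined $name ? "found '$name'\n" : "no target in '$line'\n";
}
```

    The caller tests for defined-ness instead of reading $1, so the result of an earlier match cannot be picked up by mistake.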

    Thanks to everyone for helping me shake out the cobwebs; I guess I should have had another cup of coffee before posting
    this morning!!

    I modified it so it checks for the match, then does the substitution if the match succeeds (duh!).

    Thanks again . . . .

    Dan
    Dan Rawson, Sep 25, 2003
    #3
  4. Uri Guttman

    Uri Guttman Guest

    >>>>> "DR" == Dan Rawson <daniel.rawson.take!this!out!@asml.nl> writes:


    DR> I modified it so it checks for the match, then does the
    DR> substitution if the match succeeds (duh!).

    that sounds redundant. just do the s/// and check if it succeeds. if it
    doesn't, your original string is untouched. i have seen newbie code that
    looks like:

    if ( /match/ ) {
        s/match/replace/ ;
    }

    which is the same as just the s/// by itself.
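
    applied to the TARGET lines from the start of the thread, checking the return value of s/// directly might look like the sketch below. the helper name extract_first and the sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first name after "=", or nothing when the
# substitution did not match.
sub extract_first {
    my ($line) = @_;
    # s/// returns the number of substitutions made, which is false when
    # the pattern fails, so no separate m// test is needed beforehand.
    if ($line =~ s/.*?=\s*([\w.]+).*/$1/) {
        return $line;
    }
    return;
}

for my $line ('TARGET = file1 file2 file3', 'TARGET =') {
    my $first = extract_first($line);
    print defined $first ? "first target: $first\n" : "no target in '$line'\n";
}
```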

    uri

    Uri Guttman, Sep 25, 2003
    #4
