Matching neighbouring words of a pattern using Regex

Discussion in 'Perl Misc' started by CV, Aug 30, 2004.

  1. CV

    How can I match 'n' neighbouring words of a pattern using regular
    expressions?

    For example, suppose I am looking for the pattern "length xyz cm" in some
    text, where xyz is a number (an integer, a fraction or a decimal). How can
    I also grab about 3-5 words on either side of the pattern "length xyz cm"?
    The surrounding words are not always constant and may be variable. Also, the
    original text to be matched is not just a single sentence, but lines from a
    file concatenated together, so the text has many newline characters too. I
    only want the words on the same line as the pattern.

    I have tried using a regex of the form
    /\b(\w*)\b(\w*)\b(\w*)\b($pattern)\b(\w*)\b(\w*)\b(\w*)/, but this doesn't
    work for some reason. Could someone please offer some suggestions?

    Thanks!
    CV, Aug 30, 2004

  2. [ Reply not posted to the defunct group comp.lang.perl ]

    CV wrote:
    > How can I match 'n' neighbouring words of a pattern using
    > regular expressions?
    >
    > For example, suppose I am looking for the pattern "length xyz cm"
    > in some text, where xyz is a number (an integer, a fraction or a
    > decimal). How can I also grab about 3-5 words on either side
    > of the pattern "length xyz cm"? The surrounding words are not
    > always constant and may be variable. Also, the original text to be
    > matched is not just a single sentence, but lines from a file
    > concatenated together, so the text has many newline characters
    > too. I only want the words on the same line as the pattern.
    >
    > I have tried using a regex of the form
    > /\b(\w*)\b(\w*)\b(\w*)\b($pattern)\b(\w*)\b(\w*)\b(\w*)/, but this
    > doesn't work for some reason.


    It doesn't work for several reasons, such as:

    - There are no space characters, so nothing in the regex can match
    the whitespace between words.
    - '\w*\b\w*' is an impossible combination that can never match (check
    out the description of \b in "perldoc perlre" to learn why).
    - The \w character class does not include, e.g., the '$' character,
    while you mentioned that a "word" may be a variable.
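    A quick way to see what the OP's regex really does is to run it. The
    sketch below is only an illustration: the sample sentence, the
    $pattern value and the naive_neighbours helper are hypothetical.
    Since nothing in the regex can consume the whitespace between words,
    the only way it can succeed is with every neighbouring-word group
    matching the empty string:

```perl
use strict;
use warnings;

# Hypothetical sample text and pattern (not from the original post).
my $text    = "the rod has length 12 cm and weighs a lot";
my $pattern = 'length \d+ cm';

# The OP's regex, with the closing / in place. In list context the
# match returns the seven captures ($1 .. $7).
sub naive_neighbours {
    my ($str, $pat) = @_;
    return $str =~ /\b(\w*)\b(\w*)\b(\w*)\b($pat)\b(\w*)\b(\w*)\b(\w*)/;
}

my @groups = naive_neighbours($text, $pattern);
print "match:  '$groups[3]'\n";
print "before: '", join("','", @groups[0 .. 2]), "'\n";
print "after:  '", join("','", @groups[4 .. 6]), "'\n";
```

    With that input the fourth group holds 'length 12 cm', while all six
    neighbouring-word groups come back as empty strings.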

    > Could someone please offer some suggestions?


    Try something like this:

    /((?:\S+ +){0,3})\b($pattern)\b((?: +\S+){0,3})/
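    Applied to multi-line text, the suggestion might look like the sketch
    below. The number sub-pattern (an integer, a decimal such as 12.5, or
    a fraction such as 3/4), the sample text and the neighbours helper
    are assumptions for illustration. Because the regex only allows
    literal spaces between words, a match cannot cross a newline, which
    keeps the captured words on the pattern's own line:

```perl
use strict;
use warnings;

# Hypothetical $pattern: "length", a number (integer, decimal or
# fraction such as 3/4), then "cm".
my $number  = qr{\d+(?:\.\d+|/\d+)?};
my $pattern = qr{length $number cm};

# Up to 3 space-separated words on each side of $pattern. Only
# literal spaces separate the words, so a match never crosses "\n".
sub neighbours {
    my ($text) = @_;
    my @hits;
    while ($text =~ /((?:\S+ +){0,3})\b($pattern)\b((?: +\S+){0,3})/g) {
        push @hits, [$1, $2, $3];    # [before, pattern, after]
    }
    return @hits;
}

my $text = "first line mentions nothing\n"
         . "the red rod has length 12.5 cm and weighs very little\n"
         . "another line\n";

for my $hit (neighbours($text)) {
    my ($before, $match, $after) = @$hit;
    print "before='$before' match='$match' after='$after'\n";
}
```

    Here the single hit is 'length 12.5 cm', with 'red rod has ' before
    it and ' and weighs very' after it; 'the' is left out because {0,3}
    caps the leading context at three words.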

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Hjalmarsson, Aug 30, 2004

  3. Gunnar Hjalmarsson <> wrote:

    > - '\w*\b\w*' is an impossible combination that can never match



    It will match any string with at least one \w character in it:

    $_ = 'hi';
    print "matched '$&'\n" if /\w*\b\w*/;


    > (check
    > out the description of \b in "perldoc perlre" to learn why).



    Check out this part too:

    ... counting the imaginary characters off the
    beginning and end of the string as matching a \W

    :)


    In the OP's regex, that \W could be the imaginary character before
    the beginning of the string.
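    The point about the imaginary characters can be checked by listing
    every offset at which \b succeeds. This is a small sketch; the
    boundary_positions helper is hypothetical:

```perl
use strict;
use warnings;

# Collect every offset at which the zero-width \b assertion succeeds.
# After a zero-length match, m//g will not match empty at the same
# offset again, so the loop moves on to the next candidate.
sub boundary_positions {
    my ($str) = @_;
    my @pos;
    while ($str =~ /\b/g) {
        push @pos, pos($str);
    }
    return @pos;
}

print join(', ', boundary_positions('hi there')), "\n";
```

    For 'hi there' this prints 0, 2, 3, 8: offsets 0 and 8 are the start
    and end of the string, where the missing neighbour counts as \W.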


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
    Tad McClellan, Aug 31, 2004
  4. In article <>,
    CV <> wrote:
    >How can I match 'n' neighbouring words of a pattern using regular
    >expressions?
    >
    >For example, suppose I am looking for the pattern "length xyz cm" in some
    >text, where xyz is a number (an integer, a fraction or a decimal). How can
    >I also grab about 3-5 words on either side of the pattern "length xyz cm"?
    >The surrounding words are not always constant and may be variable. Also, the
    >original text to be matched is not just a single sentence, but lines from a
    >file concatenated together, so the text has many newline characters too. I
    >only want the words on the same line as the pattern.
    >
    >I have tried using a regex of the form
    >/\b(\w*)\b(\w*)\b(\w*)\b($pattern)\b(\w*)\b(\w*)\b(\w*)/, but this doesn't
    >work for some reason. Could someone please offer some suggestions?
    >


    You may be confused about the \b assertion. Did you intend something
    with \w and \W? Also, what if the pattern falls at the beginning or
    end of the line? Do you want to capture the patterns that may not
    have 3-5 surrounding words?

    One possibility, presuming you intend to capture 3-5 surrounding
    words:


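    use strict;
    use warnings;

    my $text    = "...";
    my $pattern = 'length ... cm ';

    # 3-5 words, each followed by separators that are not newlines,
    # so the context never spills onto a neighbouring line
    my $words = '(?:\w+[^\w\n]+){3,5}';
    #my $words = '(?:\w+[^\w\n]+){0,5}'; # to catch every pattern

    # print each pattern instance together with its surrounding words
    print "$1\n" while $text =~ /($words$pattern$words)/g;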


    [ Note that the 3-5 surrounding words may consume another
    adjacent $pattern instance, but you don't specify what
    to do in that case. ]
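    To compare the two $words settings, here is a sketch along the same
    lines. The number sub-pattern, the sample text and the
    context_matches helper are assumptions; the (?:\w+[^\w\n]+) word
    unit and the trailing space in $pattern follow the code above:

```perl
use strict;
use warnings;

# Hypothetical pattern: "length", a number (integer, decimal or
# fraction), "cm", then a space, in the style of 'length ... cm '.
my $pattern = 'length \d+(?:\.\d+|/\d+)? cm ';

# $min is the minimum number of words required on each side (max 5).
# [^\w\n]+ excludes newlines, so matches stay within a single line.
sub context_matches {
    my ($text, $min) = @_;
    my $words = '(?:\w+[^\w\n]+){' . $min . ',5}';
    my @found;
    while ($text =~ /($words$pattern$words)/g) {
        push @found, $1;
    }
    return @found;
}

my $text = "length 3 cm is short\n"
         . "the tall red pole has length 250 cm according to the spec sheet\n";

print "$_\n" for context_matches($text, 3);    # the {3,5} version
print "---\n";
print "$_\n" for context_matches($text, 0);    # the {0,5} version
```

    With {3,5} only the second line yields a match; the {0,5} variant
    also reports 'length 3 cm is ' from the first line. In both cases
    the last word of a line ('sheet' here) is not counted, since
    [^\w\n]+ must follow every word and a newline does not qualify.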


    hth,
    --
    Charles DeRykus
    Charles DeRykus, Aug 31, 2004
  5. Tad McClellan wrote:
    > Gunnar Hjalmarsson wrote:
    >>
    >> - '\w*\b\w*' is an impossible combination that can never match

    >
    > It will match any string with at least one \w character in it:
    >
    > $_ = 'hi';
    > print "matched '$&'\n" if /\w*\b\w*/;
    >
    >> (check
    >> out the description of \b in "perldoc perlre" to learn why).

    >
    > Check out this part too:
    >
    > ... counting the imaginary characters off the
    > beginning and end of the string as matching a \W
    >
    > :)
    >
    > \W could be the beginning of string in the OP's regex.


    Thanks, Tad, I stand corrected (even if it doesn't do what the OP
    wanted it to do...).

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Hjalmarsson, Aug 31, 2004
