FAQ 6.18 Why don't word-boundary searches with "\b" work for me?

Discussion in 'Perl Misc' started by PerlFAQ Server, Jan 31, 2011.

  1. This is an excerpt from the latest version of perlfaq6.pod, which
    comes with the standard Perl distribution. These postings aim to
    reduce the number of repeated questions as well as allow the community
    to review and update the answers. The latest version of the complete
    perlfaq is at http://faq.perl.org .


    6.18: Why don't word-boundary searches with "\b" work for me?

    (contributed by brian d foy)

    Ensure that you know what \b really does: it's the boundary between a
    word character, \w, and something that isn't a word character. That
    thing that isn't a word character might be \W, but it can also be the
    start or end of the string.

    It's not (not!) the boundary between whitespace and non-whitespace, and
    it's not the stuff between words we use to create sentences.
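
    As a rough illustration of that difference, the sketch below compares
    \b with a whitespace-delimited match written with lookarounds. The
    helper names word_bounded and space_bounded are made up for this
    example, and the lookaround pattern is only one possible way to ask
    for "whitespace or the edge of the string" on each side.

```perl
use strict;
use warnings;

# Hypothetical helpers, named here only for illustration.

# "Perl" delimited by word boundaries (\b).
sub word_bounded  { return $_[0] =~ /\bPerl\b/ ? 1 : 0 }

# "Perl" delimited by whitespace or the string edges,
# written with lookarounds instead of \b.
sub space_bounded { return $_[0] =~ /(?<!\S)Perl(?!\S)/ ? 1 : 0 }

for my $s ("I use Perl daily", "Perl's", "(Perl)", "Perl_") {
    printf "%-20s \\b: %d  whitespace: %d\n",
        qq("$s"), word_bounded($s), space_bounded($s);
}
```

    "Perl's" and "(Perl)" satisfy \b but not the whitespace test, because
    the apostrophe and the parentheses are non-word characters without
    being whitespace.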

    In regex speak, a word boundary (\b) is a "zero width assertion",
    meaning that it doesn't represent a character in the string, but a
    condition at a certain position.
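
    One way to see the "zero width" part is to mark every position where
    \b matches. The sketch below uses a helper called mark_boundaries (a
    name invented for this example) that inserts a "|" at each boundary.
    No characters from the original string are consumed.

```perl
use strict;
use warnings;

# Hypothetical helper: mark every position where \b matches with "|".
# The substitution only inserts markers; it removes nothing, because
# \b matches a position rather than a character.
sub mark_boundaries {
    my ($string) = @_;
    (my $marked = $string) =~ s/\b/|/g;
    return $marked;
}

print mark_boundaries("Perl's"), "\n";    # prints |Perl|'|s|
print mark_boundaries("I am Sam"), "\n";  # prints |I| |am| |Sam|
```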

    For the regular expression /\bPerl\b/, there has to be a word boundary
    before the "P" and after the "l". As long as something other than a word
    character (or the start or end of the string) precedes the "P" and
    follows the "l", the pattern will match. These strings match /\bPerl\b/:

    "Perl" # no word char before P or after l
    "Perl " # same as previous (space is not a word char)
    "'Perl'" # the ' char is not a word char
    "Perl's" # no word char before P, non-word char after "l"

    These strings do not match /\bPerl\b/.

    "Perl_" # _ is a word char!
    "Perler" # no word char before P, but one after l

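    The strings listed above can be checked with a short script. This is
    only a sketch; matches_perl_word is a helper name made up for the
    example.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the pattern discussed in the text.
sub matches_perl_word { return $_[0] =~ /\bPerl\b/ ? 1 : 0 }

# The first four strings match; "Perl_" and "Perler" do not,
# because a word character directly follows the "l".
for my $s ("Perl", "Perl ", "'Perl'", "Perl's", "Perl_", "Perler") {
    printf "%-10s %s\n", qq("$s"),
        matches_perl_word($s) ? "matches" : "does not match";
}
```
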
    You don't have to use \b only to match whole words, though. You can
    also look for non-word characters surrounded by word characters. These
    strings match the pattern /\b'\b/.

    "don't" # the ' char is surrounded by "n" and "t"
    "qep'a'" # the ' char is surrounded by "p" and "a"

    This string does not match /\b'\b/.

    "foo'" # there is no word char after non-word '

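    To see where the apostrophe pattern matches, the sketch below reports
    the offset of the match using Perl's @- array. The helper name
    quote_offset and the -1 return value for "no match" are choices made
    for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return the offset at which /\b'\b/ matches,
# or -1 when there is no match. $-[0] holds the start of the last
# successful match.
sub quote_offset {
    my ($string) = @_;
    return $string =~ /\b'\b/ ? $-[0] : -1;
}

print quote_offset("don't"), "\n";   # prints 3
print quote_offset("qep'a'"), "\n";  # prints 3 (the first ', not the last)
print quote_offset("foo'"), "\n";    # prints -1
```
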
    You can also use the complement of \b, \B, to specify that there should
    not be a word boundary.

    In the pattern /\Bam\B/, there must be a word character before the "a"
    and after the "m". These strings match /\Bam\B/:

    "llama" # "am" surrounded by word chars
    "Samuel" # same

    These strings do not match /\Bam\B/:

    "Sam" # no word boundary before "a", but one after "m"
    "I am Sam" # standalone "am" has a boundary on each side; "Sam" as above

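    The \B examples can be checked the same way. Again, this is a sketch,
    and matches_inner_am is a helper name invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: true when "am" appears with a word character
# on both sides, so that neither edge of "am" is a word boundary.
sub matches_inner_am { return $_[0] =~ /\Bam\B/ ? 1 : 0 }

for my $s ("llama", "Samuel", "Sam", "I am Sam") {
    printf "%-12s %s\n", qq("$s"),
        matches_inner_am($s) ? "matches" : "does not match";
}
```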

    The perlfaq-workers, a group of volunteers, maintain the perlfaq. They
    are not necessarily experts in every domain where Perl might show up,
    so please include as much information as possible and relevant in any
    corrections. The perlfaq-workers also don't have access to every
    operating system or platform, so please include relevant details for
    corrections to examples that do not work on particular platforms.
    Working code is greatly appreciated.

    If you'd like to help maintain the perlfaq, see the details in