Regular expression for matching words containing underscore _ character

Discussion in 'Perl Misc' started by Raj, Dec 12, 2007.

  1. Raj

    Raj Guest

    I have large text passages containing names of database tables,
    procedures, packages, variables, etc. that have the underscore character
    as part of the name, e.g. rsp_names_friends_master. I tried
    "\b[a-zA-Z0-9_]+\b" but it matches all words in the passage.

    Thanks in advance for the help.

    Raj
     
    Raj, Dec 12, 2007
    #1

  2. Raj wrote:
    > I have large text passages containing names of database tables,
    > procedures, packages, variables etc having the underscore character as
    > a part of the name. eg. rsp_names_friends_master. I tried
    > "\b[a-zA-Z0-9_]+\b" but it matches all words in the passage.


    Similarly "[ab]+" matches "aaa" and "aa" though neither contains "b".

    Try "\b[a-zA-Z0-9]+_[a-zA-Z0-9_]+\b"

    Or "\b\w+_\w+\b"
     
    RedGrittyBrick, Dec 12, 2007
    #2

  3. Re: Regular expression for matching words containing underscore _ character

    RedGrittyBrick wrote:
    > Raj wrote:
    >> I have large text passages containing names of database tables,
    >> procedures, packages, variables etc having the underscore character as
    >> a part of the name. eg. rsp_names_friends_master. I tried
    >> "\b[a-zA-Z0-9_]+\b" but it matches all words in the passage.

    >
    > Similarly "[ab]+" matches "aaa" and "aa" though neither contain "b".
    >
    > Try "\b[a-zA-Z0-9]+_[a-zA-Z0-9_]+\b"
    >
    > Or "\b\w+_\w+\b"



    Three (six?) useless uses of word boundary in the quotes above...

    Every pattern there will behave identically without any \b's.


    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
     
    Tad J McClellan, Dec 13, 2007
    #3
  4. Raj

    Raj Guest

    On Dec 12, 8:47 pm, RedGrittyBrick wrote:
    > Raj wrote:
    > > I have large text passages containing names of database tables,
    > > procedures, packages, variables etc having the underscore character as
    > > a part of the name. eg. rsp_names_friends_master. I tried
    > > "\b[a-zA-Z0-9_]+\b" but it matches all words in the passage.

    >
    > Similarly "[ab]+" matches "aaa" and "aa" though neither contain "b".
    >
    > Try "\b[a-zA-Z0-9]+_[a-zA-Z0-9_]+\b"
    >
    > Or "\b\w+_\w+\b"


    Thanks. It worked.
    Raj
     
    Raj, Dec 13, 2007
    #4
  5. On Dec 12, 4:27 pm, Raj wrote:
    > I have large text passages containing names of database tables,
    > procedures, packages, variables etc having the underscore character as
    > a part of the name. eg. rsp_names_friends_master. I tried
    > "\b[a-zA-Z0-9_]+\b" but it matches all words in the passage.
    >
    > Thanks in advance for the help.
    >
    > Raj


    I would use this merely to find lines which contain what you are searching for:
    /(?:\p{IsAlnum}+_)+\p{IsAlnum}+/

    I would use this to get the words you search into an array
    /((?:\p{IsAlnum}+_)+\p{IsAlnum}+)/g

    Example:
    perl -ne 'print "@a\n" if (@a = /((?:\p{IsAlnum}+_)+\p{IsAlnum}+)/g)' \
        <<< 'yyy_yyy saf_;fasl asfd ; xxx_xxx'

    Greetings

    Flo
     
    Florian Kaufmann, Dec 13, 2007
    #5
  6. Tad J McClellan wrote:
    > RedGrittyBrick wrote:
    >> Raj wrote:
    >>> I have large text passages containing names of database tables,
    >>> procedures, packages, variables etc having the underscore character as
    >>> a part of the name. eg. rsp_names_friends_master. I tried
    >>> "\b[a-zA-Z0-9_]+\b" but it matches all words in the passage.

    >> Similarly "[ab]+" matches "aaa" and "aa" though neither contain "b".
    >>
    >> Try "\b[a-zA-Z0-9]+_[a-zA-Z0-9_]+\b"
    >>
    >> Or "\b\w+_\w+\b"

    >
    >
    > Three (six?) useless uses of word boundary in the quotes above...
    >
    > Every pattern there will behave identically without any \b's.
    >
    >


    TFTC

    $ perl -e 'print "$_\n" for "_aa-bbb.cc_[d_d]" =~ /\w+/g'
    _aa
    bbb
    cc_
    d_d

    $ perl -e 'print "$_\n" for "_aa-bbb.cc_[d_d]" =~ /\w+_\w+/g'
    d_d

    In Perl programs I've written, I don't think I've ever used \b. Perhaps
    I should have analyzed the OP's RE completely rather than only
    commenting on the primary reason for the problem.
     
    RedGrittyBrick, Dec 13, 2007
    #6