accentuated character - RE

Discussion in 'Perl Misc' started by nicolas_laurent545@hotmail.com, Apr 19, 2006.

  1. Guest

    Hi

    (\w+) does not match accented characters such as é.
    [a-zé] does match them, but the problem is that I have to
    enumerate è, î, ô, etc.

    Is there another way in a regular expression to include accented
    characters so that I do not need to list them in advance?

    Thanks
    nicolas_laurent545@hotmail.com, Apr 19, 2006
    #1

  2. wrote:
    >
    > (\w+) does not see accentuated character such as (é).
    > [a-zé] sees accentuated character but the problem is that I have to
    > enumerate èîô etc.
    >
    > Is there any other method in regular expression to include accentuated
    > character so I do not
    > need to specify them in advance ?


    Put this line near the top of your program:

    use locale;


    perldoc locale
    perldoc perllocale
    etc.
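
    A minimal sketch of what this suggestion might look like in practice.
    The locale name and the sample string are assumptions for illustration,
    not taken from the original post, and the result depends on which
    locales are installed on the machine:

```perl
use strict;
use warnings;
use POSIX qw(locale_h);   # provides setlocale() and the LC_CTYPE constant
use locale;               # make \w follow the current LC_CTYPE locale

# Assumption: a Latin-1 French locale with this name is installed.
my $locale_name = 'fr_FR.ISO8859-1';
my $ok = setlocale(LC_CTYPE, $locale_name);
warn "Locale $locale_name is not available here\n" unless $ok;

# Latin-1 bytes for "café élève" (hypothetical sample input).
my $text = "caf\xE9 \xE9l\xE8ve";

# Under a locale that classifies \xE9 and \xE8 as letters, \w+ spans them.
my @words = $text =~ /(\w+)/g;

print scalar(@words), " word(s) matched\n";
```

    If the locale cannot be set, \w falls back to whatever the current
    locale defines, so the accented bytes may still split the words.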


    John
    --
    use Perl;
    program
    fulfillment
    John W. Krahn, Apr 19, 2006
    #2

  3. John W. Krahn wrote:
    > wrote:
    >>(\w+) does not see accentuated character such as (é).
    >>[a-zé] sees accentuated character but the problem is that I have to
    >>enumerate èîô etc.
    >>
    >>Is there any other method in regular expression to include accentuated
    >>character so I do not
    >>need to specify them in advance ?

    >
    > Put this line near the top of your program:
    >
    > use locale;


    Or, possibly better, in the smaller block where that behaviour is desired.
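
    As a rough illustration of the scoping: use locale is a lexical pragma,
    so it can be confined to one block. The sub name below is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: only the code inside this sub sees locale-aware \w.
sub locale_words {
    my ($text) = @_;
    use locale;                      # lexically scoped: ends with this block
    my @words = $text =~ /(\w+)/g;
    return @words;
}

# Outside the sub, \w keeps Perl's default (non-locale) behaviour.
my @found = locale_words("plain ascii words");
print scalar(@found), " word(s) matched\n";
```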

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Hjalmarsson, Apr 19, 2006
    #3
  4. Dave Guest

    <> wrote in message
    news:...
    > Hi
    >
    > (\w+) does not see accentuated character such as (é).
    > [a-zé] sees accentuated character but the problem is that I have to
    > enumerate èîô etc.
    >
    > Is there any other method in regular expression to include accentuated
    > character so I do not
    > need to specify them in advance ?
    >
    > Thanks


    You would be better off using (\p{IsAlpha}+). This will match all alphabetic
    characters, accented or not.
    See the docs on Unicode (perldoc perlunicode).
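
    A short sketch of this approach, assuming the text has already been
    decoded into Perl character strings (here via use utf8 on a literal).
    The sample text is made up for illustration:

```perl
use strict;
use warnings;
use utf8;                            # this source file contains UTF-8 literals

# Hypothetical sample text; any decoded character string would do.
my $text = "café élève naïve 2006";

# \p{IsAlpha} matches any Unicode alphabetic character, so accented
# letters stay inside the word; digits and spaces do not match.
my @words = $text =~ /(\p{IsAlpha}+)/g;

binmode STDOUT, ':encoding(UTF-8)';  # encode characters on output
print "$_\n" for @words;
```

    Input read from a file or socket would need decoding first (for
    example with an :encoding layer) before the property match behaves
    this way.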
    Dave, Apr 19, 2006
    #4
  5. Dave wrote:
    > <> wrote in message
    >> (\w+) does not see accentuated character such as (é).
    >> [a-zé] sees accentuated character but the problem is that I have to
    >> enumerate èîô etc.

    >
    >> Is there any other method in regular expression to include accentuated
    >> character so I do not need to specify them in advance ?

    >
    > You would be better off using (\p{IsAlpha}+).


    How can you tell?

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Hjalmarsson, Apr 19, 2006
    #5
  6. Dave Guest

    "Gunnar Hjalmarsson" <> wrote in message
    news:...
    > Dave wrote:
    >> <> wrote in message
    >>> (\w+) does not see accentuated character such as (é). [a-zé] sees
    >>> accentuated character but the problem is that I have to enumerate èîô
    >>> etc.

    >>
    >>> Is there any other method in regular expression to include accentuated
    >>> character so I do not need to specify them in advance ?

    >>
    >> You would be better off using (\p{IsAlpha}+).

    >
    > How can you tell?
    >
    > --
    > Gunnar Hjalmarsson
    > Email: http://www.gunnar.cc/cgi-bin/contact.pl


    Fair point. I should have had the word 'probably' in that sentence. From
    the original post (which, as you correctly imply, does not give the OP's
    actual goal), I am assuming he is trying to use (\w+) to capture whole words
    (in a natural language) but is finding that it does not work well for this.
    I should have made my assumption explicit. Thanks for pointing this out.
    (Your suggestion of adding use locale; makes similar assumptions, it has to
    be said.)
    Dave, Apr 20, 2006
    #6
  7. Dave wrote:
    > Gunnar Hjalmarsson wrote:
    >>Dave wrote:
    >>><> wrote in message
    >>>>(\w+) does not see accentuated character such as (é). [a-zé] sees
    >>>>accentuated character but the problem is that I have to enumerate èîô
    >>>>etc.
    >>>>
    >>>>Is there any other method in regular expression to include accentuated
    >>>>character so I do not need to specify them in advance ?
    >>>
    >>>You would be better off using (\p{IsAlpha}+).

    >>
    >>How can you tell?

    >
    > Fair point I should have had the word 'probably' in that sentence as from
    > the original post (which, as you correctly imply, does not give the OP's
    > actual goal) I am assuming he is trying to use (\w+) to capture whole words
    > (in a natural language) but is finding that it does not work well for this.
    > I should have made my assumption explicit. Thanks for pointing this out.
    > (Your suggesting of adding use locale; makes similar assumptions it has to
    > be said.)


    Not really. I just meant that we don't really know whether he is
    interested in also matching digits. ;-)
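
    To make the difference concrete, a small sketch with a made-up ASCII
    string: \w also accepts digits and the underscore, while \p{IsAlpha}
    accepts letters only.

```perl
use strict;
use warnings;

# Hypothetical input mixing letters, an underscore and digits.
my $text = "room_42 abc";

my @word_chars = $text =~ /(\w+)/g;           # letters, digits, underscore
my @alpha_only = $text =~ /(\p{IsAlpha}+)/g;  # letters only

print "\\w+          : @word_chars\n";
print "\\p{IsAlpha}+ : @alpha_only\n";
```

    Which of the two is appropriate depends on whether digits should count
    as part of a word, which the original post does not say.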

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Hjalmarsson, Apr 20, 2006
    #7