How to determine if a word has an extended character?

Discussion in 'Perl Misc' started by ambarish.mitra@gmail.com, May 20, 2008.

  1. Guest

    I have a file which contains just one word. My task is just to find
    out if the word has any extended character. That's all.

    I can use regex, but am not able to find out a regex pattern for
    extended character. Any hints?


    For example, if the file content is: sample, then the Perl code prints
    false; and if the file content is samplé, then the Perl code prints
    true.

    Thanks.
    , May 20, 2008
    #1

  2. wrote:
    >I have a file which contains just one word. My task is just to find
    >out if the word has any extended character. That's all.
    >
    >I can use regex, but am not able to find out a regex pattern for
    >extended character. Any hints?


    [Interpreting 'extended' as non-ASCII]

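    You could simply use the POSIX character class [:ascii:], written
    inside a bracketed class as [[:ascii:]] (or negated as [[:^ascii:]]
    to match any non-ASCII character).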

    Another way would be to check for each character, if its ord() is less
    than 128. That should work at least for the most common encodings like
    ISO-Latin-1, Windows-1252, ...
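
    The two checks above can be sketched roughly as follows. This is a
    minimal illustration, not tested against every encoding: the helper
    names has_non_ascii_class and has_non_ascii_ord are made up for this
    example, and the test word is written inline as ISO-Latin-1 bytes
    instead of being read from a file.

```perl
use strict;
use warnings;

# Hypothetical helper: true if any character lies outside the ASCII range.
sub has_non_ascii_class {
    my ($word) = @_;
    return $word =~ /[[:^ascii:]]/ ? 1 : 0;
}

# Same check done character by character with ord().
sub has_non_ascii_ord {
    my ($word) = @_;
    for my $char (split //, $word) {
        return 1 if ord($char) > 127;
    }
    return 0;
}

# "\xE9" is the single byte for e-acute in ISO-Latin-1 (an assumption
# about the file's encoding).
for my $word ("sample", "sampl\xE9") {
    printf "%s %s\n",
        has_non_ascii_class($word) ? 'true' : 'false',
        has_non_ascii_ord($word)   ? 'true' : 'false';
}
```

    Both helpers agree on these two inputs: the first line printed is
    "false false" and the second is "true true".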

    Or: [untested]
    if (/^[A-Za-z]*$/) {
        print 'false';
    } else {
        print 'true';
    }

    You could probably also set your locale to EN-US and use
    if (/\W/) {
        print 'true';
    } else {
        print 'false';
    }

    All of these do somewhat different things, so you have some options to
    choose the one that most closely matches your needs.
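
    To make the difference concrete, here is a small sketch comparing a
    letters-only test with a non-ASCII test on three inputs. The helper
    names and the sample words are invented for illustration, and the
    bytes are assumed to be ISO-Latin-1.

```perl
use strict;
use warnings;

# Hypothetical helpers, one per approach; each returns 1 for "extended".
sub not_only_letters { my ($w) = @_; return $w =~ /^[A-Za-z]*$/ ? 0 : 1 }
sub has_non_ascii    { my ($w) = @_; return $w =~ /[[:^ascii:]]/ ? 1 : 0 }

# Labels are plain ASCII so printing them never depends on the terminal.
my @cases = (
    [ 'sample',    "sample"    ],
    [ 'sample_1',  "sample_1"  ],
    [ 'sampl\xE9', "sampl\xE9" ],
);

for my $case (@cases) {
    my ($label, $word) = @$case;
    printf "%-10s letters-only: %d  non-ASCII: %d\n",
        $label, not_only_letters($word), has_non_ascii($word);
}
```

    The word "sample_1" is the interesting case: it is pure ASCII, yet
    the letters-only test still reports it, because digits and '_' are
    not in [A-Za-z].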

    jue
    Jürgen Exner, May 20, 2008
    #2

  3. In <<>>
    wrote ...
    > I have a file which contains just one word. My task is just to find
    > out if the word has any extended character. That's all.
    >
    > I can use regex, but am not able to find out a regex pattern for
    > extended character. Any hints?
    >
    >
    > For example, if the file content is: sample, then the Perl code prints
    > false; and if the file content is samplé, then the Perl code prints
    > true.



    $string =~ m/[^\w]/ ? print "\nhas extended." : print "\nOK.";

    should do the trick.

    This prints "has extended" if $string contains any character other
    ([^...]) than 'a' to 'z', 'A' to 'Z', '0' to '9' plus '_' (the \w
    character class).

    If you want to exclude the '_' (contained in \w), use [^a-zA-Z0-9].
    If you want to include more "valid" characters, expand the [^...]
    accordingly (note: if you want to include '-' as a valid character, put
    it at the very end of the character list).
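
    As a rough sketch of such an expanded class, the following treats
    letters, digits and '-' as valid. The helper name has_invalid_char
    and the sample strings are hypothetical, chosen only to show the
    hyphen placement.

```perl
use strict;
use warnings;

# Hypothetical helper: letters, digits and '-' count as valid.
# The '-' sits last in the class, so it is a literal hyphen, not a range.
sub has_invalid_char {
    my ($string) = @_;
    return $string =~ /[^a-zA-Z0-9-]/ ? 1 : 0;
}

print has_invalid_char("well-known") ? "has extended.\n" : "OK.\n";
print has_invalid_char("sampl\xE9")  ? "has extended.\n" : "OK.\n";
print has_invalid_char("snake_case") ? "has extended.\n" : "OK.\n";
```

    This prints "OK." for the hyphenated word and "has extended." for the
    other two, since neither the e-acute byte nor '_' is in the list.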

    See
    perldoc perlre
    perldoc perlrequick
    perldoc perlreref
    perldoc perlretut



    hth, Hartmut

    --
    ------------------------------------------------
    Hartmut Camphausen h.camp[bei]textix[punkt]de
    Hartmut Camphausen, May 20, 2008
    #3
  4. Hartmut Camphausen wrote:
    > In <<>>
    > wrote ...
    >> I have a file which contains just one word. My task is just to find
    >> out if the word has any extended character. That's all.
    >>
    >> I can use regex, but am not able to find out a regex pattern for
    >> extended character. Any hints?
    >>
    >>
    >> For example, if the file content is: sample, then the Perl code prints
    >> false; and if the file content is samplé, then the Perl code prints
    >> true.

    >
    >
    > $string =~ m/[^\w]/ ? print "\nhas extended." : print "\nOK.";


    [^\w] is usually written as \W.


    > should do the trick.
    >
    > This prints "has extended" if $string contains any character other
    > ([^...]) than 'a' to 'z', 'A' to 'Z', '0' to '9' plus '_' (the \w
    > character class).


    From perlre.pod:

    <QUOTE>
    If "use locale" is in effect, the list of alphabetic characters
    generated by "\w" is taken from the current locale. See perllocale.
    </QUOTE>

    In other words, if your locale supports it then 'é' will be included
    in \w.
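
    Locales are not the only way this can happen. A related effect, which
    is not the "use locale" case quoted above, shows up when the word has
    been decoded into a Perl character string. The sketch below assumes
    the file holds UTF-8 and a Perl new enough (5.14 or later) to support
    the /a modifier; the bytes are written inline instead of being read
    from a file.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: the input is UTF-8; "\xC3\xA9" is e-acute in UTF-8.
my $word = decode('UTF-8', "sampl\xC3\xA9");   # now a character string

# On a decoded string \w follows Unicode rules, so e-acute is a word
# character and \W finds nothing.
my $plain = $word =~ /\W/ ? 1 : 0;

# The /a modifier (Perl 5.14+) restricts \w to ASCII, so \W matches.
my $ascii = $word =~ /\W/a ? 1 : 0;

print "plain \\W: $plain, \\W with /a: $ascii\n";
```

    Here the plain \W test reports 0 and the /a variant reports 1, so a
    bare \W is not a dependable test for "extended" characters once the
    data is treated as text in a wider character set.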


    > If you want to exclude the '_' (contained in \w), use [^a-zA-Z0-9]


    [^a-zA-Z0-9] means any character that is *not* alphanumeric. You
    probably meant [a-zA-Z0-9].



    John
    --
    Perl isn't a toolbox, but a small machine shop where you
    can special-order certain sorts of tools at low cost and
    in short order. -- Larry Wall
    John W. Krahn, May 21, 2008
    #4
