How to determine if a word has an extended character?

ambarish.mitra · May 20, 2008

I have a file which contains just one word. My task is just to find
out if the word has any extended character. That's all.

I can use regex, but am not able to find out a regex pattern for
extended character. Any hints?

For example, if the file content is: sample, then the Perl code prints
false; and if the file content is samplé, then the Perl code prints
true.

Thanks.

Jürgen Exner · May 20, 2008

ambarish.mitra said:
I have a file which contains just one word. My task is just to find
out if the word has any extended character. That's all.

I can use regex, but am not able to find out a regex pattern for
extended character. Any hints?

[Interpreting 'extended' as non-ASCII]

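You could simply use the POSIX character class [:ascii:] (lowercase, and
only valid inside a bracketed class): /[[:^ascii:]]/ matches any
non-ASCII character.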
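
A minimal sketch of the POSIX class approach, assuming 'extended' means anything outside 7-bit ASCII. The helper name has_extended is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: true if the word contains any non-ASCII character.
# POSIX classes are lowercase and only work inside a bracketed class;
# [:^ascii:] is the negated form of [:ascii:].
sub has_extended {
    my ($word) = @_;
    return $word =~ /[[:^ascii:]]/ ? 1 : 0;
}

# "\x{e9}" is the character 'é' (code point 233).
for my $word ("sample", "sampl\x{e9}") {
    my $answer = has_extended($word) ? "true" : "false";
    print "$answer\n";
}
```

If the file is UTF-8 and is read as raw bytes, 'é' arrives as two bytes, both above 127, so the check should still report true.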

Another way would be to check, for each character, whether its ord() is
less than 128; anything at 128 or above is non-ASCII. That should work
at least for the most common encodings like ISO-Latin-1, Windows-1252, ...
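
The ord() check could look roughly like this. It is only a sketch; has_high_char is a hypothetical name, and the word is assumed to be in a scalar already:

```perl
use strict;
use warnings;

# Hypothetical helper: true if any character has a code point of 128 or more.
sub has_high_char {
    my ($word) = @_;
    for my $char (split //, $word) {
        return 1 if ord($char) >= 128;
    }
    return 0;
}

# "\x{e9}" is the character 'é' (code point 233).
for my $word ("sample", "sampl\x{e9}") {
    my $answer = has_high_char($word) ? "true" : "false";
    print "$answer\n";
}
```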

Or: [untested]
if (/^[A-Za-z]*$/) {
    print 'false';
} else {
    print 'true';
}

You could probably also set your locale to EN-US and use
if (/\W/) {
    print 'true';
} else {
    print 'false';
}

All of these do somewhat different things, so you have some options to
choose the one that most closely matches your needs.
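
To see where the options disagree, here is a small comparison sketch. The sample words and the verdicts helper are made up, and the \W check uses the /a modifier (Perl 5.14+) as a stand-in for an ASCII-only locale, so it does not depend on the system's locale settings:

```perl
use strict;
use warnings;

# Each check is true when it would report the word as "extended".
my %checks = (
    non_ascii    => sub { $_[0] =~ /[^\x00-\x7F]/ },    # any code point above 127
    letters_only => sub { $_[0] !~ /^[A-Za-z]*$/ },     # anything but A-Z, a-z
    non_word     => sub { $_[0] =~ /\W/a },             # /a: \w is [A-Za-z0-9_] only
);

# Hypothetical helper: returns check name => 'true'/'false' for one word.
sub verdicts {
    my ($word) = @_;
    return map { $_ => ($checks{$_}->($word) ? 'true' : 'false') } sort keys %checks;
}

my @samples = (
    [ 'sample',           'sample' ],
    [ 'sample + e-acute', "sampl\x{e9}" ],
    [ 'sample1',          'sample1' ],
    [ 're-use',           're-use' ],
);

for my $sample (@samples) {
    my ($label, $word) = @$sample;
    my %v = verdicts($word);
    print "$label: ", join(', ', map { "$_=$v{$_}" } sort keys %v), "\n";
}
```

All three agree on 'sample' and on the e-acute word. They differ on 'sample1' (only the letters-only check objects to the digit) and on 're-use' (the hyphen trips both the letters-only and the \W check, but not the non-ASCII check).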

jue

Hartmut Camphausen · May 20, 2008

ambarish.mitra said:
I have a file which contains just one word. My task is just to find
out if the word has any extended character. That's all.

I can use regex, but am not able to find out a regex pattern for
extended character. Any hints?

For example, if the file content is: sample, then the Perl code prints
false; and if the file content is samplé, then the Perl code prints
true.

$string =~ m/[^\w]/ ? print "\nhas extended." : print "\nOK.";

should do the trick.

This prints "has extended" if $string contains any characters other
([^...]) than 'a' to 'z', 'A' to 'Z', '0' to '9' plus '_' (the \w
character class).

If you want to exclude the '_' (contained in \w), use [^a-zA-Z0-9].
If you want to include more "valid" characters, expand the [^...]
accordingly (note: if you want to include '-' as a valid character, put
it at the very end of the character list).
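
For instance, a sketch that also accepts '-' as valid. The helper name has_invalid_char and the chosen set of valid characters are only assumptions for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: true if the word contains anything outside the
# allowed set. The '-' sits at the very end of the class, so it is a
# literal hyphen rather than a range operator.
sub has_invalid_char {
    my ($word) = @_;
    return $word =~ /[^a-zA-Z0-9_-]/ ? 1 : 0;
}

# "\x{e9}" is the character 'é' (code point 233).
for my $word ("re-use", "sampl\x{e9}") {
    print has_invalid_char($word) ? "has extended.\n" : "OK.\n";
}
```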

See
perldoc perlre
perldoc perlrequick
perldoc perlreref
perldoc perlretut

hth, Hartmut

John W. Krahn · May 21, 2008

Hartmut Camphausen said:
ambarish.mitra said:

I have a file which contains just one word. My task is just to find
out if the word has any extended character. That's all.

I can use regex, but am not able to find out a regex pattern for
extended character. Any hints?

For example, if the file content is: sample, then the Perl code prints
false; and if the file content is samplé, then the Perl code prints
true.

$string =~ m/[^\w]/ ? print "\nhas extended." : print "\nOK.";

[^\w] is usually written as \W.

should do the trick.

This prints "has extended" if $string contains any characters other
([^...]) than 'a' to 'z', 'A' to 'Z', '0' to '9' plus '_' (the \w
character class).

From perlre.pod:

<QUOTE>
If "use locale" is in effect, the list of alphabetic characters
generated by "\w" is taken from the current locale. See perllocale.
</QUOTE>

In other words, if your locale supports it then 'é' will be included in \w.
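
Locales differ from system to system, so the sketch below shows the same effect with the /u and /a regex modifiers (Perl 5.14+) swapped in for "use locale". The two helper names are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helpers: true if the word contains a non-word character.
# /u applies Unicode rules, under which 'é' is a word character.
sub non_word_unicode { my ($word) = @_; return $word =~ /\W/u ? 1 : 0 }
# /a restricts \w to [A-Za-z0-9_], so 'é' counts as a non-word character.
sub non_word_ascii   { my ($word) = @_; return $word =~ /\W/a ? 1 : 0 }

my $word = "sampl\x{e9}";    # 'é' as a single character (U+00E9)

print "Unicode rules: ", (non_word_unicode($word) ? "true" : "false"), "\n";
print "ASCII rules:   ", (non_word_ascii($word)   ? "true" : "false"), "\n";
```

So whether \W flags 'é' depends on which rules are in effect, which is why \W on its own is not a reliable test for extended characters.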

If you want to exclude the '_' (contained in \w), use [^a-zA-Z0-9]

[^a-zA-Z0-9] means any character that is *not* alphanumeric. You
probably meant [a-zA-Z0-9].

John
