European Language Support - Regular Expressions


pramodx

Has anyone ever faced issues with regex validation for European
language support?

Scenario: I want to validate a person's name, which should not contain
special characters BUT should accept German umlauts, French
circumflexes, carets and other such characters that form part of valid
European names. We are supporting French, German, Dutch, Portuguese,
Italian, Spanish and, somewhere in the future, Russian as well.

Any pointers would be very helpful.

Thanks in advance,
Pramod
 

Martin Honnen

pramodx said:
Has anyone ever faced issues with regex validation for European
language support?

Scenario: I want to validate a person's name, which should not contain
special characters BUT should accept German umlauts, French
circumflexes, carets and other such characters that form part of valid
European names. We are supporting French, German, Dutch, Portuguese,
Italian, Spanish and, somewhere in the future, Russian as well.

Any pointers would be very helpful.

With the regular expression language supported in JavaScript/ECMAScript
you can't use \w alone, because it matches only ASCII letters, digits
and the underscore. You will need to use character ranges (by Unicode
code point, e.g. \u00C0-\u00D6) that list the characters you want to
allow.
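
A minimal sketch of the range approach, assuming the letters of the Latin-1 Supplement are the ones you want to allow. The function name isLatinName, the choice of separators (space, apostrophe, hyphen) and the exact ranges are illustrative assumptions, not something prescribed above.

```javascript
// ASCII letters plus the Latin-1 Supplement letters (assumed set).
// U+00D7 (multiplication sign) and U+00F7 (division sign) sit in the
// middle of that block, so the ranges step around them.
var LETTER = "A-Za-z\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u00FF";

// One or more letter runs, separated by a single space, apostrophe or hyphen.
var LATIN_NAME = new RegExp(
  "^[" + LETTER + "]+(?:[ '-][" + LETTER + "]+)*$"
);

// Hypothetical helper: true if s consists only of the allowed characters.
function isLatinName(s) {
  return LATIN_NAME.test(s);
}
```

With these ranges, isLatinName("Müller") and isLatinName("Jean-François") pass, while "R2D2" and Cyrillic input fail. Note that the French œ (U+0153) lies outside Latin-1, so a real list would need further ranges.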
 

Bart Van der Donck

pramodx said:
Has anyone ever faced issues with regex validation for European
language support?

Scenario: I want to validate a person's name, which should not contain
special characters BUT should accept German umlauts, French
circumflexes, carets and other such characters that form part of valid
European names. We are supporting French, German, Dutch, Portuguese,
Italian, Spanish and, somewhere in the future, Russian as well.

<input name="txt" id="txt">

<input type="button" value="Check European"
onClick="alert(check_EU(document.getElementById('txt').value))">
<input type="button" value="Check European + Russian"
onClick="alert(check_ER(document.getElementById('txt').value))">

<script type="text/javascript">

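// Accepts a string only if every character lies in ISO 8859-1 (Latin-1),
// i.e. has a code unit of at most 255 (U+00FF).
// Note: this also lets through the digits and punctuation below U+0100.
function check_EU (s) {
  for (var i = 0; i < s.length; ++i) {
    if (s.charCodeAt(i) > 255) // [1]
      return 'not okay';
  }
  return 'okay';
}

// As check_EU, but additionally accepts the Cyrillic block
// U+0400-U+04FF (decimal 1024-1279).
function check_ER (s) {
  for (var i = 0; i < s.length; ++i) {
    var c = s.charCodeAt(i);
    if (c > 255 && (c < 1024 || c > 1279)) // [2]
      return 'not okay';
  }
  return 'okay';
}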

</script>

[1] http://en.wikipedia.org/wiki/ISO_8859-1
[2] http://unicode.org/charts/PDF/U0400.pdf (\u0400-\u04ff)
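
For comparison, the same two checks can be sketched as negated character classes. The names checkEURegex and checkERRegex are made up for this illustration; the ranges are the Latin-1 range from [1] and the Cyrillic block \u0400-\u04ff from [2].

```javascript
// Matches any character outside U+0000-U+00FF (ISO 8859-1).
var NON_LATIN1 = /[^\u0000-\u00FF]/;

// Matches any character outside both U+0000-U+00FF and the
// Cyrillic block U+0400-U+04FF.
var NON_LATIN1_OR_CYRILLIC = /[^\u0000-\u00FF\u0400-\u04FF]/;

// Hypothetical regex counterpart of check_EU.
function checkEURegex(s) {
  return NON_LATIN1.test(s) ? 'not okay' : 'okay';
}

// Hypothetical regex counterpart of check_ER.
function checkERRegex(s) {
  return NON_LATIN1_OR_CYRILLIC.test(s) ? 'not okay' : 'okay';
}
```

Like the loops, these only reject characters outside the listed ranges; they do not exclude digits or punctuation inside Latin-1.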

Hope this helps,
 

Thomas 'PointedEars' Lahn

Regarding "french circumflex, carets": aren't those two the same character?

Martin Honnen said:
With the regular expression language supported in JavaScript/ECMAScript
you can't use \w alone, you will need to use character ranges (by the
Unicode e.g. \u00C0-\u00D6) that list the characters you want to allow.

However, in a Unicode-safe implementation, which is required for this,
one does not always need to use escape sequences for characters or
character ranges if the script source is properly encoded and the
encoding is properly declared via the Content-Type header, e.g.

/[\wÀ-Öäëïöüâêîôûáàéèíìóòúù]/

(I can only hope Google Groups gets this right.)
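
A small sketch of that equivalence, assuming the file is saved and served as UTF-8: the literal class above and a version written purely with \u escapes should match exactly the same characters. The variable names and the classesAgree helper are made up for this illustration.

```javascript
// The character class written with literal non-ASCII characters.
// This only works if the source encoding is declared correctly.
var literal = /[\wÀ-Öäëïöüâêîôûáàéèíìóòúù]/;

// The same class written with escapes only (plain ASCII source).
var escaped = /[\w\u00C0-\u00D6\u00E4\u00EB\u00EF\u00F6\u00FC\u00E2\u00EA\u00EE\u00F4\u00FB\u00E1\u00E0\u00E9\u00E8\u00ED\u00EC\u00F3\u00F2\u00FA\u00F9]/;

// Hypothetical helper: true if both classes give the same answer for
// every code unit from 0 up to and including maxCodeUnit.
function classesAgree(maxCodeUnit) {
  for (var cp = 0; cp <= maxCodeUnit; ++cp) {
    var ch = String.fromCharCode(cp);
    if (literal.test(ch) !== escaped.test(ch)) return false;
  }
  return true;
}
```

If classesAgree(0x24F) returns false, the source was most likely decoded with the wrong encoding. The sample class is not a complete list either; for instance, it does not cover ß (U+00DF).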


PointedEars
 

Dr J R Stockton

In comp.lang.javascript message
<3fb98be9-5c88-4d3a-b00f-1dd969adf06c@v5g2000prm.googlegroups.com>,
Wed, 17 Dec 2008 04:19:52, pramodx posted:
Has anyone ever faced issues with regex validation for European
language support?

Scenario: I want to validate a person's name, which should not contain
special characters BUT should accept German umlauts, French
circumflexes, carets and other such characters that form part of valid
European names. We are supporting French, German, Dutch, Portuguese,
Italian, Spanish and, somewhere in the future, Russian as well.

Names used in Europe (and elsewhere) are not necessarily words in
European languages. Attempting to validate a name, except by
comparison, is presumptuous. You can ask for it to be entered more than
once, or compare it with a database of existing customers.

AFAICS, Unicode 0000 to 00FF is not even sufficient to represent the
proper names of all UK public utilities - and if you get it wrong, the
Heddlu might be after you for unlawful discrimination.
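
To make this concrete, here is a sketch that assumes the utility in question is one like Dŵr Cymru (Welsh Water): the ŵ is U+0175, which lies outside 0000 to 00FF. The helper name firstOutsideLatin1 is invented for this illustration.

```javascript
// Returns the index of the first character above U+00FF,
// or -1 if the whole string fits in the 0000-00FF range.
function firstOutsideLatin1(s) {
  for (var i = 0; i < s.length; ++i) {
    if (s.charCodeAt(i) > 0xFF) return i;
  }
  return -1;
}

// "Dŵr Cymru", written with an escape so the source stays ASCII.
var welshName = "D\u0175r Cymru";
var offendingIndex = firstOutsideLatin1(welshName); // position of the ŵ
```

Any validator limited to that range would reject this name at its second character.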
 

Jonas Raoni Soares Silva

Scenario: I want to validate a person's name, which should not contain
special characters BUT should accept German umlauts, French
circumflex [...]

I strongly discourage you from validating names.

1. I don't see a reason to validate something as open-ended as a name.
2. Being in France doesn't mean that everyone has a French name. If my
name contains characters you don't allow, I won't be able to register
correctly.

Note: It's a pity that language isn't like math; if everyone spoke the
same language, things would be much easier in every way.
 
