regex with accents

albert

Hi,

I can't get the characters with accents in a regex. This is my code :
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
var MyText1 = "éléphant1" ;
var MyText2 = "elephant1" ;
var MyReg = /^[\w]+$/ ;

if (MyReg.test(MyText1))
    alert(MyText1 + " is OK");
else
    alert(MyText1 + " is not valid");

if (MyReg.test(MyText2))
    alert(MyText2 + " is OK");
else
    alert(MyText2 + " is not valid");
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Here's what I get :
éléphant1 is not valid
elephant1 is OK

I'd like éléphant1 to be OK, but I can't.
Can you help me ?

Thanks in advance,

Albert
 
Douglas Crockford

albert said:
I can't get the characters with accents in a regex. This is my code :
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
var MyText1 = "éléphant1" ;
var MyText2 = "elephant1" ;
var MyReg = /^[\w]+$/ ;
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Here's what I get :
éléphant1 is not valid
elephant1 is OK

I'd like éléphant1 to be OK, but I can't.
Can you help me ?

ECMA262 15.10.2.12 defines \w as being equivalent to the character class
[0-9A-Za-z_]. The w suggests word, but that is deceptive. Support for
internationalization in JavaScript's RegExp is virtually nonexistent.

You need to define your own character class.
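
To make the point concrete, here is a minimal sketch comparing \w with the explicit ASCII class it stands for. The helper name `compare` is made up for illustration, and the accented string is written with \u escapes so the example does not depend on the file encoding.

```javascript
// Without special flags, \w behaves like the explicit class [A-Za-z0-9_].
var wordOnly = /^\w+$/;
var explicit = /^[A-Za-z0-9_]+$/;

// Hypothetical helper: runs both patterns against the same text.
function compare(text) {
  return { text: text, w: wordOnly.test(text), explicit: explicit.test(text) };
}

console.log(compare("elephant1"));               // plain ASCII word
console.log(compare("\u00E9l\u00E9phant1"));     // "éléphant1", contains U+00E9
```

Both patterns should agree on every input, which is why the accented word is rejected.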

http://javascript.crockford.com/
 
albert

ECMA262 15.10.2.12 defines \w as being equivalent to the character class
[0-9A-Za-z_]. The w suggests word, but that is deceptive. Support for
internationalization in JavaScript's RegExp is virtually nonexistent.

You need to define your own character class.

How can I do so ?


albert
 
Evertjan.

albert wrote on 22 sep 2007 in comp.lang.javascript:
ECMA262 15.10.2.12 defines \w as being equivalent to the character
class [0-9A-Za-z_]. The w suggests word, but that is deceptive.
Support for internationalization in JavaScript's RegExp is virtually
nonexistent.

You need to define your own character class.

How can I do so ?

var MyReg = /^[\wáéíóäëiöúàèììù]+$/i;

Depending on your local requirements.
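
As one possible variant for French and other Western European text, the class can be written with ranges instead of listing each letter. This is only a sketch: the name `isAccentedWord` is hypothetical, and the ranges assume the text uses precomposed characters from the Latin-1 Supplement block plus the ligatures Œ and œ.

```javascript
// Hypothetical helper: ASCII word characters plus accented Latin letters.
// \u00C0-\u00D6, \u00D8-\u00F6 and \u00F8-\u00FF are the letters of the
// Latin-1 Supplement block (the signs U+00D7 and U+00F7 are skipped);
// \u0152 and \u0153 are the ligatures Œ and œ.
var accentedWord = /^[\w\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF\u0152\u0153]+$/;

function isAccentedWord(text) {
  return accentedWord.test(text);
}

console.log(isAccentedWord("\u00E9l\u00E9phant1"));  // "éléphant1"
console.log(isAccentedWord("elephant1"));            // plain ASCII
console.log(isAccentedWord("\u00E9l\u00E9phant 1")); // contains a space
```

One caveat: a letter typed as a base character followed by a combining accent (for example e plus U+0301) is not covered by these ranges, so such input would need normalizing first or a wider class.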
 
albert

var MyReg = /^[\wáéíóäëiöúàèììù]+$/i;
Depending on your local requirements.

I've got french... that's no pb.
But I also have arabic & hebrew, this is more difficult.


albert
 
Evertjan.

albert wrote on 22 sep 2007 in comp.lang.javascript:
var MyReg = /^[\wáéíóäëiöúàèììù]+$/i;

Depending on your local requirements.

[please do not quote signatures on usenet. removed]
I've got french... that's no pb.

pb? [please no sms-language on usenet]
But I also have arabic & hebrew, this is more difficult.

Why should it be easy?

Javascript accommodates unicode.
 
Dr J R Stockton

In comp.lang.javascript message <[email protected]
digy.net>, Sat, 22 Sep 2007 13:44:18, Douglas Crockford
ECMA262 15.10.2.12 defines \w as being equivalent to the character
class [0-9A-Za-z_]. The w suggests word, but that is deceptive. Support
for internationalization in JavaScript's RegExp is virtually
nonexistent.

<URL:http://www.merlyn.demon.co.uk/humourous.htm#FredHoyle> advises <G>
:-
Fred Hoyle (1915-2001) :-
"'Dam’ good idea. Always force foreigner to learn English.'"
Alexis Ivan Alexandrov, in "The Black Cloud", Chap. 10, para 4.
 
albert

I've got french... that's no pb.
pb? [please no sms-language on usenet]

pb = problem (sorry, I thought it was obvious).
Why should it be easy?

I've never said it should be easy. Don't waste time to answer here...
Javascript accommodates unicode.

Well I tried a simple word in Arabic with the following regex :

^[\w]+$

still, the "test" function always returned false. Do you have any good
working example about it ?


thx, oops, sorry I meant "Thanks" ;-)


albert
 
Evertjan.

albert wrote on 23 sep 2007 in comp.lang.javascript:
I've got french... that's no pb.

pb? [please no sms-language on usenet]

pb = problem (sorry, I thought it was obvious).

Not to me. Usenet has its own limited set of abbreviations.
If anything, Pb would perhaps be lead.
I've never said it should be easy. Don't waste time to answer here...

You are the OP, so ...
Javascript accommodates unicode.

Well I tried a simple word in Arabic with the following regex :

^[\w]+$

Would you allow for figures 0-9?
Otherwise this is better for simple Latin chars:

/^[a-z]+$/i
still, the "test" function always returned false.

I showed you how to do that with accents,
did you understand the regex?

Why would Arabic characters match
where accented characters do not?
Do you have any good
working example about it ?

I am not into working examples, but will give you a hint.

Arabic should work the same as accented ones:

/^[a-z\u0600-\u06ff]+$/

[http://unicode.org/charts/PDF/U0600.pdf]

Not knowing Arabic I cannot test that.
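
A small test along these lines might look like the sketch below. The variable names and the sample strings are assumptions for illustration; the samples are just three letters from each block, written as \u escapes. Digits 0-9 are allowed, and a Hebrew variant using the block U+0590 to U+05FF is added in the same way.

```javascript
// Hypothetical patterns: ASCII letters and digits plus one Unicode block each.
var arabicWord = /^[a-z0-9\u0600-\u06FF]+$/i; // Arabic block U+0600..U+06FF
var hebrewWord = /^[a-z0-9\u0590-\u05FF]+$/i; // Hebrew block U+0590..U+05FF

var arabicSample = "\u0641\u064A\u0644"; // three Arabic letters (feh, yeh, lam)
var hebrewSample = "\u05E4\u05D9\u05DC"; // three Hebrew letters (pe, yod, lamed)

console.log(arabicWord.test(arabicSample)); // Arabic sample against the Arabic class
console.log(hebrewWord.test(hebrewSample)); // Hebrew sample against the Hebrew class
console.log(arabicWord.test(hebrewSample)); // Hebrew sample against the Arabic class
console.log(/^\w+$/.test(arabicSample));    // Arabic sample against plain \w
```

Note that a whole-block range is broader than "letters only": the Arabic block, for instance, also contains punctuation and Arabic-Indic digits, so the range may need narrowing for strict validation.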
 
albert

You are the OP, so ...

Now it's my turn :)
What does OP mean ?
Well I tried a simple word in Arabic with the following regex :

^[\w]+$

Would you allow for figures 0-9?
Yes

Otherwise this is better for simple Latin chars:

/^[a-z]+$/i
still, the "test" function always returned false.

I showed you how to do that with accents,
did you understand the regex?
Yes


Why would Arabic characters match
where accented characters do not?

You're right.
Do you have any good
working example about it ?

I am not into working examples, but will give you a hint.

Arabic should work the same as accented ones:

/^[a-z\u0600-\u06ff]+$/

[http://unicode.org/charts/PDF/U0600.pdf]

Not knowing Arabic I cannot test that.

I tested. It works :)

Thank you for your help !


albert
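
The explicit ranges used in this thread work in any engine. Where the engine supports Unicode property escapes (added in ES2018, so not available when this thread was written), a single class can cover French, Arabic and Hebrew at once. The sketch below assumes such an engine; `isUnicodeWord` is a hypothetical name.

```javascript
// Assumes an engine with Unicode property escapes (ES2018); older engines
// reject this pattern with a SyntaxError.
// \p{L} = any letter, \p{M} = combining marks, \p{Nd} = decimal digits.
const unicodeWord = /^[\p{L}\p{M}\p{Nd}_]+$/u;

// Hypothetical helper wrapping the pattern.
function isUnicodeWord(text) {
  return unicodeWord.test(text);
}

console.log(isUnicodeWord("\u00E9l\u00E9phant1")); // "éléphant1"
console.log(isUnicodeWord("\u0641\u064A\u0644"));  // three Arabic letters
console.log(isUnicodeWord("\u05E4\u05D9\u05DC"));  // three Hebrew letters
console.log(isUnicodeWord("a b"));                 // contains a space
```

Including \p{M} lets letters written with combining accents pass as well, at the cost of also accepting a stray combining mark on its own.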
 
