Checking that user has entered a word or words in text input form using regular expressions...

Luke Matuszewski

Hi !

I have faced the problem of checking that the user has entered the
unicode letter (not only ASCII set of letters...). It seems that
ECMAScript 3rd edition regular expressions do not include Unicode
property classes (as found in Perl and Java) like:

\p{L}

which stands for a Unicode letter. Maybe someone has already done it
(through negating other known and defined character classes in the
RegExp object)?

Please help.

Best regards
Luke M.
 
Hal Rosser

Luke Matuszewski said:
Hi !

I have faced the problem of checking that the user has entered the
unicode letter (not only ASCII set of letters...). It seems that
ECMAScript 3rd edition regular expressions do not include Unicode
property classes (as found in Perl and Java) like:

\p{L}

which stands for a Unicode letter. Maybe someone has already done it
(through negating other known and defined character classes in the
RegExp object)?

Please help.

Best regards
Luke M.
How about checking whether value.length is > 0?
Anything they could paste or type would be covered.
 
Lasse Reichstein Nielsen

Luke Matuszewski said:
I have faced the problem of checking that the user has entered the
unicode letter (not only ASCII set of letters...).

As you say, RegExp's are not helping. Nor is there the more direct
approach which would be String.prototype.isAlpha. I'm afraid you will
have to do it yourself.

As a shorthand, you can try testing whether
string.toLowerCase() != string.toUpperCase()
but I bet there are letters with only one case.
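
A minimal sketch of the case-comparison shortcut described above. The function names are made up for illustration. It only recognises letters that have distinct upper-case and lower-case forms, so caseless letters (CJK ideographs, for example) are rejected, and a few cased non-letters (such as the Roman numeral characters) are accepted.

```javascript
// Hypothetical helper: true if the string contains at least one character
// whose upper-case and lower-case forms differ.
function hasCasedLetter(s) {
  return s.toLowerCase() !== s.toUpperCase();
}

// Stricter hypothetical variant: every UTF-16 code unit must be cased.
// Characters outside the BMP (surrogate pairs) are rejected by this check.
function isAllCasedLetters(s) {
  if (s.length === 0) return false;
  for (var i = 0; i < s.length; i++) {
    var ch = s.charAt(i);
    if (ch.toLowerCase() === ch.toUpperCase()) return false;
  }
  return true;
}
```

For instance, hasCasedLetter("123") is false, while isAllCasedLetters("\u4e2d") is also false even though U+4E2D is a letter, which is the weakness suspected above.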

You could also consider why it's so important that only letters
are entered. After all, there are some pretty weird letters out
there, where a normal digit would look much nicer.

You might also consider whether "letter" is the correct description,
or if what you want is what the Unicode specification calls "Alphabetic".
See <URL:http://www.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt>
(You can also see why it's something of a mouthful to create a regexp
for it :)

/L
 
RobG

Lasse Reichstein Nielsen said on 18/04/2006 4:06 PM AEST:
I have faced the problem of checking that the user has entered the
unicode letter (not only ASCII set of letters...).


As you say, RegExp's are not helping. Nor is there the more direct
approach which would be String.prototype.isAlpha. I'm afraid you will
have to do it yourself.
[...]

You might also consider whether "letter" is the correct description,

The phrase 'the letter' has me confused. You seem to have interpreted
it as 'a letter', which may well be what the OP meant.

or if what you want is what the Unicode specification calls "Alphabetic".
See <URL:http://www.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt>
(You can also see why it's something of a mouthful to create a regexp
for it :)

If that is the requirement, why not:

if ( !/\d/.test(inputValue) )
{
// inputValue doesn't have any digits
}
 
Lasse Reichstein Nielsen

RobG said:
Lasse Reichstein Nielsen said on 18/04/2006 4:06 PM AEST:

If that is the requirement, why not:

if ( !/\d/.test(inputValue) )
{
// inputValue doesn't have any digits
}

Because there's more (much more) to Unicode than letters and digits.
In the file linked, the Grapheme_Base and Math groups contain symbols
that are neither digit nor letter. Take, e.g., codepoint 0x3251:
"circled number twenty one", or 0x4dc0 "Hexagram for the creative
heaven". :)
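
A small sketch to illustrate the point (the names are illustrative only): both code points mentioned above pass a "no digits" test, although neither is a letter.

```javascript
var circledTwentyOne = "\u3251"; // CIRCLED NUMBER TWENTY ONE
var hexagram = "\u4DC0";         // HEXAGRAM FOR THE CREATIVE HEAVEN

// In JavaScript, \d matches only the ASCII digits 0-9.
function hasNoDigits(s) {
  return !/\d/.test(s);
}

var results = [
  hasNoDigits(circledTwentyOne), // true, yet not a letter
  hasNoDigits(hexagram),         // true, yet not a letter
  hasNoDigits("abc1")            // false
];
```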

/L
 
Luke Matuszewski

Lasse said:
Because there's more (much more) to Unicode than letters and digits.
In the file linked, the Grapheme_Base and Math groups contain symbols
that are neither digit nor letter. Take, e.g., codepoint 0x3251:
"circled number twenty one", or 0x4dc0 "Hexagram for the creative
heaven". :)

There are Unicode letters and Unicode blocks (like InMongolian). To
understand better what I really mean, please read the "Unicode support"
paragraph at the following URL:

<URL:http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html>

(see also: http://www.unicode.org/unicode/reports/tr18/ ).

I have not checked the ECMAScript 4 proposal/standards track, but it
should 'upgrade' regular expressions to support classes for Unicode
blocks and categories.

Best regards
Luke M.
 
Thomas 'PointedEars' Lahn

Lasse said:
As you say, RegExp's are not helping.

But they are. It is just a matter of how complex the RegExp should/can be.

Nor is there the more direct approach which would be
String.prototype.isAlpha. I'm afraid you will
have to do it yourself.

But one does not have to reinvent the wheel completely, and can use the
definition for name characters in XML specifications[1], for example,
instead. That also works for identifiers, BTW.
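
A rough sketch of that idea, under some loud assumptions: the ranges below are the BMP part of the NameStartChar production as it appears in XML 1.0 (Fifth Edition), with ":" and "_" left out. Which edition to follow is a choice made here, not something stated above, and the names NAME_START, WORDS and isWords are hypothetical. The ranges are broader than "letters" in the strict sense, so treat this as an approximation.

```javascript
// BMP ranges of XML 1.0 (Fifth Edition) NameStartChar, minus ":" and "_".
// Assumption: this edition's production is an acceptable stand-in for "letter".
var NAME_START =
  "A-Za-z\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u02FF" +
  "\\u0370-\\u037D\\u037F-\\u1FFF\\u200C\\u200D" +
  "\\u2070-\\u218F\\u2C00-\\u2FEF\\u3001-\\uD7FF" +
  "\\uF900-\\uFDCF\\uFDF0-\\uFFFD";

// One or more "words", separated by single spaces.
var WORDS = new RegExp("^[" + NAME_START + "]+(?: [" + NAME_START + "]+)*$");

function isWords(s) {
  return WORDS.test(s);
}
```

With these ranges, isWords("\u0141ukasz Matuszewski") is true, while isWords("abc1") and isWords("\u00D7") (the multiplication sign) are false.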


PointedEars
___________
[1] <URL:http://www.w3.org/XML/Core/#Publications>
 
Dr John Stockton

JRS: In article dated Mon, 17 Apr 2006 16:41:43 remote,
Luke Matuszewski wrote:
I have faced the problem of checking that the user has entered the
unicode letter (not only ASCII set of letters...).

More generally, ISTM that it would be useful to extend RegExp notation.

\w is really a misnomer, since it refers to more than the general
"English word" characters A-Z; \i for Identifier would have been better.

\z appears to be free (so is \l; but that looks like \1), and could be
used to mean "letter of the current language".

It could be preset by default to A-Z or browser preference, reset by any
recognised language indication among the page headers, and resettable by
giving a country code or some form of expression ( [A-Z]-[AEIOU] meaning
the consonants, for example) ; it should be saveable in a variable.
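
The proposed \z escape is hypothetical and does not exist in JavaScript, but the class subtraction used in the consonants example can be emulated with a negative lookahead. A minimal sketch, with illustrative names:

```javascript
// [A-Z] minus [AEIOU]: the lookahead rejects a vowel before [A-Z]
// consumes the character.
var CONSONANT = /(?![AEIOU])[A-Z]/;

// A whole string made only of upper-case ASCII consonants.
var ONLY_CONSONANTS = /^(?:(?![AEIOU])[A-Z])+$/;

// Being ordinary RegExp objects, both can be saved in variables and reused.
var hasConsonant = CONSONANT.test("AEB");       // true, "B" matches
var allConsonants = ONLY_CONSONANTS.test("BAD"); // false, "A" is a vowel
```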
 
Luke Matuszewski

Dr John Stockton said:
\z appears to be free (so is \l; but that looks like \1), and could be
used to mean "letter of the current language".

Also the Perl \p{prop} and \P{prop} notation could be included.

<quote
url="http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html#ubc">

\p{prop} matches if the input has the property prop, while \P{prop}
does not match if the input has that property. Blocks are specified
with the prefix In, as in InMongolian. Categories may be specified with
the optional prefix Is: Both \p{L} and \p{IsL} denote the category of
Unicode letters. Blocks and categories can be used both inside and
outside of a character class.

</quote>
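
For readers on newer engines: ECMAScript 2018 added this \p{...} notation to JavaScript regular expressions, available only when the u flag is set. The sketch below assumes such an engine; the names are illustrative. Categories such as L are supported, but the Java-style block names with the In prefix (InMongolian) are not. Scripts can be matched instead, for example with \p{Script=Mongolian}. Note that \p{L} does not match combining marks, so decomposed accented text may need \p{M} as well.

```javascript
// One or more Unicode letters (general category L). Requires the u flag.
var LETTERS_ONLY = /^\p{L}+$/u;

// A word or words: runs of letters separated by single spaces.
var UNICODE_WORDS = /^\p{L}+(?: \p{L}+)*$/u;

function isLettersOnly(s) {
  return LETTERS_ONLY.test(s);
}

function isUnicodeWords(s) {
  return UNICODE_WORDS.test(s);
}
```

Under these assumptions, isUnicodeWords("\u0141ukasz Matuszewski") is true, and isLettersOnly("\u3251") is false because U+3251 is a number symbol rather than a letter.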

BR
Luke M.
 
