Checking that user has entered a word or words in text input form using regular expressions...

Discussion in 'Javascript' started by Luke Matuszewski, Apr 18, 2006.

  1. Hi !

    I have faced the problem of checking that the user has entered the
    Unicode letter (not only the ASCII set of letters...). It seems that
    ECMAScript 3rd edition regular expressions do not include Unicode
    property classes like:

    \p{L}

    which stands for a Unicode letter. Maybe someone has done it already
    (through negating other known and defined character classes in the
    RegExp object)?

    Please help.

    Best regards
    Luke M.
     
    Luke Matuszewski, Apr 18, 2006
    #1

  2. Hal Rosser

    "Luke Matuszewski" <> wrote in message
    news:...
    > Hi !
    >
    > I have faced the problem of checking that the user has entered the
    > unicode letter (not only ASCII set of letters...). It seems that
    > ECMAScript 3rd regular expressions do not include posix character
    > classes like:
    >
    > \p{L}
    >
    > , which above stands for Unicode letter. Maybe someone has done it ?
    > (thru negating other known and defined character classes in RegExp
    > object).
    >
    > Please help.
    >
    > Best regards
    > Luke M.
    >

    How about if the value.length is > 0 ?
    anything they could paste or type would be covered.
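
    A bare length test also accepts input that consists only of
    whitespace. A minimal sketch of a slightly stricter check follows; the
    helper name `hasContent` is an assumption made for illustration, and
    only ES3-era features are used.

```javascript
// Hypothetical helper: true when the value contains at least one
// non-whitespace character.
function hasContent(value) {
  // Strip leading and trailing whitespace by hand, since ES3 engines
  // do not provide String.prototype.trim.
  var trimmed = String(value).replace(/^\s+|\s+$/g, "");
  return trimmed.length > 0;
}
```

    This only shows that something was entered; it says nothing about
    whether the characters are letters.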
     
    Hal Rosser, Apr 18, 2006
    #2

  3. Re: Checking that user has entered a word or words in text input form using regular expressions...

    "Luke Matuszewski" <> writes:

    > I have faced the problem of checking that the user has entered the
    > unicode letter (not only ASCII set of letters...).


    As you say, RegExp's are not helping. Nor is there the more direct
    approach which would be String.prototype.isAlpha. I'm afraid you will
    have to do it yourself.

    As a shorthand, you can try testing whether
    string.toLowerCase() != string.toUpperCase()
    but I bet there are letters with only one case.
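
    The case-comparison shorthand can be sketched as below. The function
    name `hasCasedLetter` is an assumption for illustration; it detects
    only letters that have distinct upper and lower case forms.

```javascript
// Hypothetical helper: true when at least one character changes under
// case conversion, which implies that the text contains a cased letter.
function hasCasedLetter(text) {
  return text.toLowerCase() !== text.toUpperCase();
}
```

    Scripts without case, such as Chinese, are not detected, so this can
    only serve as a rough first filter.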

    You could also consider why it's so important that only letters
    are entered. After all, there are some pretty weird letters out
    there, where a normal digit would look much nicer.

    You might also consider whether "letter" is the correct description,
    or if what you want is what the Unicode specification calls "Alphabetic".
    See <URL:http://www.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt>
    (You can also see why it's something of a mouthful to create a regexp
    for it :)

    /L
    --
    Lasse Reichstein Nielsen -
    DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
    'Faith without judgement merely degrades the spirit divine.'
     
    Lasse Reichstein Nielsen, Apr 18, 2006
    #3
  4. RobG

    Re: Checking that user has entered a word or words in text input form using regular expressions...

    Lasse Reichstein Nielsen said on 18/04/2006 4:06 PM AEST:
    > "Luke Matuszewski" <> writes:
    >
    >
    >> I have faced the problem of checking that the user has entered the
    >>unicode letter (not only ASCII set of letters...).

    >
    >
    > As you say, RegExp's are not helping. Nor is there the more direct
    > approach which would be String.prototype.isAlpha. I'm afraid you will
    > have to do it yourself.


    [...]

    > You might also consider whether "letter" is the correct description,


    The phrase 'the letter' has me confused. You seem to have interpreted
    it as 'a letter', which may well be what the OP meant.


    > or if what you want is what the Unicode specification calls "Alphabetic".
    > See <URL:http://www.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt>
    > (You can also see why it's something of a mouthful to create a regexp
    > for it :)


    If that is the requirement, why not:

    if ( !/\d/.test(inputValue) )
    {
    // inputValue doesn't have any digits
    }


    --
    Rob
    Group FAQ: <URL:http://www.jibbering.com/FAQ>
     
    RobG, Apr 18, 2006
    #4
  5. Re: Checking that user has entered a word or words in text input form using regular expressions...

    RobG <> writes:

    > Lasse Reichstein Nielsen said on 18/04/2006 4:06 PM AEST:
    >> or if what you want is what the Unicode specification calls "Alphabetic".
    >> See <URL:http://www.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt>
    >> (You can also see why it's something of a mouthful to create a regexp
    >> for it :)

    >
    > If that is the requirement, why not:
    >
    > if ( !/\d/.test(inputValue) )
    > {
    > // inputValue doesn't have any digits


    Because there's more (much more) to Unicode than letters and digits.
    In the file linked, the Grapheme_Base and Math groups contain symbols
    that are neither digit nor letter. Take, e.g., codepoint 0x3251:
    "circled number twenty one", or 0x4dc0 "Hexagram for the creative
    heaven". :)
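
    A short sketch of this point: the no-digits test accepts both code
    points even though neither is a letter. The function names are
    hypothetical. `hasLetter` assumes an engine that implements the
    Unicode property escapes of ECMAScript 2018 (`\p{...}` with the `u`
    flag), which did not exist when this thread was written.

```javascript
// The two code points mentioned above.
var circledTwentyOne = "\u3251";
var hexagram = "\u4DC0";

// The check under discussion: passes whenever no ASCII digit is present
// (in JavaScript, \d matches only 0-9).
function hasNoDigits(value) {
  return !/\d/.test(value);
}

// Assumption: ES2018+ engine. Matches any character in Unicode general
// category L (letter).
function hasLetter(value) {
  return /\p{L}/u.test(value);
}
```

    Both samples pass `hasNoDigits` and fail `hasLetter`, so "no digits"
    cannot stand in for "only letters".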

    /L
    --
    Lasse Reichstein Nielsen -
    DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
    'Faith without judgement merely degrades the spirit divine.'
     
    Lasse Reichstein Nielsen, Apr 18, 2006
    #5
  6. Lasse Reichstein Nielsen wrote:
    > RobG <> writes:
    >
    > > Lasse Reichstein Nielsen said on 18/04/2006 4:06 PM AEST:
    > >> or if what you want is what the Unicode specification calls "Alphabetic".
    > >> See <URL:http://www.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt>
    > >> (You can also see why it's something of a mouthful to create a regexp
    > >> for it :)

    > >
    > > If that is the requirement, why not:
    > >
    > > if ( !/\d/.test(inputValue) )
    > > {
    > > // inputValue doesn't have any digits

    >
    > Because there's more (much more) to Unicode than letters and digits.
    > In the file linked, the Grapheme_Base and Math groups contain symbols
    > that are neither digit nor letter. Take, e.g., codepoint 0x3251:
    > "circled number twenty one", or 0x4dc0 "Hexagram for the creative
    > heaven". :)
    >


    There are Unicode letters and Unicode blocks (like InMongolian). To
    better understand what I really mean, please read the "Unicode support"
    paragraph at the following URL:

    <URL:http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html>

    (see also: http://www.unicode.org/unicode/reports/tr18/ ).

    I have not checked the ECMAScript 4 proposal/standard track, but it
    should 'upgrade' regular expressions to support classes for Unicode
    blocks and categories.
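
    For readers on current engines: ECMAScript 2018 added Unicode
    property escapes, so a check like the one asked for can be sketched as
    below. This assumes an ES2018+ engine and the `u` flag, and the
    function names are hypothetical. JavaScript exposes scripts
    (`Script=Mongolian`) rather than the `In` block prefix used in the
    Java documentation.

```javascript
// Assumption: ES2018+ engine with Unicode property escapes.

// True when the value is one or more words made only of Unicode
// letters (general category L), separated by whitespace.
function isWords(value) {
  return /^\p{L}+(?:\s+\p{L}+)*$/u.test(value);
}

// True when the value contains at least one character of the
// Mongolian script.
function hasMongolian(value) {
  return /\p{Script=Mongolian}/u.test(value);
}
```

    Decomposed accents are combining marks (category M), not letters, so
    input that is not normalized may need `\p{M}` added to the class.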

    Best regards
    Luke M.
     
    Luke Matuszewski, Apr 18, 2006
    #6
  7. Lasse Reichstein Nielsen wrote:

    > "Luke Matuszewski" <> writes:
    >> I have faced the problem of checking that the user has entered the
    >> unicode letter (not only ASCII set of letters...).

    >
    > As you say, RegExp's are not helping.


    But they are. It is just a matter of how complex the RegExp should/can be.

    > Nor is there the more direct approach which would be
    > String.prototype.isAlpha. I'm afraid you will
    > have to do it yourself.


    But one does not have to reinvent the wheel completely, and can use the
    definition for name characters in XML specifications[1], for example,
    instead. That also works for identifiers, BTW.
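
    A rough sketch of that idea in ES3-compatible syntax follows. The
    ranges are the Basic Multilingual Plane part of the NameStartChar
    production from XML 1.1 (also used by XML 1.0 Fifth Edition), with
    ':' and '_' left out. Earlier XML 1.0 editions define Letter through
    longer BaseChar and Ideographic tables instead. The names
    `LETTER_RANGES` and `looksLikeWords` are assumptions for illustration.

```javascript
// Character-class body built from the BMP ranges of XML NameStartChar,
// without ':' and '_'. Assumption: these coarse ranges are close enough
// to "letter" for the form check at hand.
var LETTER_RANGES =
  "A-Za-z" +
  "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF" +
  "\u0370-\u037D\u037F-\u1FFF\u200C-\u200D" +
  "\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF" +
  "\uF900-\uFDCF\uFDF0-\uFFFD";

// One or more runs of such characters, separated by whitespace.
var WORDS_PATTERN = new RegExp(
  "^[" + LETTER_RANGES + "]+(?:\\s+[" + LETTER_RANGES + "]+)*$"
);

// Hypothetical helper: true when the value looks like a word or words.
function looksLikeWords(value) {
  return WORDS_PATTERN.test(value);
}
```

    The ranges are deliberately coarse: they reject ASCII digits and
    U+00D7 (the multiplication sign), but still admit symbols such as
    U+3251, so this is an approximation of "letter", not a definition.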


    PointedEars
    ___________
    [1] <URL:http://www.w3.org/XML/Core/#Publications>
     
    Thomas 'PointedEars' Lahn, Apr 18, 2006
    #7
  8. JRS: In article <>,
    dated Mon, 17 Apr 2006 16:41:43 remote, seen in
    news:comp.lang.javascript, Luke Matuszewski
    <> posted :
    >
    > I have faced the problem of checking that the user has entered the
    >unicode letter (not only ASCII set of letters...).


    More generally, ISTM that it would be useful to extend RegExp notation.

    \w is really a misnomer, since it refers to more than the general
    "English word" characters A-Z; \i for Identifier would have been better.

    \z appears to be free (so is \l; but that looks like \1), and could be
    used to mean "letter of the current language".

    It could be preset by default to A-Z or browser preference, reset by any
    recognised language indication among the page headers, and resettable by
    giving a country code or some form of expression ( [A-Z]-[AEIOU] meaning
    the consonants, for example) ; it should be saveable in a variable.
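
    The `[A-Z]-[AEIOU]` subtraction is not existing RegExp syntax, but a
    similar effect can be sketched with a negative lookahead, which ES3
    regular expressions do support. The variable and function names are
    assumptions for illustration.

```javascript
// "A-Z except AEIOU": the lookahead rejects a vowel before the class
// is allowed to consume the character.
var CONSONANT = /(?![AEIOU])[A-Z]/;

// Hypothetical helper: true when the value consists only of upper-case
// ASCII consonants.
function isConsonantsOnly(value) {
  return /^(?:(?![AEIOU])[A-Z])+$/.test(value);
}
```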

    --
    © John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 IE 4 ©
    <URL:http://www.jibbering.com/faq/> JL/RC: FAQ of news:comp.lang.javascript
    <URL:http://www.merlyn.demon.co.uk/js-index.htm> jscr maths, dates, sources.
    <URL:http://www.merlyn.demon.co.uk/> TP/BP/Delphi/jscr/&c, FAQ items, links.
     
    Dr John Stockton, Apr 19, 2006
    #8
  9. Dr John Stockton wrote:
    >
    > \z appears to be free (so is \l; but that looks like \1), and could be
    > used to mean "letter of the current language".
    >


    Also the Perl \p{prop} and \P{prop} notation could be included.

    <quote
    url="http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html#ubc">

    \p{prop} matches if the input has the property prop, while \P{prop}
    does not match if the input has that property. Blocks are specified
    with the prefix In, as in InMongolian. Categories may be specified with
    the optional prefix Is: Both \p{L} and \p{IsL} denote the category of
    Unicode letters. Blocks and categories can be used both inside and
    outside of a character class.

    </quote>

    BR
    Luke M.
     
    Luke Matuszewski, Apr 22, 2006
    #9
