regular expressions

Discussion in 'Java' started by 2rajesh.b@gmail.com, Apr 24, 2006.

  1. Guest

    The password is at least six characters long.

    The password contains characters from at least three of the following
    five categories:
    · English uppercase characters (A - Z)
    · English lowercase characters (a - z)
    · Base 10 digits (0 - 9)
    · Non-alphanumeric (for example: !, $, #, or %)
    · Unicode characters

    Can you please help me write a regular expression for the above
    condition?
    2rajesh.b@gmail.com, Apr 24, 2006
    #1

  2. Rhino Guest

    If you read this -
    http://java.sun.com/docs/books/tutorial/extra/regex/index.html - you should
    be able to do your own homework.

    If that doesn't work, post your best guess to comp.lang.java.help and tell
    us what parts of the code don't work and someone will probably give you a
    hint about what you need to do differently.

    --
    Rhino

    <> wrote in message
    news:...
    > The password is at least six characters long.
    >
    > The password contains characters from at least three of the following
    > five categories:
    > · English uppercase characters (A - Z)
    > · English lowercase characters (a - z)
    > · Base 10 digits (0 - 9)
    > · Non-alphanumeric (for example: !, $, #, or %)
    > · Unicode characters
    >
    > Can you please help me write a regular expression for the above
    > condition?
    Rhino, Apr 24, 2006
    #2

  3. Larry Barowski Guest

    "Rhino" <> wrote in message
    news:d743g.437$...
    > If you read this -
    > http://java.sun.com/docs/books/tutorial/extra/regex/index.html - you
    > should be able to do your own homework.


    I doubt that it's homework since there isn't a concise regular
    expression that fits the bill. That would be a pretty goofy
    homework question. Either way, I would suggest that a
    regular expression is not the best way to validate that
    password.
    Larry Barowski, Apr 24, 2006
    #3
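To make Larry's point concrete, here is a minimal Java sketch (not from the thread) that avoids one giant expression: it tests each category with its own small regex and counts the hits in ordinary code. The class name and the readings of the categories are assumptions: "non-alphanumeric" is taken to mean ASCII punctuation (`\p{Punct}`), and "Unicode characters" to mean anything above U+007F, as discussed later in the thread.

```java
import java.util.regex.Pattern;

class RegexPasswordCheck {
    // One pattern per category. Assumed readings: "non-alphanumeric" means
    // ASCII punctuation, and "Unicode characters" means anything above U+007F.
    private static final Pattern[] CATEGORIES = {
        Pattern.compile("[A-Z]"),          // English uppercase
        Pattern.compile("[a-z]"),          // English lowercase
        Pattern.compile("[0-9]"),          // base 10 digits
        Pattern.compile("\\p{Punct}"),     // ASCII punctuation
        Pattern.compile("[^\\x00-\\x7F]")  // outside ASCII
    };

    // Counts how many categories occur at least once in the password.
    static int countCategories(String password) {
        int count = 0;
        for (Pattern p : CATEGORIES) {
            if (p.matcher(password).find()) {
                count++;
            }
        }
        return count;
    }

    // Six-character minimum plus at least three of the five categories.
    static boolean isAcceptable(String password) {
        return password.length() >= 6 && countCategories(password) >= 3;
    }

    public static void main(String[] args) {
        System.out.println(isAcceptable("abcdef")); // false: one category
        System.out.println(isAcceptable("abcDE1")); // true: three categories
    }
}
```

Splitting the test this way keeps each pattern trivial; the "at least three of five" rule lives in plain Java, where it is easy to read and change.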
  4. Roedy Green Guest

    On 24 Apr 2006 05:50:55 -0700, wrote, quoted or
    indirectly quoted someone who said :

    >The password contains characters from at least three of the following
    >five categories:
    >· English uppercase characters (A - Z)
    >· English lowercase characters (a - z)
    >· Base 10 digits (0 - 9)
    >· Non-alphanumeric (for example: !, $, #, or %)
    >· Unicode characters
    >
    >Can you please help me write a regular expression for the above
    >condition?


    I think it would be easier to solve this with a char loop than with a
    regex. Regexes are about the order in which characters appear. For your
    problem, order does not matter.

    Proceed something like this:

    Invent a Category enum with the values
    UPPERCASE, LOWERCASE, DIGITS, PUNCT, UNICODE.

    Write a method that categorises a char.

    Now your code becomes:

    // pw is the password String being checked.
    int possibilities = Category.values().length;
    boolean[] present = new boolean[ possibilities ];
    for ( int i = 0; i < pw.length(); i++ )
    {
        char c = pw.charAt( i );
        Category cat = Category.categorise( c );
        present[ cat.ordinal() ] = true;
    }

    int cats = 0;
    for ( int possibility = 0; possibility < possibilities; possibility++ )
    {
        if ( present[ possibility ] ) cats++;
    }

    if ( cats >= 3 ) System.out.println( "password sufficiently varied" );

    For code to generate random passwords, see
    http://mindprod.com/applets/password.html

    You might find it easier to generate them than to test them.
    --
    Canadian Mind Products, Roedy Green.
    http://mindprod.com Java custom programming, consulting and coaching.
    Roedy Green, Apr 24, 2006
    #4
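Filling in the pieces Roedy leaves to the reader, a self-contained sketch of the loop approach might look like this. The enum body, the `categorise` rules, and the class name are assumptions: every ASCII character that is not a letter or digit is lumped into PUNCT, and every `char` above U+007F into UNICODE. It also applies the six-character minimum from the original question, which the loop above does not check.

```java
enum Category {
    UPPERCASE, LOWERCASE, DIGITS, PUNCT, UNICODE;

    // Assumed rules: ASCII letters and digits get their own categories,
    // every other ASCII char (punctuation, space, controls) is PUNCT,
    // and anything above U+007F is UNICODE.
    static Category categorise(char c) {
        if (c >= 'A' && c <= 'Z') return UPPERCASE;
        if (c >= 'a' && c <= 'z') return LOWERCASE;
        if (c >= '0' && c <= '9') return DIGITS;
        if (c < 128) return PUNCT;
        return UNICODE;
    }
}

class LoopPasswordCheck {
    // Marks each category seen, then counts the marks.
    static int countCategories(String pw) {
        boolean[] present = new boolean[Category.values().length];
        for (int i = 0; i < pw.length(); i++) {
            present[Category.categorise(pw.charAt(i)).ordinal()] = true;
        }
        int cats = 0;
        for (boolean seen : present) {
            if (seen) cats++;
        }
        return cats;
    }

    // Six-character minimum plus at least three categories.
    static boolean isSufficientlyVaried(String pw) {
        return pw.length() >= 6 && countCategories(pw) >= 3;
    }

    public static void main(String[] args) {
        System.out.println(isSufficientlyVaried("password")); // false
        System.out.println(isSufficientlyVaried("Pa55word")); // true
    }
}
```

Because the categories are an enum, `ordinal()` gives a compact array index, and adding a category later only means adding a constant and a rule.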
  5. Oliver Wong Guest

    <> wrote in message
    news:...
    > The password is at least six characters long.
    >
    > The password contains characters from at least three of the following
    > five categories:
    > · English uppercase characters (A - Z)
    > · English lowercase characters (a - z)
    > · Base 10 digits (0 - 9)
    > · Non-alphanumeric (for example: !, $, #, or %)
    > · Unicode characters
    >
    > Can you please help me write a regular expression for the above
    > condition?


    All of the categories are mutually exclusive except for "Unicode
    characters". And any character that you can get into memory via a program
    written in Java is a "Unicode character", so that last category seems pretty
    redundant. Perhaps you mean something like a character within Unicode, but
    outside of ASCII?

    I'm asking you for clarification because it sounds like the above
    requirements were not dreamt up by you, and so you should in turn be asking
    whoever assigned you this task for clarification.

    - Oliver
    Oliver Wong, Apr 25, 2006
    #5
  6. Roedy Green Guest

    On Tue, 25 Apr 2006 19:08:12 GMT, "Oliver Wong" <>
    wrote, quoted or indirectly quoted someone who said :

    > All of the categories are mutually exclusive except for "Unicode
    >characters". And any character that you can get into memory via a program
    >written in Java is a "Unicode character", so that last category seems pretty
    >redundant. Perhaps you mean something like a character within Unicode, but
    >outside of ASCII?


    I think that is what he meant, something like ó or ⇒. You
    just want to mix up the categories to foil a simple dictionary search.

    You could do it pretty easily with a giant switch. Unfortunately,
    switches don't implement ranges, so you have to code those
    manually if you don't want to spell them out longhand. The default case
    handles the Unicode characters. You might add a control-character
    category and reject such passwords. Putting whitespace on either end of
    a password is not a wise idea.

    Roedy Green, Apr 25, 2006
    #6
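Roedy's suggestion to reject control characters could be sketched as follows, assuming `Character.isISOControl` is an acceptable definition of "control character" (it covers U+0000 to U+001F and U+007F to U+009F). The class and method names are hypothetical, and the whitespace check is optional.

```java
class ControlCharCheck {
    // True if any char is an ISO control character (e.g. backspace, tab, DEL).
    static boolean containsControlChar(String pw) {
        for (int i = 0; i < pw.length(); i++) {
            if (Character.isISOControl(pw.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    // Optional: flag leading or trailing whitespace.
    static boolean hasOuterWhitespace(String pw) {
        return !pw.isEmpty()
            && (Character.isWhitespace(pw.charAt(0))
                || Character.isWhitespace(pw.charAt(pw.length() - 1)));
    }

    public static void main(String[] args) {
        System.out.println(containsControlChar("abc\bdef")); // true: backspace
        System.out.println(hasOuterWhitespace(" secret"));   // true
    }
}
```

A validator would run these checks before categorising, and refuse the password outright rather than counting a control character toward any category.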
  7. Oliver Wong Guest

    "Roedy Green" <> wrote in
    message news:...
    > On Tue, 25 Apr 2006 19:08:12 GMT, "Oliver Wong" <>
    > wrote, quoted or indirectly quoted someone who said :
    >
    >> All of the categories are mutually exclusive except for "Unicode
    >>characters". And any character that you can get into memory via a program
    >>written in Java is a "Unicode character", so that last category seems
    >>pretty redundant. Perhaps you mean something like a character within
    >>Unicode, but outside of ASCII?

    >
    > I think that is what he meant, something like ó or ⇒. You
    > just want to mix up the categories to foil a simple dictionary search.
    >
    > You could do it pretty easily with a giant switch. Unfortunately,
    > switches don't implement ranges, so you have to code those
    > manually if you don't want to spell them out longhand.


    To test whether a given Unicode character is outside of (or inside of,
    for that matter) ASCII, you could serialize it to ASCII, then re-read the
    ASCII data back into an in-memory Java string, and check if you still have
    the same original character that you started with. I believe what most ASCII
    encoders do for characters outside of ASCII is replace them with the '?'
    character.
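A minimal Java sketch of this round-trip test, assuming Java 7 or later for `StandardCharsets`. It relies on `String.getBytes(Charset)`, which substitutes the charset's default replacement for characters it cannot map (for US-ASCII that replacement is `?`). The class name is hypothetical. Because the test compares against the original character rather than looking for `?`, a literal `?` is still correctly reported as ASCII.

```java
import java.nio.charset.StandardCharsets;

class AsciiRoundTrip {
    // Encode to US-ASCII, decode again, and check nothing was replaced.
    static boolean isAscii(char c) {
        String original = String.valueOf(c);
        byte[] bytes = original.getBytes(StandardCharsets.US_ASCII);
        String decoded = new String(bytes, StandardCharsets.US_ASCII);
        return decoded.equals(original);
    }

    public static void main(String[] args) {
        System.out.println(isAscii('A'));      // true
        System.out.println(isAscii('?'));      // true: '?' is itself ASCII
        System.out.println(isAscii('\u00F3')); // false: encoded as '?'
    }
}
```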

    > The default case handles the Unicode characters. You might add a
    > control-character category and reject such passwords. Putting
    > whitespace on either end of a password is not a wise idea.


    I suspect whitespace isn't that big of a problem, because any password
    validating system that performs a trim() on the password before processing
    it is probably very poorly designed. Control characters (e.g. backspace,
    EOF) are probably a very bad idea, because different systems will
    handle them differently. Using characters outside of ASCII is also a bit
    risky for web-based authentication, because one day you might be trying to
    access your site from a terminal that only supports ASCII. As Unicode
    support becomes more widespread, this will probably be less of an issue.

    One particularly bad password system implementation is Microsoft's ".NET
    Passport" (which actually has very little to do with the .NET platform, to
    which C# usually compiles). When you create your passport account, your
    password is silently truncated to something like 12 or 14 characters; but
    when you validate your password, it doesn't get truncated.

    So if I create a new account with the password "1234567890ABCDEF", the
    database will be updated to say that my password is "1234567890AB", but the
    website never mentions that truncation has occurred. Then when I try to log
    on with the password "1234567890ABCDEF", it compares "1234567890ABCDEF"
    (what I wrote) against "1234567890AB" (what's in the DB), sees that they are
    not equal, and tells me that my password is incorrect.

    It took me several days to figure out why my 20-character password
    wasn't working.
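The bug Oliver describes comes from applying a transformation when the password is stored but not when it is checked. A small Java sketch of both the failure and a consistent version; the 12-character limit, the class, and the in-memory map standing in for the database are all hypothetical, and plain strings are compared only for brevity (a real system would store a salted hash).

```java
import java.util.HashMap;
import java.util.Map;

class PasswordStore {
    static final int MAX_LENGTH = 12; // hypothetical limit
    private final Map<String, String> db = new HashMap<>();

    // The transformation applied at sign-up: silent truncation.
    static String normalize(String pw) {
        return pw.length() > MAX_LENGTH ? pw.substring(0, MAX_LENGTH) : pw;
    }

    void register(String user, String pw) {
        db.put(user, normalize(pw));               // truncated on store
    }

    boolean buggyCheck(String user, String pw) {
        return pw.equals(db.get(user));            // not truncated on check
    }

    boolean consistentCheck(String user, String pw) {
        return normalize(pw).equals(db.get(user)); // same transform both ways
    }

    public static void main(String[] args) {
        PasswordStore store = new PasswordStore();
        store.register("oliver", "1234567890ABCDEF");
        System.out.println(store.buggyCheck("oliver", "1234567890ABCDEF"));      // false
        System.out.println(store.consistentCheck("oliver", "1234567890ABCDEF")); // true
    }
}
```

Rejecting an over-long password at sign-up, rather than truncating it silently, would avoid the surprise altogether. The same reasoning applies to the case-folding mishap Chris describes in the next post.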

    - Oliver
    Oliver Wong, Apr 25, 2006
    #7
  8. Chris Uppal Guest

    Oliver Wong wrote:

    > To test whether a given Unicode character is outside of (or inside of,
    > for that matter) ASCII, you could serialize it to ASCII, then re-read the
    > ASCII data back into an in-memory Java string, and check if you still have
    > the same original character that you started with.


    What's wrong with just testing whether it's < 128 ?
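Chris's direct test, sketched in Java. It is sound because the JLS defines `char` values as unsigned 16-bit UTF-16 code units, and ASCII is exactly U+0000 to U+007F. The class and method names are hypothetical.

```java
class AsciiCheck {
    // ASCII is exactly the range U+0000 to U+007F.
    static boolean isAscii(char c) {
        return c < 128;
    }

    // True if every char in the string is ASCII.
    static boolean allAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!isAscii(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allAscii("Pa55word!")); // true
        System.out.println(allAscii("caf\u00E9"));   // false
    }
}
```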


    > So if I create a new account with the password "1234567890ABCDEF", the
    > database will be updated to say that my password is "1234567890AB", but
    > the website never mentions that truncation has occurred. Then when I try
    > to log on with the password "1234567890ABCDEF", it compares
    > "1234567890ABCDEF" (what I wrote) against "1234567890AB" (what's in the
    > DB), sees that they are not equal, and tells me that my password is
    > incorrect.


    I think it was Sun who (in a display of the very highest technical
    standards) mapped my user-id and password to lower case before entering
    them into the database, but didn't perform the same mapping when checking
    them later...

    -- chris
    Chris Uppal, Apr 26, 2006
    #8
  9. Oliver Wong Guest

    "Chris Uppal" <-THIS.org> wrote in message
    news:444f5db2$1$653$...
    > Oliver Wong wrote:
    >
    >> To test whether a given Unicode character is outside of (or inside
    >> of, for that matter) ASCII, you could serialize it to ASCII, then
    >> re-read the ASCII data back into an in-memory Java string, and check if
    >> you still have the same original character that you started with.

    >
    > What's wrong with just testing whether it's < 128 ?


    Erm, er... I was trying to write code that didn't depend on the internal
    encoding being UTF-16. Yeah, that's it. More robust and all that. I mean,
    what if in Java 7, they decide to switch to EBCDIC internally, huh?

    - Oliver
    Oliver Wong, Apr 26, 2006
    #9
  10. Morten Alver Guest

    Oliver Wong wrote:
    >
    > "Chris Uppal" <-THIS.org> wrote in message
    > news:444f5db2$1$653$...
    >
    >> Oliver Wong wrote:
    >>
    >>> To test whether a given Unicode character is outside of (or
    >>> inside of, for that matter) ASCII, you could serialize it to ASCII,
    >>> then re-read the ASCII data back into an in-memory Java string, and
    >>> check if you still have the same original character that you started
    >>> with.

    >>
    >>
    >> What's wrong with just testing whether it's < 128 ?

    >
    >
    > Erm, er... I was trying to write code that didn't depend on the
    > internal encoding being UTF-16. Yeah, that's it. More robust and all
    > that. I mean, what if in Java 7, they decide to switch to EBCDIC
    > internally, huh?


    You can also ask a CharsetEncoder (which you can get from the
    newEncoder() method of a Charset) whether it can encode a char or a
    CharSequence, using its canEncode() method. This is useful in general
    for detecting whether the charset you are using supports all the
    characters you'd like to write.
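A minimal sketch of Morten's suggestion, assuming Java 7 or later for `StandardCharsets`. `CharsetEncoder.canEncode` is overloaded for a single `char` and for a `CharSequence`; the class and method names here are hypothetical. Because `canEncode` may modify the encoder's state, it should not be called on an encoder that is in the middle of an encoding operation; the sketch simply creates encoders for the purpose.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

class EncoderCheck {
    // True if every char in text can be encoded in US-ASCII.
    static boolean fitsInAscii(CharSequence text) {
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        System.out.println(fitsInAscii("Pa55word!")); // true
        System.out.println(fitsInAscii("caf\u00E9"));   // false

        // The single-char overload, here against ISO-8859-1 (Latin-1).
        CharsetEncoder latin1 = StandardCharsets.ISO_8859_1.newEncoder();
        System.out.println(latin1.canEncode('\u00E9')); // true: e with acute
        System.out.println(latin1.canEncode('\u21D2')); // false: double arrow
    }
}
```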


    --
    Morten
    Morten Alver, Apr 26, 2006
    #10
  11. Roedy Green Guest

    On Wed, 26 Apr 2006 13:46:45 GMT, "Oliver Wong" <>
    wrote, quoted or indirectly quoted someone who said :

    > Erm, er... I was trying to write code that didn't depend on the internal
    >encoding being UTF-16. Yeah, that's it. More robust and all that. I mean,
    >what if in Java 7, they decide to switch to EBCDIC internally, huh?


    Surely the use of Unicode is cast in stone in the JLS somewhere. If
    they changed that encoding, thousands of programs would break because
    Java uses \uxxxx to encode literals.
    Roedy Green, Apr 26, 2006
    #11
  12. Chris Uppal Guest

    Oliver Wong wrote:

    [me:]
    > > What's wrong with just testing whether it's < 128 ?

    >
    > Erm, er... I was trying to write code that didn't depend on the
    > internal encoding being UTF-16. Yeah, that's it. More robust and all
    > that.


    ;-)


    >I mean, what if in Java 7, they decide to switch to EBCDIC
    > internally, huh?


    Like the way they changed from "it's pure Unicode data without any encoding" to
    "Ha-ha! Fooled you! It's actually encoded as UTF-16"...
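Chris's remark refers to the fact that Java's `char` is a UTF-16 code unit, so a character outside the Basic Multilingual Plane takes two `char`s (a surrogate pair). A short illustrative sketch; the class name and the choice of U+1F600 are arbitrary.

```java
class Utf16Demo {
    public static void main(String[] args) {
        // 'a' followed by U+1F600, written as its UTF-16 surrogate pair.
        String s = "a\uD83D\uDE00";
        System.out.println(s.length());                             // 3
        System.out.println(s.codePointCount(0, s.length()));        // 2
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true
        System.out.println(Integer.toHexString(s.codePointAt(1)));  // 1f600
    }
}
```

For per-`char` category loops like the ones earlier in the thread this is mostly harmless, since both halves of a surrogate pair lie above U+007F; a length check based on `length()`, though, counts such a character twice.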

    -- chris
    Chris Uppal, Apr 27, 2006
    #12
  13. Oliver Wong Guest

    "Roedy Green" <> wrote in
    message news:...
    > On Wed, 26 Apr 2006 13:46:45 GMT, "Oliver Wong" <>
    > wrote, quoted or indirectly quoted someone who said :
    >
    >> Erm, er... I was trying to write code that didn't depend on the internal
    >>encoding being UTF-16. Yeah, that's it. More robust and all that. I mean,
    >>what if in Java 7, they decide to switch to EBCDIC internally, huh?

    >
    > Surely the use of Unicode is cast in stone in the JLS somewhere. If
    > they changed that encoding, thousands of programs would break because
    > Java uses \uxxxx to encode literals.


    The javac compiler could still accept input of the form \uxxxx, and
    translate to some sort of EBCDIC representation to be emitted to the
    classfiles. But yes, I believe somewhere in the JLS, Unicode is explicitly
    mentioned (though I'm too lazy to verify this right now).

    - Oliver
    Oliver Wong, Apr 27, 2006
    #13
  14. Roedy Green Guest

    On Thu, 27 Apr 2006 16:26:42 GMT, "Oliver Wong" <>
    wrote, quoted or indirectly quoted someone who said :

    > The javac compiler could still accept input of the form \uxxxx, and
    >translate to some sort of EBCDIC representation to be emitted to the
    >classfiles. But yes, I believe somewhere in the JLS, Unicode is explicitly
    >mentioned (though I'm too lazy to verify this right now).


    Java carefully specifies the language so that the internal
    representation of anything is none of your business, and you can't
    find out by writing a program (e.g. they could use UTF-8 for strings).
    However, Unicode is built into the language in that a \uxxxx escape
    in the source code will be written out by DataOutputStream.writeChar
    as that same 16-bit number, and the escapes \u0000 through \u007F map
    onto the ASCII subset of Unicode, so they can spell all the Java keywords.
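Roedy's two points can be checked with a small sketch (class and method names hypothetical). The first call writes a character given as a Unicode escape through `DataOutputStream.writeChar`, which emits the two bytes of the `char`, high byte first. The last lines use Unicode escapes to spell the keyword `int`; this works because escapes are translated before the source is tokenized.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class WriteCharDemo {
    // Returns the bytes that DataOutputStream.writeChar produces for one char.
    static byte[] writeChar(char c) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeChar(c);
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] b = writeChar('\u00F3'); // LATIN SMALL LETTER O WITH ACUTE
        System.out.printf("%02X %02X%n", b[0] & 0xFF, b[1] & 0xFF); // 00 F3

        // The escapes on the next line spell the keyword int.
        \u0069\u006E\u0074 n = 42;
        System.out.println(n); // 42
    }
}
```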


    Roedy Green, Apr 27, 2006
    #14

Similar Threads
  1. Jay Douglas

    Custom Regular Expressions in ASP.net

    Jay Douglas, Nov 2, 2003, in forum: ASP .Net
    Replies: 3
    Views: 605
    mikeb
    Nov 3, 2003
  2. mark

    Regular expressions

    mark, Jun 30, 2003, in forum: Perl
    Replies: 4
    Views: 1,719
  3. Dustin D.
    Replies: 1
    Views: 11,194
  4. Jay Douglas
    Replies: 0
    Views: 600
    Jay Douglas
    Aug 15, 2003
  5. Noman Shapiro
    Replies: 0
    Views: 232
    Noman Shapiro
    Jul 17, 2013