Password check with regular expressions

Adam Lipscombe · Feb 11, 2009

Folks,

I am struggling with regular expressions to validate passwords in Struts 1.3.

At this stage I simply want to check that there are no repeating characters in the string.
My code is:

String password = "123456"; // Should pass

Pattern p = Pattern.compile("^(?!.*(.)\\1{1})");
Matcher m = p.oatcher(password);
if (true == m.lookingAt())
{
// Error = repeating char
System.out.println("repeating chars");
}

Is this the correct approach?
Can anyone suggest a suitable expression - the one I used does not work.

Later I want to check for at least one uppercase letter, one lowercase letter and a digit.

TIA - Adam

Lew · Feb 11, 2009

Eric said:
ITYM

if ((true == m.lookingAt()) != false)

or possibly

if (((true == m.lookingAt()) != false) == true)

;-)

if ( (((true == m.lookingAt()) != false) == true? true : false) )

John B. Matthews · Feb 11, 2009

Adam Lipscombe said:
At this stage I simply want to check that there are no repeating
characters in the string.

None at all? If your criteria are too arbitrary, users will waste time
trying to dodge your check. Instead, consider assessing the users
password strength as it is typed. If you animate the display using a
progress indicator, users will devote time trying to maximize their
score. Arguably, an improvement.

[...]

Later I want to check for at least one uppercase letter, one
lowercase letter and a digit.

Here's an article that discusses some additional, useful heuristics:

<http://www.cise.ufl.edu/~heliu/project2.pdf>

One common form of attack uses a dictionary. Given a suitable word list
and data structure, fragments of candidate passwords can be checked
quickly with the contains() method of the Set interface:

private static final String NAME = "/usr/share/dict/words";
private static final Set<String> set = new TreeSet<String>();
static {
try {
File file = new File(NAME);
BufferedReader in = new BufferedReader(
new InputStreamReader(new FileInputStream(file)));
String s;
while ((s = in.readLine()) != null) {
set.add(s);
}
} catch (IOException ex) {
System.err.println(ex.getMessage());
}
}

Mark Space · Feb 11, 2009

Adam said:
Later I want to check for at least one uppercase letter, one lowercase
letter and a digit.

The problem here is that regex is ordered, so I think you'd have to
check for each permutation of which comes first: capital letters, lower
case letters or numbers:

..*[A-Z]+.*[a-z]+.*[0-9]+.*|.*[A-Z]+.*[0-9]+.*[a-z]+.*|.*[a-z]+.*[A-Z]+.*[0-9]+.*|

Six terms total, I believe. (There are only three above.) There's no
way, off hand that I can see, to do an "and". You'll have to supply
each permutation in between |'s . Unless your framework allows you to
specify multiple regex that all must match, then just use "[A-Z]+"
"[a-z]+" and "[0-9]+" independently.

I think it would be easier to just check the length with regex, then
write some servlet code to do more detailed checking. Use ".{6,12}"
will require 6 to 12 characters, then do more checking in code.

String password = "123456";
char[] passChars = new char[ password.length() ];
password.getChars( 0, password.length(), passChars, 0 );
Arrays.sort( passChars );
for( int i = 0; i < passChars.length-1; i++ ) {
if( passChars == passChars[i+1] ) {
throw new RepeatedCharacterException();
}
}

Not compiled or tested ...

Martin Gregorie · Feb 12, 2009

I think it would be easier to just check the length with regex, then
write some servlet code to do more detailed checking. Use ".{6,12}"
will require 6 to 12 characters, then do more checking in code.

String password = "123456";
char[] passChars = new char[ password.length() ]; password.getChars(
0, password.length(), passChars, 0 ); Arrays.sort( passChars );
for( int i = 0; i < passChars.length-1; i++ ) {
if( passChars == passChars[i+1] ) {
throw new RepeatedCharacterException();
}
}

Not compiled or tested ...

Why bother with Patterns and Matchers at all? This is only looking at
typed-in passwords, so performance is unlikely to be an issue, so he can
just use String.match(), e.g.

if (pw.matches("\d") && pw.matches("[a-z]") && pw.matches("[A-Z]"))
{
// Found a digit, a capital letter and a lower case letter
}

Another useful test would be to insist that the capital letter doesn't
start the password.

I think you're right about using a plain scan loop: I can't think of a
way to make a regex match two identical adjacent characters, but the sort
may be unnecessary. I may have misunderstood him, but I read the OP's 'no
repeating characters' as meaning that 'saad' would be unacceptable, but
'sada' would be OK.

Final comment: if the OP is going to this trouble I wonder if he's going
to enforce password changing and prevent a previous password from being
reused.

Roedy Green · Feb 12, 2009

Is this the correct approach?
Can anyone suggest a suitable expression - the one I used does not work.

Regexes are not a suitable tool. They don't have memory needed for
such tasks, or for nesting.

I can think of three approaches.

1. for each char in the string, scan the rest of the string to see if
it occurs.

2. for each char in the string, add it to a HashSet. If it is there
already, oops.

3. use a BitSet. Turn off all the bits. Turn on bits indexed by
chars present. If already there, oops.
--
Roedy Green Canadian Mind Products
http://mindprod.com

"Here is a point of no return after which warming becomes unstoppable
and we are probably going to sail right through it.
It is the point at which anthropogenic (human-caused) warming triggers
huge releases of carbon dioxide from warming oceans, or similar releases
of both carbon dioxide and methane from melting permafrost, or both.
Most climate scientists think that point lies not far beyond 2°C (4°F) C hotter."
~ Gwynne Dyer

Arved Sandstrom · Feb 12, 2009

John B. Matthews said:
None at all? If your criteria are too arbitrary, users will waste time
trying to dodge your check. Instead, consider assessing the users
password strength as it is typed. If you animate the display using a
progress indicator, users will devote time trying to maximize their
score. Arguably, an improvement.

[ SNIP ]

Lot of truth to this observation. I think the psychology behind it is, most
people get mildly offended if the password they dream up is rated as Low
Strength or Weak, and put some effort into seeing that colour indicator go
to a green Strong.

Password checkers that also require at least one of lowercase, uppercase,
digit and special character, plus minimum lengths, drive pretty good
passwords. Whenever I end up having to create a password matching these
criteria I invariably end up with something that I literally cannot remember
in and of itself...I need to invent some kind of mnemonic.

AHS

blue indigo · Feb 12, 2009

Arved Sandstrom wrote:
...
...

For me, they result in the worst possible passwords, the ones I have to
write down and carry around with me.

The easiest way for me to remember a reasonably good password is to pick
a sentence or phrase I can remember, and use the initial letters of the
words. Unfortunately, that only works with flexible password checkers.

That's when leet-speak comes in handy. (The ONLY time leet-speak comes in
handy.)

"King Philip Came Over From Geneva, Spain"
|
V
"kpcofgs" (fails many)
|
V
"kPc0fG5" (passes many)
|
V
"|<Pc0fG5" (passes even more, when they accept punctuation)

or

"kPc0fG_5" (passes many, underscore where original phrase had a comma,
period, or other punctuation).

Remembering which leet changes were made is easy if you made all the
classic ones -- zero for O, one for L, five for S, seven for T, and four
for H -- wherever you could. As for the capital letters, put them where
proper names (e.g. Philip, Geneva, and Spain) were in the source phrase.
(But in this case, the S in Spain becomes a 5 so that one's moot.)

The downside is, once enough people do this the black hats will add such
a method to their dictionary attack scripts, and passwords like that will
start to be considered weak, and we'll start all over again.

If the phrase is not widely known, but one you personally know well, it
will be harder for garden-variety script kiddies to crack even so. On the
other hand, people that know you well might be able to guess it.

Security tends to create a twisty little maze of tradeoffs, all different.

Tom McGlynn · Feb 12, 2009

On Feb 11, 7:04 pm, Martin Gregorie

....

I think you're right about using a plain scan loop: I can't think of a
way to make a regex match two identical adjacent characters, but the sort
may be unnecessary. I may have misunderstood him, but I read the OP's 'no
repeating characters' as meaning that 'saad' would be unacceptable, but
'sada' would be OK.

While I don't think it really affects the overall theme of this
thread,
that a single regular expression is not likely to do what the original
poster
wants, generating a regex to find repeated characters isn't hard.
The regular expression
.*(.)\1.*
should match any string with an immediately repeated character
(e.g., 'saad')

Or if you want to check if all characters are unique try
.*(.).*\1.*
to allow intervening characters (e.g., 'sada')

Back references (e.g., the \1 above) are the key to this kind of
regex.

Regards,
Tom McGlynn

Andreas Leitgeb · Feb 12, 2009

Tom McGlynn said:
generating a regex to find repeated characters isn't hard.
The regular expression
.*(.).*\1.*

Just want to point out, that while it is a legal pattern to
Java's regex library and a good solution for that kind of
problem, it is not really a "regular expression" in the
mathematical sense.

That feature is so special, that even context free grammars
wouldn't be able to match it. Otoh, context free grammars
can match e.g. "ww^r" (w: any word, w^r: the same word
reversed) that Java/perl's regex cannot.

PS: the leading and trailing ".*" are redundant, unless
you explicitly want the whole line in matcher's group(0).

Tom McGlynn · Feb 12, 2009

Just want to point out, that while it is a legal pattern to
Java's regex library and a good solution for that kind of
problem, it is not really a "regular expression" in the
mathematical sense.

That feature is so special, that even context free grammars
wouldn't be able to match it. Otoh, context free grammars
can match e.g. "ww^r" (w: any word, w^r: the same word
reversed) that Java/perl's regex cannot.

PS: the leading and trailing ".*" are redundant, unless
you explicitly want the whole line in matcher's group(0).

In the Java context, the Java definition seems appropriate. In the
more general computing context, back references are a pretty
established feature of regular expressions -- at least those that
derive from Perl.
This is likely a case where 'computer science' differs from computing
(or at least programming) practice.

I had explicitly included the leading and trailing .* in light of an
earlier mention in this discussion of using the String.matches() which
(erroneously I believe) suggested that it would match with just part
of the string.

Regards,
Tom

Tom McGlynn · Feb 12, 2009

Adam said:
Adam said:

Later I want to check for at least one uppercase letter, one lowercase
letter and a digit.

Click to expand...

The problem here is that regex is ordered, so I think you'd have to
check for each permutation of which comes first: capital letters, lower
case letters or numbers:

.*[A-Z]+.*[a-z]+.*[0-9]+.*|.*[A-Z]+.*[0-9]+.*[a-z]+.*|.*[a-z]+.*[A-Z]+.*[0-9]+.*|

Six terms total, I believe. (There are only three above.) There's no
way, off hand that I can see, to do an "and". You'll have to supply
each permutation in between |'s . Unless your framework allows you to
specify multiple regex that all must match, then just use "[A-Z]+"
"[a-z]+" and "[0-9]+" independently.

I think it would be easier to just check the length with regex, then
write some servlet code to do more detailed checking. Use ".{6,12}"
will require 6 to 12 characters, then do more checking in code.

String password = "123456";
char[] passChars = new char[ password.length() ];
password.getChars( 0, password.length(), passChars, 0 );
Arrays.sort( passChars );
for( int i = 0; i < passChars.length-1; i++ ) {
if( passChars == passChars[i+1] ) {
throw new RepeatedCharacterException();
}
}

Not compiled or tested ...

If you're restricting yourself to String.matches() this may seem
pretty odious. It's a lot simpler if you're willing to use Matcher's
group methods. E.g., say the requirement is only upper and lower case
letters and numbers are allowed and one of each is required.

// Basic idea tested but not the actual code...
Pattern p = new Pattern("(([a-z])|([A-Z])|(0-9))*");
Matcher m = p.matcher(pwd);

if (!m.matches()) {
... invalid characters...
} else {
boolean hasLC = m.group(1) != null;
boolean hasUC = m.group(2) != null;
boolean hasNum = m.group(3) != null;
if (hasLC && hasUC && hasNum) {
... has all required fields ...
}
}

It's easy to allow or require other classes of characters too. The *
can be replaced by something that puts appropriate limits on the
length of the password. To me this looks like pretty nice way to
encapsulate these requirements. It's easily extensible and
customizable.

I'm not sure how to easily combine this check with the limit on
repetition (see other posts for how this might be done as a regex),
so I'd still agree than using a single regular expression may not be
the best solution. A sequence of two or three fairly simple ones
might do the trick though for a simple checker.

Regards,
Tom McGlynn

Andreas Leitgeb · Feb 12, 2009

Tom McGlynn said:
In the Java context, the Java definition seems appropriate. In the
more general computing context, back references are a pretty
established feature of regular expressions -- at least those that
derive from Perl.

I just wanted to point out that backref is one of the difference between
certain programming languages' definition of RE, and the (much older)
mathematical term "regular expression" going back to the 1950s.

http://en.wikipedia.org/wiki/Regular_expression
" It is worth noting that many real-world "regular expression" engines
" implement features that cannot be expressed in the regular expression
" algebra; ...

I had explicitly included the leading and trailing .* in light of an
earlier mention in this discussion of using the String.matches() which
(erroneously I believe) suggested that it would match with just part
of the string.

I correct myself, in that those ".*" are indeed *not* redundant,
unless one used Matcher's "find()" instead of "matches()".
Obviously I had mixed up Matcher.matches() with Matcher.find().
Sorry.

Mark Space · Feb 12, 2009

Tom said:
Or if you want to check if all characters are unique try
.*(.).*\1.*
to allow intervening characters (e.g., 'sada')

Nice. However, is this really practical for a password checker? What's
actually wrong with sadaxyz as a password, for example? You reduce
the number of permutations quite a bit if you remove each character as a
choice once it has been used.

Is there anyway to supply a regex that allows no more than X repeated
characters? How about one where the maximum number of repeated
characters is half the length of the total string?

class RepeatedCharacterException extends Exception
{
}

static void testPasswordRepeats( String password )
throws RepeatedCharacterException
{
final int maxRepeats = password.length() / 2;
char[] passChars = new char[password.length()];
password.getChars( 0, password.length(), passChars, 0 );
Arrays.sort( passChars );
int repeats = 0;
for( int i = 0; i < passChars.length - 1; i++ ) {
if( passChars == passChars[i + 1] ) {
System.err.println( password + ": repeated: " +
passChars );
repeats++;
}
}
if( repeats > maxRepeats )
throw new RepeatedCharacterException();
}

Tom McGlynn · Feb 12, 2009

Nice. However, is this really practical for a password checker? What's
actually wrong with sadaxyz as a password, for example? You reduce
the number of permutations quite a bit if you remove each character as a
choice once it has been used.

Just illustrating how regular expressions would solve certain problems
as posed. The applicability to the original problem is up to the OP!

Is there anyway to supply a regex that allows no more than X repeated
characters?

Sure. That one's easy. Suppose you want no more than three repeated
characters.
You can do
.*(.)(\1){3}.*
Any string that matches this fails.

... How about one where the maximum number of repeated
characters is half the length of the total string?

I'm not a regular expression expert but I'd guess that it's possible.
I suspect it's not especially comprehensible though. However this IS
easily handled by making the regex a variable.

E.g.,
int maxRepeat = 1;

if (pwd.length() > 2) {
maxRepeat = pwd.length()/2;
}

String regex= ".*(.)(\\1){" + maxRepeat + "}.*";
Pattern p = new Pattern(regex);
if (p.matcher(pwd).matches()) {
... this is invalid
}

YMMV, but I find this reasonably intuitive -- but I may have spent too
much of my life staring at these bits of line noise. It says

..* Anything (or possibly nothing),
(.) followed by a character,
(\\1){N} followed by the same character N times,
..* followed by anything (or possibly nothing).

So it matches anything that has the same character N+1 times in a row.
Isn't that obvious!

The () around the backreference aren't really needed, but I generally
like to
use them when I have a repeat count.

class RepeatedCharacterException extends Exception
{
}

static void testPasswordRepeats( String password )
throws RepeatedCharacterException
{
final int maxRepeats = password.length() / 2;
char[] passChars = new char[password.length()];
password.getChars( 0, password.length(), passChars, 0 );
Arrays.sort( passChars );
int repeats = 0;
for( int i = 0; i < passChars.length - 1; i++ ) {
if( passChars == passChars[i + 1] ) {
System.err.println( password + ": repeated: " +
passChars );
repeats++;
}
}
if( repeats > maxRepeats )
throw new RepeatedCharacterException();
}

Looking at your code I gather you're trying to exclude things like
abababab as well as aaabbb which is what I was going for above. That
requires a slightly different regex:

.*(.)(.*\1){N}.*

where N is the maximum number of repeats that are OK. No sorting or
playing with the password string required.

I don't think people should necessarily use regexes just because they
can solve a particular problem. They can add a bit of flexibility
since they are essentially a mini-program in themselves, but this
means that they are also really easy to get wrong. I always have to
check mine (which I confess I haven't for these though I think they
are right!).

Regards,
Tom

Mark Space · Feb 13, 2009

Tom said:
Looking at your code I gather you're trying to exclude things like
abababab as well as aaabbb which is what I was going for above. That
requires a slightly different regex:

.*(.)(.*\1){N}.*

where N is the maximum number of repeats that are OK. No sorting or
playing with the password string required.

That's interesting. I've mostly mucked around with regex using grep and
vi. I've never made a study of it. I appreciate you taking the time to
explain these regex strings. Most make sense to me after I read them.
However, I'm having a hard time understand why that string above works,
and this one fail:

".*(.).*(\\1){3}.*"

The second ".*" in the middle of the string there seems equivalent to
yours. Both match zero or more characters, and both do so just before
the capture sequence \1. So I don't see why putting the .* grouped with
the \1 makes a difference, but apparently it does.

BTW, I use \\1 because it's from a Java string, which I used to test
with. Code fragment:

private static void testMatchers()
{
Pattern pattern = Pattern.compile( ".*(.).*(\\1){3}.*" );
Pattern pattern2 = Pattern.compile( ".*(.)(.*\\1){3}.*" );
ArrayList<Matcher> matchStrings = newArrayList();
matchStrings.add( pattern.matcher( "asasasasasasasas" ) );
matchStrings.add( pattern.matcher( "adad1234567890" ) );
matchStrings.add( pattern2.matcher( "asasasasasasasas" ) );
matchStrings.add( pattern2.matcher( "adad1234567890" ) );
for( Matcher matchString : matchStrings ) {
if( matchString.matches() ) {
System.out.println( matchString.toString() + " matches "
+ matchString.pattern() );
}
}
}

This produces the output:

run:
java.util.regex.Matcher[pattern=.*(.)(.*\1){3}.* region=0,16
lastmatch=asasasasasasasas] matches .*(.)(.*\1){3}.*
BUILD SUCCESSFUL (total time: 2 seconds)

So only the second pattern is matching.

Arved Sandstrom · Feb 13, 2009

Arved Sandstrom wrote:
...
...

For me, they result in the worst possible passwords, the ones I have to
write down and carry around with me.

The easiest way for me to remember a reasonably good password is to pick
a sentence or phrase I can remember, and use the initial letters of the
words. Unfortunately, that only works with flexible password checkers.

Patricia

I'm not too worried about writing down passwords. Whatever method you use
there's always a downside. For example, between work and personal affairs
I have perhaps 3 dozen passwords that I need to maintain. Regardless of
how good the individual passwords are, I have the problem of remembering
what password is used for what. And these passwords are spread over half
a dozen isolated environments, so I can't even use a single password to
unlock a slew of others. Inevitably therefore I use the same password for
a bunch of different things, and...horror of horrors...I write others
down. The latter being the ones I could remember but rarely use, or ones
that are really tough to remember no matter what.

For someone to use one of my recorded passwords for work, they'd need a
few hours to figure out where I've got it written down, then try to
figure out that something _is_ a password, then figure out what it's for.
Because of my physical work environment, nobody would have a few
undisturbed hours with the book in question...they'd have better luck if
they broke into my apartment while my girlfriend and I were out, and
started photographing pages...and somehow I'm just not that worried about
that scenario.

AHS

Tom McGlynn · Feb 13, 2009

Nice. However, is this really practical for a password checker? What's
actually wrong with sadaxyz as a password, for example? You reduce
the number of permutations quite a bit if you remove each character as a
choice once it has been used.

I sent in an earlier response, but I don't see it. The essence was
that the regular
expression
.*(.)(\1){N}.*
would match a set on N+1 (or more) consecutive identical characters
while
.*(.)(.*\1){N}.*
should match any expression which reused a single character more than
N times.
Here N should be replaced by some actual number of course.

If you want set N dynamically (e.g., dependent on the length of the
string),
then just build the regular expression string once you know the length
you want.

However it's much more important to program in a way that you are sure
is correct than one that may exercise some particular aspect of Java.
I think you can use regular expressions here, but that doesn't mean
you should.

Regards,
Tom

Tom McGlynn · Feb 13, 2009

On Feb 12 said:
However, I'm having a hard time understand why that string above works,
and this one fail:

".*(.).*(\\1){3}.*"

The second ".*" in the middle of the string there seems equivalent to
yours. Both match zero or more characters, and both do so just before
the capture sequence \1. So I don't see why putting the .* grouped with
the \1 makes a difference, but apparently it does.

The syntax
.*(\\1){3}
says find three consecutive strings which each exactly match whatever
matched in the first set of parentheses. Nothing is allowed between
these repetitions but they can follow any junk you like.

(.*\\1){3}
says find three consecutive strings, each of which ends with whatever
was matched in the first set of parentheses.

Your version will match
axyzaaa
since anything is allowed between the first character we're going to
match and it's three repetitions, but it doesn't allow for text
between the repetitions like
axayaza

If we were using your regex, the x could match against the the
inital .*, but since that isn't repeated it's not available to match
against the y.

Regards,
Tom

Mark Space · Feb 13, 2009

Tom said:
says find three consecutive strings which each exactly match whatever

Ah, gotcha. I see it now. It's pretty obvious once it's explained,
isn't it? Thanks for taking the time to point that out!

Trouble with regular expressions	8	Jul 29, 2009
Parsing with complex regular expressions	1	Apr 24, 2007
multiple pattern replacement using regular expressions	1	Feb 20, 2004
Cleaning strings with Regular Expressions	6	Sep 6, 2005
Password Regular Expression	9	May 15, 2004
regular expression AND required validator together?	9	Feb 21, 2008
XSD: Password validation	2	May 25, 2004
ANN: 'rex', a module for easy creation and use of regular expressions	0	Jun 10, 2004

Password check with regular expressions

Adam Lipscombe

Lew

John B. Matthews

Mark Space

Martin Gregorie

Roedy Green

Arved Sandstrom

blue indigo

Tom McGlynn

Andreas Leitgeb

Tom McGlynn

Tom McGlynn

Andreas Leitgeb

Mark Space

Tom McGlynn

Mark Space

Arved Sandstrom

Tom McGlynn

Tom McGlynn

Mark Space

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads