Trying to write my first regexes

Robert TV

Hi, I am trying to learn the fine points of writing correct regexes to
untaint my data. I have gone through a few tutorials and I have a very basic
idea of their operations. I would like some assistance writing them
correctly.

Example 1

$name = "Jimmy Spenser";
# allow $name to only have letters or spaces by filtering out unwanted junk
if ($name =~ /\d|[\!\@\#\$\%\^\&\*\(\)\-\=\_\+]/) {
print "Bad"
} else {
print "Good";
}

I'm sure the above is sloppy and right now you're laughing. Also, there are
other characters that were not included in the filter. My goal was to
filter out any digits ("\d") and all the characters listed after it. I tried
$name =~ /\W/ but that wouldn't allow spaces. What is the best way to allow
$name to have only letters (in any case) or spaces?

Example 2

$address = "#12 - 4243 Jones Street.";
# allow $address to only have letters, digits, the # sign or spaces by
# filtering out unwanted junk
if ($address =~ /[\!\@\$\%\^\&\*\(\)\-\=\_\+]/) {
print "Bad"
} else {
print "Good";
}

Now my filter needs to allow digits and the # sign as well as letters and
periods and spaces etc. Is there a better way to write these filters so that
I can "define" what I consider allowable instead of filtering out what is
bad? $address is allowed to have, for instance, digits, letters, the number
sign, periods, and spaces, but does not HAVE to contain them; any other
character would be detected as bad.

My end goal is to create a web form that is secure because it does not allow
bad input.

Thank you all

Robert
 
Bob Walton

Robert said:
Hi, I am trying to learn the fine points of writing correct regexes to
untaint my data. I have gone through a few tutorials and I have a very basic
idea of their operations. I would like some assistance writing them
correctly.

Example 1

$name = "Jimmy Spenser";
# allow $name to only have letters or spaces by filtering out unwanted junk
if ($name =~ /\d|[\!\@\#\$\%\^\&\*\(\)\-\=\_\+]/) {


You'd better carefully read and study "perldoc perlre" -- that regexp
isn't even close. It will match any string containing anywhere in it
one of the characters: a digit, !, @, #, $, %, ^, &, *, (, ), -, =, _,
+, but will fail to match many many other characters you probably don't
want either, like all the control characters, ~, `, [, {, |, \, etc etc.
If you wanted to match any string which contains a character that is
not a letter or whitespace, you might try:

if($name =~ /[^a-z\s]/i){

But warning: that is not how to untaint stuff. Keep reading.

print "Bad"
} else {
print "Good";
}


Well, you want to design a regexp that will allow only what you want,
not one that disallows specific stuff -- if you happen to neglect a
disallow item, it would get through. So to have a regexp that matches
only on all letters or whitespace, try:

if($name =~ /^[a-z\s]*$/i){
print "Good\n";
}
else{
print "Bad\n";
}

In that regexp, the /i switch is used on the end to make it case
insensitive (saves making the character class [a-zA-Z\s]). The ^
anchors the start of the match at the beginning of the string so
something like ***blah won't match, and the $ anchors the end of the
match at the end of the string so something like blah*** won't match.
Note that \s is a code for a regexp that matches any one single
whitespace character.
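
To see what the anchors buy you, here is a small sketch that runs the pattern above against a few strings. The sub name is_letters_and_space is made up for this illustration; the last line shows why the unanchored form is useless as a check, since [a-z\s]* is happy to match zero characters.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the whole string consists only of
# letters and whitespace, 0 otherwise.
sub is_letters_and_space {
    my ($s) = @_;
    return $s =~ /^[a-z\s]*$/i ? 1 : 0;
}

for my $s ("blah", "***blah", "blah***", "Jimmy Spenser") {
    printf "%-15s %s\n", $s, is_letters_and_space($s) ? "Good" : "Bad";
}

# Without anchors the pattern can match the empty string at position 0,
# so it succeeds even on input that contains no permitted characters.
print "unanchored pattern matches '***'\n" if "***" =~ /[a-z\s]*/i;
```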

You should also read up on tainting (perldoc perlsec) where you will
learn that you need to assign a variable's value from one of the $1, $2
etc variables which result from a successful pattern match from a regexp
containing parentheses groupings. This means something like:

...
if($name =~ /^([a-z\s]*)$/i){
$name=$1; #$name is now untainted
}
else{
die "\$name had a bad value which I refuse to untaint: $name";
}
...
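
If you end up doing this for several fields, the same idea can be wrapped in a sub. This is only a sketch: the name untaint_name is hypothetical, and it makes two assumptions that differ from the regexp above. It uses a literal space instead of \s (on the guess that tabs and newlines are not wanted in a name), and \A and \z instead of ^ and $ so that nothing can follow the permitted characters. Like the original, it still accepts an empty string.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captured (and therefore untainted)
# value, or undef if the input has anything other than letters and spaces.
sub untaint_name {
    my ($input) = @_;
    return unless defined $input;
    # \A and \z anchor to the absolute start and end of the string.
    if ($input =~ /\A([A-Za-z ]*)\z/) {
        return $1;
    }
    return;
}

for my $candidate ("Jimmy Spenser", "Jimmy; rm -rf /", "R2D2") {
    my $clean = untaint_name($candidate);
    print defined $clean ? "Good: $clean\n" : "Bad: $candidate\n";
}
```

The caller decides what to do with undef (die, re-prompt, log), which keeps the policy out of the helper.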

I'm sure the above is sloppy and right now you're laughing. Also, there are
other characters that were not included in the filter. My goal was to
filter out any digits ("\d") and all the characters listed after it. I tried
$name =~ /\W/ but that wouldn't allow spaces. What is the best way to allow
$name to have only letters (in any case) or spaces?

Example 2

$address = "#12 - 4243 Jones Street.";
# allow $address to only have letters, digits, the # sign or spaces by
# filtering out unwanted junk
if ($address =~ /[\!\@\$\%\^\&\*\(\)\-\=\_\+]/) {
print "Bad"
} else {
print "Good";
}


Again, write a regexp to match only on what you *want to permit*, like:

if($address =~ /^([a-z\d#\s]*)$/i){
$address=$1; #$address now untainted
}
else {
die "I refuse to untaint this tainted crap: $address";
}

I note, though, that this will fail on your example string because it
contains a period and a hyphen, neither of which is among your defined
permitted characters above.
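
If you decide the period and hyphen should be permitted after all, one way to extend the class is sketched below. The sub name untaint_address is made up for the example, and the choice of a literal space rather than \s is an assumption on my part. The hyphen goes last in the class so that it is taken literally instead of forming a range.

```perl
use strict;
use warnings;

# Hypothetical helper: permits letters, digits, #, period, hyphen and
# spaces. Returns the captured value, or undef if anything else appears.
sub untaint_address {
    my ($input) = @_;
    return unless defined $input;
    return $input =~ /\A([A-Za-z0-9#. -]*)\z/ ? $1 : undef;
}

my $address = untaint_address("#12 - 4243 Jones Street.");
print defined $address ? "Good: $address\n" : "Bad\n";
```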

Now my filter needs to allow digits and the # sign as well as letters and
periods and spaces etc. Is there a better way to write these filters so that
I can "define" what I consider allowable instead of filtering out what is
bad? $address is allowed to have, for instance, digits, letters, the number
sign, periods, and spaces, but does not HAVE to contain them; any other
character would be detected as bad.

My end goal is to create a web form that is secure because it does not allow
bad input.


An admirable goal. Be sure to very carefully think through what you
permit, as making a bad decision in your untainting regexp can leave
security holes. Just the fact that Perl considers the data to be
untainted does not mean it is secure -- that is up to your regexp. Perl
helps you a lot by letting you know it is certain that you did pass the
data through an untainting regexp.


....
 
Iain Chalmers

Robert TV said:
Hi, I am trying to learn the fine points of writing correct regexes to
untaint my data. I have gone through a few tutorials and I have a very basic
idea of their operations. I would like some assistance writing them
correctly.

Example 1

$name = "Jimmy Spenser";
# allow $name to only have letters or spaces by filtering out unwanted junk
if ($name =~ /\d|[\!\@\#\$\%\^\&\*\(\)\-\=\_\+]/) {
print "Bad"
} else {
print "Good";
}

I'm sure the above is sloppy and right now you're laughing. Also, there are
other characters that were not included in the filter. My goal was to
filter out any digits ("\d") and all the characters listed after it. I tried
$name =~ /\W/ but that wouldn't allow spaces. What is the best way to allow
$name to have only letters (in any case) or spaces?

Note the ^ as the first character in a character class negates the
class, so:


if ($name =~ /[^A-Za-z ]/) { print "Bad"}

means "if $name contains anything that's not in [A-Za-z ]".

Example 2

$address = "#12 - 4243 Jones Street.";
# allow $address to only have letters, digits, the # sign or spaces by
# filtering out unwanted junk
if ($address =~ /[\!\@\$\%\^\&\*\(\)\-\=\_\+]/) {
print "Bad"
} else {
print "Good";
}

if ($address=~ /[^0-9A-Za-z#. ]/) { print "Bad"}

Now my filter needs to allow digits and the # sign as well as letters and
periods and spaces etc. Is there a better way to write these filters so that
I can "define" what I consider allowable instead of filtering out what is
bad? $address is allowed to have, for instance, digits, letters, the number
sign, periods, and spaces, but does not HAVE to contain them; any other
character would be detected as bad.

See character classes in perlre

perldoc perlre

cheers,

big
 
Robert TV

Bob Walton said:
An admirable goal. Be sure to very carefully think through what you
permit, as making a bad decision in your untainting regexp can leave
security holes. Just the fact that Perl considers the data to be
untainted does not mean it is secure -- that is up to your regexp. Perl
helps you a lot by letting you know it is certain that you did pass the
data through an untainting regexp.

Thank you Bob, that was an excellent reply. Your suggestions and advice will
be of great value in my learning process. I really appreciate your
assistance.

Robert
 
Daedalus

Robert TV said:
Now my filter needs to allow digits and the # sign as well as letters and
periods and spaces etc.

Bob Walton said:
An admirable goal. Be sure to very carefully think through what you
permit, as making a bad decision in your untainting regexp can leave
security holes. Just the fact that Perl considers the data to be
untainted does not mean it is secure -- that is up to your regexp. Perl
helps you a lot by letting you know it is certain that you did pass the
data through an untainting regexp.

It might be a good idea to write a more precise regexp when permitting
special characters, specifying where each one can appear in the string rather
than just permitting it within a class.

DAE
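
As a rough illustration of that suggestion, the sketch below permits # only as part of an optional leading unit number and the period only at the very end. The address layout it assumes (optional "#unit - ", then a street number, then one or more words, then an optional period) and the sub name untaint_street_address are both invented for this example; real addresses vary far more than this.

```perl
use strict;
use warnings;

# Hypothetical helper: each special character is allowed only in one
# position, instead of anywhere in the string.
sub untaint_street_address {
    my ($input) = @_;
    return unless defined $input;
    return $input =~ /\A((?:#\d+ - )?\d+(?: [A-Za-z]+)+\.?)\z/ ? $1 : undef;
}

for my $candidate ("#12 - 4243 Jones Street.", "4243 Jones Street", "4243 Jones #Street") {
    my $clean = untaint_street_address($candidate);
    print defined $clean ? "Good: $clean\n" : "Bad: $candidate\n";
}
```

The trade-off is that a stricter pattern rejects more legitimate input, so the format has to be chosen with the real data in mind.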
 