Trying to write my first regexes

Discussion in 'Perl Misc' started by Robert TV, Jun 25, 2004.

  1. Robert TV

    Robert TV Guest

    Hi, I am trying to learn the fine points of writing correct regexes to
    untaint my data. I have gone through a few tutorials and I have a very basic
    idea of their operations. I would like some assistance writing them
    correctly.

    Example 1

    $name = "Jimmy Spenser";
    # allow $name to only have letters or spaces by filtering out unwanted junk
    if ($name =~ /\d|[\!\@\#\$\%\^\&\*\(\)\-\=\_\+]/) {
    print "Bad"
    } else {
    print "Good";
    }

    I'm sure the above is sloppy and right now you're laughing. Also, there are
    other characters that exist that were not included in the filter. It was my
    goal to filter out any digits "\d" and all the trailing characters. I tried
    $name =~ /\W/ but that wouldn't allow spaces. What is the best way to allow
    $name to only have any case letters or spaces?

    Example 2

    $address = "#12 - 4243 Jones Street.";
    # allow $address to only have letters, digits, the # sign or spaces by
    # filtering out unwanted junk
    if ($address =~ /[\!\@\$\%\^\&\*\(\)\-\=\_\+]/) {
    print "Bad"
    } else {
    print "Good";
    }

    Now my filter needs to allow digits and the # sign as well as letters and
    periods and spaces etc. Is there a way to better write these filters so that
    I can "define" what I consider allowable instead of filtering out what is
    bad? $address is allowed to have for instance /digits/letters/number
    sign/period/spaces/ but does not HAVE to contain them, any other character
    would be detected as bad.

    My end goal will be creating a web form that will be secure by not allowing
    bad stuff.

    Thank you all

    Robert
    Robert TV, Jun 25, 2004
    #1

  2. Bob Walton

    Bob Walton Guest

    Robert TV wrote:

    > Hi, I am trying to learn the fine points of writing correct regexes to
    > untaint my data. I have gone through a few tutorials and I have a very basic
    > idea of their operations. I would like some assistance writing them
    > correctly.
    >
    > Example 1
    >
    > $name = "Jimmy Spenser";
    > # allow $name to only have letters or spaces by filtering out unwanted junk
    > if ($name =~ /\d|[\!\@\#\$\%\^\&\*\(\)\-\=\_\+]/) {



    You'd better carefully read and study "perldoc perlre" -- that regexp
    isn't even close. It will match any string containing anywhere in it
    one of the characters: a digit, !, @, #, $, %, ^, &, *, (, ), -, =, _,
    +, but will fail to match many many other characters you probably don't
    want either, like all the control characters, ~, `, [, {, |, \, etc etc.
    If you wanted to match any string which contains a character that is
    not a letter or whitespace, you might try:

    if($name =~ /[^a-z\s]/i){

    But warning: that is not how to untaint stuff. Keep reading.


    > print "Bad"
    > } else {
    > print "Good";
    > }
    >



    Well, you want to design a regexp that will allow only what you want,
    not one that disallows specific stuff -- if you happen to neglect a
    disallow item, it would get through. So to have a regexp that matches
    only on all letters or whitespace, try:

    if($name =~ /^[a-z\s]*$/i){
    print "Good\n";
    }
    else{
    print "Bad\n";
    }

    In that regexp, the /i switch on the end makes the match case
    insensitive (which saves writing the character class as [a-zA-Z\s]). The ^
    anchors the start of the match at the beginning of the string, so
    something like ***blah won't match, and the $ anchors the end of the
    match at the end of the string, so something like blah*** won't match.
    Note that \s matches any single whitespace character, and that the *
    quantifier also lets an empty string match.
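As a rough illustration of the anchored allow-list check described above, here is a minimal sketch. The helper name is_letters_and_spaces and the sample strings are made up for this example and are not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the whole string consists only of
# letters (either case) and whitespace. An empty string also passes,
# because * allows zero characters.
sub is_letters_and_spaces {
    my ($text) = @_;
    return $text =~ /^[a-z\s]*$/i ? 1 : 0;
}

# Sample inputs chosen for illustration only.
for my $candidate ("Jimmy Spenser", "***blah", "blah***", "Jimmy2") {
    printf "%-16s %s\n", "'$candidate'",
        is_letters_and_spaces($candidate) ? "Good" : "Bad";
}
```

With these inputs only the first string is reported as Good; the other three each contain a character outside the class, so the anchored pattern cannot match them.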

    You should also read up on tainting (perldoc perlsec) where you will
    learn that you need to assign a variable's value from one of the $1, $2
    etc variables which result from a successful pattern match from a regexp
    containing parentheses groupings. This means something like:

    ...
    if($name =~ /^([a-z\s]*)$/i){
    $name=$1; #$name is now untainted
    }
    else{
    die "\$name had a bad value which I refuse to untaint: $name";
    }
    ...
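The capture-and-assign step can be wrapped in a small function. This is only a sketch: the name untaint_name is hypothetical, and returning undef instead of calling die is a design choice made here so the caller decides how to report the failure. Taint checking itself is only active when the script runs under perl -T.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the value captured by the allow-list
# pattern (which is what Perl treats as untainted), or undef when the
# input is missing or contains any other character.
sub untaint_name {
    my ($raw) = @_;
    return undef unless defined $raw;
    if ($raw =~ /^([a-z\s]*)$/i) {
        return $1;    # taken from the capture, not from $raw
    }
    return undef;
}

my $name = untaint_name("Jimmy Spenser");
print defined $name ? "Accepted: $name\n" : "Rejected\n";
```

The important detail is that the returned value comes from $1 rather than from the original variable; merely testing the input against the pattern does not untaint it.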


    > I'm sure the above is sloppy and right now you're laughing. Also, there are
    > other characters that exist that were not included in the filter. It was my
    > goal to filter out any digits "\d" and all the trailing characters. I tried
    > $name =~ /\W/ but that wouldn't allow spaces. What is the best way to allow
    > $name to only have any case letters or spaces?
    >
    > Example 2
    >
    > $address = "#12 - 4243 Jones Street.";
    > # allow $address to only have letters, digits, the # sign or spaces by
    > # filtering out unwanted junk
    > if ($address =~ /[\!\@\$\%\^\&\*\(\)\-\=\_\+]/) {
    > print "Bad"
    > } else {
    > print "Good";
    > }
    >



    Again, write a regexp to match only on what you *want to permit*, like:

    if($address =~ /^([a-z\d#\s]*)$/i){
    $address=$1; #$address now untainted
    }
    else {
    die "I refuse to untaint this tainted crap: $address";
    }

    I note, though, that this will fail on your example string because it
    contains a period and a hyphen, neither of which is among your defined
    permitted characters above.
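If the period and hyphen are meant to be permitted, the character class could be widened as in the sketch below. The helper name untaint_address is hypothetical, and whether those two characters should really be allowed is an assumption for illustration, not a recommendation from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: allows letters, digits, '#', '.', '-' and
# whitespace. The hyphen sits last in the class so it is read as a
# literal hyphen rather than as a range operator.
sub untaint_address {
    my ($raw) = @_;
    return $raw =~ /^([a-z\d#.\s-]*)$/i ? $1 : undef;
}

my $address = untaint_address("#12 - 4243 Jones Street.");
print defined $address ? "Accepted: $address\n" : "Rejected\n";
```

The sample address from the question now passes, while a string containing a semicolon or a slash would still be rejected.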


    > Now my filter needs to allow digits and the # sign as well as letters and
    > periods and spaces etc. Is there a way to better write these filters so that
    > I can "define" what I consider allowable instead of filtering out what is
    > bad? $address is allowed to have for instance /digits/letters/number
    > sign/period/spaces/ but does not HAVE to contain them, any other character
    > would be detected as bad.
    >
    > My end goal will be creating a web form that will be secure by not allowing
    > bad stuff.



    An admirable goal. Be sure to very carefully think through what you
    permit, as making a bad decision in your untainting regexp can leave
    security holes. Just the fact that Perl considers the data to be
    untainted does not mean it is secure -- that is up to your regexp. Perl
    helps you a lot by letting you know it is certain that you did pass the
    data through an untainting regexp.


    ....


    > Robert


    --
    Bob Walton
    Email: http://bwalton.com/cgi-bin/emailbob.pl
    Bob Walton, Jun 25, 2004
    #2

  3. In article <ILLCc.847056$Pk3.308032@pd7tw1no>,
    "Robert TV" <> wrote:

    > Hi, I am trying to learn the fine points of writing correct regexes to
    > untaint my data. I have gone through a few tutorials and I have a very basic
    > idea of their operations. I would like some assistance writing them
    > correctly.
    >
    > Example 1
    >
    > $name = "Jimmy Spenser";
    > # allow $name to only have letters or spaces by filtering out unwanted junk
    > if ($name =~ /\d|[\!\@\#\$\%\^\&\*\(\)\-\=\_\+]/) {
    > print "Bad"
    > } else {
    > print "Good";
    > }
    >
    > I'm sure the above is sloppy and right now you're laughing. Also, there are
    > other characters that exist that were not included in the filter. It was my
    > goal to filter out any digits "\d" and all the trailing characters. I tried
    > $name =~ /\W/ but that wouldn't allow spaces. What is the best way to allow
    > $name to only have any case letters or spaces?


    Note the ^ as the first character in a character class negates the
    class, so:


    if ($name =~ /[^A-Za-z ]/) { print "Bad"}

    means "if $name contains anything that's not in [A-Za-z ]"

    >
    > Example 2
    >
    > $address = "#12 - 4243 Jones Street.";
    > # allow $address to only have letters, digits, the # sign or spaces by
    > # filtering out unwanted junk
    > if ($address =~ /[\!\@\$\%\^\&\*\(\)\-\=\_\+]/) {
    > print "Bad"
    > } else {
    > print "Good";
    > }


    if ($address=~ /[^0-9A-Za-z#. ]/) { print "Bad"}
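A negated class can also report which character caused the rejection, which helps when tuning the allowed set. The sketch below reuses the class from the line above; the helper name first_bad_char is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first character that is not in the
# allowed set, or undef when every character is allowed.
sub first_bad_char {
    my ($text) = @_;
    return $text =~ /([^0-9A-Za-z#. ])/ ? $1 : undef;
}

my $address = "#12 - 4243 Jones Street.";
my $bad = first_bad_char($address);
print defined $bad ? "Bad: found '$bad'\n" : "Good\n";
```

Run against the sample address from the question, this reports the hyphen, since a hyphen is not in the class as written. If hyphens should be accepted, one would have to be added to the class.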

    >
    > Now my filter needs to allow digits and the # sign as well as letters and
    > periods and spaces etc. Is there a way to better write these filters so that
    > I can "define" what I consider allowable instead of filtering out what is
    > bad? $address is allowed to have for instance /digits/letters/number
    > sign/period/spaces/ but does not HAVE to contain them, any other character
    > would be detected as bad.


    See character classes in perlre

    perldoc perlre

    cheers,

    big
    --
    "I ran out of gas! I got a flat tire! I didn't have change for cab fare!
    I lost my tux at the cleaners! I locked my keys in the car! An old friend
    came in from out of town! Someone stole my car! There was an earthquake!
    A terrible flood! Locusts! It wasn't my fault I swear to god!" Jake Blues
    Iain Chalmers, Jun 25, 2004
    #3
  4. Robert TV

    Robert TV Guest

    "Bob Walton" <> wrote
    > An admirable goal. Be sure to very carefully think through what you
    > permit, as making a bad decision in your untainting regexp can leave
    > security holes. Just the fact that Perl considers the data to be
    > untainted does not mean it is secure -- that is up to your regexp. Perl
    > helps you a lot by letting you know it is certain that you did pass the
    > data through an untainting regexp.


    Thank you Bob, that was an excellent reply, your suggestions and advice will
    be of great value in my learning process. I really appreciate your
    assistance.

    Robert
    Robert TV, Jun 25, 2004
    #4
  5. Daedalus

    Daedalus Guest

    > > Now my filter needs to allow digits and the # sign as well as letters and
    > > periods and spaces etc. Is there a way to better write these filters so that
    > > I can "define" what I consider allowable instead of filtering out what is
    > > bad? $address is allowed to have for instance /digits/letters/number
    > > sign/period/spaces/ but does not HAVE to contain them, any other character
    > > would be detected as bad.
    > >
    > > My end goal will be creating a web form that will be secure by not allowing
    > > bad stuff.

    >
    >
    > An admirable goal. Be sure to very carefully think through what you
    > permit, as making a bad decision in your untainting regexp can leave
    > security holes. Just the fact that Perl considers the data to be
    > untainted does not mean it is secure -- that is up to your regexp. Perl
    > helps you a lot by letting you know it is certain that you did pass the
    > data through an untainting regexp.
    >


    It might be a good idea to write a more precise regexp when permitting a
    special character, specifying where it can appear in the string rather than
    just permitting it anywhere through a character class.
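One way to read this suggestion is to describe the expected layout of the field instead of a bag of allowed characters. The sketch below assumes a purely hypothetical address format (optional "#unit - " prefix, street number, one or more words, optional final period); real addresses vary far more than this, and the names $ADDRESS_RE and is_structured_address are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical format, assumed for illustration only.
my $ADDRESS_RE = qr{
    ^
    (?: \# \d+ [ ] - [ ] )?    # optional unit prefix such as "#12 - "
    \d+                        # street number
    (?: [ ] [A-Za-z]+ )+       # one or more words, each after a space
    \.?                        # optional final period
    \z
}x;

# Hypothetical helper: true when the text follows the assumed layout.
sub is_structured_address {
    my ($text) = @_;
    return $text =~ $ADDRESS_RE ? 1 : 0;
}

for my $candidate ("#12 - 4243 Jones Street.", "Jones #12 Street") {
    printf "%-28s %s\n", "'$candidate'",
        is_structured_address($candidate) ? "Good" : "Bad";
}
```

Here the # sign, hyphen and period are each accepted only in one position, so a string such as "Jones #12 Street" is rejected even though every character in it would pass a plain character-class check.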

    DAE
    Daedalus, Jun 25, 2004
    #5
