matching all perldoc names but no more

Discussion in 'Perl Misc' started by wana, Nov 6, 2004.

  1. wana

    wana Guest

    I was getting carried away answering myself in another thread so I thought I
    should purify my actual problem:

    I am allowing a user to enter a perldoc name and I will run 'perldoc $name'
    for them.

    What regex will match all perldoc names but not allow a command to be
    slipped into the name?

    for example, here is my latest:

    /^[a-zA-Z1-9\:]+$/

    if you allowed just anything:

    /.*/

    a user could enter 'perlref | rm -r ./*' or something like that.

    previous attempts:

    /^[a-z]+$/

    seemed perfect but left out perlfaq1-9

    /^[a-z1-9]+$/

    left out CGI and other ones with caps.

    Is there a rule for all current and future perldoc names? I mean, they
    can't possibly have a | or a > in their name or even a space in the middle,
    right?
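
Whatever regex is chosen, one way to sidestep command injection is to keep the shell out of the picture entirely. The sketch below is an illustration, not code from this thread: perldoc_command is a made-up helper name, and the character class is an assumption about what doc names look like (ASCII letters, digits, and underscores, joined by '::'). The digit range is 0-9, since 1-9 would reject any name containing a zero.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the command list for
# perldoc, or an empty list when the name is anything other than ASCII
# word characters optionally joined by '::'.
sub perldoc_command {
    my ($name) = @_;
    return unless defined $name;
    # \A and \z anchor the whole string; \z also rejects a trailing newline.
    return unless $name =~ /\A[A-Za-z0-9_]+(?:::[A-Za-z0-9_]+)*\z/;
    return ( 'perldoc', $name );
}

# Assumed usage: with more than one element in the list, system() starts
# the program directly and no shell parses the arguments.
# my @cmd = perldoc_command($user_input);
# if (@cmd) { system(@cmd) == 0 or warn "perldoc failed: $?\n" }
# else      { warn "rejected name\n" }
```

Because the name is passed as a separate list element, characters such as | or > would not be interpreted by a shell even if the pattern were looser; the pattern mainly keeps out option-like input and junk.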

    wana
     
    wana, Nov 6, 2004
    #1

  2. wana <> wrote:

    > I am allowing a user to enter a perldoc name and I will run 'perldoc $name'
    > for them.
    >
    > What regex will match all perldoc names but not allow for a command to be
    > slipped into the name.



    You won't need to solve that problem if you choose an approach
    that does not require solving that problem. :)

    If they can only look up the std docs, then build a lookup table
    of the actual installed std docs, see code below.

    Or maybe process the =head2 POD tags in perltoc.pod for legal names.

    I think this ought to work though: /^(\w|::)+$/

    (leaving out single quote on purpose since it is deprecated.)


    ---------------------------------
    #!/usr/bin/perl
    use warnings;
    use strict;

    foreach my $pod ( 'foo bar', qw/ perlnope perl perltoc perlfunc / ) {
        if ( is_pod($pod) )
            { print "$pod is a POD\n" }
        else
            { print "$pod is *not* a POD\n" }
    }


    BEGIN {
        my %pods;    # names of the installed std docs, filled at compile time

        # ask perldoc where perlfunc.pod lives, then keep only the directory
        chomp( my $dir = qx/ perldoc -l perlfunc / );
        $dir =~ s#/[^/]+$##; # should use File::Basename here...

        opendir POD, $dir or die "could not open '$dir' directory $!";
        # keep only *.pod entries, with the extension stripped
        $pods{ $_ } = 1 for map { s/\.pod$// ? $_ : () } readdir POD;
        closedir POD;

        sub is_pod { exists $pods{ $_[0] } ? 1 : 0 }
    }
    ---------------------------------
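
The "should use File::Basename here" comment in the script above could be followed up roughly as below. This is only a sketch: pod_names_in_dir is a hypothetical helper name, and it assumes the std docs sit side by side as *.pod files in one directory, which is the same assumption the script makes.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);

# Hypothetical helper: collect the base names of all *.pod files in $dir
# and return them as a hash reference usable as a lookup table.
sub pod_names_in_dir {
    my ($dir) = @_;
    opendir my $dh, $dir or die "could not open '$dir' directory: $!";
    my %pods;
    for my $file ( readdir $dh ) {
        # the dot is escaped so that only a real ".pod" extension counts
        $pods{$1} = 1 if $file =~ /\A(.+)\.pod\z/;
    }
    closedir $dh;
    return \%pods;
}

# Assumed usage, locating the directory the same way as the script above:
# chomp( my $path = qx/ perldoc -l perlfunc / );
# my $pods = pod_names_in_dir( dirname($path) );
# print "perlfunc is a POD\n" if exists $pods->{perlfunc};
```

dirname() replaces the hand-written substitution on the path, and the lexical directory handle avoids the global POD bareword.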


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Nov 6, 2004
    #2

  3. wana <> wrote:

    > I was getting carried away answering myself in another thread so I
    > thought I should purify my actual problem:
    >
    > I am allowing a user to enter a perldoc name and I will run 'perldoc
    > $name' for them.


    I think you are going down the wrong road. You know exactly the list of
    phrases you want to allow. Why don't you just restrict the options to that?
    Even if you do not have Perl on your computer, it is not hard to write a
    script to parse the output of perldoc perltoc. That will give you the list
    of allowable phrases. Now you can make sure the phrase sent to your CGI
    matches exactly one of those in the set of allowable perldoc arguments.
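
Building the set of allowable names from perltoc might look roughly like this. It is a sketch under assumptions: names_from_perltoc is a made-up helper, it works on the raw POD source rather than the rendered output, and it assumes entries are written as "=head2 NAME - description" lines, which should be checked against the installed perltoc.pod.

```perl
use strict;
use warnings;

# Hypothetical helper: pull NAME out of lines such as
#   =head2 perlfunc - Perl builtin functions
# The "=head2 NAME - description" layout is an assumption about perltoc.pod.
sub names_from_perltoc {
    my ($text) = @_;
    my %names;
    # /m lets ^ match at the start of every line, /g walks all matches
    while ( $text =~ /^=head2[ \t]+([A-Za-z0-9_:]+)[ \t]+-[ \t]/mg ) {
        $names{$1} = 1;
    }
    return \%names;
}

# Assumed usage: read the POD file that "perldoc -l perltoc" reports.
# chomp( my $path = qx/ perldoc -l perltoc / );
# open my $fh, '<', $path or die "cannot read '$path': $!";
# my $allowed = names_from_perltoc( do { local $/; <$fh> } );
# print "ok\n" if exists $allowed->{perlfunc};
```

A request is then accepted only if it is an exact key in the resulting hash, so no pattern ever has to decide which characters are dangerous.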

    Sinan
     
    A. Sinan Unur, Nov 6, 2004
    #3
  4. wana

    wana Guest

    Tad McClellan wrote:

    > wana <> wrote:
    >
    >> I am allowing a user to enter a perldoc name and I will run 'perldoc
    >> $name' for them.
    >>
    >> What regex will match all perldoc names but not allow for a command to be
    >> slipped into the name.

    >
    >
    > You won't need to solve that problem if you choose an approach
    > that does not require solving that problem. :)
    >
    > If they can only look up the std docs, then build a lookup table
    > of the actual installed std docs, see code below.
    >
    > Or maybe process the =head2 POD tags in perltoc.pod for legal names.
    >
    > I think this ought to work though: /^(\w|::)+$/


    I only avoided \w because perlre states that it is not portable across
    character sets and may be insecure, which is critical in my case. That may
    or may not be an issue in my program.

    wana

    >
    > (leaving out single quote on purpose since it is deprecated.)
    >
    >
    > ---------------------------------
    > #!/usr/bin/perl
    > use warnings;
    > use strict;
    >
    > foreach my $pod ( 'foo bar', qw/ perlnope perl perltoc perlfunc / ) {
    > if ( is_pod($pod) )
    > { print "$pod is a POD\n" }
    > else
    > { print "$pod is *not* a POD\n" }
    > }
    >
    >
    > BEGIN {
    > my %pods;
    >
    > chomp( my $dir = qx/ perldoc -l perlfunc / );
    > $dir =~ s#/[^/]+$##; # should use File::Basename here...
    >
    > opendir POD, $dir or die "could not open '$dir' directory $!";
    > $pods{ $_ } = 1 for map { s/\.pod$// ? $_ : () } readdir POD;
    > closedir POD;
    >
    > sub is_pod { exists $pods{ $_[0] } ? 1 : 0 }
    > }
    > ---------------------------------
    >
    >
     
    wana, Nov 6, 2004
    #4
  5. wana

    wana Guest

    Jim Gibson wrote:

    > In article <>, wana
    > <> wrote:
    >
    >> Tad McClellan wrote:
    >>
    >> > wana <> wrote:
    >> >

    >
    > [ problem of untainting perldoc subjects snipped ]
    >
    >> >
    >> > I think this ought to work though: /^(\w|::)+$/

    >>
    >> I only avoided \w because perlre states that it is not portable across
    >> character sets and may be insecure, which is critical in my case. That
    >> may or may not be an issue in my program.

    >
    > Where in perldoc perlre does it say that? It does not say it in the
    > version (5.8.5) on my computer. I could not find the string 'insecure'
    > anywhere in 'perldoc perlre', and 'portable' only occurs once in a
    > discussion of character ranges.


    The words to look for are 'unsafe' and 'unportable' about 78% into perlre.
    The discussion about character ranges is what I am talking about.
    [a-zA-Z1-9] is safe but \w may vary in different locales.

    wana
     
    wana, Nov 8, 2004
    #5
  6. On Mon, 8 Nov 2004, wana wrote:

    > Jim Gibson wrote:
    >
    > > In article <>, wana


    > >> I only avoided \w because perlre states that it is not portable
    > >> across character sets and may be insecure, which is critical in
    > >> my case. That may or may not be an issue in my program.


    That depends on what you mean by "insecure".

    > > Where in perldoc perlre does it say that? It does not say it in
    > > the version (5.8.5) on my computer. I could not find the string
    > > 'insecure' anywhere in 'perldoc perlre', and 'portable' only
    > > occurs once in a discussion of character ranges.

    >
    > The words to look for are 'unsafe' and 'unportable' about 78% into perlre.


    I don't read that as being about "security" (in the usual meaning of
    that term)...

    > The discussion about character ranges is what I am talking about.
    > [a-zA-Z1-9] is safe


    It'll reliably do a specific job. I'd suggest that the use of the
    word "unsafe" in the documentation is a bit misleading. I think in
    this specific reference it means "might not do what the naive reader
    expects"; but "unsafe" often refers to the possibility of malicious
    data causing security-relevant damage to result (such as, for example,
    unintended interpolation taking place using externally-derived data),
    and that's not what is intended here, AFAICS.

    > but \w may vary in different locales.


    Which, in some situations, might be exactly what one wants.

    all the best
     
    Alan J. Flavell, Nov 8, 2004
    #6
  7. Ben Morrow

    Ben Morrow Guest

    Quoth "Alan J. Flavell" <>:
    > On Mon, 8 Nov 2004, wana wrote:
    > > Jim Gibson wrote:
    > >
    > > > In article <>, wana

    >
    > > >> I only avoided \w because perlre states that it is not portable
    > > >> across character sets and may be insecure, which is critical in
    > > >> my case. That may or may not be an issue in my program.

    >

    <snip>
    >
    > It'll reliably do a specific job. I'd suggest that the use of the
    > word "unsafe" in the documentation is a bit misleading. I think in
    > this specific reference it means "might not do what the naive reader
    > expects"; but "unsafe" often refers to the possibility of malicious
    > data causing security-relevant damage to result (such as, for example,
    > unintended interpolation taking place using externally-derived data),
    > and that's not what is intended here, AFAICS.


    The locale is externally-derived data. A malicious user could (under
    some OSen at least) construct their own locale that said ';' was a word
    character.

    I would hope (but I haven't tested) that if 'use locale' is in effect
    and the locale setting was tainted then such regexen won't untaint...
    One can always secure things by explicitly asking for the C locale, or
    simply not using 'locale', which will cause \w to match what you expect.
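
Keeping the locale away from the match can be done per block, as a sketch. The helper name is_word_no_locale is made up for illustration; the only assumption is that the check should behave the same whether or not the calling code says 'use locale'.

```perl
use strict;
use warnings;

# Hypothetical helper: true for a non-empty string of word characters,
# matched with locale rules switched off for this block.
sub is_word_no_locale {
    my ($str) = @_;
    return 0 unless defined $str;
    no locale;    # overrides any 'use locale' in an enclosing scope
    # \A and \z anchor the whole string; \z also rejects a trailing newline
    return ( $str =~ /\A\w+\z/ ) ? 1 : 0;
}

# Assumed usage:
# print "looks safe\n" if is_word_no_locale( $ARGV[0] );
```

On Perl 5.14 and later, the /a modifier (as in /\A\w+\z/a) additionally restricts \w to ASCII; spelling the class out as [A-Za-z0-9_] has the same effect on any version.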

    > > but \w may vary in different locales.

    >
    > Which, in some situations, might be exactly what one wants.


    Of course, but not when dealing with shell metachars.

    Ben

    --
    "The Earth is degenerating these days. Bribery and corruption abound.
    Children no longer mind their parents, every man wants to write a book,
    and it is evident that the end of the world is fast approaching."
    -Assyrian stone tablet, c.2800 BC
     
    Ben Morrow, Nov 9, 2004
    #7
  8. wana

    wana Guest

    Jim Gibson wrote:

    > In article <>, wana
    > <> wrote:
    >
    >> Jim Gibson wrote:
    >>
    >> > In article <>, wana
    >> > <> wrote:
    >> >
    >> >> Tad McClellan wrote:
    >> >>
    >> >> > wana <> wrote:
    >> >> >
    >> >
    >> > [ problem of untainting perldoc subjects snipped ]
    >> >
    >> >> >
    >> >> > I think this ought to work though:    ^(\w|::)+$
    >> >>
    >> >> I only avoided \w because perlre states that it is not portable across
    >> >> character sets and may be insecure, which is critical in my case.
    >> >> That may or may not be an issue in my program.
    >> >
    >> > Where in perldoc perlre does it say that? It does not say it in the
    >> > version (5.8.5) on my computer. I could not find the string 'insecure'
    >> > anywhere in 'perldoc perlre', and 'portable' only occurs once in a
    >> > discussion of character ranges.

    >>
    >> The words to look for are 'unsafe' and 'unportable' about 78% into
    >> perlre. The discussion about character ranges is what I am talking about.
    >> [a-zA-Z1-9] is safe but \w may vary in different locales.

    >
    > The warning is about defining your own character ranges, such as [ -~]
    > for the ascii printable set. That may give an error in other character
    > sets. The doc says nothing about character classes such as \w being
    > unsafe or unportable across character sets. In fact, it implies that
    > using \w is safer than defining your own character sets.
    >
    > Here it is from perlre:
    >
    > "Note also that the whole range idea is rather unportable between char-
    > acter sets--and even within character sets they may cause results you
    > probably didn't expect.  A sound principle is to use only ranges that
    > begin from and end at either alphabets of equal case ([a-e], [A-E]),  or
    > digits ([0-9]).  Anything else is unsafe.  If in doubt, spell out the
    > character sets in full."


     for example:

    $comm = $ARGV[0];
    if ($comm =~ /^\w+$/) # the same as ^[a-zA-Z0-9_]+$ when no locale is in effect
    {
            `echo $comm`
    }

    this prevents a user from slipping in dangerous characters like | or >
    etc...

    Suppose a new character set comes along and is described by a different
    locale.  Then suppose this code is cut and pasted, or otherwise included,
    into a program running under the new locale, which has a character in its
    alphabet that the shell interprets as |, for example.  Now there is a
    security compromise, hence it is insecure and unsafe.  I don't know if this
    is possible, but that's what I read into the statement in perlre.  If this
    is possible, it is clearly a potential, though unlikely, security risk.  I
    believe perlsec touches briefly on the same subject.

    wana
     
    wana, Nov 9, 2004
    #8
  9. On Tue, 9 Nov 2004, Ben Morrow wrote:

    > Quoth "Alan J. Flavell" <>:
    > >
    > > It'll reliably do a specific job. I'd suggest that the use of the
    > > word "unsafe" in the documentation is a bit misleading. I think in
    > > this specific reference it means "might not do what the naive reader
    > > expects"; but "unsafe" often refers to the possibility of malicious
    > > data causing security-relevant damage to result (such as, for example,
    > > unintended interpolation taking place using externally-derived data),
    > > and that's not what is intended here, AFAICS.

    >
    > The locale is externally-derived data. A malicious user could (under
    > some OSen at least) construct their own locale that said ';' was a word
    > character.


    Good call. I withdraw the comment.

    > I would hope (but I haven't tested) that if 'use locale' is in effect
    > and the locale setting was tainted then such regexen won't untaint...


    Let's hope so.

    > > > but \w may vary in different locales.

    > >
    > > Which, in some situations, might be exactly what one wants.

    >
    > Of course, but not when dealing with shell metachars.


    I take it you were commenting here on the specific problem, rather
    than on the cited documentation as such.

    cheers
     
    Alan J. Flavell, Nov 9, 2004
    #9
  10. On Tue, 9 Nov 2004, wana wrote:

    > Alan J. Flavell wrote:

    [snip]
    > > Good call. I withdraw the comment.

    [snip]

    > Thanks to all for further discussion. I still think that the security issue
    > with tainted data is at least partly the intent of this paragraph in
    > perlre.


    Just so. That's why I accepted that my comment had been misguided.

    > I mentioned that the \w topic is also discussed in perlsec:


    [...]

    > The second paragraph makes it clear that this is the issue. It is
    > really not a big deal and on the outer fringes of my perl knowledge
    > as a newbie and an amateur. I just wanted to make my point that
    > what I read in perlre meant what I thought it meant. At least I am
    > finally reading my perldocs before posting!


    Absolutely. My apologies that I missed this point the first time
    around. It'll remind me to check the documentation properly myself
    instead of just skim-reading it.

    Umble pie for tea today...

    cheers
     
    Alan J. Flavell, Nov 9, 2004
    #10
  11. wana

    wana Guest

    Alan J. Flavell wrote:

    > On Tue, 9 Nov 2004, Ben Morrow wrote:
    >
    >> Quoth "Alan J. Flavell" <>:
    >> >
    >> > It'll reliably do a specific job. I'd suggest that the use of the
    >> > word "unsafe" in the documentation is a bit misleading. I think in
    >> > this specific reference it means "might not do what the naive reader
    >> > expects"; but "unsafe" often refers to the possibility of malicious
    >> > data causing security-relevant damage to result (such as, for example,
    >> > unintended interpolation taking place using externally-derived data),
    >> > and that's not what is intended here, AFAICS.

    >>
    >> The locale is externally-derived data. A malicious user could (under
    >> some OSen at least) construct their own locale that said ';' was a word
    >> character.

    >
    > Good call. I withdraw the comment.
    >
    >> I would hope (but I haven't tested) that if 'use locale' is in effect
    >> and the locale setting was tainted then such regexen won't untaint...

    >
    > Let's hope so.
    >
    >> > > but \w may vary in different locales.
    >> >
    >> > Which, in some situations, might be exactly what one wants.

    >>
    >> Of course, but not when dealing with shell metachars.

    >
    > I take it you were commenting here on the specific problem, rather
    > than on the cited documentation as such.
    >
    > cheers


    Thanks to all for further discussion. I still think that the security issue
    with tainted data is at least partly the intent of this paragraph in
    perlre. I mentioned that the \w topic is also discussed in perlsec:

    This is fairly secure because "/\w+/" doesn’t normally
    match shell metacharacters, nor are dot, dash, or at going
    to mean something special to the shell. Use of "/.+/"
    would have been insecure in theory because it lets
    everything through, but Perl doesn’t check for that. The
    lesson is that when untainting, you must be exceedingly
    careful with your patterns. Laundering data using regular
    expression is the only mechanism for untainting dirty
    data, unless you use the strategy detailed below to fork a
    child of lesser privilege.

    The example does not untaint $data if "use locale" is in
    effect, because the characters matched by "\w" are
    determined by the locale. Perl considers that locale
    definitions are untrustworthy because they contain data from
    outside the program. If you are writing a locale-aware
    program, and want to launder data with a regular
    expression containing "\w", put "no locale" ahead of the
    expression in the same block. See "SECURITY" in perllocale for
    further discussion and examples.

    The second paragraph makes it clear that this is the issue. It is really
    not a big deal and on the outer fringes of my perl knowledge as a newbie
    and an amateur. I just wanted to make my point that what I read in perlre
    meant what I thought it meant. At least I am finally reading my perldocs
    before posting!

    wana
     
    wana, Nov 9, 2004
    #11
  12. Anno Siegel

    Anno Siegel Guest

    Alan J. Flavell <> wrote in comp.lang.perl.misc:
    > On Tue, 9 Nov 2004, Ben Morrow wrote:


    > > The locale is externally-derived data. A malicious user could (under
    > > some OSen at least) construct their own locale that said ';' was a word
    > > character.

    >
    > Good call. I withdraw the comment.
    >
    > > I would hope (but I haven't tested) that if 'use locale' is in effect
    > > and the locale setting was tainted then such regexen won't untaint...

    >
    > Let's hope so.


    I don't find it in the documentation, but a test shows that a regex
    that is itself tainted (i.e. interpolates a tainted string) doesn't
    launder tainted data.

    Anno
     
    Anno Siegel, Nov 10, 2004
    #12
