matching all perldoc names but no more

wana

I was getting carried away answering myself in another thread so I thought I
should purify my actual problem:

I am allowing a user to enter a perldoc name and I will run 'perldoc $name'
for them.

What regex will match all perldoc names but not allow a command to be
slipped into the name?

for example, here is my latest:

/^[a-zA-Z1-9\:]+$/

if you allowed just anything:

/.*/

a user could enter 'perlref | rm -r ./*' or something like that.

previous attempts:

/^[a-z]+$/

seemed perfect but left out perlfaq1-9

/^[a-z1-9]+$/

left out CGI and other ones with caps.

Is there a rule for all current and future perldoc names? I mean, they
can't possibly have a | or a > in their name or even a space in the middle,
right?

wana
 
Tad McClellan

wana said:
I am allowing a user to enter a perldoc name and I will run 'perldoc $name'
for them.

What regex will match all perldoc names but not allow for a command to be
slipped into the name.


You won't need to solve that problem if you choose an approach
that does not require solving that problem. :)

If they can only look up the std docs, then build a lookup table
of the actual installed std docs, see code below.

Or maybe process the =head2 POD tags in perltoc.pod for legal names.

I think this ought to work though: /^(\w|::)+$/

(leaving out single quote on purpose since it is deprecated.)


---------------------------------
#!/usr/bin/perl
use warnings;
use strict;

foreach my $pod ( 'foo bar', qw/ perlnope perl perltoc perlfunc / ) {
    if ( is_pod($pod) )
        { print "$pod is a POD\n" }
    else
        { print "$pod is *not* a POD\n" }
}


BEGIN {
    my %pods;

    # directory that holds the standard .pod files
    chomp( my $dir = qx/ perldoc -l perlfunc / );
    $dir =~ s#/[^/]+$##; # should use File::Basename here...

    opendir POD, $dir or die "could not open '$dir' directory $!";
    # keep only *.pod entries, stored without the extension
    $pods{ $_ } = 1 for map { s/\.pod$// ? $_ : () } readdir POD;
    closedir POD;

    sub is_pod { exists $pods{ $_[0] } ? 1 : 0 }
}
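
If a pattern is used instead of (or alongside) the lookup table, a minimal sketch might look like the following. It assumes that every acceptable name is a run of identifiers joined by "::", which is an assumption and not a documented rule for perldoc names. The sub name looks_like_pod_name is made up for illustration. The digit range is 0-9 because names such as perl5004delta contain a zero, and \z is used because $ would also accept a trailing newline.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Accept names such as perlfaq4, CGI, or File::Basename.
# The character class is spelled out, so it does not depend on the
# locale or on Unicode rules, and \z (unlike $) rejects a trailing newline.
sub looks_like_pod_name {
    my ($name) = @_;
    return 0 unless defined $name;
    return $name =~ /\A[A-Za-z0-9_]+(?:::[A-Za-z0-9_]+)*\z/ ? 1 : 0;
}

for my $name ( 'perlfaq4', 'File::Basename', "perlfunc\n", 'perlref | rm -r ./*' ) {
    ( my $shown = $name ) =~ s/\n/\\n/g;    # make the newline visible when printing
    printf "%-22s %s\n", $shown, looks_like_pod_name($name) ? 'ok' : 'rejected';
}
```

A name that passes this check is only plausible, not known to exist, so the lookup table above remains the stricter test.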
 
A. Sinan Unur

wana said:
I was getting carried away answering myself in another thread so I
thought I should purify my actual problem:

I am allowing a user to enter a perldoc name and I will run 'perldoc
$name' for them.

I think you are going down the wrong road. You know exactly the list of
phrases you want to allow. Why don't you just restrict the options to that?
Even if you do not have Perl on your computer, it is not hard to write a
script to parse the output of perldoc perltoc. That will give you the list
of allowable phrases. Now, you can make sure the phrase sent to your CGI
matches only one of those in the set of allowable perldoc arguments.
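
A rough sketch of that idea, assuming the table of contents marks each document with a line of the form "=head2 NAME - description". That layout is an assumption, so check it against the perltoc.pod on your system. The sub name names_from_toc and the sample text are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect the first word of every "=head2 NAME - description" line.
# The heading layout is an assumption about perltoc.pod; check your copy.
sub names_from_toc {
    my ($pod_text) = @_;
    my %allowed;
    for my $line ( split /\n/, $pod_text ) {
        next unless $line =~ /\A=head2\s+([A-Za-z0-9_]+(?:::[A-Za-z0-9_]+)*)\s+-\s/;
        $allowed{$1} = 1;
    }
    return \%allowed;
}

# Made-up sample in the assumed layout, standing in for the real file.
my $sample = join "\n",
    '=head1 BASIC DOCUMENTATION',
    '=head2 perl - The Perl 5 language interpreter',
    '=head2 perlfunc - Perl builtin functions',
    '=head2 File::Basename - Parse file paths',
    '=head2 Some free-form heading';

my $allowed = names_from_toc($sample);
for my $request ( 'perlfunc', 'File::Basename', 'perlref | rm -r ./*' ) {
    print "$request: ", ( $allowed->{$request} ? 'allowed' : 'refused' ), "\n";
}
```

Because the request is only ever used as a hash key, nothing the user types reaches the shell unless it is already in the list.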

Sinan
 
wana

Tad said:
You won't need to solve that problem if you choose an approach
that does not require solving that problem. :)

If they can only look up the std docs, then build a lookup table
of the actual installed std docs, see code below.

Or maybe process the =head2 POD tags in perltoc.pod for legal names.

I think this ought to work though: /^(\w|::)+$/

I only avoided \w because perlre states that it is not portable across
character sets and may be insecure, which is critical in my case. That may
or may not be an issue in my program.

wana
 
wana

Jim said:
[ problem of untainting perldoc subjects snipped ]
I only avoided \w because perlre states that it is not portable across
character sets and may be insecure, which is critical in my case. That
may or may not be an issue in my program.

Where in perldoc perlre does it say that? It does not say it in the
version (5.8.5) on my computer. I could not find the string 'insecure'
anywhere in 'perldoc perlre', and 'portable' only occurs once in a
discussion of character ranges.

The words to look for are 'unsafe' and 'unportable' about 78% into perlre.
The discussion about character ranges is what I am talking about.
[a-zA-Z1-9] is safe but \w may vary in different locales.

wana
 
Alan J. Flavell

Jim said:
That depends on what you mean by "insecure".
The words to look for are 'unsafe' and 'unportable' about 78% into perlre.

I don't read that as being about "security" (in the usual meaning of
that term)...
The discussion about character ranges is what I am talking about.
[a-zA-Z1-9] is safe

It'll reliably do a specific job. I'd suggest that the use of the
word "unsafe" in the documentation is a bit misleading. I think in
this specific reference it means "might not do what the naive reader
expects"; but "unsafe" often refers to the possibility of malicious
data causing security-relevant damage to result (such as, for example,
unintended interpolation taking place using externally-derived data),
and that's not what is intended here, AFAICS.
but \w may vary in different locales.

Which, in some situations, might be exactly what one wants.

all the best
 
Ben Morrow

Quoth Alan J. Flavell:
It'll reliably do a specific job. I'd suggest that the use of the
word "unsafe" in the documentation is a bit misleading. I think in
this specific reference it means "might not do what the naive reader
expects"; but "unsafe" often refers to the possibility of malicious
data causing security-relevant damage to result (such as, for example,
unintended interpolation taking place using externally-derived data),
and that's not what is intended here, AFAICS.

The locale is externally-derived data. A malicious user could (under
some OSen at least) construct their own locale that said ';' was a word
character.

I would hope (but I haven't tested) that if 'use locale' is in effect
and the locale setting was tainted then such regexen won't untaint...
One can always secure things by explicitly asking for the C locale, or
simply not using 'locale', which will cause \w to match what you expect.
Which, in some situations, might be exactly what one wants.

Of course, but not when dealing with shell metachars.
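
One way to pin \w down, sketched under the assumption of Perl 5.14 or later: the /a modifier restricts \w to ASCII word characters, and "no locale" (the remedy perlsec suggests) switches off any "use locale" inherited from an enclosing scope. The sub name launder_name is made up for illustration, and the pattern is the one Tad proposed with \A and \z as anchors.

```perl
#!/usr/bin/perl
use 5.014;
use strict;
use warnings;

# Return the validated name, or undef if it is not acceptable.
sub launder_name {
    my ($input) = @_;
    no locale;    # ignore any "use locale" from an enclosing scope
    # /a (Perl 5.14 and later) restricts \w to [A-Za-z0-9_].
    return $input =~ /\A((?:\w|::)+)\z/a ? $1 : undef;
}

for my $input ( 'CGI::Carp', 'perlref;ls' ) {
    my $clean = launder_name($input);
    print "$input => ", ( defined $clean ? $clean : 'rejected' ), "\n";
}
```

On older perls the same effect comes from spelling the class out as [A-Za-z0-9_] instead of relying on \w.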

Ben
 
wana

Jim said:
wana said:
Jim said:
Tad McClellan wrote:



[ problem of untainting perldoc subjects snipped ]


I think this ought to work though:    ^(\w|::)+$

I only avoided \w because perlre states that it is not portable across
character sets and may be insecure, which is critical in my case.
That may or may not be an issue in my program.

Where in perldoc perlre does it say that? It does not say it in the
version (5.8.5) on my computer. I could not find the string 'insecure'
anywhere in 'perldoc perlre', and 'portable' only occurs once in a
discussion of character ranges.

The words to look for are 'unsafe' and 'unportable' about 78% into
perlre. The discussion about character ranges is what I am talking about.
[a-zA-Z1-9] is safe but \w may vary in different locales.

The warning is about defining your own character ranges, such as [ -~]
for the ascii printable set. That may give an error in other character
sets. The doc says nothing about character classes such as \w being
unsafe or unportable across character sets. In fact, it implies that
using \w is safer than defining your own character sets.

Here it is from perlre:

"Note also that the whole range idea is rather unportable between
character sets--and even within character sets they may cause results you
probably didn't expect.  A sound principle is to use only ranges that
begin from and end at either alphabets of equal case ([a-e], [A-E]),  or
digits ([0-9]).  Anything else is unsafe.  If in doubt, spell out the
character sets in full."

 for example:

$comm = $ARGV[0];
if ($comm =~ /^\w+$/) # without locale or Unicode rules, the same as ^[a-zA-Z0-9_]+$
{
        `echo $comm`
}

this prevents a user from slipping in dangerous characters like | or >
etc...

Suppose a new character set comes along and is described by a different
locale.  Then suppose this code is cut&paste or included otherwise within
the new locale which has a character in its alphabet that the shell
interprets as | for example.  Now there is a security compromise, hence it
is insecure and unsafe.  I don't know if this is possible, but that's what
I read into the statement in perlre.  If this is possible, it is clearly a
potential, though unlikely, security risk.  I believe perlsec touches
briefly on the same subject.
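
Independently of the pattern, the shell can be left out altogether. The sketch below uses the list form of a piped open, which hands each argument to the program unchanged, so | and > are never interpreted. The sub name capture_no_shell is made up for illustration, and the running perl ($^X) stands in for perldoc so that the example does not depend on perldoc being installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run a command without a shell and return its output.
# List-form pipe open passes each argument to the program as-is,
# so characters such as | ; > are never interpreted.
sub capture_no_shell {
    my @cmd = @_;
    open( my $fh, '-|', @cmd ) or die "cannot run '$cmd[0]': $!";
    local $/;                      # slurp mode
    my $out = <$fh>;
    close $fh or die "'$cmd[0]' failed: exit status $?";
    return defined $out ? $out : '';
}

# Stand-in for perldoc: the child perl just prints its first argument.
my $hostile = 'perlref | rm -r ./*';
my $out = capture_no_shell( $^X, '-e', 'print $ARGV[0]', $hostile );
print "child saw: $out\n";
```

The hostile string arrives in the child as a single, harmless argument. Validating the name is still worthwhile, since the program being run may give its own meaning to unusual arguments.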

wana
 
Alan J. Flavell

The locale is externally-derived data. A malicious user could (under
some OSen at least) construct their own locale that said ';' was a word
character.

Good call. I withdraw the comment.
I would hope (but I haven't tested) that if 'use locale' is in effect
and the locale setting was tainted then such regexen won't untaint...

Let's hope so.
Of course, but not when dealing with shell metachars.

I take it you were commenting here on the specific problem, rather
than on the cited documentation as such.

cheers
 
Alan J. Flavell

Alan J. Flavell wrote: [snip]
Good call. I withdraw the comment.
[snip]

Thanks to all for further discussion. I still think that the security issue
with tainted data is at least partly the intent of this paragraph in
perlre.

Just so. That's why I accepted that my comment had been misguided.
I mentioned that the \w topic is also discussed in perlsec:
[...]

The second paragraph makes it clear that this is the issue. It is
really not a big deal and on the outer fringes of my perl knowledge
as a newbie and an amateur. I just wanted to make my point that
what I read in perlre meant what I thought it meant. At least I am
finally reading my perldocs before posting!

Absolutely. My apologies that I missed this point the first time
around. It'll remind me to check the documentation properly myself
instead of just skim-reading it.

Umble pie for tea today...

cheers
 
wana

Alan said:
Good call. I withdraw the comment.


Let's hope so.


I take it you were commenting here on the specific problem, rather
than on the cited documentation as such.

cheers

Thanks to all for further discussion. I still think that the security issue
with tainted data is at least partly the intent of this paragraph in
perlre. I mentioned that the \w topic is also discussed in perlsec:

This is fairly secure because "/\w+/" doesn’t normally
match shell metacharacters, nor are dot, dash, or at going
to mean something special to the shell. Use of "/.+/"
would have been insecure in theory because it lets everything
through, but Perl doesn’t check for that. The lesson
is that when untainting, you must be exceedingly careful
with your patterns. Laundering data using regular
expression is the only mechanism for untainting dirty
data, unless you use the strategy detailed below to fork a
child of lesser privilege.

The example does not untaint $data if "use locale" is in
effect, because the characters matched by "\w" are determined
by the locale. Perl considers that locale definitions
are untrustworthy because they contain data from
outside the program. If you are writing a locale-aware
program, and want to launder data with a regular expression
containing "\w", put "no locale" ahead of the expression
in the same block. See "SECURITY" in perllocale for
further discussion and examples.

The second paragraph makes it clear that this is the issue. It is really
not a big deal and on the outer fringes of my perl knowledge as a newbie
and an amateur. I just wanted to make my point that what I read in perlre
meant what I thought it meant. At least I am finally reading my perldocs
before posting!

wana
 
Anno Siegel

Alan J. Flavell said:
Good call. I withdraw the comment.


Let's hope so.

I don't find it in the documentation, but a test shows that a regex
that is itself tainted (i.e. interpolates a tainted string) doesn't
launder tainted data.

Anno
 
