RegExp to validate an MVS dataset name

anothermindbomb

Hello,

Brand new to RegExp's this afternoon, so forgive the mess I'm about to
present...

I've been trying to write a regular expression to validate an MVS
dataset name - a file, essentially, on an MVS mainframe. The naming
standards for datasets run like this (straight from the IBM manual,
I'm afraid)...

A data set name can be one name segment, or a series of joined name
segments. Each name segment represents a level of qualification. For
example, the data set name DEPT58.SMITH.DATA3 is composed of three name
segments. The first name on the left is called the high-level
qualifier, the last is the low-level qualifier.

Each name segment (qualifier) is 1 to 8 characters, the first of which
must be alphabetic (A to Z) or national (# @ $). The remaining 7
characters are either alphabetic, numeric (0-9), national, or a hyphen
(-).

The period (.) separates name segments from each other. Including all
name segments and periods, the length of the data set name must not
exceed 44 characters. Thus, a maximum of 22 name segments can make up a
data set name.

So, SYS1.ISPPLIB is perfectly ok, 6TEST.FILE is bad (due to the leading
'6'), PROD.DATASET.D230206 is fine and TEST.QUALIFIED.DATASET is bad
(QUALIFIED > 8 characters)

So far, I've come up with ^([A-Z#@\$]{1}[\w#@\$\-]{1,7})

^([A-Z#@\$]{1}      # deal with the first character being alpha or national
[\w#@\$\-]{1,7})    # followed by up to 7 characters for the rest of the
                    # first qualifier.

I now need to possibly match a period and then the rules for the first
qualifier all over again (hence the brackets in my existing regexp).
How do I do it?

If I code ^([A-Z#@\$]{1}[\w#@\$\-]{1,7})\. to match the period how can
I refer back to the rule I've just written to say "and there may be some
more of this lot coming up"... I've been playing with "The RegExp
Coach" (http://weitz.de/files/regex-coach.exe) to experiment and test
my musings but when I attempt to code a \0 to refer to the capturing
group I've just made, it doesn't seem to do what I expect. I'm assuming
that's because what I'm expecting is utterly incorrect!

BTW, I'm not concerned about checking the length of the string I'm
matching being less than 44 characters (I don't know if that's even
possible in a regexp!) - I can do that outside of this bit of code.


I have a feeling that what I'm attempting to do isn't particularly
hard, but I'm struggling to make further progress after matching the
first part of my string... a prod in the right direction, rather than a
solution would be most welcome - I'd like to learn how to work this out
for myself rather than have someone present me with the answer.

Cheers,

Steve
 
A. Sinan Unur

(e-mail address removed) wrote in @e56g2000cwe.googlegroups.com:

> A data set name can be one name segment, or a series of joined name
> segments. Each name segment represents a level of qualification. For
> example, the data set name DEPT58.SMITH.DATA3 is composed of three
> name segments. The first name on the left is called the high-level
> qualifier, the last is the low-level qualifier.
>
> Each name segment (qualifier) is 1 to 8 characters, the first of which
> must be alphabetic (A to Z) or national (# @ $). The remaining 7
> characters are either alphabetic, numeric (0-9), national, or a hyphen
> (-).
>
> The period (.) separates name segments from each other.
> ....
> So, SYS1.ISPPLIB is perfectly ok, 6TEST.FILE is bad (due to the
> leading '6'), PROD.DATASET.D230206 is fine and TEST.QUALIFIED.DATASET
> is bad (QUALIFIED > 8 characters)

I would split the name into individual segments, and check if each
segment is valid:

#!/usr/bin/perl

use strict;
use warnings;

use constant MAX_DATASET_NAME_LENGTH => 44;

DATASET_NAME: while ( my $name = <DATA> ) {
    chomp $name;

    # Skip empty names and names longer than 44 characters.
    next DATASET_NAME unless length $name
        and length($name) <= MAX_DATASET_NAME_LENGTH;

    # The -1 limit keeps trailing empty fields, so "SYS1." is rejected.
    my @segments = split m{ \. }x, $name, -1;
    my $valid_segments = 0;

    SEGMENT: for my $segment ( @segments ) {
        if ( $segment =~ m{ \A [A-Z][A-Z0-9]{0,7} \z }x ) {
            ++$valid_segments;
        } else {
            last SEGMENT;
        }
    }

    if ( $valid_segments == @segments ) {
        print "VALID: $name\n";
    }
}

__DATA__
SYS1.ISPPLIB
6TEST.FILE
TEST.QUALIFIED.DATASET
PROD.DATASET.D230206
 
Eric Schwartz

> Brand new to RegExp's this afternoon, so forgive the mess I'm about to
> present...

You've done better than many already by providing a useful description
of your problem and some sample code.

> I've been trying to write a regular expression to validate an MVS
> dataset name - a file, essentially, on an MVS mainframe. The naming
> standards for datasets runs like this (straight from the IBM manual,
> I'm afraid)...

Taking a page from the Perl Best Practises course I took from Damian
Conway a while back, I'm going to build up your regex as a series of
smaller ones. This is helpful because it's often easy to describe
little parts, and how those parts come together to form big pieces,
but often you can get lost in trying to view the trees for the forest
if you'll excuse the horrible metaphor.

I'm also going to re-order your paragraphs a bit to allow for
graduated understanding.

> Each name segment (qualifier) is 1 to 8 characters, the first of which
> must be alphabetic (A to Z) or national (# @ $). The remaining 7
> characters are either alphabetic, numeric (0-9), national, or a hyphen
> (-).

From that description, a segment looks like:

my $SEGMENT = qr/ [A-Z#@\$]         # matches all legal first chars
                                    # (the dollar sign is escaped so Perl
                                    # does not read it as a variable)

                  (?:[-A-Z\d#@\$])  # use non-capturing parens (?:) to
                                    # match valid characters for
                                    # non-first chars.

                  {0,7}             # Allow from 0 to 7 of non-first chars.

                /x;

> A data set name can be one name segment, or a series of joined name
> segments. Each name segment represents a level of qualification. For
> example, the data set name DEPT58.SMITH.DATA3 is composed of three name
> segments. The first name on the left is called the high-level
> qualifier, the last is the low-level qualifier.
>
> The period (.) separates name segments from each other.


my $DATA_SET_NAME = qr/^ $SEGMENT

                         # one segment at the beginning of the
                         # string is required.

                       (?: \. $SEGMENT)*

                         # optionally, follow that with a literal '.'
                         # and another segment, any number of times,
                         # including 0.

                       $
                         # also, anchor the end. This is important,
                         # because without it 'A.ABCDEFGHIJ' would be
                         # accepted: the pattern could stop after
                         # 'A.ABCDEFGH'. If you anchor the end, though,
                         # the whole string has to be valid segments.
                     /x;
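
Putting the two patterns together, a minimal runnable sketch might look like the following. The sub name is_valid_dataset_name is hypothetical (it does not come from the posts above), \A and \z are used in place of ^ and $, and the dollar signs in the character classes are escaped so that Perl does not try to interpolate them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One qualifier: a letter or national character, then up to seven
# letters, digits, national characters or hyphens.
my $SEGMENT = qr/ [A-Z#@\$] [-A-Z0-9#@\$]{0,7} /x;

# One or more qualifiers joined by periods, anchored at both ends.
my $DATA_SET_NAME = qr/ \A $SEGMENT (?: \. $SEGMENT )* \z /x;

# Hypothetical helper name, used only for this illustration.
sub is_valid_dataset_name {
    my ($name) = @_;
    return 0 if length($name) > 44;    # overall length limit
    return $name =~ $DATA_SET_NAME ? 1 : 0;
}

for my $name (qw(SYS1.ISPPLIB 6TEST.FILE TEST.QUALIFIED.DATASET PROD.DATASET.D230206)) {
    printf "%-24s %s\n", $name,
        is_valid_dataset_name($name) ? 'valid' : 'invalid';
}
```

With the four sample names from the original question, 6TEST.FILE and TEST.QUALIFIED.DATASET should be reported as invalid and the other two as valid.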

> Including all name segments and periods, the length of the data set
> name must not exceed 44 characters. Thus, a maximum of 22 name
> segments can make up a data set name.

This is going to be more easily treated by a simple test, I think--
I'd just do

die "Data Set name is too long!" if length($name) > 44;

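if it were me. You could also replace the * above with {0,21} (the
first segment is matched separately, so 21 repeats allow 22 segments in
total), but that won't catch a name that is too long overall, such as
22 segments that are each 8 characters long.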
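
To see why a bound on the number of segments is no substitute for the length test, here is a small sketch. The name it builds is made up for illustration, and the sub names are hypothetical.

```perl
use strict;
use warnings;

my $SEGMENT = qr/ [A-Z#@\$] [-A-Z0-9#@\$]{0,7} /x;

# At most 22 segments: one leading segment plus up to 21 more.
my $AT_MOST_22 = qr/ \A $SEGMENT (?: \. $SEGMENT ){0,21} \z /x;

# A made-up name with 22 segments of 8 characters each.
my $long_name = join '.', ('ABCDEFGH') x 22;

# Hypothetical helpers, one per check.
sub passes_segment_count {
    my ($name) = @_;
    return $name =~ $AT_MOST_22 ? 1 : 0;
}

sub passes_length {
    my ($name) = @_;
    return length($name) <= 44 ? 1 : 0;
}

printf "length of the name:    %d\n", length $long_name;
printf "segment-count pattern: %s\n", passes_segment_count($long_name) ? 'accepts' : 'rejects';
printf "length test:           %s\n", passes_length($long_name)        ? 'accepts' : 'rejects';
```

The name is 22 * 8 characters plus 21 periods long, so the segment-count pattern accepts it while the length test rejects it.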
> BTW, I'm not concerned about checking the length of the string I'm
> matching being less than 44 characters (I don't know if that's even
> possible in a regexp!)

It is, it just doesn't read as well:

/^.{0,44}$/ vs 'length($name) <= 44';
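
If both checks really had to live in a single Perl pattern, a lookahead is one way to combine them. This is only a sketch: the sub name is hypothetical, and lookaheads are a feature that not every regex dialect supports.

```perl
use strict;
use warnings;

my $SEGMENT = qr/ [A-Z#@\$] [-A-Z0-9#@\$]{0,7} /x;

# The lookahead (?= .{1,44} \z ) consumes nothing; it only succeeds if
# the whole string is 1 to 44 characters long. The rest of the pattern
# then checks the segment structure as before.
my $NAME_WITH_LENGTH =
    qr/ \A (?= .{1,44} \z ) $SEGMENT (?: \. $SEGMENT )* \z /x;

# Hypothetical helper name, used only for this illustration.
sub is_valid_name {
    my ($name) = @_;
    return $name =~ $NAME_WITH_LENGTH ? 1 : 0;
}

my $exactly_44 = join '.', ('ABCDEFGH') x 5;    # 5 * 8 + 4 periods
my $too_long   = $exactly_44 . '.A';            # 46 characters

printf "%d chars: %s\n", length $exactly_44, is_valid_name($exactly_44) ? 'valid' : 'invalid';
printf "%d chars: %s\n", length $too_long,   is_valid_name($too_long)   ? 'valid' : 'invalid';
```

The separate length() test is still easier to read, which is the point made above.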
> I have a feeling that what I'm attempting to do isn't particularly
> hard, but I'm struggling to make further progress after matching the
> first part of my string... a prod in the right direction, rather than a
> solution would be most welcome - I'd like to learn how to work this out
> for myself rather than have someone present me with the answer.

Well, I gave you *an* answer, sorry about that, but your problem
interested me enough. Hope it helps you figure out how to subdivide
your future problems appropriately, though.

-=Eric
 
Brian Helterline

A. Sinan Unur said:
> (e-mail address removed) wrote in @e56g2000cwe.googlegroups.com:
>
>> Each name segment (qualifier) is 1 to 8 characters, the first of which
>> must be alphabetic (A to Z) or national (# @ $). The remaining 7
>> characters are either alphabetic, numeric (0-9), national, or a hyphen
>> (-).
>>
>> The period (.) separates name segments from each other.
>
> ...
>
> if ( $segment =~ m{ \A [A-Z][A-Z0-9]{0,7} \z }x ) {
>     ++$valid_segments;

You need to include the "nationals" and a hyphen (with the dollar sign
escaped so that Perl does not interpolate it):

if ( $segment =~ m{ \A [A-Z#@\$][A-Z0-9#@\$-]{0,7} \z }x ) {
    ++$valid_segments;
 
anothermindbomb

Thank you, Eric - I'm off to read up on non-capturing groups (?:X),
which I know nothing about but which seem to crop up quite a bit in the
responses to my initial question.

I will still need to do some work on any of the responses I get here -
I completely neglected to mention that the regexp is actually going to
live inside an XML schema, so Perl isn't actually involved! When I was
trying to find a suitable newsgroup to pose my question in, I was drawn
to the c.l.p.m group as regexps are so prevalent in Perl that most
(all!) Perl monks will be completely au fait with them!

Thinking about it, it has worked out exactly as I wanted - strong
pushes down a path, without presenting me with a canned answer which
I'd be tempted to use without fully understanding.
 
anothermindbomb

> Thinking about it, it has worked out exactly as I wanted - strong
> pushes down a path, without presenting me with a canned answer which
> I'd be tempted to use without fully understanding.

I've finally settled on:
^[a-zA-Z#$@][a-zA-Z0-9#$@-]{0,7}([.][a-zA-Z#$@][a-zA-Z0-9#$@-]{0,7}){0,21}
and I check the min and max length of the string I'm matching against
inside my XML schema.

Can anyone see any obvious errors with this? Everything I've tested it
with seems to pass or fail as expected...

Thanks to all who responded.
 
Samwyse

> I've finally settled on:
> ^[a-zA-Z#$@][a-zA-Z0-9#$@-]{0,7}([.][a-zA-Z#$@][a-zA-Z0-9#$@-]{0,7}){0,21}
> and I check the min and max length of the string I'm matching against
> inside my XML schema.
>
> Can anyone see any obvious errors with this? Everything I've tested it
> with seems to pass or fail as expected...

I only see one obvious error, a missing '$' to anchor the end of the regex.
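
A quick way to see the effect of the missing anchor in Perl is sketched below. The pattern is the one quoted above, with $ and @ escaped so that Perl does not interpolate them; the variable names and the third test string are made up for this example.

```perl
use strict;
use warnings;

# The pattern as posted (no end anchor), and the same pattern with $ added.
my $unanchored = qr/^[a-zA-Z#\$\@][a-zA-Z0-9#\$\@-]{0,7}([.][a-zA-Z#\$\@][a-zA-Z0-9#\$\@-]{0,7}){0,21}/;
my $anchored   = qr/^[a-zA-Z#\$\@][a-zA-Z0-9#\$\@-]{0,7}([.][a-zA-Z#\$\@][a-zA-Z0-9#\$\@-]{0,7}){0,21}$/;

for my $name ('SYS1.ISPPLIB', 'TEST.QUALIFIED.DATASET', 'SYS1.!!!') {
    printf "%-24s unanchored: %-8s anchored: %s\n", $name,
        ( $name =~ $unanchored ? 'match' : 'no match' ),
        ( $name =~ $anchored   ? 'match' : 'no match' );
}
```

Without the end anchor, the pattern is satisfied by a valid prefix (TEST.QUALIFIE, or just SYS1), so the two bad names still match; with the anchor they do not.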
 
