RegExp to validate an MVS dataset name

anothermindbomb · Feb 23, 2006

Hello,

Brand new to RegExps this afternoon, so forgive the mess I'm about to
present...

I've been trying to write a regular expression to validate an MVS
dataset name - a file, essentially, on an MVS mainframe. The naming
standards for datasets run like this (straight from the IBM manual,
I'm afraid)...

A data set name can be one name segment, or a series of joined name
segments. Each name segment represents a level of qualification. For
example, the data set name DEPT58.SMITH.DATA3 is composed of three name
segments. The first name on the left is called the high-level
qualifier, the last is the low-level qualifier.

Each name segment (qualifier) is 1 to 8 characters, the first of which
must be alphabetic (A to Z) or national (# @ $). The remaining 7
characters are either alphabetic, numeric (0-9), national, or a hyphen
(-).

The period (.) separates name segments from each other. Including all
name segments and periods, the length of the data set name must not
exceed 44 characters. Thus, a maximum of 22 name segments can make up a
data set name.
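The "maximum of 22" follows from the shortest possible segments: n one-character segments need n - 1 separating periods, so the shortest n-segment name is 2n - 1 characters long, and 2n - 1 <= 44 gives n <= 22. A quick Python check of that arithmetic (the helper name is illustrative only):

```python
MAX_NAME_LENGTH = 44  # total length limit from the IBM rules quoted above


def shortest_name_length(segments: int) -> int:
    """Shortest possible name with this many segments: one char each, plus the periods between them."""
    return segments + (segments - 1)


# Largest segment count whose shortest possible name still fits in 44 characters.
max_segments = max(n for n in range(1, MAX_NAME_LENGTH + 1)
                   if shortest_name_length(n) <= MAX_NAME_LENGTH)
print(max_segments)  # 22
```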

So, SYS1.ISPPLIB is perfectly ok, 6TEST.FILE is bad (due to the leading
'6'), PROD.DATASET.D230206 is fine and TEST.QUALIFIED.DATASET is bad
(QUALIFIED > 8 characters)

So far, I've come up with ^([A-Z#@\$]{1}[\w#@\$\-]{1,7})

^([A-Z#@\$]{1}     # deal with the first character being alpha or national
[\w#@\$\-]{1,7})   # followed by up to 7 characters for the rest of the first qualifier.
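Two details in that first attempt are worth checking before building on it. First, {1,7} demands at least one character after the first, so a one-letter qualifier such as A is rejected even though the rules allow 1 to 8 characters. Second, \w also admits lowercase letters and the underscore, which the rules do not list. The minimal Python sketch below compares the pattern with one possible tightened version. It drops the ^ and the capturing parens because re.fullmatch already requires the whole string to match:

```python
import re

# The first-qualifier pattern as posted (re.fullmatch supplies the anchoring).
posted = re.compile(r"[A-Z#@\$]{1}[\w#@\$\-]{1,7}")
# One possible tightened version: 0 to 7 trailing characters from an explicit set.
tightened = re.compile(r"[A-Z#@$][A-Z0-9#@$-]{0,7}")

for qualifier in ["A", "SYS1", "SYS_1", "Sys1"]:
    print(qualifier, bool(posted.fullmatch(qualifier)), bool(tightened.fullmatch(qualifier)))
# A False True
# SYS1 True True
# SYS_1 True False
# Sys1 True False
```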

I now need to possibly match a period and then the rules for the first
qualifier all over again (hence the brackets in my existing regexp).
How do I do it?

If I code ^([A-Z#@\$]{1}[\w#@\$\-]{1,7})\. to match the period how can
I refer back to the rule I've just written to say "and there may be some
more of this lot coming up"... I've been playing with "The Regex
Coach" (http://weitz.de/files/regex-coach.exe) to experiment and test
my musings but when I attempt to code a \0 to refer to the capturing
group I've just made, it doesn't seem to do what I expect. I'm assuming
that's because what I'm expecting is utterly incorrect!
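For reference, a backreference such as \1 matches the exact text that group 1 captured. It does not match another string that merely follows the same rules. To repeat the rules, you repeat the pattern itself under a quantifier. A small Python illustration follows, with a deliberately simplified, hypothetical qualifier pattern [A-Z]+ standing in for the real one:

```python
import re

# Hypothetical, simplified qualifier pattern, used only for this illustration.
SEGMENT = r"[A-Z]+"

# Backreference: each \1 must repeat exactly the text that group 1 captured.
backref = re.compile("(" + SEGMENT + r")(?:\.\1)*")
# Repetition: the same *pattern* is applied afresh to every following qualifier.
repeated = re.compile(SEGMENT + r"(?:\." + SEGMENT + ")*")

print(bool(backref.fullmatch("ABC.ABC")))   # True: the same text twice
print(bool(backref.fullmatch("ABC.DEF")))   # False: DEF is not a copy of ABC
print(bool(repeated.fullmatch("ABC.DEF")))  # True

# \0 is not a group reference at all: Python reads it as an octal escape (NUL).
print(bool(re.fullmatch(r"\0", "\x00")))    # True
```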

BTW, I'm not concerned about checking the length of the string I'm
matching being less than 44 characters (I don't know if that's even
possible in a regexp!) - I can do that outside of this bit of code.

I have a feeling that what I'm attempting to do isn't particularly
hard, but I'm struggling to make further progress after matching the
first part of my string... a prod in the right direction, rather than a
solution would be most welcome - I'd like to learn how to work this out
for myself rather than have someone present me with the answer.

Cheers,

Steve.

A. Sinan Unur · Feb 23, 2006

(e-mail address removed) wrote in @e56g2000cwe.googlegroups.com:

A data set name can be one name segment, or a series of joined name
segments. Each name segment represents a level of qualification. For
example, the data set name DEPT58.SMITH.DATA3 is composed of three
name segments. The first name on the left is called the high-level
qualifier, the last is the low-level qualifier.

Each name segment (qualifier) is 1 to 8 characters, the first of which
must be alphabetic (A to Z) or national (# @ $). The remaining 7
characters are either alphabetic, numeric (0-9), national, or a hyphen
(-).

The period (.) separates name segments from each other.
....
So, SYS1.ISPPLIB is perfectly ok, 6TEST.FILE is bad (due to the
leading '6'), PROD.DATASET.D230206 is fine and TEST.QUALIFIED.DATASET
is bad (QUALIFIED > 8 characters)

I would split the name into individual segments, and check if each
segment is valid:

#!/usr/bin/perl

use strict;
use warnings;

use constant MAX_DATASET_NAME_LENGTH => 44;

DATASET_NAME: while ( my $name = <DATA> ) {
    chomp $name;
    next DATASET_NAME unless length $name
        and length $name < MAX_DATASET_NAME_LENGTH;

    my @segments = split m{ \. }x, $name;
    my $valid_segments = 0;

    SEGMENT: for my $segment ( @segments) {
        if ( $segment =~ m{ \A [A-Z][A-Z0-9]{0,7} \z }x ) {
            ++$valid_segments;
        } else {
            last SEGMENT;
        }
    }

    if ( $valid_segments == @segments ) {
        print "VALID: $name\n";
    }
}

__DATA__
SYS1.ISPPLIB
6TEST.FILE
TEST.QUALIFIED.DATASET
PROD.DATASET.D230206
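For comparison, here is a rough Python port of the split-and-check approach above. It folds in the two corrections posted further down this thread: <= instead of < for the length test, and the national characters and hyphen in the segment class. The function name is_valid_dataset_name is a hypothetical choice, not something from the original post.

```python
import re

MAX_DATASET_NAME_LENGTH = 44
# First char: A-Z or national (# @ $); then 0 to 7 of A-Z, 0-9, national or hyphen.
SEGMENT_RE = re.compile(r"[A-Z#@$][A-Z0-9#@$-]{0,7}")


def is_valid_dataset_name(name: str) -> bool:
    """Split on periods and require every segment to be a valid qualifier."""
    if not 0 < len(name) <= MAX_DATASET_NAME_LENGTH:
        return False
    # str.split keeps empty fields for leading, trailing or doubled periods,
    # and an empty string never matches SEGMENT_RE, so such names are rejected.
    return all(SEGMENT_RE.fullmatch(segment) for segment in name.split("."))


for name in ["SYS1.ISPPLIB", "6TEST.FILE", "TEST.QUALIFIED.DATASET", "PROD.DATASET.D230206"]:
    if is_valid_dataset_name(name):
        print("VALID:", name)
```

There is one behavioural difference. Perl's split discards trailing empty fields, so the Perl version as posted would accept a name with a trailing period, such as "SYS1." (which splits to the single field SYS1). Python's str.split keeps the empty field, so this port rejects it.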

A. Sinan Unur · Feb 23, 2006

    next DATASET_NAME unless length $name
        and length $name < MAX_DATASET_NAME_LENGTH;

Should be (a name of exactly 44 characters is legal):

    next DATASET_NAME unless length $name
        and length $name <= MAX_DATASET_NAME_LENGTH;

Sinan

Eric Schwartz · Feb 23, 2006

Brand new to RegExps this afternoon, so forgive the mess I'm about to
present...

You've done better than many already by providing a useful description
of your problem and some sample code.

I've been trying to write a regular expression to validate an MVS
dataset name - a file, essentially, on an MVS mainframe. The naming
standards for datasets run like this (straight from the IBM manual,
I'm afraid)...

Taking a page from the Perl Best Practices course I took from Damian
Conway a while back, I'm going to build up your regex as a series of
smaller ones. This is helpful because it's often easy to describe
little parts, and how those parts come together to form big pieces,
but often you can get lost in trying to view the trees for the forest
if you'll excuse the horrible metaphor.

I'm also going to re-order your paragraphs a bit to allow for
graduated understanding.

Each name segment (qualifier) is 1 to 8 characters, the first of which
must be alphabetic (A to Z) or national (# @ $). The remaining 7
characters are either alphabetic, numeric (0-9), national, or a hyphen
(-).

From that description, a segment looks like:

my $SEGMENT = qr/ [A-Z#@$]          # matches all legal first chars
                  (?:[-A-Z\d#@$])   # use non-grouping parens (?: ) to
                                    # match valid characters for
                                    # non-first chars.
                  {0,7}             # Allow from 0 to 7 of non-first chars.
                /x;
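Python's re.VERBOSE flag plays the same role as Perl's /x, so the same build-it-in-small-commented-pieces style works if the pattern needs testing outside Perl. The sketch below shows an equivalent segment pattern. It uses 0-9 instead of \d because, in Python 3, \d in a str pattern also matches non-ASCII digits:

```python
import re

# Under re.VERBOSE, whitespace is ignored and "#" starts a comment,
# except inside a character class, where "#" is an ordinary character.
SEGMENT = r"""
    [A-Z#@$]           # all legal first chars: A-Z or national
    [-A-Z0-9#@$]{0,7}  # 0 to 7 of: hyphen, A-Z, 0-9, national
"""
segment_re = re.compile(SEGMENT, re.VERBOSE)

for candidate in ["SYS1", "#", "A-B$@#12", "QUALIFIED", "6TEST"]:
    print(candidate, bool(segment_re.fullmatch(candidate)))
# SYS1 True
# # True
# A-B$@#12 True
# QUALIFIED False
# 6TEST False
```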

A data set name can be one name segment, or a series of joined name
segments. Each name segment represents a level of qualification. For
example, the data set name DEPT58.SMITH.DATA3 is composed of three name
segments. The first name on the left is called the high-level
qualifier, the last is the low-level qualifier.

The period (.) separates name segments from each other.

my $DATA_SET_NAME = qr/^ $SEGMENT          # one segment at the beginning of the
                                           # string is required.
                       (?: \. $SEGMENT)*   # optionally, follow that with a literal '.'
                                           # and any number of segments, including 0.
                       $                   # also, anchor the end. This is important,
                                           # because the segment pattern will match
                                           # the '.ABCDEFGH' part of '.ABCDEFGHIJ'. If
                                           # you anchor the end, though, it can't
                                           # match there.
                       # (No sigils in these comments: Perl interpolates
                       # variables even inside /x comments.)
                     /x;
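The composition step carries over the same way, using string concatenation. The Python sketch below writes the segment rules compactly. It also shows what the trailing anchor buys: without it, a name whose last qualifier is too long still matches on a prefix.

```python
import re

# The segment rules from $SEGMENT above, written compactly.
SEGMENT = r"[A-Z#@$][-A-Z0-9#@$]{0,7}"

# One segment at the start, then any number of ".segment" groups, anchored at the end.
anchored = re.compile("^" + SEGMENT + r"(?:\." + SEGMENT + ")*$")
# The same pattern without the trailing $ anchor.
unanchored = re.compile("^" + SEGMENT + r"(?:\." + SEGMENT + ")*")

name = "TEST.ABCDEFGHIJ"  # last qualifier has 10 characters, so the name is invalid
print(unanchored.match(name).group())  # TEST.ABCDEFGH (only a prefix was checked)
print(anchored.match(name))            # None
```

As in Perl, Python's $ also matches just before a trailing newline. If input may carry one, re.fullmatch (or \Z at the end) is stricter.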

Including all name segments and periods, the length of the data set
name must not exceed 44 characters. Thus, a maximum of 22 name
segments can make up a data set name.

This is going to be more easily treated by a simple test, I think--
I'd just do

die "Data Set name is too long!" if length($name) > 44;

if it were me. You could also replace the * above with {0,21} (the
first segment plus at most 21 more makes 22), but that still won't
tell you if you have 22 segments that are each 8 characters long.
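A quick Python illustration shows why the repetition limit cannot replace the length test. Cap the repeated group at {0,21} (the first segment plus at most 21 more, i.e. 22 in total), and 22 full-length segments still pass. Yet that name is 22 * 8 + 21 = 197 characters long.

```python
import re

SEGMENT = r"[A-Z#@$][-A-Z0-9#@$]{0,7}"
# At most 22 segments in total: the first one plus up to 21 more.
limited = re.compile(SEGMENT + r"(?:\." + SEGMENT + r"){0,21}")

name = ".".join(["ABCDEFGH"] * 22)    # 22 segments of 8 characters each
print(len(name))                      # 197
print(bool(limited.fullmatch(name)))  # True, yet far over the 44-character limit
```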

BTW, I'm not concerned about checking the length of the string I'm
matching being less than 44 characters (I don't know if that's even
possible in a regexp!)

It is, it just doesn't read as well:

/^.{0,44}$/ vs 'length($name) <= 44';
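If a single regex really is wanted, a lookahead can carry the length test while the rest of the pattern checks the segments. The Python sketch below shows that combination. Note that lookaheads are a Perl/Python feature and, as far as I know, are not part of the XML Schema regex dialect.

```python
import re

SEGMENT = r"[A-Z#@$][-A-Z0-9#@$]{0,7}"
# (?=.{1,44}\Z) checks the overall length without consuming any characters;
# the rest of the pattern then checks the segments. In Python, \Z matches only
# at the very end of the string (unlike $, which also matches before a final newline).
DATASET_NAME = re.compile(r"\A(?=.{1,44}\Z)" + SEGMENT + r"(?:\." + SEGMENT + r")*\Z")

exactly_44 = "AAAAAAAA.BBBBBBBB.CCCCCCCC.DDDDDDDD.EEEEEEEE"
print(len(exactly_44), bool(DATASET_NAME.match(exactly_44)))  # 44 True
print(bool(DATASET_NAME.match(exactly_44 + ".F")))             # False (46 characters)
```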

I have a feeling that what I'm attempting to do isn't particularly
hard, but I'm struggling to make further progress after matching the
first part of my string... a prod in the right direction, rather than a
solution would be most welcome - I'd like to learn how to work this out
for myself rather than have someone present me with the answer.

Well, I gave you *an* answer, sorry about that, but your problem
interested me enough. Hope it helps you figure out how to subdivide
your future problems appropriately, though.

-=Eric

Brian Helterline · Feb 23, 2006

A. Sinan Unur said:
(e-mail address removed) wrote in @e56g2000cwe.googlegroups.com:

Each name segment (qualifier) is 1 to 8 characters, the first of which
must be alphabetic (A to Z) or national (# @ $). The remaining 7
characters are either alphabetic, numeric (0-9), national, or a hyphen
(-).

The period (.) separates name segments from each other.


...

if ( $segment =~ m{ \A [A-Z][A-Z0-9]{0,7} \z }x ) {
++$valid_segments;

You need to include the "nationals" and the hyphen:
if ( $segment =~ m{ \A [A-Z#@$][A-Z0-9#@$-]{0,7} \z }x ) {
++$valid_segments;
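Here is a quick before-and-after check of that change, with the two segment patterns above translated to Python. The sample qualifiers are made up for illustration:

```python
import re

original = re.compile(r"[A-Z][A-Z0-9]{0,7}")             # as first posted
corrected = re.compile(r"[A-Z#@$][A-Z0-9#@$-]{0,7}")     # nationals and hyphen added

for segment in ["SYS1", "$SYSTEM", "MY-DATA", "@X#1"]:
    print(segment, bool(original.fullmatch(segment)), bool(corrected.fullmatch(segment)))
# SYS1 True True
# $SYSTEM False True
# MY-DATA False True
# @X#1 False True
```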

anothermindbomb · Feb 24, 2006

Thank you, Eric - I'm off to read up on non-capturing groups (?:X)
which I know nothing about but seem to crop up quite a bit in the
responses to my initial question.
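For anyone else reading up on them: (?:X) groups X just like (X), so a quantifier can apply to all of it. Unlike (X), it captures nothing and creates no numbered group. A minimal Python illustration:

```python
import re

capturing = re.fullmatch(r"([A-Z]+)(\.[A-Z]+)*", "DEPT.SMITH.DATA")
non_capturing = re.fullmatch(r"([A-Z]+)(?:\.[A-Z]+)*", "DEPT.SMITH.DATA")

# A repeated capturing group only keeps the text of its last repetition.
print(capturing.groups())      # ('DEPT', '.DATA')
print(non_capturing.groups())  # ('DEPT',)
```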

I will still need to do some work on any of the responses I get here -
I completely neglected to mention that the regexp is actually going to
live inside an XML schema, so perl isn't actually involved! When I was
trying to find a suitable newsgroup to pose my question in, I was drawn
to the c.l.p.m group as regexps are so prevalent in perl that most
(all!) perl monks will be completely au fait with them!

Thinking about it, it has worked out exactly as I wanted - strong
pushes down a path, without presenting me with a canned answer which
I'd be tempted to use without fully understanding.

anothermindbomb · Feb 24, 2006

I've finally settled on:
^[a-zA-Z#$@][a-zA-Z0-9#$@-]{0,7}([.][a-zA-Z#$@][a-zA-Z0-9#$@-]{0,7}){0,21}
and I check the min and max length of the string I'm matching against
inside my XML schema.

Can anyone see any obvious errors with this - everything I've tested it
with seems to pass as expected and fail as expected...
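One thing is worth checking, given that the pattern lives in an XML schema. In XML Schema pattern facets, the regex is implicitly anchored at both ends. As I read the XML Schema Part 2 regex appendix, ^ and $ are ordinary characters there rather than anchors, so a leading ^ would demand a literal caret. Behaviour may vary between schema processors, so treat this as something to verify. Python's re.fullmatch gives the same whole-string semantics, so this sketch tests the pattern with the ^ dropped:

```python
import re

# The pattern from the post above, minus the leading ^ (in XSD it would be a literal caret).
XSD_PATTERN = r"[a-zA-Z#$@][a-zA-Z0-9#$@-]{0,7}([.][a-zA-Z#$@][a-zA-Z0-9#$@-]{0,7}){0,21}"
pattern = re.compile(XSD_PATTERN)

cases = {
    "SYS1.ISPPLIB": True,
    "6TEST.FILE": False,
    "PROD.DATASET.D230206": True,
    "TEST.QUALIFIED.DATASET": False,
}
for name, expected in cases.items():
    # fullmatch requires the whole string to match, like an xs:pattern facet.
    assert bool(pattern.fullmatch(name)) == expected, name
print("all cases behave as expected")
```

The pattern also accepts lowercase letters, which the IBM rules quoted at the top do not list. That may be deliberate if names are upper-cased before validation.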

Thanks to all who responded.

Samwyse · Feb 26, 2006

I've finally settled on:
^[a-zA-Z#$@][a-zA-Z0-9#$@-]{0,7}([.][a-zA-Z#$@][a-zA-Z0-9#$@-]{0,7}){0,21}
and I check the min and max length of the string I'm matching against
inside my XML schema.

Can anyone see any obvious errors with this - everything I've tested it
with seems to pass as expected and fail as expected...

I only see one obvious error, a missing '$' to anchor the end of the regex.
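The Python sketch below, using the pattern from the post above, shows this effect in a Perl-style engine. A pattern anchored only at the start is satisfied by a valid prefix:

```python
import re

BODY = r"[a-zA-Z#$@][a-zA-Z0-9#$@-]{0,7}([.][a-zA-Z#$@][a-zA-Z0-9#$@-]{0,7}){0,21}"
start_only = re.compile("^" + BODY)       # as posted
both_ends = re.compile("^" + BODY + "$")  # with the end anchor added

name = "TEST.QUALIFIED.DATASET"
print(start_only.match(name).group())  # TEST.QUALIFIE: a prefix matched, so the name "passes"
print(both_ends.match(name))           # None
```

Inside an XML Schema pattern facet, the implicit anchoring should already cover this. The $ matters when the same pattern is reused with Perl-style matching.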
