Need syntax/small footprint help

R

robic0

Hello, last month I wrote a rather large 10,000 line program that
parses xml (expat). The data is read into compound structures
then analyzed later. Below is one of the subroutines. I would like
to make it smaller so that its easier to read and understand and
also make the regular expressions less complicated for performance.
Any ideas?

---------------------------
sub eCHK_RegDel ## -- checks Redundant Regkey Delete // Regkey
exists --
{
my ($details) = @_;
return 1 if (@{$details} == 0);
my @regdel = ();

for (@$details) {
my $aref = $_->[0];
push (@regdel, $aref) if ($aref->[5] eq $REGISTRY && $aref->[6] eq
$DELETE);
}
if (@regdel)
{
## check that KeyName exists
for (@regdel) {
my $aref = $_;
next if (exists ($aref->[4]{$KEYNAME}) &&
length($aref->[4]{$KEYNAME}) > 0);
# log error
## 0701 -- test 07,01
ReportItem ('E', $CLOGINFO, $CERRLOG, '0701', glicn($aref), (0,0),
'', $XML_File, 1,0,1);
}
## check redundant regkey deletes
my $first_kname = '';
my $first_vname_exists = 0;
my $aref_first = undef;

my @regx_esc_codes = ( "\\", '/', '(', ')', '[', ']', '?', '|',
'+', '.', '*', '$', '^', '{',
'}', '@' );
for (@regdel)
{
my $aref = $_;
next if (!(exists ($aref->[4]{$KEYNAME}) &&
length($aref->[4]{$KEYNAME}) > 0));
my $kname = $aref->[4]{$KEYNAME};
my $vname = '';
my $kv = 'key (branch)';
if (exists ($aref->[4]{$VALUENAME})) {
if (length($aref->[4]{$VALUENAME}) > 0) {
$vname = $aref->[4]{$VALUENAME};
# strip leading spaces
while ($vname =~ s/^ //) {}
}
$kv = 'value';
}
for (@regx_esc_codes)
{
my $tc = $_;
my $xx = "\$kname =~ s/\\$tc/\\\\\\$tc/g;"; # code template for
regex
eval $xx;
#print "$xx\n";
}
$kname = $kname."\\\\";
my $fnd = 0;
# -- #
my $ctmpl = "if (\$kname =~ /^$first_kname/i) {\$fnd = 1;}"; #
code template for first key name
eval $ctmpl;
# -- #
#print
"-------------\n$fnd,$ctmpl\nkname=$kname\n--------------\n\n";
#print
"-------------\n$fnd,$ctmpl\nkname=$kname\nfirst=$first_kname\n--------------\n\n";
if ($@) {
## Check the $ctmpl, get the control code, log this error as a code
issue.
## This shouldn't happen ... the compiler will show the escape
char, add
## the char to "@regx_esc_codes", now its fixed!
$@ =~ s/^[\x20\n\t]+//; $@ =~ s/[\x20\n\t]+$//;
## 0702 -- test 07,02
ReportItem ('E', $CLOGINFO, $CERRLOG, '0702', (0,0,0,0), $@,
$XML_File, 1,0,1);
}
if ((!$fnd || length ($first_kname) == 0))
{
# because of sort,the first non valuename
# will always be in the new keyname action

$first_kname = uc($kname);
for (@regx_esc_codes)
{
my $tc = $_;
my $xx = "\$first_kname =~ s/\\$tc/\\\\\\$tc/g;"; # code
template for regex
eval $xx;
}
$aref_first = $aref;
if (length ($vname) > 0) {$first_vname_exists = 1} else {
$first_vname_exists = 0}
next;
}
next if ($first_vname_exists);
if (length($vname) == 0) {
if ($fnd) {
# log error, this sub-key is already being deleted
## 0703 -- test 07,03
ReportItem ('E', $CLOGINFO, $CERRLOG, '0703', glicn($aref),
glicn($aref_first), '', $XML_File, 1,0,1);
}
next;
}
# log error, the key for this value is already being deleted
## 0704 -- test 07,04
ReportItem ('E', $CLOGINFO, $CERRLOG, '0704', glicn($aref),
glicn($aref_first), '', $XML_File, 1,0,1);
}
}
}
-----------------------------
 
B

Brian McCauley

Hello, last month I wrote a rather large 10,000 line program that
parses xml (expat). The data is read into compound structures
then analyzed later. Below is one of the subroutines. I would like
to make it smaller so that its easier to read and understand and
also make the regular expressions less complicated for performance.
Any ideas?

---------------------------
sub eCHK_RegDel ## -- checks Redundant Regkey Delete // Regkey
exists --
{
my ($details) = @_;
return 1 if (@{$details} == 0);

return 1 unless @$details;

But actually this is, I think, redundant. There is no point since if
@$details is empty so will @regdel so nothing will happen anyhow.

Why are you returning 1? You don't seem to be bothering about the
returning anything in other situations so I infer that nothing is
considering the return value of eCHK_RegDel.
my @regdel = ();
for (@$details) {
my $aref = $_->[0];
push (@regdel, $aref) if ($aref->[5] eq $REGISTRY && $aref->[6] eq
$DELETE);
}

my @regdel =
grep { $_->[5] eq $REGISTRY && $_->[6] eq $DELETE )
map { $_->[0] } @$details;

if (@regdel)
{

Again this is redundant. "If the next line would do nothing then don't
do it". But if the next line would do nothing there's no cost in doing
it anyhow.
## check that KeyName exists
for (@regdel) {
my $aref = $_;

You don't need the value in $_ so simply:

for my $aref (@regdel) {
next if (exists ($aref->[4]{$KEYNAME}) &&
length($aref->[4]{$KEYNAME}) > 0);

You mean defined() not exists().

length() can never be negative so >0 is redundant.

There could be a case for simply using no warnings 'uninitialized' here.
# log error
## 0701 -- test 07,01
ReportItem ('E', $CLOGINFO, $CERRLOG, '0701', glicn($aref), (0,0),
'', $XML_File, 1,0,1);
}
## check redundant regkey deletes
my $first_kname = '';
my $first_vname_exists = 0;
my $aref_first = undef;

my @regx_esc_codes = ( "\\", '/', '(', ')', '[', ']', '?', '|',
'+', '.', '*', '$', '^', '{',
'}', '@' );

This is invarient and should be taken outside the loop or maybe even
outside the subroutine. But you don't need it at all probably.
for (@regdel)
{
my $aref = $_;
next if (!(exists ($aref->[4]{$KEYNAME}) &&
length($aref->[4]{$KEYNAME}) > 0));

Why the second loop? Why not a do it in a single pass?
my $kname = $aref->[4]{$KEYNAME};
my $vname = '';
my $kv = 'key (branch)';
if (exists ($aref->[4]{$VALUENAME})) {
if (length($aref->[4]{$VALUENAME}) > 0) {
$vname = $aref->[4]{$VALUENAME};
# strip leading spaces
while ($vname =~ s/^ //) {}

s/^ +//;
}
$kv = 'value';
}
for (@regx_esc_codes)
{
my $tc = $_;
my $xx = "\$kname =~ s/\\$tc/\\\\\\$tc/g;"; # code template for
regex
eval $xx;
#print "$xx\n";
}

Do not use eval unless there is a reason to do so. The above is more simply

for my $tc (@regx_esc_codes) {
$kname =~ s/(\Q$tc\E)/\\$1/g;
}

But I suspect you simply wanted

$kname = quotemeta $kname;

But I'm guessing you wouldn't need to do anything if you got rid of some
more of the pointless eval()s in your code.

All in all I think you are working way too. Hard. I'll stop now with
the line-by-line and go have another look at your code in its entirity.
Get back to you soon.
 
B

Brian McCauley

Brian said:
All in all I think you are working way too hard. I'll stop now with
the line-by-line and go have another look at your code in its entirity.
Get back to you soon.

Below find an, untested but much simpler version. I have not changed
the alorithm just removed some of the tortuous obfuscations.

I've subsequently noticed what I presume is a flaw in the algorithm (but
that's not Perl related so OT here).
Consider:

{ $KEYNAME => 'foo', $VALUENAME => 'bar' }
{ $KEYNAME => 'foo\blab' }
{ $KEYNAME => 'foo\blab', $VALUENAME => 'bar' }

I'm guessing the 3rd one above should be reported as redundant but won't be.

## -- checks Redundant Regkey Delete // Regkey exists --
sub eCHK_RegDel {
my ($details) = @_;
my ($first_kname,$first_vname_exists,$aref_first);
for (@$details) {
my $aref = $_->[0];
next unless $aref->[5] eq $REGISTRY && $aref->[6] eq $DELETE;
my $kname = $aref->[4]{$KEYNAME};

# Can input data really contain $kname='' ?
# if not then the check of length($kname) is redundant.

unless ( defined($kname) && length($kname) ) {
# log error
## 0701 -- test 07,01
ReportItem ('E', $CLOGINFO, $CERRLOG, '0701', glicn($aref), (0,0),
'', $XML_File, 1,0,1);
next;
}
my $vname = $aref->[4]{$VALUENAME};

# redunant surely - does input really contain $vname
# with leading spaces or empty strings?
if ( defined $vname ) {
$vname =~ s/^ +//;
undef $vname unless length $vname;
}

$kname .= "\\\\";
unless ( $first_kname && $kname =~ /^\Q$first_kname/i ) {
# because of sort,the first non valuename
# will always be in the new keyname action

$first_kname = $kname;
$aref_first = $aref;
$first_vname_exists = defined($vname);
next;
}
next if $first_vname_exists;
ReportItem ('E', $CLOGINFO, $CERRLOG,
defined($vname) ? '0703' : '0704',
glicn($aref), glicn($aref_first), '',
$XML_File, 1,0,1);
}
}
 
R

robic0

Brian said:
All in all I think you are working way too hard. I'll stop now with
the line-by-line and go have another look at your code in its entirity.
Get back to you soon.

Below find an, untested but much simpler version. I have not changed
the alorithm just removed some of the tortuous obfuscations.

I've subsequently noticed what I presume is a flaw in the algorithm (but
that's not Perl related so OT here).
Consider:

{ $KEYNAME => 'foo', $VALUENAME => 'bar' }
{ $KEYNAME => 'foo\blab' }
{ $KEYNAME => 'foo\blab', $VALUENAME => 'bar' }

I'm guessing the 3rd one above should be reported as redundant but won't be.

## -- checks Redundant Regkey Delete // Regkey exists --
sub eCHK_RegDel {
my ($details) = @_;
my ($first_kname,$first_vname_exists,$aref_first);
for (@$details) {
my $aref = $_->[0];
next unless $aref->[5] eq $REGISTRY && $aref->[6] eq $DELETE;
my $kname = $aref->[4]{$KEYNAME};

# Can input data really contain $kname='' ?
# if not then the check of length($kname) is redundant.

unless ( defined($kname) && length($kname) ) {
# log error
## 0701 -- test 07,01
ReportItem ('E', $CLOGINFO, $CERRLOG, '0701', glicn($aref), (0,0),
'', $XML_File, 1,0,1);
next;
}
my $vname = $aref->[4]{$VALUENAME};

# redunant surely - does input really contain $vname
# with leading spaces or empty strings?
if ( defined $vname ) {
$vname =~ s/^ +//;
undef $vname unless length $vname;
}

$kname .= "\\\\";
unless ( $first_kname && $kname =~ /^\Q$first_kname/i ) {
# because of sort,the first non valuename
# will always be in the new keyname action

$first_kname = $kname;
$aref_first = $aref;
$first_vname_exists = defined($vname);
next;
}
next if $first_vname_exists;
ReportItem ('E', $CLOGINFO, $CERRLOG,
defined($vname) ? '0703' : '0704',
glicn($aref), glicn($aref_first), '',
$XML_File, 1,0,1);
}
}

Hey man, thanks. This is just a followup test msg using Agent, lost
the google crap, now I can see mo better.
Yeah, you got some interresting stuff, I am going to correlate it with
the code once I set up everything. Some stuff you overlooked,
but some good stuff I have to check out. This is just a test
of Agent so I just in "check" mode for that. Obvious of interrest
is the extreme back slashing regx wise... I'll explain that too later.
Anyway thanks for the look Brian... be back to ya !!!

-RFC
 
R

robic0

Hey man, thanks. This is just a followup test msg using Agent, lost
the google crap, now I can see mo better.
Yeah, you got some interresting stuff, I am going to correlate it with
the code once I set up everything. Some stuff you overlooked,
but some good stuff I have to check out. This is just a test
of Agent so I just in "check" mode for that. Obvious of interrest
is the extreme back slashing regx wise... I'll explain that too later.
Anyway thanks for the look Brian... be back to ya !!!

-RFC

I've got Forté Agent v 1.91, seems to be working good.
Most of the code in this program was written fast and diddn't
have the luxury of detailed scrutiny and because of its size
and scope had to be written without the use of hidden
shortcuts or anything hard to read. For that reason it was
kept strictly on a C lang construct basis anything deeper
would have slowed down the write.

The "\\\\\\\\" 's et should be examined, along with what
this function is trying to do. This was a tough one to
program to say the least. Escape sequences in
hard code are analogous in chemistry to being
"out of solution". You don't worry about "in solution"
unles its brought out to be operated on. There's a
non-apparent unusual twist in this technique that
could be usefull maybe. Anyway, I've found alot
of other tricks doing this project. I think I will post
some of them on this thread. They need help too.
-RFC
 
R

robic0

return 1 unless @$details; yes, the same

But actually this is, I think, redundant. There is no point since if
@$details is empty so will @regdel so nothing will happen anyhow.
return from @details empty check comes b4 @regdel population?
Why are you returning 1? You don't seem to be bothering about the
returning anything in other situations so I infer that nothing is
considering the return value of eCHK_RegDel.

right, I paid no attention to the return condition at the end of the
sub. where its called from pays no attention to return codes as of
yet.
my @regdel = ();
for (@$details) {
my $aref = $_->[0];
push (@regdel, $aref) if ($aref->[5] eq $REGISTRY && $aref->[6] eq
$DELETE);
}

my @regdel =
grep { $_->[5] eq $REGISTRY && $_->[6] eq $DELETE )
map { $_->[0] } @$details;

if (@regdel)
{

Again this is redundant. "If the next line would do nothing then don't
do it". But if the next line would do nothing there's no cost in doing
it anyhow.

It may look weird but I assure you its the same push return addr onto
the stack register without a goto.
also, its proper form that more code may end up below "if" statement,
large code dictates you leave the door open.
You don't need the value in $_ so simply:

for my $aref (@regdel) {
writing freehand code from top to bottom, it is a rarety you will know
what the final outcome will be. not alot of time to go back...
but sometimes
next if (exists ($aref->[4]{$KEYNAME}) &&
length($aref->[4]{$KEYNAME}) > 0);

You mean defined() not exists().

length() can never be negative so >0 is redundant.
Not true... $aref->[4]{$KEYNAME} is dynamic. it MUST exist
and if it does is sometimes zero length, which is a condition
that is unacceptable...
There could be a case for simply using no warnings 'uninitialized' here.
# log error
## 0701 -- test 07,01
ReportItem ('E', $CLOGINFO, $CERRLOG, '0701', glicn($aref), (0,0),
'', $XML_File, 1,0,1);
}
## check redundant regkey deletes
my $first_kname = '';
my $first_vname_exists = 0;
my $aref_first = undef;

my @regx_esc_codes = ( "\\", '/', '(', ')', '[', ']', '?', '|',
'+', '.', '*', '$', '^', '{',
'}', '@' );

This is invarient and should be taken outside the loop or maybe even
outside the subroutine. But you don't need it at all probably.

Not true... it is not inside a loop. global scope had no use for
it.... in general, scoping in Perl is just like C++, dump it as soon
as possible !!
for (@regdel)
{
my $aref = $_;
next if (!(exists ($aref->[4]{$KEYNAME}) &&
length($aref->[4]{$KEYNAME}) > 0));

Why the second loop? Why not a do it in a single pass?
for alot of reasons... one is the side affects of trying to do such
a complicated maneuver all at once... generally a good
precautionarry measure especially since the first does not
slow down the second at all, since the @regdel array of
structures has been pre-conditioned via a sort engine
that maximizes this kind of check ahead of time.
Pre-fetching like this occurs sometimes 6 levels deep
before analysis begins. Its a trade-off where clarity wins
vs. bug introduction, but the engines must be designed
for this type of thing ahead of time... which were.
my $kname = $aref->[4]{$KEYNAME};
my $vname = '';
my $kv = 'key (branch)';
if (exists ($aref->[4]{$VALUENAME})) {
if (length($aref->[4]{$VALUENAME}) > 0) {
$vname = $aref->[4]{$VALUENAME};
# strip leading spaces
while ($vname =~ s/^ //) {}

s/^ +//;
}
$kv = 'value';
}
for (@regx_esc_codes)
{
my $tc = $_;
my $xx = "\$kname =~ s/\\$tc/\\\\\\$tc/g;"; # code template for
regex
eval $xx;
#print "$xx\n";
}

Do not use eval unless there is a reason to do so. The above is more simply
#--#
for my $tc (@regx_esc_codes) {
$kname =~ s/(\Q$tc\E)/\\$1/g;
}
#--#
I don't know what this is... I guess I should check it.
If you can resolve ALL the "$tc" with the "\Q"
in regx then this should work... have you tried this method yourself?
Without eval on the @regx codes you can't esc an "odd" number
of '\' without muting the '$' in "$tc". I hope it works I will try it
tommorow. @regx is array of single chars, this won't work on
expressions as are the eval in the dynamic strings compared later.
Course thats why its done here right....
But I suspect you simply wanted

$kname = quotemeta $kname;

But I'm guessing you wouldn't need to do anything if you got rid of some
more of the pointless eval()s in your code.

All in all I think you are working way too. Hard. I'll stop now with
the line-by-line and go have another look at your code in its entirity.
Get back to you soon.

You know, I'm being paid far too much money for it too .... hehe!!!
 
R

robic0

for alot of reasons... one is the side affects of trying to do such
a complicated maneuver all at once... generally a good
precautionarry measure especially since the first does not
slow down the second at all, since the @regdel array of
i meant @details instead of @regdel
 
A

Anno Siegel

return from @details empty check comes b4 @regdel population?

Please use complete English words. Things like "b4" have been a childish
affectation since the last 300 baud modem was scrapped.

Anno
 
R

robic0

Since we're talking about deleting a key, the 3rd one would be
reported as the root key (2nd) is already being deleted.. check the
code.
BTW, the
while ($vname =~ s/^ //) {} was replace with:
$vname =~ s/^ +//;
and so the trailing spaces:
$vname =~ s/ +$//;
although its not used in this sub. i have a reduction sub
(layer) that conditions data and takes away redudendency for
"equivalence" purposes before a typical check. Its called
"sub reduce" its got like 8 reductions now... more or less
depending on how I handle url's and directories (mixed
and across os'). u understand what I mean given these
are human typed?
## -- checks Redundant Regkey Delete // Regkey exists --
sub eCHK_RegDel {
my ($details) = @_;
my ($first_kname,$first_vname_exists,$aref_first);
for (@$details) {
my $aref = $_->[0];
next unless $aref->[5] eq $REGISTRY && $aref->[6] eq $DELETE;
my $kname = $aref->[4]{$KEYNAME};

# Can input data really contain $kname='' ?
# if not then the check of length($kname) is redundant.

$kname is dynamic... again, if it exists, it MUST have value.
there is no ambiguity here... what good does existence without
value in an initialized world in the middle of life? prozac....
unless ( defined($kname) && length($kname) ) {
# log error
## 0701 -- test 07,01
ReportItem ('E', $CLOGINFO, $CERRLOG, '0701', glicn($aref), (0,0),
'', $XML_File, 1,0,1);
next;
}
my $vname = $aref->[4]{$VALUENAME};

# redunant surely - does input really contain $vname
# with leading spaces or empty strings?

In the real world $vname contains raw human input, yes..
Unfortunately, raw spaces have length. Is a space
acceptical name on a credit card? no... so we
emulate life programitally sometimes in code to factor
out human mistakes.
Hey man, thanks. This is just a followup test msg using Agent, lost
the google crap, now I can see mo better.
Yeah, you got some interresting stuff, I am going to correlate it with
the code once I set up everything. Some stuff you overlooked,
but some good stuff I have to check out. This is just a test
of Agent so I just in "check" mode for that. Obvious of interrest
is the extreme back slashing regx wise... I'll explain that too later.
Anyway thanks for the look Brian... be back to ya !!!

-RFC

OK thanks again Brian, keep up the good work.
That "\Q" thing has got me to find out at work tommorow what it is.
Thanks.
-rfc
 
B

Brian McCauley

yes, the same


return from @details empty check comes b4 @regdel population?

You think it is relevant why? It is always better to write code with
fewer cases handled as exceptional.
It may look weird but I assure you its the same push return addr onto
the stack register without a goto.

I have no idea what you just said.
also, its proper form that more code may end up below "if" statement,
large code dictates you leave the door open.

What you call "leaving the door open" is what makes your code large.
If someone is going to add more code that needs to be outside the loop
but conditional on @regdel being non-empty then the additional effort of
adding the if is trivial.

Remember: always write less code unless this would make your code less
readable.
next if (exists ($aref->[4]{$KEYNAME}) &&
length($aref->[4]{$KEYNAME}) > 0);

You mean defined() not exists().

length() can never be negative so >0 is redundant.

Not true...
Why?

$aref->[4]{$KEYNAME} is dynamic. it MUST exist
and if it does is sometimes zero length, which is a condition
that is unacceptable...

That supports my assertion. When disagreeing with someone it is
convetional to put forward reasons why they are wrong, not why they are
right.
my @regx_esc_codes = ( "\\", '/', '(', ')', '[', ']', '?', '|',
'+', '.', '*', '$', '^', '{',
'}', '@' );

This is invarient and should be taken outside the loop or maybe even
outside the subroutine. But you don't need it at all probably.


Not true... it is not inside a loop.

Sorry my mistake.
global scope had no use for
it.... in general, scoping in Perl is just like C++, dump it as soon
as possible !!

Yes in general, yes. There are however exceptions. Constants are one.

for (@regdel)
{
my $aref = $_;
next if (!(exists ($aref->[4]{$KEYNAME}) &&
length($aref->[4]{$KEYNAME}) > 0));

Why the second loop? Why not a do it in a single pass?

for alot of reasons... one is the side affects of trying to do such
a complicated maneuver all at once... generally a good
precautionarry measure especially since the first does not
slow down the second at all, since the @regdel array of
structures has been pre-conditioned via a sort engine
that maximizes this kind of check ahead of time.
Pre-fetching like this occurs sometimes 6 levels deep
before analysis begins. Its a trade-off where clarity wins
vs. bug introduction, but the engines must be designed
for this type of thing ahead of time... which were.

I have no idea what you just said. I think you are saying that if an
error is detected in thre first loop then the second loop is not
executed. That's simply not true.
I don't know what this is... I guess I should check it.
If you can resolve ALL the "$tc" with the "\Q"
in regx then this should work... have you tried this method yourself?

Yes I have used quotemeta (or \Q) in Perl a lot.
Without eval on the @regx codes you can't esc an "odd" number
of '\' without muting the '$' in "$tc". I hope it works I will try it
tommorow. @regx is array of single chars, this won't work on
expressions as are the eval in the dynamic strings compared later.

I have no idea what you just said there. You seem to be saying you
think eval() is somehow the solution to some problem you were having.
As far as I can see all the problems you mention are a _consequence_ of
your decision to use eval().

Most of your code seems devoted to checking if string $kname starts with
string $first_kname. In most languages there's an easy way to do this.
Surely the fact that you spent a couple of dozen lines on it should
have rung alarm bells.

In Perl this is simply:

$kname =~ /^\Q$first_kname/

....or...

!index($kname,$first_kname)

To make it case insensative you can simply put a /i on the refeg or
convert everything to lowercase.
Course thats why its done here right....

I have no idea what you are talking about. What are you suggesting you
have done right?
You know, I'm being paid far too much money for it too .... hehe!!!

So that's why you'll spend ten units of effort working around a bug you
could fix in one unit.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,017
Latest member
GreenAcreCBDGummiesReview

Latest Threads

Top