Hrs of work on regex: please help

Robert

After this message text is a pasted xml file I've been working
(wrestling) with.
The goal is to remove text from the file that begins with:
"<ns0:ErrorDetails>" and ends with "</ns0:ErrorDetails>".
I have done several other s/// type operations to this file to remove
other text parts, and it was no problem. I've heard the 'devil is in
the details' and I believe it now, hehe.

I have copy 'n pasted the text surrounding the target before and
after, and made a string of it in a simple Perl script. I had to use
single quotes, due to the numerous double quotes in the text. I used
the same s/// operation and it printed as I want! Wonderful, I
thought, now to do it on the file contents. But, it just will not do a
replace. It is getting beyond the point where I can think on this
problem without my brain feeling a spinning motion. I humbly submit my
problem for discussion.

My code follows:
#!/usr/bin/perl
my $results_dir = $ARGV[0];
my $expected_results_dir = "$results_dir/expectedresults";
my $cleaned_results_dir = "$results_dir/cleanedresults";
my $cleaned_expected_results_dir =
    "$results_dir/expectedresults/cleanedexpectedresults";
my $cleaned_xml = "";
my $clean_file = "";
my $Line = "";
opendir(BIN, $results_dir) or die "Can't open directory $results_dir: $!";
FILE_CLEAN: while( defined ($file = readdir BIN) )
{
    next FILE_CLEAN if $file =~ /^\.\.?$/; # skip . and ..
    next FILE_CLEAN if (-d "$results_dir/$file"); # skip if it is a directory
    open(To_Clean, "$results_dir/$file")
        or die "Can't open $results_dir/$file: $!\n";
    my @data = <To_Clean>; #read file contents
    close(To_Clean); #close file
    $clean_file = "$cleaned_results_dir/$file";
    for (my $i = 0; $i < scalar(@data); ++$i) {
        $Line = $data[$i];
        #strip the trailing newline and replace tabs with spaces
        chomp $Line;
        $Line =~ tr/\t/ /;
        $Line =~ s/\t//g;
        $Line =~ s/\<ns0:ErrorDetails\>.*?\<\/ns0:ErrorDetails\>//g;
        $cleaned_xml = $cleaned_xml . $Line;
        $Line = "";
    };#END FOR
    open(CLEANFILE, ">$clean_file") or die "Can't open $clean_file: $!\n";
    print CLEANFILE $cleaned_xml;
    close(CLEANFILE);
    $cleaned_xml = "";
};#END WHILE
print "...DONE\n";
closedir(BIN);
################################################################################

<?xml version="1.0" encoding="UTF-8"?>
<ns0:BOBEntitlementRoot xmlns:ns0="http://www.noco.com/BOBEntitlement"
version="NA"><ns0:ApplicationArea><ns0:CreationDateTime>2004-07-26T14:07:02.248-07:00</ns0:CreationDateTime><ns0:SourceSystem>HANDSHAKE</ns0:SourceSystem><ns0:Operation><ns0:Name>UnknownOperation</ns0:Name><ns0:Version>NA</ns0:Version></ns0:Operation></ns0:ApplicationArea><ns0:DataArea><ns0:Status><ns0:StatusCode>Failure</ns0:StatusCode><ns0:Error><ns0:ErrorCode>2101</ns0:ErrorCode><ns0:ErrorSeverity>Error</ns0:ErrorSeverity><ns0:ErrorCategory>InputFormatError</ns0:ErrorCategory><ns0:ErrorDescription>Invalid
XML request. </ns0:ErrorDescription><ns0:ErrorDetails>Job-4296 Error
in [Processes/Integration_Interfaces/getEntitlement/getBHAPIJMSRequest_1.process/Group
(1)/Group/Parse XML]
Output data invalid
at com.tibco.pe.core.TaskImpl.a(TaskImpl.java:501)
at com.tibco.pe.core.TaskImpl.eval(TaskImpl.java:428)
at com.tibco.pe.core.Job.a(Job.java:591)
at com.tibco.pe.core.Job.if(Job.java:443)
at com.tibco.pe.core.JobDispatcher$a.a(JobDispatcher.java:270)
at com.tibco.pe.core.JobDispatcher$a.run(JobDispatcher.java:218)
caused by: org.xml.sax.SAXException: validation error: unexpected
content "{http://www.noco.com/BOBEntitlement}Sku"; expected
"{http://www.noco.com/BOBEntitlement}Name" or
"{http://www.noco.com/BOBEntitlement}Description" or
"{http://www.noco.com/BOBEntitlement}DomainType" or
"{http://www.noco.com/BOBEntitlement}PropertyTypeStatus" or
"{http://www.noco.com/BOBEntitlement}ChangeDate" or
"{http://www.noco.com/BOBEntitlement}DefaultValue" or
"{http://www.noco.com/BOBEntitlement}UsageType"
({com.tibco.xml.validation}COMPLEX_E_UNEXPECTED_CONTENT) at
/BOBEntitlementRoot[1]/DataArea[1]/BOBEntitlement[1]/OfferingProperty[1]/OfferingPropertyType[1]/Sku[1]
java.lang.Exception: unexpected content
"{http://www.noco.com/BOBEntitlement}Sku"; expected
"{http://www.noco.com/BOBEntitlement}Name" or
"{http://www.noco.com/BOBEntitlement}Description" or
"{http://www.noco.com/BOBEntitlement}DomainType" or
"{http://www.noco.com/BOBEntitlement}PropertyTypeStatus" or
"{http://www.noco.com/BOBEntitlement}ChangeDate" or
"{http://www.noco.com/BOBEntitlement}DefaultValue" or
"{http://www.noco.com/BOBEntitlement}UsageType"
at com.tibco.xml.validation.helpers.d.a(XmlContentValidatorElementContext.java:348)
at com.tibco.xml.validation.helpers.h.if(XmlContentValidator.java:753)
at com.tibco.xml.validation.helpers.h.text(XmlContentValidator.java:1601)
at com.tibco.xml.datamodel.nodes.Text.content(Text.java:327)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Document.content(Document.java:226)
at com.tibco.xml.datamodel.nodes.Document.serialize(Document.java:242)
at com.tibco.xml.xdata.bind.BindingRunner.validate(BindingRunner.java:302)
at com.tibco.xml.xdata.bind.OutputBindingRunner.validate(OutputBindingRunner.java:47)
at com.tibco.pe.core.TaskImpl.a(TaskImpl.java:489)
at com.tibco.pe.core.TaskImpl.eval(TaskImpl.java:428)
at com.tibco.pe.core.Job.a(Job.java:591)
at com.tibco.pe.core.Job.if(Job.java:443)
at com.tibco.pe.core.JobDispatcher$a.a(JobDispatcher.java:270)
at com.tibco.pe.core.JobDispatcher$a.run(JobDispatcher.java:218)
validation error: no declaration for element
"{http://www.noco.com/BOBEntitlement}Sku"
({com.tibco.xml.validation}COMPLEX_E_MISSING_ELEMENT_DECLARATION) at
/BOBEntitlementRoot[1]/DataArea[1]/BOBEntitlement[1]/OfferingProperty[1]/OfferingPropertyType[1]/Sku[1]
java.lang.Exception: no declaration for element
"{http://www.noco.com/BOBEntitlement}Sku"
at com.tibco.xml.validation.helpers.d.if(XmlContentValidatorElementContext.java:615)
at com.tibco.xml.validation.helpers.d.a(XmlContentValidatorElementContext.java:180)
at com.tibco.xml.validation.helpers.h.if(XmlContentValidator.java:818)
at com.tibco.xml.validation.helpers.h.text(XmlContentValidator.java:1601)
at com.tibco.xml.datamodel.nodes.Text.content(Text.java:327)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Document.content(Document.java:226)
at com.tibco.xml.datamodel.nodes.Document.serialize(Document.java:242)
at com.tibco.xml.xdata.bind.BindingRunner.validate(BindingRunner.java:302)
at com.tibco.xml.xdata.bind.OutputBindingRunner.validate(OutputBindingRunner.java:47)
at com.tibco.pe.core.TaskImpl.a(TaskImpl.java:489)
at com.tibco.pe.core.TaskImpl.eval(TaskImpl.java:428)
at com.tibco.pe.core.Job.a(Job.java:591)
at com.tibco.pe.core.Job.if(Job.java:443)
at com.tibco.pe.core.JobDispatcher$a.a(JobDispatcher.java:270)
at com.tibco.pe.core.JobDispatcher$a.run(JobDispatcher.java:218)
validation error: unexpected end of content
({com.tibco.xml.validation}COMPLEX_E_UNEXPECTED_END_OF_CONTENT) at
/BOBEntitlementRoot[1]/DataArea[1]/BOBEntitlement[1]/OfferingProperty[1]/OfferingPropertyType[1]
java.lang.Exception: unexpected end of content
at com.tibco.xml.validation.helpers.d.case(XmlContentValidatorElementContext.java:414)
at com.tibco.xml.validation.helpers.h.a(XmlContentValidator.java:1182)
at com.tibco.xml.validation.helpers.h.endElement(XmlContentValidator.java:1034)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1108)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Document.content(Document.java:226)
at com.tibco.xml.datamodel.nodes.Document.serialize(Document.java:242)
at com.tibco.xml.xdata.bind.BindingRunner.validate(BindingRunner.java:302)
at com.tibco.xml.xdata.bind.OutputBindingRunner.validate(OutputBindingRunner.java:47)
at com.tibco.pe.core.TaskImpl.a(TaskImpl.java:489)
at com.tibco.pe.core.TaskImpl.eval(TaskImpl.java:428)
at com.tibco.pe.core.Job.a(Job.java:591)
at com.tibco.pe.core.Job.if(Job.java:443)
at com.tibco.pe.core.JobDispatcher$a.a(JobDispatcher.java:270)
at com.tibco.pe.core.JobDispatcher$a.run(JobDispatcher.java:218)

at com.tibco.xml.xdata.bind.BindingRemarkHandler.assertNoErrors(BindingRemarkHandler.java:43)
at com.tibco.xml.xdata.bind.BindingRunner.validate(BindingRunner.java:319)
at com.tibco.xml.xdata.bind.OutputBindingRunner.validate(OutputBindingRunner.java:47)
at com.tibco.pe.core.TaskImpl.a(TaskImpl.java:489)
at com.tibco.pe.core.TaskImpl.eval(TaskImpl.java:428)
at com.tibco.pe.core.Job.a(Job.java:591)
at com.tibco.pe.core.Job.if(Job.java:443)
at com.tibco.pe.core.JobDispatcher$a.a(JobDispatcher.java:270)
at com.tibco.pe.core.JobDispatcher$a.run(JobDispatcher.java:218)
</ns0:ErrorDetails></ns0:Error></ns0:Status></ns0:DataArea></ns0:BOBEntitlementRoot>
 
Gunnar Hjalmarsson

Robert said:
The goal is to remove text from the file that begins with:
"<ns0:ErrorDetails>" and ends with "</ns0:ErrorDetails>".

Hmm.. Far too much code for my taste. ;-)

my @data = <To_Clean>; #read file contents

Here you slurp the file into an array, where each line is a separate
element.

for (my $i = 0; $i < scalar(@data); ++$i) {

Here you start various operations for each line.

$Line =~ s/\<ns0:ErrorDetails\>.*?\<\/ns0:ErrorDetails\>//g;

Since the start and end tags appear on different lines, that pattern
will never match.

Try slurping the file into a scalar variable instead, and add the /s
modifier to the s/// operator.
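
A minimal sketch of that suggestion, assuming each file is small enough to hold in memory. The helper names slurp_file and strip_error_details and the sample string are made up for illustration; the pattern is the one from the original script with /s added.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file into one scalar.
sub slurp_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!\n";
    local $/;                 # undefined record separator: read to end of file
    my $content = <$fh>;
    close($fh);
    return $content;
}

# Hypothetical helper: drop every <ns0:ErrorDetails>...</ns0:ErrorDetails>
# block from a string that holds the whole document.
sub strip_error_details {
    my ($xml) = @_;
    # /s lets "." match newlines, so one match can span many lines.
    $xml =~ s/<ns0:ErrorDetails>.*?<\/ns0:ErrorDetails>//gs;
    return $xml;
}

# Made-up sample with the start and end tags on different lines.
my $sample = "<ns0:Error><ns0:ErrorDetails>Job-4296 Error\n"
           . "at com.tibco.pe.core.Job.a(Job.java:591)\n"
           . "</ns0:ErrorDetails></ns0:Error>\n";

print strip_error_details($sample);   # prints <ns0:Error></ns0:Error>
```

In the script from the question, the per-line for loop would then shrink to something like strip_error_details(slurp_file("$results_dir/$file")).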
 
Robert

Thanks for the reply. Just to close the loop, what I ended up doing
was using the join function on the @data variable. I then used the
tr/// function to replace tabs and newlines with a space char. Now,
everything is set for the substitution, and the resulting files are
still able to be viewed as xml!
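
For readers following along, the join-and-tr approach might look roughly like the sketch below. The final code was not posted, so the helper name clean_lines and the sample lines are assumptions.

```perl
use strict;
use warnings;

# Stand-in for "my @data = <To_Clean>;": one element per line (made-up content).
my @data = (
    "<ns0:Error><ns0:ErrorDetails>Job-4296 Error\n",
    "\tat com.tibco.pe.core.Job.a(Job.java:591)\n",
    "</ns0:ErrorDetails></ns0:Error>\n",
);

# Hypothetical helper: join the lines, flatten whitespace, then substitute.
sub clean_lines {
    my @lines = @_;
    my $xml = join('', @lines);   # glue the lines back into one string
    $xml =~ tr/\t\n/ /;           # every tab and newline becomes a space
    # With no newlines left, "." can reach the end tag without /s.
    $xml =~ s/<ns0:ErrorDetails>.*?<\/ns0:ErrorDetails>//g;
    return $xml;
}

print clean_lines(@data), "\n";
```

Note that this also flattens the line breaks in the rest of the file, which is harmless for this XML but does change the layout.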

The main thing I have learned is when I spend more than an hour on a
problem, look at it from a different direction.

Thanks, again.
 
Gunnar Hjalmarsson

Jim said:
As Gunnar pointed out, you probably want to replace this with 'my
$data = <To_Clean>;'

That must be combined with enabling "slurp" mode:

local $/;
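
A small sketch of why that matters, using an in-memory filehandle (Perl 5.8 or later) and made-up content in place of the real file:

```perl
use strict;
use warnings;

my $text = "line one\nline two\nline three\n";   # made-up file content

# Without slurp mode, a scalar read returns only the first line.
open(my $fh, '<', \$text) or die "Can't open in-memory file: $!\n";
my $first = <$fh>;
close($fh);

# With $/ undefined inside a block, the same kind of read returns everything.
my $all;
{
    local $/;    # the old value comes back automatically at the closing brace
    open(my $fh2, '<', \$text) or die "Can't open in-memory file: $!\n";
    $all = <$fh2>;
    close($fh2);
}

print "first read: ", length($first), " characters\n";
print "slurped:    ", length($all), " characters\n";
print "record separator restored\n" if $/ eq "\n";
```

Keeping local $/ inside a block means any later line-by-line reads in the script still behave normally.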
 
Gunnar Hjalmarsson

Robert said:
Thanks for the reply. Just to close the loop, what I ended up doing
was using the join function on the @data variable.

You could have skipped the @data array by just doing:

my $data = do { local $/; <To_Clean> };

Robert said:
I then used the tr/// function to replace tabs and newlines with a
space char.

Why? I suspect that the reason is that you are unfamiliar with the /s
modifier. Read about it in "perldoc perlre".
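
To illustrate, here is the same substitution with and without /s on a made-up two-line string:

```perl
use strict;
use warnings;

# Made-up input: the element content contains a newline.
my $xml = "<a><ns0:ErrorDetails>one\ntwo</ns0:ErrorDetails></a>";

# Copy the string, then substitute on the copy.
(my $without_s = $xml) =~ s/<ns0:ErrorDetails>.*?<\/ns0:ErrorDetails>//g;
(my $with_s    = $xml) =~ s/<ns0:ErrorDetails>.*?<\/ns0:ErrorDetails>//gs;

# Without /s, "." stops at the newline, so nothing matches.
print "without /s: ", ($without_s eq $xml ? "unchanged" : "changed"), "\n";
# With /s, "." matches the newline too and the whole element goes.
print "with /s:    $with_s\n";
```

So the newlines can stay in the file; only the modifier has to change.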
 
