Hrs of work on regex: please help

Discussion in 'Perl Misc' started by Robert, Jul 27, 2004.

  1. Robert

    Robert Guest

    After this message text is a pasted xml file I've been working
    (wrestling) with.
    The goal is to remove text from the file that begins with:
    "<ns0:ErrorDetails>" and ends with "</ns0:ErrorDetails>".
    I have done several other s/// type operations to this file to remove
    other text parts, and it was no problem. I've heard the 'devil is in
    the details' and I believe it now, hehe.

    I have copy 'n pasted the text surrounding the target before and
    after, and made a string of it in a simple Perl script. I had to use
    single quotes, due to the numerous double quotes in the text. I used
    the same s/// operation and it printed as I want! Wonderful, I
    thought, now to do it on the file contents. But, it just will not do a
    replace. It is getting beyond the point where I can think on this
    problem without my brain feeling a spinning motion. I humbly submit my
    problem for discussion.

    My code follows:
    #!/usr/bin/perl
    my $results_dir = $ARGV[0];
    my $expected_results_dir = "$results_dir/expectedresults";
    my $cleaned_results_dir = "$results_dir/cleanedresults";
    my $cleaned_expected_results_dir =
        "$results_dir/expectedresults/cleanedexpectedresults";
    my $cleaned_xml = "";
    my $clean_file = "";
    my $Line = "";
    opendir(BIN, $results_dir) or die "Can't open directory: $results_dir: $!";
    FILE_CLEAN: while( defined ($file = readdir BIN) )
    {
        next FILE_CLEAN if $file =~ /^\.\.?$/; # skip . and ..
        next FILE_CLEAN if (-d "$results_dir/$file"); # skip if it is a directory
        open(To_Clean, "$results_dir/$file") or die "Can't open $results_dir/$file: $!\n";
        my @data = <To_Clean>; #read file contents
        close(To_Clean); #close file
        $clean_file = "$cleaned_results_dir/$file";
        for (my $i = 0; $i < scalar(@data); ++$i) {
            $Line = $data[$i];
            #strip the trailing newline and turn tabs into spaces
            chomp $Line;
            $Line =~ tr/\t/ /;
            $Line =~ s/\t//g;
            $Line =~ s/\<ns0:ErrorDetails\>.*?\<\/ns0:ErrorDetails\>//g;
            $cleaned_xml = $cleaned_xml . $Line;
            $Line = "";
        };#END FOR
        open(CLEANFILE, ">$clean_file") or die "Can't open $clean_file: $!\n";
        print CLEANFILE $cleaned_xml;
        close(CLEANFILE);
        $cleaned_xml = "";
    };#END WHILE
    print "...DONE\n";
    closedir(BIN);
    ################################################################################

    <?xml version="1.0" encoding="UTF-8"?>
    <ns0:BOBEntitlementRoot xmlns:ns0="http://www.noco.com/BOBEntitlement"
    version="NA"><ns0:ApplicationArea><ns0:CreationDateTime>2004-07-26T14:07:02.248-07:00</ns0:CreationDateTime><ns0:SourceSystem>HANDSHAKE</ns0:SourceSystem><ns0:Operation><ns0:Name>UnknownOperation</ns0:Name><ns0:Version>NA</ns0:Version></ns0:Operation></ns0:ApplicationArea><ns0:DataArea><ns0:Status><ns0:StatusCode>Failure</ns0:StatusCode><ns0:Error><ns0:ErrorCode>2101</ns0:ErrorCode><ns0:ErrorSeverity>Error</ns0:ErrorSeverity><ns0:ErrorCategory>InputFormatError</ns0:ErrorCategory><ns0:ErrorDescription>Invalid
    XML request. </ns0:ErrorDescription><ns0:ErrorDetails>Job-4296 Error
    in [Processes/Integration_Interfaces/getEntitlement/getBHAPIJMSRequest_1.process/Group
    (1)/Group/Parse XML]
    Output data invalid
    at com.tibco.pe.core.TaskImpl.a(TaskImpl.java:501)
    at com.tibco.pe.core.TaskImpl.eval(TaskImpl.java:428)
    at com.tibco.pe.core.Job.a(Job.java:591)
    at com.tibco.pe.core.Job.if(Job.java:443)
    at com.tibco.pe.core.JobDispatcher$a.a(JobDispatcher.java:270)
    at com.tibco.pe.core.JobDispatcher$a.run(JobDispatcher.java:218)
    caused by: org.xml.sax.SAXException: validation error: unexpected
    content "{http://www.noco.com/BOBEntitlement}Sku"; expected
    "{http://www.noco.com/BOBEntitlement}Name" or
    "{http://www.noco.com/BOBEntitlement}Description" or
    "{http://www.noco.com/BOBEntitlement}DomainType" or
    "{http://www.noco.com/BOBEntitlement}PropertyTypeStatus" or
    "{http://www.noco.com/BOBEntitlement}ChangeDate" or
    "{http://www.noco.com/BOBEntitlement}DefaultValue" or
    "{http://www.noco.com/BOBEntitlement}UsageType"
    ({com.tibco.xml.validation}COMPLEX_E_UNEXPECTED_CONTENT) at
    /BOBEntitlementRoot[1]/DataArea[1]/BOBEntitlement[1]/OfferingProperty[1]/OfferingPropertyType[1]/Sku[1]
    java.lang.Exception: unexpected content
    "{http://www.noco.com/BOBEntitlement}Sku"; expected
    "{http://www.noco.com/BOBEntitlement}Name" or
    "{http://www.noco.com/BOBEntitlement}Description" or
    "{http://www.noco.com/BOBEntitlement}DomainType" or
    "{http://www.noco.com/BOBEntitlement}PropertyTypeStatus" or
    "{http://www.noco.com/BOBEntitlement}ChangeDate" or
    "{http://www.noco.com/BOBEntitlement}DefaultValue" or
    "{http://www.noco.com/BOBEntitlement}UsageType"
    at com.tibco.xml.validation.helpers.d.a(XmlContentValidatorElementContext.java:348)
    at com.tibco.xml.validation.helpers.h.if(XmlContentValidator.java:753)
    at com.tibco.xml.validation.helpers.h.text(XmlContentValidator.java:1601)
    at com.tibco.xml.datamodel.nodes.Text.content(Text.java:327)
    at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
    at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
    at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
    at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
    at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
    at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
    at com.tibco.xml.datamodel.nodes.Document.content(Document.java:226)
    at com.tibco.xml.datamodel.nodes.Document.serialize(Document.java:242)
    at com.tibco.xml.xdata.bind.BindingRunner.validate(BindingRunner.java:302)
    at com.tibco.xml.xdata.bind.OutputBindingRunner.validate(OutputBindingRunner.java:47)
    at com.tibco.pe.core.TaskImpl.a(TaskImpl.java:489)
    at com.tibco.pe.core.TaskImpl.eval(TaskImpl.java:428)
    at com.tibco.pe.core.Job.a(Job.java:591)
    at com.tibco.pe.core.Job.if(Job.java:443)
    at com.tibco.pe.core.JobDispatcher$a.a(JobDispatcher.java:270)
    at com.tibco.pe.core.JobDispatcher$a.run(JobDispatcher.java:218)
    validation error: no declaration for element
    "{http://www.noco.com/BOBEntitlement}Sku"
    ({com.tibco.xml.validation}COMPLEX_E_MISSING_ELEMENT_DECLARATION) at
    /BOBEntitlementRoot[1]/DataArea[1]/BOBEntitlement[1]/OfferingProperty[1]/OfferingPropertyType[1]/Sku[1]
    java.lang.Exception: no declaration for element
    "{http://www.noco.com/BOBEntitlement}Sku"
    at com.tibco.xml.validation.helpers.d.if(XmlContentValidatorElementContext.java:615)
    at com.tibco.xml.validation.helpers.d.a(XmlContentValidatorElementContext.java:180)
    at com.tibco.xml.validation.helpers.h.if(XmlContentValidator.java:818)
    at com.tibco.xml.validation.helpers.h.text(XmlContentValidator.java:1601)
    at com.tibco.xml.datamodel.nodes.Text.content(Text.java:327)
    at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
    at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
    at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
    at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
    at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
    at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
    at com.tibco.xml.datamodel.nodes.Document.content(Document.java:226)
    at com.tibco.xml.datamodel.nodes.Document.serialize(Document.java:242)
    at com.tibco.xml.xdata.bind.BindingRunner.validate(BindingRunner.java:302)
    at com.tibco.xml.xdata.bind.OutputBindingRunner.validate(OutputBindingRunner.java:47)
    at com.tibco.pe.core.TaskImpl.a(TaskImpl.java:489)
    at com.tibco.pe.core.TaskImpl.eval(TaskImpl.java:428)
    at com.tibco.pe.core.Job.a(Job.java:591)
    at com.tibco.pe.core.Job.if(Job.java:443)
    at com.tibco.pe.core.JobDispatcher$a.a(JobDispatcher.java:270)
    at com.tibco.pe.core.JobDispatcher$a.run(JobDispatcher.java:218)
    validation error: unexpected end of content
    ({com.tibco.xml.validation}COMPLEX_E_UNEXPECTED_END_OF_CONTENT) at
    /BOBEntitlementRoot[1]/DataArea[1]/BOBEntitlement[1]/OfferingProperty[1]/OfferingPropertyType[1]
    java.lang.Exception: unexpected end of content
    at com.tibco.xml.validation.helpers.d.case(XmlContentValidatorElementContext.java:414)
    at com.tibco.xml.validation.helpers.h.a(XmlContentValidator.java:1182)
    at com.tibco.xml.validation.helpers.h.endElement(XmlContentValidator.java:1034)
    at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1108)
    at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
    at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
    at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
    at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
    at com.tibco.xml.datamodel.nodes.Document.content(Document.java:226)
    at com.tibco.xml.datamodel.nodes.Document.serialize(Document.java:242)
    at com.tibco.xml.xdata.bind.BindingRunner.validate(BindingRunner.java:302)
    at com.tibco.xml.xdata.bind.OutputBindingRunner.validate(OutputBindingRunner.java:47)
    at com.tibco.pe.core.TaskImpl.a(TaskImpl.java:489)
    at com.tibco.pe.core.TaskImpl.eval(TaskImpl.java:428)
    at com.tibco.pe.core.Job.a(Job.java:591)
    at com.tibco.pe.core.Job.if(Job.java:443)
    at com.tibco.pe.core.JobDispatcher$a.a(JobDispatcher.java:270)
    at com.tibco.pe.core.JobDispatcher$a.run(JobDispatcher.java:218)

    at com.tibco.xml.xdata.bind.BindingRemarkHandler.assertNoErrors(BindingRemarkHandler.java:43)
    at com.tibco.xml.xdata.bind.BindingRunner.validate(BindingRunner.java:319)
    at com.tibco.xml.xdata.bind.OutputBindingRunner.validate(OutputBindingRunner.java:47)
    at com.tibco.pe.core.TaskImpl.a(TaskImpl.java:489)
    at com.tibco.pe.core.TaskImpl.eval(TaskImpl.java:428)
    at com.tibco.pe.core.Job.a(Job.java:591)
    at com.tibco.pe.core.Job.if(Job.java:443)
    at com.tibco.pe.core.JobDispatcher$a.a(JobDispatcher.java:270)
    at com.tibco.pe.core.JobDispatcher$a.run(JobDispatcher.java:218)
    </ns0:ErrorDetails></ns0:Error></ns0:Status></ns0:DataArea></ns0:BOBEntitlementRoot>
    Robert, Jul 27, 2004
    #1

  2. Robert wrote:
    > The goal is to remove text from the file that begins with:
    > "<ns0:ErrorDetails>" and ends with "</ns0:ErrorDetails>".


    Hmm.. Far too much code for my taste. ;-)

    <snip>

    > my @data = <To_Clean>; #read file contents


    Here you slurp the file into an array, where each line is a separate
    element.

    <snip>

    > for (my $i = 0; $i < scalar(@data); ++$i) {


    Here you start various operations for each line.

    <snip>

    > $Line =~ s/\<ns0:ErrorDetails\>.*?\<\/ns0:ErrorDetails\>//g;


    Since the start and end tags appear on different lines, that pattern
    will never match.

    Try slurping the file into a scalar variable instead, and add the /s
    modifier to the s/// operator.
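
    A minimal sketch of that suggestion, assuming the file contents are already in one scalar. The sample XML and the names $xml and $cleaned are made up for illustration; they are not from the original script.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a file slurped into one scalar;
# note that the element spans several lines.
my $xml = "<ns0:Error><ns0:ErrorDetails>Job-4296 Error\n"
        . "at com.example.Foo(Foo.java:1)\n"
        . "</ns0:ErrorDetails></ns0:Error>\n";

# /s lets . match newlines, so .*? can cross line boundaries.
(my $cleaned = $xml) =~ s{<ns0:ErrorDetails>.*?</ns0:ErrorDetails>}{}gs;

print $cleaned;
```

    Using s{...}{...} instead of s/.../.../ avoids having to escape the slash in the closing tag.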

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Hjalmarsson, Jul 27, 2004
    #2

  3. Robert

    Robert Guest

    Thanks for the reply. Just to close the loop, what I ended up doing
    was using the join function on the @data variable. I then used the
    tr/// operator to replace tabs and newlines with a space char. Now,
    everything is set for the substitution, and the resulting files are
    still able to be viewed as xml!
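
    Roughly, the approach described above could look like the sketch below. The sample lines and the name $text are assumptions for illustration, not the actual script.

```perl
use strict;
use warnings;

# Hypothetical lines, as if read with: my @data = <To_Clean>;
my @data = (
    "<ns0:Error><ns0:ErrorDetails>Job-4296 Error\n",
    "\tat com.example.Foo(Foo.java:1)\n",
    "</ns0:ErrorDetails></ns0:Error>\n",
);

my $text = join '', @data;   # one string instead of a list of lines
$text =~ tr/\t\n/ /;         # every tab and newline becomes a space

# With no newlines left, . can reach the closing tag without /s.
$text =~ s{<ns0:ErrorDetails>.*?</ns0:ErrorDetails>}{}g;

print "$text\n";
```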

    The main thing I have learned is when I spend more than an hour on a
    problem, look at it from a different direction.

    Thanks, again.
    Robert, Jul 27, 2004
    #3
  4. Jim Gibson wrote:
    > Robert wrote:
    >>
    >> my @data = <To_Clean>; #read file contents

    >
    > As Gunnar pointed out, you probably want to replace this with 'my
    > $data = <To_Clean>;'


    That must be combined with enabling "slurp" mode:

    local $/;
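
    A small sketch of the difference, using an in-memory filehandle so it runs without a real file. The contents and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $content = "line one\nline two\nline three\n";  # hypothetical file contents

# Default $/ is "\n": a scalar read returns only the first line.
open my $fh, '<', \$content or die "Can't open in-memory file: $!";
my $first = <$fh>;
close $fh;

# With $/ undefined inside the block, the same read returns everything.
open $fh, '<', \$content or die "Can't open in-memory file: $!";
my $all = do { local $/; <$fh> };
close $fh;

print "first read: $first";
print "slurped: ", length($all), " characters\n";
```

    Because of local, $/ gets its previous value back as soon as the block ends, so later line-by-line reads are not affected.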

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Hjalmarsson, Jul 29, 2004
    #4
  5. Robert wrote:
    > Gunnar Hjalmarsson wrote:
    >> Robert wrote:
    >>> The goal is to remove text from the file that begins with:
    >>> "<ns0:ErrorDetails>" and ends with "</ns0:ErrorDetails>".

    >>
    >> Try slurping the file into a scalar variable instead, and add the
    >> /s modifier to the s/// operator.

    >
    > Thanks for the reply. Just to close the loop, what I ended up doing
    > was using the join function on the @data variable.


    You could have skipped the @data array by just doing:

    my $data = do { local $/; <To_Clean> };

    > I then used the tr/// function to replace tabs and newlines with a
    > space char.


    Why? I suspect that the reason is that you are unfamiliar with the /s
    modifier. Read about it in "perldoc perlre".
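
    To illustrate what /s changes, here is a tiny sketch with a made-up two-line string:

```perl
use strict;
use warnings;

my $text = "<a>first\nsecond</a>";   # hypothetical two-line input

# Without /s, . stops at the newline, so the pattern cannot span lines.
my $without_s = $text =~ /<a>.*?<\/a>/  ? 1 : 0;

# With /s, . also matches "\n".
my $with_s    = $text =~ /<a>.*?<\/a>/s ? 1 : 0;

print "without /s: $without_s, with /s: $with_s\n";
```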

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Hjalmarsson, Jul 29, 2004
    #5
