XML Replace

Discussion in 'Perl Misc' started by Trev, Apr 28, 2010.

  1. Trev

    I'm trying to use Perl to replace a line in a few XML files I have.

    In the example XML below, I want to change the Id attribute from
    Id="/Local/App/App1" to Id="/App1". I know there's an easy way to do
    this with plain Perl, but I'd like to use XML::Simple or another XML
    module for Perl.

    <?xml version="1.0" encoding="UTF-8" standalone="no" ?>

    <Profile xmlns="xxxxxxxxx" name="" version="1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">


    <Application Name="App1" Id="/Local/App/App1" Services="1" policy=""
    StartApp="" Bal="5" sessInt="500" WaterMark="1.0"/>


    <AppProfileGuid>586e3456dt</AppProfileGuid>

    </Profile>
    Trev, Apr 28, 2010
    #1

  2. Klaus

    On 28 Apr, 19:01, Trev <> wrote:
    > I'm trying to use Perl to replace a line in a few XML files I have.
    >
    > Example XML below, I'm wanting to change the Id= part from  Id="/Local/
    > App/App1" to Id=/App1". I know there's an easy way to do this with
    > perl alone however


    I don't think that processing XML with Perl alone (i.e. without any
    module) is easy.

    > I'm trying to use XML::Simple
    > or any XML plugin for perl.


    Have a look first at the excellent web site
    Ways to Rome: Processing XML with Perl
    http://xmltwig.com/article/ways_to_rome/ways_to_rome.html
    (original version by Ingo Macherius, maintained by Michel Rodriguez)

    If you don't find a solution there,
    then you can always employ a combination of the CPAN modules
    XML::Reader and XML::Writer
    http://search.cpan.org/~keichner/XML-Reader-0.34/lib/XML/Reader.pm
    http://search.cpan.org/~josephw/XML-Writer-0.611/Writer.pm

    A sample program would look as follows:

    use strict;
    use warnings;

    use XML::Reader;
    use XML::Writer;

    my $rdr = XML::Reader->newhd(\*DATA, {filter => 3});
    my $wrt = XML::Writer->new(OUTPUT => \*STDOUT,
        NEWLINES => 0, DATA_MODE => 1, DATA_INDENT => 2);

    # If you use XML::Writer to write mixed-content XML (that
    # is, tags and characters at the same level, for example
    # <data>abc<sub>def</sub>ghi</data>),
    # then XML::Writer will abort with the message "Mixed content
    # not allowed". To make XML::Writer accept such content, you
    # will have to change the parameters to
    # XML::Writer->new(NEWLINES=>0, DATA_MODE=>0, DATA_INDENT=>0);
    # or even to
    # XML::Writer->new(NEWLINES=>1, DATA_MODE=>0, DATA_INDENT=>0);

    $wrt->xmlDecl('UTF-8', 'no');

    while ($rdr->iterate) {
        my $tag = $rdr->tag;
        my $val = $rdr->value;
        my %att = %{$rdr->att_hash};

        if ($rdr->path eq '/Profile/Application'
            and defined $att{Id}) {
            # change '/../../zzz' into 'zzz'
            $att{Id} =~ s{\A .* /}''xms;
        }

        if ($rdr->is_start) { $wrt->startTag($tag, %att); }
        if ($val ne '')     { $wrt->characters($val); }
        if ($rdr->is_end)   { $wrt->endTag($rdr->tag); }
    }

    $wrt->end();

    __DATA__
    <?xml version="1.0" encoding="UTF-8" standalone="no" ?>
    <Profile
    xmlns="xxxxxxxxx"
    name=""
    version="1.1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    <Application
    Name="App1" Id="/Local/App/App1" Services="1"
    policy="" StartApp="" Bal="5" sessInt="500"
    WaterMark="1.0"/>

    <Application
    Name="App99" Id="/Dummy/Test/iii" Services="3"
    policy="99" StartApp="2" Bal="7" sessInt="27"
    WaterMark="4.3"/>

    <Application
    Name="Yyee" Id="/Dat/Inp/Out" Services="5"
    policy="88" StartApp="" Bal="1" sessInt="8"
    WaterMark="2.1"/>

    <AppProfileGuid>586e3456dt</AppProfileGuid>
    <AppProfileGuid>a46y2hktt7</AppProfileGuid>
    <AppProfileGuid>mi6j77mae6</AppProfileGuid>
    </Profile>
    Klaus, Apr 29, 2010
    #2
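Klaus's substitution `s{\A .* /}''xms` deletes everything up to and including the last '/', so "/Local/App/App1" becomes "App1". The question, however, names a target that keeps a leading slash ("/App1"). The sketch below uses core Perl only and compares that substitution with a lookahead variant that stops just before the last slash. Which result is actually wanted is an assumption to check against the real requirement.

```perl
use strict;
use warnings;

# Compare two ways of trimming a path-like Id value.
my %result;
for my $id ('/Local/App/App1', '/Dummy/Test/iii', 'App1') {
    # As in the script above: greedy .* removes everything up to the last '/'
    (my $basename = $id) =~ s{\A .* /}{}xms;
    # Variant: stop just before the last '/', so that slash is kept
    (my $with_slash = $id) =~ s{\A .* (?=/)}{}xms;
    $result{$id} = [$basename, $with_slash];
    print "$id -> $basename | $with_slash\n";
}
```

A value without any slash ('App1') is left unchanged by both variants, because neither pattern can match.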

  3. Jürgen Exner

    Klaus <> wrote:
    >I don't think that processing XML with Perl alone (i.e. without any
    >module) is easy.


    Well, XML is a rather straightforward, well-structured language. If you
    are familiar with compiler construction, it should be no big deal. It is
    certainly much easier to parse than, say, C, Perl itself, or even HTML
    (HTML has too many special cases).

    jue
    Jürgen Exner, Apr 29, 2010
    #3
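Here is a concrete case behind Klaus's caution about processing XML with plain Perl. The sketch below uses core Perl only and a made-up sample document. A naive one-line substitution also rewrites an Id inside a comment, and that is exactly the kind of markup a regex-based tool has to skip explicitly.

```perl
use strict;
use warnings;

# Hypothetical sample: one live element plus a commented-out one.
my $xml = <<'XML';
<Profile>
  <!-- <Application Name="App0" Id="/Old/App/App0"/> -->
  <Application Name="App1" Id="/Local/App/App1"/>
</Profile>
XML

# Naive approach: strip the path from every Id="..." it can find.
my $naive = $xml;
my $count = ($naive =~ s{\bId="[^"]*/}{Id="}g);

# Two substitutions, although only one element is outside the comment.
print "substitutions: $count\n";
print $naive;
```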
  4. Klaus

    On 29 Apr, 16:42, Jürgen Exner <> wrote:
    > Klaus <> wrote:
    > >I don't think that processing XML with Perl alone (i.e. without any
    > >module) is easy.

    >
    > Well, XML is a rather straightforward, well structured language. If you
    > are familar with compiler construction then it should be no big deal. At
    > least much easier to parse than let's say C or Perl itself or even HTML
    > (there are too many special cases in HTML).


    I agree: XML is straightforward and well-structured, which is why I
    like to use it wherever I can.

    ...and if I were a compiler writer, I would say that processing XML is
    easy :)

    By the way, I have now released a new version of XML::Reader (ver
    0.35) with some bug fixes, warts removed, relicensing, etc...
    http://search.cpan.org/~keichner/XML-Reader-0.35/lib/XML/Reader.pm

    The line I wrote in my previous post (which was for XML::Reader ver
    0.34) was:

    my $rdr = XML::Reader->newhd(\*DATA, {filter => 3});

    With the new version 0.35 of XML::Reader, the same line would be
    spelled:

    my $rdr = XML::Reader->new(\*DATA, {mode => 'attr-in-hash'});
    Klaus, Apr 29, 2010
    #4
  5. sln

    On Wed, 28 Apr 2010 10:01:37 -0700 (PDT), Trev <> wrote:

    >I'm trying to use Perl to replace a line in a few XML files I have.
    >
    >Example XML below, I'm wanting to change the Id= part from Id="/Local/
    >App/App1" to Id=/App1". I know there's an easy way to do this with
    >perl alone however I'm trying to use XML::Simple or any XML plugin for
    >perl.
    >
    ><?xml version="1.0" encoding="UTF-8" standalone="no" ?>
    >
    ><Profile xmlns="xxxxxxxxx" name="" version="1.1" xmlns:xsi="http://
    >www.w3.org/2001/XMLSchema-instance">
    >
    >
    > <Application Name="App1" Id="/Local/App/App1" Services="1" policy=""
    >StartApp="" Bal="5" sessInt="500" WaterMark="1.0"/>
    >
    >
    ><AppProfileGuid>586e3456dt</AppProfileGuid>
    >
    ></Profile>


    If what you need is all you state,
    this code should fix up your XML.
    It's restricted to a single tag/attribute pair.
    It works by matching the markup that must be skipped
    (comments, CDATA and marked sections, which can hide tags)
    as well as the specific tag and attribute it is looking for.

    The advantage here is that nothing else in the original
    markup changes; only the string content of Id is changed,
    via the replacement side of the regex.
    This avoids the formatting headaches you get with some writers.

    The regex may look simple for a parser; that's because it
    is custom-built for this specific task.
    Its interaction with the surrounding markup is correct.

    -sln

    # -------------------------------------------
    # rx_xml_fixval.pl
    # -sln, 5/2/2010
    #
    # Utility to search and replace attribute
    # values in XML/XHTML
    # -------------------------------------------

    use strict;
    use warnings;
    use 5.010;    # \K and named captures need Perl 5.10 or later

    ##
    my $rxopen = "(?: Application )"; # Open tag, cannot be an empty alternation
    my $rxattr = "(?: Id )";          # Attribute we seek, cannot be an empty alternation

    my $Rxmarkup = qr/
        [^<]*
        (?:
            # Things that hide markup (CDATA, comments, marked sections)
            (?: <! (?: \[CDATA\[.*?\]\] | --.*?-- | \[[A-Z][A-Z\ ]*\[.*?\]\] ) > ) \K
          |
            # Specific markup: <OPEN ... ATTR = VAL ...>; \K keeps everything before VAL
            (?: < (?<OPEN> $rxopen ) \s+[^>]*? (?<=\s) (?<ATTR> $rxattr) \s*=\s* \K(?<VAL> ".+?"|'.+?')
                (?= [^>]*? \s* \/? > )
            )
        )
      |
        < \K
    /xs;

    ##
    # Replace each matched VAL with fixval()'s result; every other
    # match is empty after \K and is replaced by ''.
    my $html = join '', <DATA>;
    $html =~ s/$Rxmarkup/fixval( $+{VAL} )/eg;
    print "\n", $html;

    exit(0);

    ##
    # Strip the path from a quoted value: "/Local/App/App1" -> "App1"
    sub fixval {
        return '' unless defined $_[0];
        if ($_[0] =~ / \/ \s* (?<val>[^\/]+?) \s* (?<delim>["']) $/x) {
            return "$+{delim}$+{val}$+{delim}";
        }
        return $_[0];
    }


    __DATA__

    <?xml version="1.0" encoding="UTF-8" standalone="no" ?>

    <Profile xmlns="xxxxxxxxx" name="" version="1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">


    <Application Name="App1" Id="/Local/App/App1" Services="1" policy=""
    StartApp="" Bal="5" sessInt="500" WaterMark="1.0"/>


    <AppProfileGuid>586e3456dt</AppProfileGuid>

    </Profile>
    , May 2, 2010
    #5
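The script above relies on `\K`. Because of it, the replacement touches only the attribute value and leaves the rest of the tag byte-for-byte intact. The core-Perl illustration below applies that idea to a single tag. Unlike sln's pattern, it does not skip comments or CDATA.

```perl
use strict;
use warnings;
use 5.010;   # \K needs Perl 5.10 or later

my $tag = '<Application Name="App1" Id="/Local/App/App1" Services="1"/>';

# Everything matched before \K (here: Id=") is kept as-is; only the
# text after \K, the path up to and including the last '/', is replaced.
(my $fixed = $tag) =~ s{\bId="\K[^"]*/}{};

print "$fixed\n";   # <Application Name="App1" Id="App1" Services="1"/>
```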
  6. sln

    On Sun, 02 May 2010 13:20:33 -0700, wrote:

    >If what you need is all you state,
    >this code should fix up your xml.
    >Its restricted to just single tag-attribute pair.
    >It works by parsing exclusionary and specific markup.
    >
    >The advantage here is that nothing else changes in the
    >original markup, only the string content of Id is changed
    >via the replacement side of the regex.
    >This avoids formatting headaches with some writers.
    >
    >The regex may look simple for a parser, thats becuse it
    >is custom to the specific task.
    >The markup interraction is correct.
    >


    With a slight modification, multiple attribute/value pairs
    within a single tag can be handled. Of course, this uses some
    fringe regex features, embedded code (?{}) and a conditional
    (?() | ), but it does the same search and replace, now on
    multiple attributes.

    Cheers!
    -sln

    Some output:
    --------------------------
    Id = "/Local/App/App1", (valnew = "App1")
    Id2 = "/Local/App/App2", (valnew = "App2")
    Id = '/Dummy/Test/iii', (valnew = 'iii')
    Id = "/testing", (valnew = "testing")
    Id = "/Dum
    my/Test/iii
    ", (valnew = "iii")
    Id = "/Dat/Inp/Out", (valnew = "Out")
    Id = "/Local/App/App1", (valnew = "App1")
    Id = "/Dummy/Test/iii", (valnew = "iii")
    Id = "/Dat/Inp/Out", (valnew = "Out")
    Tt = "TT/tt hello", (valnew = "tt hello")
    Id = "/he llo", (valnew = "he llo")
    -----------------------------

    # -------------------------------------------
    # rx_html_fixval2.pl
    # -sln, 5/5/2010
    #
    # Util to search/replace attribute/val's from
    # xml/html
    # -------------------------------------------

    use strict;
    use warnings;
    use 5.010;   # \K and named captures need Perl 5.10 or later

    ## Initialization
    ##

    my $rxopen = "(?: Application )"; # Open tags, cannot be an empty alternation
    my $rxattr = "(?: Id.?|Tt )";     # Attributes we seek, cannot be an empty alternation
                                      # (to match any attribute: "(?: \\w+ )")

    use re 'eval';   # older Perls need this for (?{ }) in a pattern that interpolates variables
    my $topen = 0;   # 1 while we are inside an <OPEN> tag

    my $Rxmarkup = qr
    {
        (?(?{$topen})   # Begin Conditional

            # Have <OPEN> ?
            (?:
                # Try to match next attr-val pair
                \s+[^>]*? (?<=\s) (?<ATTR> $rxattr) \s*=\s* \K(?<VAL> ".+?"|'.+?')
                (?= [^>]*? \s* /? > )
              |
                # No more attr-value pairs
                (?{$topen = 0})
            )
          |
            # Look for new <OPEN>
            (?:
                [^<]*
                (?:
                    # Things that hide markup:
                    # - Comments/CDATA
                    (?: <! (?: \[CDATA\[.*?\]\] | --.*?-- | \[[A-Z][A-Z\ ]*\[.*?\]\] ) > ) \K
                  |
                    # Specific markup we seek:
                    # - OPEN tag
                    (?: < (?<OPEN> $rxopen \K) )
                    (?{$topen = 1})
                )
              |
                < \K
            )
        )   # End Conditional
    }xs;

    ## Code
    ##

    my $html = join '', <DATA>;
    $html =~ s/$Rxmarkup/ fixval( $+{ATTR}, $+{VAL} ) /eg;
    print "\n", $html;

    exit(0);

    ## Subs
    ##

    # Print each attr/value found and strip the path from the value
    sub fixval {
        return '' unless defined $_[1];
        print "$_[0] = $_[1], ";
        if ($_[1] =~ / \/ \s* (?<val>[^\/]+?) \s* (?<delim>["']) $/x) {
            my $valnew = $+{delim}.$+{val}.$+{delim};
            print "(valnew = $valnew)\n";
            return $valnew;
        }
        print "(val unchanged)\n";
        return $_[1];
    }


    __DATA__

    <?xml version="1.0" encoding="UTF-8" standalone="no" ?>

    <Profile xmlns="xxxxxxxxx" name="" version="1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    <Application Name="App1" Id="/Local/App/App1"
    Id2="/Local/App/App2" Services="1" policy=""
    StartApp="" Bal="5" sessInt="500" WaterMark="1.0"/>

    <AppProfileGuid>586e3456dt</AppProfileGuid>

    </Profile>

    <Application
    Name="App99" Id='/Dummy/Test/iii' Services="3"
    policy="99" StartApp="2" Bal="7" sessInt="27"
    WaterMark="4.3" />

    <Application Id="/testing"
    Name="App100" Id="/Dum
    my/Test/iii
    " Services="4"
    policy="99" StartApp="2" Bal="7" sessInt="27"
    WaterMark="4.3"/>

    <Application
    Name="Yyee" Id="/Dat/Inp/Out" Services="5"
    policy="88" StartApp="" Bal="1" sessInt="8"
    WaterMark="2.1"/>

    <![INCLUDE CDATA [ <Application Name="App99" Id="//Test/can't see me"/> ]]>

    <?xml version="1.0" encoding="UTF-8" standalone="no" ?>
    <Profile
    xmlns="xxxxxxxxx"
    name=""
    version="1.1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    <Application
    Name="App1" Id="/Local/App/App1" Services="1"
    policy="" StartApp="" Bal="5" sessInt="500"
    WaterMark="1.0"/>

    <Application
    Name="App99" Id="/Dummy/Test/iii" Services="3"
    policy="99" StartApp="2" Bal="7" sessInt="27"
    WaterMark="4.3"/>

    <Application
    Name="Yyee" Id="/Dat/Inp/Out" Services="5"
    policy="88" StartApp="" Bal="1" sessInt="8"
    WaterMark="2.1" Tt = "TT/tt hello"/>

    <Application
    Name="Yyee" Id="/he llo" Services="5"
    policy="88" StartApp="" Bal="1" sessInt="8"
    WaterMark="2.1"/>

    <AppProfileGuid>586e3456dt</AppProfileGuid>
    <AppProfileGuid>a46y2hktt7</AppProfileGuid>
    <AppProfileGuid>mi6j77mae6</AppProfileGuid>
    </Profile>
    , May 5, 2010
    #6
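The second script uses the `(?(?{ ... }) yes | no )` construct. It chooses a branch by running Perl code at match time: sln's pattern tests $topen, and embedded `(?{ })` blocks set that variable. The core-Perl illustration below shows the conditional on its own. The flag name is hypothetical and stands in for $topen.

```perl
use strict;
use warnings;

my $digits_only = 1;   # hypothetical flag, plays the role of $topen

# (?(?{ COND }) YES | NO ): COND is Perl code evaluated during the match
my $re = qr/\A (?(?{ $digits_only }) \d+ | [A-Za-z]+ ) \z/x;

my @results;
for my $mode (1, 0) {
    $digits_only = $mode;
    push @results, join ',', map { $_ =~ $re ? 1 : 0 } qw(123 abc);
}
print "$_\n" for @results;   # "1,0" (digits branch), then "0,1" (letters branch)
```

The same pattern object gives different answers depending on the flag. sln's regex exploits this across successive s///g matches, to remember whether the previous match ended inside an Application tag.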
