split xml file between two processing instructions

Discussion in 'Perl Misc' started by kcwolle, Jun 23, 2004.

  1. kcwolle

    kcwolle Guest

    hello,

    I want to split an xml file on processing instructions into different
    files.
    All content between the two PIs should be included in the new file.
    The file name should contain the content of the first and the last <no>
    elements.


    example:
    <?split ?>
    <h1>... text ...</h1>
    <start-element/>
    <text>
    ....text text text...
    <nr>4</nr>
    </text>
    text text text
    <nr>18</nr>
    <end-element/>
    <h6> ... text ...</h6>
    <?split ?>

    In this case the file name should be test-no4to18.xml, and everything
    from <h1> to </h6> should be included.
    (Btw, the start and end tags can differ, so no rule based on the
    starting and ending elements is possible.)
    I would like to use an XML module (e.g. XML::Twig), but how do I get a
    node list that contains all nodes between the processing instructions
    for further processing?

    Can anybody help me?

    Yours

    Wolfgang
    kcwolle, Jun 23, 2004
    #1

  2. Anno Siegel

    Anno Siegel Guest

    kcwolle <> wrote in comp.lang.perl.misc:
    > hello,
    >
    > I want to split an xml file on processing instructions into different
    > files.
    > All content between the two PIs should be included in the new file.
    > The file name should contain the content of the first and the last <no>
    > elements.
    >
    >
    > example:
    > <?split ?>
    > <h1>... text ...</h1>
    > <start-element/>
    > <text>
    > ...text text text...
    > <nr>4</nr>
    > </text>
    > text text text
    > <nr>18</nr>
    > <end-element/>
    > <h6> ... text ...</h6>
    > <?split ?>
    >
    > In this case the file name should be test-no4to18.xml, and everything
    > from <h1> to </h6> should be included.
    > (Btw, the start and end tags can differ, so no rule based on the
    > starting and ending elements is possible.)
    > I would like to use an XML module (e.g. XML::Twig), but how do I get a
    > node list that contains all nodes between the processing instructions
    > for further processing?


    What have you tried so far?

    We help people with programming, but we don't deliver programs
    according to specification.

    Anno
    Anno Siegel, Jun 23, 2004
    #2

  3. Tad McClellan

    kcwolle <> wrote:

    > I want to split an xml file on processing instructions into different
    > files.



    Does it have to work on arbitrary XML or only on "your" XML?

    Might you have PIs like this?

    <?split ?>
    or
    <?split
    ?>

    If so, you're on your own. If not, see below.
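
    For whitespace variations like those, one possible workaround is to
    split on a pattern instead of a fixed string. This is only a minimal
    sketch: it assumes the PI target is always "split" and that the PI
    carries no data, and the sample string is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample: three spellings of the same PI.
my $xml = "<?split ?><a/><?split   ?><b/><?split\n?><c/>";

# \s* accepts any run of whitespace (including newlines) before "?>".
# grep drops the empty piece in front of the first PI.
my @sections = grep { /\S/ } split /<\?split\s*\?>/, $xml;

print "$_\n" for @sections;
```

    This still treats the file as plain text, so a "<?split ?>" inside a
    comment or CDATA section would also be taken as a split point.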


    > All content between the two PIs should be included in the new file.
    > The file name should contain the content of the first and the last <no>
    > elements.



    There are no <no> elements...


    > example:
    ><?split ?>
    ><h1>... text ...</h1>
    ><start-element/>
    ><text>
    > ...text text text...
    ><nr>4</nr>
    ></text>
    > text text text
    ><nr>18</nr>
    ><end-element/>
    ><h6> ... text ...</h6>
    ><?split ?>
    >
    > In this case the file name should be test-no4to18.xml, and everything
    > from <h1> to </h6> should be included.



    > I would like to use an XML module



    Since you don't need to make use of the XML structuring, I would
    treat them as plain ol' text files.


    > Can anybody help me?



    What have you tried so far?

    We generally prefer to help those who have attempted to help
    themselves first...


    This should get you started:

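    # $_ is assumed to hold the contents of the whole file
    foreach my $section ( split /\Q<?split ?>\E/ ) {
        # first and last <nr> value found in this section
        my( $num1, $num2 ) = ($section =~ /<nr>(\d+)/g)[0, -1];
        next unless defined $num1;    # no <nr> in this section
        my $fname = "test-no${num1}to$num2.xml";
        print "$fname\n";
    }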
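
    A slightly fuller sketch of the same idea, as a self-contained script.
    The helper name split_sections() and the sample input are made up for
    illustration; the "test-no" prefix comes from the original question.
    It assumes the PI is always spelled exactly "<?split ?>".

```perl
use strict;
use warnings;

# Split $xml on the literal PI and return [ file name, content ] pairs.
# split_sections() is an illustrative helper, not part of any module.
sub split_sections {
    my ($xml) = @_;
    my @out;
    foreach my $section ( split /\Q<?split ?>\E/, $xml ) {
        # first and last <nr> value in this section
        my ( $first, $last ) = ( $section =~ /<nr>(\d+)<\/nr>/g )[ 0, -1 ];
        next unless defined $first;    # skip sections without any <nr>
        push @out, [ "test-no${first}to${last}.xml", $section ];
    }
    return @out;
}

my $xml = "<?split ?><h1>a</h1><nr>4</nr>text<nr>18</nr><h6>b</h6><?split ?>";
for my $pair ( split_sections($xml) ) {
    my ( $fname, $content ) = @$pair;
    print "$fname\n";
    # To write the section to disk instead of only printing its name:
    # open my $fh, '>', $fname or die "Cannot write $fname: $!\n";
    # print {$fh} $content;
    # close $fh;
}
```

    Returning name/content pairs keeps the splitting separate from the
    file writing, so the naming logic can be checked before anything is
    written to disk.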


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
    Tad McClellan, Jun 23, 2004
    #3
  4. kcwolle

    kcwolle Guest

    Hello Anno,

    I tried the following code to split the document. The problem is that
    I get only the first two <no> elements and not the first and the last.

    use strict;

    my $text;
    my $file = shift;
    my $outfile = shift;
    my $testfile;
    open(INPUT, "<$file") or die "Cannot read file $file!\n";
    local $/;
    $text = <INPUT>;
    close INPUT;


    while ($text =~ /<\?split \?>(.*?)(?=<\?split \?>)/sg)
    {
        my $fragment = $1;
        my ($from, $to) = $fragment =~ /<no>(.*?)<\/no>/isg;
        $testfile = $outfile."\\test-nr".${from}."to".${to}."\.xml";
        open(OUTPUT, ">$testfile") or die "Cannot write file $testfile!\n";
        print OUTPUT $fragment;
        close OUTPUT;
    }

    The general problem with using regular expressions is that elements
    can be broken across a split, e.g.
    <?split ?><level1><text>xxx</text><level2><text>yyy</text></level2>
    <?split ?><level2><text>zzz</text></level2></level1>
    where a level1 element begins after the first <?split ?> and ends
    after the second.
    How can such broken elements be handled so that I get well-formed
    XML?
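
    One possible way to repair such fragments while still treating the
    file as text is to track which elements are open at each split, close
    them at the end of the fragment, and re-open them at the start of the
    next one. The sketch below is only an illustration under loud
    assumptions: balance_fragments() is a made-up helper, the input is
    assumed to be properly nested overall, and the tag pattern ignores
    comments, CDATA sections and ">" inside attribute values.

```perl
use strict;
use warnings;

# Balance each fragment: re-open the elements that were still open at the
# previous split and close those left open at the end of the fragment.
sub balance_fragments {
    my @fragments = @_;
    my @open;        # [ name, start tag as written ] for each open element
    my @balanced;
    for my $frag (@fragments) {
        # Re-open whatever was still open when the previous fragment ended.
        my $prefix = join '', map { $_->[1] } @open;
        while ( $frag =~ m{(<(/?)([\w:.-]+)[^<>]*?(/?)>)}g ) {
            my ( $tag, $closing, $name, $empty ) = ( $1, $2, $3, $4 );
            next if $empty;                       # <x/> opens nothing
            if   ($closing) { pop @open }         # trusts the nesting
            else            { push @open, [ $name, $tag ] }
        }
        # Close whatever is still open, innermost element first.
        my $suffix = join '', map { '</' . $_->[0] . '>' } reverse @open;
        push @balanced, $prefix . $frag . $suffix;
    }
    return @balanced;
}

my @parts = balance_fragments(
    '<level1><text>xxx</text><level2><text>yyy</text></level2>',
    '<level2><text>zzz</text></level2></level1>',
);
print "$_\n" for @parts;
```

    Each result has balanced tags, but a fragment can still contain
    several top-level nodes, so it may need a wrapper element before it
    counts as a complete XML document.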

    On the other hand, if I use an XML module, the PI is a node that has no
    children. How can the nodes that follow it, up to the next PI, be handled?

    Btw, I'm a relative newbie to Perl and XML programming, so I need
    some support with these things. Maybe you can help me? :-|

    Yours

    Wolfgang
    kcwolle, Jun 24, 2004
    #4
  5. Tad McClellan

    kcwolle <> wrote:

    > The problem is that
    > I get only the first two <no> elements and not the first and the last.



    > my ($from, $to) = $fragment =~ /<no>(.*?)<\/no>/isg;



    Use a "list slice" ("Slices" section in perldata.pod) to slice
    the list that m//g is returning, like I did in my earlier followup:


    my ($from, $to) = ($fragment =~ /<no>(.*?)<\/no>/isg)[ 0, -1 ];
                      ^                                 ^^^^^^^^^^
                      ^                                 ^^^^^^^^^^
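
    To make the difference visible, here is a small self-contained
    comparison of the plain assignment and the list slice. The sample
    fragment is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical fragment with three <no> elements.
my $fragment = '<no>4</no> a <no>9</no> b <no>18</no>';

# Plain list assignment takes the first two matches.
my ( $first, $second ) = $fragment =~ /<no>(.*?)<\/no>/isg;

# Slicing the match list with [ 0, -1 ] takes the first and the last.
my ( $from, $to ) = ( $fragment =~ /<no>(.*?)<\/no>/isg )[ 0, -1 ];

print "without slice: $first, $second\n";    # without slice: 4, 9
print "with slice:    $from, $to\n";         # with slice:    4, 18
```

    If a fragment contains only one <no> element, both $from and $to
    receive that same value, since index 0 and index -1 then refer to the
    same list element.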

    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
    Tad McClellan, Jun 24, 2004
    #5
