get rid of non xml compliant lines from a file

Discussion in 'Perl Misc' started by Mr_Noob, Mar 26, 2008.

  1. Mr_Noob

    Mr_Noob Guest

    Hi all,

    I'm trying to write a Perl script that deletes all non-XML-compliant
    lines (i.e. keeps only lines beginning with "<" and ending with ">").
    Here is what I have managed to put together so far:


    sub delete_non_xml_lines
    {
        my $search = new File::List($xmldir);
        my @files = @{ $search->find("textfile") };

        foreach (@files)
        {
            my $file = $_;
            open(FILE, "< $file") or die "Can't open $file : $!";
            while(<FILE>)
            {
                print if $_ =~ />$/;
            }
            close FILE;
        }
    }


    But how can I redirect the output for each processed file into an XML
    file?

    Thanks in advance for your help.

    Regards
    Mr_Noob, Mar 26, 2008
    #1

  2. Mr_Noob wrote:
    >
    > I'm trying to write a Perl script that deletes all non-XML-compliant
    > lines (i.e. keeps only lines beginning with "<" and ending with ">").


    <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook V4.1//EN">
    <article>
    <sect1>
    <title>Observations on XML structure</title>
    <para>This is a valid XML document.
    Most of the lines don't start with an &lt; symbol.
    Some of the lines don't end with an &gt; symbol.
    Yet it is still valid XML.</para>
    </sect1>
    </article>
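
    To make the point concrete, here is a minimal sketch that applies the
    proposed test (line starts with "<" and ends with ">") to the document
    above. The sub name keeps_line and the exact regex are assumptions made
    for illustration; they are not taken from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: true if a line starts with "<" and ends with ">",
# ignoring surrounding whitespace. This is the line-based test under discussion.
sub keeps_line {
    my ($line) = @_;
    return $line =~ /^\s*<.*>\s*$/ ? 1 : 0;
}

# The sample document from this post, one array element per line.
my @doc = (
    '<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook V4.1//EN">',
    '<article>',
    '<sect1>',
    '<title>Observations on XML structure</title>',
    '<para>This is a valid XML document.',
    "Most of the lines don't start with an &lt; symbol.",
    "Some of the lines don't end with an &gt; symbol.",
    'Yet it is still valid XML.</para>',
    '</sect1>',
    '</article>',
);

my @kept    = grep {  keeps_line($_) } @doc;
my @dropped = grep { !keeps_line($_) } @doc;

printf "kept %d lines, dropped %d lines\n", scalar @kept, scalar @dropped;
print "dropped: $_\n" for @dropped;
```

    All four lines of the <para> element are dropped, so the filtered
    output loses its text even though the input was fine.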

    --
    RGB
    RedGrittyBrick, Mar 26, 2008
    #2

  3. Mr_Noob

    Ben Morrow Guest

    Quoth Mr_Noob:
    >
    > I'm trying to write a Perl script that deletes all non-XML-compliant
    > lines (i.e. keeps only lines beginning with "<" and ending with ">").
    > Here is what I have managed to put together so far:
    >
    > sub delete_non_xml_lines
    > {
    > my $search = new File::List($xmldir);


    Indirect object syntax (new Foo) is unreliable and can parse
    incorrectly. Use

    my $search = File::List->new($xmldir);

    instead.
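
    A small sketch of the arrow form, using a made-up class My::Finder as a
    stand-in so that it runs without File::List installed (the class and its
    dir method are hypothetical). The perlobj documentation describes how
    "new Foo(...)" forces Perl to guess whether "new" is a method or a
    function; the arrow form leaves nothing to guess.

```perl
use strict;
use warnings;

# Hypothetical stand-in class so the example runs without File::List.
package My::Finder;

sub new {
    my ($class, $dir) = @_;
    return bless { dir => $dir }, $class;
}

# Accessor for the directory passed to the constructor.
sub dir { return $_[0]{dir} }

package main;

# Arrow syntax: unambiguously "call method new on class My::Finder".
my $search = My::Finder->new('/tmp/xml');
print "searching in ", $search->dir, "\n";
```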

    > my @files = @{ $search->find("textfile") };
    >
    > foreach (@files)
    > {
    > my $file = $_;


    This is silly. Use

    foreach my $file (@files) {

    instead.

    > open(FILE, "< $file") or die "Can't open $file : $!";


    It is safer to use lexical filehandles and three-arg open.

    open(my $FILE, '<', $file) or die ...;
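
    As a minimal sketch of why this is safer: two-argument open takes the
    mode from the same string as the file name and ignores leading and
    trailing whitespace in it, whereas the three-argument form passes the
    name through untouched. The helper read_lines and the temporary file
    below are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read all lines of a file using a lexical filehandle
# and three-argument open, so nothing in $path is ever taken as a mode.
sub read_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    my @lines = <$fh>;
    close $fh;
    return @lines;
}

# Create a small temporary file to read back (removed at program exit).
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "<a>\n", "text\n";
close $tmp_fh or die "Can't write $tmp_name: $!";

my @lines = read_lines($tmp_name);
print "read ", scalar(@lines), " lines\n";
```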

    [...from below the code...]
    > But how can I redirect the output for each processed file into an xml
    > file ?


    To write the output to a new file, you need

    open(my $XML, '>', "$file.xml") or die ...;
    select $XML;

    Note that this will leave $XML selected as your default output
    filehandle. If you are expecting to write to STDOUT later, you will need
    to select it again. Alternatively, you could use SelectSaver:

    my $ss = SelectSaver->new($XML);

    which will re-select STDOUT when $ss goes out of scope.
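
    A short sketch of the SelectSaver approach. SelectSaver ships with Perl
    but has to be loaded with "use SelectSaver;". The temporary file is only
    a stand-in for "$file.xml".

```perl
use strict;
use warnings;
use SelectSaver;
use File::Temp qw(tempfile);

# Stand-in for "$file.xml"; removed again at program exit.
my ($XML, $xml_name) = tempfile(UNLINK => 1);

{
    my $ss = SelectSaver->new($XML);   # $XML is now the default output handle
    print "<article>\n";               # written to the file, not the terminal
}   # $ss goes out of scope: the previously selected handle is restored

print "back on the previous handle (normally STDOUT)\n";

close $XML or die "can't write to $xml_name: $!";
```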

    > while(<FILE>)
    > {
    > print if $_ =~ />$/;


    $_ is the default target for a match, so this can be written as

    print if />$/;
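
    A quick sketch showing that the two forms select the same lines, using
    made-up sample data. It also shows a stricter pattern anchored at both
    ends, which is an assumption about what the original description
    ("beginning with < and ending >") asks for.

```perl
use strict;
use warnings;

# Made-up sample lines; "$" matches before the trailing newline.
my @lines = ("<a>\n", "plain text\n", "<b>text</b>\n", "see note ->\n");

my @explicit = grep { $_ =~ />$/ } @lines;   # match against $_ spelled out
my @implicit = grep { />$/ } @lines;         # same match, $_ implied

# Stricter test: the line must also start with "<".
my @both_ends = grep { /^<.*>$/ } @lines;

print "explicit: ", scalar(@explicit), ", implicit: ", scalar(@implicit),
      ", both ends: ", scalar(@both_ends), "\n";
```

    The line "see note ->" passes the end-only test but not the stricter one.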

    > }
    > close FILE;


    If you use lexical filehandles, there's no need to explicitly close
    files opened for reading: the handle is closed when its variable goes
    out of scope. Files opened for writing should be explicitly closed, and
    the return value of close checked, to catch errors writing (such as a
    full disk). close returns false if any of the buffered writes failed, so
    there's no need to check each print (unless you are expecting errors and
    want to abort early).

    close $XML or die "can't write to $file.xml: $!";
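
    Putting the pieces together, a hedged sketch of the per-file step. The
    sub name filter_file and its two arguments are hypothetical. Note that
    it prints to $XML directly rather than calling select, which is a
    different choice from the one shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: copy the lines of $in that end in ">" to $out,
# and report write errors through the return value of close.
sub filter_file {
    my ($in, $out) = @_;

    open(my $FILE, '<', $in)  or die "Can't open $in: $!";
    open(my $XML,  '>', $out) or die "Can't open $out: $!";

    while (my $line = <$FILE>) {
        # Same test as the original script: keep lines ending in ">".
        print {$XML} $line if $line =~ />$/;
    }

    # Errors from buffered writes (e.g. a full disk) surface here.
    close $XML or die "can't write to $out: $!";
    return;
}
```

    Inside the loop over @files this would be called as
    filter_file($file, "$file.xml").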

    Ben
    Ben Morrow, Mar 26, 2008
    #3
  4. Mr_Noob

    szr Guest

    Ben Morrow wrote:
    > Quoth Mr_Noob:
    >>
    >> I'm trying to write a Perl script that deletes all non-XML-compliant
    >> lines (i.e. keeps only lines beginning with "<" and ending with ">").
    >> Here is what I have managed to put together so far:
    >>
    >> sub delete_non_xml_lines
    >> {
    >> my $search = new File::List($xmldir);

    >
    > Indirect object syntax (new Foo) is unreliable and can parse
    > incorrectly. Use


    I don't deny that that is good advice, though I personally have never had
    any problems creating an object using "my $o = new Foo(...);" as opposed
    to "my $o = Foo->new(...);". As long as you know the potential
    problems, they are easy to avoid. Namely, watch those parens :)

    --
    szr
    szr, Mar 27, 2008
    #4