Works fine on one file, but not on many

Discussion in 'Perl Misc' started by beartiger, Oct 9, 2005.

  1. beartiger

    beartiger Guest

    Can anyone tell me why the folling script works on one file but not on
    glob(*.html)?

    The script is designed to take the following line:

    <filestatus owner="jeh" status="1"/>

    Strip out the "status" value and then translate it into a string that
    appears to the user at the bottom of the file.

    When I run the script in glob mode in a dir of .html files, "$stat" is
    filled with the entire contents of each file [!]. When I run it on a
    single file, "$stat" is correctly filled only with the status value and
    the script runs as intended and correctly updates the file.

    ======================CODE STARTS============================

    use strict;
    use warnings;



    my @files;

    if ($ARGV[0])
    {
    @files= $ARGV[0];
    } else
    {
    @files=glob('*.html');
    }

    my $stat;


    foreach my $file (@files)
    {
    # first get the file's status

    #LOAD
    open (HTM, "$file") || die("Error
    Reading File: $file $!");
    while(<HTM>)
    {
    if(m/filestatus/)
    {
    s/.*<filestatus.*?status=\"(\d)\".*?>.*/$1/g;
    $stat=$_;
    $_="";
    }
    }
    close (HTM) || die("Error
    Closing File: $file $!");


    # then, based on the code, update the bottom line


    open (IN, "$file") || die("Error
    Reading File: $file $!");
    undef $/; my $remthis= <IN>;
    close (IN) || die("Error
    Closing File: $file $!");

    #CHANGE

    if ($stat==0) {$remthis=~ s/.*?<.html>/<font
    color=#CCCCCC>Placeholder<\/font><\/html>/g;}
    if ($stat==1) {$remthis=~ s/.*?<.html>/<font
    color=#CCCCCC>First draft<\/font><\/html>/g;}
    if ($stat==2) {$remthis=~ s/.*?<.html>/<font
    color=#CCCCCC>Peer Reviewed<\/font><\/html>/g;}
    if ($stat==3) {$remthis=~ s/.*?<.html>/<font
    color=#CCCCCC>SME Reviewed<\/font><\/html>/g;}
    if ($stat==4) {$remthis=~ s/.*?<.html>/<font
    color=#CCCCCC>Final draft<\/font><\/html>/g;}

    #WRITE
    open (OUT, ">$file") || die("Error
    Writing File: $file $!");
    print OUT $remthis;
    close (OUT) || die("Error
    Closing File: $file $!");
    }

    ========================CODE ENDS===========================


    Many thanks in advance,
    John
     
    beartiger, Oct 9, 2005
    #1
    1. Advertisements

  2. beartiger

    Matt Garrish Guest

    You're doing a variety of bad things in your code. For instance, you undef
    $/ the first time through the loop, which means it will be undefined for the
    rest of your script (local $/ = undef;). You're needlessly reading the file
    twice. You use stat globally, so it doesn't matter if a file matches as it
    will automatically get the last value of stat if it doesn't. You quote
    variables that don't need quoting. You are doing case-sensitive matches and
    not specifying the /s modifier after slurping the files, which may or may
    not explain why only some files work. I'm also not sure why you have
    "<.html>" on one side of your final substitution and "<\/html>" on the
    other, but regardless you're blowing away everything in the file in those
    final substitutions.

    Anyway, see if the following does what you want:

    use strict;
    use warnings;

    my @files;

    if (@ARGV) {
    push @files, $ARGV[0];
    }
    else {
    @files=glob('*.html');
    }

    foreach my $file (@files) {

    my $filecontent;

    {
    open(my $in, '<', $file) or die "Error Reading $file: $!";
    local $/ = undef;
    $filecontent = <$in>;
    close($in) or die "Error Closing $file: $!";
    }

    my $stat;

    if ($filecontent =~ m/<filestatus[^>]+status="(\d)"/is) {
    $stat=$1;
    }

    else {
    print "Error: no stat value for $file. Unable to process.\n";
    next;
    }

    if ($stat==0) {$filecontent=~ s|</html>|<font
    color=#CCCCCC>Placeholder</font></html>|gis;}
    elsif ($stat==1) {$filecontent=~ s|</html>|<font color=#CCCCCC>First
    draft</font></html>|gis;}
    elsif ($stat==2) {$filecontent=~ s|</html>|<font color=#CCCCCC>Peer
    Reviewed</font></html>|gis;}
    elsif ($stat==3) {$filecontent=~ s|</html>|<font color=#CCCCCC>SME
    Reviewed</font></html>|gis;}
    elsif ($stat==4) {$filecontent=~ s|</html>|<font color=#CCCCCC>Final
    draft</font></html>|gis;}

    open(my $out, '>', $file) or die "Error Writing $file: $!";
    print $out $filecontent;
    close($out) or die "Error Closing $file: $!";

    }
     
    Matt Garrish, Oct 9, 2005
    #2
    1. Advertisements

  3. You are changing the value of $/ which is a global variable. Change:

    undef $/; my $remthis= <IN>;

    To:

    my $remthis = do { local $/; <IN> };


    John
     
    John W. Krahn, Oct 9, 2005
    #3
  4. beartiger

    Matt Garrish Guest

    Oops, I forgot to remove the /g modifier you added to the last substitution.
    It's pointless unless there really are multiple closing html tags in your
    files.

    Matt
     
    Matt Garrish, Oct 9, 2005
    #4
  5. beartiger

    beartiger Guest

    Matt Garrish wrote:
    <snip excellent response>

    Matt, many many thanks. Your revised code worked better, but appended
    the string to the existing string in the file a la:

    <font color=#CCCCCC>Placeholder</font><font
    color=#CCCCCC>Placeholder</font></html>

    Neverthess, I (hope I) learned some things from your response.

    The revision below works well. I found that making stat a string rather
    than int was smarter, since the author could put, say, "A" in the
    status value.

    I hope I didn't repeat any of my earlier mistakes in this one, or
    introduce new ones:

    ==========================CODE BEGINS===========================

    use strict;
    use warnings;

    my @files;
    my @lines;

    if (@ARGV) { push @files, $ARGV[0]; } else { @files=glob('*.html'); }

    foreach my $file (@files) {

    {
    open(my $in, '<', $file) or die "Error Reading $file: $!";

    my $stat="";

    while(<$in>)
    {
    if (/<filestatus[^>]+status="(\d)"/is) { $stat=$1; }


    if($stat=~m/0/) {s|^<font
    color=#CCCCCC>.*?</font></html>|<font
    color=#CCCCCC>Placeholder</font></html>|is; }
    elsif($stat=~m/1/) {s|^<font
    color=#CCCCCC>.*?</font></html>|<font color=#CCCCCC>First
    draft</font></html>|is; }
    elsif($stat=~m/2/) {s|^<font
    color=#CCCCCC>.*?</font></html>|<font color=#CCCCCC>Peer
    reviewed</font></html>|is;}
    elsif($stat=~m/3/) {s|^<font
    color=#CCCCCC>.*?</font></html>|<font color=#CCCCCC>SME
    reveiwed</font></html>|is; }
    elsif($stat=~m/4/) {s|^<font
    color=#CCCCCC>.*?</font></html>|<font color=#CCCCCC>Final
    draft</font></html>|is; }
    else
    {s|^<font color=#CCCCCC>.*?</font></html>|<font
    color=#CCCCCC>BAD CODE: look at filestatus tags in
    source</font></html>|is;}

    push(@lines,$_);
    }

    close($in) or die "Error Closing $file: $!";
    }


    open(my $out, '>', $file) or die "Error Writing $file: $!";
    print $out @lines;
    close($out) or die "Error Closing $file: $!";

    }

    ==========================CODE ENDS===========================
     
    beartiger, Oct 9, 2005
    #5
  6. beartiger

    beartiger Guest

    John W. Krahn wrote:
    Thanks. I decided not to undef $/ at all. Is it a little risky?


    J
     
    beartiger, Oct 9, 2005
    #6
  7. It could be if the expression |^<font color=#CCCCCC>.*?</font></html>| is
    trying to match a multi-line string because the default is to put only one
    line in the $_ variable. This may work better for you (UNTESTED):


    #!/usr/bin/perl
    use strict;
    use warnings;

    use Fatal qw( :void open print close );

    @ARGV or @ARGV = glob '*.html';

    # Set in-place edit to create backups and unset Input Record Separator
    local ( $^I, $/ ) = '.bak';

    my %stats = (
    0 => 'Placeholder',
    1 => 'First draft',
    2 => 'Peer reviewed',
    3 => 'SME reveiwed',
    4 => 'Final draft',
    );

    my $stat = '';

    # In-place edit files in @ARGV
    while ( <> ) {
    if ( /<filestatus[^>]+?status="(\d)"/i ) {
    $stat = exists $stats{ $1 }
    ? $stats{ $1 }
    : 'BAD CODE: look at filestatus tags in source'
    }

    s|(?<=^<font color=#CCCCCC>).*?(?=</font></html>)|$stat|isg;

    print;

    if ( eof ) {
    $stat = '';
    close ARGV;
    }
    }

    __END__




    John
     
    John W. Krahn, Oct 9, 2005
    #7
  8. Impossible to tell without seeing the surrounding context. It *is* legal
    "shorthand" for an empty XML element - it's equivalent to:

    <filestatus owner="jeh" status="1"></filestatus>

    On the other hand, if there's a DTD that doesn't allow the filestatus
    element to be empty, or if there's a closing tag in addition to the
    standalone tag shown above, then the XML is bogus.

    sherm--
     
    Sherm Pendley, Oct 10, 2005
    #8
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.