Search/Replace text in XML file

Lax

Hello all,
I'm trying to search and replace the value of a tag in an xml file.
I'm not in a position to use the usual XML parsers, as the version of
Perl I'm required to use doesn't contain any of the XML libraries.
I can use Text::Balanced, but I want to deal with the xml file on a
line-by-line basis, as the value of my tag could stretch over
multiple lines.

Perl Version:
This is perl, v5.8.7 built for sun4-solaris

Sample xml file:
-------------------------

<project xmlns="xml:header">

<version>1.0.0</version>

<SomeTag>
<version>invalid version</version>
</SomeTag>


<SomeAnotherTagNested1>
<SomeAnotherTagNested2>
<SomeAnotherTagNested3>
<version>invalid version</version>
</SomeAnotherTagNested3>
</SomeAnotherTagNested2>
</SomeAnotherTagNested1>

<version>stand-alone, but not valid either</version>

</project>

-------------------------

I only want the version tag when it's not enclosed in any other
tags.
I want to replace the 1.0.0 (an example value) with 2.0.0 on a
stand-alone "version"'s first occurrence.
I came up with the following:

--------------------

#!/usr/local/bin/perl

use strict ;
use File::Copy ;

die "Usage: replace.pl <xml file>!\n" unless ( $#ARGV == 0 ) ;
my $file = shift ;

open(IN,"$file") or die "Cant open file: $!\n" ;
chomp(my @arr = <IN> ) ;
close(IN) ;

open(OUT,"> bak") or die "Cant open file: $!\n" ;

# Two flags,
# $tag_flag -- to check if we're inside a tag
# $version_flag -- to check if we've replaced version tag already.

my $tag_flag = "off" ;
my $version_flag = "off" ;

foreach my $line ( @arr )
{
    # Dont consider the open and close of top-level <project> tag.
    if ( $line =~ /^\s*\<(\/)?project/ )
    {
        print OUT "$line\n" ;
        next ;
    }

    # Found <version>, replace version string if tag_flag is on and version_flag is off.
    elsif ( ($line =~ /^\s*\<version\>/) && ( $tag_flag eq "off" ) &&
            ( $version_flag eq "off" ) )
    {
        # print "Flag: $flag\n" ;
        print OUT "<version>2.0.0</version>\n" ;
        $tag_flag = "on" ;
        $version_flag = "on" ;
    }

    # Inside an open tag "<", tag_flag on.
    elsif ( ( $line =~ /^\s*\<.*\>/ ) && ( $line !~ /^\s*\<\/.*\>/ ) )
    {
        print OUT "$line\n" ;
        $tag_flag = "on" ;
    }

    # Inside a close tag "</", tag_flag on.
    elsif ( $line =~ /^\s*\<\/.*\>/ )
    {
        print OUT "$line\n" ;
        $tag_flag = "off" ;
    } else {
        print OUT "$line\n" ;
    }
}
close(OUT) ;

# Move bak file to original

------------------------------------------

The above script works, and a "diff bak <xml-file>" gives me the
expected result when the stand-alone <version> is all on one line. I
can't get this working when it's extended over multiple lines.

Could anyone give me some pointers, please?

Thanks,
Lax
 
Lax

        # Found <version>, replace version string if tag_flag is on and version_flag is off.
        # Inside an open tag "<", tag_flag on.
        # Inside a close tag "</", tag_flag on.

Please ignore the inaccurate off/on values in the comments; the
code has the proper values for the flags. Sorry.

Thanks,
Lax
 
John W. Krahn

Jim said:
Lax said:
Hello all,
I'm trying to search and replace the value of a tag in an xml file.
I'm not in a position to use the usual XML parsers, as the version of
Perl I'm required to use doesn't contain any of the XML libraries.
I can use Text::Balanced, but I want to deal with the xml file on a
line-by-line basis, as the value of my tag could stretch over
multiple lines.

[data, program snipped]
------------------------------------------

The above script works, and a "diff bak <xml-file>" gives me the
expected result when the stand-alone <version> is all on one line. I
can't get this working when it's extended over multiple lines.

Could anyone give me some pointers, please?

Read the entire file into a single scalar:

my $contents = do { local $/; <IN> };

Then add the /s modifier to your regular expression so that the '.'
special character will match the newlines embedded in your string.

See 'perldoc 'q entire' and 'perldoc perlre'.
ITYM: perldoc -q entire
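
A minimal sketch of the slurp-plus-/s advice above. The helper names slurp and replace_first_version are hypothetical, and the pattern deliberately ignores nesting: it simply rewrites the first <version>...</version> pair it finds, which happens to be enough for the sample file in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: read everything left on a filehandle into one string.
sub slurp {
    my ($fh) = @_;
    local $/;                 # undef record separator: read to end of file
    my $contents = <$fh>;
    return $contents;
}

# Hypothetical helper: replace the content of the first <version> element.
# The /s modifier lets '.' match newlines, and '.*?' stops at the nearest
# closing tag instead of the last one in the file.
sub replace_first_version {
    my ($xml, $new) = @_;
    $xml =~ s{(<version\s*>).*?(</version\s*>)}{$1$new$2}s;
    return $xml;
}
```

Without /g the substitution is applied once only, so later version elements are left untouched.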



John
 
Tad J McClellan

Lax said:
I'm trying to search and replace the value of a tag in an xml file.


No you're not.

You are trying to search and replace the value of an element in an xml file.

See the XML FAQ:

http://xml.silmaril.ie/authors/makeup/

Sample xml file:
-------------------------

<project xmlns="xml:header">

<version>1.0.0</version>

<SomeTag>
<version>invalid version</version>
</SomeTag>


<SomeAnotherTagNested1>
<SomeAnotherTagNested2>
<SomeAnotherTagNested3>
<version>invalid version</version>
</SomeAnotherTagNested3>
</SomeAnotherTagNested2>
</SomeAnotherTagNested1>

<version>stand-alone, but not valid either</version>

</project>


It is not legal in XML for a tag to enclose any other tag.

(tags start with a '<' and end with a '>')


You must have meant "element" where you said "tag".

In that case, there ARE NO version elements that are not enclosed
in any other elements!

I want to replace the 1.0.0 (an example value) with 2.0.0


That element is enclosed in the project element.

on a stand-alone "version"'s first occurrence.


You want to replace the 1.0.0 with 2.0.0 on the first version element
that is a child of the document element (the project element in this case).

(in which case you have a poor example input, as a solution that
operates on the first <version> anywhere in the file will work
for that input...)

The above script works, and a "diff bak <xml-file>" gives me the
expected result when the stand-alone <version> is all on one line. I
can't get this working when it's extended over multiple lines.


Extended over multiple lines in what manner? Like this:

<version
>1.0.0</version>

or like

<version>
1.0.0</version>

or like

<version>
1.0.0
</version>


??

Those are all legal XML, but none of them are equivalent; they each
have different content.
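
To make the difference concrete, here is a small sketch that pulls the raw content out of each form. It assumes the first form breaks the line inside the start tag, before the '>'. The helper name version_content is hypothetical, and the regex is only good enough for these three strings, not for XML in general.

```perl
use strict;
use warnings;

# Hypothetical helper: return the raw content of the first version element,
# or undef if there is none. \s* allows whitespace (including a newline)
# between the element name and the closing '>' of each tag.
sub version_content {
    my ($xml) = @_;
    return $xml =~ m{<version\s*>(.*?)</version\s*>}s ? $1 : undef;
}

my @forms = (
    "<version\n>1.0.0</version>",
    "<version>\n1.0.0</version>",
    "<version>\n1.0.0\n</version>",
);

for my $form (@forms) {
    my $content = version_content($form);
    $content =~ s/\n/\\n/g;      # show each newline as a visible \n
    print "[$content]\n";
}
```

The three lines printed are [1.0.0], [\n1.0.0] and [\n1.0.0\n], so a replacement has to decide whether that surrounding whitespace should be kept or dropped.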

Could anyone give me some pointers, please?


If I could unambiguously figure out what you really want I probably could...
 
Tad J McClellan

Lax said:
#!/usr/local/bin/perl

use strict ;


You should always enable warnings when developing Perl code:

use warnings;

die "Usage: replace.pl <xml file>!\n" unless ( $#ARGV == 0 ) ;


That is more clearly written as:
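
One common way to write that check more clearly, assuming the intent is "exactly one argument", is to compare @ARGV itself with the expected count rather than testing the last index via $#ARGV. The helper name have_one_arg is hypothetical and exists only so the comparison can be tried without touching the real @ARGV.

```perl
use strict;
use warnings;

# Hypothetical helper: true when exactly one argument was passed.
sub have_one_arg {
    my @args = @_;
    return @args == 1;    # an array in numeric context yields its length
}

# In the script itself the same check could read:
#   die "Usage: replace.pl <xml file>!\n" unless @ARGV == 1;
```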

my $file = shift ;

open(IN,"$file") or die "Cant open file: $!\n" ;


perldoc -q vars

What's wrong with always quoting "$vars"?

open(IN, $file) or die "Cant open file: $!\n" ;

(and nowadays you should use the 3-argument form of open() instead.)
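
A short sketch of the 3-argument form with a lexical filehandle. The helper name first_line is hypothetical; the point is only that the mode is passed separately from the file name.

```perl
use strict;
use warnings;

# Hypothetical helper: open $path for reading and return its first line.
sub first_line {
    my ($path) = @_;
    # Three-argument open: the mode is its own argument, so a file name
    # containing '<', '>' or '|' is never mistaken for a mode.
    open(my $in, '<', $path) or die "Can't open '$path': $!\n";
    my $line = <$in>;
    close($in);
    return $line;
}
```

The lexical $in is also closed automatically when it goes out of scope, unlike the global bareword IN.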

chomp(my @arr = <IN> ) ;


Here you remove the newline from every line, and below you add a
newline to every line.

Why remove them only to put them back?

foreach my $line ( @arr )


If you are going to process the file line-by-line anyway, then why
bother reading the entire file into memory when one line at a time
in memory will work?
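
Both points can be seen in a sketch like the following, where copy_lines is a hypothetical helper: it holds one line in memory at a time, and because the newline is never removed there is nothing to put back.

```perl
use strict;
use warnings;

# Hypothetical helper: copy every line from $in to $out unchanged.
sub copy_lines {
    my ($in, $out) = @_;
    # Perl implicitly tests definedness here, so a final line of "0"
    # without a newline does not end the loop early.
    while ( my $line = <$in> ) {
        print {$out} $line;      # $line still carries its own newline
    }
    return;
}
```

Any per-line matching or substitution would go inside the loop, just before the print.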

if ( $line =~ /^\s*\<(\/)?project/ )


The parentheses in that pattern serve no purpose, so why include them?

Angle brackets are not special in regular expressions, so they
do not need backslashing.

If you choose some other delimiter for your match operator, then
the slash will not need backslashing either:

if ( $line =~ m#^\s*</?project# )
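
As a quick check that the simplification changes nothing, the sketch below wraps both patterns in hypothetical helpers (matches_original and matches_simplified) so they can be compared on the same lines.

```perl
use strict;
use warnings;

# The pattern from the original script, backslashes and parentheses intact.
sub matches_original {
    my ($line) = @_;
    return $line =~ /^\s*\<(\/)?project/ ? 1 : 0;
}

# The simplified pattern: no capture group, no escaped angle brackets,
# and '#' as delimiter so the slash needs no backslash either.
sub matches_simplified {
    my ($line) = @_;
    return $line =~ m#^\s*</?project# ? 1 : 0;
}
```

Both accept an opening or closing project tag at the start of a line (after optional whitespace) and reject everything else.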

I can't get this working when it's extended over multiple lines.


Then don't process the file line-by-line.

Could anyone give me some pointers, please?


perldoc -q match

I'm having trouble matching over more than one line. What's wrong?
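
Putting the advice in this thread together, here is a hedged sketch that works on the whole file as one string and tracks element depth, so that only the first version element that is a direct child of the document element is changed. The name replace_top_level_version is hypothetical, and the tag regex is an assumption-laden shortcut: it expects no comments, CDATA sections or '>' characters inside attribute values, and it is not a substitute for a real XML parser.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the content of the first <version> element
# whose parent is the document element. Returns the (possibly modified) text.
sub replace_top_level_version {
    my ($xml, $new) = @_;
    my $depth = 0;    # number of elements currently open

    # Walk over every start, end and empty-element tag in document order.
    while ( $xml =~ m{<(/?)([A-Za-z_][\w:.-]*)[^>]*?(/?)>}g ) {
        my ($closing, $name, $empty) = ($1, $2, $3);

        if ($closing) { $depth--; next; }   # end tag: leave one level
        next if $empty;                     # <tag/> opens and closes at once

        if ( $depth == 1 && $name eq 'version' ) {
            my $start = pos($xml);          # just past the start tag
            my $end   = index($xml, '</version', $start);
            die "No closing tag for <version>\n" if $end < 0;
            # Swap the old content, newlines included, for the new value.
            substr($xml, $start, $end - $start, $new);
            return $xml;
        }
        $depth++;                           # start tag: enter one level
    }
    return $xml;    # no matching element found; text is unchanged
}
```

Because the search is driven by depth rather than by line boundaries, it makes no difference whether the element sits on one line or is spread over several.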
 
