mathias wündisch
dear group,
i have a small problem with the automatic conversion of unicode
character references into real characters by XML::DOM::Parser (or
XML::Parser). for example, i have the string '&#xA0;' in an xml source
file, and after parsing with XML::DOM::Parser i want it to appear
unchanged in the target xml file.
begin source file:
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<doc>
<name>Mathias&#xA0;Wuendisch</name>
</doc>
end source file:
begin perl script:
#!c:\perl\bin\perl.exe -w
use strict;
use XML::DOM;

process_file( shift @ARGV );

sub process_file {
    my $infile = shift;

    # options i hoped would keep entities unexpanded
    my $dom_parser = XML::DOM::Parser->new(
        NoExpand         => 1,
        ProtocolEncoding => 'iso-8859-1',
        ParseParamEnt    => 0,
        ExpandParamEnt   => 0,
    );

    # same options passed again for the parse call itself
    my $doc = $dom_parser->parsefile(
        $infile,
        NoExpand       => 1,
        ParseParamEnt  => 0,
        ExpandParamEnt => 0,
    );

    print $doc->toString;
    $doc->dispose;    # free the dom tree
}

exit;
end perl script:
after running: perl xml-dom-test.pl test.xml > test1.xml
i get this, with the reference already converted to the real character:
begin target file:
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<doc>
<name>Mathias Wuendisch</name>
</doc>
end target file:
i've read the sourceforge faq and i've found a solution for "named
entities" like this:
---
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<!DOCTYPE doc [
<!ENTITY nbsp " " >
]>
<doc>
<name>Mathias&nbsp;Wuendisch</name>
</doc>
---
ok, then the "named entity" also appears in the target file... but
what about "unnamed entities" (numeric character references) like
&#xA0;? why do the NoExpand and ExpandParamEnt flags not work for me?
any suggestions?
kind regards,
mathias wündisch