Finding and replacing Invalid Tokens in an XML document

Ben Holness

Hi all,

I have a system which allows users to enter a message on a (PHP) website.
This message is then put into a (MySQL) Database.

A Perl script then picks up the message and creates an XML document.

The webpages, database and XML are all UTF-8. However, every now and then I
get an error from the XML parser telling me I have an invalid token. This
occurs when the message contains particular characters, although I don't
know which characters. All I can see in the logs is the ANSI
representation (e.g. @^C). If I copy and paste that into Word, I get a square
box after the @ that takes two right cursor presses to go past.

My script catches that there is an invalid token, but rather than fail the
message completely, I would like to replace the bad characters with a
space. Is there a simple way to find these characters, or do I have to
write a function that looks at the output of $@ from the eval and works out
where the character is from the line/column/byte information in order to
fix it?
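
To make the "replace with a space" idea concrete, here is a minimal sketch. It assumes the offending characters are ones that XML 1.0 does not permit at all (for example the control character that logs display as ^C), and that the text is still a decoded character string, i.e. the substitution runs before encode. The helper name replace_invalid_xml_chars is hypothetical and not part of XML::Simple.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every character outside the XML 1.0
# "Char" production (most C0 control characters, surrogates, U+FFFE,
# U+FFFF) with a single space.
# Assumes $text is a decoded character string, not UTF-8 bytes.
sub replace_invalid_xml_chars {
    my ($text) = @_;
    $text =~ s/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/ /g;
    return $text;
}

# "\x03" is Ctrl-C, which logs and terminals often show as ^C.
my $clean = replace_invalid_xml_chars("user\@\x03example");
print "$clean\n";
```

If this assumption holds, cleaning the text before it is encoded would avoid having to work backwards from the parser error at all. Tab, line feed and carriage return are left alone because XML 1.0 allows them.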

FYI, the XML is created and parsed with XML::Simple and UTF-8 encoded with
encode. I have included a simplified snippet (written into this post, so
may contain typos) at the end of the email.

Cheers,

Ben

-- Snippet of Code --

use Encode qw(encode);
use XML::Simple;

# $MessageText is pulled from the database and may contain bad characters.

# Build a hash of the elements
my %arr;
$arr{'Message'} = encode("UTF-8", $MessageText);

# Convert the hash into an XML document with XMLout
my $tempxml = XML::Simple->new(NoAttr => 1, RootName => 'WebMessage');
my $xmldoc = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
$xmldoc .= $tempxml->XMLout(\%arr);

# Parse the XML document
my $tempxml2 = XML::Simple->new(ForceArray => 1);
eval { $tempxml2->XMLin($xmldoc); };
if ($@)
{
    # An error occurred. Usually an invalid token due to a bad character
    # in $MessageText
}
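
For the alternative of working from $@, the first step would be pulling the line/column/byte numbers out of the error text. The sketch below assumes the message contains a fragment of the form "at line N, column N, byte N"; the sample string is illustrative only, and the helper name parse_error_position is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: extract line, column and byte offset from a
# parser error message. Returns an empty list when no position is found.
sub parse_error_position {
    my ($error) = @_;
    if ($error =~ /at line (\d+), column (\d+), byte (\d+)/) {
        return (line => $1, column => $2, byte => $3);
    }
    return;
}

# The message text here is an assumption made up for illustration.
my %pos = parse_error_position(
    "not well-formed (invalid token) at line 1, column 62, byte 62"
);
print "byte offset: $pos{byte}\n" if %pos;
```

The byte value is an offset into the encoded document rather than a character index, so any fix-up based on it would need to account for multi-byte UTF-8 sequences.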
 
