Encode XML

  • Thread starter Warrick FitzGerald

Warrick FitzGerald

Hi All,

When sending an XML \ SOAP request, if the value looks as follows:

<TEST> My Test & Son </TEST>

How should the & be encoded? What is this "TYPE" of encoding called, and
where can I find a spec on which values should and should not be encoded?
I know the W3C will have something, but I can't for the life of me
decode it from their docs.

Thanks
Warrick
 

Bob Foster

Warrick> When sending an XML \ SOAP request, if the value looks as follows:
Warrick>
Warrick> <TEST> My Test & Son </TEST>
Warrick>
Warrick> How should the & be encoded.

&amp;

Warrick> What is this "TYPE" of encoding called

Predefined entities. You could also use numeric character references like
&#38; but that requires looking up the Unicode code point.

Warrick> and where can I find a spec on which values should and should not
Warrick> be encoded

XML 1.0 spec.

Warrick> ... I know the W3C will have something but I can't for the life of
Warrick> me decode it from their docs.

To boil it down, two characters need to be escaped if they appear in
attribute values or text outside of CDATA sections:

& as &amp; or equivalent
< as &lt; or equivalent

Two other characters are sometimes handy to escape:

" as &quot; or equivalent
' as &apos; or equivalent

The other predefined entity is &gt; but AFAIK you never _need_ to use it.

Bob Foster
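
A minimal sketch of the rules above, assuming Python and its standard-library xml.sax.saxutils module (the thread itself does not name a language). escape() covers text content and quoteattr() covers attribute values; the "name" attribute is made up for illustration.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

raw = " My Test & Son "

# escape() replaces &, < and > in text content.
text = escape(raw)

# quoteattr() escapes the value and wraps it in whichever quotes fit.
attr = quoteattr('Test & "Son"')

# Hypothetical attribute name, for illustration only.
doc = "<TEST name=%s>%s</TEST>" % (attr, text)
print(doc)

# Round trip: a parser turns the entities back into the raw characters.
elem = ET.fromstring(doc)
print(repr(elem.text))
```

Using a library routine rather than hand-written replacements avoids the classic ordering mistake of escaping & after the other characters.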
 

Arto V. Viitanen

Warrick> Hi All, When sending an XML \ SOAP request, if the value looks as
Warrick> follows:

Warrick> <TEST> My Test & Son </TEST>

Warrick> How should the & be encoded. What is this "TYPE" of encoding
Warrick> called, and where can I find a spec on which values should and
Warrick> should not be encoded ... I know the W3C will have something but I
Warrick> can't for the life of me decode it from their docs.

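I am not sure if it is enough, but ZSI (a Python SOAP package) uses the
following encodings for text:

& as &amp;
< as &lt;
> as &gt;
\015 (carriage return) as a numeric character reference

and the following for attributes:

& as &amp;
< as &lt;
" as &quot;
\011 (tab) as a numeric character reference
\012 (line feed) as a numeric character reference
\015 (carriage return) as a numeric character reference

(from file compat.py)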
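
The two lists can be sketched as a pair of Python functions. This is not ZSI's actual code: the function names are hypothetical, and the hexadecimal references (&#x9;, &#xA;, &#xD;) are an assumption borrowed from the Canonical XML convention, since the post only gives the octal character codes.

```python
def escape_text(s):
    """Escape a string for use as element text (hypothetical helper)."""
    s = s.replace("&", "&amp;")      # must come first, or later entities get re-escaped
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    s = s.replace("\015", "&#xD;")   # carriage return; hex form is an assumption
    return s


def escape_attr(s):
    """Escape a string for use inside a double-quoted attribute value."""
    s = s.replace("&", "&amp;")
    s = s.replace("<", "&lt;")
    s = s.replace('"', "&quot;")
    s = s.replace("\011", "&#x9;")   # tab
    s = s.replace("\012", "&#xA;")   # line feed
    s = s.replace("\015", "&#xD;")   # carriage return
    return s


print(escape_text("My Test & Son"))
print(escape_attr('a "b"\tc'))
```

Whitespace characters are escaped in attributes because a parser would otherwise normalize a literal tab or newline in an attribute value to a space.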
 

arnold m. slotnik

Bob Foster said:
The other predefined entity is &gt; but AFAIK you never _need_
to use it.

Not quite "never". If you have the string ]]> in content, but not as
the end of a CDATA section, you need to escape the >.
 

Richard Tobin

arnold m. slotnik said:
Not quite "never". If you have the string ]]> in content, but not as
the end of a CDATA section, you need to escape the >.

Or one of the ]s.

-- Richard
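
The ]]> corner case can be checked with a small sketch, assuming Python's standard xml.etree.ElementTree parser; the parses() helper is hypothetical. The literal sequence in content is rejected as not well-formed, while escaping either the > or one of the ]s makes the document parse.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def parses(doc):
    """Return True if doc is accepted by the parser (hypothetical helper)."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


raw = "a ]]> b"

bad = "<TEST>" + raw + "</TEST>"           # literal ]]> in content
good = "<TEST>" + escape(raw) + "</TEST>"  # > written as &gt;
also_good = "<TEST>a ]&#93;> b</TEST>"     # one ] written as a character reference

for doc in (bad, good, also_good):
    print(doc, parses(doc))
```

Because escape() always rewrites > as &gt;, text passed through it never contains a literal ]]> sequence.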
 
