XML to SGML entities

  • Thread starter Jean-François Michaud

Jean-François Michaud

Hello,

I was wondering if anybody could point me in the right direction
regarding this.

I have Unicode characters in an XML file, written as hexadecimal
character references, and I need to convert them to ISO entities. Are
there facilities available to do this easily, or do I have to parse all
the text and convert everything manually? If that's what I have to do,
is there any code already available that would point me in the right
direction?

This is my XML snippet.

XML:

<?xml version = "1.0" encoding = "UTF-8"?>
<root>
<para>Å Å å Ã β ε ϰ
λ μ</para>
</root>

I basically need to get something like this:

SGML:

<root>
<para>&angst; &Aring; &aring; &Atilde; &b.beta; &b.epsi; &b.kappav;
&b.lambda; &b.mu;</para>
</root>

Thanks

Regards
Jeff
 

David Carlisle


One way is to use XSLT 2 character maps. If I save your file as ent.xml
and run saxon8 with the stylesheet at the end, it gives the following
output. It's not quite the result you asked for, but I think the bold
Greek names should map to the characters in plane 1, so the grk3 entity
names are used rather than grk4. (It would be easy for you to take a
local copy and change that, though.)

David

$ saxon8 ent.xml ent.xsl
<?xml version="1.0" encoding="UTF-8"?><root>
<para>&angst; &Aring; &aring; &Atilde; &beta; &epsiv; &kappav;
&lambda; &mu;</para>
</root>



<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:import
href="http://www.w3.org/2003/entities/iso9573-2003/iso9573-2003map.xsl"/>
<xsl:output use-character-maps="iso9573-2003"/>
<xsl:template match="/">
<xsl:copy-of select="/"/>
</xsl:template>

</xsl:stylesheet>
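
If the grk4 names are really what is wanted, the "local copy" change could probably also be done with a small extra character map instead of editing the W3C file. This is only a sketch under assumptions: the map name local-grk4 is made up for illustration, the code points are assumed to be the plain Greek letters (U+03B2, U+03B5, U+03F0, U+03BB, U+03BC), and it relies on my reading of the XSLT 2.0 rules that when the same character is mapped more than once, the map listed last in use-character-maps wins. I have not run it against Saxon.

```xml
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:import
href="http://www.w3.org/2003/entities/iso9573-2003/iso9573-2003map.xsl"/>

<!-- "local-grk4" is a hypothetical name; it is listed last so that its
     entries are meant to take precedence over the imported map. -->
<xsl:output use-character-maps="iso9573-2003 local-grk4"/>

<!-- Assumption: the input holds plain Greek letters at these code points. -->
<xsl:character-map name="local-grk4">
<xsl:output-character character="&#x3B2;" string="&amp;b.beta;"/>
<xsl:output-character character="&#x3B5;" string="&amp;b.epsi;"/>
<xsl:output-character character="&#x3F0;" string="&amp;b.kappav;"/>
<xsl:output-character character="&#x3BB;" string="&amp;b.lambda;"/>
<xsl:output-character character="&#x3BC;" string="&amp;b.mu;"/>
</xsl:character-map>

<!-- Identity copy, as in the stylesheet above; the character maps are
     applied only when the result is serialized. -->
<xsl:template match="/">
<xsl:copy-of select="/"/>
</xsl:template>

</xsl:stylesheet>
```

The string values are written with &amp;amp; because a character map emits its replacement string without escaping, so the serialized output should contain the literal entity reference (for example &amp;b.beta;).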
 

Jean-François Michaud

Wow! More than I could ever ask for. This is exactly the kind of thing
I was looking for. Thank you very much for your help! I will look into
this more closely.

Warm regards
Jean-Francois Michaud
 
