XML compression

Discussion in 'XML' started by Ed Beroset, Dec 11, 2004.

  1. Ed Beroset

    Ed Beroset Guest

    I have an XML file that I want to squeeze down as small as possible for
    storage in an embedded device. I want it to still be a valid XML file
    (and not something like a binary ASN.1 encoding of an XML file) but it
    does not need to carry the long tags it currently has as long as I
    create an XSLT which will put it back into the right form. What I had
    in mind was something like this:

    <original-xml-fragment>
      <very-long-and-verbose-tag name="Long tag 1">
        <more-information-is-stored-here name="stuff 1"/>
      </very-long-and-verbose-tag>
      <very-long-and-verbose-tag name="Long tag 2">
        <more-information-is-stored-here name="stuff 2"/>
        <valuable-additional-information name="foo"/>
      </very-long-and-verbose-tag>
    </original-xml-fragment>

    I'm thinking of transforming it to this:

    <o><v n="Long tag 1"><m n="stuff 1"/></v><v n="Long tag 2"><m n="stuff 2"/><v2 n="foo"/></v></o>

    My question is, has someone already generated an XSLT that would
    abbreviate tags in this kind of way AND generate the corresponding
    "decoder" XSLT which would reconstitute the original? I have ideas
    about how to do it using a procedural language, but I would like to do
    it entirely with XSL transforms if I can.

    The only part that I don't really know how to do is to automatically
    generate short, unique abbreviations for each of the tags. I *could*
    specify them all manually once, but I'd prefer an automatic solution to
    simplify maintenance.

    Ed
    Ed Beroset, Dec 11, 2004
    #1

  2. Joris Gillis

    Joris Gillis Guest

    > I have an XML file that I want to squeeze down as small as possible for
    > storage in an embedded device. I want it to still be a valid XML file
    > (and not something like a binary ASN.1 encoding of an XML file) but it
    > does not need to carry the long tags it currently has as long as I
    > create an XSLT which will put it back into the right form. What I had
    > in mind was something like this:
    >
    > <original-xml-fragment>
    > <very-long-and-verbose-tag name="Long tag 1">
    > <more-information-is-stored-here name="stuff 1"/>
    > </very-long-and-verbose-tag>
    > <very-long-and-verbose-tag name="Long tag 2">
    > <more-information-is-stored-here name="stuff 2"/>
    > <valuable-additional-information name="foo"/>
    > </very-long-and-verbose-tag>
    > </original-xml-fragment>
    >
    > I'm thinking of transforming it to this:
    >
    > <o><v n="Long tag 1"><m n="stuff 1"/></v><v n="Long tag 2"><m n="stuff 2"/><v2 n="foo"/></v></o>
    >
    > My question is, has someone already generated an XSLT that would
    > abbreviate tags in this kind of way AND generate the corresponding
    > "decoder" XSLT which would reconstitute the original? I have ideas
    > about how to do it using a procedural language, but I would like to do
    > it entirely with XSL transforms if I can.
    >
    > The only part that I don't really know how to do is to automatically
    > generate short, unique abbreviations for each of the tags. I *could*
    > specify them all manually once, but I'd prefer an automatic solution to
    > simplify maintenance.


    Hi,

    I've created this little stylesheet that maps all unique node names and gives them abbreviations. It might be handy as an intermediate step towards a solution for your (by the way, very interesting) question.

    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <xsl:output method="xml" indent="yes"/>

      <!-- Index every element and attribute by its local name. -->
      <xsl:key name="name" match="*|@*" use="local-name()"/>

      <xsl:template match="/">
        <name-mapping>
          <!-- Keep only the first node carrying each name (Muenchian grouping). -->
          <xsl:for-each select="//*[generate-id()=generate-id(key('name',local-name()))]|//@*[generate-id()=generate-id(key('name',local-name()))]">
            <name>
              <!-- Number the names a, b, c, ... in document order. -->
              <xsl:attribute name="s"><xsl:number value="position()" format="a"/></xsl:attribute>
              <xsl:value-of select="local-name()"/>
            </name>
          </xsl:for-each>
        </name-mapping>
      </xsl:template>

    </xsl:stylesheet>



    This will generate the following output:

    <name-mapping>
      <name s="a">original-xml-fragment</name>
      <name s="b">very-long-and-verbose-tag</name>
      <name s="c">name</name>
      <name s="d">more-information-is-stored-here</name>
      <name s="e">valuable-additional-information</name>
    </name-mapping>

    regards,
    --
    Joris Gillis (http://www.ticalc.org/cgi-bin/acct-view.cgi?userid=38041)
    Ceterum censeo XML omnibus esse utendum
    Joris Gillis, Dec 11, 2004
    #2

  3. Joris Gillis

    Joris Gillis Guest

    >> My question is, has someone already generated an XSLT that would
    >> abbreviate tags in this kind of way AND generate the corresponding
    >> "decoder" XSLT which would reconstitute the original? I have ideas
    >> about how to do it using a procedural language, but I would like to do
    >> it entirely with XSL transforms if I can.
    >>
    >> The only part that I don't really know how to do is to automatically
    >> generate short, unique abbreviations for each of the tags. I *could*
    >> specify them all manually once, but I'd prefer an automatic solution to
    >> simplify maintenance.

    >
    > this will generate the following output:
    >
    > <name-mapping>
    > <name s="a">original-xml-fragment</name>
    > <name s="b">very-long-and-verbose-tag</name>
    > <name s="c">name</name>
    > <name s="d">more-information-is-stored-here</name>
    > <name s="e">valuable-additional-information</name>
    > </name-mapping>
    >

    Hi, again

    Given that two transformation steps are allowed, you can do this:
    Unleash the above stylesheet on the verbose XML and let it write its output to a file named 'name-map.xml'.

    When you apply the following stylesheet, the verbose XML will be reduced.

    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <xsl:output method="xml" indent="yes"/>

      <!-- Rename each element to the abbreviation listed in name-map.xml. -->
      <xsl:template match="*">
        <xsl:variable name="name" select="local-name()"/>
        <xsl:element name="{document('name-map.xml')//name[.=$name]/@s}">
          <xsl:apply-templates select="@*"/>
          <xsl:apply-templates/>
        </xsl:element>
      </xsl:template>

      <!-- Rename each attribute the same way, keeping its value. -->
      <xsl:template match="@*">
        <xsl:variable name="name" select="local-name()"/>
        <xsl:attribute name="{document('name-map.xml')//name[.=$name]/@s}">
          <xsl:value-of select="."/>
        </xsl:attribute>
      </xsl:template>

    </xsl:stylesheet>


    The reduced form will look like this:
    <a>
      <b c="Long tag 1">
        <d c="stuff 1"/>
      </b>
      <b c="Long tag 2">
        <d c="stuff 2"/>
        <e c="foo"/>
      </b>
    </a>
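
    A possible variation on the reducing stylesheet, offered as a sketch rather than as part of the original answer: since the goal is the smallest possible file, it may help to turn off indentation, omit the XML declaration, and strip whitespace-only text nodes. The sketch also reads name-map.xml once into a variable ($map, a name introduced here) instead of calling document() in every template. It assumes the document has no mixed content where whitespace matters, and like the stylesheet above it ignores namespaces.

    ```xml
    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <!-- No indentation and no XML declaration, to keep the output small. -->
      <xsl:output method="xml" indent="no" omit-xml-declaration="yes"/>

      <!-- Drop whitespace-only text nodes from the verbose input.
           Assumption: whitespace between elements carries no meaning. -->
      <xsl:strip-space elements="*"/>

      <!-- Load the mapping once. Assumes name-map.xml sits next to this stylesheet. -->
      <xsl:variable name="map" select="document('name-map.xml')/name-mapping/name"/>

      <xsl:template match="*">
        <xsl:variable name="name" select="local-name()"/>
        <xsl:element name="{$map[. = $name]/@s}">
          <!-- Attributes first, then child nodes. -->
          <xsl:apply-templates select="@*|node()"/>
        </xsl:element>
      </xsl:template>

      <xsl:template match="@*">
        <xsl:variable name="name" select="local-name()"/>
        <xsl:attribute name="{$map[. = $name]/@s}">
          <xsl:value-of select="."/>
        </xsl:attribute>
      </xsl:template>

    </xsl:stylesheet>
    ```

    With the sample above, the result should have the same elements and attributes as the indented reduced form, just without the line breaks and indentation. The trade-off is that the expanded file will not get the original whitespace back, only whatever indentation the expanding stylesheet adds.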


    And this stylesheet will expand it again to the original verbose form:

    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <xsl:output method="xml" indent="yes"/>

      <!-- Look up each abbreviated element name and restore the long name. -->
      <xsl:template match="*">
        <xsl:variable name="name" select="local-name()"/>
        <xsl:element name="{document('name-map.xml')//name[@s=$name]}">
          <xsl:apply-templates select="@*"/>
          <xsl:apply-templates/>
        </xsl:element>
      </xsl:template>

      <!-- Restore the long attribute names the same way. -->
      <xsl:template match="@*">
        <xsl:variable name="name" select="local-name()"/>
        <xsl:attribute name="{document('name-map.xml')//name[@s=$name]}">
          <xsl:value-of select="."/>
        </xsl:attribute>
      </xsl:template>

    </xsl:stylesheet>
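
    For the part of the question about generating the decoder itself, one approach is a stylesheet that reads name-map.xml and writes out a decoder stylesheet with the names built in, so the expanding step no longer needs document() or the map file. The sketch below uses xsl:namespace-alias, a standard XSLT 1.0 feature for producing stylesheets as output. The "out" prefix and its placeholder namespace URI are arbitrary choices made for this example, not anything from the thread.

    ```xml
    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:out="urn:x-example:generated-xslt">

      <!-- Literal out:* elements below are written to the result as XSLT elements. -->
      <xsl:namespace-alias stylesheet-prefix="out" result-prefix="xsl"/>

      <xsl:output method="xml" indent="yes"/>

      <!-- Input is name-map.xml; output is a standalone decoder stylesheet. -->
      <xsl:template match="/name-mapping">
        <out:stylesheet version="1.0">
          <out:output method="xml" indent="yes"/>
          <xsl:apply-templates select="name"/>
        </out:stylesheet>
      </xsl:template>

      <!-- Each mapping entry yields one element rule and one attribute rule. -->
      <xsl:template match="name">
        <out:template match="{@s}">
          <out:element name="{.}">
            <out:apply-templates select="@*|node()"/>
          </out:element>
        </out:template>
        <out:template match="@{@s}">
          <out:attribute name="{.}">
            <out:value-of select="."/>
          </out:attribute>
        </out:template>
      </xsl:template>

    </xsl:stylesheet>
    ```

    Run against name-map.xml, this should produce a stylesheet containing rules such as a template matching "b" that emits very-long-and-verbose-tag; applying that generated stylesheet to the reduced XML should give back the verbose form. Some processors may keep the "out" prefix in the generated file. It is then bound to the XSLT namespace, so the generated stylesheet behaves the same.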


    I hope this is useful.

    regards,

    --
    Joris Gillis (http://www.ticalc.org/cgi-bin/acct-view.cgi?userid=38041)
    Ceterum censeo XML omnibus esse utendum
    Joris Gillis, Dec 11, 2004
    #3
  4. Ed Beroset

    Ed Beroset Guest

    Joris Gillis wrote:
    [big snip of useful, working XSLT]
    >
    > I hope this is useful.


    It's more than useful -- it's superb! Thanks very much. When I figure
    out how to combine them into a single step, I'll post the result.

    Ed
    Ed Beroset, Dec 11, 2004
    #4
