Re: xsl/xslt pipeline: missing core concept?

Discussion in 'XML' started by Mike Brown, Aug 3, 2003.

  1. Mike Brown


    "James A. Robinson" <> wrote
    > What do you do
    > when you need to take your unicode rich document and transform it to a
    > format which needs something other than numeric character references?


    I'm having trouble parsing that question.

    > For example, what would I do if I wanted to output an XML document to a
    > format where I needed to replace the greek delta character with a very
    > specific string


    As you probably discovered, XPath 1.0 / XSLT 1.0 only has one character
    substitution function, translate(), which can only replace one character
    with one character throughout a string. However, there are some substring
    functions which can be applied in a recursive template to replace a
    character with a string. See the XSLT FAQ at http://www.dpawson.co.uk/.
    Other options: Dimitre's FXSL library has a string map template that is
    probably more efficient -- see
    http://sources.redhat.com/ml/xsl-list/2002-09/msg01172.html. And some
    processors may support EXSLT's str:replace() as documented at
    http://exslt.org/str/functions/replace/index.html (the page says none do
    natively, but I know of at least one that does, and there are some templates
    available for download that simulate native support).

    > (say to something in TeX, or just 'delta' if outputting
    > to a plain text us-ascii encoded file)? I think I understand how core
    > xml parsing (input) handles unicode (be it native encoding or NCR), but
    > I don't understand what's supposed to happen when transforming the XML
    > to an output which does NOT support unicode or SGML style character
    > entities.


    When XML is parsed, the bytes of the encoded document, the NCRs, the entity
    refs, and all other lexical hoo-hah go away and you're left with a
    structured arrangement of Unicode strings representing the essential parts
    of the information in the document (elements, attributes, character data,
    etc.). XSLT does its business on this information, constructing a new tree.
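    One way to see this is to feed several lexically different documents to a parser and compare what comes out. The sketch below uses Python's standard xml.etree.ElementTree as a stand-in for the parser inside an XSLT processor (an assumption; it is not an XSLT implementation). A literal Δ, decimal and hexadecimal NCRs, an internal entity, and UTF-16 encoded bytes should all yield the same one-character string.

```python
import xml.etree.ElementTree as ET

# Lexically different inputs that carry the same information.
sources = [
    "<d>\u0394</d>",                                         # literal character
    "<d>&#916;</d>",                                         # decimal NCR
    "<d>&#x394;</d>",                                        # hexadecimal NCR
    '<!DOCTYPE d [<!ENTITY delta "&#916;">]><d>&delta;</d>',  # internal entity
    "<d>\u0394</d>".encode("utf-16"),                        # UTF-16 bytes with a BOM
]

# After parsing, only the Unicode string in the element's text remains.
texts = [ET.fromstring(src).text for src in sources]
print(texts)
print(len(set(texts)) == 1)
```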

    This new tree is (optionally) output somehow. Processors all support
    serialization in "text", "xml", or "html" formats, each of which spits out
    bytes in some encoding, depending on what you asked for in the xsl:output
    instruction. The text output method just emits the character data from text
    nodes, no others. Unencodable characters (e.g., you wanted to output a
    Chinese character in ASCII) are typically replaced with '?', omitted
    entirely, or an error is raised, depending on the implementation (the spec
    doesn't mandate what should be done). XML and HTML methods emit all nodes
    according to the syntax rules of XML or HTML, so for example an element node
    gets a start tag and end tag, or empty element tag if it has no content. In
    these output methods, unencodable characters occurring in character data or
    attribute values are replaced with an entity reference or NCR, whichever is
    most appropriate.
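    The outcomes described above for unencodable characters can be illustrated with Python's codec error handlers. This is only an analogy chosen for illustration, since each XSLT processor makes its own choice: 'replace' gives '?', 'ignore' omits the character, 'strict' raises an error, and 'xmlcharrefreplace' emits an NCR, which is roughly what the xml output method does.

```python
text = "\u0394 = 0.5, \u4e2d"   # Greek capital delta and a Chinese character

def try_encode(s: str, errors: str) -> str:
    """Encode s as US-ASCII with the given Python codec error handler."""
    try:
        return s.encode("ascii", errors=errors).decode("ascii")
    except UnicodeEncodeError as exc:
        return f"error: {exc.reason}"

for handler in ("replace", "ignore", "strict", "xmlcharrefreplace"):
    print(f"{handler:18} {try_encode(text, handler)}")
```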

    > I am aware of the possibility of using an XSLT function which does a
    > substring search and replace for characters, but surely there is a
    > better option? I'm also aware I can do a final stage processing with a
    > tool of my own design (something which walks through and replaces
    > unicode characters with something else from a mapping). But is there a
    > core XSLT or other XML technology which I am missing that is intended
    > to solve this type of problem? Which lets me say that, if I'm
    > outputting a US-ASCII encoded document that instead of Δ I want
    > to dump out 'Delta' or '$\Delta$' or '<IMG SRC="/images/Delta.gif">'?


    The general idea of the recursive template I mentioned above is demonstrated
    here for substring-to-substring replacement:
    http://skew.org/xml/stylesheets/replace/

    and here for substring-to-element replacement:
    http://skew.org/xml/stylesheets/linefeed2br/

    They should get you going in the right direction.
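    The substring-to-element case differs from plain string replacement because the result is a mix of text nodes and element nodes rather than one longer string. Below is a minimal sketch using Python's xml.etree.ElementTree, not XSLT: str.split() stands in for the recursion, and the function name delta_to_img and the alt attribute are hypothetical, while the image path comes from the question above. ElementTree stores mixed content as an element's text plus each child's tail, so the text following each <img> goes into that element's tail.

```python
import xml.etree.ElementTree as ET

def delta_to_img(text: str, char: str = "\u0394",
                 src: str = "/images/Delta.gif") -> ET.Element:
    """Return a <p> holding text, with each occurrence of char replaced by an <img>."""
    p = ET.Element("p")
    pieces = text.split(char)      # text segments between occurrences of char
    p.text = pieces[0]             # text before the first occurrence
    for piece in pieces[1:]:
        # one <img> per occurrence; the text that follows becomes its tail
        img = ET.SubElement(p, "img", {"src": src, "alt": "Delta"})
        img.tail = piece
    return p

para = delta_to_img("\u0394x and \u0394y")
print(ET.tostring(para, encoding="unicode"))
```

    The same tree shape (text, element, text, element, text) is what a recursive XSLT template builds when it emits an element instead of a string at each match.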
    Mike Brown, Aug 3, 2003