Marking words in a text

Discussion in 'XML' started by Hvid Hat, Apr 11, 2008.

  1. Hvid Hat

    Hvid Hat Guest

    Hello

    How should I go about marking certain words in a text? I've got a list of
    words:

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="Mark.xsl"?>
    <Words>
    <Word>
    <Acronym>XML</Acronym>
    <Description>eXtensible Markup Language</Description>
    </Word>
    <Word>
    <Acronym>SGML</Acronym>
    <Description>Standard Generalized Markup Language</Description>
    </Word>
    <Word>
    <Acronym>ISO</Acronym>
    <Description>International Organization for Standardization</Description>
    </Word>
    </Words>

    I want the words (acronyms) above to be marked within bold-tags in the text
    below:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    <xsl:template match="Words">
    XML is a simple, very flexible text format derived from SGML (ISO 8879)
    </xsl:template>
    </xsl:stylesheet>

    Can someone help me on my way? :)
     
    Hvid Hat, Apr 11, 2008
    #1

  2. Hvid Hat wrote:

    > I want the words (acronyms) above to be marked within bold-tags in the text
    > below:
    >
    > <?xml version="1.0" encoding="UTF-8"?>
    > <xsl:stylesheet version="1.0"
    > xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    > <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    > <xsl:template match="Words">
    > XML is a simple, very flexible text format derived from SGML (ISO 8879)
    > </xsl:template>
    > </xsl:stylesheet>


    That "text" is an XSLT stylesheet with output method="xml", so it is not
    clear what you want to achieve. Do you want to take your acronym list
    and transform it to HTML to be rendered in a browser?

    That is possible with

    <xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

    <xsl:output method="html" indent="yes"/>

    <xsl:template match="Words">
    <html lang="en">
    <head>
    <title>List of Acronyms</title>
    <style type="text/css">
    dt { font-weight: bold; }
    </style>
    </head>
    <body>
    <dl>
    <xsl:apply-templates select="Word"/>
    </dl>
    </body>
    </html>
    </xsl:template>

    <xsl:template match="Word">
    <dt>
    <xsl:value-of select="Acronym"/>
    </dt>
    <dd>
    <xsl:value-of select="Description"/>
    </dd>
    </xsl:template>

    </xsl:stylesheet>



    --

    Martin Honnen
    http://JavaScript.FAQTs.com/
     
    Martin Honnen, Apr 11, 2008
    #2

  3. Peter Flynn

    Peter Flynn Guest

    Hvid Hat wrote:
    > Hello
    >
    > How should I go about marking certain words in a text? I've got a list of
    > words:


    If you mean you want to automate the application of markup to a
    document, by matching each word against your list of acronyms, then it's
    probably possible in XSLT (easier in XSLT2 than 1.0) but difficult when
    you need to handle things like "in XML's model" where the "word" is not
    delimited by spaces or markup boundaries. You'd have to use a recursive
    template to isolate each word in turn and test it against your list,
    which would be slow.
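
    A minimal sketch of the XSLT 2.0 route, not a tested solution. It assumes
    an XSLT 2.0 processor (Saxon, for example), takes the Words document above
    as input, and reads the sentence from a stylesheet parameter called text,
    a name chosen here purely for illustration. xsl:analyze-string treats
    every run of ASCII letters as one word, so "XML's" splits into "XML" and
    "s" and the acronym is still found.

    ```xml
    <xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <xsl:output method="xml" indent="no"/>

      <!-- Assumption: the text to mark up arrives as a parameter. -->
      <xsl:param name="text"
          select="'XML is a simple, very flexible text format derived from SGML (ISO 8879)'"/>

      <xsl:template match="Words">
        <!-- All acronyms from the word list; bound here because the context
             item inside xsl:analyze-string is a string, not a node. -->
        <xsl:variable name="acronyms" select="Word/Acronym"/>
        <p>
          <xsl:analyze-string select="$text" regex="[A-Za-z]+">
            <!-- A run of letters: wrap it in <b> if it is in the list. -->
            <xsl:matching-substring>
              <xsl:choose>
                <xsl:when test=". = $acronyms">
                  <b><xsl:value-of select="."/></b>
                </xsl:when>
                <xsl:otherwise>
                  <xsl:value-of select="."/>
                </xsl:otherwise>
              </xsl:choose>
            </xsl:matching-substring>
            <!-- Spaces, digits and punctuation are copied through unchanged. -->
            <xsl:non-matching-substring>
              <xsl:value-of select="."/>
            </xsl:non-matching-substring>
          </xsl:analyze-string>
        </p>
      </xsl:template>

    </xsl:stylesheet>
    ```

    The comparison is case-sensitive and only exact whole words match, so
    "xml" or "XMLs" would not be marked. The regex is kept to [A-Za-z]+
    because the regex attribute is an attribute value template, where curly
    braces (as in \p{L}) would have to be doubled.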

    ///Peter


    > <?xml version="1.0" encoding="UTF-8"?>
    > <?xml-stylesheet type="text/xsl" href="Mark.xsl"?>
    > <Words>
    > <Word>
    > <Acronym>XML</Acronym>
    > <Description>eXtensible Markup Language</Description>
    > </Word>
    > <Word>
    > <Acronym>SGML</Acronym>
    > <Description>Standard Generalized Markup Language</Description>
    > </Word>
    > <Word>
    > <Acronym>ISO</Acronym>
    > <Description>International Organization for Standardization</Description>
    > </Word>
    > </Words>
    >
    > I want the words (acronyms) above to be marked within bold-tags in the text
    > below:
    >
    > <?xml version="1.0" encoding="UTF-8"?>
    > <xsl:stylesheet version="1.0"
    > xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    > <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    > <xsl:template match="Words">
    > XML is a simple, very flexible text format derived from SGML (ISO 8879)
    > </xsl:template>
    > </xsl:stylesheet>
    >
    > Can someone help me on my way? :)
     
    Peter Flynn, Apr 11, 2008
    #3
  4. Peter Flynn wrote:
    > You'd have to use a recursive
    > template to isolate each word in turn and test it against your list,
    > which would be slow.


    Or have the stylesheet invoke an extension function written in a
    language better suited to this task.

    Personally, I think you should make this the author's responsibility.
    Maybe use the (slow) find-words-and-tag-them as an authoring tool to
    help them do so... but encourage them to use appropriate markup in the
    first place rather than trying to reverse-engineer their text.
     
    Joseph J. Kesselman, Apr 11, 2008
    #4
  5. Hvid Hat

    Hvid Hat Guest

    On 11-04-2008 21:12:48, Peter Flynn wrote:

    > If you mean you want to automate the application of markup to a
    > document, by matching each word against your list of acronyms, then it's
    > probably possible in XSLT (easier in XSLT2 than 1.0) but difficult when
    > you need to handle things like "in XML's model" where the "word" is not
    > delimited by spaces or markup boundaries. You'd have to use a recursive
    > template to isolate each word in turn and test it against your list,
    > which would be slow.


    I'm just playing around with XSLT to improve my skills, so performance is
    not important. I'll give it a try, but if anyone can help me on my way, I'd
    appreciate it :)

    What if I wanted to mark up related words in some text? Say I wanted to mark
    up countries whose names consist of several words, e.g. Faroe Islands, South
    Africa, New Zealand etc. Then I couldn't isolate each word in the text and
    make a comparison. Would I have to use a mix of contains, substring-before,
    substring-after?
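
    For what it's worth, here is a rough XSLT 1.0 sketch of that idea, using
    only contains, substring-before and substring-after in a recursive named
    template. The input format (Doc, Terms, Term, Text) and the template name
    mark are made up for this example and are not from the word list above.

    ```xml
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <xsl:output method="xml" indent="no"/>

      <!-- Hypothetical input:
           <Doc>
             <Terms><Term>Faroe Islands</Term><Term>New Zealand</Term></Terms>
             <Text>Flights from the Faroe Islands to New Zealand are long.</Text>
           </Doc> -->
      <xsl:template match="/Doc">
        <p>
          <xsl:call-template name="mark">
            <xsl:with-param name="text" select="string(Text)"/>
            <xsl:with-param name="terms" select="Terms/Term"/>
          </xsl:call-template>
        </p>
      </xsl:template>

      <xsl:template name="mark">
        <xsl:param name="text"/>
        <xsl:param name="terms"/>
        <xsl:variable name="term" select="string($terms[1])"/>
        <xsl:choose>
          <!-- No terms left, or nothing to search: emit the text as is. -->
          <xsl:when test="not($terms) or $text = ''">
            <xsl:value-of select="$text"/>
          </xsl:when>
          <!-- First term found: split the text around its first occurrence. -->
          <xsl:when test="$term != '' and contains($text, $term)">
            <!-- The part before cannot contain this term, so only the
                 remaining terms are tried on it. -->
            <xsl:call-template name="mark">
              <xsl:with-param name="text" select="substring-before($text, $term)"/>
              <xsl:with-param name="terms" select="$terms[position() &gt; 1]"/>
            </xsl:call-template>
            <b><xsl:value-of select="$term"/></b>
            <!-- The part after may contain any term, including this one. -->
            <xsl:call-template name="mark">
              <xsl:with-param name="text" select="substring-after($text, $term)"/>
              <xsl:with-param name="terms" select="$terms"/>
            </xsl:call-template>
          </xsl:when>
          <!-- First term absent (or empty): try the remaining terms. -->
          <xsl:otherwise>
            <xsl:call-template name="mark">
              <xsl:with-param name="text" select="$text"/>
              <xsl:with-param name="terms" select="$terms[position() &gt; 1]"/>
            </xsl:call-template>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:template>

    </xsl:stylesheet>
    ```

    This is plain, case-sensitive substring matching, so a term is also
    marked when it occurs inside a longer word, and if one term is a
    substring of another the longer one would have to come first in the list.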
     
    Hvid Hat, Apr 12, 2008
    #5
  6. Hvid Hat

    Hvid Hat Guest

    On 11-04-2008 23:15:56, "Joseph J. Kesselman" wrote:
    > Peter Flynn wrote:


    > Or have the stylesheet invoke an extension function written in a
    > language better suited to this task.


    I've written a few small extension functions in C#. I thought about writing
    an extension function to solve the problem. Any ideas on how to approach it?
    Should I create a comma-separated list of the words, pass the word list and
    the text to an extension function, and have the function mark up the words
    and return the marked-up text? Is it possible to access the XML containing
    the words from the extension function, so I could build a List<string> within
    my extension function? Perhaps send the XML containing the words as a node
    set or something. Does it make sense? :)

    > Personally, I think you should make this the author's responsibility.
    > Maybe use the (slow) find-words-and-tag-them as an authoring tool to
    > help them do so... but encourage them to use appropriate markup in the
    > first place rather than trying to reverse-engineer their text.


    I agree. I would make it an authoring tool but currently I'm just playing
    around with XSLT to improve my skills.
     
    Hvid Hat, Apr 12, 2008
    #6
  7. Hvid Hat wrote:
    > What if I wanted to mark up related words in some text?


    This is a programming problem first, then an XSLT problem. Figure out
    how you would solve it in any other programming language, so you have
    the problem well-formed and well-understood. Then figure out how to
    solve it nonprocedurally. Then implement that in XSLT... or decide not
    to do so, if it really isn't a problem well-suited to XSLT (as this may
    not be.)
     
    Joseph J. Kesselman, Apr 12, 2008
    #7
  8. Peter Flynn

    Peter Flynn Guest

    Hvid Hat wrote:
    > On 11-04-2008 21:12:48, Peter Flynn wrote:
    >
    >> If you mean you want to automate the application of markup to a
    >> document, by matching each word against your list of acronyms, then it's
    >> probably possible in XSLT (easier in XSLT2 than 1.0) but difficult when
    >> you need to handle things like "in XML's model" where the "word" is not
    >> delimited by spaces or markup boundaries. You'd have to use a recursive
    >> template to isolate each word in turn and test it against your list,
    >> which would be slow.

    >
    > I'm just playing around with XSLT to improve my skills, so performance is
    > not important. I'll give it a try, but if anyone can help me on my way, I'd
    > appreciate it :)
    >
    > What if I wanted to mark up related words in some text? Say I wanted to mark
    > up countries whose names consist of several words, e.g. Faroe Islands, South
    > Africa, New Zealand etc. Then I couldn't isolate each word in the text and
    > make a comparison. Would I have to use a mix of contains, substring-before,
    > substring-after?


    No, you'd pay someone to open the document in an XML editor and do it by
    hand.

    Really. If you want to apply reliable content markup on names (people,
    places, things), it's a *human* task.

    ///Peter
     
    Peter Flynn, Apr 13, 2008
    #8
