Getting kind of abstract text snippets from text nodes

Discussion in 'XML' started by Andreas W. Wylach, Mar 8, 2007.

  1. Hi everybody,

    I am implementing a small search engine that searches for a phrase
    in XML text nodes. That all works fine, but I do not want the results
    to show the complete text of the text node. I would like an
    abstract-like result list (the kind of output you get with Google
    searches).

    For example:

    .... I am the <b>substring</b> from a complete text node ...

    where "substring" is the search term.

    The problem is simple (I think): I want to extract every part of the
    complete text node where the search term occurs, with the term
    highlighted and surrounded by about 30 characters of the text
    around it.

    I found an interesting post, "cut down text", which is almost what I
    am looking for, but there the text is just trimmed by x characters.

    Does anybody here have an "elegant" way to solve this, or some hints
    that would lead me to a solution? I cannot use regex (it would be
    nice, though). My parser is Sablotron, so I am restricted to the
    functions it provides (XSLT 1.0).


    Any help is greatly appreciated.


    regards,
    Andreas W Wylach
     
    Andreas W. Wylach, Mar 8, 2007
    #1

  2. Think about dividing the text into three parts: before your target, the
    target itself, and after the target. Process each appropriately. If you
    want to report multiple instances within the same block of text, look at
    the standard examples of recursive text processing.
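
    A minimal sketch of that three-part split, written in Python purely for
    illustration. The thread itself targets XSLT 1.0 on Sablotron, where the
    same steps would presumably use substring-before(), substring-after(),
    substring() and a recursive named template. The function name `snippets`
    and its `context` and `start` parameters are hypothetical; the default of
    30 characters comes from the original question. Markup characters in the
    text are not escaped here.

```python
def snippets(text, term, context=30, start=0):
    """Return one highlighted snippet per occurrence of term in text.

    Each snippet is built from three parts: up to `context` characters
    before the match, the match itself wrapped in <b>...</b>, and up to
    `context` characters after it.
    """
    # Locate the next occurrence at or after `start`; an empty term never matches.
    pos = text.find(term, start) if term else -1
    if pos < 0:
        return []

    end = pos + len(term)
    before = text[max(0, pos - context):pos]   # part 1: text before the target
    target = "<b>" + term + "</b>"             # part 2: the target, highlighted
    after = text[end:end + context]            # part 3: text after the target

    snippet = "... " + before + target + after + " ..."

    # Recurse on the rest of the text to report further occurrences.
    return [snippet] + snippets(text, term, context, end)


if __name__ == "__main__":
    sample = "I am the substring from a complete text node"
    for line in snippets(sample, "substring"):
        print(line)
```

    Searching from an offset in the full text, rather than recursing on only
    the remainder, keeps the left-hand context intact for the second and
    later matches.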


    --
    () ASCII Ribbon Campaign | Joe Kesselman
    /\ Stamp out HTML e-mail! | System architexture and kinetic poetry
     
    Joe Kesselman, Mar 8, 2007
    #2

  3. "Andreas W. Wylach" <> wrote in message
    news:...
    > Hi everybody,
    >
    > I am implementing a small search engine that searches for a phrase
    > in XML text nodes. That all works fine, but I do not want the results
    > to show the complete text of the text node. I would like an
    > abstract-like result list (the kind of output you get with Google
    > searches).
    >
    > For example:
    >
    > ... I am the <b>substring</b> from a complete text node ...
    >
    > where "substring" is the search term.
    >
    > The problem is simple (I think): I want to extract every part of the
    > complete text node where the search term occurs, with the term
    > highlighted and surrounded by about 30 characters of the text
    > around it.



    FXSL gives you exactly that (look for testConcordance.xsl).

    As first shown here a year and a half ago:


    http://www.stylusstudio.com/xsllist/200511/post00560.html

    this was used to create a concordance of the text of the New Testament for
    any word longer than three characters whose frequency count in the document
    does not exceed a given frequency count parameter (1280, which in practice
    excludes mainly pronouns).

    The code itself is 95 lines long and, on a 3GHz, 2GB Pentium IV PC with
    Saxon 8.6 (current at that time), needed less than 92 seconds to produce
    the complete (huge) concordance. The source XML document, "ot Ending
    Spaces.xml", is almost 50 000 (fifty thousand) lines long.

    This is just one illustration of what can really be done with XSLT,
    dispelling the myths that "XSLT cannot do this or that
    efficiently/elegantly".

    Hope this helped.


    Cheers,
    Dimitre Novatchev
     
    Dimitre Novatchev, Mar 10, 2007
    #3
