XQuerying material between elements

Discussion in 'XML' started by patrik.nyman@orient.su.se, Apr 25, 2007.

  1. Guest

    I am marking up the text of old books and need to be able
    to present the result page by page. The problem is that a
    page break sometimes occurs in the middle of a paragraph
    (or of some other element). See the following example.

    <p>I shall not describe it to you, for in-
    <lb/>deed I cannot. To delineate the truly aw-
    <lb/>ful locality of Trollhättan, would
    <lb/>baffle the powers of poetic fancy, and mock
    <pb n="15" urn="urn:nbn:se:kb:digark-7886"/>
    <lb/>the painter's daring pencil. I ran only af-
    <lb/>ford you a faint idea of its characteristic
    <lb/>features, and even that will he found
    <lb/>arduous. Come, and see it, and you will
    <lb/>applaud my modesty.
    </p>

    <p>[...]
    <lb/>of gold." Subscribing to the old Swedish
    <lb/>proverb: When it rains down milk, the poor
    <lb/>has no spoon," I silently dropped the theme,
    <lb/>and would not have rementioned it now,
    <pb n="16" urn="urn:nbn:se:kb:digark-7887"/>
    <lb/>if I were not anxious to dis-play to you, what
    <lb/>an able minister of state I might possibly
    <lb/>be, if His Majesty should be pleased to
    <lb/>invest me with that honor, which, you
    <lb/>know, is as distant from me as the mitre
    <lb/>and the slipper of the Pope of Rome.
    </p>

    Simply cutting out the material between the <pb/> elements
    gives XML that is not well-formed.

    So, is it possible to write an XQuery expression that
    can fix this, i.e. 'detect' that the <pb/> occurs in
    the middle of another element and take the appropriate
    action? The result would have to look something like this:

    <pb n="15" urn="urn:nbn:se:kb:digark-7886"/>
    <p rend="noindent">the painter's daring pencil. I ran only af-
    <lb/>ford you a faint idea of its characteristic
    <lb/>features, and even that will he found
    <lb/>arduous. Come, and see it, and you will
    <lb/>applaud my modesty.
    </p>

    <p>[...]
    <lb/>of gold." Subscribing to the old Swedish
    <lb/>proverb: When it rains down milk, the poor
    <lb/>has no spoon," I silently dropped the theme,
    <lb/>and would not have rementioned it now,
    </p>

    Thanks.
    Patrik Nyman, Apr 25, 2007
    #1

  2. I'm sure XQuery can do it, though I'm not sure of the syntax offhand.

    In XPath, I would set up a template that matches on p[pb] (a paragraph
    that contains a page break) and rewrites it appropriately by first
    outputting a p containing the pb's preceding siblings, then the pb,
    then a p containing the following siblings. Very straightforward.
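
    A minimal sketch of that split, written in XQuery since that is what
    the question asks for. It assumes the pb is a direct child of the p and
    only handles the first pb in each paragraph; $doc is a made-up sample,
    not the real text.

    ```xquery
    (: $doc is a hypothetical stand-in for the stored document :)
    let $doc :=
      <body>
        <p>baffle the powers of poetic fancy, and mock<pb n="15"/><lb/>the painter's daring pencil.</p>
        <p>No page break in this paragraph.</p>
      </body>
    for $p in $doc/p
    return
      if ($p/pb) then
        let $pb := $p/pb[1]
        return (
          (: everything in the paragraph before the page break :)
          <p>{ $p/@*, $pb/preceding-sibling::node() }</p>,
          (: the page break itself, now between the two halves :)
          $pb,
          (: everything after it, marked as a continuation :)
          <p rend="noindent">{ $pb/following-sibling::node() }</p>
        )
      else
        (: paragraphs without a page break pass through unchanged :)
        $p
    ```

    Both halves are built with element constructors, so the result stays
    well-formed no matter where in the paragraph the pb sits.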

    --
    Joe Kesselman / Beware the fury of a patient man. -- John Dryden
    Joseph Kesselman, Apr 25, 2007
    #2

  3. Pavel Lepin Guest

    Joseph Kesselman wrote in
    <462f6975$1@kcnews01>:
    > > So, is it possible to write an XQuery expression that
    > > can fix this, i.e. 'detect' that the <pb/> occurs in
    > > the middle of another element and take the appropriate
    > > action? The result would have to look something like
    >
    > I'm sure XQuery can do it, though I'm not sure of the
    > syntax offhand.
    >
    > In XPath, I would set up a template that matches on p[pb]
    > (a paragraph that contains a page break) and rewrites it
    > appropriately by first outputting a p containing the pb's
    > preceding siblings, then the pb, then a p containing the
    > following siblings. Very straightforward.


    XSLT does indeed seem like a better bet than XQuery in this
    case, but if you try to generalize the problem a bit
    (multiple page breaks and more than one level of ancestor
    elements to be spliced) it gets kinda messy with XSLT1. On
    the other hand, an XSLT2 solution would be fairly elegant
    thanks to sequences--may FSM touch with his noodly
    appendage whoever on XSLT WG came up with those.
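
    For what it is worth, the generalized case (page breaks nested at any
    depth) can also be sketched in XQuery with a recursive function. This
    is only a sketch under assumptions: local:between is a made-up name,
    the sample $doc is invented, and both milestones are assumed to be
    empty elements in the same tree.

    ```xquery
    (: Copy the part of $node that lies between $start and $end,
       rebuilding any element that contains one of the two milestones. :)
    declare function local:between(
      $node as node(), $start as node(), $end as node()
    ) as node()* {
      typeswitch ($node)
        case element() return
          if ($node is $start or $node is $end) then
            ()  (: the milestones themselves are left to the caller :)
          else if (exists(($start, $end)/ancestor::*[. is $node])) then
            (: straddles a milestone: rebuild it and filter its children :)
            element { node-name($node) } {
              $node/@*,
              for $child in $node/node()
              return local:between($child, $start, $end)
            }
          else if ($node >> $start and $node << $end) then
            $node  (: lies wholly between the milestones :)
          else
            ()
        default return
          (: text, comments and processing instructions :)
          if ($node >> $start and $node << $end) then $node else ()
    };

    let $doc :=
      <body>
        <div>
          <p>before <pb n="15"/>first <hi>half</hi></p>
          <p>whole paragraph</p>
          <p>second <pb n="16"/>after</p>
        </div>
      </body>
    let $start := $doc//pb[@n = '15'],
        $end   := $doc//pb[@n = '16']
    return <hit>{ $start, local:between($doc, $start, $end) }</hit>
    ```

    The enclosing body and div are rebuilt around the page content, and
    the two partial paragraphs keep only the text that falls between the
    page breaks. Adding rend="noindent" to a continued paragraph is not
    attempted here.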

    --
    Pavel Lepin
    Pavel Lepin, Apr 25, 2007
    #3
  4. Joseph Kesselman wrote:
    > In XPath,


    Meant to write XSLT, obviously. Sigh. Engage mind, THEN put fingers in
    gear...

    --
    Joe Kesselman / Beware the fury of a patient man. -- John Dryden
    Joseph Kesselman, Apr 25, 2007
    #4
  5. Guest

    Thanks for the replies. I forgot to mention that the texts
    are stored in the eXist database, hence the need for XQuery.
    This is what I've managed to come up with so far.

    1  <hit>
    2  (: Check if the initial <pb> is the child of another element,
    3     and print the name of that element. :)
    4  {
    5    let $i1 := //pb[@urn='urn:nbn:se:kb:digark-7886']
    6    return
    7      if ($i1[parent::p]) then
    8        '<p rend="noindent">'
    9      else
    10       if ($i1[parent::lg]) then
    11         '<lg>'
    12       else ()
    13 }
    14 (: Print the material between the pagebreaks. :)
    15 {
    16   let $i1 := //pb[@urn='urn:nbn:se:kb:digark-7886'],
    17       $i2 := //pb[@urn='urn:nbn:se:kb:digark-7887']
    18   for $n in //text()
    19   where $n >> $i1 and $n << $i2
    20   return $n
    21 }
    22 (: Check if the final <pb> is the child of another element,
    23    and print the name of that element. :)
    24 {
    25   let $i2 := //pb[@urn='urn:nbn:se:kb:digark-7887']
    26   return
    27     if ($i2[parent::p]) then
    28       '</p>'
    29     else
    30       if ($i2[parent::lg]) then
    31         '</lg>'
    32       else ()
    33 }
    34 </hit>

    This works fine, except of course for the 'text()' on line 18,
    which outputs only the text, not the text and markup, which is
    what I want. Switching 'text()' for 'node()' or 'element()'
    doesn't give the desired result either, naturally.
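
    As far as I can tell, 'node()' goes wrong in two ways: //node()
    returns every element and then each of its descendants again, so
    content gets duplicated, and the paragraph holding the second <pb>
    comes back whole, including the text after the break. A small sketch
    on an invented sample ($doc and its contents are made up) that keeps
    only the outermost nodes between the two breaks:

    ```xquery
    let $doc :=
      <body>
        <p>a<pb n="15"/>b<lb/>c</p>
        <p>d<lb/>e</p>
        <p>f<pb n="16"/>g</p>
      </body>
    let $i1 := $doc//pb[@n = '15'],
        $i2 := $doc//pb[@n = '16']
    (: every node between the two page breaks, minus ancestors of $i2 :)
    let $between :=
      $doc//node()[. >> $i1 and . << $i2][empty(descendant::pb[. is $i2])]
    return
      <hit>{
        (: keep a node only if none of its ancestors was selected too :)
        $between[empty(ancestor::* intersect $between)]
      }</hit>
    ```

    This removes the duplicates, but the fragments b, c and f still lose
    their enclosing <p>, which is exactly the part that has to be
    rebuilt around them.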

    Any suggestions are welcome. Thanks.
    --
    Patrik Nyman
    Patrik Nyman, Apr 26, 2007
    #5
  6. Patrik Nyman wrote:

    Out of curiosity, is:

    > 1 <hit>
    > 2 (: Check if the initial <pb> is the child of another element,
    > 3 and print the name of that element. :)
    > 4 {
    > 5 let $i1 := //pb[@urn='urn:nbn:se:kb:digark-7886']
    > 6 return
    > 7 if ($i1[parent::p]) then

    this:

    > 8 '<p rend="noindent">'
    > 9 else
    > 10 if ($i1[parent::lg]) then

    this:

    > 11 '<lg>'
    > 12 else()
    > 13 }
    > 14 (: Print the material between the pagebreaks. :)
    > 15 {
    > 16 let $i1 := //pb[@urn='urn:nbn:se:kb:digark-7886'],
    > 17 $i2 := //pb[@urn='urn:nbn:se:kb:digark-7887']
    > 18 for $n in //text()
    > 19 where $n >> $i1 and $n << $i2
    > 20 return $n
    > 21 }
    > 22 (: Check if the final <pb> is the child of another element,
    > 23 and print the name of that element. :)
    > 24 {
    > 25 let $i2 := //pb[@urn='urn:nbn:se:kb:digark-7887']
    > 26 return
    > 27 if ($i2[parent::p]) then

    this:

    > 28 '</p>'
    > 29 else
    > 30 if ($i2[parent::lg]) then

    and this:

    > 31 '</lg>'
    > 32 else()
    > 33 }
    > 34 </hit>

    ... supposed to be markup in the resulting sequence?

    p.b.
    Pierrick Brihaye, Apr 26, 2007
    #6
  7. Pierrick Brihaye wrote:
    > For my curiosity, is :
    > this :
    >> 8 '<p rend="noindent">'

    > ... supposed to be mark-up in the resulting sequence ?


    I certainly hope not, because if so I'd consider it an abuse of XQuery,
    akin to trying to hand-construct tags in XSLT.

    If the goal is to construct document structure, construct structure, not
    text that looks like structure.
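
    To make the distinction concrete, here is a small sketch (the sample
    text is taken from the first post, the rest is illustrative only)
    that puts a string and two real element constructors side by side:

    ```xquery
    (
      (: a string that merely looks like a start tag :)
      <hit>{ '<p rend="noindent">' }</hit>,

      (: a real element node, built with a direct constructor :)
      <hit>{ <p rend="noindent">the painter's daring pencil.</p> }</hit>,

      (: the same node built with computed constructors, useful when
         the element name is only known at run time :)
      <hit>{
        element p {
          attribute rend { 'noindent' },
          "the painter's daring pencil."
        }
      }</hit>
    )
    ```

    In the first <hit> the content is a text node, so a serializer will
    escape its angle brackets instead of emitting a tag; the other two
    contain an actual p element.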


    --
    Joe Kesselman / Beware the fury of a patient man. -- John Dryden
    Joseph Kesselman, Apr 26, 2007
    #7
  8. Hi,

    How about something like this:

    let $i1 := //pb[@urn='urn:nbn:se:kb:digark-7886'],
        $i2 := //pb[@urn='urn:nbn:se:kb:digark-7887']
    return <hit>{
      if ($i1[parent::p])
      then <p rend="noindent">{$i1/following-sibling::node()}</p>
      else ()
      ,
      for $n in //p
      where $n >> $i1 and $n << $i2
        and not($n/*[. is $i1]) and not($n/*[. is $i2])
      return $n
      ,
      if ($i2[parent::p])
      then <p>{$i2/preceding-sibling::node()}</p>
      else ()
    }</hit>

    Hope that helps,
    Priscilla

    ---------------------------------------------
    Priscilla Walmsley
    Author, XQuery (2007, O'Reilly Media)
    http://www.datypic.com
    http://www.xqueryfunctions.com
    ---------------------------------------------

    Priscilla Walmsley, Apr 27, 2007
    #8
  9. Guest

    On 27 Apr, 19:13, Priscilla Walmsley wrote:
    > Hi,
    >
    > How about something like this:
    > [...]


    Thanks a lot for this. I cannot test it until Wednesday, but I'll
    let you know then.

    /Patrik Nyman
    Patrik Nyman, Apr 29, 2007
    #9
  10. Guest

    On 27 Apr, 19:13, Priscilla Walmsley wrote:
    > Hi,
    >
    > How about something like this:
    > [...]


    Yes, it works, and is much better than my version!
    Thanks a lot,
    Patrik
    Patrik Nyman, May 3, 2007
    #10