python/xpath question...

Discussion in 'Python' started by bruce, Jul 6, 2006.

  1. bruce

    bruce Guest

    For guys with Python/XPath expertise:

    I'm playing with XPath and trying to solve an issue.

    I have the following kind of situation, where I'm trying to get certain data.

    I have a bunch of tr/td elements.

    I can create an XPath that gets me all of the tr elements, but I only want
    the sibling tr elements up until I hit a tr that contains a th. Does anybody
    have an idea how this query might be written?


    The idea would be to start at "Summer B", skip the first tr, and get the
    following tr elements until the next "Summer" section.

    Sample data:

    <tr> <Th colspan=14 class="soc_comment"> Summer B </th> </tr>
    <!-- START RA.CTLIB(SOCPHDR1) -->
    <tr>
    <td nowrap valign="bottom" class="colhelp">
    <a href="#">Course<span>
    <b>Course</b>
    <br>Course number and suffix, if applicable.
    <br>C = combined lecture and lab course
    <br>L = laboratory course
    </span></a></td>
    </tr>
    <!-- END RA.CTLIB(SOCPHDR1) -->
    <tr>
    <td valign="top" nowrap><a href="javascript:crsdescunderpop('AST1002');">AST
    1002</a></td>
    </tr>
    <tr>
    <td valign="top" nowrap><a
    href="javascript:crsdescunderpop('AST1022L');">AST 1022L</a></td>
    </tr>
    <tr>
    <td valign="top" nowrap><a
    href="javascript:crsdescunderpop('AST1022L');">AST 1022L</a></td>
    </tr>
    <tr>
    <td valign="top" nowrap><a
    href="javascript:crsdescunderpop('AST1022L');">AST 1022L</a></td>
    </tr>
    <tr> <Th colspan=14 class="soc_comment"> Summer C </th> </tr>
    <!-- START RA.CTLIB(SOCPHDR1) -->
    <tr>
    <td nowrap valign="bottom" class="colhelp">
    <a href="#">Course<span>
    ..
    ..
    ..

    thanks...

    -bruce
     
    bruce, Jul 6, 2006
    #1

  2. bruce wrote:
    > For guys with Python/XPath expertise:
    >
    > I'm playing with XPath and trying to solve an issue.
    >
    > I have the following kind of situation, where I'm trying to get certain data.
    >
    > I have a bunch of tr/td elements.
    >
    > I can create an XPath that gets me all of the tr elements, but I only want
    > the sibling tr elements up until I hit a tr that contains a th. Does anybody
    > have an idea how this query might be written?


    I'm not quite sure how this is supposed to be related to Python, but if you're
    trying to find a sibling, what about using the "sibling" axis in XPath?

    Stefan
     
    Stefan Behnel, Jul 6, 2006
    #2

  3. John J. Lee

    John J. Lee Guest

    (Damn gmane's authorizor, I think I lost four postings because the
    auth messages went to my work email address (and I thought the
    authorization was supposed to be one-time only per group anyway??). I
    deleted them as spam since I hadn't posted from there for days :-(
    Grrr. At least I could reconstruct this one...)

    "bruce" <> writes:

    > For guys with Python/XPath expertise:
    >
    > I'm playing with XPath and trying to solve an issue.
    >
    > I have the following kind of situation, where I'm trying to get certain data.
    >
    > I have a bunch of tr/td elements.
    >
    > I can create an XPath that gets me all of the tr elements, but I only want
    > the sibling tr elements up until I hit a tr that contains a th. Does anybody
    > have an idea how this query might be written?

    [...]

    ((//tr/th)[2]/../following-sibling::tr/td/..)[count(.|((//tr/th)[3]/../preceding-sibling::*))=count((//tr/th)[3]/../preceding-sibling::*)]


    which makes use of the following idiom for writing an intersection:

    $set1[count(.|$set2)=count($set2)]
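The idiom works because the union `.|$set2` has the same size as `$set2` only when the context node is already a member of `$set2`. A minimal Python 3 sketch of that membership test, using plain sets of strings as stand-ins for XPath node-sets (the helper name `intersect_by_count` and both variable names are made up for illustration):

```python
def intersect_by_count(set1, set2):
    """Keep the members of set1 for which count(. | set2) == count(set2)."""
    return {node for node in set1 if len({node} | set2) == len(set2)}


# Stand-ins for the two node-sets in the expression above, using row labels
# from a simple table with rows A..I in which A, D and G are header rows.
after_second_header = {"E", "F", "H", "I"}            # td rows following header D
before_third_header = {"A", "B", "C", "D", "E", "F"}  # siblings preceding header G

print(sorted(intersect_by_count(after_second_header, before_third_header)))
```

Adding a node that is already in the second set leaves the union's size unchanged, so only the rows between the two headers survive the filter.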


    and gets the second group in the sequence you describe. IMHO, this
    illustrates what happens when XPath is pushed too far ;-) I don't see
    an easier way, but perhaps I missed one.

    Example code:

    (Note that the expression used here doesn't get any trailing group of
    tr elements if there's no terminating tr/th -- that fits your
    specification, but may not be what you really wanted. To fix that,
    meditate on the above expression for an hour or two <0.8 wink>.)

    #---------------------------------------------------------
    # Python 2 code: relies on the StringIO module and the print statement.
    def xpath(path, source):
        # Parse the source and pretty-print the serialized nodes matching path.
        import StringIO
        import pprint
        from lxml import etree
        f = StringIO.StringIO(source)
        tree = etree.parse(f)
        r = tree.xpath(path)
        #return "\n".join(etree.tostring(el) for el in r)
        return pprint.pformat([etree.tostring(el) for el in r])

    simple = """\
    <html>
    <tr><th>A</th></tr>
    <tr><td>B</td></tr>
    <tr><td>C</td></tr>
    <tr><th>D</th></tr>
    <tr><td>E</td></tr>
    <tr><td>F</td></tr>
    <tr><th>G</th></tr>
    <tr><td>H</td></tr>
    <tr><td>I</td></tr>
    </html>
    """

    # Select the td rows between header row i+1 and header row i+2.
    for i in range(3):
        expr = '((//tr/th)[%s]/../following-sibling::tr/td/..)[count(.|((//tr/th)[%s]/../preceding-sibling::*))=count((//tr/th)[%s]/../preceding-sibling::*)]' % (i+1, i+2, i+2)
        print "---------------------"
        print xpath(expr, simple)
    #---------------------------------------------------------


    john[0]$ tst.py
    ---------------------
    ['<tr><td>B</td></tr>\n', '<tr><td>C</td></tr>\n']
    ---------------------
    ['<tr><td>E</td></tr>\n', '<tr><td>F</td></tr>\n']
    ---------------------
    []


    Knowing what you're doing, though, you'd probably be better off with
    BeautifulSoup than XPath. Also note that mechanize (which I know
    you're using) only supports BeautifulSoup 2 at present. You can't use
    BeautifulSoup 3 yet (I hope to fix that 'RSN').
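For comparison, a procedural walk over the rows avoids the intersection trick entirely and also keeps a trailing group that has no terminating header row. This is only a minimal Python 3 sketch: it uses the standard library's `xml.etree.ElementTree` rather than BeautifulSoup, so it assumes well-formed markup like the simplified table above (real-world HTML would need a more tolerant parser), and `group_rows_by_header` is a made-up helper name.

```python
import xml.etree.ElementTree as ET


def group_rows_by_header(parent):
    """Map each header row's text to the cell texts of the td rows after it."""
    groups = {}
    current = None
    for tr in parent.findall("tr"):
        th = tr.find("th")
        if th is not None:
            # A row containing a th starts a new group.
            current = (th.text or "").strip()
            groups[current] = []
        elif current is not None:
            # A data row belongs to the most recent header row.
            cells = [(td.text or "").strip() for td in tr.findall("td")]
            groups[current].append(cells)
    return groups


simple = """\
<html>
<tr><th>A</th></tr>
<tr><td>B</td></tr>
<tr><td>C</td></tr>
<tr><th>D</th></tr>
<tr><td>E</td></tr>
<tr><td>F</td></tr>
<tr><th>G</th></tr>
<tr><td>H</td></tr>
<tr><td>I</td></tr>
</html>
"""

groups = group_rows_by_header(ET.fromstring(simple))
for header, rows in groups.items():
    print(header, rows)
```

On the original page, the first row after each header (the column-help row) would still have to be dropped, for example by slicing each group with `rows[1:]`.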


    John
     
    John J. Lee, Jul 9, 2006
    #3
  4. John J. Lee

    John J. Lee Guest

    Stefan Behnel <> writes:
    [...]
    > I'm not quite sure how this is supposed to be related to Python, but if you're
    > trying to find a sibling, what about using the "sibling" axis in XPath?


    <nit>
    There's no "sibling" axis in XPath. I'm sure you meant
    "following-sibling" and/or "preceding-sibling".
    </nit>


    John
     
    John J. Lee, Jul 9, 2006
    #4