REXML feature request: XPath.match.text & better text documentation

Discussion in 'Ruby' started by Dan Kohn, Sep 15, 2005.

  1. Dan Kohn

    Dan Kohn Guest

    Sean, et al, thanks for a great piece of software in REXML. I would
    appreciate it if you would consider adding text and texts methods to
    XPath and Elements.

    I believe the following shows why it would be useful, but please let me
    know if this isn't clear enough.

    require "rexml/document"
    include REXML
    string = <<EOF
    <html>
    <td class="t4"><a href="javascript:lu('OZ')">OZ</a>
    0204 F Class
    <a href="/cgi/get?apt:uMl8TIcSlHI*itn/airports/ICN,itn/air/mp">
    ICN</a> to <a
    href="/cgi/get?apt:uMl8TIcSlHI*itn/airports/LAX,itn/air/mp">
    LAX</a></td>
    <tr>
    <td class="t4"><font color="white">UNITED</font></td>
    <td colspan="4" align="right">
    <strong>48,164</strong></td>
    </tr>
    <tr>
    <td class="t4"><font color="white">Star
    Alliance</font></td>
    <td colspan="4" align="right">
    <strong>49,072</strong></td>
    </tr>
    </html>
    EOF
    doc = Document.new string.gsub!(/\s+|&nbsp;/," ")

    #This works fine:
    actsumarray = Array.new
    XPath.each( doc, "//td[@colspan='4']/child::*") { |cell|
      actsumarray << cell.text.to_s
    }
    puts actsumarray # 48,164 & 49,072

    # But either of these would be much more convenient:
    # actsumarray = XPath.match.text( doc, "//td[@colspan='4']/child::*")
    # actsumarray = doc.elements.text.to_a( "//td[@colspan='4']/child::*")

    # Converting to text is also pretty confusing.
    # You might consider adding a method like
    # remove_tag (which should be enhanced to support
    # multiple tags). I suspect others would find it useful.

    def remove_tag( rexml_array, tag )
      # Removes tag but leaves the text inside the tag as text inside
      # the parent of the now removed tag. The ".//" prefix keeps the
      # search inside rexml_array instead of the whole document.
      while (node = rexml_array.elements[".//#{tag}"])
        node.replace_with( Text.new( node.text.to_s.strip ) )
      end
    end

    # These sorts of examples would be great for the documentation
    # to show how much the results can vary.
    cell = doc.elements["//td[@class='t4']"]
    puts cell             # [ugly HTML]
    puts cell.text.to_s   # 0204 F Class
    puts cell.texts.join  # 0204 F Class to
    remove_tag( cell, "a")
    puts cell             # <td class='t4'>OZ 0204 F Class ICN to LAX</td>
    puts cell.text.to_s   # OZ
    puts cell.texts.join  # OZ 0204 F Class ICN to LAX
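
The one-call convenience requested above can be approximated without changing REXML. What follows is a minimal sketch, assuming a hypothetical helper named `xpath_texts` (not part of REXML) that wraps the existing `XPath.match` and collects the first text node of each match. It assumes the XPath selects elements only, and it uses a cut-down copy of the table from the post.

```ruby
require "rexml/document"

# Hypothetical helper, not part of REXML: returns the text of every
# element matched by +xpath+, using only XPath.match and Element#text.
def xpath_texts(node, xpath)
  REXML::XPath.match(node, xpath).map { |element| element.text.to_s }
end

# Cut-down version of the table from the post above.
doc = REXML::Document.new(<<~XML)
  <table>
    <tr><td colspan="4"><strong>48,164</strong></td></tr>
    <tr><td colspan="4"><strong>49,072</strong></td></tr>
  </table>
XML

totals = xpath_texts(doc, "//td[@colspan='4']/child::*")
puts totals
```

Keeping this as a free-standing method rather than a new method on `XPath` avoids reopening library classes, at the cost of a slightly longer call.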



    - dan
    --
    Dan Kohn
    <http://www.dankohn.com/> <tel:+1-415-233-1000>
     
    Dan Kohn, Sep 15, 2005
    #1

  2. On Sep 15, 2005, at 3:56 AM, Dan Kohn wrote:
    > doc = Document.new string.gsub!(/\s+|&nbsp;/," ")


    One aside - you might like to know about:

    doc = Document.new( string, :ignore_whitespace_nodes => :all )
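
A small sketch of what that option changes, using made-up sample XML. The understanding assumed here is that `:ignore_whitespace_nodes => :all` drops text nodes consisting only of whitespace during parsing, and leaves whitespace inside other text nodes alone, so it is not a full replacement for the `gsub!` call quoted above.

```ruby
require "rexml/document"

# Made-up sample: indentation between tags creates whitespace-only nodes.
xml = "<root>\n  <a>1</a>\n  <b> 2 </b>\n</root>"

plain   = REXML::Document.new(xml)
trimmed = REXML::Document.new(xml, :ignore_whitespace_nodes => :all)

plain_count   = plain.root.children.size    # whitespace nodes included
trimmed_count = trimmed.root.children.size  # only <a> and <b> remain
inner         = trimmed.root.elements["b"].text

puts plain_count    # 5
puts trimmed_count  # 2
p inner             # " 2 " (inner whitespace is untouched)
```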
     
    Gavin Kistner, Sep 15, 2005
    #2

  3. On Sep 15, 2005, at 3:56 AM, Dan Kohn wrote:
    > Sean, et al, thanks for a great piece of software in REXML. I would
    > appreciate it if you would consider adding text and texts methods to
    > XPath and Elements.


    Does this help you?

    require 'rexml/document'
    include REXML

    d = Document.new <<ENDXML
    <root>
    <foo>Raw text</foo>
    <foo>Raw text2</foo>
    <foo>AA <bar>Nested Text</bar>ZZ</foo>
    </root>
    ENDXML

    p XPath.match( d, '//foo//text()' ).collect{ |textnode|
      textnode.value
    }
    #=> ["Raw text", "Raw text2", "AA ", "Nested Text", "ZZ"]

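    class REXML::Element
      # Joins every descendant text node. each_element yields only
      # elements to the (empty) block, but its return value is the full
      # list of nodes matched by the XPath, text nodes included.
      def inner_text
        self.each_element( './/text()' ){}.join( '' )
      end
    end

    p XPath.match( d, '//foo' ).collect{ |foo|
      foo.inner_text
    }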
    #=> ["Raw text", "Raw text2", "AA Nested TextZZ"]
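
An alternative that avoids reopening `REXML::Element` and does not rely on the return value of `each_element`: a sketch of a standalone recursive helper, here called `descendant_text` (a hypothetical name, not a REXML method), which walks an element's children and concatenates every text node it finds.

```ruby
require "rexml/document"

# Hypothetical helper: concatenates the value of every text node below
# +element+, in document order, by walking the tree recursively.
def descendant_text(element)
  element.children.map { |child|
    case child
    when REXML::Text    then child.value
    when REXML::Element then descendant_text(child)
    else ""              # comments, processing instructions, etc.
    end
  }.join
end

doc = REXML::Document.new(<<~XML)
  <root>
    <foo>Raw text</foo>
    <foo>Raw text2</foo>
    <foo>AA <bar>Nested Text</bar>ZZ</foo>
  </root>
XML

results = doc.root.get_elements("foo").map { |foo| descendant_text(foo) }
p results
#=> ["Raw text", "Raw text2", "AA Nested TextZZ"]
```

Because it uses `Text#value`, entities come back unescaped (for example `&amp;` as `&`), which is usually what is wanted when extracting plain text.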
     
    Gavin Kistner, Sep 15, 2005
    #3
