Dan Kohn
Sean, et al., thanks for a great piece of software in REXML. I would
appreciate it if you would consider adding text and texts methods to
XPath and Elements.
I believe the following shows why it would be useful, but please let me
know if this isn't clear enough.
require "rexml/document"
include REXML
string = <<EOF
<html>
<td class="t4"><a href="javascript:lu('OZ')">OZ</a>
0204 F Class
<a href="/cgi/get?apt:uMl8TIcSlHI*itn/airports/ICN,itn/air/mp">
ICN</a> to <a
href="/cgi/get?apt:uMl8TIcSlHI*itn/airports/LAX,itn/air/mp">
LAX</a></td>
<tr>
<td class="t4"><font color="white">UNITED</font></td>
<td colspan="4" align="right">
<strong>48,164</strong></td>
</tr>
<tr>
<td class="t4"><font color="white">Star
Alliance</font></td>
<td colspan="4" align="right">
<strong>49,072</strong></td>
</tr>
</html>
EOF
doc = Document.new string.gsub!(/\s+| /," ")
#This works fine:
actsumarray = Array.new
XPath.each( doc,
"//td[@colspan='4']/child::*") { |cell|
actsumarray << cell.text.to_s }
puts actsumarray # 48,164 & 49,072
# But either of these would be much more convenient:
# actsumarray = XPath.match.text( doc, "//td[@colspan='4']/child::*")
# actsumarray = doc.elements.text.to_a( "//td[@colspan='4']/child::*")
# Converting to text is also pretty confusing.
# You might consider adding a method like
# remove_tag (which should be enhanced to support
# multiple tags). I suspect others would find it useful.
def remove_tag( element, tag)
  # Removes tag but leaves the text inside the tag as text inside
  # the parent of the now removed tag.
  # ".//" keeps the search inside element; a leading "//" is an
  # absolute path and would match anywhere in the document.
  while (found = element.elements[".//#{tag}"])
    # to_s guards against an empty tag, whose text is nil
    found.replace_with( Text.new( found.text.to_s.strip))
  end
end
# These sorts of examples would be great for the documentation
# to show how much the results can vary.
cell = doc.elements["//td[@class='t4']"]
puts cell               # [ugly HTML]
puts cell.text.to_s     # 0204 F Class
puts cell.texts.to_s    # 0204 F Class to
remove_tag( cell, "a")
puts cell               # <td class='t4'>OZ 0204 F Class ICN to LAX</td>
puts cell.text.to_s     # OZ
puts cell.texts.to_s    # OZ 0204 F Class ICN to LAX
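
In the meantime, something close to the requested convenience can be had with what REXML already offers. The following is only a minimal sketch: xpath_texts is a hypothetical helper name, not part of REXML, and it assumes that the first text node of each matched element is what is wanted.

```ruby
require "rexml/document"

# Hypothetical helper (not part of REXML): return the first text node of
# every element matched by an XPath expression, as an array of strings.
def xpath_texts(node, path)
  # XPath.match returns an array of matching nodes; to_s turns a nil
  # text (element without text) into an empty string.
  REXML::XPath.match(node, path).map { |el| el.text.to_s }
end

doc = REXML::Document.new(<<~XML)
  <table>
    <tr><td colspan="4"><strong>48,164</strong></td></tr>
    <tr><td colspan="4"><strong>49,072</strong></td></tr>
  </table>
XML

p xpath_texts(doc, "//td[@colspan='4']/child::*")   # ["48,164", "49,072"]
```

This does the same work as the XPath.each loop above in a single call, which is roughly what a built-in text method on XPath or Elements would save.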
- dan
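
P.S. The multi-tag enhancement mentioned for remove_tag might look roughly like this. It is a sketch under assumptions, not a proposed final API: remove_tags is a hypothetical name, and it keeps only the first text node of each removed element, as remove_tag does.

```ruby
require "rexml/document"

# Hypothetical multi-tag variant of remove_tag (not part of REXML).
# Replaces every descendant element named in tags with a text node
# holding that element's stripped text.
def remove_tags(element, *tags)
  tags.each do |tag|
    # ".//" limits the search to descendants of element.
    while (found = element.elements[".//#{tag}"])
      found.replace_with(REXML::Text.new(found.text.to_s.strip))
    end
  end
  element
end

doc = REXML::Document.new("<td><a>ICN</a> to <b>LAX</b></td>")
cell = doc.root
remove_tags(cell, "a", "b")
puts cell.texts.join   # ICN to LAX
```

Array#join is used for printing here because Array#to_s on current Ruby versions prints the inspected array rather than the concatenated text.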