A simple Hpricot text setter

Chris Gehlker

If anyone is trying to use Hpricot to clean up the actual content of
a site while leaving the markup alone, they might find the following
tiny method useful:

class Hpricot::Text
  # Adds a simple Hpricot method to change
  # the text embedded in an HTML document
  #
  # Example of use:
  #   body.traverse_text do |text|
  #     text_out = text.to_s
  #     # ... manipulate text_out here ...
  #     text.set(text_out)
  #   end
  def set(string)
    @content = string
    self.raw_string = string
  end
end

The trick is to set both @content in Hpricot::Text and @raw_string in
its parent.
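
To illustrate why both values need updating, here is a minimal sketch using a toy class. ToyText and its methods are hypothetical stand-ins, not Hpricot's real internals. The sketch assumes a node that hands out one copy of its text through to_s and writes a separately cached raw string back out on serialization, which is the situation the set method above is working around.

```ruby
# ToyText is a hypothetical stand-in for a parsed text node.
# It is NOT Hpricot::Text; it only mimics the "two copies" situation.
class ToyText
  attr_accessor :raw_string

  def initialize(raw)
    @raw_string = raw # cached source text, used when writing the document out
    @content    = raw # value handed to callers through to_s
  end

  def to_s
    @content
  end

  # Assumption for this sketch: serialization prefers the cached raw string.
  def output
    @raw_string
  end

  # Updates only one copy, so the serialized output goes stale.
  def set_content_only(string)
    @content = string
  end

  # Updates both copies, mirroring the set method in the post.
  def set(string)
    @content = string
    self.raw_string = string
  end
end

stale = ToyText.new("teh cat")
stale.set_content_only("the cat")
puts stale.to_s   # the cat
puts stale.output # teh cat

fixed = ToyText.new("teh cat")
fixed.set("the cat")
puts fixed.output # the cat
```

In the toy, changing only the content leaves the written-out text untouched, which is why the setter assigns both.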
 
why the lucky stiff

You can also use Elements#inner_html= and Element#inner_html= for this.

(body/:a).inner_html = "New Link Text"

Also: set, html, remove, append, prepend, before, after, and wrap, which all
work just like their jQuery cousins.[1]

Thank you for using Hpricot; it helps all the horses' hearts when you do.

_why

[1] http://jquery.com/docs/base/
 
Chris Gehlker

Thanks for responding, why, and thanks very much for Hpricot.

I'm a long way from completely understanding Hpricot, but I did try to
use inner_html in what I thought was the correct way.

Here is a little sample program:

require 'rubygems'
require_gem 'hpricot'

doc = Hpricot(open('TestFile.html'))
body = doc.search('body')
body.each {|elmnt| elmnt.inner_html}
body.inner_html
(body/:a).inner_html = "New Link Text"
puts doc

The output is:
testHpricot.rb:6: undefined method `inner_html' for #<Hpricot::Elem:0x7546bc> (NoMethodError)
        from testHpricot.rb:6:in `each'
        from testHpricot.rb:6

If I comment out the body.each... line I get:

testHpricot.rb:7: undefined method `inner_html' for #<Hpricot::Elements:0x753d48> (NoMethodError)

If I comment out that line, I get:

testHpricot.rb:8: undefined method `inner_html=' for []:Array (NoMethodError)


What may be related is that the file text.rb is at:
/usr/local/lib/ruby/gems/1.8/gems/hpricot-0.3/lib/hpricot/text.rb
but it is not actually being required anywhere in Hpricot. When I
tried to require it manually, I found that it was requiring files
that the gem didn't give me. This is all in Hpricot 0.3.

Thanks again for both your time and Hpricot.
 
why the lucky stiff

Okay, yeah, you'll need the latest Hpricot (0.4.43):

gem install hpricot --source code.whytheluckystiff.net

Also, don't forget to remove `require_gem 'hpricot'` and use, instead,
`require 'hpricot'`.

_why
 
Chris Gehlker

You seem to be making great progress with Hpricot, committing changes
every day.

Yep, 'require_gem' no longer works. Just using 'require' seems better.

I don't know that I communicated my idea behind adding a set method
for Hpricot::Text. There are times when one wants to scan and
potentially change everything that's *not* markup. The markup should
be left unchanged or modified only in trivial ways such as changing
the order of attribute declarations.

Hpricot::Traverse#traverse_text is great for finding all the stuff
that's *not* markup, the pcdata, in an HTML file. I just added a
method to change that data.

You suggested using inner_html=, but the only way I can see that
working is to walk the tree looking for those elements which have only
Hpricot::Text children and then use inner_html= on them. But
that would involve essentially recreating
Hpricot::Traverse#traverse_text to find such elements, although the
common code could mostly be factored out.
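
A rough sketch of the two approaches side by side, on a toy tree rather than Hpricot itself. TextNode, ElemNode, each_text, and text_only_elements are hypothetical names invented for this illustration. The first helper visits every text node wherever it sits; the second collects only elements whose children are all text, which is the search an inner_html= approach would need.

```ruby
# Hypothetical toy nodes for illustration; these are not Hpricot classes.
TextNode = Struct.new(:content)
ElemNode = Struct.new(:name, :children)

# Approach 1: visit every text node, wherever it sits in the tree.
def each_text(node, &block)
  if node.is_a?(TextNode)
    block.call(node)
  else
    node.children.each { |child| each_text(child, &block) }
  end
end

# Approach 2: collect elements whose children are all text nodes.
def text_only_elements(node, found = [])
  return found if node.is_a?(TextNode)

  if !node.children.empty? && node.children.all? { |c| c.is_a?(TextNode) }
    found << node
  else
    node.children.each { |child| text_only_elements(child, found) }
  end
  found
end

# Toy equivalent of: <p>see <a>here</a> now</p>
tree = ElemNode.new("p", [
  TextNode.new("see "),
  ElemNode.new("a", [TextNode.new("here")]),
  TextNode.new(" now")
])

all_text = []
each_text(tree) { |t| all_text << t.content }
leaf_names = text_only_elements(tree).map(&:name)

puts all_text.inspect   # ["see ", "here", " now"]
puts leaf_names.inspect # ["a"]
```

In this toy tree the text-only search reaches the link text but not the text sitting directly inside the paragraph next to the link, so both walks end up doing much the same traversal work.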
 
why the lucky stiff

Okay, I get it. I guess I need to get //div[contains(text(), '...')]
working. Be assured, traverse_text will stick around.

_why
 
