Using Scrubyt on bad markup pages

Rolin Nelson

I am having trouble scrubbing a page that has bad markup. After
fetching the page, the Scrubyt::Extractor exits while parsing the
document. The Apple Safari web inspector shows numerous errors from the
page:

<meta> is not allowed inside <td>. Moving <meta> into the <head>.
Unmatched </embed> encountered. Ignoring tag.
Unmatched </span> encountered. Ignoring tag.
Unmatched </a> encountered. Ignoring tag.

Is there any way to scrub a poorly formatted page with scrubyt? I
am using the latest version (0.4.1) of scrubyt.

Thanks,
Rolin
 
Ryan Davis

Switch to mechanize and update your gems. scrubyt depends on hpricot
and a very old version of mechanize. Mechanize now uses nokogiri
instead of hpricot and is much more resilient to errors.
 
Rolin Nelson

Thank you, I will try to use Mechanize directly. However, when I
installed scrubyt 0.4.1, it did appear to have a dependency on nokogiri.
I've pasted the standard output below.

$ sudo gem install scrubyt-0.4.11.gem
Password:
Building native extensions. This could take a while...
Successfully installed scrubyt-0.4.1
Successfully installed nokogiri-1.2.3
2 gems installed
Installing ri documentation for scrubyt-0.4.1...
Installing ri documentation for nokogiri-1.2.3...
Installing RDoc documentation for scrubyt-0.4.1...
Installing RDoc documentation for nokogiri-1.2.3...
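Which parser an installed scrubyt release actually declares (nokogiri, hpricot, or both) can be checked from RubyGems itself. A small sketch using `Gem::Specification`; `runtime_deps` is a hypothetical helper, and the version string is the one reported as installed above.

```ruby
require 'rubygems'

# Hypothetical helper: list the runtime dependencies declared by an installed
# gem, optionally restricted to versions matching the given requirements.
# Returns nil when no matching version of the gem is installed.
def runtime_deps(name, *requirements)
  spec = Gem::Specification.find_all_by_name(name, *requirements).max_by(&:version)
  return nil unless spec

  spec.runtime_dependencies.map { |dep| "#{dep.name} (#{dep.requirement})" }
end

if $PROGRAM_NAME == __FILE__
  puts(runtime_deps('scrubyt', '0.4.1') || 'scrubyt 0.4.1 is not installed')
end
```

From the shell, `gem dependency scrubyt -v 0.4.1` reports the same information.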
 
