Jeremy Woertink
I'm doing some screen scraping with Mechanize. Because of the project, I
have to use an older version of Mechanize for now (0.8.4).
The goal is to take a string and insert a pipe '|' before every word
that is *not* inside an <a></a> element.
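A minimal sketch of one way to do this with a single `gsub`, assuming the input is a plain Ruby String (such as the result of `to_html`), that <a> elements are never nested, and that the non-breaking spaces appear as `&nbsp;` entities. The method name `pipe_bare_words` is hypothetical. The regex matches whole links, other tags, and `&nbsp;` entities and returns them unchanged. Only bare words get a pipe. Whitespace never matches, so spaces and line breaks survive untouched.

```ruby
# Hypothetical helper: prefix '|' to words that are outside <a>...</a>.
# Assumes no nested links and non-breaking spaces written as "&nbsp;".
BARE_WORD_PATTERN = %r{
  <a\b[^>]*>.*?</a>          # a whole link element: left as-is
  | <[^>]+>                  # any other tag (<pre>, </pre>, ...): left as-is
  | &nbsp;                   # non-breaking space entity: left as-is
  | ((?:(?!&nbsp;)[^\s<])+)  # a bare word (captured): gets the pipe
}mx

def pipe_bare_words(html)
  # Unmatched text (plain whitespace and newlines) is copied through by gsub.
  html.gsub(BARE_WORD_PATTERN) { $1 ? "|#{$1}" : $& }
end

sample = "<pre><a href=\"#\">DIV</a> id <a href=\"#\">\"main-div\"</a>\n</pre>"
puts pipe_bare_words(sample)
```

The same caveat applies as with any regex over HTML: a real parser is sturdier if the markup gets more varied than this.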
I have:
=> "<pre><a href=\"javascript:document.f6.SLID.value='F36';
document.f6.submit();\" onMouseOut=\"window.status='';\"
title=\"Select\" onMouseOver=\"window.status='Select'; return
true;\">HEAD</a> \n <a
href=\"javascript:document.f6.SLID.value='F37'; document.f6.submit();\"
onMouseOut=\"window.status='';\" title=\"Select\"
onMouseOver=\"window.status='Select'; return true;\">TITLE</a> <a
href=\"javascript:document.f6.SLID.value='F38'; document.f6.submit();\"
onMouseOut=\"window.status='';\" title=\"Select\"
onMouseOver=\"window.status='Select'; return true;\">\"test\"</a>\n<a
href=\"javascript:document.f6.SLID.value='F39'; document.f6.submit();\"
onMouseOut=\"window.status='';\" title=\"Select\"
onMouseOver=\"window.status='Select'; return true;\">BODY</a> \n
<a href=\"javascript:document.f6.SLID.value='F40';
document.f6.submit();\" onMouseOut=\"window.status='';\"
title=\"Select\" onMouseOver=\"window.status='Select'; return
true;\">DIV</a> id <a
href=\"javascript:document.f6.SLID.value='F41'; document.f6.submit();\"
onMouseOut=\"window.status='';\" title=\"Select\"
onMouseOver=\"window.status='Select'; return true;\">\"main-div\"</a>\n
<a href=\"javascript:document.f6.SLID.value='F42';
document.f6.submit();\" onMouseOut=\"window.status='';\"
title=\"Select\" onMouseOver=\"window.status='Select'; return
true;\">CSS-WITH-LINK</a> destination <a
href=\"javascript:document.f6.SLID.value='F43'; document.f6.submit();\"
onMouseOut=\"window.status='';\" title=\"Select\"
onMouseOver=\"window.status='Select'; return true;\">TO</a> <a
href=\"javascript:document.f6.SLID.value='F44'; document.f6.submit();\"
onMouseOut=\"window.status='';\" title=\"Select\"
onMouseOver=\"window.status='Select'; return true;\">:index</a>\n
<a href=\"javascript:document.f6.SLID.value='F45';
document.f6.submit();\" onMouseOut=\"window.status='';\"
title=\"Select\" onMouseOver=\"window.status='Select'; return
true;\">IMAGE</a> source <a
href=\"javascript:document.f6.SLID.value='F46'; document.f6.submit();\"
onMouseOut=\"window.status='';\" title=\"Select\"
onMouseOver=\"window.status='Select'; return
true;\">RENDER</a> image <a
href=\"javascript:document.f6.SLID.value='F47'; document.f6.submit();\"
onMouseOut=\"window.status='';\" title=\"Select\"
onMouseOver=\"window.status='Select'; return true;\">@image</a>\n
max-height <a href=\"javascript:document.f6.SLID.value='F48';
document.f6.submit();\" onMouseOut=\"window.status='';\"
title=\"Select\" onMouseOver=\"window.status='Select'; return
true;\">m-h</a>\n</pre>"
The words I want to put the pipe before have non-breaking spaces around
them. I had it working to the point where I could iterate through
everything and return a new string, but I lose all the line breaks and
non-breaking spaces using:
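new_body = ''
# split(" ") drops the spaces themselves, so they never reach new_body
template_body.to_html.split(" ").each do |el|
  # the inner split on "\n" likewise drops the line breaks
  el.split("\n").each do |e|
    unless e.empty? or e =~ /<\/?[^>]*>/
      # this only rebinds the block variable e; el is not modified
      e = '|' + e
    end
  end
  new_body += el
end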
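The loop above can keep its separators if the split pattern contains a capture group: `String#split` then returns the matched delimiters as elements, so `join` rebuilds the original text exactly. A minimal sketch of that approach, assuming the same input string and `&nbsp;` entities as before. Splitting on whole tags as well as whitespace keeps attributes such as `title=\"Select\"` from being treated as words, and an `inside_link` flag tracks whether the current token sits between `<a ...>` and `</a>`. The method name is hypothetical.

```ruby
# Hypothetical helper: tokenize on tags, &nbsp; and whitespace (all kept,
# because they are captured), then pipe only text tokens outside links.
def pipe_words_outside_links(html)
  inside_link = false
  tokens = html.split(/(<[^>]*>|&nbsp;|\s+)/)
  tokens.map do |token|
    case token
    when /\A<a\b/i then inside_link = true; token      # opening link tag
    when %r{\A</a>}i then inside_link = false; token   # closing link tag
    when /\A</, /\A\s*\z/, '&nbsp;' then token         # other tags, whitespace, empties
    else inside_link ? token : "|#{token}"             # a word: pipe it if outside a link
    end
  end.join
end

sample = "<pre><a href=\"#\">TO</a> <a href=\"#\">:index</a>\n max-height <a href=\"#\">m-h</a>\n</pre>"
puts pipe_words_outside_links(sample)
```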
Any ideas?
Thanks,
~Jeremy