html method on Watir doesn't return the real HTML

Mario Ruiz

Hi,
I'm using Watir to get the HTML of a page in order to validate every
single page at w3c.org, but what I get from Watir is something like:
<HTML lang=is xml:lang="is"
xmlns="http://www.w3.org/1999/xhtml"><HEAD><TITLE>Certus Games</TITLE>
<META http-equiv=Cache-Control content=no-cache>
<META http-equiv=Pragma content=no-cache>
<META http-equiv=Expires content=-1>
<META http-equiv=Content-Type content="text/html; charset=utf-8"><LINK
media=screen href="/CF/css/screen.css" type=text/css
rel=stylesheet><LINK media=print href="/CF/css/print.css" type=text/css
rel=stylesheet><LINK media=screen href="/CF/css/lib/iestyles.css"
type=text/css rel=stylesheet>
<SCRIPT src="/CF/js/lib/jquery.js" type=text/javascript></SCRIPT>
....

And the real content is:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" lang="is" xml:lang="is">
<head>
<meta http-equiv="Cache-Control" content="no-cache"/>
<meta http-equiv="Pragma" content="no-cache"/>
<meta http-equiv="Expires" content="-1"/>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"
/>
....


Any idea how to get the real html content?

thank you.
 
Mario Ruiz

Since I think it's impossible to get the real HTML with Watir, I'm trying
to do it using the net/http library.

In my case I first need to log the user in to the website and then go
to the pages whose HTML I want.
This is what I'm doing, but it seems the user isn't being logged in:

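require 'net/http'
require 'activesupport' # provides Hash#to_query (newer releases: require 'active_support/all')
server = "myserver.com"
port = 80

http = Net::HTTP.new(server, port)
http.start

data = {
  :loginName => "myLoginName",
  :password  => "myPassword"
}

# Log in and keep whatever cookie the server sets for the next request
resp = http.post("/login.do", data.to_query)
headers = {}
if resp.code == "200"
  headers["Cookie"] = resp["set-cookie"].to_s
end

# Request the page with the cookie and read the HTML from the response body
resp = http.get('/MyAccount.do', headers)
html = resp.body
http.finish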

Any idea???

Thanks.
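
One possible reason a login does not stick in a script like the one above: a Set-Cookie response header carries attributes (Path, Expires, HttpOnly) that do not belong in the Cookie request header, and a server may send several Set-Cookie headers. Below is a minimal sketch of building the Cookie value, assuming the header values are available as an array of strings (with Net::HTTP, resp.get_fields('set-cookie') returns such an array, or nil). cookie_header is a hypothetical helper name, not part of net/http, and the sample values are made up for illustration.

```ruby
# Hypothetical helper: turn Set-Cookie header values into one Cookie header value.
# Keeps only the leading name=value pair of each cookie and drops attributes
# such as Path, Expires or HttpOnly.
def cookie_header(set_cookie_values)
  pairs = (set_cookie_values || []).map do |value|
    value.split(';').first.to_s.strip
  end
  pairs.reject { |pair| pair.empty? }.join('; ')
end

# Example values as a server might send them (made up for illustration)
values = [
  "JSESSIONID=abc123; Path=/; HttpOnly",
  "lang=is; Expires=Wed, 19 May 2010 10:00:00 GMT; Path=/"
]
puts cookie_header(values)
```

In the script above this would be used as headers["Cookie"] = cookie_header(resp.get_fields('set-cookie')). It may also be worth checking resp.code: if the login answers with a redirect (3xx) instead of 200, the cookie is never copied into the headers.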
 
Heesob Park

Hi,

2009/5/19 Mario Ruiz said:
Any idea how to get the real html content?

If you are using Ruby 1.8.6 on Windows, try this code:

require 'win32/api'
require 'watir'

# IID of IPersistStreamInit, {7FD52380-4E07-101B-AE2D-08002B2EC713}, as 16 packed bytes
IID_IPersistStreamInit =
  [0x7FD52380,0x4E07,0x101B,0xAE,0x2D,0x08,0x00,0x2B,0x2E,0xC7,0x13].pack('LSSC8')

CreateStreamOnHGlobal = Win32::API.new('CreateStreamOnHGlobal', 'LLP', 'L', 'ole32')
OleSaveToStream = Win32::API.new('OleSaveToStream', 'LL', 'L', 'ole32')
GetHGlobalFromStream = Win32::API.new('GetHGlobalFromStream', 'LP', 'L', 'ole32')
GlobalSize = Win32::API.new('GlobalSize', 'L', 'L')
GlobalLock = Win32::API.new('GlobalLock', 'L', 'L')
CopyMemory = Win32::API.new('RtlMoveMemory', 'PLL', 'V')
GlobalUnlock = Win32::API.new('GlobalUnlock', 'L', 'L')

browser = Watir::Browser.new
browser.goto("http://www.google.com")

# Dig the raw IDispatch pointer out of the WIN32OLE document object
# (this depends on the in-memory object layout of 32-bit Ruby 1.8)
ptr = browser.document.inspect.scan(/:(0x[\da-f]+)?>/).to_s.hex
p = 0.chr * 4
CopyMemory.call(p,ptr+16,4)
data = p.unpack('L').first
p = 0.chr * 4
CopyMemory.call(p,data,4)
dispatch = p.unpack('L').first

# Read the COM vtable; its first entry is QueryInterface
lpVtbl = 0.chr * 4
table = 0.chr * 28
CopyMemory.call(lpVtbl,dispatch,4)
CopyMemory.call(table,lpVtbl.unpack('L').first,28)
table = table.unpack('L*')
queryInterface = Win32::API::Function.new(table[0],'PPP','L')
p = 0.chr * 4
hr = queryInterface.call(dispatch,IID_IPersistStreamInit,p)
persiststream = p.unpack('L').first

# Save the document into an in-memory stream and copy the bytes out
p = 0.chr * 4
CreateStreamOnHGlobal.call(0, 1, p)
stream = p.unpack('L').first
if OleSaveToStream.call(persiststream,stream) == 0
  p = 0.chr * 4
  GetHGlobalFromStream.call(stream,p)
  h = p.unpack('L').first
  size = GlobalSize.call(h)
  buf = 0.chr * size
  ptr = GlobalLock.call(h)
  if ptr != 0
    CopyMemory.call(buf,ptr,size)
    GlobalUnlock.call(h)   # GlobalUnlock takes the handle, not the locked pointer
    html = buf[16..-1]     # skip the 16-byte CLSID that OleSaveToStream writes first
  else
    html = nil
  end
end

puts html


Regards,
Park Heesob
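
The IID_IPersistStreamInit constant in the code above is the interface's GUID packed into the 16-byte layout COM expects: one 32-bit field, two 16-bit fields, then eight single bytes. Here is a small sketch of the same packing starting from the textual form of the GUID. guid_bytes is a hypothetical helper name, and the 'L' and 'S' directives use native byte order, which is little-endian on the x86 Windows machines the post targets.

```ruby
# Hypothetical helper: pack a textual GUID into the 16-byte binary form
# used for interface IDs (native byte order for the first three fields).
def guid_bytes(guid)
  hex = guid.gsub(/[{}-]/, '')
  raise ArgumentError, "not a GUID: #{guid}" unless hex =~ /\A[0-9a-fA-F]{32}\z/
  data1 = hex[0, 8].hex                              # 32-bit field
  data2 = hex[8, 4].hex                              # 16-bit field
  data3 = hex[12, 4].hex                             # 16-bit field
  data4 = hex[16, 16].scan(/../).map { |b| b.hex }   # eight single bytes
  ([data1, data2, data3] + data4).pack('LSSC8')
end

iid = guid_bytes('{7FD52380-4E07-101B-AE2D-08002B2EC713}')
puts iid.size                             # 16
puts iid.unpack('LSSC8').first.to_s(16)   # 7fd52380
```

The result is byte-for-byte the same string the array literal in the post produces, so either form can be passed to QueryInterface.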
 
Mario Ruiz

It's working! Thanks a lot.

Since I'm not using the latest Watir version, I replaced Watir::Browser.new
with Watir::IE.new, and it's working fine.

Again... thanks a lot.