Hpricot not returning the right HTML?

Hannes Rammer

Hi, when I fetch this URL

http://www.basketball-bund.net/index.jsp?Action=100&Verband=100

it shows page 1 of a table with basketball info.

If I want to go to page 2, this is the working URL:

http://www.basketball-bund.net/index.jsp?Action=100&Verband=100&startrow=10

As the parameter says, it starts with the 10th result.
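
The paging scheme described above can be sketched as a small helper. This is only an illustration: the name page_url is hypothetical, and the 10 rows per page is an assumption inferred from startrow=10 giving page 2, not something the site documents.

```ruby
require 'uri'

# Hypothetical helper: build the URL for a 1-based page number.
# Assumes 10 rows per page, inferred from startrow=10 being page 2.
def page_url(page, rows_per_page = 10)
  base = 'http://www.basketball-bund.net/index.jsp'
  params = { 'Action' => 100, 'Verband' => 100 }
  startrow = (page - 1) * rows_per_page
  # Page 1 is the plain URL, so only add startrow for later pages.
  params['startrow'] = startrow if startrow > 0
  "#{base}?#{URI.encode_www_form(params)}"
end
```

With these assumptions, page_url(2) gives the same URL as the working page 2 link above.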

If I enter that URL into the browser address bar it works fine, but
when I fetch the HTML for Hpricot I just get the first page back.

I've found out that if the startrow parameter is wrong the site always
shows the first page, but it seems to be right, since it works in the browser.

I get the same problem using URI.parse.

Here is my code:


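require 'rubygems'
require 'open-uri'
require 'hpricot'
require 'iconv'

q = 'http://www.basketball-bund.net/index.jsp?Action=100&Verband=100&viewid=&startrow=10'

# Fetch the page, convert it to UTF-8 and parse it with Hpricot
f = open(q)
doc = Hpricot(Iconv.conv('utf-8', f.charset, f.read))
form = doc.search("//form[@name='ligaliste']")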


Can anyone help me, please?

Thanks
 
Hannes Rammer

Hmm, no one replied. Well, in case someone has the same problem: I
have found the solution.



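require 'rubygems'
require 'mechanize'
require 'hpricot'
require 'iconv'

# search_string holds the query part, for example "Action=100&Verband=100&startrow=10"
q = "http://www.basketball-bund.net/index.jsp?#{search_string}"
agent = WWW::Mechanize.new
# Load the first page once so the agent picks up whatever the site sets on the first visit
agent.get("http://www.basketball-bund.net/index.jsp?Action=100&Verband=100")
# Now request the page we actually want, with the same agent
doc = agent.get(q)

doc = doc.search('body').to_html
# Convert ISO-8859-15 to UTF-8 (Iconv.conv returns a string, Iconv.iconv an array)
doc = Iconv.conv("UTF-8", "ISO-8859-15", doc)
# Parse it with Hpricot
doc = Hpricot(doc)
@q = q
form = doc.search("//form")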

It seems to be because of cookies or something similar: I needed to
load the page once before doing my own search. That's why I call
agent.get twice with the same agent.
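
If cookies really are the cause (which is only a guess here), the same two-step idea can be sketched without Mechanize. The helper names cookie_header and fetch_with_session are hypothetical, and the sketch assumes both URLs are on the same host and that the site hands out its session via Set-Cookie on the first request.

```ruby
require 'net/http'
require 'uri'

# Turn the Set-Cookie fields of a response into a Cookie request header
# value, keeping only the name=value part of each cookie.
def cookie_header(set_cookie_fields)
  Array(set_cookie_fields).map { |field| field.split(';').first.strip }.join('; ')
end

# Hypothetical helper: request first_url to collect cookies, then request
# second_url over the same connection with those cookies attached.
# Assumes both URLs point at the same host and port.
def fetch_with_session(first_url, second_url)
  first = URI.parse(first_url)
  second = URI.parse(second_url)
  Net::HTTP.start(first.host, first.port) do |http|
    warmup = http.get(first.request_uri)
    cookies = cookie_header(warmup.get_fields('set-cookie'))
    # Only send a Cookie header if the first response actually set one.
    headers = cookies.empty? ? {} : { 'Cookie' => cookies }
    http.get(second.request_uri, headers).body
  end
end
```

Calling fetch_with_session with the plain index URL first and the startrow URL second mirrors the two agent.get calls above. Mechanize does this cookie bookkeeping on its own, which is presumably why the second agent.get works.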

Hope this helps someone.


