HTML download challenge

Darko Aleksic

That would be my guess, yes. The secret that I have been unable to uncover
is what properties do I need to set to make it work.

Exactly what he said. After you open the connection, set its request
property "User-Agent" to anything you want that doesn't contain "Java"
(I tried conn.setRequestProperty("User-Agent", "Paul") and it worked).
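A minimal sketch of that approach, assuming a plain HttpURLConnection and a UTF-8 response. The class and method names (UserAgentExample, openWithUserAgent, fetch) are made up for illustration; the URL is whatever you pass on the command line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class UserAgentExample {

    // Opens a connection object (no network traffic yet) and overrides
    // the User-Agent header that would otherwise default to "Java/<version>".
    static HttpURLConnection openWithUserAgent(String url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        return conn;
    }

    // Sends the request and reads the whole body as text.
    // Assumption: the server answers in UTF-8.
    static String fetch(HttpURLConnection conn) throws IOException {
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
        } finally {
            conn.disconnect();
        }
        return body.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java UserAgentExample <url>");
            return;
        }
        // "Paul" is the value tried above; any string without "Java" should do.
        System.out.println(fetch(openWithUserAgent(args[0], "Paul")));
    }
}
```

The header has to be set before the request is actually sent, that is, before the first call to connect() or getInputStream().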

I don't know what you are trying to do, but I just started using this
package, and it is handy (it saves you some time and headaches):
http://sourceforge.net/projects/htmlparser

Darko
 
Patrick May

Paul Battersby said:
Here is some sample code. You will see me loading from 2 urls. The
first one works. The second (the one I care about) does not even
though my browser (Internet Explorer) has no trouble with the url.

I ran this against the RequestHeaderExample servlet provided with
Tomcat and got the following:

user-agent Java/1.5.0_02
host localhost:8080
accept text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
connection keep-alive
content-type application/x-www-form-urlencoded

Evidently the Java classes you're using are setting these defaults
and, as noted in other replies, Google doesn't like the user agent.

Regards,

Patrick
 
Paul Battersby

Yes, I did this:

System.setProperty("http.agent", "Test/1.0" +
"(" + System.getProperty("os.name") + ")");

and it worked.
 
Raymond DeCampo

Andrea said:
that's what I thought, but I dumped all the request parameters, and the
list turned out to be empty. If the class does it, it hides it.
I would really like to know how they do it.

Using the TcpTunnelGui from apache SOAP, I see that this is what is sent
by Java when running the above program:

============= start =========================================
GET / HTTP/1.1
User-Agent: Java/1.5.0_03
Host: localhost:8888
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: keep-alive
Content-type: application/x-www-form-urlencoded

GET /search?q=business HTTP/1.1
User-Agent: Java/1.5.0_03
Host: localhost:8888
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: keep-alive
Content-type: application/x-www-form-urlencoded

============= end =========================================

You could probably use the HttpClient module from apache to change the
User-Agent field.

Ray
 
sks

Raymond DeCampo said:
Using the TcpTunnelGui from apache SOAP, I see that this is what is sent
by Java when running the above program:

============= start =========================================
GET / HTTP/1.1
User-Agent: Java/1.5.0_03
Host: localhost:8888
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: keep-alive
Content-type: application/x-www-form-urlencoded

GET /search?q=business HTTP/1.1
User-Agent: Java/1.5.0_03
Host: localhost:8888
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: keep-alive
Content-type: application/x-www-form-urlencoded

============= end =========================================

Ah so I was right ;)
 
Andrea Desole

sks said:
Ah so I was right ;)

I would say so, yes.
I'm still wondering why I didn't see it with getRequestProperties().
Maybe I misunderstand what the method does.
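One possible explanation, offered as a guess rather than a confirmed answer: getRequestProperties() reports only the headers that were set explicitly on the connection, while defaults such as User-Agent seem to be filled in by the implementation when the request is written. The sketch below compares the map before and after an explicit setRequestProperty() call, without ever connecting. RequestPropertiesDemo and explicitHeaders are hypothetical names; the URL is the tunnel address from the trace above.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLConnection;
import java.util.List;
import java.util.Map;

class RequestPropertiesDemo {

    // Returns the request properties visible on a not-yet-connected
    // connection. If userAgent is non-null it is set explicitly first.
    static Map<String, List<String>> explicitHeaders(String url, String userAgent) throws IOException {
        URLConnection conn = URI.create(url).toURL().openConnection();
        if (userAgent != null) {
            conn.setRequestProperty("User-Agent", userAgent);
        }
        // Must be called before connecting; afterwards it throws
        // IllegalStateException.
        return conn.getRequestProperties();
    }

    public static void main(String[] args) throws IOException {
        String url = "http://localhost:8888/";
        // No network traffic happens here: openConnection() only creates the object.
        System.out.println("untouched connection: " + explicitHeaders(url, null));
        System.out.println("after setRequestProperty: " + explicitHeaders(url, "Paul"));
    }
}
```

If the first map comes back without a User-Agent entry, that would match what Andrea saw: the default is not stored as a request property, so only a wire-level trace like the one from TcpTunnelGui shows it.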
 
