HttpURLConnection and GET


Frank Natoli

I understand that HttpURLConnection can be connected and then the argument
URL (file) read using the stream returned from getInputStream().

But what if I want to establish the connection, then repeatedly request
a file, download it, request another file, and so on?

The HTTP server (SunONE) appears to disregard data written to the
output stream:

GET /filename_to_download HTTP/1.1

and simply always gives me index.html at the server root.

How do I interact with the HTTP server to retrieve a series of files?

Thanks.
 

Roedy Green

I understand that HttpURLConnection can be connected and then the argument
URL (file) read using the stream returned from getInputStream().

I think you need a new connection for each file. This is why it pays to
bundle files up in jars rather than download them one by one.

However, I have seen Images load over and over, presumably with the
same connection and the same filename. I have not snooped on the
conversation to verify that.
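For what it's worth, the usual pattern is one HttpURLConnection object
per file; with the JDK's default http.keepAlive=true, the runtime
caches the underlying TCP connection between requests to the same host
(when the server permits it), so a fresh connection object need not
mean a fresh TCP handshake. A minimal sketch, with a hypothetical host
and file names:

```java
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;

// Sketch: one HttpURLConnection per file. The JDK may transparently
// reuse the underlying socket between these requests (keep-alive).
public class SequentialFetch {

    // Hypothetical host; substitute the real one.
    public static URL urlFor(String file) {
        try {
            return new URL("http", "www.example.com", file);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        for (String file : new String[] { "/a.jar", "/b.jar" }) {
            HttpURLConnection conn =
                    (HttpURLConnection) urlFor(file).openConnection();
            try (InputStream in = conn.getInputStream()) {
                byte[] buf = new byte[8192];
                long total = 0;
                for (int n; (n = in.read(buf)) != -1; ) total += n;
                System.out.println(file + ": " + total + " bytes");
            }
            // Draining and closing the stream lets the JDK return the
            // socket to its keep-alive cache for the next request.
        }
    }
}
```

Whether the socket is actually reused depends on the server honoring
keep-alive, which is why the behavior can look inconsistent.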
 

Babu Kalakrishnan

Roedy said:
I think you need a new connection for each file. This is why it pays to
bundle files up in jars rather than download them one by one.

However, I have seen Images load over and over, presumably with the
same connection and the same filename. I have not snooped on the
conversation to verify that.

The "new connection" for every file (actually it should be "for every
request") isn't strictly true. The HTTP/1.1 specification allows
persistent connections, wherein you could pipeline multiple requests
through a single TCP connection, basically to avoid the big overhead
of establishing a connection. You should find the details in the
relevant RFC (2616, if my memory serves me right).

However, different clients (and servers) implement these differently,
and at least until the JDK 1.3 timeframe Sun's implementation was one
of the worst. (For example, if you opened a connection to a URL that
carried a 100 MB download, no further connections to the same host
would succeed until the download completed, because Sun's dumb
HttpClient insisted on reusing the connection.)

I haven't checked whether that's still true in later versions, because
I found there were better HttpClient implementations available in the
market that gave you far more flexibility, and hence I never went back
to the Sun HttpClient. I'd suggest that the OP take a look at the
Jakarta (Apache) implementation. (There are a few more good ones which
I'm certain would show up in a Google search.)

And as for Roedy's comment about "Images loading over and over", I'd
think it's mostly due to browser caching that you don't perceive a
delay on subsequent loads. The browser doesn't actually retrieve them
multiple times; it uses the cached copy it has (unless the resource
declares itself non-cacheable or its expiry period has passed, neither
of which happens very often in the case of static resources).

BK
 

Thomas Weidenfeller

Babu said:
The "new connection" for every file (actually it should be "for every
request") isn't strictly true. The HTTP/1.1 specification allows
persistent connections, wherein you could pipeline multiple requests
through a single TCP connection, basically to avoid the big overhead
of establishing a connection.

Pipelining and keeping the connection open are related, but different
concepts. HTTP's normal way of working is that you send a request and
then wait for a response. You can keep the connection open after you get
the response (HTTP/1.1) to avoid the overhead of setting up the
connection for the next request. But this alone is not pipelining.

When you do pipelining you send requests without waiting for a response
before sending the next request. You do that over a connection which you
keep open. The idea with pipelining is to avoid delaying requests
because there is an outstanding response. Pipelining is an experimental
feature, while keeping a connection open is an official feature since
HTTP/1.1 and was often hacked into HTTP/1.0 (Keep-Alive).
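The distinction can be sketched at the wire level. In this hypothetical
snippet (made-up host and paths), get() builds one complete HTTP/1.1
request; a persistent connection without pipelining writes one request
and reads the whole response before writing the next, while pipelining
writes several requests back-to-back before reading any response:

```java
// Wire-level sketch of persistent connections vs. pipelining.
public class RequestText {

    // One complete HTTP/1.1 GET request, ready to write to a socket.
    public static String get(String path, String host) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "\r\n";   // blank line terminates the request headers
    }

    public static void main(String[] args) {
        // Persistent connection, no pipelining: write get("/a", ...),
        // read the entire response, only then write get("/b", ...)
        // on the same open socket.
        //
        // Pipelining: write both requests back-to-back before reading
        // either response; the bytes below would go out in one burst.
        String pipelined = get("/a", "example.com") + get("/b", "example.com");
        System.out.println(pipelined.replace("\r\n", "\\r\\n "));
    }
}
```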

/Thomas
 

Roedy Green

Babu said:
And as for Roedy's comment about "Images loading over and over", I'd
think it's mostly due to browser caching that you don't perceive a
delay on subsequent loads. The browser doesn't actually retrieve them
multiple times; it uses the cached copy it has (unless the resource
declares itself non-cacheable or its expiry period has passed, neither
of which happens very often in the case of static resources).

I have seen this with webcams, where simply flushing the Image causes
it to reload a new jpg. You get a new image, so a new file had to have
been downloaded over the same connection.
 

FrankNatoli

Babu said:
The "new connection" for every file (actually it should be "for every
request") isn't strictly true. The HTTP/1.1 specification allows
persistent connections, wherein you could pipeline multiple requests
through a single TCP connection, basically to avoid the big overhead
of establishing a connection. You should find the details in the
relevant RFC (2616, if my memory serves me right).

However, different clients (and servers) implement these differently,
and at least until the JDK 1.3 timeframe Sun's implementation was one
of the worst. (For example, if you opened a connection to a URL that
carried a 100 MB download, no further connections to the same host
would succeed until the download completed, because Sun's dumb
HttpClient insisted on reusing the connection.)

I haven't checked whether that's still true in later versions, because
I found there were better HttpClient implementations available in the
market that gave you far more flexibility, and hence I never went back
to the Sun HttpClient. I'd suggest that the OP take a look at the
Jakarta (Apache) implementation. (There are a few more good ones which
I'm certain would show up in a Google search.)

And as for Roedy's comment about "Images loading over and over", I'd
think it's mostly due to browser caching that you don't perceive a
delay on subsequent loads. The browser doesn't actually retrieve them
multiple times; it uses the cached copy it has (unless the resource
declares itself non-cacheable or its expiry period has passed, neither
of which happens very often in the case of static resources).

BK

If I understand correctly, HttpClient is an Apache product, whereas
HttpURLConnection is part of the Sun API. I have seen sample code for
HttpClient that appears to perform GET interactions, but that same
interaction does not appear to work with HttpURLConnection.
Specifically, after calling connect(), then using a PrintWriter or
BufferedWriter to send "GET /filename HTTP1.1\r\n", then using a
BufferedReader to get the reply and data, the reply is always
"index.html", not "filename". That is, the SunONE virtual server
appears to disregard the "GET /filename HTTP1.1".

If I could get over the hump of getting SunONE to pay attention to my
"GET", I could probably keep the connection open for repeat downloads.
How to do that?

Thanks for your time.
 

Babu Kalakrishnan

Thomas said:
Pipelining and keeping the connection open are related, but different
concepts. HTTP's normal way of working is that you send a request and
then wait for a response. You can keep the connection open after you get
the response (HTTP/1.1) to avoid the overhead of setting up the
connection for the next request. But this alone is not pipelining.

When you do pipelining you send requests without waiting for a response
before sending the next request. You do that over a connection which you
keep open. The idea with pipelining is to avoid delaying requests
because there is an outstanding response. Pipelining is an experimental
feature, while keeping a connection open is an official feature since
HTTP/1.1 and was often hacked into HTTP/1.0 (Keep-Alive).

OK - the terminology "pipelining" might have been wrong (I apologize
for the error), but what I was referring to was HttpURLConnection's
lame implementation of the "Keep-Alive" (1.0) or persistent-connections
(1.1) feature, wherein it would refuse to open a new connection to a
host to which it was already connected.

I hit this problem when we had an application that needed to use a
persistent connection to update a (non-browser) client whenever the
server had something new to send. (We just needed the HTTP protocol
because it was a client specification.) But then the clients sometimes
had to send some stuff to the server asynchronously, and it would never
go through until the earlier connection was either closed or timed out.
We eventually ended up using the HttpClient implementation from
http://www.innovation.ch, and we've never had a problem report to date
(about 4 years since the project was deployed). Yes, I admit our
implementation was somewhat hacky (as hacky as any server-push
technique gets), but we had the liberty of only having to work with one
specified type of server, Tomcat 4.

BK


 

Babu Kalakrishnan

Roedy said:
I have seen these with webcams where simply flushing the Image causes
it to reload a new jpg. You need a new image - a new file had to have
been downloaded using the same Image connection.

Dynamic content such as webcam output is not what I was referring to
(sorry if I misunderstood your post).

For that matter, Sun's implementation may very well work fine for such
scenarios. The difference (as I have observed; I may be wrong) is that
most third-party client implementations will open a new connection if
the existing connection hasn't completed its response, unless you
explicitly tell them not to (which might be the case if the server
supports pipelining; see Thomas's post for the difference). They reuse
connections only when one is free, whereas Sun's implementation would
doggedly stick to the policy of reusing connections, even if it had to
wait an eternity to do so.

BK
 

Babu Kalakrishnan

Frank said:
If I understand correctly, HttpClient is an Apache product, whereas
HttpURLConnection is part of the Sun API. I have seen sample code for
HttpClient that appears to perform GET interactions, but that same
interaction does not appear to work with HttpURLConnection.
Specifically, after calling connect(), then using a PrintWriter or
BufferedWriter to send "GET /filename HTTP1.1\r\n", then using a
BufferedReader to get the reply and data, the reply is always
"index.html", not "filename". That is, the SunONE virtual server
appears to disregard the "GET /filename HTTP1.1".

If I could get over the hump of getting SunONE to pay attention to my
"GET", I could probably keep the connection open for repeat downloads.
How to do that?

Have you checked whether the requests you're submitting are fully 1.1
compliant? For instance, a simple "GET /xyz" request line is not deemed
sufficient under HTTP/1.1; you also need a request header that
specifies the hostname (whereas it might have been sufficient if your
request was labelled HTTP/1.0).
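A hand-rolled HTTP/1.1 request could look like the following sketch
(hypothetical host and path, not the OP's actual server). Note the
"HTTP/1.1" spelling in the request line and the Host header; without
Host, a server running multiple virtual servers cannot tell which site
is meant and may fall back to a default document such as index.html:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Sketch of a hand-written HTTP/1.1 GET over a plain Socket.
public class RawGet {

    public static String requestFor(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"      // mandatory in HTTP/1.1
             + "Connection: close\r\n"       // ask the server to close afterwards
             + "\r\n";                       // blank line ends the headers
    }

    public static void main(String[] args) throws IOException {
        String host = "www.example.com";     // hypothetical host
        try (Socket s = new Socket(host, 80)) {
            s.getOutputStream().write(
                    requestFor(host, "/filename").getBytes(StandardCharsets.US_ASCII));
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(s.getInputStream(), StandardCharsets.US_ASCII));
            for (String line; (line = in.readLine()) != null; ) {
                System.out.println(line);    // status line, headers, body
            }
        }
    }
}
```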

BK
 

FrankNatoli

Compliant? I guess what I really, really need is some sample code that
(1) connects an HttpURLConnection, (2) writes a "GET" request and (3)
reads the reply. My code gets no errors, but also never gets anything
except index.html, regardless of the "GET" argument.

Would be MOST grateful if you could vector me to some documentation
that describes HttpURLConnection operation. Sun appears to have ZERO
information in that regard. Sun is good with reference material, but
abysmal with educational material.

Thanks for your time.
 

Raymond DeCampo

Frank said:
Compliant? I guess what I really, really need is some sample code that
(1) connects an HttpURLConnection, (2) writes a "GET" request and (3)
reads the reply. My code gets no errors, but also never gets anything
except index.html, regardless of the "GET" argument.

Would be MOST grateful if you could vector me to some documentation
that describes HttpURLConnection operation. Sun appears to have ZERO
information in that regard. Sun is good with reference material, but
abysmal with educational material.

It seems to me that you are misusing the HttpURLConnection class. It is
a high-level class that is supposed to do the grunt work of sending
"GET /blah.html" to the server for you. You are probably just writing
your GET request to the bit bucket.

If you want to use HttpURLConnection, then construct it with the URL you
want to retrieve.

If you want to be in the business of writing the GET request yourself,
just open a socket and go for it.
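The first approach might look like this sketch (hypothetical URL): the
file name goes into the URL itself, and HttpURLConnection writes the
request line, Host header, and so on for you.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;

// Minimal sketch: let HttpURLConnection build the GET request itself.
public class FetchOne {

    // Hypothetical server; substitute the real host and file name.
    public static URL target(String file) {
        try {
            return new URL("http", "www.example.com", file);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        HttpURLConnection conn =
                (HttpURLConnection) target("/filename_to_download").openConnection();
        conn.setRequestMethod("GET");   // GET is the default; shown for clarity
        System.out.println("Status: " + conn.getResponseCode());
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            for (String line; (line = in.readLine()) != null; ) {
                System.out.println(line);
            }
        }
    }
}
```

To fetch a series of files, repeat this with a new URL each time; the
JDK's keep-alive cache may reuse the underlying socket behind the
scenes.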

HTH,
Ray
 
