Dave said:
Have a look at my implementation of wget. The HttpURLConnection class
inherits from URLConnection. The header info is visible there.
Okay.
I'm pretty sure Keep-Alive is a TCP function.
It is, but I suspect Roedy actually meant to ask about the "Persistent
Connection" feature of HTTP 1.1. The HttpURLConnection API docs do say
"Calling the disconnect() method may close the underlying socket if a
persistent connection is otherwise idle at that time," which is at least
one avenue of approach. If checking many URLs on the same server,
however, then it may be counterproductive to do that.
public class wget {
public static void main(String [] args) {
[...]
URL u = new URL(args[0]);
HttpURLConnection h =
(HttpURLConnection)u.openConnection();
And that should work reliably, as long as the URL is an HTTP URL.
InputStream is = h.getInputStream();
BufferedReader in = new BufferedReader(
new InputStreamReader(is));
That, however, is pretty broken. The InputStreamReader is built without
an explicit charset, so it decodes with the system's default charset and
depends on that being suitable. Between ASCII, UTF-8, and any of the
ISO-8859 family of charsets the ASCII range will come out right, as they
all agree on the first 128 code points, but it will break on many others.
String line;
while ((line = in.readLine()) != null) {
System.out.println(line);
}
And that's pretty naive. It does not account for any content coding
that might have been specified in the header, nor (in the case of HTML
content) for a charset specified via a meta tag. It also doesn't do
anything sensible with content that is inherently binary.
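As a rough sketch of the charset part only, the reader could take its
charset from the Content-Type response header instead of the system
default. The class name CharsetAwareFetch and the helper charsetOf are
made up for illustration, and the ISO-8859-1 fallback is an assumption
(it was the historical HTTP/1.1 default for text types). This still does
nothing about content codings, meta tags, or binary content.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetAwareFetch {
    // Hypothetical helper: pull the "charset" parameter out of a
    // Content-Type value such as "text/html; charset=UTF-8".
    // Returns the fallback if the header or parameter is missing,
    // or if the named charset is illegal or unsupported.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection h =
            (HttpURLConnection) URI.create(args[0]).toURL().openConnection();
        // Assumed fallback: ISO-8859-1 when the server names no charset.
        Charset cs = charsetOf(h.getContentType(), StandardCharsets.ISO_8859_1);
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(h.getInputStream(), cs))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```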
[...]
That also doesn't answer the actual question, which was about obtaining
the content type without retrieving the entire content. Assuming that
the point is to reduce impact on the server, using the "HEAD" HTTP
method to make the request is the only way that has a reasonable chance
of solving the problem. If the concern is more for the client side then
the client could always close the connection without reading the content
-- for large entities that might prevent the content from being fully
retrieved when the server treats "HEAD" as "GET".
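A minimal sketch of the "HEAD" approach with HttpURLConnection might
look like the following. The class name HeadContentType and the method
contentTypeOf are invented for the example; setRequestMethod,
getResponseCode, getContentType, and disconnect are the standard API
calls. It assumes the URL is an HTTP URL and that the server answers
"HEAD" properly.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;

class HeadContentType {
    // Ask for the headers only and return the Content-Type value,
    // or null if the server did not send one.
    static String contentTypeOf(URL url) throws IOException {
        HttpURLConnection h = (HttpURLConnection) url.openConnection();
        try {
            h.setRequestMethod("HEAD");
            h.getResponseCode();      // forces the request to be sent
            return h.getContentType();
        } finally {
            // May close the underlying socket if the persistent
            // connection is idle; see the caveat below.
            h.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(contentTypeOf(URI.create(args[0]).toURL()));
    }
}
```

When checking many URLs on the same server, it may be better to drop the
disconnect() call so that the persistent connection can be reused.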
John Bollinger