URLConnection tricks


Roedy Green

URLConnection urlc = url.openConnection();
urlc.setAllowUserInteraction( false );
urlc.setDoInput( true );
urlc.setDoOutput( false );
urlc.setUseCaches( true );
urlc.connect();
String mime = urlc.getContentType();


How would I modify that code to avoid getting the contents of the
file, get just the headers?

How would I turn off Keep-Alive?
 

Thomas Fritsch

Roedy said:
URLConnection urlc = url.openConnection();
urlc.setAllowUserInteraction( false );
urlc.setDoInput( true );
urlc.setDoOutput( false );
urlc.setUseCaches( true );
urlc.connect();
String mime = urlc.getContentType();


How would I modify that code to avoid getting the contents of the
file, get just the headers?
don't know...
How would I turn off Keep-Alive?
I think
urlc.setRequestProperty("Connection", "Close");
should do the trick. See http://www.faqs.org/rfcs/rfc2068.html (chapter
"14.10 Connection").
But I didn't actually test it.
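
For anyone who wants to try the suggestion above, here is a minimal sketch. The class name NoKeepAlive and the helper openWithoutKeepAlive are made up for illustration; the only library calls are the standard java.net ones. It assumes the header has to be set before the connection is used, which is how URLConnection request properties work in general.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;

public class NoKeepAlive {

    /**
     * Hypothetical helper: opens a connection that asks the server to close
     * the socket once the response has been sent.
     */
    static URLConnection openWithoutKeepAlive(URL url) throws IOException {
        URLConnection urlc = url.openConnection();
        // Request properties must be set before connecting;
        // setting them afterwards throws IllegalStateException.
        urlc.setRequestProperty("Connection", "close");
        return urlc;
    }

    public static void main(String[] args) throws IOException {
        URL url = URI.create(args[0]).toURL();
        URLConnection urlc = openWithoutKeepAlive(url);
        urlc.connect();
        System.out.println(urlc.getContentType());
        if (urlc instanceof HttpURLConnection) {
            // Release the connection on the client side as well.
            ((HttpURLConnection) urlc).disconnect();
        }
    }
}
```

If the goal is to switch persistent connections off for the whole JVM rather than per request, the http.keepAlive system property (for example -Dhttp.keepAlive=false) is another option worth checking against the networking properties documentation.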
 

Gordon Beaton

URLConnection urlc = url.openConnection();
urlc.setAllowUserInteraction( false );
urlc.setDoInput( true );
urlc.setDoOutput( false );
urlc.setUseCaches( true );
urlc.connect();
String mime = urlc.getContentType();


How would I modify that code to avoid getting the contents of the
file, get just the headers?

Use HEAD instead of GET. They are identical, except that the server
response must not include a message body. That probably rules out the
use of URLConnection though.
How would I turn off Keep-Alive?

Include the request header "Connection: close".

/gordon
 

Chris Uppal

Roedy said:
URLConnection urlc = url.openConnection();
urlc.setAllowUserInteraction( false );
urlc.setDoInput( true );
urlc.setDoOutput( false );
urlc.setUseCaches( true );
urlc.connect();
String mime = urlc.getContentType();


How would I modify that code to avoid getting the contents of the
file, get just the headers?

IIRC: cast the URLConnection to an HttpURLConnection and use
setRequestMethod("HEAD") before you connect().

BTW, a surprisingly large number of servers ignore the "HEAD" and treat it as a
"GET" -- that shouldn't affect your logic, but it can be confusing if you are
debugging with a network sniffer.

-- chris
 

Thomas Fritsch

Roedy said:
URLConnection urlc = url.openConnection();
urlc.setAllowUserInteraction( false );
urlc.setDoInput( true );
urlc.setDoOutput( false );
urlc.setUseCaches( true );
urlc.connect();
String mime = urlc.getContentType();


How would I modify that code to avoid getting the contents of the
file, get just the headers?
See
http://java.sun.com/j2se/1.4.2/docs...ction.html#setRequestMethod(java.lang.String)
==> Probably
((HttpURLConnection) urlc).setRequestMethod("HEAD");
will do. But obviously this depends on my unproved assumption, that
url.openConnection()
returns an instance of java.net.HttpURLConnection.
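
A sketch of how that could look with the assumption checked instead of taken on faith. The class name HeadContentType and the method contentTypeViaHead are invented for this example. For a non-HTTP URL (file:, ftp:, ...) it simply falls back to the plain URLConnection behaviour, which is one possible choice among several.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;

public class HeadContentType {

    /** Hypothetical helper: returns the Content-Type, using HEAD when the URL is HTTP. */
    static String contentTypeViaHead(URL url) throws IOException {
        URLConnection urlc = url.openConnection();
        if (!(urlc instanceof HttpURLConnection)) {
            // Not an HTTP(S) URL, so there is no HEAD method to use.
            return urlc.getContentType();
        }
        HttpURLConnection http = (HttpURLConnection) urlc;
        // Must be called before connect(); asks for headers only.
        http.setRequestMethod("HEAD");
        http.setAllowUserInteraction(false);
        try {
            http.connect();
            return http.getContentType();
        } finally {
            // May close the underlying socket if it is otherwise idle.
            http.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        URL url = URI.create(args[0]).toURL();
        System.out.println(contentTypeViaHead(url));
    }
}
```

Since some servers reportedly answer HEAD as if it were GET, the sketch never opens the input stream, so the client does not read a body even if one is sent.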
 

Dave Monroe

Roedy Green said:
URLConnection urlc = url.openConnection();
urlc.setAllowUserInteraction( false );
urlc.setDoInput( true );
urlc.setDoOutput( false );
urlc.setUseCaches( true );
urlc.connect();
String mime = urlc.getContentType();


How would I modify that code to avoid getting the contents of the
file, get just the headers?

How would I turn off Keep-Alive?


Have a look at my implementation of wget. The HttpURLConnection class
inherits from URLConnection. The header info is visible there.

I'm pretty sure Keep-Alive is a tcp function.

import java.net.*;
import java.io.*;
import java.util.*;

public class wget {
    public static void main(String [] args) {
        if(args.length == 0) {
            System.err.println("usage: java wget url");
            System.exit(1);
        }

        try {
            URL u = new URL(args[0]);
            HttpURLConnection h =
                (HttpURLConnection)u.openConnection();

            InputStream is = h.getInputStream();

            BufferedReader in = new BufferedReader(
                new InputStreamReader(is));

            String line;

            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }

        }
        catch(Exception e) {
            System.err.println(e);
            System.exit(1);
        }
    }
}
 

John C. Bollinger

Dave said:
Have a look at my implementation of wget. The HttpURLConnection class
inherits from URLConnection. The header info is visible there.
Okay.

I'm pretty sure Keep-Alive is a tcp function.

It is, but I suspect Roedy actually meant to ask about the "Persistent
Connection" feature of HTTP 1.1. The HttpURLConnection API docs do say
"Calling the disconnect() method may close the underlying socket if a
persistent connection is otherwise idle at that time," which is at least
one avenue of approach. If checking many URLs on the same server,
however, then it may be counterproductive to do that.
public class wget {
public static void main(String [] args) {
[...]

URL u = new URL(args[0]);
HttpURLConnection h =
(HttpURLConnection)u.openConnection();

And that should work reliably, as long as the URL is an HTTP URL.
InputStream is = h.getInputStream();

BufferedReader in = new BufferedReader(
new InputStreamReader(is));

That, however, is pretty broken. It depends on the system's default
charset being suitable. It will work for ASCII, UTF-8, or any of the
ISO-8859 family of charsets, as they all agree on the first 128 code
points, but it will break on many others.
String line;

while ((line = in.readLine()) != null) {
System.out.println(line);
}

And that's pretty naive. It does not account for any content coding
that might have been specified in the header, nor (in the case of HTML
content) for a charset specified via a meta tag. It also doesn't do
anything sensible with content that is inherently binary.
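
To make the charset point concrete, here is a rough sketch of picking the charset out of the Content-Type header rather than relying on the platform default. The class ContentTypeCharset and the method charsetOf are hypothetical names, and the parsing is deliberately simple: it handles the common "type/subtype; charset=name" form only and ignores meta tags and content codings entirely.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ContentTypeCharset {

    /**
     * Hypothetical helper: returns the charset named in a Content-Type value,
     * or the caller's fallback when none is given or the name is not usable.
     */
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Parameter names are case-insensitive.
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim();
                // The value may be a quoted string.
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=UTF-8", StandardCharsets.ISO_8859_1));
        System.out.println(charsetOf("text/plain", StandardCharsets.ISO_8859_1));
    }
}
```

In the wget code this would be used as new InputStreamReader(is, charsetOf(h.getContentType(), StandardCharsets.ISO_8859_1)), with the fallback being whatever the caller considers a sensible default.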

[...]

That also doesn't answer the actual question, which was about obtaining
the content type without retrieving the entire content. Assuming that
the point is to reduce impact on the server, using the "HEAD" HTTP
method to make the request is the only way that has a reasonable chance
of solving the problem. If the concern is more for the client side then
the client could always close the connection without reading the content
-- for large entities that might prevent the content from being fully
retrieved when the server treats "HEAD" as "GET".


John Bollinger
(e-mail address removed)
 

Roedy Green

I'm pretty sure Keep-Alive is a tcp function.

Connection: Keep-Alive is also a field in the GET request header that
says to keep the socket open for more traffic after the response is
sent. Most of the time you don't need it, and you needlessly keep the
server dangling.
 

Andrew Thompson


Neat! Would you consider adding some JS(!)
to allow configuring it by URL?

If your applet accepted a 'url' parameter,
you could write the parameter into the applet
tag using JS, or default to the 'no param'
applet for those UA's with no JS.

Yes, yes. I am familiar with your aversion
to mixing JS/Java. But you can gain extra
functionality while losing nothing in this
case.

The ability to specify the entire check
as an URL would be most handy.
[ I can recall situations in the c.i.w.a.h.
and c.i.w.a.s groups where I could show
someone what is going wrong using the
added functionality. ]
 

Roedy Green

Neat! Would you consider adding some JS(!)
to allow configuring it by URL?

You want to invoke the applet, and give it a parameter with just an
URL.

e.g. http://mindprod.com/mimecheck.html?http://somewhere.org/app.jnlp

JavaScript on that page takes the parameter, and feeds it to MimeCheck
as an Applet tag, which then starts up and displays the URL, and
auto-presses TEST. The browser then views the completed applet page.

I don't know enough JavaScript to do that, but I do know how to add a
tag called URL to the Applet. If you want to cook up the javascript
part, MimeCheck version 1.2 now has an optional

<param name="URL" value="http://mindprod.com/esper.jnlp">
 

Andrew Thompson

...
Your code is now hooked up. You can now test a url all on the command
line.

Excellent, I checked it worked from here
and removed the files from my server.
For example to test the MIME types being sent by
http://java.sun.com/index.jsp you would type into your browser's
command line:
http://mindprod.com/urlcheck.html?url=http://java.sun.com/index.jsp

Note that you can _also_ use..

http://mindprod.com/urlcheck.html?=http://java.sun.com/index.jsp
or
<http://mindprod.com/urlcheck.html?t...llutterlyignore=http://java.sun.com/index.jsp>

;-)

(shrugs) The 'url' bit seemed logical,
but the JS will work without it. [All
it needs is '?=' before the actual URL ]
 

Roedy Green

Note that you can _also_ use..

Not really. That is not the documented interface. I reserve the
right to create new optional parameters, and if you used something
other than url, it could stop working.
 
