Simulating a browser to get the redirected URL location

Roedy Green

I am trying to write some code that chases HTTP redirect chains and
makes a list of URLs that have been permanently moved and where they
went.

The code I think should work to get at a "Location:" field in the
response header does not work. I just get null.

urlc.connect();
String location = urlc.getHeaderField( "Location" );

I found some code on the net that claims to work, but it is pretty
ugly:
http://www.kodejava.org/examples/198.html

Is there something obvious I am missing?
 
Arne Vajhøj

Use Jakarta HttpClient instead.

Arne
 
Roedy Green

Is urlc set up to automatically follow redirects?

Yes, that works. It is also the default. When you turn it off, you get
a short message as the page content describing the redirect.
 
Roedy Green

However, I still can't get the Location field. I figured out how to
get IntelliJ to trace through getHeaderField, and it is looking
through a list of fields, but none of them is Location.

I have two ideas for attacking this.

1. Use Wireshark to find out whether Location is indeed in the returned
header, both with and without following redirects, and find out what
status codes come back.

2. At some point getHeaderField must switch from scanning the request
headers to be sent to scanning the response headers received. I must
find out precisely when that happens, and whether it happens as
expected. Perhaps it happens only after you open the InputStream.
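
As a rough in-Java alternative to a packet capture, the comparison in idea 1 could be sketched like this. It dumps the status code and every response header for one URL, once with redirect following on and once with it off. The class name HeaderDump and the method dump are made up for illustration; the sketch assumes a plain HttpURLConnection and relies on getResponseCode() to force the request to be sent before the headers are read.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.List;
import java.util.Map;

public class HeaderDump {

    /** Fetches the URL and returns the status code plus all response headers as text. */
    public static String dump(String uri, boolean followRedirects) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(uri).toURL().openConnection();
        // Per-connection switch; the default is to follow redirects.
        conn.setInstanceFollowRedirects(followRedirects);

        StringBuilder out = new StringBuilder();
        // getResponseCode() sends the request, so the headers read below
        // are the ones that came back from the server.
        out.append("followRedirects=").append(followRedirects)
           .append(" status=").append(conn.getResponseCode()).append('\n');

        // The entry with a null key holds the status line.
        for (Map.Entry<String, List<String>> e : conn.getHeaderFields().entrySet()) {
            out.append(e.getKey()).append(": ")
               .append(String.join(", ", e.getValue())).append('\n');
        }
        conn.disconnect();
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.print(dump(args[0], true));
        System.out.print(dump(args[0], false));
    }
}
```

Run it against a URL known to redirect and compare the two dumps: which one carries a 3xx status, and which one contains a Location line.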
 
Roedy Green

It is sort of working now. You see the Location field only if you turn
off follow redirects, and then you are likely finding out only about
the first leg.

Browsers must do the fetch in explicit stages. The Location: field is
not there when you fetch the last leg. You have to get it from the
second-to-last leg.
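
A minimal sketch of that staged fetch, assuming plain HttpURLConnection with redirect following switched off on each connection. It requests one leg at a time, reads the status code and Location, and moves on to the target until a leg comes back without a redirect. The names RedirectChaser, Hop, chase and isPermanent are hypothetical, and treating 301 and 308 as "permanently moved" is my reading of the HTTP status codes rather than anything confirmed in this thread.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class RedirectChaser {

    /** One leg of a chain: the URL requested, the status returned, and the target. */
    public static final class Hop {
        public final String from;
        public final int status;
        public final String to;

        Hop(String from, int status, String to) {
            this.from = from;
            this.status = status;
            this.to = to;
        }

        /** 301 and 308 are the permanent redirect codes. */
        public boolean isPermanent() {
            return status == 301 || status == 308;
        }

        @Override
        public String toString() {
            return status + " " + from + " -> " + to;
        }
    }

    /** Follows redirects by hand, one leg at a time, for at most maxHops legs. */
    public static List<Hop> chase(String start, int maxHops) throws IOException {
        List<Hop> hops = new ArrayList<>();
        URI current = URI.create(start);
        for (int i = 0; i < maxHops; i++) {
            HttpURLConnection conn =
                    (HttpURLConnection) current.toURL().openConnection();
            // Must be off, or the Location of this leg is consumed internally.
            conn.setInstanceFollowRedirects(false);
            int status = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            conn.disconnect();

            // Stop at the first leg that is not a redirect with a target.
            if (status < 300 || status >= 400 || location == null) {
                break;
            }
            // Location may be relative, so resolve it against the current URL.
            URI next = current.resolve(location);
            hops.add(new Hop(current.toString(), status, next.toString()));
            current = next;
        }
        return hops;
    }

    public static void main(String[] args) throws IOException {
        for (Hop hop : chase(args[0], 10)) {
            if (hop.isPermanent()) {
                System.out.println(hop);
            }
        }
    }
}
```

The maxHops limit is there only to stop a redirect loop from running forever; 10 in main is an arbitrary choice.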
 
