simulating a browser to get redirected URL location.

Discussion in 'Java' started by Roedy Green, Nov 16, 2010.

  1. Roedy Green

    Roedy Green Guest

    I am trying to write some code that chases HTTP redirect chains and
    makes a list of URLs that have been permanently moved and where they
    went.

    The code I think should work to get at a "Location:" field in the
    response header does not work. I just get null.

    urlc.connect();
    String location = urlc.getHeaderField( "Location" );

    I found some code on the net that claims to work, but it is pretty
    ugly:
    http://www.kodejava.org/examples/198.html

    Is there something obvious I am missing?
    --
    Roedy Green Canadian Mind Products
    http://mindprod.com

    Finding a bug is a sign you were asleep at the switch when coding. Stop debugging, and go back over your code line by line.
     
    Roedy Green, Nov 16, 2010
    #1

  2. Arne Vajhøj

    Arne Vajhøj Guest

    On 16-11-2010 11:36, Roedy Green wrote:
    > I am trying to write some code that chases HTML redirect chains and
    > makes a list of URLs that have been permanently moved and where they
    > went.
    >
    > The code I think should work to get at a "Location:" field in the
    > response header does not work. I just get null.
    >
    > urlc.connect();
    > String location = urlc.getHeaderField( "Location" );
    >
    > I found some code on the net that claims to work, but it is pretty
    > ugly:
    > http://www.kodejava.org/examples/198.html
    >
    > Is there something obvious I am missing?


    Use Jakarta HttpClient instead.

    Arne
     
    Arne Vajhøj, Nov 16, 2010
    #2

  3. Roedy Green

    Roedy Green Guest

    On Tue, 16 Nov 2010 19:16:13 +0100, Jake Jarvis wrote, quoted or
    indirectly quoted someone who said:

    >
    >Is urlc set up to automatically follow redirects?


    Yes. That works. It is also the default. When you turn it off, the
    page content you get back is a short message about the redirect.
     
    Roedy Green, Nov 16, 2010
    #3
  4. Roedy Green

    Roedy Green Guest

    On Tue, 16 Nov 2010 13:48:57 -0800, Roedy Green wrote, quoted or
    indirectly quoted someone who said:

    >>
    >>Is urlc set up to automatically follow redirects?

    >
    >yes. That works. It is also the default. When you turn it off you get
    >a little message as the page content about the redirect.


    However, I still can't get the Location field. I figured out how to
    get IntelliJ to trace through getHeaderField, and it is looking
    through a list of fields, but none of them is Location.

    I have two ideas to pursue.

    1. Use Wireshark to find out whether Location is indeed in the
    returned header, both with and without follow redirects, and find out
    what status codes come back.

    2. At some point getHeaderField must flip from scanning the headers
    to be sent to scanning the headers received. I must find out
    precisely when that is and whether it happens as expected. Perhaps it
    happens only after you open the InputStream.
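
    Both ideas can also be checked from Java itself, without a packet
    sniffer. The following is only a minimal sketch: the class name
    HeaderProbe and the method probe are made up for illustration, and it
    assumes that calling getResponseCode() is enough to make
    HttpURLConnection send the request and read the response headers. It
    reports the status code and the Location field for one request, with
    follow redirects either on or off.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, not part of the original thread.
class HeaderProbe {
    /**
     * Sends one request and reports the status code and Location header
     * of the response that HttpURLConnection ends up exposing.
     */
    static String probe(String address, boolean followRedirects) throws IOException {
        HttpURLConnection urlc =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        urlc.setInstanceFollowRedirects(followRedirects);
        try {
            // getResponseCode() forces the request to be sent and the response
            // headers to be read; the caller does not open an InputStream.
            int status = urlc.getResponseCode();
            // Returns null when the exposed response has no Location header.
            String location = urlc.getHeaderField("Location");
            return status + " Location=" + location;
        } finally {
            urlc.disconnect();
        }
    }
}
```

    Calling probe(url, false) and probe(url, true) on the same redirected
    URL and comparing the two strings shows which leg of the chain the
    header fields belong to in each mode.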
     
    Roedy Green, Nov 17, 2010
    #4
  5. Roedy Green

    Roedy Green Guest

    On Tue, 16 Nov 2010 21:37:54 -0800, Roedy Green wrote, quoted or
    indirectly quoted someone who said:

    >
    >However, I still can't get the Location parm. I figured out how to
    >get Intellij to trace through getHeaderField, and it is looking
    >through a list of parms, just none is Location.


    It is sort of working now. You see the Location field only if you
    turn off follow redirects, and then you are most likely only finding
    out about the first leg.

    Browsers must do the fetch in explicit stages. The Location: field is
    not there when you fetch the last leg. You have to get it from the
    second-to-last leg.
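
    Doing the fetch in explicit stages might look roughly like the sketch
    below. It is an illustration under assumptions, not tested against
    real sites: the class name RedirectChaser, the method chase and the
    MAX_HOPS limit are all invented here. Each leg is fetched with follow
    redirects turned off, the Location field is read from that leg, and
    the next leg is resolved against the current URL in case the server
    sent a relative Location.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original thread.
class RedirectChaser {
    // Assumed cap on hops so that a redirect loop cannot run forever.
    private static final int MAX_HOPS = 10;

    /** Returns every URL visited, starting with the one passed in. */
    static List<String> chase(String start) throws IOException {
        List<String> chain = new ArrayList<>();
        URI current = URI.create(start);
        chain.add(current.toString());
        for (int hop = 0; hop < MAX_HOPS; hop++) {
            HttpURLConnection urlc =
                    (HttpURLConnection) current.toURL().openConnection();
            // Fetch this leg only; do not let the JDK skip ahead to the end.
            urlc.setInstanceFollowRedirects(false);
            try {
                int status = urlc.getResponseCode();
                String location = urlc.getHeaderField("Location");
                // Stop at the first leg that is not a 3xx with a Location.
                if (status < 300 || status >= 400 || location == null) {
                    break;
                }
                // Location may be relative, so resolve it against this leg.
                current = current.resolve(location);
                chain.add(current.toString());
            } finally {
                urlc.disconnect();
            }
        }
        return chain;
    }
}
```

    To list only permanently moved URLs, the status of each leg could be
    recorded alongside the URL and the list filtered for 301.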

     
    Roedy Green, Nov 17, 2010
    #5
