Getting final url when original url redirects

Discussion in 'Python' started by IanR, Mar 12, 2009.

  1. IanR

    IanR Guest

    I'm processing RSS content from a number of given sources. Most of the
    time the URL given by the RSS feed redirects to the real URL (I'm
    guessing they do this for tracking purposes).

    For example:

    This is a URL that I get from an RSS feed,
    http://www.pheedcontent.com/click.phdo?i=d22e9bc7641aab8a0566526f61806512
    It redirects to
    http://www.macsimumnews.com/index.php/archive/klipsch_developing_headphones_for_new_ipod_shuffle/

    I want to record the final URL rather than the URL I get from the RSS
    feed. (Sometimes there is no redirect, in which case I want the
    original URL.)

    I've tried sniffing the headers and don't see any "Location:" header. I
    think sites are using different ways to redirect. Does anyone have
    any suggestions on how I might handle this?
    IanR, Mar 12, 2009
    #1

  2. On Mar 12, 2009, at 3:57 PM, IanR wrote:

    > I want to record the final URL and not the URL I get from the RSS feed
    > (However sometimes there is no redirect so I might want the original
    > URL)
    >
    > I've tried sniffing the header and don't see any "Location:"... I
    > think sites are using different ways to redirect. Does anyone have
    > any suggestions on how I might handle this?



    Hi Ian,
    Using Firefox's Live HTTP Headers extension, I see a 302 redirect with
    a Location header (see session log below). Are you aware that urllib2
    follows redirects for you? That might be why you're not seeing what
    you expect. If you want a record of each URL, you'll have to implement
    your own HTTPRedirectHandler.



    http://www.pheedcontent.com/click.phdo?i=d22e9bc7641aab8a0566526f61806512

    GET /click.phdo?i=d22e9bc7641aab8a0566526f61806512 HTTP/1.1
    Host: www.pheedcontent.com
    User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.7) Gecko/2009021906 Firefox/3.0.7
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    Accept-Language: en-us,en;q=0.7,sv;q=0.3
    Accept-Encoding: gzip,deflate
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
    Keep-Alive: 300
    Connection: keep-alive

    HTTP/1.x 302 Found
    Date: Thu, 12 Mar 2009 20:41:29 GMT
    Server: Apache
    X-Powered-By: PHP/5.2.3-1ubuntu6.3
    Pragma: no-cache
    Cache-Control: no-cache, must-revalidate
    Set-Cookie: phdo=1-tst%7Cv3%3Ac3cbcae440ff783381d0d9fa96f14d05%3Aa8t5sELbkk9oy3pXsrohSnPslqQxQKIhVP%2F8Ots%3D; expires=Fri, 13-Mar-2009 20:41:29 GMT; path=/; domain=pheedo.com
    Location: http://www.macsimumnews.com/index.php/archive/klipsch_developing_headphones_for_new_ipod_shuffle/
    Content-Encoding: gzip
    Vary: Accept-Encoding
    Content-Length: 26
    Connection: close
    Content-Type: text/html
    ----------------------------------------------------------
    http://www.macsimumnews.com/index.php/archive/klipsch_developing_headphones_for_new_ipod_shuffle/


    etc. etc.
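
    To record each hop as suggested above, one option is to subclass the
    redirect handler and log every redirect it follows. The following is
    only a rough sketch, and it assumes Python 3, where urllib2 was merged
    into urllib.request. RecordingRedirectHandler and fetch_with_hops are
    made-up names for illustration, not part of the standard library.

```python
import urllib.request


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler that remembers every redirect it follows."""

    def __init__(self):
        # Each entry is (status code, URL requested, URL redirected to).
        self.hops = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Let the stock handler decide whether this redirect is allowed.
        new_req = super().redirect_request(req, fp, code, msg, headers, newurl)
        if new_req is not None:
            self.hops.append((code, req.full_url, new_req.full_url))
        return new_req


def fetch_with_hops(url, timeout=10):
    """Open url and return (final URL, list of recorded hops)."""
    handler = RecordingRedirectHandler()
    # Passing an instance replaces the default HTTPRedirectHandler.
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as response:
        return response.url, handler.hops
```

    If the URL does not redirect at all, the list of hops simply stays
    empty and the final URL equals the one that was requested.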
    Philip Semanchuk, Mar 12, 2009
    #2

  3. On Thu, 2009-03-12 at 12:57 -0700, IanR wrote:
    > I want to record the final URL and not the URL I get from the RSS feed
    > (However sometimes there is no redirect so I might want the original
    > URL)
    >
    > I've tried sniffing the header and don't see any "Location:"... I
    > think sites are using different ways to redirect. Does anyone have
    > any suggestions on how I might handle this?


    If you are using urllib[2]:

    >>> import urllib2
    >>> url = 'http://www.pheedcontent.com/click.phdo?i=d22e9bc7641aab8a0566526f61806512'
    >>> o = urllib2.urlopen(url)
    >>> o.url
    'http://www.macsimumnews.com/index.php/archive/klipsch_developing_headphones_for_new_ipod_shuffle/'
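
    For the case where there is sometimes no redirect, or the request
    fails, the same idea can be wrapped in a small helper that falls back
    to the original URL. This is a minimal sketch assuming Python 3
    (urllib.request instead of urllib2). The name resolve_final_url and its
    opener parameter are hypothetical; the parameter exists only so the
    function can be tried without network access.

```python
import urllib.request


def resolve_final_url(url, timeout=10, opener=urllib.request.urlopen):
    """Return the URL reached after redirects, or url itself on failure."""
    try:
        with opener(url, timeout=timeout) as response:
            # After redirects are followed, .url holds the final address.
            # With no redirect it is just the URL that was requested.
            return response.url
    except (OSError, ValueError):
        # URLError, HTTPError and socket timeouts are all OSError
        # subclasses; ValueError covers strings that are not valid URLs.
        return url
```

    Whether falling back silently is the right choice depends on the
    application; logging the failure may be preferable when feeds are
    processed in bulk.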
    Albert Hopkins, Mar 12, 2009
    #3
