Getting final url when original url redirects

Discussion in 'Python' started by IanR, Mar 12, 2009.

  1. IanR

    IanR Guest

    I'm processing RSS content from a # of given sources. Most of the
    time the url given by the RSS feed redirects to the real URL (I'm
    guessing they do this for tracking purposes)

    For example.

    This is a URL that I get from an RSS feed:
    http://www.pheedcontent.com/click.phdo?i=d22e9bc7641aab8a0566526f61806512
    It redirects to
    http://www.macsimumnews.com/index.php/archive/klipsch_developing_headphones_for_new_ipod_shuffle/

    I want to record the final URL and not the URL I get from the RSS feed
    (However sometimes there is no redirect so I might want the original
    URL)

    I've tried sniffing the header and don't see any "Location:"... I
    think sites are using different ways to redirect. Does anyone have
    any suggestions on how I might handle this?
     
    IanR, Mar 12, 2009
    #1

  2. On Mar 12, 2009, at 3:57 PM, IanR wrote:

    > I'm processing RSS content from a # of given sources. Most of the
    > time the url given by the RSS feed redirects to the real URL (I'm
    > guessing they do this for tracking purposes)
    >
    > For example.
    >
    > This is a URL that I get from an RSS feed:
    > http://www.pheedcontent.com/click.phdo?i=d22e9bc7641aab8a0566526f61806512
    > It redirects to
    > http://www.macsimumnews.com/index.php/archive/klipsch_developing_headphones_for_new_ipod_shuffle/
    >
    > I want to record the final URL and not the URL I get from the RSS feed
    > (However sometimes there is no redirect so I might want the original
    > URL)
    >
    > I've tried sniffing the header and don't see any "Location:"... I
    > think sites are using different ways to redirect. Does anyone have
    > any suggestions on how I might handle this?



    Hi Ian,
    Using Firefox's Live HTTP Headers extension, I see a 302 redirect with
    a Location header (see session log below). Are you aware that urllib2
    follows redirects for you? That might be why you're not seeing the
    Location header you expect: by the time you inspect the response, you
    are looking at the headers of the final page. If you want a record of
    each URL along the way, you'll have to implement an
    HTTPRedirectHandler.



    http://www.pheedcontent.com/click.phdo?i=d22e9bc7641aab8a0566526f61806512

    GET /click.phdo?i=d22e9bc7641aab8a0566526f61806512 HTTP/1.1
    Host: www.pheedcontent.com
    User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.7) Gecko/2009021906 Firefox/3.0.7
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    Accept-Language: en-us,en;q=0.7,sv;q=0.3
    Accept-Encoding: gzip,deflate
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
    Keep-Alive: 300
    Connection: keep-alive

    HTTP/1.x 302 Found
    Date: Thu, 12 Mar 2009 20:41:29 GMT
    Server: Apache
    X-Powered-By: PHP/5.2.3-1ubuntu6.3
    Pragma: no-cache
    Cache-Control: no-cache, must-revalidate
    Set-Cookie: phdo=1-tst%7Cv3%3Ac3cbcae440ff783381d0d9fa96f14d05%3Aa8t5sELbkk9oy3pXsrohSnPslqQxQKIhVP%2F8Ots%3D; expires=Fri, 13-Mar-2009 20:41:29 GMT; path=/; domain=pheedo.com
    Location: http://www.macsimumnews.com/index.php/archive/klipsch_developing_headphones_for_new_ipod_shuffle/
    Content-Encoding: gzip
    Vary: Accept-Encoding
    Content-Length: 26
    Connection: close
    Content-Type: text/html
    ----------------------------------------------------------
    http://www.macsimumnews.com/index.php/archive/klipsch_developing_headphones_for_new_ipod_shuffle/


    etc. etc.
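A minimal sketch of the HTTPRedirectHandler idea mentioned above. It is written for Python 3, where urllib2's classes live in urllib.request; the names RecordingRedirectHandler and resolve are hypothetical helpers made up for this illustration, not part of the standard library. It records each hop and then lets the default redirect logic carry on.

```python
import urllib.request


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical helper: remembers every redirect hop it sees."""

    def __init__(self):
        # Each entry is (status_code, redirected_from, redirected_to).
        self.hops = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Called by the stock handler for 301/302/303/307/308 responses.
        self.hops.append((code, req.full_url, newurl))
        # Defer to the default behaviour so the redirect is still followed.
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def resolve(url, timeout=10):
    """Return (final_url, hops) for url; hops is empty if nothing redirected."""
    handler = RecordingRedirectHandler()
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as resp:
        return resp.geturl(), handler.hops
```

Calling resolve() on a feed URL should give back the landing URL plus the list of hops, so the original and the final address can both be stored. This only sees HTTP-level redirects; pages that redirect with JavaScript or a meta refresh tag are invisible to it.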
     
    Philip Semanchuk, Mar 12, 2009
    #2

  3. On Thu, 2009-03-12 at 12:57 -0700, IanR wrote:
    > I'm processing RSS content from a # of given sources. Most of the
    > time the url given by the RSS feed redirects to the real URL (I'm
    > guessing they do this for tracking purposes)
    >
    > For example.
    >
    > This is a URL that I get from an RSS feed:
    > http://www.pheedcontent.com/click.phdo?i=d22e9bc7641aab8a0566526f61806512
    > It redirects to
    > http://www.macsimumnews.com/index.php/archive/klipsch_developing_headphones_for_new_ipod_shuffle/
    >
    > I want to record the final URL and not the URL I get from the RSS feed
    > (However sometimes there is no redirect so I might want the original
    > URL)
    >
    > I've tried sniffing the header and don't see any "Location:"... I
    > think sites are using different ways to redirect. Does anyone have
    > any suggestions on how I might handle this?


    If you are using urllib[2]:

    >>> import urllib2
    >>> url = 'http://www.pheedcontent.com/click.phdo?i=d22e9bc7641aab8a0566526f61806512'
    >>> o = urllib2.urlopen(url)  # follows the redirect automatically
    >>> o.url
    'http://www.macsimumnews.com/index.php/archive/klipsch_developing_headphones_for_new_ipod_shuffle/'
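For anyone on Python 3, here is a rough sketch of the same approach using urllib.request, extended to cover the two cases raised in the original question: no redirect at all (the original URL comes back) and sites that redirect some other way. The only "other way" handled here is an HTML meta refresh tag, which is an assumption about what such sites do. The names final_url and meta_refresh_target are hypothetical, and only one meta refresh hop is inspected, not a chain.

```python
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


class _MetaRefreshFinder(HTMLParser):
    """Finds the first <meta http-equiv="refresh" content="N; url=..."> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        # content looks like "0; url=http://example.com/"
        _, _, rest = (attrs.get("content") or "").partition(";")
        key, _, value = rest.strip().partition("=")
        if key.strip().lower() == "url" and value:
            self.target = value.strip().strip("'\"")


def meta_refresh_target(html, base_url):
    """Return the absolute meta refresh target in html, or None."""
    finder = _MetaRefreshFinder()
    finder.feed(html)
    return urljoin(base_url, finder.target) if finder.target else None


def final_url(url, timeout=10):
    """Best-effort final URL; falls back to the original URL on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            landed = resp.geturl()  # already reflects any HTTP redirects
            body = resp.read(65536).decode("utf-8", errors="replace")
    except (OSError, ValueError):
        # Network/HTTP errors (URLError is an OSError) or a malformed URL.
        return url
    return meta_refresh_target(body, landed) or landed
```

When no redirect happens, geturl() simply returns the URL that was requested, so the same value can be recorded in both cases without a special check.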
     
    Albert Hopkins, Mar 12, 2009
    #3
