Obtaining Webpage Source with Python

Discussion in 'Python' started by Ryan Kaskel, Jun 24, 2004.

  1. Ryan Kaskel

    Ryan Kaskel Guest

    How can I obtain the source of a remote webpage (e.g.
    http://www.python.org/index.html) using Python?

    Something like:

    pyPage = open('http://www.python.org/index.html',r).read()

    Obviously that won't work but how can I do something to that effect?
    Thanks,
    Ryan Kaskel

    --I posted this before but it seems it is not showing up...
    Ryan Kaskel, Jun 24, 2004
    #1

  2. Paul Rubin

    Paul Rubin Guest

    (Ryan Kaskel) writes:
    > Something like:
    >
    > pyPage = open('http://www.python.org/index.html',r).read()
    >
    > Obviously that won't work but how can I do something to that effect?


    import urllib
    pyPage = urllib.urlopen('http://www.python.org/index.html',r).read()
    Paul Rubin, Jun 24, 2004
    #2

  3. Paul Rubin

    Paul Rubin Guest

    Paul Rubin writes:
    > import urllib
    > pyPage = urllib.urlopen('http://www.python.org/index.html',r).read()


    oops:

    import urllib
    pyPage = urllib.urlopen('http://www.python.org/index.html').read()

    i.e. omit the 'r' argument to urllib.urlopen.
    Paul Rubin, Jun 24, 2004
    #3
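
For readers on Python 3, where `urllib.urlopen` now lives at `urllib.request.urlopen`, the corrected one-liner above might be sketched roughly as follows. The helper name `fetch_source`, the 10-second default timeout, and the UTF-8 fallback are assumptions made for illustration; none of them come from the thread.

```python
from urllib.request import urlopen


def fetch_source(url, timeout=10.0):
    """Return the document at `url` as text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        raw = response.read()  # bytes, exactly as the server sent them
        # Prefer the charset declared in the Content-Type header;
        # falling back to UTF-8 is an assumption, not a guarantee.
        charset = response.headers.get_content_charset() or "utf-8"
    # errors="replace" keeps the sketch from raising on stray bytes.
    return raw.decode(charset, errors="replace")
```

Calling `fetch_source('http://www.python.org/index.html')` would then give back the page source as a string, much like the `pyPage` variable in the original question.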

  4. Pierre-Frédéric Caillaud

    > pyPage = open('http://www.python.org/index.html',r).read()


    Using open() for both local files and URLs is what PHP's URL-aware
    fopen (the allow_url_fopen setting) does. It is a major security hole,
    because it even lets a script include() code files from the web without
    the author realizing it.

    Python has two separate functions, open() for local files and
    urllib.urlopen() for URLs, so you always know which one you're doing.

    If your webpage needs cookies or something, you'll need urllib2.

    If you want to parse it afterwards, use htmllib or BeautifulSoup.
    Pierre-Frédéric Caillaud, Jun 24, 2004
    #4
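
The cookie and parsing suggestions in the post above could look something like this on Python 3, using only the standard library. This is a sketch under stated assumptions: `urllib.request` plus `http.cookiejar` stand in for `urllib2`, and `html.parser` stands in for `htmllib`, which Python 3 no longer ships. The names `make_cookie_opener`, `LinkCollector`, and `extract_links` are hypothetical.

```python
from html.parser import HTMLParser
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, build_opener


def make_cookie_opener():
    """Return (opener, jar); the opener stores and re-sends cookies."""
    jar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(jar))
    return opener, jar


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before this call.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return all link targets found in `html_text`, in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links
```

With this in place, `opener.open(url).read()` would fetch a page while keeping any cookies the server sets in `jar`, and the decoded text could be passed to `extract_links`. BeautifulSoup remains a reasonable third-party choice when the markup is messy.
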
  5. Phil Frost

    Phil Frost Guest

    Take a look at the urllib module:

    http://python.org/doc/2.3.3/lib/module-urllib.html

    On Wed, Jun 23, 2004 at 10:03:04PM -0700, Ryan Kaskel wrote:
    > How can I obtain the source of a remote webpage (e.g.
    > http://www.python.org/index.html) using Python?
    >
    > Something like:
    >
    > pyPage = open('http://www.python.org/index.html',r).read()
    >
    > Obviously that won't work but how can I do something to that effect?
    > Thanks,
    > Ryan Kaskel
    >
    > --I posted this before but it seems it is not showing up...
    Phil Frost, Jun 24, 2004
    #5
