read() returns data of different sizes

Discussion in 'Python' started by jimgardener, Oct 2, 2010.

  1. jimgardener

    jimgardener Guest

    Hi,
    While trying out urllib.urlopen, I wrote this code to read a URL and
    return the length of the data:

    import datetime, time, urllib

    def get_page_size(pageurlstr):
        # Fetch the page and return the number of bytes read.
        h = urllib.urlopen(pageurlstr)
        data = h.read()
        return len(data)

    while True:
        print 'reading url www.google.com at', datetime.datetime.now().isoformat(' ')
        print 'size=%d' % get_page_size('http://www.google.com')
        time.sleep(5)


    I got this output:

    reading url www.google.com at 2010-10-02 17:22:24.691654
    size=9512
    reading url www.google.com at 2010-10-02 17:22:30.681236
    size=9530
    reading url www.google.com at 2010-10-02 17:22:36.886369
    size=9530
    reading url www.google.com at 2010-10-02 17:22:42.315392
    size=9512
    reading url www.google.com at 2010-10-02 17:22:48.763693
    size=9512
    reading url www.google.com at 2010-10-02 17:22:54.711666
    size=9548
    reading url www.google.com at 2010-10-02 17:23:00.151843
    size=9530
    reading url www.google.com at 2010-10-02 17:23:05.844620
    size=9548


    Why are the sizes different? What must I do to ensure that
    the whole page is read?
    Thanks,
    jim
     
    jimgardener, Oct 2, 2010
    #1

  2. Chris Rebert

    Chris Rebert Guest

    The sizes differ because Google does not always send back the *exact*
    same HTML every time you request its homepage (note how small the
    variance is). You can easily verify this by using the "Save Page"
    function of your browser on two different loads and diffing the saved
    HTML. What varies is possibly some sort of tracking ID.

    As for ensuring that the whole page is read: you need to do nothing
    more. Calling .read() with no arguments already reads the entire
    response.

    Cheers,
    Chris
     
    Chris Rebert, Oct 2, 2010
    #2
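
The comparison suggested in the reply above can also be done from a script instead of the browser. This is only a rough sketch, assuming Python 3 (where urllib.urlopen has become urllib.request.urlopen); the helper names fetch_page and changed_lines are made up for illustration and are not part of any library. It fetches a page twice and keeps only the lines that differ between the two responses.

```python
import difflib
import urllib.request


def fetch_page(url):
    """Read the whole response body and return it as text.

    Hypothetical helper: read() with no argument reads until the end
    of the response, so no extra looping is needed.
    """
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def changed_lines(first_html, second_html):
    """Return only the lines removed from or added to the first snapshot.

    difflib.ndiff prefixes removed lines with "- " and added lines
    with "+ "; unchanged and hint lines are filtered out here.
    """
    diff = difflib.ndiff(first_html.splitlines(), second_html.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]
```

Calling changed_lines(fetch_page(url), fetch_page(url)) for the same url should list whatever the server changed between the two requests. If the page is served as one very long line, the whole line will be reported as changed, so a character-level comparison may be more useful in that case.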