Re: difference between urllib2.urlopen and firefox view 'page source'?

Discussion in 'Python' started by wesley chun, Mar 21, 2007.

  1. wesley chun

    wesley chun Guest

    On Mar 20, 8:33 am, wrote:
    > On Mar 20, 1:56 am, Tina I <> wrote:
    > > > I am trying to screen scrape some stock data from yahoo, so I am
    > > > trying to use urllib2 to retrieve the html and beautiful soup for the
    > > > parsing.

    >
    > You can do this fairly easily. I found a similar program in the book Core
    > Python Programming. It actually sticks the stocks into an Excel
    > spreadsheet.



    i'd like to add that the solution that mike proposes from the book is
    an *alternative* to what the OP wanted, which was to parse the actual
    stock quote web page. instead of doing that, the code snippet
    actually uses Yahoo!'s CSV interface which you can read more about
    from their help pages:

    http://search.cc.yahoo.com/search?property=finance&question_box=csv

    if the data is all that's important to you, then this is a good proxy
    for what you proposed, and will be simpler to implement. however, if
    you're looking for a screen-scraping and HTML-parsing exercise, i'd
    stick with your original idea and parse the HTML that urllib2 returns.
    as a previous poster has already mentioned, it's probably the
    "cleanest" output, since it leaves out some of the extra
    browser-specific JS and other markup that you don't need.
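    to make the CSV route concrete, here is a minimal sketch of the parsing
    side only, written for Python 3 and run against an inline sample rather
    than a live request. the column layout (symbol, then last price), the
    ticker names, and the prices are all made up for illustration; they are
    assumptions, not Yahoo!'s documented format.

```python
import csv
import io

# Hypothetical sample of a CSV quote response. The symbols, prices, and
# column order (symbol, last price, date, time) are invented for illustration.
SAMPLE_CSV = (
    '"AAA",10.50,"3/20/2007","4:00pm"\n'
    '"BBB",20.25,"3/20/2007","4:00pm"\n'
)


def parse_quotes(text):
    """Return a list of (symbol, price) tuples parsed from CSV quote text."""
    quotes = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue  # skip blank or truncated rows
        symbol, price = row[0], row[1]
        try:
            quotes.append((symbol, float(price)))
        except ValueError:
            continue  # price field was not numeric (for example "N/A")
    return quotes


if __name__ == "__main__":
    for symbol, price in parse_quotes(SAMPLE_CSV):
        print("%s: %.2f" % (symbol, price))
```

    the csv module handles the quoted fields, so there is no HTML to pick
    apart at all; that is the main reason this route is simpler.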

    cheers,
    -wesley

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    "Core Python Programming", Prentice Hall, (c)2007,2001
    http://corepython.com

    wesley.j.chun :: wescpy-at-gmail.com
    python training and technical consulting
    cyberweb.consulting : silicon valley, ca
    http://cyberwebconsulting.com
     
    wesley chun, Mar 21, 2007
    #1

  2. cjl

    cjl Guest

    Group:

    Thank you for all the informative replies; they have helped me figure
    things out. Next up is learning beautiful soup.

    Thank you for the code example, but I am trying to learn how to
    'screen scrape'. Yahoo does make historical stock data available in
    CSV format, but not stock options data, which is what I am ultimately
    attempting to scrape.

    Here is what I have so far; I know how broken and ugly it is:

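    import sys
    import urllib2

    from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 (Python 2)

    # Fetch the options page for the ticker symbol given on the command line.
    page = urllib2.urlopen("http://finance.yahoo.com/q/op?s=" + sys.argv[1])
    soup = BeautifulSoup(page)
    # The current price is the first <b> inside <big> in the "yfncsubtit" table.
    print soup.find("table", {"id": "yfncsubtit"}).big.b.contents[0]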

    This actually works, and will print out the current stock price for
    whatever ticker symbol you supply as the command line argument when
    you launch this script. Later I will add error checking, etc.
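    As a rough idea of what that error checking might look like, here is a
    sketch in Python 3, where urllib2 became urllib.request. To keep it
    runnable without third-party packages it swaps BeautifulSoup for the
    standard library's html.parser, so it is not the same technique as the
    script above. The helper names (PriceParser, extract_price, fetch_page)
    and the sample HTML are hypothetical, and the URL is simply the one from
    the script, which may no longer serve this page.

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Grab the first <b> text inside <big> inside the table with a given id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.table_depth = 0  # > 0 while inside the target table
        self.in_big = False
        self.in_b = False
        self.price = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            # Count nested tables so we know when the target table ends.
            if self.table_depth or dict(attrs).get("id") == self.table_id:
                self.table_depth += 1
        elif self.table_depth and tag == "big":
            self.in_big = True
        elif self.in_big and tag == "b":
            self.in_b = True

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth:
            self.table_depth -= 1
        elif tag == "big":
            self.in_big = False
        elif tag == "b":
            self.in_b = False

    def handle_data(self, data):
        inside = self.table_depth and self.in_big and self.in_b
        if inside and self.price is None and data.strip():
            self.price = data.strip()


def extract_price(html, table_id="yfncsubtit"):
    """Return the price text, or raise ValueError if the layout is not found."""
    parser = PriceParser(table_id)
    parser.feed(html)
    parser.close()
    if parser.price is None:
        raise ValueError("no price found in table %r" % table_id)
    return parser.price


def fetch_page(symbol):
    """Download the page for a symbol; network failures become RuntimeError."""
    url = "http://finance.yahoo.com/q/op?s=" + symbol  # URL from the script above
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    except urllib.error.URLError as exc:
        raise RuntimeError("could not fetch %s: %s" % (url, exc))


# Hypothetical markup that mimics the structure the script above relies on.
SAMPLE_HTML = (
    '<html><body><table id="yfncsubtit"><tr><td>'
    "<big><b>12.34</b></big></td></tr></table></body></html>"
)

if __name__ == "__main__":
    print(extract_price(SAMPLE_HTML))
```

    The point is that a missing table or a changed layout produces a clear
    ValueError instead of an AttributeError from chaining attributes on None.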

    Any advice on how I am using beautiful soup in the above code?

    thanks again,
    cjl
     
    cjl, Mar 21, 2007
    #2
