python/xpath issue..

Discussion in 'Python' started by bruce, Aug 25, 2008.

  1. bruce

    bruce Guest

    hey guys...

    got a weird, hopefully simple issue.

    the following script is stripped down; it simply gets the "form" nodes
    from the site "schedule.psu.edu".

    the problem i run into is that the parse with libxml2dom works, and i get
    the dom object every time i run the app, but the xpath query is
    intermittent!!! in other words, i can run the script 10 times, and it might
    work 7 or 8 times. the other times, the xpath doesn't give the nodes
    back...

    when it works, name_ in the app should be a list of nodes (one for each of
    the 2 forms in the page), and len_ should be 2.

    is there anything you'd suggest i try to get a better handle on exactly
    what's going on here?

    keep in mind, i'm not a python guy, just trying to get this to consistently
    work... my suspicion is that the culprit might be memory related...

    i'm running linux on an x86 dual core with 4 GB of ram. python is 2.5.1.

    thoughts/comments/etc would be appreciated...

    -thanks!!!


    #!/usr/bin/python
    #
    # test.py
    #
    # scrapes/extracts the basic data for the college
    #
    #
    # the app gets/stores
    # name
    # url
    # address (street/city/state)
    # phone
    #
    ######################################################################
    #test python script
    import re
    import libxml2dom
    import urllib
    import urllib2
    import sys, string
    from mechanize import Browser
    import mechanize
    #import tidy
    import os.path
    import cookielib
    from libxml2dom import Node
    from libxml2dom import NodeList
    import subprocess
    import time

    ########################
    #
    # Parse schedule.psu.edu
    ########################
    ##cj = "p"
    ##COOKIEFILE = 'cookies.lwp'
    #cookielib = 1


    urlopen = urllib2.urlopen
    #cj = urllib2.cookielib.LWPCookieJar()
    ##cj = cookielib.LWPCookieJar()
    Request = urllib2.Request
    br = Browser()
    br2 = Browser()

    ##if cj != None:
    ## print "sss"
    ###install the CookieJar for the default CookieProcessor
    ## if os.path.isfile(COOKIEFILE):
    ## cj.load(COOKIEFILE)
    ## print "foo\n"
    ## if cookielib:
    ## opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    ## urllib2.install_opener(opener)
    ## print "foo2\n"

    user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
    values1 = {'name' : 'Michael Foord',
               'location' : 'Northampton',
               'language' : 'Python' }
    headers = { 'User-Agent' : user_agent }

    url="http://schedule.psu.edu/"
    #=======================================


    if __name__ == "__main__":
        # main app

        txdata = None

        #----------------------------

        ##br.set_cookiejar(cj)
        br.set_handle_redirect(True)
        br.set_handle_referer(True)
        br.set_handle_robots(False)
        br.addheaders = [('User-Agent', 'Firefox')]

        print "url =",url
        br.open(url)
        ##cj.save(COOKIEFILE) # resave cookies

        res = br.response() # this is a copy of response
        s = res.read()
        print "slen=",len(s)

        # s contains HTML not XML text
        d = libxml2dom.parseString(s, html=1)
        print "d",d

        name_=[]
        len_=0
        name_ = d.xpath("//form")
        #name_ = d.xpath("/html/body/form")
        print "name1",name_
        len_ = len(name_)
        print "len",len(name_)
        #print "sdlfs"
        sys.exit()
        # else:
        #     print "err in form_ID"


    print "here..."
     
    bruce, Aug 25, 2008
    #1
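
One way to get a better handle on an intermittent result like this is to check whether the downloaded page itself changes between runs before suspecting the parser or memory. The sketch below is only an illustration under assumptions: it uses the Python 3 standard library rather than the Python 2.5 / libxml2dom setup in the post, and `FormCounter` and `describe_response` are hypothetical helper names, not part of any library mentioned above. It takes the raw bytes that `res.read()` would return and reports their length, a SHA-256 digest, and how many `<form>` start tags a plain tag scanner sees.

```python
import hashlib
from html.parser import HTMLParser


class FormCounter(HTMLParser):
    """Counts <form> start tags without building a DOM (hypothetical helper)."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.forms = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <FORM> is counted too.
        if tag == "form":
            self.forms += 1


def describe_response(body):
    """Return (byte length, SHA-256 hex digest, <form> count) for raw HTML bytes."""
    counter = FormCounter()
    # latin-1 maps every byte to a character, so decoding cannot fail.
    counter.feed(body.decode("latin-1"))
    counter.close()
    return len(body), hashlib.sha256(body).hexdigest(), counter.forms
```

Logging this tuple on every run, next to the `len(name_)` printed by the script, should show which side varies. If the failing runs have a different length or digest, the server is probably returning a different page and the XPath is not at fault. If the bytes are identical and the form count here is 2 while `d.xpath("//form")` comes back empty, the parser side is the place to look.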