Re: How to Convert IO Stream to XML Document

Discussion in 'Python' started by naugiedoggie, Sep 11, 2010.


    On Sep 10, 12:20 pm, jakecjacobson wrote:
    > I am trying to build a Python script that reads a Sitemap file and
    > pushes the URLs to a Google Search Appliance.  I am able to fetch the
    > XML document and parse it with regular expressions, but I want to move
    > to using native XML tools to do this.  The problem is that if I use
    > urllib.urlopen(url) I can convert the IO stream to an XML document,
    > but if I use urllib2.urlopen and then read the response, I get the
    > content, but when I use minidom.parse() I get an "IOError:
    > [Errno 2] No such file or directory:" error.


    Hello,

    This may not be helpful, but I note that you are making two different
    kinds of request, and judging from the documentation, the objects
    returned by the urllib and urllib2 openers do not appear to be the
    same. I don't know why you call urllib.urlopen(url) in one case and
    urllib2.urlopen(request) in the other, but I can tell you that I have
    used a urllib2 opener to retrieve a web services document in XML and
    then parsed it with minidom.parse().


    >
    > # THIS WORKS but will have issues if the IO Stream is a compressed file
    > def GetPageGuts(net, url):
    >         pageguts = urllib.urlopen(url)
    >         xmldoc = minidom.parse(pageguts)
    >         return xmldoc
    >
    > # THIS DOESN'T WORK, but I don't understand why
    > def GetPageGuts(net, url):
    >         request=getRequest_obj(net, url)
    >         response = urllib2.urlopen(request)
    >         response.headers.items()
    >         pageguts = response.read()
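
On the compressed-file worry in the quoted comment, here is a minimal sketch of one way to handle it, assuming the compression is gzip and that the whole body is already in memory. It is written for Python 3; parse_maybe_gzipped and SAMPLE are made-up names for illustration, not anything from the original script.

```python
import gzip
from xml.dom import minidom

# Hypothetical sitemap fragment used only for this illustration.
SAMPLE = b'<?xml version="1.0"?><urlset><url><loc>http://example.com/</loc></url></urlset>'

def parse_maybe_gzipped(data):
    """Parse XML bytes, decompressing first if they start with the gzip magic number."""
    if data[:2] == b"\x1f\x8b":  # gzip streams begin with these two bytes
        data = gzip.decompress(data)
    # parseString() takes the document content itself, not a filename
    return minidom.parseString(data)

plain_doc = parse_maybe_gzipped(SAMPLE)
zipped_doc = parse_maybe_gzipped(gzip.compress(SAMPLE))
```

Checking the first two bytes avoids depending on the server sending a correct Content-Encoding header, though other compression formats would need their own check.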


    Did you note the documentation says:

    "One caveat: the read() method, if the size argument is omitted or
    negative, may not read until the end of the data stream; there is no
    good way to determine that the entire stream from a socket has been
    read in the general case."

    The lack of an EOF marker might be the cause of the parsing problem.
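
Another possibility worth checking, given the "No such file or directory" message: minidom.parse() expects a filename or a file-like object, so a string handed to it is treated as a path, while minidom.parseString() takes the document content itself. The quoted function is cut off after response.read(), so whether the string is then passed to minidom.parse() is an assumption on my part. A minimal sketch, written for Python 3, with io.BytesIO standing in for the object returned by urlopen() and a made-up SAMPLE document:

```python
import io
from xml.dom import minidom

# Hypothetical sitemap fragment used only for this illustration.
SAMPLE = b'<?xml version="1.0"?><urlset><url><loc>http://example.com/</loc></url></urlset>'

def locs(doc):
    """Return the text of every <loc> element in the parsed document."""
    return [node.firstChild.data for node in doc.getElementsByTagName("loc")]

# Case 1: hand parse() the stream itself (anything with a read() method).
stream_doc = minidom.parse(io.BytesIO(SAMPLE))

# Case 2: read the content first, then use parseString() on it.
bytes_doc = minidom.parseString(SAMPLE)

# Case 3: hand parse() the content as a string; it is taken for a filename.
try:
    minidom.parse(SAMPLE.decode("utf-8"))
    error = None
except OSError as exc:  # IOError is an alias of OSError in Python 3
    error = exc
```

If that is what is going on, either passing the response object straight to minidom.parse() or switching to minidom.parseString() on the result of read() should get past the error.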

    Thanks.

    mp
     
    naugiedoggie, Sep 11, 2010
