Simple Question : files and URLLIB

Discussion in 'Python' started by Richard Shea, Oct 14, 2003.

  1. Richard Shea

    Richard Shea Guest

    Hi - I'm new to Python. I've been trying to use URLLIB and the 'tidy'
    function (part of the mx.tidy package). There's one thing I'm having
    real difficulties understanding. When I did this ...

    finA= urllib.urlopen('http://www.python.org/')
    foutA=open('C:\\testout.html','w')
    tidy(finA,foutA,None)

    I get ...

    Traceback (most recent call last):
    File "<interactive input>", line 1, in ?
    File "mx\Tidy\Tidy.py", line 38, in tidy
    return mxTidy.tidy(input, output, errors, kws)
    TypeError: inputstream must be a file object or string

    ... what I don't understand is this: surely the result of
    urllib.urlopen is a file object? Isn't it? To quote the manual at:

    http://www.python.org/doc/current/lib/module-urllib.html

    "If all went well, a file-like object is returned". I can make the
    tidy function happy by changing the code to read ...

    finA= urllib.urlopen('http://www.python.org/').read()

    .... I haven't had time to look into this properly yet but I suspect
    finA is now a string not a file handle ?

    Anyway if anyone can throw light on this I would be grateful.

    thanks

    richard.shea.
     
    Richard Shea, Oct 14, 2003
    #1

  2. bromden

    bromden Guest

    > "If all went well, a file-like object is returned".

    File-like means having an interface similar to a file object's (methods
    read, readline, etc.) without being a real file.

    mxTidy.tidy most probably requires a real file to be passed;
    look into Tidy.py (line 38) and you'll know for sure.
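
The distinction between "file-like" and "real file" can be sketched as follows. This is a minimal illustration, not mxTidy's actual code: FileLike, strict_consumer and lenient_consumer are hypothetical names, and the sketch is written for Python 3 so it runs today (the thread itself uses Python 2). It assumes the strict function checks the argument's type, while the lenient one only relies on read() being present.

```python
import io


class FileLike:
    """Hypothetical stand-in for a urlopen() result: has read(), is not a file."""

    def __init__(self, text):
        self._buffer = io.StringIO(text)

    def read(self):
        return self._buffer.read()

    def readline(self):
        return self._buffer.readline()


def strict_consumer(source):
    """Accept only a string or a real I/O object, as a type-checking API might."""
    if isinstance(source, str):
        return source
    if isinstance(source, io.IOBase):
        return source.read()
    raise TypeError("inputstream must be a file object or string")


def lenient_consumer(source):
    """Accept a string, or anything that offers read() (interface only)."""
    if isinstance(source, str):
        return source
    return source.read()


page = FileLike("<html></html>")
try:
    strict_consumer(page)
    strict_result = "accepted"
except TypeError as exc:
    strict_result = "rejected: %s" % exc

lenient_result = lenient_consumer(FileLike("<html></html>"))
print(strict_result)
print(lenient_result)
```

The same object is rejected by the type check and accepted by the interface check, which is the situation described in the original post.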

    --
    bromden[at]gazeta.pl
     
    bromden, Oct 14, 2003
    #2

  3. Mark Carter

    Mark Carter Guest

    > finA= urllib.urlopen('http://www.python.org/').read()
    >
    > ... I haven't had time to look into this properly yet but I suspect
    > finA is now a string not a file handle ?


    Correct. If you do:
    print type(finA)
    you obtain the result:
    <type 'str'>

    If you do:
    finA= urllib.urlopen('http://www.python.org/')
    print type(finA)
    then you obtain the result:
    <type 'instance'>

    Compare this with:
    finA = open("blah", "w")
    print type(finA)
    which gives the result:
    <type 'file'>

    According to the docs on urlopen( url[, data[, proxies]]) :
    "If all went well, a file-like object is returned."
    So the answer would appear to be: "close, but no cigar".
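
The same comparison can be tried without a network connection. The sketch below is for Python 3, where the type names differ from the Python 2 output shown above; io.StringIO is used here only as an assumed stand-in for a file-like object, and the file name "blah" is created in a temporary directory.

```python
import io
import os
import tempfile

text = "<html></html>"            # the kind of value .read() hands back
file_like = io.StringIO(text)     # in-memory object offering a file's methods

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "blah")
    with open(path, "w") as real_file:
        # Record the type name while the file is still open.
        real_file_type = type(real_file).__name__

print(type(text).__name__)
print(type(file_like).__name__)
print(real_file_type)
```

Three different types come back, even though two of them can be read from in the same way.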
     
    Mark Carter, Oct 14, 2003
    #3
  4. Terry Reedy

    Terry Reedy Guest

    "Richard Shea" <> wrote in message
    news:...
    > Hi - I'm new to Python. I've been trying to use URLLIB and the 'tidy'
    > function (part of the mx.tidy package). There's one thing I'm having
    > real difficulties understanding. When I did this ...
    >
    > finA= urllib.urlopen('http://www.python.org/')
    > foutA=open('C:\\testout.html','w')
    > tidy(finA,foutA,None)
    >
    > I get ...
    >
    > Traceback (most recent call last):
    > File "<interactive input>", line 1, in ?
    > File "mx\Tidy\Tidy.py", line 38, in tidy
    > return mxTidy.tidy(input, output, errors, kws)
    > TypeError: inputstream must be a file object or string
    >
    > ... what I don't understand is surely the result of a urllib is a file
    > object ? Isn't it ? To quote the manual at :
    >
    > http://www.python.org/doc/current/lib/module-urllib.html
    >
    > "If all went well, a file-like object is returned".


    'file-like object' is different from 'file object'. From the urllib.py
    doc string:
    "The object returned by URLopener().open(file) will differ per
    protocol. All you know is that is has methods read(), readline(),
    readlines(), fileno(), close() and info()."

    Why this is not good enough for mx.tidy is a question for its author.
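
One way to see how much of that method list an object provides is to probe for each name. This is only a sketch in Python 3: missing_methods is a hypothetical helper, and io.StringIO stands in for a file-like object because no network access is assumed.

```python
import io

# Method names listed in the urllib.py doc string quoted above.
DOCSTRING_METHODS = ("read", "readline", "readlines", "fileno", "close", "info")


def missing_methods(obj, names=DOCSTRING_METHODS):
    """Return the listed method names that obj does not provide as callables."""
    return [name for name in names if not callable(getattr(obj, name, None))]


print(missing_methods(io.StringIO("<html></html>")))
print(missing_methods("<html></html>"))
```

A check like this looks at the interface only; it says nothing about whether a function that tests for a specific type will accept the object.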

    > I can make the tidy function happy by changing the code to read ...
    >
    > finA= urllib.urlopen('http://www.python.org/').read()
    >
    > ... I haven't had time to look into this properly yet but I suspect
    > finA is now a string not a file handle ?


    Yes. So it meets the 'file or string' requirement.

    Terry J. Reedy
     
    Terry Reedy, Oct 14, 2003
    #4
  5. Richard Shea

    Richard Shea Guest

    Thanks to everyone for the info/feedback. In particular I didn't know
    you could do that ...

    type(finA)

    ... business (which probably shows you how new to Python I am) but
    it'll come in handy.

    As I think you realised, I had misunderstood exactly what urllib was
    offering; however, the blah.read() approach is quite good enough. Just
    out of curiosity though, if 'tidy' demanded a file (rather than being
    prepared to take a string, as it is), would the only sure approach be to
    ...

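    strHTML = urllib.urlopen('http://www.python.org/').read()
    f1 = open('C:\\workfile.html', 'w')
    f1.write(strHTML)
    f1.close()
    f1 = open('C:\\workfile.html', 'r')    # reopen for reading before passing to tidy
    foutA = open('C:\\testout.html', 'w')  # output file, as in the first post
    tidy(f1, foutA, None)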

    ... that is, to take the string that results from the read on the urllib
    file-like object and write it back out to a file?
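
One detail matters if the string is written out first: a handle opened with 'w' cannot be read from, so the file has to be closed and reopened for reading before it is passed on. A minimal Python 3 sketch of that round trip, assuming a hypothetical needs_real_file function in place of a file-only API and a temporary directory in place of C:\workfile.html:

```python
import os
import tempfile


def needs_real_file(handle):
    """Hypothetical consumer that reads from an open file on disk."""
    return handle.read()


def string_to_file_and_back(text, consumer):
    """Write text to a temporary file, then reopen it for reading for consumer."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "workfile.html")
        with open(path, "w") as out:
            out.write(text)
        # Leaving the 'with' block above closes the file, flushing the data.
        with open(path, "r") as src:
            return consumer(src)


result = string_to_file_and_back("<html><body>hi</body></html>", needs_real_file)
print(result)
```

Whether this is the only sure approach depends on what the consuming function actually checks, which the thread leaves open.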

    Just wondering ...

    Thanks again for the information on my original question.

    regards

    richard.





     
    Richard Shea, Oct 15, 2003
    #5