Why does this fail?

Discussion in 'Python' started by Dave Murray, Jan 5, 2004.

  1. Dave Murray

    Dave Murray Guest

    New to Python question, why does this fail?

    Thanks,
    Dave

    ---testcase.py---
    import sys, urllib, htmllib
    def Checkit(URL):
        try:
            print "Opening", URL
            f = urllib.open(URL)
            f.close()
            return 1
        except:
            return 0

    rtfp = Checkit("http://www.python.org/doc/Summary.html")
    if rtfp == 1:
        print "OK"
    else:
        print "Fail"


    python testcase.py
    Dave Murray, Jan 5, 2004
    #1

  2. On Sun, 2004-01-04 at 18:58, Dave Murray wrote:
    > New to Python question, why does this fail?


    [...]
    > try:

    [...]
    > except:

    [...]

    Because you're treating all errors as if they're what you expect. You
    should be more specific in your except clause. Do this and you'll see
    what I mean:

    try:
        whatever
    except:
        raise # raise whatever exception occurred
        return 0

    In other words, you should be explicit about the errors you silence.

    Also, it's not clear what Checkit() is actually supposed to do. Is it
    supposed to verify the URL actually exists? urllib doesn't raise an
    error for 404 not found--urllib2 does. Try that instead.

    Cheers,

    // m
    Mark McEahern, Jan 5, 2004
    #2
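
A minimal sketch of the advice in the reply above, written for Python 3 (where the urllib2 behaviour lives in urllib.request) rather than the Python 2 used in the thread. The name check_url and its opener parameter are assumptions made for illustration; the parameter only exists so the function can be exercised without a network connection.

```python
import urllib.error
import urllib.request


def check_url(url, opener=urllib.request.urlopen):
    """Return True if the URL opens, False on an HTTP or network error.

    Only the errors we expect from a bad link are caught; anything else
    (a typo in the code, for example) still produces a traceback.
    """
    try:
        response = opener(url)
    except urllib.error.HTTPError as exc:
        # Raised for responses such as 404 Not Found.
        # HTTPError is a subclass of URLError, so it must be caught first.
        print("HTTP error", exc.code, "for", url)
        return False
    except urllib.error.URLError as exc:
        # Raised when the server cannot be reached at all.
        print("Could not reach", url, "-", exc.reason)
        return False
    response.close()
    return True
```

Calling check_url("http://www.python.org/") with the default opener performs a real request, so the result depends on the network.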

  3. Isaac To

    Isaac To Guest

    >>>>> "Dave" == Dave Murray <> writes:

    Dave> New to Python question, why does this fail? Thanks, Dave

    Dave> f = urllib.open(URL)

    urllib does not have an open function. Instead, it has a class called
    URLopener, whose instances have such a method (the module-level shortcut
    is urllib.urlopen). So instead, you can say

    opener = urllib.URLopener()
    f = opener.open(URL)

    Regards,
    Isaac.
    Isaac To, Jan 5, 2004
    #3
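
The missing function is also why the original script printed "Fail" with no traceback: the bare except swallowed the AttributeError. The sketch below reproduces that under Python 3, where the top-level urllib package likewise has no open attribute. The function names checkit_bare and checkit_narrow are made up for this illustration, and no network access happens because the attribute lookup fails first.

```python
import urllib


def checkit_bare(url):
    """Mirrors the original testcase: every error becomes a 0."""
    try:
        f = urllib.open(url)  # no such function, raises AttributeError
        f.close()
        return 1
    except:
        return 0


def checkit_narrow(url):
    """Same body, but only I/O errors are treated as a failed check."""
    try:
        f = urllib.open(url)  # the AttributeError is not caught below
        f.close()
        return 1
    except IOError:
        return 0
```

With the narrow clause the programming mistake surfaces immediately as a traceback naming the missing attribute, instead of being reported as a broken link.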
  4. Dave Murray

    Dave Murray Guest

    Thank you all, this is a hell of a news group. The diversity of answers
    helped me with some unasked questions, and provided more elegant solutions
    to what I thought that I had figured out on my own. I appreciate it.

    It's part of a spider that I'm working on to verify my own (and friends') web
    pages and check for broken links. Looks like making it follow robot rules
    (robots.txt and meta field exclusions) is what's left.

    I have found the library for html/sgml to be not very robust. Big .php and
    .html with lots of cascades and external references break it very
    ungracefully (sgmllib.SGMLParseError: expected name token). I'd like to be
    able to trap that stuff and just move on to the next file, accepting the
    error. I'm reading in the external links and printing the title as a sanity
    check in addition to collecting href anchors. This problem that I asked
    about reared its head when I started testing for a robots.txt file, which
    may or may not exist.

    The real point is to learn the language. When a new grad wrote a useful
    utility at work in Python faster than I could have written it in C I decided
    that I needed to learn Python. He's very sharp but he sold me on the
    language too. Since I often must write utilities, Python seems to be a very
    good thing since I normally don't have much time to kill on them.

    Dave
    Dave Murray, Jan 5, 2004
    #4
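
For the robots.txt step mentioned in the post above, the standard library already has a parser. This is a rough sketch for Python 3 (urllib.robotparser; the Python 2 module was called robotparser). The helper name build_rules, the "MySpider" user agent and the example.com URLs are placeholders, and the rules are parsed from a string so nothing is fetched.

```python
from urllib.robotparser import RobotFileParser


def build_rules(robots_txt):
    """Parse the text of a robots.txt file into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Example rules: every crawler is kept out of /private/.
rules = build_rules("User-agent: *\nDisallow: /private/\n")

# can_fetch() answers whether the named user agent may request the URL.
may_index = rules.can_fetch("MySpider", "http://example.com/index.html")
may_private = rules.can_fetch("MySpider", "http://example.com/private/page.html")
```

In a real spider the text would come from the site's own /robots.txt, and a missing file would need its own handling since, as the post says, it may or may not exist.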
  5. On Sun, 2004-01-04 at 20:58, Dave Murray wrote:
    [...]
    > I have found the library for html/sgml to be not very robust. Big .php and
    > .html with lots of cascades and external references break it very
    > ungracefully (sgmllib.SGMLParseError: expected name token).


    I'd suggest using htmllib.

    // m
    Mark McEahern, Jan 5, 2004
    #5
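
A hedged sketch of the "trap it and move on to the next file" idea quoted above. It uses Python 3's html.parser instead of sgmllib/htmllib, which were removed in Python 3, so it is a swapped-in technique and not the code from the thread. LinkCollector and scan_pages are invented names, and the pages are passed in as strings so that no fetching is involved.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the page title and every href found on an <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def scan_pages(pages):
    """Map page name -> (title, links), skipping pages that fail to parse."""
    results = {}
    for name, text in pages.items():
        parser = LinkCollector()
        try:
            parser.feed(text)
            parser.close()
        except Exception as exc:
            # Deliberately broad, but logged instead of silenced; narrow it
            # to the parser's own error type if the parser defines one.
            print("Skipping", name, "-", type(exc).__name__, exc)
            continue
        results[name] = (parser.title.strip(), parser.links)
    return results
```

The try/except sits inside the loop, so one bad page costs one log line and the remaining pages are still processed.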
  6. Anand Pillai

    Anand Pillai Guest

    I could not help replying to this thread...

    There are already quite a lot of spider programs existing
    in Python. I am the author of one of the first programs of
    the kind, called HarvestMan. It is multithreaded and has
    many features for downloading websites, checking links etc.
    You can get it from the HarvestMan homepage at
    http://harvestman.freezope.org. HarvestMan is quite
    comprehensive and is a bit more than a link checker or
    web crawler. My feeling is that it is not easy to understand
    for a Python beginner though the program is distributed
    as source code in true Python tradition.

    If you want something simpler, try spider.py. You can get
    information on it from the PyPI pages.

    My point is that there is nothing to gain from re-inventing
    the wheel again and again. Spider programs have been written in
    Python, so you should try to use them rather than writing code
    from scratch. If you think that you have new ideas, then
    take the code of HarvestMan (or spider) and customize it or
    improve on it. If it is for HarvestMan, I will be happy to merge
    the changes back into the code if I think they improve the program.

    This is the main reason why developers release programs as
    opensource. Help the community, and help yourselves. Re-inventing
    the wheel is perhaps not the way to go.

    best regards

    -Anand


    Anand Pillai, Jan 5, 2004
    #6
  7. Dave Murray

    Dave Murray Guest

    Thank you for the information. I will check them out after I finish my
    effort. My purpose isn't to obtain a spider program, it is to learn Python
    by doing. If the exercise will result in something that I can use, it gives
    me incentive to not abandon the effort because the exercise is interesting
    to me. The sources that you pointed out should be rich in information on how
    I could have done it better if I had been more experienced in Python
    (knowledgeable about its libraries, etc.)

    Whenever I learn something new I like to work at it, get help if I'm stuck
    on something silly (why waste time?), assess what I did against a higher
    standard, repeat. It's just the way that I learn. I can see that this forum
    will be just what I need for a chunk of that process. I appreciate it.

    Regards,
    Dave

    ----- Original Message -----
    From: "Anand Pillai" <>


    > I could not help replying to this thread...
    >
    > There are already quite a lot of spider programs existing
    > in Python. --
    > This is the main reason why developers release programs as
    > opensource. Help the community, and help yourselves. Re-inventing
    > the wheel is perhaps not the way to go.
    Dave Murray, Jan 5, 2004
    #7
  8. Dave Murray

    Dave Murray Guest

    intellectual property agreements and open source . was - Re: Why does this fail? [2]

    After re-reading this part, I can see that it is an idea that I like. How
    does participating in open source work for someone (me) who has signed the
    customary intellectual property agreement with the corporation that they
    work for? Since programming is part of my job, developing test solutions
    implemented on automatic test equipment (the hardware too) I don't know if I
    would/could be poison to an open source project. How does that work? I've
    never participated. If all the work is done on someone's own time, not using
    company resources, yadda-yadda-hadda-hadda, do corporate lawwwyaahhhs have a
    history of trying to dispute that and stake a claim? No doubt, many of you
    are in the same position.

    Regards,
    Dave


    "Anand Pillai" <> wrote in message
    news:...
    > This is the main reason why developers release programs as
    > opensource. Help the community, and help yourselves. Re-inventing
    > the wheel is perhaps not the way to go.
    Dave Murray, Jan 5, 2004
    #8
  9. Re: intellectual property agreements and open source . was - Re: Why does this fail? [2]

    Dave> How does participating in open source work for someone (me) who
    Dave> has signed the customary intellectual property agreement with the
    Dave> corporation that they work for? Since programming is part of my
    Dave> job, developing test solutions implemented on automatic test
    Dave> equipment (the hardware too) I don't know if I would/could be
    Dave> poison to an open source project. How does that work?

    Only your corporate counsel knows for sure. <wink> Seriously, the degree to
    which you are allowed to release code to an open source project and the
    manner in which it is released is probably a matter best taken up with your
    company's legal department. Some companies are fairly enlightened. Some
    are not. You may need very little review to release bug fixes or test cases
    (my guess is you might be pretty good at writing test cases ;-), more review
    to release a new module or package, and considerable participation by
    management and the legal eagles if you want to release a sophisticated
    application into the wild.

    In any case, if you make large contributions to an open source project such
    as Python, I'm pretty sure a release form for substantial amounts of code
    will be required at the Python end of things. See here

    http://www.python.org/psf/psf-contributor-agreement.html

    for more details. Note that it hasn't been updated in a couple years. I
    don't know if MAL has something which is more up-to-date.

    Skip
    Skip Montanaro, Jan 5, 2004
    #9
  10. Re: intellectual property agreements and open source . was - Re: Why does this fail? [2]

    |Thus Spake Dave Murray On the now historical date of Sun, 04 Jan 2004
    23:54:57 -0700|

    > After re-reading this part, I can see that it is an idea that I like.
    > How does participating in open source work for someone (me) who has
    > signed the customary intellectual property agreement with the
    > corporation that they work for? Since programming is part of my job,
    > developing test solutions implemented on automatic test equipment (the
    > hardware too) I don't know if I would/could be poison to an open source
    > project. How does that work? I've never participated. If all the work is
    > done on someone's own time, not using company resources,
    > yadda-yadda-hadda-hadda, do corporate lawwwyaahhhs have a history of
    > trying to dispute that and stake a claim? No doubt, many of you are in
    > the same position.


    IANAL (I Am Not A Lawyer)

    As suggested elsewhere, consult your legal counsel. Dig up that NDA. Go
    to the corporate lawwwyaahhhs and ask them to provide you with a clear
    delineation in writing. Have your legal counsel look over that document
    to make sure it says what you think it says. Be prepared to explain the
    difference between general purpose tools and special purpose tools
    directly related to the job. Specifically, be prepared to explain how
    contributing to general purpose tools can allow you to more quickly (and
    inexpensively, time is money yadda-yadda) develop the special purpose
    tools. Contributing to, say, a web spider when your business involves
    stress-testing web servers would allow you to leverage the knowledge and
    work of others towards the company's goals. As I understand it, no
    open-source license has yet been tested in court, so your guess is as good
    as anyone's about how much risk is involved. That's why everyone is
    waiting with bated breath over the SCO vs IBM fiasco. It may be that
    first legal test. In fact, go to www.groklaw.net and read up on the SCO
    vs IBM suit. That's as good a starting place as any.

    Oh, and be sure to take a look at the specific license involved in a
    project you contribute to. Some licenses, like BSD, place few or no
    restrictions on how an individual or company uses the code. Most, such as
    the GPL, simply require that you distribute the source and any changes
    you've made if and only if you distribute the product, or any product
    including code from the project, to a third party (in the case of
    companies, that means outside the company). YMMV and again, IANAL

    HTH

    Sam Walters

    --
    Never forget the halloween documents.
    http://www.opensource.org/halloween/
    """ Where will Microsoft try to drag you today?
    Do you really want to go there?"""
    Samuel Walters, Jan 5, 2004
    #10
  11. Peter Hansen

    Peter Hansen Guest

    Re: intellectual property agreements and open source . was - Re: Why does this fail? [2]

    Dave Murray wrote:
    >
    > After re-reading this part, I can see that it is an idea that I like. How
    > does participating in open source work for someone (me) who has signed the
    > customary intellectual property agreement with the corporation that they
    > work for? Since programming is part of my job, developing test solutions
    > implemented on automatic test equipment (the hardware too) I don't know if I
    > would/could be poison to an open source project. How does that work? I've
    > never participated. If all the work is done on someone's own time, not using
    > company resources, yadda-yadda-hadda-hadda, do corporate lawwwyaahhhs have a
    > history of trying to dispute that and stake a claim? No doubt, many of you
    > are in the same position.


    My own agreement, which is not quite as archaic in restricting me as some I've
    seen, boils down to saying that if I work on something that is either (done
    on company time or with company resources) OR (relates to the current or
    likely future business of the company) then I'm agreeing that the company
    in effect gets an exclusive right to whatever it is.

    If, on the other hand, it's on my own time AND does not involve what the
    company's business is (in contrast to, say, simply relating to tools that
    they might use within the business), then they don't get any right to it.
    We use various test tools at work, but just because I work on a similar
    open source test tool doesn't mean the company has any exclusive right to it.
    We sell RF stuff, not test tools, so test tools are not the company's
    business, nor are they likely ever to be...

    I believe many or most agreements these days boil down to the same thing,
    but of course your own might not so reading it would be a good idea.

    Generally there is lots of boilerplate legalese but it surrounds one or
    two key paragraphs of fairly simple English with the essence described above,
    and it's not as hard to dig the key ideas out as it might seem at first glance.

    -Peter
    Peter Hansen, Jan 5, 2004
    #11
  12. John J. Lee

    John J. Lee Guest

    "Dave Murray" <> writes:

    > New to Python question, why does this fail?

    [...]
    > def Checkit(URL):

    [...]

    (already answered six times, so I won't bother...)

    You might want to have a look at the unittest module.

    Also (advert ;-), if you're doing any kind of web scraping in Python
    (including functional testing), you might want to look at this little
    FAQ (though it certainly doesn't nearly cover everything relevant):

    http://wwwsearch.sf.net/bits/clientx.html

    BTW, in response to another question in this thread (IIRC), and
    entirely contrary to my previous assertion here <wink>, it appears
    that HTMLParser.HTMLParser is a bit more finicky with HTML than is
    sgmllib/htmllib (htmllib is a thin wrapper over sgmllib). I hope to
    investigate and fix that -- HTMLParser.HTMLParser knows about XHTML,
    so in that respect is a better choice than sgmllib/htmllib. If you
    want to process junk HTML, though (or perhaps even valid HTML that the
    library you're using doesn't like), look at mxTidy or uTidylib. I
    should link to those on my FAQ page...


    John
    John J. Lee, Jan 6, 2004
    #12
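
For the unittest suggestion in the post above, a small illustration of what a test case looks like. The helper is_broken is hypothetical (it is not from the thread) and stands in for whatever piece of the spider is being tested.

```python
import unittest


def is_broken(status):
    """Hypothetical helper: treat 4xx and 5xx HTTP status codes as broken."""
    return status >= 400


class IsBrokenTest(unittest.TestCase):
    def test_ok_is_not_broken(self):
        self.assertFalse(is_broken(200))

    def test_redirect_is_not_broken(self):
        self.assertFalse(is_broken(301))

    def test_not_found_is_broken(self):
        self.assertTrue(is_broken(404))
```

Saved in a file such as test_links.py (the file name is only an example), the tests can be run with "python -m unittest test_links".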
  13. John> Also (advert ;-), if you're doing any kind of web scraping in
    John> Python (including functional testing), you might want to look at
    John> this little FAQ (though it certainly doesn't nearly cover
    John> everything relevant):

    John> http://wwwsearch.sf.net/bits/clientx.html

    A possible addition to your "Embedded script is messing up my web-scraping"
    section: Wasn't there mention of the Mozilla project's standalone
    JavaScript interpreter (don't remember what it's called) recently alongside
    some Python interface?

    Skip
    Skip Montanaro, Jan 6, 2004
    #13
  14. John J Lee

    John J Lee Guest

    On Tue, 6 Jan 2004, Skip Montanaro wrote:
    [...]
    > John> http://wwwsearch.sf.net/bits/clientx.html
    >
    > A possible addition to your "Embedded script is messing up my web-scraping"
    > section: Wasn't there mention of the Mozilla project's standalone
    > JavaScript interpreter (don't remember what it's called) recently alongside
    > some Python interface?


    (Just finished updating that page a few seconds ago, BTW.)

    I don't remember that, other than PyXPCOM (linked to from that page -- at
    least, it is now ;-) and my own

    http://wwwsearch.sf.net/python-spidermonkey
    http://wwwsearch.sf.net/DOMForm

    Be warned, all my JavaScript-support code is very alpha (DOMForm itself
    shouldn't be anywhere near so bad, but still very early alpha).


    John
    John J Lee, Jan 6, 2004
    #14