sgmllib.py

Discussion in 'Python' started by elsa, Aug 24, 2009.

  1. elsa

    elsa Guest

    Hi all,

    I'm new to both this forum and Python, and I've got a bit stuck trying
    to learn how to parse HTML.... here is my problem

    I'm using a textbook that uses sgmllib.py for all its examples. I'm
    aware that sgmllib is not in the current release, however I want to
    get it to work, as I have python 2.5, and the text book uses it.

    So, the first example says to type something like (to test the
    sgmllib):

    python sgmllib.py "path/to/my/file.html" .... example (1)

    this doesn't work for me. I think I have figured out the problem -
    the error says

    "/System/Library/Frameworks/Python.framework/Versions/2.5/Resources/
    Python.app/Contents/MacOS/Python: can't open file 'sgmllib.py': [Errno
    2] No such file or directory"

    the problem is that this path is wrong. My sgmllib.py is in:

    "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/
    python2.5/sgmllib.py"

    if I substitute this path for sgmllib.py in example (1), everything
    works fine. However, I don't want to do all that typing every time I
    want to use sgmllib.py. So, I thought maybe the problem was with
    PYTHONPATH. I executed the following command:

    export PYTHONPATH=/System/Library/Frameworks/Python.framework/Versions/
    2.5/li/python2.5:$PYTHONPATH

    this seemed to work - no errors raised. However, when I retyped
    example (1), I got the same original error.

    Any assistance would be much appreciated. I'm working on Mac OS X
    Leopard.

    Thanks,

    elsa
     
    elsa, Aug 24, 2009
    #1

  2. elsa wrote:
    > I'm new to both this forum and Python, and I've got a bit stuck trying
    > to learn how to parse HTML...


    If what you want to do is *parse* the HTML instead of trying to *learn* how
    to parse it, you might want to give the existing (external) HTML parser
    libraries a try. There's lxml.html (extremely fast and fixes up broken
    HTML), html5lib (very slow, but very browser-like parse results) and
    BeautifulSoup (slow, but good encoding detection if you need that).

    Here are a couple of (only slightly biased) comparisons:

    http://blog.ianbicking.org/2008/03/30/python-html-parser-performance/
    http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/


    > python sgmllib.py "path/to/my/file.html" .... example (1)
    >
    > this doesn't work for me. I think I have figured out the problem -
    > the error says
    >
    > "/System/Library/Frameworks/Python.framework/Versions/2.5/Resources/
    > Python.app/Contents/MacOS/Python: can't open file 'sgmllib.py': [Errno
    > 2] No such file or directory"
    >
    > the problem is that this path is wrong. My sgmllib.py is in:
    >
    > "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/
    > python2.5/sgmllib.py"


    You can use "python -m sgmllib" to call a module from the stdlib (or the
    PYTHONPATH, to be more accurate).
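
    To see which file "-m" would pick up for a given module name, a small
    sketch like the following can help. It assumes a Python 3 interpreter
    (importlib.util is not available on 2.5), and locate_module is a
    hypothetical helper name, not something from the stdlib. Note that
    sgmllib only ships with Python 2, so on Python 3 the first call
    reports nothing found.

```python
import importlib.util


def locate_module(name):
    """Return the file Python would load for `name`, or None if it is not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    return spec.origin if spec is not None else None


# sgmllib is Python 2 only, so a Python 3 interpreter will not find it.
print(locate_module("sgmllib"))
# json is in the stdlib of every supported Python 3 release.
print(locate_module("json"))
```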

    But note that sgmllib is a particularly cumbersome way to deal with HTML.

    Stefan
     
    Stefan Behnel, Aug 24, 2009
    #2

  3. Dave Angel

    Dave Angel Guest

    elsa wrote:
    > Hi all,
    >
    > I'm new to both this forum and Python, and I've got a bit stuck trying
    > to learn how to parse HTML.... here is my problem
    >
    > I'm using a textbook that uses sgmllib.py for all its examples. I'm
    > aware that sgmllib is not in the current release, however I want to
    > get it to work, as I have python 2.5, and the text book uses it.
    >
    > So, the first example says to type something like (to test the
    > sgmllib):
    >
    > python sgmllib.py "path/to/my/file.html" .... example (1)
    >
    > this doesn't work for me. I think I have figured out the problem -
    > the error says
    >
    > "/System/Library/Frameworks/Python.framework/Versions/2.5/Resources/
    > Python.app/Contents/MacOS/Python: can't open file 'sgmllib.py': [Errno
    > 2] No such file or directory"
    >
    > the problem is that this path is wrong. My sgmllib.py is in:
    >
    > "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/
    > python2.5/sgmllib.py"
    >
    > if I substitute this path for sgmllib.py in example (1), everything
    > works fine. However, I don't want to do all that typing every time I
    > want to use sgmllib.py. So, I thought maybe the problem was with
    > PYTHONPATH. I executed the following command:
    >
    > export PYTHONPATH=/System/Library/Frameworks/Python.framework/Versions/
    > 2.5/li/python2.5:$PYTHONPATH
    >
    > this seemed to work - no errors raised. However, when I retyped
    > example (1), I got the same original error.
    >
    > Any assistance would be much appreciated. I'm working on Mac OS X
    > Leopard.
    >
    > Thanks,
    >
    > elsa
    >
    >

    The path in the error message simply refers to the full path string to
    your Python interpreter, and reflects %0 in your shell. So I'd assume
    you've got a script called 'python' on your path, which spells out the
    full path name.

    That has nothing to do with your problem. Your problem, as you
    correctly surmised, is that Python's search path doesn't include the
    directory containing sgmllib.py.

    Unfortunately, I haven't used Unix for over 15 years, so I don't recall
    the specifics of the shell 'export' operation. But you should be able to
    check the workings by inspecting the PYTHONPATH variable after trying to
    change it.
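
    One way to do that inspection from Python itself is sketched below.
    check_pythonpath is a hypothetical helper, not a stdlib function; it
    splits the variable on the platform's path separator and reports
    whether each entry is an existing directory, which would expose a
    mistyped path component.

```python
import os


def check_pythonpath(value=None):
    """Split a PYTHONPATH-style string and report whether each entry is a directory."""
    if value is None:
        value = os.environ.get("PYTHONPATH", "")
    # Empty entries (e.g. from a trailing separator) are skipped.
    entries = [p for p in value.split(os.pathsep) if p]
    return [(p, os.path.isdir(p)) for p in entries]


# Prints nothing when PYTHONPATH is unset or empty.
for path, exists in check_pythonpath():
    print("%-7s %s" % ("ok" if exists else "MISSING", path))
```

    Calling check_pythonpath() with no argument reads the current
    environment; passing a string lets you test a value before exporting it.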

    There are a number of places that Python looks for a .py file. My choice,
    for a standard library like this that you don't plan to modify, would be
    to put it in the site-packages directory. You can see exactly where that
    is by doing the following in a simple script, and examining the output.

    import sys
    print(sys.path)  # the directories Python searches when importing modules
     
    Dave Angel, Aug 24, 2009
    #3
  4. Dave Angel wrote:
    > elsa wrote:
    >> python sgmllib.py "path/to/my/file.html" .... example (1)

    >
    > The path in the error message simply refers to the full path string to
    > your Python interpreter, and reflects %0 in your shell. So I'd assume
    > you've got a script called 'python' on your path, which spells out the
    > full path name.


    No, the problem is that "sgmllib.py" simply isn't in the directory where
    the Python interpreter is run. When you say

    python sgmllib.py

    you are instructing the Python interpreter to run the script "sgmllib.py"
    *in the current directory*. According to the original post, that's clearly
    not the intention of the OP.
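
    The difference can be reproduced with a short experiment. This is only
    a sketch under some assumptions: it uses Python 3's subprocess.run and
    the stdlib module timeit as a stand-in for sgmllib (which no longer
    exists in Python 3), and run_in is a hypothetical helper name.

```python
import subprocess
import sys
import tempfile


def run_in(directory, args):
    """Run this interpreter with `args` from inside `directory`; return the exit code."""
    result = subprocess.run(
        [sys.executable] + args,
        cwd=directory,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return result.returncode


with tempfile.TemporaryDirectory() as empty_dir:
    # A bare script name is looked up relative to the current directory only,
    # so this fails: the empty directory contains no timeit.py.
    as_script = run_in(empty_dir, ["timeit.py", "-n", "1", "pass"])
    # With -m, the module is located through the module search path instead.
    as_module = run_in(empty_dir, ["-m", "timeit", "-n", "1", "pass"])

print("script exit code:", as_script)
print("module exit code:", as_module)
```

    The first call should exit with a non-zero status (the "can't open
    file" case), while the second should succeed.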

    Stefan
     
    Stefan Behnel, Aug 24, 2009
    #4
  5. Nobody

    Nobody Guest

    On Mon, 24 Aug 2009 09:08:07 +0200, Stefan Behnel wrote:

    > But note that sgmllib is a particularly cumbersome way to deal with HTML.


    Mostly because it only provides a tokeniser, not a parser. Whoever wrote
    it doesn't appear to understand the difference.
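
    To illustrate the distinction, here is a rough sketch of what has to be
    layered on top of an event-style (tokenising) API to get a tree. It uses
    Python 3's html.parser.HTMLParser as a stand-in, since sgmllib is not
    available there; TreeBuilder and its dict-based nodes are made up for
    the example and ignore details such as void elements like <br>.

```python
from html.parser import HTMLParser


class TreeBuilder(HTMLParser):
    """Turn the flat stream of tag/data events into a nested structure."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "#root", "children": []}
        self.stack = [self.root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "children": []}
        self.stack[-1]["children"].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.stack[-1]["children"].append(text)


builder = TreeBuilder()
builder.feed("<ul><li>one</li><li>two</li></ul>")
builder.close()
tree = builder.root
print(tree)
```

    The handler methods alone only report tokens in document order; the
    stack is what records nesting, and that bookkeeping is the part a real
    parser library does for you.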
     
    Nobody, Aug 25, 2009
    #5
