sgmllib.py

elsa

Hi all,

I'm new to both this forum and Python, and I've got a bit stuck trying
to learn how to parse HTML.... here is my problem

I'm using a textbook that uses sgmllib.py for all its examples. I'm
aware that sgmllib is not in the current release; however, I want to
get it to work, as I have Python 2.5 and the textbook uses it.

So, the first example says to type something like (to test the
sgmllib):

python sgmllib.py "path/to/my/file.html" .... example (1)

this doesn't work for me. I think I have figured out the problem -
the error says

"/System/Library/Frameworks/Python.framework/Versions/2.5/Resources/
Python.app/Contents/MacOS/Python: can't open file 'sgmllib.py': [Errno
2] No such file or directory"

the problem is that this path is wrong. My sgmllib.py is in:

"/System/Library/Frameworks/Python.framework/Versions/2.5/lib/
python2.5/sgmllib.py"

If I substitute this path for sgmllib.py in example (1), everything
works fine. However, I don't want to do all that typing every time I
want to use sgmllib.py. So, I thought maybe the problem was with
PYTHONPATH. I executed the following command:

export PYTHONPATH=/System/Library/Frameworks/Python.framework/Versions/
2.5/li/python2.5:$PYTHONPATH

this seemed to work - no errors raised. However, when I retyped
example (1), I got the same original error.

Any assistance would be much appreciated. I'm working on Mac OS X
Leopard.

Thanks,

elsa
 
Stefan Behnel

elsa said:
I'm new to both this forum and Python, and I've got a bit stuck trying
to learn how to parse HTML...

If what you want to do is *parse* the HTML instead of trying to *learn* how
to parse it, you might want to give the existing (external) HTML parser
libraries a try. There's lxml.html (extremely fast and fixes up broken
HTML), html5lib (very slow, but very browser-like parse results) and
BeautifulSoup (slow, but good encoding detection if you need that).

Here are a couple of (only slightly biased) comparisons:

http://blog.ianbicking.org/2008/03/30/python-html-parser-performance/
http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/

python sgmllib.py "path/to/my/file.html" .... example (1)

this doesn't work for me. I think I have figured out the problem -
the error says

"/System/Library/Frameworks/Python.framework/Versions/2.5/Resources/
Python.app/Contents/MacOS/Python: can't open file 'sgmllib.py': [Errno
2] No such file or directory"

the problem is that this path is wrong. My sgmllib.py is in:

"/System/Library/Frameworks/Python.framework/Versions/2.5/lib/
python2.5/sgmllib.py"

You can use "python -m sgmllib" to call a module from the stdlib (or the
PYTHONPATH, to be more accurate).
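As a rough illustration of why "-m" helps here: the module name is looked up
through the normal import search (sys.path, which PYTHONPATH feeds into), so
the current directory stops mattering. The sketch below asks the import
system where a module lives. The helper name module_file is made up for this
example, and sgmllib will only be found on interpreters that still ship it.

```python
import sys


def module_file(name):
    """Import `name` through the usual sys.path search.

    Returns the file the module was loaded from, or None if the
    module cannot be imported (or has no file, e.g. a built-in).
    module_file is a hypothetical helper, not a stdlib function.
    """
    try:
        __import__(name)
    except ImportError:
        return None
    return getattr(sys.modules[name], "__file__", None)


if __name__ == "__main__":
    # sgmllib exists in Python 2.x only; textwrap exists everywhere.
    for name in ("sgmllib", "textwrap"):
        print("%s -> %s" % (name, module_file(name)))
```

If a module shows up here, "python -m <name>" should be able to run it from
any directory.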

But note that sgmllib is a particularly cumbersome way to deal with HTML.

Stefan
 
Dave Angel

The path in the error message simply refers to the full path string to
your Python interpreter, and reflects %0 in your shell. So I'd assume
you've got a script called 'python' on your path, which spells out the
full path name.

That has nothing to do with your problem. Your problem, as you
correctly surmised, is that Python's search path doesn't include the
directory containing sgmllib.py.

Unfortunately, I haven't used Unix for over 15 years, so I don't recall the specifics of the shell 'export' operation. But you should be able to check the workings by inspecting the PYTHONPATH variable after trying to change it.
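One way to do that inspection from Python itself, as a minimal sketch: split
PYTHONPATH on the platform's separator and check that each entry is a real
directory. The shell's export does not validate the value, so a mistyped
directory raises no error there. The function name check_pythonpath is
invented for this example.

```python
import os


def check_pythonpath(value=None):
    """Return (entry, is_directory) pairs for each PYTHONPATH entry.

    check_pythonpath is a hypothetical helper. If `value` is None,
    the current PYTHONPATH environment variable is used.
    """
    if value is None:
        value = os.environ.get("PYTHONPATH", "")
    # Entries are separated by ':' on Unix and ';' on Windows.
    entries = [entry for entry in value.split(os.pathsep) if entry]
    return [(entry, os.path.isdir(entry)) for entry in entries]


if __name__ == "__main__":
    for entry, exists in check_pythonpath():
        status = "ok     " if exists else "MISSING"
        print("%s %s" % (status, entry))
```

Any entry reported as MISSING is one that Python cannot search for modules.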

There are a number of places that Python looks for a .py file. My choice, for a standard library like this that you don't plan to modify, would be to put it in the site-packages directory. You can see exactly where that is by doing the following in a simple script, and examining the output.

import sys
print sys.path
 
Stefan Behnel

Dave said:
The path in the error message simply refers to the full path string to
your Python interpreter, and reflects %0 in your shell. So I'd assume
you've got a script called 'python' on your path, which spells out the
full path name.

No, the problem is that "sgmllib.py" simply isn't in the directory where
the Python interpreter is run. When you say

python sgmllib.py

you are instructing the Python interpreter to run the script "sgmllib.py"
*in the current directory*. According to the original post, that's clearly
not the intention of the OP.
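To make the difference concrete, here is a small sketch contrasting the two
lookups: a script argument is taken as a file path relative to the current
directory, whereas imports (and "-m") search the directories on sys.path.
Both function names are made up for illustration.

```python
import os
import sys


def resolve_script(arg, cwd=None):
    """Path that `python ARG` would try to open.

    The argument is treated as an ordinary file path, so a relative
    name is resolved against the current directory, not sys.path.
    """
    if cwd is None:
        cwd = os.getcwd()
    return os.path.abspath(os.path.join(cwd, arg))


def found_on_sys_path(filename):
    """sys.path entries that contain `filename` (the import-style search)."""
    hits = []
    for directory in sys.path:
        # An empty entry on sys.path stands for the current directory.
        candidate = os.path.join(directory or os.curdir, filename)
        if os.path.isfile(candidate):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    script = resolve_script("sgmllib.py")
    print("as a script: %s (exists: %s)" % (script, os.path.exists(script)))
    print("on sys.path: %s" % found_on_sys_path("sgmllib.py"))
```

Run from a directory that does not contain sgmllib.py, the first line reports
a file that does not exist, which is the "No such file or directory" case.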

Stefan
 
Nobody

But note that sgmllib is a particularly cumbersome way to deal with HTML.

Mostly because it only provides a tokeniser, not a parser. Whoever wrote
it doesn't appear to understand the difference.
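A short sketch of what "tokeniser only" looks like in practice. Since sgmllib
was removed in Python 3, this uses the stdlib HTMLParser class as a stand-in;
it is a different module, but it follows the same event-callback style. The
class and function names below are invented for the example.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class EventLogger(HTMLParser):
    """Records tag and text events in the order they are reported."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Skip whitespace-only text to keep the event list short.
        if data.strip():
            self.events.append(("data", data.strip()))


def tokenise(html):
    """Feed `html` to the logger and return the flat list of events."""
    logger = EventLogger()
    logger.feed(html)
    logger.close()
    return logger.events


if __name__ == "__main__":
    for event in tokenise("<p>one<p>two</b>"):
        print(event)
```

The result is a flat stream of events with no tree: the stray </b> is simply
reported, and nothing closes the first <p>. Building a document structure
from those events is left to the caller.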
 
