problem with my regex?

Brian · May 22, 2006

I have a simple script below that is causing me some problems and I am
having a hard time tracking them down. Here is the code:

import urllib
import re

def getPicLinks():
    found = []
    try:
        page = urllib.urlopen("http://continuouswave.com/whaler/cetacea/")
    except:
        print "ERROR RREADING PAGE."
        sys.exit()
    page1 = page.read()
    cetLinks = re.compile("cetaceaPage..\.html", page1)
    for line in page1:
        found.append(cetLinks.findall(line))
    print found

This is the error message:
"/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre_parse.py",
line 396, in _parse
if state.flags & SRE_FLAG_VERBOSE:
TypeError: unsupported operand type(s) for &: 'str' and 'int'

I am trying to extract the links on a web page that have a similar
pattern. Here is an example of the html source:

<HR>
<A HREF="photoLog.html">PHOTO-LOG</A> 
<A HREF="guide.html">How-To-Submit</A> <A
HREF="cetaceaPage01.html">01</A> | <A
HREF="cetaceaPage02.html">02</A> | <A
HREF="cetaceaPage03.html">03</A> | <A
HREF="cetaceaPage04.html">04</A> | <A
HREF="cetaceaPage05.html">05</A> | <A
HREF="cetaceaPage06.html">06</A> | <A
HREF="cetaceaPage07.html">07</A> | <A
HREF="cetaceaPage08.html">08</A> | <A
HREF="cetaceaPage09.html">09</A> | <A
HREF="cetaceaPage10.html">10</A>
 <A>

My problem is that I can't seem to be able to figure out what is going
wrong here. Mostly because I am a bit confused by the error message as
it points to a file (presumably part of re) that I am unfamiliar with,
and I am a bit new to Python.

Any help is greatly appreciated, as is your patience.

Brian

Brian · May 22, 2006

I sincerely appreciate your reply and the time you took to explain it.

Thank you,
Brian

Bruno Desthuilliers · May 23, 2006

Brian wrote:

> I have a simple script below that is causing me some problems and I am
> having a hard time tracking them down. Here is the code:
>
> import urllib
> import re
>
> def getPicLinks():
>     found = []
>     try:
>         page = urllib.urlopen("http://continuouswave.com/whaler/cetacea/")
>     except:

Do everyone a favor: don't use bare except clauses.

>         print "ERROR RREADING PAGE."
>         sys.exit()

stdout is for normal program output. Error messages should go to
stderr. And FWIW, your exception handling here is worse than useless.
You'd better let the exception propagate: at worst, it will also exit
the program, but with the right return value for the system and a
meaningful traceback.
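
A minimal sketch of that advice, written in Python 3 syntax rather than the Python 2.4 used in the thread (so `urllib.request.urlopen` and `print(..., file=sys.stderr)` are assumptions about the reader's environment, and `read_page` is a hypothetical helper name). It catches only the error it expects, reports it on stderr, and re-raises so the traceback and exit status survive:

```python
import sys
import urllib.error
import urllib.request


def read_page(url):
    """Return the raw bytes at url; report failures on stderr and re-raise."""
    try:
        with urllib.request.urlopen(url) as page:  # closed automatically
            return page.read()
    except urllib.error.URLError as exc:
        # Name the exception you expect instead of using a bare except.
        print("could not read %s: %s" % (url, exc), file=sys.stderr)
        raise  # keep the original traceback and a non-zero exit status
```

Dropping the try/except entirely would behave much the same, minus the extra line on stderr.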

>     page1 = page.read()
>     cetLinks = re.compile("cetaceaPage..\.html", page1)

Are you sure you've carefully read the doc for re.compile() ?-)

You want something like this (NB : regexp not tested):

html = page.read()
page.close() # don't forget to free resources
cetLinks = re.compile(r"cetaceaPage[0-9]{2}\.html")
found = cetLinks.findall(html)
print "\n".join(found)
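
For what it's worth, the pattern can be checked offline against a few lines of the HTML quoted in the question. A small sketch in Python 3 syntax (`SAMPLE_HTML` is only an excerpt pasted in for illustration, not fetched from the site, and the upper-case names are my own):

```python
import re

# Excerpt of the HTML source shown in the question.
SAMPLE_HTML = """<A HREF="photoLog.html">PHOTO-LOG</A>
<A HREF="guide.html">How-To-Submit</A> <A
HREF="cetaceaPage01.html">01</A> | <A
HREF="cetaceaPage02.html">02</A> | <A
HREF="cetaceaPage10.html">10</A>"""

# Two digits between the fixed prefix and the escaped ".html" suffix.
CET_LINKS = re.compile(r"cetaceaPage[0-9]{2}\.html")

# findall() scans the whole string and returns every match as a list.
found = CET_LINKS.findall(SAMPLE_HTML)
print("\n".join(found))
```

findall() is applied to the whole string here. Looping with `for line in page1` over a string would visit it one character at a time, so no single item could ever match the pattern.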

> This is the error message:
> "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre_parse.py",
> line 396, in _parse
> if state.flags & SRE_FLAG_VERBOSE:
> TypeError: unsupported operand type(s) for &: 'str' and 'int'

This is not the *full* traceback.

(snip)

> My problem is that I can't seem to be able to figure out what is going
> wrong here.

What's going wrong is that you are passing the HTML page content as the
second argument to re.compile(), instead of an integer value
representing a combination of flags (cf. the doc for the re module).
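
A short sketch of the difference, in Python 3 syntax (the one-line `html` string is a made-up stand-in for the page content, and the choice of `re.IGNORECASE | re.DOTALL` is only there to show how flags combine). Recent Python versions should still raise a TypeError for the wrong call, although the exact wording of the message may differ from the 2.4 one quoted above:

```python
import re

html = '<A HREF="cetaceaPage01.html">01</A>'

# Wrong: the second positional argument is `flags`, not the text to search.
error = None
try:
    re.compile(r"cetaceaPage..\.html", html)
except TypeError as exc:
    error = exc
    print("TypeError:", exc)

# Right: flags are integer constants from the re module, combined with |.
pattern = re.compile(r"cetaceapage..\.html", re.IGNORECASE | re.DOTALL)
matches = pattern.findall(html)  # the text goes to findall(), not compile()
print(matches)
```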

> Mostly because I am a bit confused by the error message as
> it points to a file (presumably part of re)

It is.

The last part of the traceback gives the file and line where the
exception was raised, plus the exception's message. But before that
comes the whole call stack, including the line where you called
re.compile() with the wrong arguments. Exception tracebacks are usually
really useful once you know how to read them.
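
To see this for yourself, here is a rough sketch in Python 3 syntax that reproduces the mistake and lists every frame of the traceback. `get_pic_links` is a hypothetical stand-in for the function in the question, and the file names and line numbers printed will depend on your installation:

```python
import re
import traceback


def get_pic_links(html):
    # Same mistake as in the question: html is passed where flags belong.
    return re.compile(r"cetaceaPage..\.html", html)


try:
    get_pic_links("<html></html>")
except TypeError as exc:
    frames = traceback.extract_tb(exc.__traceback__)
    for frame in frames:  # oldest call first, raise site last
        print("%s:%d in %s" % (frame.filename, frame.lineno, frame.name))
```

The last line printed is inside the re machinery, as in the error message above. The line to fix is the earlier one, in get_pic_links.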

HTH
