problem with my regex?

Discussion in 'Python' started by Brian, May 22, 2006.

  1. Brian

    Brian Guest

    I have a simple script below that is causing me some problems and I am
    having a hard time tracking them down. Here is the code:

    import urllib
    import re

    def getPicLinks():
        found = []
        try:
            page = urllib.urlopen("http://continuouswave.com/whaler/cetacea/")
        except:
            print "ERROR RREADING PAGE."
            sys.exit()
        page1 = page.read()
        cetLinks = re.compile("cetaceaPage..\.html", page1)
        for line in page1:
            found.append(cetLinks.findall(line))
        print found

    This is the error message:
    "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre_parse.py",
    line 396, in _parse
    if state.flags & SRE_FLAG_VERBOSE:
    TypeError: unsupported operand type(s) for &: 'str' and 'int'

    I am trying to extract the links on a web page that have a similar
    pattern. Here is an example of the html source:

    <HR>
    <P><SMALL><A HREF="photoLog.html">PHOTO-LOG</A><br>
    <A HREF="guide.html">How-To-Submit</A><BR><A
    HREF="cetaceaPage01.html">01</A>&nbsp;|&nbsp;<A
    HREF="cetaceaPage02.html">02</A>&nbsp;|&nbsp;<A
    HREF="cetaceaPage03.html">03</A>&nbsp;|&nbsp;<A
    HREF="cetaceaPage04.html">04</A>&nbsp;|&nbsp;<A
    HREF="cetaceaPage05.html">05</A>&nbsp;|&nbsp;<A
    HREF="cetaceaPage06.html">06</A>&nbsp;|&nbsp;<A
    HREF="cetaceaPage07.html">07</A>&nbsp;|&nbsp;<A
    HREF="cetaceaPage08.html">08</A>&nbsp;|&nbsp;<A
    HREF="cetaceaPage09.html">09</A>&nbsp;|&nbsp;<A
    HREF="cetaceaPage10.html">10</A>
    <BR><A>

    My problem is that I can't figure out what is going wrong here,
    mostly because I am a bit confused by the error message: it points
    to a file (presumably part of re) that I am unfamiliar with, and I
    am a bit new to Python.

    Any help is greatly appreciated, as is your patience.

    Brian
    Brian, May 22, 2006

  2. Brian

    Brian Guest

    I sincerely appreciate your reply and the time you took to explain it.

    Thank you,
    Brian
    Brian, May 22, 2006

  3. Bruno Desthuilliers

    Brian wrote:
    > I have a simple script below that is causing me some problems and I am
    > having a hard time tracking them down. Here is the code:
    >
    > import urllib
    > import re
    >
    > def getPicLinks():
    >     found = []
    >     try:
    >         page = urllib.urlopen("http://continuouswave.com/whaler/cetacea/")
    >     except:

    Do everyone a favor: don't use a bare except clause.

    >         print "ERROR RREADING PAGE."
    >         sys.exit()


    stdout is for normal program output. Error messages should go to
    stderr. And FWIW, your exception handling here is worse than useless.
    You'd better let the exception propagate: at worst, it will also exit
    the program, but with the right exit status for the system and a
    meaningful traceback.
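
    If you do want to handle the failure yourself, here is a minimal
    sketch of the advice above. It is written for Python 3 (the thread
    uses Python 2.4, where the module is plain urllib), and the opener
    parameter is a hypothetical addition so the function can be tried
    without network access. It catches one named exception, reports on
    stderr, and exits with a non-zero status.

```python
import sys
import urllib.error
import urllib.request


def fetch_page(url, opener=urllib.request.urlopen):
    """Return the body of url as bytes.

    On a URLError, write a message to stderr and exit with status 1.
    `opener` is a hypothetical hook for testing; it defaults to urlopen.
    """
    try:
        response = opener(url)
    except urllib.error.URLError as exc:  # a named exception, not a bare except
        sys.stderr.write("error reading %s: %s\n" % (url, exc))
        sys.exit(1)  # non-zero status tells the caller something failed
    try:
        return response.read()
    finally:
        response.close()  # release the connection even if read() fails
```

    Simply dropping the try/except and letting the exception propagate
    remains the shorter option, and it keeps the full traceback.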

    >     page1 = page.read()
    >     cetLinks = re.compile("cetaceaPage..\.html", page1)


    Are you sure you've carefully read the doc for re.compile() ?-)

    You want something like this (NB: regexp not tested):

    html = page.read()
    page.close() # don't forget to free resources
    cetLinks = re.compile(r"cetaceaPage[0-9]{2}\.html")
    found = cetLinks.findall(html)
    print "\n".join(found)
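
    To try the pattern without hitting the site, here is a small sketch
    that runs it against a few lines of the HTML quoted in the question.
    It assumes Python 3 (print is a function there); SAMPLE_HTML and
    find_cet_links are names made up for the illustration.

```python
import re

# A few lines copied from the HTML sample in the original post.
SAMPLE_HTML = """<A HREF="guide.html">How-To-Submit</A><BR><A
HREF="cetaceaPage01.html">01</A>&nbsp;|&nbsp;<A
HREF="cetaceaPage02.html">02</A>&nbsp;|&nbsp;<A
HREF="cetaceaPage10.html">10</A>"""

# Only the pattern goes to re.compile(); the text goes to findall().
CET_LINKS = re.compile(r"cetaceaPage[0-9]{2}\.html")


def find_cet_links(html):
    """Return every cetaceaPageNN.html file name found in html."""
    return CET_LINKS.findall(html)


print("\n".join(find_cet_links(SAMPLE_HTML)))

# Iterating over a str yields single characters, so the original
# "for line in page1" loop could never see a whole file name.
per_char_hits = [hit for ch in SAMPLE_HTML for hit in CET_LINKS.findall(ch)]
print(per_char_hits)  # prints []
```

    Calling findall() once on the whole page text is enough; there is no
    need to loop over the text at all.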

    > This is the error message:
    > "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre_parse.py",
    > line 396, in _parse
    > if state.flags & SRE_FLAG_VERBOSE:
    > TypeError: unsupported operand type(s) for &: 'str' and 'int'


    This is not the *full* traceback.

    (snip)

    > My problem is that I can't seem to be able to figure out what is going
    > wrong here.


    What's going wrong is that you are passing the HTML page content as the
    second argument to re.compile(), instead of an integer value
    representing a combination of flags (cf. the doc for the re module).
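
    A short sketch of the difference, assuming Python 3. The pattern is
    the one suggested earlier in this reply, and the input strings are
    made up for the illustration. The second call repeats the mistake on
    purpose to show that it still ends in a TypeError.

```python
import re

PATTERN = r"cetaceaPage[0-9]{2}\.html"

# Correct: the second argument is a flag constant from the re module.
# Several flags can be combined with |, e.g. re.IGNORECASE | re.MULTILINE.
links = re.compile(PATTERN, re.IGNORECASE)
print(links.findall('<a href="CETACEAPAGE03.HTML">03</a>'))

# Incorrect: a str in the flags position fails inside the re internals,
# which is why the traceback ends in a file you did not write.
try:
    re.compile(PATTERN, "<html>page text</html>")
except TypeError as exc:
    print("TypeError:", exc)
```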

    > Mostly because I am a bit confused by the error message as
    > it points to a file (presumably part of re)


    It is.

    The last parts of the traceback are the file and line where the
    exception was raised and the exception's message. But before that, you
    had the whole call stack, including the line where you called
    re.compile() with the wrong arguments. Exception tracebacks are usually
    really useful once you know how to read them.
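
    As a rough illustration of reading the stack from the top, the sketch
    below (Python 3, standard traceback module) repeats the faulty call
    inside a made-up function, build_pattern, and lists every frame of
    the resulting traceback. The first frames are your own code; the last
    ones belong to the re machinery.

```python
import re
import traceback


def build_pattern(page_text):
    # Deliberately wrong: page_text lands in the flags position.
    return re.compile(r"cetaceaPage..\.html", page_text)


def frames_for_failure():
    """Trigger the error and return the traceback frames, outermost first."""
    try:
        build_pattern("<html></html>")
    except TypeError as exc:
        return traceback.extract_tb(exc.__traceback__)
    return []


for frame in frames_for_failure():
    # Each entry carries the file, the line number, and the function name.
    print(frame.filename, frame.lineno, frame.name)
```

    The frame named build_pattern is the one to look at: it points at the
    call with the wrong arguments, even though the exception itself is
    raised further down.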

    HTH
    Bruno Desthuilliers, May 23, 2006