Something weird about re.finditer()

Discussion in 'Python' started by Gilles Ganault, Apr 15, 2009.

  1. Hello

    I stumbled upon something funny while downloading web pages and
    trying to extract one or more blocks from a page: Even though Python
    seems to return at least one block, it doesn't actually enter the for
    loop:

    ======
    re_block = re.compile('before (.+?) after',re.I|re.S|re.M)

    #Here, get web page and put it into "response"

    blocks = None
    blocks = re_block.finditer(response)
    if blocks == None:
        print "No block found"
    else:
        print "Before blocks"
        for block in blocks:
            #Never displayed!
            print "In blocks"
    ======

    Since "blocks" is no longer set to None after calling finditer()...
    but doesn't contain a single block... what does it contain then?

    Thank you for any tip.
     
    Gilles Ganault, Apr 15, 2009
    #1

  2. Peter Otten

    Gilles Ganault wrote:

    >         I stumbled upon something funny while downloading web pages and
    > trying to extract one or more blocks from a page: Even though Python
    > seems to return at least one block, it doesn't actually enter the for
    > loop:
    >
    > ======
    > re_block = re.compile('before (.+?) after',re.I|re.S|re.M)
    >
    > #Here, get web page and put it into "response"
    >
    > blocks = None
    > blocks = re_block.finditer(response)
    > if blocks == None:
    >         print "No block found"
    > else:
    >         print "Before blocks"
    >         for block in blocks:
    >                 #Never displayed!
    >                 print "In blocks"
    > ======
    >
    > Since "blocks" is no longer set to None after calling finditer()...
    > but doesn't contain a single block... what does it contain then?


    This is by design. When there are no matches re.finditer() returns an empty
    iterator, not None.

    Change your code to something like

    has_matches = False
    for match in re_block.finditer(response):
        if not has_matches:
            has_matches = True
            print "before blocks"
        print "in blocks"
    if not has_matches:
        print "no block found"

    or

    match = None
    for match in re_block.finditer(response):
        print "in blocks"
    if match is None:
        print "no block found"
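
    For readers on Python 3, where print is a function, the same idea can
    be sketched by peeking at the first match with next() and a default.
    This is only an illustration, not part of the original reply:
    describe_blocks is a made-up helper name, and it returns the messages
    as a list instead of printing them so the result is easy to check.

```python
import itertools
import re

re_block = re.compile('before (.+?) after', re.I | re.S | re.M)


def describe_blocks(response):
    """Return the messages the loop would print (hypothetical helper)."""
    matches = re_block.finditer(response)
    # next() with a default returns None only when the iterator is empty;
    # a real match object is never None.
    first = next(matches, None)
    if first is None:
        return ["no block found"]
    messages = ["before blocks"]
    # Put the match we already consumed back in front of the rest.
    for match in itertools.chain([first], matches):
        messages.append("in blocks: " + match.group(1))
    return messages


print(describe_blocks("nothing to see here"))
print(describe_blocks("before A after, BEFORE B AFTER"))
```

    The peek costs one extra call but keeps the "no block" test and the
    loop separate, which some people find easier to read than a flag.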

    Peter
     
    Peter Otten, Apr 15, 2009
    #2

  3. John Machin

    On Apr 15, 6:46 pm, Gilles Ganault <> wrote:
    > Hello
    >
    >         I stumbled upon something funny while downloading web pages and
    > trying to extract one or more blocks from a page: Even though Python
    > seems to return at least one block, it doesn't actually enter the for
    > loop:
    >
    > ======
    > re_block = re.compile('before (.+?) after',re.I|re.S|re.M)
    >
    > #Here, get web page and put it into "response"
    >
    > blocks = None
    > blocks = re_block.finditer(response)
    > if blocks == None:
    >         print "No block found"
    > else:
    >         print "Before blocks"
    >         for block in blocks:
    >                 #Never displayed!
    >                 print "In blocks"
    > ======
    >
    > Since "blocks" is no longer set to None after calling finditer()...
    > but doesn't contain a single block... what does it contain then?
    >
    > Thank you for any tip.


    Tip 0: contemplate what type you could infer from the name findITER
    Tip 1: Read the manual to see what type is returned by re.finditer
    (or do import re; help(re.finditer))
    Tip 2: Append
    , type(blocks)
    to the relevant print statements in your above code, and inspect the
    output.

    Metatip 0: Following the tips can be done rapidly without any need for
    an internet connection.

    Meta**2tip 0: The Tips and the Metatip can be applied to many things,
    not just re.finditer.

    HTH,
    John
     
    John Machin, Apr 15, 2009
    #3
  4. On Apr 15, 4:46 pm, Gilles Ganault <> wrote:
    > re_block = re.compile('before (.+?) after',re.I|re.S|re.M)
    >
    > #Here, get web page and put it into "response"
    >
    > blocks = None
    > blocks = re_block.finditer(response)
    > if blocks == None:
    >         print "No block found"
    > else:
    >         print "Before blocks"
    >         for block in blocks:
    >                 #Never displayed!
    >                 print "In blocks"
    > ======
    >
    > Since "blocks" is no longer set to None after calling finditer()...
    > but doesn't contain a single block... what does it contain then?
    >
    > Thank you for any tip.


    Because finditer returns an iterator (not None), which in your case
    just happens to be empty:

    >>> import re
    >>> patt = re.compile('foo')
    >>> gen = patt.finditer('bar')
    >>> gen is None
    False
    >>> gen == None
    False
    >>> gen
    <callable-iterator object at 0x00E55B70>
    >>> list(gen)
    []
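
    Two related properties are worth seeing in a script as well: the
    iterator is truthy even when it has nothing to yield, so "if blocks:"
    would not help either, and it can only be walked once. A small
    Python 3 sketch (the variable names are just for illustration):

```python
import re

patt = re.compile('foo')

# No matches: the iterator exists, is truthy, and simply yields nothing.
empty = patt.finditer('bar')
print(empty is None)   # False
print(bool(empty))     # True, even though it has nothing to yield
print(list(empty))     # []

# With matches: the first pass consumes them, a second pass finds none.
gen = patt.finditer('foo foo')
first_pass = [m.group(0) for m in gen]
second_pass = [m.group(0) for m in gen]
print(first_pass)      # ['foo', 'foo']
print(second_pass)     # []
```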
     
    Justin Ezequiel, Apr 15, 2009
    #4
  5. On Wed, 15 Apr 2009 10:46:28 +0200, Gilles Ganault wrote:

    > Since "blocks" is no longer set to None after calling finditer()... but
    > doesn't contain a single block... what does it contain then?


    It probably took you twenty times more time and effort to ask the
    question than it would have to look for yourself.


    >>> import re
    >>> re_block = re.compile('before (.+?) after',re.I|re.S|re.M)
    >>> x = re_block.finditer("nothing to see here")
    >>> x is None
    False
    >>> x
    <callable-iterator object at 0xb7f5ecec>
    >>> list(x)
    []

    BTW, testing for None with == is not recommended, because one day
    somebody might pass your function some strange object that compares equal
    to None. Although it wouldn't have solved your problem, the recommended
    way to test if an object is None is with the `is` operator.
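
    To make the "strange object" concrete, here is a minimal Python 3
    sketch. EqualsAnything and the two helper functions are invented for
    the illustration; the point is only that == can be overridden by a
    class while "is" cannot.

```python
class EqualsAnything:
    """Hypothetical 'strange object' that claims equality with everything."""

    def __eq__(self, other):
        return True

    def __hash__(self):
        return 0


def is_missing_by_equality(value):
    # Deliberately the discouraged form: dispatches to value.__eq__.
    return value == None  # noqa: E711


def is_missing_by_identity(value):
    # Identity cannot be customised by the object being tested.
    return value is None


odd = EqualsAnything()
print(is_missing_by_equality(odd))   # True: the equality test is fooled
print(is_missing_by_identity(odd))   # False
print(is_missing_by_identity(None))  # True
```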



    --
    Steven
     
    Steven D'Aprano, Apr 15, 2009
    #5
  6. In message <>, Steven
    D'Aprano wrote:

    > BTW, testing for None with == is not recommended, because one day
    > somebody might pass your function some strange object that compares equal
    > to None.


    Presumably if it compares equal to None, that is by design, precisely so it
    would work in this way.
     
    Lawrence D'Oliveiro, Apr 18, 2009
    #6
  7. On Sat, 18 Apr 2009 12:37:09 +1200, Lawrence D'Oliveiro wrote:

    > In message <>,
    > Steven D'Aprano wrote:
    >
    >> BTW, testing for None with == is not recommended, because one day
    >> somebody might pass your function some strange object that compares
    >> equal to None.

    >
    > Presumably if it compares equal to None, that is by design, precisely so
    > it would work in this way.


    In context, no. We're not talking about somebody creating an object which
    is equivalent to None when treated as a value, but using None as a
    sentinel. Sentinels are markers, and it is important that nothing else
    can be mistaken for that marker or breakage will occur.

    Of course, if the caller knows how the sentinel is used, then he might
    choose to duplicate that usage but pass some other object. But that would
    be stupid and should be discouraged. I mean, what would be the point? I
    can think of use-cases for creating something that returns equal to None
    -- the Null object pattern comes to mind. But what would be the point of
    creating an object that was not None but would fool a function into
    treating it as the same sentinel as None?
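
    One common way to get a marker that nothing else can be mistaken for
    is a private object() tested with "is". The sketch below is an
    illustration under that assumption, in Python 3; _MISSING and lookup
    are made-up names, not anything from the code discussed above.

```python
_MISSING = object()  # private marker; no other object is identical to it


def lookup(mapping, key, default=_MISSING):
    """Return mapping[key]; fall back to default only if one was passed."""
    if key in mapping:
        return mapping[key]
    if default is _MISSING:
        # The caller gave no default, so a missing key is an error.
        raise KeyError(key)
    return default  # may legitimately be None


settings = {"timeout": None}
print(lookup(settings, "timeout"))        # None (a stored value)
print(lookup(settings, "retries", None))  # None (an explicit default)
```

    Because the test is by identity, a caller can pass None, or an object
    that compares equal to anything, without being confused with "no
    default given".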



    --
    Steven
     
    Steven D'Aprano, Apr 18, 2009
    #7
  8. Aaron Brady

    On Apr 17, 9:37 pm, Steven D'Aprano <st...@REMOVE-THIS-
    cybersource.com.au> wrote:
    > On Sat, 18 Apr 2009 12:37:09 +1200, Lawrence D'Oliveiro wrote:
    > > In message <>,
    > > Steven D'Aprano wrote:

    >
    > >> BTW, testing for None with == is not recommended, because one day
    > >> somebody might pass your function some strange object that compares
    > >> equal to None.

    >
    > > Presumably if it compares equal to None, that is by design, precisely so
    > > it would work in this way.

    >
    > In context, no. We're not talking about somebody creating an object which
    > is equivalent to None when treated as a value, but using None as a
    > sentinel. Sentinels are markers, and it is important that nothing else
    > can be mistaken for that marker or breakage will occur.
    >
    > Of course, if the caller knows how the sentinel is used, then he might
    > choose to duplicate that usage but pass some other object. But that would
    > be stupid and should be discouraged. I mean, what would be the point? I
    > can think of use-cases for creating something that returns equal to None
    > -- the Null object pattern comes to mind. But what would be the point of
    > creating an object that was not None but would fool a function into
    > treating it as the same sentinel as None?


    In that case, they could use separate sentinels, that are instances of
    a class or classes that have defined behavior for comparing to each
    other.

    It might get as bad as setting a flag on the class or sentinel, though
    you'd have to be careful about concurrency and especially nested
    calls.

    You'd have to rely on the user function to use equality instead of
    identity testing, since 'sentinel is None' won't return true no matter
    what you do to it.
     
    Aaron Brady, Apr 18, 2009
    #8