rejecting newlines with re.match

Discussion in 'Python' started by r0g, Nov 27, 2008.

  1. r0g

    r0g Guest

    Hi,

    I want to use a regex to match a string "poo" but not "poo\n" or
    "poo"+chr(13) or "poo"+chr(10) or "poo"+chr(10)+chr(13)

    According to http://docs.python.org/library/re.html

    '.' (Dot.) In the default mode, this matches any character except a
    newline. If the DOTALL flag has been specified, this matches any
    character including a newline.


    So I tried
    a = re.compile(r'^.{1,50}$')
    print a.match("poo\n")
    <_sre.SRE_Match object at 0xb7767988>

    :-(

    The library says...

    '$' Matches the end of the string or just before the newline at the end
    of the string, and in MULTILINE mode also matches before a newline. foo
    matches both ‘foo’ and ‘foobar’, while the regular expression foo$
    matches only ‘foo’. More interestingly, searching for foo.$ in
    'foo1\nfoo2\n' matches ‘foo2’ normally, but ‘foo1’ in MULTILINE mode;
    searching for a single $ in 'foo\n' will find two (empty) matches: one
    just before the newline, and one at the end of the string.
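The documented behaviour of '$' quoted above can be checked directly. The following is a minimal sketch, written in Python 3 syntax (an assumption; the thread itself uses Python 2 print statements), and the variable names are illustrative only.

```python
import re

pattern = re.compile(r'^.{1,50}$')

# '$' matches at the very end of the string...
m_plain = pattern.match("poo")
# ...and also just before a single trailing newline.
m_newline = pattern.match("poo\n")

print(m_plain.group())          # poo
print(repr(m_newline.group()))  # 'poo' (the newline is not part of the match)
print(m_newline.end())          # 3

# The docs example: a bare '$' finds two empty matches in 'foo\n',
# one just before the newline and one at the end of the string.
positions = [m.start() for m in re.finditer(r'$', 'foo\n')]
print(positions)                # [3, 4]
```

So the match succeeds on "poo\n", but the matched text stops before the newline.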


    So that explains it, but what am I to do then? I assume it isn't
    matching the newline itself, as the returned string does not contain
    one. Is there a switch that can stop $ matching 'just before the
    newline at the end of the string', or is there another character class
    I could use here? Any ideas greatly appreciated!

    Thanks,


    Roger.
     
    r0g, Nov 27, 2008

  2. MRAB

    MRAB Guest

    r0g wrote:
    > Hi,
    >
    > I want to use a regex to match a string "poo" but not "poo\n" or
    > "poo"+chr(13) or "poo"+chr(10) or "poo"+chr(10)+chr(13)
    >

    "\n" is the same as chr(10).

    > According to http://docs.python.org/library/re.html
    >
    > '.' (Dot.) In the default mode, this matches any character except a
    > newline. If the DOTALL flag has been specified, this matches any
    > character including a newline.
    >
    >
    > So I tried
    > a = re.compile(r'^.{1,50}$')
    > print a.match("poo\n")
    > <_sre.SRE_Match object at 0xb7767988>
    >
    > :-(
    >
    > The library says...
    >
    > '$' Matches the end of the string or just before the newline at the end
    > of the string, and in MULTILINE mode also matches before a newline. foo
    > matches both ‘foo’ and ‘foobar’, while the regular expression foo$
    > matches only ‘foo’. More interestingly, searching for foo.$ in
    > 'foo1\nfoo2\n' matches ‘foo2’ normally, but ‘foo1’ in MULTILINE mode;
    > searching for a single $ in 'foo\n' will find two (empty) matches: one
    > just before the newline, and one at the end of the string.
    >
    >
    > So that explains it, but what am I to do then? I assume it isn't
    > matching the newline itself, as the returned string does not contain
    > one. Is there a switch that can stop $ matching 'just before the
    > newline at the end of the string', or is there another character class
    > I could use here? Any ideas greatly appreciated!
    >

    There is also "\Z" which matches only at the end of the string:

    >>> a = re.compile(r'^.{1,50}\Z')
    >>> print a.match("poo\n")
    None
    >>>
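One caveat about the "\Z" pattern: in the default mode '.' excludes only "\n", so a trailing carriage return (chr(13)) still matches. The sketch below, in Python 3 syntax, shows this and one possible way to reject both characters with an explicit character class. The names `strict` and `no_eol` are illustrative and do not come from the thread.

```python
import re

# \Z anchors at the true end of the string, unlike '$'.
strict = re.compile(r'^.{1,50}\Z')
print(strict.match("poo\n"))                # None
print(strict.match("poo") is not None)      # True

# Caveat: '.' only excludes '\n', so a trailing '\r' still matches.
cr_matches = strict.match("poo\r") is not None
print(cr_matches)                           # True

# Excluding CR and LF explicitly rejects every case from the question.
no_eol = re.compile(r'^[^\r\n]{1,50}\Z')
rejected = ["poo\n", "poo\r", "poo\r\n", "poo\n\r"]
results = [no_eol.match(s) for s in rejected]
print(results)                              # [None, None, None, None]
print(no_eol.match("poo") is not None)      # True
```

Because `match` already anchors at the start of the string, the leading '^' is redundant here. It is kept only to mirror the pattern in the posts.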


    I don't know what your use case is, but do you actually need to use a
    regex? Sometimes it's simpler and faster if you don't.
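As a rough illustration of the non-regex route, the check could be written with plain string operations. This is only a sketch in Python 3 syntax. The function name `is_single_line` and the `max_len` parameter are hypothetical, and the 1 to 50 length limit is taken from the `.{1,50}` pattern in the question.

```python
def is_single_line(text, max_len=50):
    """Return True if text has 1..max_len characters and no CR or LF."""
    if not 1 <= len(text) <= max_len:
        return False
    # Reject both line-ending characters mentioned in the question.
    return "\n" not in text and "\r" not in text


print(is_single_line("poo"))      # True
print(is_single_line("poo\n"))    # False
print(is_single_line("poo\r"))    # False
print(is_single_line("poo\r\n"))  # False
```

Unlike the regex versions, this also rejects a newline or carriage return anywhere in the string, not just at the end, which may or may not be what the use case needs.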
     
    MRAB, Nov 27, 2008
