[regex] How to check for non-space character?

Discussion in 'Python' started by Gilles Ganault, Mar 21, 2009.

  1. Hello

    Some of the addresses are missing a space between the street name and
    the ZIP code, e.g. "123 Main Street01159 Someville"

    The following regex doesn't seem to work:

    #Check for any non-space before a five-digit number
    re_bad_address = re.compile('([^\s].)(\d{5}) ',re.I | re.S | re.M)

    I also tried ([^ ].), to no avail.

    What is the right way to tell the Python re module to check for any
    non-space character?

    Thank you.
     
    Gilles Ganault, Mar 21, 2009
    #1

  2. Tim Chase

    Gilles Ganault wrote:
    > Hello
    >
    > Some of the addresses are missing a space between the street name and
    > the ZIP code, e.g. "123 Main Street01159 Someville"
    >
    > The following regex doesn't seem to work:
    >
    > #Check for any non-space before a five-digit number
    > re_bad_address = re.compile('([^\s].)(\d{5}) ',re.I | re.S | re.M)

    -------------------------------------^

    >
    > I also tried ([^ ].), to no avail.

    --------------------^

    > What is the right way to tell the Python re module to check for any
    > non-space character?


    It looks like these periods are what's throwing you off. Just
    remove them. For a third syntax:

    (\S)(\d{5})

    The \S (capital, instead of "\s") means "any non-whitespace character".
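    A minimal sketch of that pattern in use, compiled as a raw string. The helper names find_missing_space and add_missing_space are hypothetical, and the repair step (re.sub putting the space back) goes a step beyond what was asked:

```python
import re

# \S matches one non-whitespace character; \d{5} matches exactly five digits.
# A match means "five digits glued directly onto the previous character".
RE_BAD_ADDRESS = re.compile(r'(\S)(\d{5})')

def find_missing_space(address):
    """Return the (char, zip) groups of the first glued ZIP code, or None."""
    m = RE_BAD_ADDRESS.search(address)
    return m.groups() if m else None

def add_missing_space(address):
    """Insert a space between the glued character and the five digits."""
    return RE_BAD_ADDRESS.sub(r'\1 \2', address)

for addr in ["123 Main Street01159 Someville",
             "123 Main Street 01159 Someville"]:
    print(repr(addr), find_missing_space(addr), repr(add_missing_space(addr)))
```

    Because \S also matches a digit, a run of six or more digits would be flagged as well; if that matters, [^\s\d] keeps digits out of the preceding character.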

    -tkc
     
    Tim Chase, Mar 21, 2009
    #2

  3. John Machin

    Gilles Ganault <nospam <at> nospam.com> writes:

    >
    > Hello
    >
    > Some of the addresses are missing a space between the street name and
    > the ZIP code, e.g. "123 Main Street01159 Someville"


    This problem appears very similar to the one you had in a previous episode,
    where you were deleting <br /> in address contexts where it obviously should
    have been treated as a separator, as significant as a comma or even (would you
    believe) a line break.

    The example botched output was "... St Johns WoodLondon ..." IIRC.

    Prevention is better than cure; try to find out if your earlier code is causing
    this problem.
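    As a sketch of that prevention, assuming the earlier code removed the tags with a regex substitution (that code is not shown in this thread, and br_to_separator is a hypothetical name), substituting a separator instead of an empty string keeps the address parts apart:

```python
import re

# Matches <br>, <br/> and <br /> in any letter case.
BR_TAG = re.compile(r'<br\s*/?>', re.I)

def br_to_separator(html, sep=", "):
    """Replace each <br> variant with a separator instead of deleting it."""
    return BR_TAG.sub(sep, html)

print(BR_TAG.sub("", "St Johns Wood<br />London"))   # St Johns WoodLondon
print(br_to_separator("St Johns Wood<br />London"))  # St Johns Wood, London
```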

    >
    > The following regex doesn't seem to work:


    Regexes do work. If the outcome is not what you expected, it is your
    expectation-to-regex translator that is not working.

    What does it do? Does it match zero addresses, all addresses, many addresses
    that contain a 5-digit number /followed/ by a space, something else? Could you
    use the answer to that question to narrow in on the problem with your regex?

    >
    > #Check for any non-space before a five-digit number
    > re_bad_address = re.compile('([^\s].)(\d{5}) ',re.I | re.S | re.M)


    The comment is quite incorrect. After removing the fog of useless parentheses,
    the regex says:
    [^\s] -- one non-whitespace character (better written as \S)
    . -- any character (more or less, see later) (why?)
    \d{5} -- 5 digits
    ' ' -- a space (why?)
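    To answer the "what does it do?" question concretely, here is a quick check of the original pattern (as a raw string, flags kept) against a broken and a correct address. The sample strings are assumptions based on the example in the question:

```python
import re

# The pattern from the question, as a raw string so "\s" and "\d"
# reach the regex engine unchanged.
original = re.compile(r'([^\s].)(\d{5}) ', re.I | re.S | re.M)

bad = "123 Main Street01159 Someville"    # space missing
good = "123 Main Street 01159 Someville"  # space present

# Both addresses match; in the good one, "." consumes the space.
print(original.search(bad).groups())   # ('et', '01159')
print(original.search(good).groups())  # ('t ', '01159')
```

    The pattern cannot tell the two apart: [^\s] tests the character two positions before the digits, and the . is free to match the very space that should be there.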

    Then there's a hail of flags:
    re.I (ignore case) -- irrelevant
    re.S (DOTALL) -- makes your pointless . match any character (instead of any
    character except newline). Do you have any newlines in your addresses?
    re.M (MULTILINE) -- I'm 99% sure you don't need this either.
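    A small illustration of what these flags change, using a made-up address with a newline where the space should be (an assumption; real addresses may never contain one):

```python
import re

pattern = r'\S.\d{5} '
# Hypothetical input: a line break sits where the space should be.
text = "Street\n01159 Someville"

# Default: "." matches anything except a newline, so there is no match.
default = re.search(pattern, text)
# re.S (DOTALL): "." now matches the newline too.
dotall = re.search(pattern, text, re.S)
# re.I: \S, \d and a literal space have no case, so nothing changes.
ignorecase = re.search(pattern, text, re.I)
# re.M: only changes how ^ and $ behave; the pattern uses neither.
multiline = re.search(pattern, text, re.M)

print(default)                # None
print(repr(dotall.group()))   # 't\n01159 '
print(ignorecase, multiline)  # None None
```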

    >
    > I also tried ([^ ].), to no avail.


    If not-whitespace doesn't match, changing it to not-space doesn't help.

    >
    > What is the right way to tell the Python re module to check for any
    > non-space character?


    r'[^ ]' -- but that's NOT the question you should be asking.
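    For the record, the difference between the two classes in a quick check (the sample characters are arbitrary):

```python
import re

# Compare the two classes on a few single characters:
# [^ ] rejects only the space; \S rejects every whitespace character.
samples = [" ", "\t", "\n", "x"]
results = {
    ch: (bool(re.fullmatch(r'[^ ]', ch)), bool(re.fullmatch(r'\S', ch)))
    for ch in samples
}

for ch, (not_space, not_whitespace) in results.items():
    print(repr(ch), "[^ ]:", not_space, "\\S:", not_whitespace)
```

    [^ ] excludes only the space itself, so a tab or newline still counts as "non-space"; \S excludes all whitespace.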

    HTH,
    John
     
    John Machin, Mar 21, 2009
    #3
  4. On Sat, 21 Mar 2009 08:53:10 -0500, Tim Chase
    <> wrote:
    >It looks like it's these periods that are throwing you off. Just
    >remove them. For a 3rd syntax:
    >
    > (\S)(\d{5})
    >
    >the \S (capital, instead of "\s") is "any NON-white-space character"


    Thanks guys for the tips.
     
    Gilles Ganault, Mar 22, 2009
    #4