re.match and non-alphanumeric characters

Discussion in 'Python' started by The Web President, Nov 16, 2008.

  1. Dear all,

    this is really driving me nuts and any help would be extremely
    appreciated.

    I have a string that contains some numeric data. I want to isolate
    these data using re.match, as follows.

    bogus = "IFC(35m)"
    data = re.match(r'(\d+)',bogus)
    print data.group(1)

    I would expect to have "35" printed out to screen, but instead I get
    an error that the regular expression did not match:

    Traceback (most recent call last):
      File "C:\Documents and Settings\Mattia\Desktop\Neeltje\read.py", line 20, in <module>
        print data.group(1)
    AttributeError: 'NoneType' object has no attribute 'group'

    Note that the same holds if I look for "35" directly instead of
    "\d+". If instead I look for "IFC" it works fine. That is, apparently
    re.match will match only up to the first non-alphanumeric character
    and ignore anything after a "(", "_", "[" and god knows what else.

    I am using Python 2.6 (r26:66721, latest stable version). Am I missing
    something very big and very important?
     
    The Web President, Nov 16, 2008
    #1

  2. r Guest

    On Nov 16, 10:33 am, The Web President <> wrote:
    > bogus = "IFC(35m)"
    > data = re.match(r'(\d+)',bogus)
    > print data.group(1)


    Try re.search or re.findall; re.match only matches at the beginning
    of a string. I almost never use it.
    >>> re.search(r'(\d+)', bogus).group()
    '35'
    >>> re.search(r'(\d+)', bogus).span()
    (4, 6)
     
    r, Nov 16, 2008
    #2
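A minimal sketch of the difference r describes, written in Python 3 syntax rather than the thread's Python 2.6: re.match() only tries position 0 of "IFC(35m)", while re.search() scans forward and re.findall() collects every non-overlapping match. The second string passed to findall is a made-up example.

```python
import re

bogus = "IFC(35m)"

# re.match only tries to match at position 0, where 'I' is not a digit.
print(re.match(r"(\d+)", bogus))          # None

# re.search scans the string and returns the first match it finds.
m = re.search(r"(\d+)", bogus)
print(m.group(1), m.span())               # 35 (4, 6)

# re.findall returns every non-overlapping match as a list of strings.
print(re.findall(r"\d+", "IFC(35m) and 12kg"))  # ['35', '12']
```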

  3. MRAB Guest

    On Nov 16, 4:33 pm, The Web President <> wrote:
    > bogus = "IFC(35m)"
    > data = re.match(r'(\d+)',bogus)
    > print data.group(1)


    re.match() anchors the match at the start of the string. What you need
    is re.search(). It's all in the documentation! :)
     
    MRAB, Nov 16, 2008
    #3
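Because re.match() is anchored at the start, the call in the question returns None, and calling .group() on None is what raises the AttributeError. A common pattern, sketched here in Python 3 syntax with a hypothetical helper named first_number, is to use re.search() and test the result before using it:

```python
import re


def first_number(text):
    """Hypothetical helper: return the first run of digits in text, or None."""
    m = re.search(r"\d+", text)
    # search (like match) returns None when nothing matches,
    # so check before calling .group() to avoid the AttributeError.
    if m is None:
        return None
    return m.group()


print(first_number("IFC(35m)"))   # 35
print(first_number("IFC(m)"))     # None
```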
  4. On Sun, 16 Nov 2008 14:33:42 -0200, The Web President
    <> wrote:

    > I have a string that contains some numeric data. I want to isolate
    > these data using re.match, as follows.
    >
    > bogus = "IFC(35m)"
    > data = re.match(r'(\d+)',bogus)
    > print data.group(1)
    >
    > I would expect to have "35" printed out to screen, but instead I get
    > an error that the regular expression did not match:


    http://docs.python.org/library/re.html#matching-vs-searching

    --
    Gabriel Genellina
     
    Gabriel Genellina, Nov 16, 2008
    #4
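The documentation on matching vs searching also notes a related point: without re.MULTILINE, a "^"-anchored re.search() behaves like re.match(), but with re.MULTILINE "^" can also match after each newline, while re.match() still only tries the start of the whole string. A short sketch in Python 3 syntax, using a made-up three-line string:

```python
import re

text = "A\nB\nX"

# Without MULTILINE, '^' only matches at the start of the string,
# so a '^'-anchored search fails here just like match does.
print(re.search(r"^X", text))                 # None
print(re.match(r"X", text))                   # None

# With MULTILINE, '^' also matches right after each newline...
print(re.search(r"^X", text, re.MULTILINE))   # finds the 'X' on the third line
# ...but match still only tries position 0 of the whole string.
print(re.match(r"X", text, re.MULTILINE))     # None
```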
  5. The Web President wrote:

    > I am using Python 2.6 (r26:66721, latest stable version). Am I missing
    > something very big and very important?


    Yep - re.search. Match matches the whole string. You want searching.


    Diez
     
    Diez B. Roggisch, Nov 16, 2008
    #5
  6. John Machin Guest

    On Nov 17, 4:44 am, "Diez B. Roggisch" <> wrote:

    > Match matches the whole string.


    *ONLY* if the pattern ends with "$" or r"\Z"
     
    John Machin, Nov 16, 2008
    #6
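A detail worth knowing about John's two anchors: without re.MULTILINE, "$" matches at the end of the string or just before a newline at the end, while r"\Z" matches only at the very end. A quick check in Python 3 syntax:

```python
import re

# With no end anchor, match() succeeds on a prefix of the string.
print(re.match(r"abc", "abcdef").group())        # abc

# '$' also accepts a single newline at the very end of the string...
print(re.match(r"abc$", "abc\n") is not None)    # True
# ...whereas \Z requires the true end of the string.
print(re.match(r"abc\Z", "abc\n") is not None)   # False
print(re.match(r"abc\Z", "abc") is not None)     # True
```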
  7. John Machin wrote:
    > On Nov 17, 4:44 am, "Diez B. Roggisch" <> wrote:
    >
    >> Match matches the whole string.
    >
    > *ONLY* if the pattern ends with "$" or r"\Z"

    You think so?

    import re

    rex = re.compile("abc.*def")

    if rex.match("abc0123455678def"):
        print "matched"

    Diez
     
    Diez B. Roggisch, Nov 16, 2008
    #7
  8. Steve Holden Guest

    Diez B. Roggisch wrote:
    > John Machin wrote:
    >> On Nov 17, 4:44 am, "Diez B. Roggisch" <> wrote:
    >>
    >>> Match matches the whole string.
    >>
    >> *ONLY* if the pattern ends with "$" or r"\Z"
    >
    >
    > You think so?
    >
    > import re
    >
    > rex = re.compile("abc.*def")
    >
    > if rex.match("abc0123455678def"):
    >     print "matched"
    >

    Your test is inconclusive: necessary, but not sufficient.

    >>> rex = re.compile("abc.*def")
    >>>
    >>> if rex.match("abc0123455678defPLUSEXTRASTUFF"):
    ...     print "Matched"
    ...
    Matched
    >>>


    regards
    Steve
    --
    Steve Holden +1 571 484 6266 +1 800 494 3119
    Holden Web LLC http://www.holdenweb.com/
     
    Steve Holden, Nov 16, 2008
    #8
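Steve's point can be made explicit by inspecting the match object: re.match() reports only the prefix it consumed, so a successful match says nothing about what follows. A minimal sketch in Python 3 syntax, reusing the pattern and string from his session:

```python
import re

rex = re.compile("abc.*def")
s = "abc0123455678defPLUSEXTRASTUFF"

m = rex.match(s)
print(m.group())           # abc0123455678def
# The match ends before the trailing text, so the whole string was not matched.
print(m.end(), len(s))     # 16 30
```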
  9. John Machin Guest

    On Nov 17, 10:19 am, "Diez B. Roggisch" <> wrote:
    > John Machin wrote:
    >
    > > On Nov 17, 4:44 am, "Diez B. Roggisch" <> wrote:
    >
    > >> Match matches the whole string.
    >
    > > *ONLY* if the pattern ends with "$" or r"\Z"
    >
    > You think so?
    >
    > import re
    >
    > rex = re.compile("abc.*def")
    >
    > if rex.match("abc0123455678def"):
    >      print "matched"
    >


    OK, I'll try again:

    The following 3-tuples represent (pattern, string,
    matched_portion_of_string):
    ('abc', 'abc', 'abc')
    ('abc', 'abcdef', 'abc')
    ('abc$', 'abc', 'abc')
    ('abc$', 'abcdef', '<no match>')

    Saying "Match matches the whole string" is incorrect; see the second
    case. If you want to ensure that the whole string matches the pattern,
    the pattern needs to be terminated by "$" or "\Z".
     
    John Machin, Nov 17, 2008
    #9
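John's four cases can be reproduced directly, as in the sketch below (Python 3 syntax). It also uses re.fullmatch(), which was added in Python 3.4 and so was not available in the Python 2.6 discussed here; it succeeds only when the pattern covers the entire string, without an explicit "$" or r"\Z".

```python
import re

# (pattern, string) pairs from John's table.
cases = [("abc", "abc"), ("abc", "abcdef"), ("abc$", "abc"), ("abc$", "abcdef")]

results = []
for pattern, string in cases:
    m = re.match(pattern, string)
    # Record the portion of the string that match() consumed, if any.
    results.append((pattern, string, m.group() if m else "<no match>"))

for row in results:
    print(row)

# fullmatch (Python 3.4+) requires the pattern to cover the entire string.
print(re.fullmatch("abc", "abcdef"))        # None
print(re.fullmatch("abc", "abc").group())   # abc
```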

Similar Threads
  1. Steven J Sobol (Replies: 8, Views: 5,719; Thomas Weidenfeller, Apr 30, 2004)
  2. "remove non alphanumeric characters" (joe, Mar 2, 2007, in forum: C Programming; Replies: 5, Views: 864)
  3. Yasin Cepeci (Replies: 1, Views: 954; Juan T. Llibre, Apr 26, 2007)
  4. Yasin Cepeci (Replies: 2, Views: 249; Yasin Cepeci, Apr 26, 2007)
  5. "Newbie Question: delete all non alphanumeric characters" (Theallnighter Theallnighter, Jul 21, 2006, in forum: Ruby; Replies: 15, Views: 316; Joe Karma, Jul 22, 2006)