regex question

Discussion in 'Python' started by Helmut Jarausch, Jun 25, 2005.

  1. Felix Schwarz wrote:
    > Hi all,
    >
    > I'm having trouble with a regular expression and I can't figure out
    > which words to use when googling. I have read the Python documentation
    > for the re module several times but still have no idea what I'm doing
    > wrong.
    >
    > What I want to do:
    > - Extract all digits (\d) in a string.
    > - Digits are separated by whitespace (\s)
    >
    > What my program does:
    > - It extracts only the last digit.
    >
    > Here is my program:
    > import re
    > line = ' 1 2 3'
    > regex = '^' + '(?:\s+(\d))*' + '$'
    > match = re.match(regex, line)
    > print "lastindex is: ",match.lastindex
    > print "matches: ",match.group(1)
    >
    >
    > Obviously I do not understand how (?:\s+(\d))* works in conjunction with
    > ^ and $.
    >


    I am not sure what you would like to do.
    What about
    regex = re.compile(r'\s+\d')
    print regex.findall(line)
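
    As for why the original pattern reports only the last digit: when a capturing group sits inside a repetition, each iteration overwrites the previous capture, so match.group(1) holds just the final one. A minimal sketch follows, written in Python 3 syntax (an assumption; the code in this thread is Python 2). The helper name extract_digits is hypothetical and only for illustration.

```python
import re

line = " 1 2 3"

# A group inside a repetition is overwritten on each iteration,
# so only the last captured digit is still accessible afterwards.
match = re.match(r"^(?:\s+(\d))*$", line)
last_only = match.group(1)

# findall with one capturing group returns the captured text of every match.
digits = re.findall(r"\s+(\d)", line)


def extract_digits(text):
    """Hypothetical helper: check the whole-line shape, then collect the digits."""
    if re.fullmatch(r"(?:\s+\d)*", text) is None:
        return None  # the line is not purely whitespace-separated digits
    return re.findall(r"\d", text)


print(last_only)             # 3
print(digits)                # ['1', '2', '3']
print(extract_digits(line))  # ['1', '2', '3']
```

    Splitting the job into "validate with fullmatch, then extract with findall" keeps the anchoring that ^ and $ provided while still returning every digit.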



    --
    Helmut Jarausch

    Lehrstuhl fuer Numerische Mathematik
    RWTH - Aachen University
    D 52056 Aachen, Germany
    Helmut Jarausch, Jun 25, 2005
    #1

  2. Hi all,

    I'm having trouble with a regular expression and I can't figure out
    which words to use when googling. I have read the Python documentation
    for the re module several times but still have no idea what I'm doing
    wrong.

    What I want to do:
    - Extract all digits (\d) in a string.
    - Digits are separated by whitespace (\s)

    What my program does:
    - It extracts only the last digit.

    Here is my program:
    import re
    line = ' 1 2 3'
    regex = '^' + '(?:\s+(\d))*' + '$'
    match = re.match(regex, line)
    print "lastindex is: ",match.lastindex
    print "matches: ",match.group(1)


    Obviously I do not understand how (?:\s+(\d))* works in conjunction with
    ^ and $.

    Does anybody know how to transform this regex to get the result I want
    to have?

    fs
    Felix Schwarz, Jun 25, 2005
    #2

  3. Felix Schwarz wrote:

    > Hi all,
    >
    > I'm having trouble with a regular expression and I can't figure out
    > which words to use when googling. I have read the Python documentation
    > for the re module several times but still have no idea what I'm doing
    > wrong.
    >
    > What I want to do:
    > - Extract all digits (\d) in a string.
    > - Digits are separated by whitespace (\s)
    >
    > What my program does:
    > - It extracts only the last digit.
    >
    > Here is my program:
    > import re
    > line = ' 1 2 3'
    > regex = '^' + '(?:\s+(\d))*' + '$'
    > match = re.match(regex, line)
    > print "lastindex is: ",match.lastindex
    > print "matches: ",match.group(1)
    >
    >
    > Obviously I do not understand how (?:\s+(\d))* works in conjunction with
    > ^ and $.
    >
    > Does anybody know how to transform this regex to get the result I want
    > to have?
    >
    > fs


    Here are three ways:

    - If your strings consist of only white space and single digits as
    in your example, the simplest way is split():
    >>> ' 1 2 3'.split()
    ['1', '2', '3']

    - Otherwise use re.findall:
    >>> import re
    >>> digit = re.compile(r'\d')
    >>> digit.findall('1 ab 34b 6')
    ['1', '3', '4', '6']

    - Finally, for the special case where you are searching for single
    characters (such as digits), perhaps the fastest way is to use
    string.translate:

    >>> import string
    >>> allchars = string.maketrans('','') # 2 empty strings
    >>> nondigits = allchars.translate(allchars, string.digits)
    >>> '1 ab 34 6'.translate(allchars, nondigits)
    '1346'

    Note that the result is a string of the matched characters, not a list;
    you can simply turn it into a list with list('1346').
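
    The session above is Python 2; string.maketrans and the two-argument str.translate are not available in Python 3. A rough Python 3 sketch of the same three approaches follows. The variable names are illustrative, and the use of re.ASCII and of bytes.translate as the nearest equivalent are choices made here, not something taken from the thread.

```python
import re
import string

text = "1 ab 34b 6"

# 1. split() is enough when the string holds only whitespace and digits.
tokens = " 1 2 3".split()

# 2. findall(r"\d") yields single digits; r"\d+" keeps runs such as "34" together.
#    re.ASCII limits \d to 0-9 (by default Python 3 also matches other Unicode digits).
single_digits = re.findall(r"\d", text, flags=re.ASCII)
digit_runs = re.findall(r"\d+", text, flags=re.ASCII)

# 3. bytes.translate(None, delete) drops every byte listed in `delete`,
#    which is the closest Python 3 analogue of the Python 2 translate trick.
NON_DIGITS = bytes(range(256)).translate(None, string.digits.encode("ascii"))
kept_bytes = text.encode("ascii").translate(None, NON_DIGITS)

#    For str input, a plain membership filter does the same job.
kept_str = "".join(ch for ch in text if ch in string.digits)

print(tokens)         # ['1', '2', '3']
print(single_digits)  # ['1', '3', '4', '6']
print(digit_runs)     # ['1', '34', '6']
print(kept_bytes)     # b'1346'
print(kept_str)       # 1346
```

    Whether the translate variant is actually faster than findall would need measuring on real data.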

    Hope this helps,

    George
    George Sakkis, Jun 25, 2005
    #3
  4. Here's a pyparsing version of this, that may be easier to maintain long
    term (although if you have your heart set on learning regexp's, they
    will certainly do the job). Note that in pyparsing, you don't have to
    spell out where the whitespace goes - pyparsing's default logic assumes
    that whitespace may be found between any grammar elements, and if
    found, it is ignored. (I believe regexp has a special magic symbol
    that will do the same thing.)

    Download pyparsing at http://pyparsing.sourceforge.net.
    -- Paul


    import pyparsing as pp

    testString = ' 1 2 3'

    integer = pp.Word(pp.nums)
    lineData = pp.OneOrMore( integer )

    results = lineData.parseString( testString )
    print results

    will print:
    ['1', '2', '3']
    Paul McGuire, Jun 25, 2005
    #4