Help with Regular Expressions

Discussion in 'Python' started by Harlin Seritt, Aug 10, 2005.

  1. I have been looking at the Python re module and have been trying to
    make sense of a simple function that I'd like to do. However, no amount
    of reading or googling has helped me with this. Forgive my
    stone-headedness. I have done this with .NET and Java in the past but
    damn if I can't get it done with Python for some reason. As such I am
    sure it is something even simpler.

    I am trying to find some matches and have them put into a list when
    processing is done. I'll use a simple example like email addresses.

    My input is the following:
    wordList = ['myname1', '', '',
    'myname4@domain', '']

    My regular expression would be something like '\w\@\w\.\w' (I realize
    it could and should be more detailed but that's not the point for now).

    I would like to find out how to output the matches for this expression
    of my 'wordList' into a neat list variable. How do I get this done?

    Thanks,

    Harlin Seritt
    Harlin Seritt, Aug 10, 2005
    #1

  2. Devan L Guest

    Harlin Seritt wrote:
    > I have been looking at the Python re module and have been trying to
    > make sense of a simple function that I'd like to do. However, no amount
    > of reading or googling has helped me with this. Forgive my
    > stone-headedness. I have done this with .NET and Java in the past but
    > damn if I can't get it done with Python for some reason. As such I am
    > sure it is something even simpler.
    >
    > I am trying to find some matches and have them put into a list when
    > processing is done. I'll use a simple example like email addresses.
    >
    > My input is the following:
    > wordList = ['myname1', '', '',
    > 'myname4@domain', '']
    >
    > My regular expression would be something like '\w\@\w\.\w' (I realize
    > it could and should be more detailed but that's not the point for now).
    >
    > I would like to find out how to output the matches for this expression
    > of my 'wordList' into a neat list variable. How do I get this done?
    >
    > Thanks,
    >
    > Harlin Seritt


    You need to enclose the '\w's in parentheses. The groups() method of a
    match object only returns the parts of the pattern that you enclose in
    parentheses. Also, you need to use the '+' so that \w won't just match
    a single alphanumeric character, but will match one or more. You also
    need to escape the '.' because an unescaped '.' matches any character.
    So your regular expression would be more like

    r'(\w+)@(\w+)\.(\w+)'

    Anyways, you can use a list comprehension and the groups() method of a
    match object to build a list of tuples:
    [re.match(r'(\w+)@(\w+)\.(\w+)', address).groups() for address in
    wordList]

    On a side note, some of the email addresses in your list don't match
    the pattern. re.match returns None for those, so calling groups() on
    the result raises an AttributeError. You should use

    wordList = ['', '',
    '']
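A minimal sketch of the same idea that skips non-matching entries instead of failing on them. The sample addresses and the name word_list are made up for illustration; they are not the original poster's data.

```python
import re

# Hypothetical sample data; these addresses are illustrative only.
word_list = ['myname1', 'alice@example.com', 'bob@example.org', 'myname4@domain']

# Compile once, since the same pattern is applied to every item.
pattern = re.compile(r'(\w+)@(\w+)\.(\w+)')

# pattern.match returns None when an item does not match,
# so filter those out before calling groups().
matches = [m.groups() for m in map(pattern.match, word_list) if m]

print(matches)
# [('alice', 'example', 'com'), ('bob', 'example', 'org')]
```

Entries without an '@' or without a dot after the '@' are dropped rather than raising an AttributeError.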
    Devan L, Aug 10, 2005
    #2

  3. Harlin Seritt wrote:

    > I am trying to find some matches and have them put into a list when
    > processing is done. I'll use a simple example like email addresses.
    >
    > My input is the following:
    > wordList = ['myname1', '', '',
    > 'myname4@domain', '']
    >
    > My regular expression would be something like '\w\@\w\.\w' (I realize
    > it could and should be more detailed but that's not the point for now).
    >
    > I would like to find out how to output the matches for this expression
    > of my 'wordList' into a neat list variable. How do I get this done?


    that's more of a list manipulation question than a regular expression
    question, of course. to apply a regular expression to all items in a
    list, apply it to all items in a list. a list comprehension is the shortest
    way to do this:

    >>> out = [word for word in wordList if re.match(r"\w+@\w+\.\w+", word)]
    >>> out

    ['', '', '']

    </F>
    Fredrik Lundh, Aug 10, 2005
    #3
  4. Ahh, that's it, Fredrik. That's what I was looking for. The regular
    expression problems I will take care of, but first wanted to walk
    before running. ;)

    Thanks,

    Harlin Seritt
    Harlin Seritt, Aug 10, 2005
    #4
  5. Forgive another question here, but what is the 'r' for when used with
    expression: r'\w+...' ?
    Harlin Seritt, Aug 10, 2005
    #5
  6. Harlin Seritt wrote:

    > Forgive another question here, but what is the 'r' for when used with
    > expression: r'\w+...' ?


    r'..' or r".." are "raw strings", in which backslashes do not introduce
    an escape sequence, so you don't have to write '\\' when you need a
    backslash in the string, e.g. r'\w+' == '\\w+'.
    They are useful for regular expressions (because the re module parses
    the '\X' sequences itself) and for Windows paths (e.g. r'C:\newfile.txt').
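A short sketch of the difference between raw and ordinary string literals. The variable names are illustrative only.

```python
# A raw string keeps the backslash as a literal character.
raw_pattern = r'\w+'
escaped_pattern = '\\w+'
same = raw_pattern == escaped_pattern
print(same)                 # True

# In an ordinary literal '\n' is one character (a newline);
# in a raw literal it is two characters: a backslash and an 'n'.
lengths = (len('\n'), len(r'\n'))
print(lengths)              # (1, 2)

# Without the r prefix, the '\n' in this path would become a newline.
path = r'C:\newfile.txt'
print(path)                 # C:\newfile.txt
```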

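    And you should append a '$' to the regular expression, because
    re.match only anchors the pattern at the start of the string, so
    r"\w+@\w+\.\w+" would also match an address followed by trailing junk
    such as '-+*junk'.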
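To illustrate the anchoring point, here is a small sketch. The candidate string is a made-up example, and re.fullmatch (available in Python 3.4 and later) is shown as an alternative to appending '$'; it is not something the posts above mention.

```python
import re

pattern = r'\w+@\w+\.\w+'

# Hypothetical input: a plausible address followed by trailing junk.
candidate = 'user@example.com-+*junk'

# re.match only requires the pattern to match at the start of the string.
unanchored = bool(re.match(pattern, candidate))
print(unanchored)       # True

# Appending '$' forces the match to reach the end of the string.
anchored = bool(re.match(pattern + '$', candidate))
print(anchored)         # False

# re.fullmatch requires the whole string to match the pattern.
full_bad = bool(re.fullmatch(pattern, candidate))
full_good = bool(re.fullmatch(pattern, 'user@example.com'))
print(full_bad, full_good)   # False True
```

One difference worth knowing: '$' also matches just before a trailing newline, while re.fullmatch does not allow one.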

    --
    Benjamin Niemann
    Email: pink at odahoda dot de
    WWW: http://www.odahoda.de/
    Benjamin Niemann, Aug 10, 2005
    #6
  7. Paul McGuire Guest

    If your re demands get more complicated, you could take a look at
    pyparsing. The code is a bit more verbose, but many find it easier to
    compose their expressions using pyparsing's classes, such as Literal,
    OneOrMore, Optional, etc., plus a number of built-in helper functions
    and expressions, including delimitedList, quotedString, and
    cStyleComment. Pyparsing is intended for writing recursive-descent
    parsers, but can also be used (and is best learned) with simple
    applications such as this one.

    Here is a simple script for parsing your e-mail addresses. Note the
    use of results names to give you access to the individual parsed fields
    (re's also support a similar capability).

    Download pyparsing at http://pyparsing.sourceforge.net.

    -- Paul

    from pyparsing import Literal, Word, Optional, delimitedList, alphanums

    # define format of an email address
    AT = Literal("@").suppress()
    emailWord = Word(alphanums + "_")
    emailDomain = delimitedList(emailWord, ".", combine=True)
    emailAddress = emailWord.setResultsName("user") + \
        Optional(AT + emailDomain).setResultsName("host")

    # parse each word in wordList
    wordList = ['myname1', '', '',
                'myname4@domain', '']

    for w in wordList:
        addr = emailAddress.parseString(w)
        print(w)
        print(addr)
        print("user:", addr.user)
        print("host:", addr.host)
        print()

    Will print out:
    myname1
    ['myname1']
    user: myname1
    host:


    ['myname1', 'domain.tld']
    user: myname1
    host: domain.tld


    ['myname2', 'domain.tld']
    user: myname2
    host: domain.tld

    myname4@domain
    ['myname4', 'domain']
    user: myname4
    host: domain


    ['myname5', 'domain.tldx']
    user: myname5
    host: domain.tldx
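The remark above that re's support a similar capability presumably refers to named groups. A rough sketch of that, using only the standard re module: the pattern is an approximation of the pyparsing grammar above rather than an exact equivalent, and the sample inputs are made up.

```python
import re

# (?P<name>...) defines a named group. The host part is optional,
# mirroring Optional(AT + emailDomain) in the pyparsing version.
pattern = re.compile(r'(?P<user>\w+)(?:@(?P<host>\w+(?:\.\w+)*))?$')

# Hypothetical sample data for illustration.
samples = ['myname1', 'myname4@domain', 'someone@example.com']

parsed = []
for w in samples:
    m = pattern.match(w)
    if m:
        # group('host') is None when the optional host part is absent.
        parsed.append((m.group('user'), m.group('host')))

print(parsed)
# [('myname1', None), ('myname4', 'domain'), ('someone', 'example.com')]
```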
    Paul McGuire, Aug 10, 2005
    #7
  8. Jeff Schwab Guest

    Harlin Seritt wrote:

    > I am trying to find some matches and have them put into a list when
    > processing is done. I'll use a simple example like email addresses.
    >
    > My input is the following:
    > wordList = ['myname1', '', '',
    > 'myname4@domain', '']
    >
    > My regular expression would be something like '\w\@\w\.\w' (I realize
    > it could and should be more detailed but that's not the point for now).


    FYI, matching all compliant email addresses is ridiculously complicated.
    Before you spend too much time on it, you might want to borrow the
    complete and thoroughly explained example in Regular Expressions (O'Reilly):

    http://www.oreilly.com/catalog/regex/
    Jeff Schwab, Aug 10, 2005
    #8
  9. Cappy2112 Guest

    Be careful with that book though; its RE examples are Perl-centric and
    do not exactly match the implementation that Python uses. However, it's
    a good place to start.

    This will also be useful
    http://www.amk.ca/python/howto/regex/
    Cappy2112, Aug 10, 2005
    #9
  10. Paul McGuire wrote:
    > If your re demands get more complicated, you could take a look at
    > pyparsing. The code is a bit more verbose, but many find it easier to
    > compose their expressions using pyparsing's classes, such as Literal,
    > OneOrMore, Optional, etc., plus a number of built-in helper functions
    > and expressions, including delimitedList, quotedString, and
    > cStyleComment. Pyparsing is intended for writing recursive-descent
    > parsers, but can also be used (and is best learned) with simple
    > applications such as this one.


    As a slightly unrelated pyparsing question, is there a good set of API
    documentation around for pyparsing?

    I've looked into it for my mud client, but for now have gone with
    DParser because I need (desire) custom token generation sometimes.
    Pyparsing looks easier to internationalize, though.
    Christopher Subich, Aug 10, 2005
    #10
  11. Paul McGuire Guest

    Paul McGuire, Aug 10, 2005
    #11
