Re: how to write the regular expression?

Discussion in 'Python' started by MRAB, Feb 14, 2013.

  1. MRAB

    MRAB Guest

    On 2013-02-14 14:13, python wrote:
    > My tv.txt is:
    > http://202.177.192.119/radio5 香港电台第五台(可于Totem/VLC/MPlayer播放)
    > http://202.177.192.119/radio35 香港电台第五台(DAB版,可于Totem/VLC/MPlayer播放)
    > http://202.177.192.119/radiopth 香港电台普通话台(可于Totem/VLC/MPlayer播放)
    > http://202.177.192.119/radio31 香港电台普通话台(DAB版,可于Totem/VLC/MPlayer播放)
    > octoshape:rthk.ch1 香港电台第一台(粤)
    > octoshape:rthk.ch2 香港电台第二台(粤)
    > octoshape:rthk.ch6 香港电台普通话台
    > octoshape:rthk.ch3 香港电台第三台(英)
    >
    > The result I want is:
    > 1group is http://202.177.192.119/radio5 2group is 香港电台第五台 3group is (可于Totem/VLC/MPlayer播放)
    > 1group is http://202.177.192.119/radio35 2group is 香港电台第五台 3group is (DAB版,可于Totem/VLC/MPlayer播放)
    > 1group is http://202.177.192.119/radiopth 2group is 香港电台普通话台 3group is (可于Totem/VLC/MPlayer播放)
    > 1group is http://202.177.192.119/radio31 2group is 香港电台普通话台 3group is (DAB版,可于Totem/VLC/MPlayer播放)
    > 1group is octoshape:rthk.ch1 2group is 香港电台第一台 3group is (粤)
    > 1group is octoshape:rthk.ch2 2group is 香港电台第二台 3group is (粤)
    > 1group is octoshape:rthk.ch6 2group is 香港电台普通话台 3group is none
    > 1group is octoshape:rthk.ch3 2group is 香港电台第三台 3group is (英)
    >
    > here is my code:
    > # -*- coding: utf-8 -*-
    > import re
    > rfile=open("tv.txt","r")
    > pat='([a-z].+?\s)(.+)(\(.+\))'
    > for line in rfile.readlines():
    >     Match=re.match(pat,line)
    >     print "1group is ",Match.group(1),"2group is ",Match.group(2),"3group is ",Match.group(3)
    > rfile.close()
    >
    > the output is :
    > 1group is http://202.177.192.119/radio5 2group is 香港电台第五台 3group is (可于Totem/VLC/MPlayer播放)
    > 1group is http://202.177.192.119/radio35 2group is 香港电台第五台 3group is (DAB版,可于Totem/VLC/MPlayer播放)
    > 1group is http://202.177.192.119/radiopth 2group is 香港电台普通话台 3group is (可于Totem/VLC/MPlayer播放)
    > 1group is http://202.177.192.119/radio31 2group is 香港电台普通话台 3group is (DAB版,可于Totem/VLC/MPlayer播放)
    > 1group is octoshape:rthk.ch1 2group is 香港电台第一台 3group is (粤)
    > 1group is octoshape:rthk.ch2 2group is 香港电台第二台 3group is (粤)
    > 1group is
    > Traceback (most recent call last):
    >   File "tv.py", line 7, in <module>
    >     print "1group is ",Match.group(1),"2group is ",Match.group(2),"3group is ",Match.group(3)
    > AttributeError: 'NoneType' object has no attribute 'group'
    >
    > How can I revise my code to get the desired output?
    >

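    The problem is that the regex makes '(\(.+\))' mandatory, but line 7 of
    tv.txt (octoshape:rthk.ch6) has no parenthesised part, so re.match
    returns None and Match.group(1) raises the AttributeError.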
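
    To see the failure in isolation, here is a minimal sketch in Python 3 (the code in the question is Python 2). The two sample lines are taken from the question's tv.txt, and the names `lines` and `results` are only illustrative. Since re.match returns None when the pattern does not match, the result is checked before .group() is used.

```python
import re

# Original pattern from the question: the third group is mandatory.
pat = r'([a-z].+?\s)(.+)(\(.+\))'

# Two sample lines from tv.txt; the second has no parenthesised suffix.
lines = [
    "octoshape:rthk.ch1 香港电台第一台(粤)",
    "octoshape:rthk.ch6 香港电台普通话台",
]

results = []
for line in lines:
    m = re.match(pat, line)
    if m is None:
        # No match at all: calling m.group() here would raise AttributeError.
        results.append(None)
    else:
        results.append(m.groups())

print(results)
```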

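    You can make it optional by wrapping it in a non-capturing group
    (?:...) followed by ?. The second group then has to be lazy, (.+?),
    and the pattern anchored with $; otherwise the greedy (.+) consumes
    the parenthesised part and group 3 is always None:

    pat = r'([a-z].+?\s)(.+?)(?:(\(.+\)))?$'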
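
    The optional third group can be tried out with a short Python 3 sketch (the thread's code is Python 2). The sketch uses a lazy second group and a $ anchor so that the parenthesised suffix is not swallowed by a greedy (.+). The helper name `split_line` is hypothetical, and stripping the trailing space from group 1 is a choice made for this example only.

```python
import re

# Optional third group; group 2 is lazy and the pattern is anchored at the
# end of the line (assumptions of this sketch, not confirmed by the thread).
pat = re.compile(r'([a-z].+?\s)(.+?)(?:(\(.+\)))?$')

def split_line(line):
    """Return (address, name, note) for one line, or None if it does not match."""
    m = pat.match(line)
    if m is None:
        return None
    # group(3) is None when the line has no parenthesised note.
    return m.group(1).strip(), m.group(2), m.group(3)

with_note = split_line("http://202.177.192.119/radio5 香港电台第五台(可于Totem/VLC/MPlayer播放)\n")
without_note = split_line("octoshape:rthk.ch6 香港电台普通话台\n")
print(with_note)
print(without_note)
```

    A line without a parenthesised part now still matches, and its third group comes back as None instead of the whole match failing.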

    Also, it's highly recommended that you use raw string literals
    (r'...') when writing regex patterns and replacements.
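
    A small illustration of why raw strings matter, sketched in Python 3: in a normal string literal '\b' is the backspace character, while r'\b' reaches the regex engine as the word-boundary escape. The sample text and variable names are made up for the example.

```python
import re

text = "radio5 radio35"

# Normal literal: '\b' is the backspace character (U+0008), so the pattern
# looks for backspaces that the text does not contain.
plain = re.findall('\bradio5\b', text)

# Raw literal: the regex engine sees \b and treats it as a word boundary.
raw = re.findall(r'\bradio5\b', text)

print(len('\b'), len(r'\b'))
print(plain, raw)
```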
    MRAB, Feb 14, 2013