find and replace with regular expressions

Discussion in 'Python' started by chrispoliquin@gmail.com, Jul 31, 2008.

  1. Guest

    I am using regular expressions to search a string (always full
    sentences, maybe more than one sentence) for common abbreviations and
    remove the periods. I need to break the string into different
    sentences but split('.') doesn't solve the whole problem because of
    possible periods in the middle of a sentence.

    So I have...

    ----------------

    import re

    middle_abbr = re.compile('[A-Za-z0-9]\.[A-Za-z0-9]\.')

    # This will find abbreviations like e.g. or i.e. in the middle of a sentence.
    # Then I want to remove the periods.

    ----------------

    I want to keep the ie or eg but just take out the periods. Any
    ideas? Of course newString = middle_abbr.sub('',txt) where txt is the
    string will take out the entire abbreviation with the alphanumeric
    characters included.
    , Jul 31, 2008
    #1

  2. Mensanator Guest

    On Jul 31, 3:07 pm, wrote:
    > I want to keep the ie or eg but just take out the periods.  Any
    > ideas?  Of course newString = middle_abbr.sub('',txt) where txt is the
    > string will take out the entire abbreviation with the alphanumeric
    > characters included.


    >>> import re
    >>> middle_abbr = re.compile(r'[A-Za-z0-9]\.[A-Za-z0-9]\.')
    >>> s = 'A test, i.e., an example.'
    >>> a = middle_abbr.search(s) # find the abbreviation
    >>> b = re.compile(r'\.') # period pattern
    >>> c = b.sub('',a.group(0)) # remove periods from abbreviation
    >>> d = middle_abbr.sub(c,s) # substitute new abbr for old
    >>> d
    'A test, ie, an example.'
    Mensanator, Jul 31, 2008
    #2

  3. Mensanator Guest

    On Jul 31, 3:56 pm, Mensanator <> wrote:
    > >>> middle_abbr = re.compile('[A-Za-z0-9]\.[A-Za-z0-9]\.')
    > >>> s = 'A test, i.e., an example.'
    > >>> a = middle_abbr.search(s)      # find the abbreviation
    > >>> b = re.compile('\.')           # period pattern
    > >>> c = b.sub('',a.group(0))       # remove periods from abbreviation
    > >>> d = middle_abbr.sub(c,s)       # substitute new abbr for old
    > >>> d

    >
    > 'A test, ie, an example.'



    A more versatile version:

    import re

    middle_abbr = re.compile(r'[A-Za-z0-9]\.[A-Za-z0-9]\.')
    s = 'A test, i.e., an example.'
    a = middle_abbr.search(s) # find the abbreviation
    b = re.compile(r'\.') # period pattern
    c = b.sub('',a.group(0)) # remove periods from abbreviation
    d = middle_abbr.sub(c,s) # substitute new abbr for old

    print(d)
    print()
    print()

    s = """A test, i.e., an example.
    Yet another test, i.e., example with 2 abbr."""

    a = middle_abbr.search(s) # find the abbreviation
    c = b.sub('',a.group(0)) # remove periods from abbreviation
    d = middle_abbr.sub(c,s) # substitute new abbr for old

    print(d)
    print()
    print()

    s = """A test, i.e., an example.
    Yet another test, i.e., example with 2 abbr.
    A multi-test, e.g., one with different abbr."""

    done = False

    while not done:
        a = middle_abbr.search(s) # find the abbreviation
        if a:
            c = b.sub('',a.group(0)) # remove periods from abbreviation
            s = middle_abbr.sub(c,s,1) # substitute new abbr for old ONCE
        else: # repeat until all removed
            done = True

    print(s)

    ## A test, ie, an example.
    ##
    ##
    ## A test, ie, an example.
    ## Yet another test, ie, example with 2 abbr.
    ##
    ##
    ## A test, ie, an example.
    ## Yet another test, ie, example with 2 abbr.
    ## A multi-test, eg, one with different abbr.
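
The search-and-substitute loop above can probably be collapsed into a single call, because `re.sub` also accepts a function as the replacement and calls it once per match. A minimal sketch, assuming the same pattern and test string as in the post; the helper name `strip_periods` is made up for illustration:

```python
import re

# Same pattern as in the post above, written as a raw string.
middle_abbr = re.compile(r'[A-Za-z0-9]\.[A-Za-z0-9]\.')

def strip_periods(match):
    # Called once per match; returns the matched text without its periods.
    return match.group(0).replace('.', '')

s = """A test, i.e., an example.
Yet another test, i.e., example with 2 abbr.
A multi-test, e.g., one with different abbr."""

# Every abbreviation is rewritten independently, so mixed
# abbreviations (i.e. and e.g.) are handled in one pass.
result = middle_abbr.sub(strip_periods, s)
print(result)
```

Since each match is handled on its own, there is no need to substitute one abbreviation at a time.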
    Mensanator, Jul 31, 2008
    #3
  4. Paul McGuire Guest

    On Jul 31, 3:07 pm, wrote:
    >
    > middle_abbr = re.compile('[A-Za-z0-9]\.[A-Za-z0-9]\.')
    >


    When defining re's with string literals, it is good practice to use
    the raw string literal format (precede with an 'r'):
    middle_abbr = re.compile(r'[A-Za-z0-9]\.[A-Za-z0-9]\.')

    What abbreviations have numeric digits in them?

    I hope your input string doesn't include something like this:
    For a good approximation of pi, use 3.1.
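
The concern about digits is easy to check. A minimal sketch, assuming abbreviations consist of letters only; the `letters_only` pattern and its `\b` word-boundary anchor are assumptions added here for illustration, not something proposed in the thread:

```python
import re

# The pattern from the original post (as a raw string).
original = re.compile(r'[A-Za-z0-9]\.[A-Za-z0-9]\.')

# Hypothetical stricter variant: letters only, starting at a word boundary.
letters_only = re.compile(r'\b[A-Za-z]\.[A-Za-z]\.')

txt = 'For a good approximation of pi, use 3.1.'

# The original pattern treats the number plus the final period as an abbreviation.
false_positive = original.search(txt).group(0)
print(false_positive)

# The letters-only variant finds nothing in this sentence.
print(letters_only.search(txt))
```

With the original pattern, removing the periods would turn `3.1.` into `31`. The letters-only variant avoids that case, though it would presumably still match things like initials in a name.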

    -- Paul
    Paul McGuire, Jul 31, 2008
    #4
  5. MRAB Guest

    On Jul 31, 9:07 pm, wrote:
    > I want to keep the ie or eg but just take out the periods.  Any
    > ideas?  Of course newString = middle_abbr.sub('',txt) where txt is the
    > string will take out the entire abbreviation with the alphanumeric
    > characters included.


    It's recommended that you use raw strings for regular
    expressions.

    Capture the letters using parentheses:

    middle_abbr = re.compile(r'([A-Za-z0-9])\.([A-Za-z0-9])\.')

    and replace what was found with what was captured:

    newString = middle_abbr.sub(r'\1\2', txt)
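
Put together, the two lines above could be used like this. A minimal sketch; the sample text in `txt` is made up for illustration:

```python
import re

# Each letter is captured in its own group; the periods stay outside the groups.
middle_abbr = re.compile(r'([A-Za-z0-9])\.([A-Za-z0-9])\.')

txt = 'A test, i.e., an example. A multi-test, e.g., one with different abbr.'

# \1 and \2 refer back to the two captured characters, so only
# the periods are dropped from each match.
newString = middle_abbr.sub(r'\1\2', txt)
print(newString)
```

All matches are replaced in one call, and each one keeps its own letters.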

    HTH
    MRAB, Aug 1, 2008
    #5
  6. dusans Guest

    On Jul 31, 10:07 pm, wrote:
    > I want to keep the ie or eg but just take out the periods.  Any
    > ideas?  Of course newString = middle_abbr.sub('',txt) where txt is the
    > string will take out the entire abbreviation with the alphanumeric
    > characters included.


    It's impossible with regex. You could try it with a statistical analysis;
    and even this would give you a good split.
    dusans, Aug 1, 2008
    #6
  7. dusans Guest

    On Aug 1, 12:53 pm, dusans <> wrote:
    > It's impossible with regex. You could try it with a statistical analysis;
    > and even this would give you a good split.


    "and even this won't* give you a good split." :p
    dusans, Aug 1, 2008
    #7
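
The point made in the last two posts can be illustrated with a naive regex-based splitter. This is only a sketch under stated assumptions: the letters-only abbreviation pattern, the `sentence_end` pattern, and the `split_sentences` helper are all hypothetical and were not proposed in the thread.

```python
import re

# Two-letter abbreviations such as i.e. or e.g. (letters only: an assumption).
middle_abbr = re.compile(r'([A-Za-z])\.([A-Za-z])\.')

# Naive sentence boundary: whitespace that follows ., ! or ?
sentence_end = re.compile(r'(?<=[.!?])\s+')

def split_sentences(text):
    # First drop the periods from two-letter abbreviations, then split.
    cleaned = middle_abbr.sub(r'\1\2', text)
    return sentence_end.split(cleaned)

good = split_sentences('A test, i.e., an example. Another sentence.')
bad = split_sentences('Dr. Smith arrived. He was late.')

print(good)  # two sentences, as intended
print(bad)   # 'Dr.' is wrongly split off as its own sentence
```

The second call shows the limitation: any abbreviation the pattern does not know about, such as a title, still produces a false sentence break.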