Replace all occurrences of a char, except the first

Discussion in 'Perl Misc' started by Bart Van der Donck, Sep 20, 2008.

  1. Hello,

    I got intrigued by the following challenge which seems to be
    impossible.

    Is it possible to write a regular expression that replaces all
    occurrences of a character except the first occurrence?

    a''bc'def'g' -> a'bcdefg
    '''ab'cd'efg -> 'abcdefg
    abc'd'e''f'g -> abc'defg
    etc.

    --
    Bart
    Bart Van der Donck, Sep 20, 2008
    #1

  2. Bart Van der Donck wrote:
    >
    > I got intrigued by the following challenge which seems to be
    > impossible.
    >
    > Is it possible to write a regular expression that replaces all
    > occurences of a character except the first occurence ?
    >
    > a''bc'def'g' -> a'bcdefg
    > '''ab'cd'efg -> 'abcdefg
    > abc'd'e''f'g -> abc'defg
    > etc.


    $ perl -e'
    my @x = ( "a##bc#def#g#", "###ab#cd#efg", "abc#d#e##f#g" );
    for ( @x ) {
    print;
    my $count;
    s/(#)/ $count++ ? "" : $1 /eg;
    print " -> $_\n";
    }
    '
    a##bc#def#g# -> a#bcdefg
    ###ab#cd#efg -> #abcdefg
    abc#d#e##f#g -> abc#defg
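
The counter idea above can be wrapped in a small sub that takes the character as a parameter. This is only a sketch: keep_first() is a hypothetical name that does not appear in the thread, and the \Q...\E quoting is an addition so that a metacharacter such as "." is matched literally.

```perl
use strict;
use warnings;

# keep_first() is a hypothetical helper, not from the thread.
# It keeps the first occurrence of $char in $str and deletes the rest.
sub keep_first {
    my ($str, $char) = @_;
    my $seen = 0;
    # $seen++ is false (0) only for the first match, so only that
    # match is put back; every later match is replaced by "".
    $str =~ s/(\Q$char\E)/ $seen++ ? "" : $1 /eg;
    return $str;
}

for my $line ("a''bc'def'g'", "'''ab'cd'efg", "abc'd'e''f'g") {
    print "$line -> ", keep_first($line, "'"), "\n";
}
```

Because the counter lives inside the sub, each call starts counting from zero again.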




    John
    --
    Perl isn't a toolbox, but a small machine shop where you
    can special-order certain sorts of tools at low cost and
    in short order. -- Larry Wall
    John W. Krahn, Sep 20, 2008
    #2

  3. Tim Greer

    Bart Van der Donck wrote:

    > Hello,
    >
    > I got intrigued by the following challenge which seems to be
    > impossible.
    >
    > Is it possible to write a regular expression that replaces all
    > occurences of a character except the first occurence ?
    >
    > a''bc'def'g' -> a'bcdefg
    > '''ab'cd'efg -> 'abcdefg
    > abc'd'e''f'g -> abc'defg
    > etc.
    >
    > --
    > Bart


    I assume you mean that it'll have to locate any character that's
    repeated (not a predetermined one) and only remove those additional
    occurrences, and only use a regular expression?

    I'd use a different method to accomplish the task, but it is indeed
    interesting to try and do this only with a regex.

    The following should work and remove the second and every later instance
    of any character (so a string such as "'''sfsf'et'st464y'''" will
    result in "'sfet46y"). Pretty cool, eh?

    Here's the regex logic:

    $string =~ s|(.)| ($` =~ m/$1/) ? '' : $1 |eg;

    Here's an example based on your strings, using only a regular
    expression:

    script.pl:

    #!/usr/bin/perl
    use warnings;
    use strict;

    my $linea = "a''bc'def'g'";
    my $lineb = "'''ab'cd'efg";
    my $linec = "abc'd'e''f'g";

    print "$linea -> ";
    $linea =~ s|(.)| ($` =~ m/$1/) ? '' : $1 |eg;
    print "$linea\n";

    print "$lineb -> ";
    $lineb =~ s|(.)| ($` =~ m/$1/) ? '' : $1 |eg;
    print "$lineb\n";

    print "$linec -> ";
    $linec =~ s|(.)| ($` =~ m/$1/) ? '' : $1 |eg;
    print "$linec\n";


    The output:
    ~]$ ./script.pl
    a''bc'def'g' -> a'bcdefg
    '''ab'cd'efg -> 'abcdefg
    abc'd'e''f'g -> abc'defg

    Is that what you were trying to do?
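
The post says a different method would be preferable outside the regex-only constraint but does not say which one. One common non-regex approach, shown here purely as an assumption about what such a method might look like, tracks the characters already seen in a hash. The name uniq_chars() is hypothetical. It also sidesteps a caveat of m/$1/, which interpolates the captured character unquoted, so a metacharacter such as "." or "(" would not be matched literally.

```perl
use strict;
use warnings;

# uniq_chars() is a hypothetical name, not from the thread.
# It keeps the first occurrence of every character and drops later
# repeats, without using a regex substitution.
sub uniq_chars {
    my ($str) = @_;
    my %seen;
    # split // breaks the string into single characters;
    # !$seen{$_}++ is true only the first time a character appears.
    return join '', grep { !$seen{$_}++ } split //, $str;
}

print uniq_chars("'''sfsf'et'st464y'''"), "\n";   # prints 'sfet46y
```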
    --
    Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
    Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
    and Custom Hosting. 24/7 support, 30 day guarantee, secure servers.
    Industry's most experienced staff! -- Web Hosting With Muscle!
    Tim Greer, Sep 20, 2008
    #3
  4. Bart Van der Donck wrote:

    > I got intrigued by the following challenge which seems to be
    > impossible.
    >
    > Is it possible to write a regular expression that replaces all
    > occurences of a character except the first occurence ?



    No, it is not possible to replace anything with only a regular expression.

    A regular expression either "matches" or "does not match", it cannot "replace".

    An operator that uses a regular expression as one of its operands,
    such as s///, can probably do that without much trouble though.
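
The distinction can be seen in a short sketch (the variable names are illustrative only): the match operator reports whether the pattern matches and leaves the string alone, while the substitution operator takes a pattern and a replacement and modifies its target.

```perl
use strict;
use warnings;

my $text = "abc'd'e";

# m// only answers "does the pattern match?"; $text is not changed.
my $matched = $text =~ m/'/ ? 1 : 0;

# s/// takes the pattern as one operand and the replacement as the
# other, and edits the string it is bound to (here a copy of $text).
(my $copy = $text) =~ s/'//g;

print "matched=$matched text=$text copy=$copy\n";
```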


    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
    Tad J McClellan, Sep 20, 2008
    #4
  5. Ben Morrow

    Quoth Tim Greer:
    > Bart Van der Donck wrote:
    >
    > > Is it possible to write a regular expression that replaces all
    > > occurences of a character except the first occurence ?
    > >
    > > a''bc'def'g' -> a'bcdefg
    > > '''ab'cd'efg -> 'abcdefg
    > > abc'd'e''f'g -> abc'defg
    > > etc.

    >
    > I assume you mean that it'll have to locate any character that's
    > repeated (not a predetermined one) and only remove those additional
    > occurrences, and only use a regular expression?
    >
    > I'd use a different method to accomplish the task, but it is indeed
    > interesting to try and do this only with a regex.
    >
    > The following should work and only remove the second or higher instance
    > of any character (so you a string with "'''sfsf'et'st464y'''" will
    > result in "'sfet46y". Pretty cool, eh?
    >
    > Here's the regex logic:
    >
    > $string =~ s|(.)| ($` =~ m/$1/) ? '' : $1 |eg;


    Without using /e:

    ~% perl -le'$_ = "abccbdcdc"; 1 while s/(.)(.*)\1/$1$2/g; print'
    abcd

    Using 5.10's \K we can remove the replacement part:

    ~% perl5.10.0 -le'$_="abccbdcdc"; 1 while s/(.).*\K\1//g; print'
    abcd

    and if we reverse the string before and after (so we can use look*ahead*
    instead, which can be variable-length) we can remove the while loop:

    ~% perl -le'$_ = reverse "abccbdcdc"; s/(.)(?=.*\1)//g;
    print scalar reverse'
    abcd
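
As a sketch, the reverse-and-lookahead variant can be written as a sub. The name dedupe_reverse() is hypothetical, and the /s flag is an addition that is not in the one-liner above; it lets "." match newlines as well.

```perl
use strict;
use warnings;

# dedupe_reverse() is a hypothetical name wrapping the reverse/lookahead idea.
sub dedupe_reverse {
    my ($str) = @_;
    my $rev = reverse $str;    # scalar context: reverses the characters
    # In the reversed string, delete every character that occurs again
    # later on. What survives is the last occurrence in the reversed
    # string, which is the first occurrence in the original order.
    $rev =~ s/(.)(?=.*\1)//gs;
    return scalar reverse $rev;
}

print dedupe_reverse("abccbdcdc"), "\n";   # prints abcd
```

A single pass is enough here because the lookahead inspects the rest of the string without consuming it.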

    Ben

    --
    Although few may originate a policy, we are all able to judge it.
    Pericles of Athens, c.430 B.C.
    Ben Morrow, Sep 20, 2008
    #5
  6. Bart Van der Donck wrote:
    >
    > I got intrigued by the following challenge which seems to be
    > impossible.
    >
    > Is it possible to write a regular expression that replaces all
    > occurences of a character except the first occurence ?
    >
    > a''bc'def'g' -> a'bcdefg
    > '''ab'cd'efg -> 'abcdefg
    > abc'd'e''f'g -> abc'defg
    > etc.


    $ perl -e'
    my @x = ( "a##bc#def#g#", "###ab#cd#efg", "abc#d#e##f#g" );
    for ( @x ) {
    print;
    /#/ && substr( $_, $+[0] ) =~ tr/#//d;
    print " -> $_\n";
    }
    '
    a##bc#def#g# -> a#bcdefg
    ###ab#cd#efg -> #abcdefg
    abc#d#e##f#g -> abc#defg
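
The one-liner above relies on two features: the @+ array holds the end offsets of the last successful match, and substr() can be used as an lvalue, so tr///d touches only the tail of the string. A spelled-out sketch, with illustrative variable names:

```perl
use strict;
use warnings;

my $s = "abc#d#e##f#g";

if ($s =~ /#/) {
    # $+[0] is the offset just past the end of the whole match,
    # that is, the position right after the first "#".
    my $tail_start = $+[0];
    # substr() as an lvalue: tr///d deletes "#" only in the tail,
    # so the first "#" is left in place.
    substr($s, $tail_start) =~ tr/#//d;
}

print "$s\n";   # prints abc#defg
```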



    John
    --
    Perl isn't a toolbox, but a small machine shop where you
    can special-order certain sorts of tools at low cost and
    in short order. -- Larry Wall
    John W. Krahn, Sep 20, 2008
    #6
  7. Tim Greer

    Ben Morrow wrote:

    >
    > Quoth Tim Greer <>:
    >> Bart Van der Donck wrote:
    >>
    >> > Is it possible to write a regular expression that replaces all
    >> > occurences of a character except the first occurence ?
    >> >
    >> > a''bc'def'g' -> a'bcdefg
    >> > '''ab'cd'efg -> 'abcdefg
    >> > abc'd'e''f'g -> abc'defg
    >> > etc.

    >>
    >> I assume you mean that it'll have to locate any character that's
    >> repeated (not a predetermined one) and only remove those additional
    >> occurrences, and only use a regular expression?
    >>
    >> I'd use a different method to accomplish the task, but it is indeed
    >> interesting to try and do this only with a regex.
    >>
    >> The following should work and only remove the second or higher
    >> instance of any character (so you a string with
    >> "'''sfsf'et'st464y'''" will
    >> result in "'sfet46y". Pretty cool, eh?
    >>
    >> Here's the regex logic:
    >>
    >> $string =~ s|(.)| ($` =~ m/$1/) ? '' : $1 |eg;

    >
    > Without using /e:
    >
    > ~% perl -le'$_ = "abccbdcdc"; 1 while s/(.)(.*)\1/$1$2/g; print'
    > abcd
    >
    > Using 5.10's \K we can remove the replacement part:
    >
    > ~% perl5.10.0 -le'$_="abccbdcdc"; 1 while s/(.).*\K\1//g; print'
    > abcd
    >
    > and if we reverse the string before and after (so we can use
    > look*ahead* instead, which can be variable-length) we can remove the
    > while loop:
    >
    > ~% perl -le'$_ = reverse "abccbdcdc"; s/(.)(?=.*\1)//g;
    > print scalar reverse'
    > abcd
    >
    > Ben
    >


    This is why I enjoy Perl. There are usually several ways of doing it. I
    was purposely trying to be quirky, but in all seriousness, for the viewers
    of this thread, Ben's suggestion above is better (more efficient than
    using my solution with $`; it was all in good fun, though). I wasn't
    familiar with \K (I'm still on 5.8.x), so that's pretty cool.
    --
    Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
    Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
    and Custom Hosting. 24/7 support, 30 day guarantee, secure servers.
    Industry's most experienced staff! -- Web Hosting With Muscle!
    Tim Greer, Sep 20, 2008
    #7