Truncating text from a string with beginning text from another string

Discussion in 'Perl Misc' started by Mark, Mar 23, 2007.

  1. Mark

    Mark Guest

    From a line of arbitrary text, possibly followed by some amount of
    text from the beginning of the string ' Reference #\d+', where \d+
    represents one or more digit characters, I want to output the line
    without the ending ' Reference...' string. For example, the input line
    'some arbitrary text Refer' would become 'some arbitrary text'.

    Here are two programs that seem to do what I want, but they seem
    overly complicated for this task. I'm looking for a simpler solution,
    possibly by using a better regular expression than I have chosen in my
    first sample code.

    First sample:
    use strict ;
    use warnings ;

    my $re = qr'^(.*)\ ( (R$)|
                         (Re$)|
                         (Ref$)|
                         (Refe$)|
                         (Refer$)|
                         (Refere$)|
                         (Referenc$)|
                         (Reference\ {0,1}$)|
                         (Reference\ \#\d{0,}$)
                       )'x ;

    while(<DATA>) {
        chomp ;
        print "in : >$_<\n" ;
        if (my($result) = /$re/g) {
            print "out: >$result<\n" ;
        }
        else {
            print "out: >$_<\n" ;
        }
    }

    __DATA__
    Refer
    One Referenc
    two three Reference
    xx yy Reference Reference
    def Refere Reference #xx
    abc the def Refere Reference #
    abc the def Refere Reference #12


    Second sample:
    use strict ;
    use warnings ;

    my $PATTERN = 'Reference #000000' ;

    my $pos ;
    while (<DATA>) {
        chomp ;
        $pos = -1 ;
        while ((my $ind = index($_,' R',$pos)) != -1) {
            $pos = $ind + 1 ;
        }
        print "in : >$_<\n" ;
        my $result = $_ ;

        if ($pos > 0) {
            my $re = substr($_,$pos) ;
            $re =~ s/\d+$/\\d+/ ;
            $re = qr/^$re/ ;
            if ($PATTERN =~ /$re/) {
                $result = substr($_,0,$pos-1) ;
            }
        }
        print "out: >$result<\n" ;
    }

    __DATA__
    Refer
    One Referenc
    two three Reference
    xx yy Reference Reference
    def Refere Reference #xx
    abc the def Refere Reference #
    abc the def Refere Reference #12
     
    Mark, Mar 23, 2007
    #1

  2. Mark

    Mirco Wahab Guest

    Re: Truncating text from a string with beginning text from another string

    Mark wrote:
    > Here are two programs that seem to do what I want, but they seem
    > overly complicated for this task. I'm looking for a simpler solution,
    > possibly by using a better regular expression than I have chosen in my
    > first sample code.
    > First sample:
    > [...]
    > Second sample:
    > [...]


    I don't really know what all this
    should produce, but why wouldn't
    a simple:

    while(<DATA>) {
    chomp && print "$1 ==> from [$_]\n" if /(.+?)Refer/
    }


    do all you want? In your explanation you
    mentioned you'd truncate all subsequent
    occurrences of 'refer' 'reference' and all
    following stuff.

    Regards

    M.
     
    Mirco Wahab, Mar 23, 2007
    #2

  3. On Mar 23, 5:44 pm, "Mark" <> wrote:

    [ An interesting problem ]

    > I'm looking for a simpler solution,
    > possibly by using a better regular expression than I have chosen in my
    > first sample code.


    Wow! What a brilliant post. Clear, well thought out, interesting.

    Just wish I had an answer. I'll think about that one tonight. I'll
    probably be up all night thinking about it!
     
    Brian McCauley, Mar 23, 2007
    #3
  4. On Mar 23, 5:44 pm, "Mark" <> wrote:
    > use strict ;
    > use warnings ;
    >
    > my $re = qr'^(.*)\ ( (R$)|
    > (Re$)|
    > (Ref$)|
    > (Refe$)|
    > (Refer$)|
    > (Refere$)|
    > (Referenc$)|
    > (Reference\ {0,1}$)|
    > (Reference\ \#\d{0,}$)
    > )'x ;
    >
    > while(<DATA>) {
    > chomp ;
    > print "in : >$_<\n" ;
    > if (my($result) = /$re/g) {
    > print "out: >$result<\n" ;
    > }
    > else {
    > print "out: >$_<\n" ;
    > }
    >
    > }


    Just being picky but...

    As far as I can see the /g in the match does nothing useful.

    Nor do most of the (...) in the regex.

    {0,1} and {0,} in regex are so commonly used that they have
    one-character shorthands: ? and * respectively.

    BTW are you perhaps trying to implement something like File::Stream?
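
    As a rough sketch of those three points applied to the quoted regex
    (no /g, a single capture, ? and * as shorthands): the sub name
    strip_ref is made up for illustration, and 'Referen' is added on the
    assumption that it was meant to be in the list, since the alternation
    as posted goes from 'Refere' straight to 'Referenc'.

```perl
use strict;
use warnings;

# Same alternation as the quoted regex, but with one capture group,
# non-capturing (?: ... ) for the alternatives, and ? / * shorthands.
# 'Referen' is an assumed addition; the list as posted skips it.
my $re = qr{
    ^(.*)[ ]                # capture 1: text to keep, then one space
    (?: R | Re | Ref | Refe | Refer | Refere | Referen | Referenc
      | Reference[ ]?       # whole word, optional trailing space
      | Reference[ ]\#\d*   # whole word, space, hash, any digits
    )$}x;

# Hypothetical helper: return the line without a trailing partial reference.
sub strip_ref {
    my ($line) = @_;
    my ($kept) = $line =~ $re;    # list context yields capture 1 on success
    return defined $kept ? $kept : $line;
}

for my $line ('One Referenc', 'def Refere Reference #xx',
              'abc the def Refere Reference #12') {
    print "in : >$line<\n";
    print "out: >", strip_ref($line), "<\n";
}
```

    Inside an /x pattern a literal space has to be written as [ ] or "\ ",
    which is why the spaces are bracketed here.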
     
    Brian McCauley, Mar 23, 2007
    #4
  5. Mark

    Guest

    On Mar 23, 10:44 am, "Mark" <> wrote:

    > my $re = qr'^(.*)\ ( (R$)|
    > (Re$)|
    > (Ref$)|
    > (Refe$)|
    > (Refer$)|
    > (Refere$)|
    > (Referenc$)|
    > (Reference\ {0,1}$)|
    > (Reference\ \#\d{0,}$)
    > )'x ;


    Try this instead; results are identical to your regex except what
    happens to $2, which you don't use anyway (and you could avoid setting
    $2, but extra complexity for no real gain):

    $re = qr{^(.*) Re?f?e?r?e?n?c?e? ?(\#\d*)$}x;


    --
    The best way to get a good answer is to ask a good question.
    David Filmer (http://DavidFilmer.com)
     
    , Mar 23, 2007
    #5
  6. Brian McCauley, Mar 23, 2007
    #6
  7. Mark

    Guest

    On Mar 23, 11:44 am, wrote:
    > $re = qr{^(.*) Re?f?e?r?e?n?c?e? ?(\#\d*)$}x;


    Then again, it would be possible to "fool" this regex where your
    original would not be fooled (for example, by dropping a middle
    character). Needs more thought....

    --
    The best way to get a good answer is to ask a good question.
    David Filmer (http://DavidFilmer.com)
     
    , Mar 23, 2007
    #7
  8. On Mar 23, 6:44 pm, wrote:
    > On Mar 23, 10:44 am, "Mark" <> wrote:
    >
    > > my $re = qr'^(.*)\ ( (R$)|
    > > (Re$)|
    > > (Ref$)|
    > > (Refe$)|
    > > (Refer$)|
    > > (Refere$)|
    > > (Referenc$)|
    > > (Reference\ {0,1}$)|
    > > (Reference\ \#\d{0,}$)
    > > )'x ;

    >
    > Try this instead; results are identical to your regex except what
    > happens to $2, which you don't use anyway (and you could avoid setting
    > $2, but extra complexity for no real gain):
    >
    > $re = qr{^(.*) Re?f?e?r?e?n?c?e? ?(\#\d*)$}x;


    No, that matches "Rernc 10" etc too.
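
    The underlying issue can be seen on a much shorter pattern. This is
    only an illustration on the word 'Refer', not the regex quoted above:
    when every letter is optional on its own, strings with missing middle
    letters are accepted, whereas nesting the optional groups accepts a
    letter only if everything before it matched. The names $loose and
    $strict are invented for the example.

```perl
use strict;
use warnings;

# Each letter independently optional: accepts strings with gaps.
my $loose  = qr/^Re?f?e?r?$/;

# Nested optional groups: a letter is allowed only after all earlier ones.
my $strict = qr/^R(?:e(?:f(?:e(?:r)?)?)?)?$/;

for my $s ('Ref', 'Rr', 'Rfr') {
    printf "%-4s loose=%d strict=%d\n",
        $s, ($s =~ $loose ? 1 : 0), ($s =~ $strict ? 1 : 0);
}
```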
     
    Brian McCauley, Mar 23, 2007
    #8
  9. Mark

    -berlin.de Guest

    Brian McCauley <> wrote in comp.lang.perl.misc:
    > On Mar 23, 5:44 pm, "Mark" <> wrote:
    >
    > [ An interesting problem ]
    >
    > > I'm looking for a simpler solution,
    > > possibly by using a better regular expression than I have chosen in my
    > > first sample code.

    >
    > Wow! What a brilliant post. Clear, well thought out, interesting.


    ...plus runnable code, including a convincing set of test data.
    I quite agree.

    > Just wish I had an answer. I'll think about that one tonight. I'll
    > probably be up all night thinking about it!


    Ah, it won't take all night. Here is my take:

    {
        my $fix = ' Reference #';
        my $pat = "$fix\\d+";
        my @parts = map substr( $fix, 0, $_), 1 .. length $fix;

        sub rem_ref {
            my $str = shift;
            $str =~ s/$pat$// and return $str;
            $str =~ s/$_$// and return $str for @parts;
            return $str;
        }
    }

    while ( <DATA> ) {
        chomp;
        print "in : >$_<\n";
        print "out: >", rem_ref( $_), "<\n";
    }
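
    One edge case that may be worth checking in the loop above: @parts is
    tried shortest first, so a line ending in ' Reference ' (with the
    trailing space) loses only that final space. A possible variant, as a
    sketch only, folds all prefixes into one alternation; since every
    alternative is anchored at the end, the leftmost match is also the
    longest removable tail. The names $prefixes, $tail and rem_ref_once
    are assumptions for this example.

```perl
use strict;
use warnings;

my $fix = ' Reference #';

# Every prefix of $fix, quoted so that ' ' and '#' stay literal.
my $prefixes = join '|',
    map { quotemeta substr($fix, 0, $_) } 1 .. length $fix;

# Either the full string plus digits, or any prefix, at the very end.
my $tail = qr/(?:\Q$fix\E\d+|$prefixes)\z/;

# Hypothetical single-substitution variant of rem_ref.
sub rem_ref_once {
    my ($str) = @_;
    $str =~ s/$tail//;
    return $str;
}

for my $line ('two three Reference ', 'abc the def Refere Reference #12') {
    print "in : >$line<\n";
    print "out: >", rem_ref_once($line), "<\n";
}
```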

    Anno
     
    -berlin.de, Mar 23, 2007
    #9
  10. Mark

    -berlin.de Guest

    <> wrote in comp.lang.perl.misc:
    > On Mar 23, 10:44 am, "Mark" <> wrote:
    >
    > > my $re = qr'^(.*)\ ( (R$)|
    > > (Re$)|
    > > (Ref$)|
    > > (Refe$)|
    > > (Refer$)|
    > > (Refere$)|
    > > (Referenc$)|
    > > (Reference\ {0,1}$)|
    > > (Reference\ \#\d{0,}$)
    > > )'x ;

    >
    > Try this instead; results are identical to your regex except what
    > happens to $2, which you don't use anyway (and you could avoid setting
    > $2, but extra complexity for no real gain):
    >
    > $re = qr{^(.*) Re?f?e?r?e?n?c?e? ?(\#\d*)$}x;


    No, that would also match things like "gaga Refe #12".

    Anno
     
    -berlin.de, Mar 23, 2007
    #10
  11. Mark

    Mirco Wahab Guest

    Re: Truncating text from a string with beginning text from another string

    Mark wrote:
    > From a line of arbitrary text, possibly followed by some amount of
    > text from the beginning of the string ' Reference #\d+', where \d+
    > represents one or more digit characters, I want to output the line
    > without the ending ' Reference...' string. For example, the input line
    > 'some arbitrary text Refer' would become 'some arbitrary text'.
    >
    > Here are two programs that seem to do what I want, but they seem
    > overly complicated for this task. I'm looking for a simpler solution,
    > possibly by using a better regular expression than I have chosen in my
    > first sample code.


    After making the wrong turn first,
    I think this can't be solved very
    differently from your solution.

    The Regex can be an incremental one
    (as was shown already by others) or a
    sequence of alternations (as you tried).

    One could rewrite it somehow 'different',
    as a "split", like:

    use strict;
    use warnings;
    no warnings 'qw';

    my @end = qw{R e f e r e n c e \\s # \\d+};
    my $reg = '('.(join '|',map join('',@$_),map[@end[0..$_]],0..$#end).')$';

    while( <DATA> ) {
    chomp;
    print "[$_->[0]]\n\t[$_->[1]]\n" for
    map [$_->[0]||'undef', $_->[1]||'undef'],
    [split /$reg/]
    }

    __DATA__
    ....

    Aside from the regex construction (which can be commented
    properly ;-), this should be quite readable.


    Regards

    M.
     
    Mirco Wahab, Mar 23, 2007
    #11
  12. Mark

    Mirco Wahab Guest

    Re: Truncating text from a string with beginning text from another string

    Mark wrote:
    > Here are two programs that seem to do what I want, but they seem
    > overly complicated for this task. I'm looking for a simpler solution,
    > possibly by using a better regular expression than I have chosen in my
    > first sample code.


    After making the wrong turn first,
    I think this can't be solved very
    differently from your solution.

    Of course, one can write it somehow 'different',like:

    ...
    my @end = split //, 'Reference #000000';
    my $key = '('.(join '|', map join('',,@$_), map[@end[0..$_]], 0..$#end).')';
    ...

    while(<DATA>) {
    print "$1\t\t$2\n"
    if /^(.+?)($key)$/
    }

    __DATA__
    ....

    Regards

    M.
     
    Mirco Wahab, Mar 23, 2007
    #12
  13. Mark

    Mirco Wahab Guest

    Re: Truncating text from a string with beginning text from another string

    Mirco Wahab wrote:
    > One could rewrite it somehow 'different',
    > as a "split", like:
    >
    > use strict;
    > use warnings;
    > ...
    > [split /$reg/]
    > ...


    ....
    reg and output slightly modified to match yours:


    ...
    no warnings 'qw';

    my @end = qw{R e f e r e n c e \\s # \\d+};
    my $reg = '\s+('.(join '|',map join('',@$_),map[@end[0..$_]],0..$#end).')$';

    while( <DATA> ) {
    chomp;
    print "in : >$_<\n";
    print "out: >", (split /$reg/)[0], "<\n"
    }
    ...

    Regards

    M.
     
    Mirco Wahab, Mar 23, 2007
    #13
  14. <-berlin.de> wrote:
    > <> wrote in comp.lang.perl.misc:
    >> On Mar 23, 10:44 am, "Mark" <> wrote:
    >>
    >> > my $re = qr'^(.*)\ ( (R$)|
    >> > (Re$)|
    >> > (Ref$)|
    >> > (Refe$)|
    >> > (Refer$)|
    >> > (Refere$)|
    >> > (Referenc$)|
    >> > (Reference\ {0,1}$)|
    >> > (Reference\ \#\d{0,}$)
    >> > )'x ;

    >>
    >> Try this instead; results are identical to your regex except what
    >> happens to $2, which you don't use anyway (and you could avoid setting
    >> $2, but extra complexity for no real gain):
    >>
    >> $re = qr{^(.*) Re?f?e?r?e?n?c?e? ?(\#\d*)$}x;

    >
    >No, that would also match things like "gaga Refe #12".


    You could write something like this

    $re = qr{^(.*)\ (R(?:e(?:f(?:e(?:r(?:e(?:n(?:c(?:e(?:\ (?:\#\d*)
    ?)?)?)?)?)?)?)?)?))$}x;

    but that's not clear at all to the human reader, and I don't think
    adding more whitespace would help much in this case.
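
    If readability is the main worry, the nested form could also be
    generated from the literal string instead of typed by hand. A minimal
    sketch, where the helper name nested_prefix_pattern and its optional
    second argument are invented for illustration:

```perl
use strict;
use warnings;

# Build R(?:e(?:f( ... )?)?)? from a literal, working from the last
# character outwards. $innermost is an optional raw regex fragment that
# ends up inside the deepest group (here: the digits).
sub nested_prefix_pattern {
    my ($literal, $innermost) = @_;
    my $pattern = defined $innermost ? $innermost : '';
    for my $char (reverse split //, $literal) {
        $pattern = quotemeta($char)
                 . ($pattern eq '' ? '' : "(?:$pattern)?");
    }
    return $pattern;
}

my $inner = nested_prefix_pattern('Reference #', '\d*');
my $re    = qr/^(.*)[ ]$inner$/;

for my $line ('some arbitrary text Refer', 'gaga Refe #12') {
    if ($line =~ $re) { print ">$line< -> >$1<\n" }
    else              { print ">$line< unchanged\n" }
}
```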

    Depending on your needs, it might be more clear to use a simpler regex like
    $re = qr{^(.*) ((R[a-z #]+) \d*)$};

    and then test ($3 eq substr('Reference #', 0, length $3))
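
    A rough, runnable take on that two-step idea follows. The regex here
    is not the one above but a loose variant chosen for the sketch, and
    the sub name strip_by_compare is hypothetical: first grab anything
    that looks like a candidate tail, then compare it with the start of
    the fixed string using substr.

```perl
use strict;
use warnings;

my $fix = ' Reference #';

sub strip_by_compare {
    my ($line) = @_;
    # Loose match: space, 'R', lowercase letters, optional ' ' and '#',
    # then optional digits up to the end of the line.
    if ($line =~ /^(.*)( R[a-z]*[ ]?\#?)(\d*)$/) {
        my ($head, $tail, $digits) = ($1, $2, $3);
        my $is_prefix = $tail eq substr($fix, 0, length $tail);
        # Digits are only acceptable after the complete ' Reference #'.
        return $head if $is_prefix and ($digits eq '' or $tail eq $fix);
    }
    return $line;
}

for my $line ('some arbitrary text Refer', 'abc Refer12',
              'abc the def Refere Reference #12') {
    print "in : >$line<\n";
    print "out: >", strip_by_compare($line), "<\n";
}
```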

    Gary Ansok
    --
    3M suggests that to obtain the best results, one should make the bond
    "while the adhesive is wet, aggressively tacky." I did not know what
    "aggressively tacky" meant until I saw a recent notice in the Bboard.
     
    Gary E. Ansok, Mar 23, 2007
    #14
  15. Mark

    Mirco Wahab Guest

    Re: Truncating text from a string with beginning text from another string

    Mirco Wahab wrote:
    > ...
    > [split /$reg/]
    > ...



    regex/output simplified and slightly modified
    to match yours:

    ...
    no warnings 'qw';
    my @end = qw{R e f e r e n c e \\s # \\d+};

    my $reg = '\s+('.(join'|',map join('',@end[0..$_]),0..$#end).')$';

    while( <DATA> ) {
    chomp;
    print "in : >$_<\n";
    print "out: >", (split /$reg/)[0], "<\n"
    }
    ...

    Regards

    M.
     
    Mirco Wahab, Mar 23, 2007
    #15
  16. Mark

    Mark Guest

    On Mar 23, 10:44 am, "Mark" <> wrote:
    > >From a line of arbitrary text, possibly followed by some amount of

    >
    > text from the beginning of the string ' Reference #\d+', where \d+
    > represents one or more digit characters, I want to output the line
    > without the ending ' Reference...' string. For example, the input line
    > 'some arbitrary text Refer' would become 'some arbitrary text'.
    >


    Thanks to all who responded and offered ideas. Anno's post was
    especially interesting.

    - M
     
    Mark, Mar 23, 2007
    #16
  17. Mark

    -berlin.de Guest

    Mark <> wrote in comp.lang.perl.misc:
    > On Mar 23, 10:44 am, "Mark" <> wrote:
    > > >From a line of arbitrary text, possibly followed by some amount of

    > >
    > > text from the beginning of the string ' Reference #\d+', where \d+
    > > represents one or more digit characters, I want to output the line
    > > without the ending ' Reference...' string. For example, the input line
    > > 'some arbitrary text Refer' would become 'some arbitrary text'.
    > >

    >
    > Thanks to all who responded and offered ideas. Anno's post was
    > especially interesting.


    Thanks. Since you mention it, the sub definition can be slightly
    simplified:

    {
        my $fix = ' Reference #';
        my @parts = map substr( $fix, 0, $_), 1 .. length $fix;

        sub rem_ref {
            my $str = shift;
            $str =~ s/$_$// and return $str for @parts, "$fix\\d+";
            return $str;
        }
    }

    Anno
     
    -berlin.de, Mar 23, 2007
    #17
  18. Mark

    Mirco Wahab Guest

    Re: Truncating text from a string with beginning text from another string

    Michele Dondi wrote:
    > On Fri, 23 Mar 2007 20:23:44 +0100, Mirco Wahab <>
    > wrote:
    >
    >> my @end = split //, 'Reference #000000';
    >> my $key = '('.(join '|', map join('',,@$_), map[@end[0..$_]], 0..$#end).')';

    >
    > Isn't that an awkward way to reimplement substr()?


    That, for one thing; and the whole approach shown above also
    will not work (to solve the said problem). I tried to
    cancel the message (and post a working solution) after
    thinking again, but your news server didn't honor my
    cancel attempts. This way, it all came to light ...
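
    For reference, the substr() remark can be checked with a small
    comparison: the slice-and-join construction quoted above and a plain
    substr() loop yield the same list of prefixes. The variable names are
    made up for this sketch.

```perl
use strict;
use warnings;

my $text = 'Reference #000000';

# Prefixes via array slices joined back together, as in the quoted code.
my @chars    = split //, $text;
my @by_slice = map { join '', @chars[0 .. $_] } 0 .. $#chars;

# The same prefixes taken directly with substr().
my @by_substr = map { substr $text, 0, $_ } 1 .. length $text;

print "same prefixes\n" if "@by_slice" eq "@by_substr";
```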

    Regards

    Mirco
     
    Mirco Wahab, Mar 25, 2007
    #18
