another help

Discussion in 'Perl Misc' started by giampiero, Sep 25, 2005.

  1. giampiero

    giampiero Guest

    I am looking for three substrings of length 2 (each possibly repeated),
    followed after a while by the same substrings in reverse order (also
    possibly repeated).


    i use:
    $a=~s/(.{2,})+(.{2,})+(.{2,})+.*\3{1,}\2{1,}\1{1,}/$1 $2 $3/o;

    how to be sure in regular expression that length $1+$2+$3 must be more
    l?
    thanx a lot from deep of my soul
    giampiero, Sep 25, 2005
    #1

  2. giampiero

    Dr.Ruud Guest

    giampiero schreef:

    > i find three substring of length 2 (also repeated) followed after a
    > while to a reverse sequences (also repeated)


    Your message has a bad Subject. Keep posting in the same thread. No,
    google is no excuse not to do that.


    > i use:
    > $a =~ s/(.{2,})+(.{2,})+(.{2,})+.*\3{1,}\2{1,}\1{1,}/$1 $2 $3/o;


    The {2,} means two or more, is that what you want?
    The {1,} means 1 or more, so is the same as '+'.

    If you meant exactly 2:

    $a =~ s/(..)+(..)+(..)+.*(\3)+(\2)+(\1)+/\1 \2 \3/o;

    (untested)


    > how to be sure in regular expression that length $1+$2+$3 must be
    > more l?


    That will always be 3 * 2 = 6.

    --
    Affijn, Ruud

    "Gewoon is een tijger."
    Dr.Ruud, Sep 25, 2005
    #2

  3. giampiero

    Matt Garrish Guest

    "Dr.Ruud" <> wrote in message
    news:...
    > giampiero schreef:
    >
    >> i find three substring of length 2 (also repeated) followed after a
    >> while to a reverse sequences (also repeated)

    >
    > Your message has a bad Subject. Keep posting in the same thread. No,
    > google is no excuse not to do that.
    >
    >
    >> i use:
    >> $a =~ s/(.{2,})+(.{2,})+(.{2,})+.*\3{1,}\2{1,}\1{1,}/$1 $2 $3/o;

    >
    > The {2,} means two or more, is that what you want?
    > The {1,} means 1 or more, so is the same as '+'.
    >
    > If you meant exactly 2:
    >
    > $a =~ s/(..)+(..)+(..)+.*(\3)+(\2)+(\1)+/\1 \2 \3/o;
    >
    > (untested)
    >


    Capturing like that just isn't going to work. Something like the following
    is probably what you wanted:

    $a = 'AAAABBBBCCCCsometexthereCCCCBBBBAAAA';
    $a =~ s/(..)\1*(..)\2*(..)\3*.*?\3+\2+\1+/$1 $2 $3/;
    print $a;

    Matt
    Matt Garrish, Sep 25, 2005
    #3
  4. giampiero

    Bob Walton Guest

    giampiero wrote:

    > i find three substring of length 2 (also repeated) followed after a
    > while to a reverse sequences (also repeated)
    >
    >
    > i use:
    > $a=~s/(.{2,})+(.{2,})+(.{2,})+.*\3{1,}\2{1,}\1{1,}/$1 $2 $3/o;


    It seems doubtful that the above regex is actually what you want.
    That's because the first (.{2,})+ will match any two or more
    characters and assign them to $1, then any next two or more
    characters and assign *them* to $1, etc. So portions of the
    string which were matched (other than by the .*) will not be
    present in $1 $2 or $3. If you want what I think you said, you
    need to place the parenthetical groupings so they pick up the
    entire repeated group, like:

    $a=~s/((?:.{2,})+)
    ((?:.{2,})+)
    ((?:.{2,})+)
    .*
    \3{1,}\2{1,}\1{1,}
    /$1 $2 $3/xo;

    Note that this regex is particularly inefficient, with huge
    amounts of backtracking, so give it a while to execute if the
    string has any complication at all. This could be improved
    immensely by removing the redundant repeats with no change to
    what is matched except for the improvement in efficiency. Example:

    use warnings;
    use strict;
    my $a='qabczycdefxxxxxxxxxefcdabczynn';
    my $b=$a;
    if( #original regexp
    $a=~s/(.{2,})+(.{2,})+(.{2,})+.*\3{1,}\2{1,}\1{1,}/$1 $2 $3/o
    ){print "\$a matched.\n";
    print "\$1=$1\n";
    print "\$2=$2\n";
    print "\$3=$3\n";
    }
    print "\$a is now $a\n";

    if( #suggested regexp
    $b=~s/(.{2,})
    (.{2,})
    (.{2,})
    .*
    \3+\2+\1+
    /$1 $2 $3/xo
    ){print "\$b matched.\n";
    print "\$1=$1\n";
    print "\$2=$2\n";
    print "\$3=$3\n";
    }
    print "\$b is now $b\n";

    When run:

    D:\junk>perl junk544.pl
    $a matched.
    $1=ef
    $2=xx
    $3=xx
    $a is now ef xx xxcdabczynn
    $b matched.
    $1=abczy
    $2=cd
    $3=ef
    $b is now qabczy cd efnn

    D:\junk>
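
    The $1=ef result above comes from putting the quantifier outside the
    capturing parentheses. A minimal sketch of that effect in isolation
    (the sample string and variable names are made up for illustration):

    ```perl
    use strict;
    use warnings;

    my $s = 'aabbcc';

    # Quantifier outside the capture group: every pass overwrites $1,
    # so only the text matched by the final pass survives.
    my ($last_pass) = $s =~ /^(..)+$/;

    # Quantifier inside the capture group: $1 covers all passes.
    my ($all_passes) = $s =~ /^((?:..)+)$/;

    print "last pass:  $last_pass\n";
    print "all passes: $all_passes\n";
    ```

    Here the first capture should hold only the final pair and the second
    the whole string.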

    >
    > how to be sure in regular expression that length $1+$2+$3 must be more
    > l?


    Well, length $1+$2+$3 will always be 1 unless the strings are
    numeric :). Assuming you actually mean
    length($1)+length($2)+length($3), each of $1 $2 and $3 must have
    matched at least two characters, so if the match succeeded then
    length($1)+length($2)+length($3)>=6. Perhaps you should check to
    see if the match succeeded, as per the example above. Don't ever
    use $1 etc unless you know the match succeeded.
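
    A rough sketch of both points, assuming the simplified regexp from
    the example above. The helper name total_capture_length is
    hypothetical, not something from the original post:

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: sum of the capture lengths, or undef when the
    # pattern does not match at all.
    sub total_capture_length {
        my ($text) = @_;
        my @parts = $text =~ /(.{2,})(.{2,})(.{2,}).*\3+\2+\1+/
            or return;
        my $total = 0;
        $total += length($_) for @parts;
        return $total;
    }

    my $hit  = total_capture_length('qabczycdefxxxxxxxxxefcdabczynn');
    my $miss = total_capture_length('abcdef');

    # A failed match does not reset $1: it keeps the value from the last
    # successful match, which is why it must not be trusted blindly.
    my $first_ok  = 'abc' =~ /(b)/;    # succeeds, $1 is 'b'
    my $second_ok = 'xyz' =~ /(q)/;    # fails, $1 is left alone
    my $stale     = $1;                # still 'b'

    print "total: $hit\n";
    print "second match failed, but \$1 is still $stale\n" unless $second_ok;
    ```
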
    --
    Bob Walton
    Email: http://bwalton.com/cgi-bin/emailbob.pl
    Bob Walton, Sep 26, 2005
    #4
  5. giampiero

    Guest

    let me ask you:
    and if
    $a=~s/((?:.{0,})+)
    ((?:.{0,})+)
    ((?:.{0,})+)
    .*
    \3{1,}\2{1,}\1{1,}
    /$1 $2 $3/xo;

    and the total of length of $1+$2+$3>=12?

    thanx again
    , Sep 29, 2005
    #5
  6. giampiero

    Dr.Ruud Guest

    schreef:
    > let me ask you:
    > and if
    > $a=~s/((?:.{0,})+)
    > ((?:.{0,})+)
    > ((?:.{0,})+)
    > .*
    > \3{1,}\2{1,}\1{1,}
    > /$1 $2 $3/xo;
    >
    > and the total of length of $1+$2+$3>=12?
    >
    > thanx again


    {0,} is the same as *
    {1,} is the same as +
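
    A quick way to convince yourself, as a sketch: run both spellings
    over a few sample strings and count disagreements. The patterns and
    samples here are arbitrary illustrations.

    ```perl
    use strict;
    use warnings;

    # Each pair holds the longhand and the shorthand spelling.
    my @pairs = (
        [ qr/^a{0,}b$/, qr/^a*b$/ ],
        [ qr/^a{1,}b$/, qr/^a+b$/ ],
    );
    my @samples = ( 'b', 'ab', 'aaab', 'ba', '' );

    my $mismatches = 0;
    for my $pair (@pairs) {
        my ( $long, $short ) = @$pair;
        for my $text (@samples) {
            my $long_hit  = $text =~ $long  ? 1 : 0;
            my $short_hit = $text =~ $short ? 1 : 0;
            $mismatches++ if $long_hit != $short_hit;
        }
    }
    print "mismatches: $mismatches\n";
    ```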


    Something like ((.*)+) hurts (the mind too). One or more of something
    that can be empty is not what was meant.

    The usage of (?:, to cleanly use groups, looks OK.

    I remember that your data had a basic grouplength of 2, like
    '1212123456xxxxxxxx56343412'
    Is that still true? If so, try:

    $a=~s/((?:..)+)
    ((?:..)+)
    ((?:..)+)
    .*
    \3+\2+\1+
    /$1 $2 $3/xo;

    (untested)

    --
    Affijn, Ruud

    "Gewoon is een tijger."
    Dr.Ruud, Sep 29, 2005
    #6
  7. giampiero

    Bob Walton Guest

    wrote:
    > let me ask you:
    > and if
    > $a=~s/((?:.{0,})+)
    > ((?:.{0,})+)
    > ((?:.{0,})+)
    > .*
    > \3{1,}\2{1,}\1{1,}
    > /$1 $2 $3/xo;
    >


    Please note carefully that (?:.{0,})+ is exactly the same as .*,
    with the exception that (?:.{0,})+ is grossly inefficient due to
    the amount of backtracking it generates, particularly when
    multiples of them appear in the same regexp. Also, note that
    this regexp could match the null string. So you could
    equivalently and much more efficiently write:

    $a=~s/(.*)(.*)(.*).*\3+\2+\1+/$1 $2 $3/;

    > and the total of length of $1+$2+$3>=12?


    I interpret this to mean that a successful match is intended to
    occur only if the sum of the lengths of the three strings is
    twelve or more characters total. If so:

    use warnings;
    use strict;
    my $a='qabczycfffdefxxxxxxxxxefcfffdabczynn';
    if(
    $a=~s/(.*)
    (.*)
    (.*)
    .*
    \3+\2+\1+
    #Note: '`' x 100 is intended to refer to a sequence
    #of characters which will never occur in the matched
    #string. Adjust as needed.
    (??{length($1)+length($2)+length($3)>=12?
    '':'`' x 100})
    /$1 $2 $3/xo
    ){print "\$a matched.\n";
    print "\$1=$1\n";
    print "\$2=$2\n";
    print "\$3=$3\n";
    }
    print "\$a is now>$a<\n";

    When run, this prints:

    d:\junk>perl junk545.pl
    $a matched.
    $1=abczy
    $2=cfffd
    $3=ef
    $a is now>qabczy cfffd efnn<

    d:\junk>

    If the two sequences of fff in $a are replaced with ff, the match
    will fail because the sum of the string lengths is less than 12.

    It can be instructive to add a print "$1:$2:$3\n"; before the
    conditional statement in the (??{}). That prints the progress of
    the match as it proceeds.

    Note: This will only work using recent versions of Perl.
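
    A variation on the same idea, offered as a sketch and not as a
    replacement: the (??{...}) block can return (?!), an empty negative
    lookahead that can never match, in place of the run of backticks, so
    nothing has to be assumed about the input. The helper name
    collapse_mirrored and the hard-coded 12 are illustrative choices.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: returns the rewritten string, or undef when no
    # match with at least 12 captured characters exists.
    # The (??{ ... }) block yields an empty pattern when the captures are
    # long enough and (?!) otherwise; the latter always fails, which
    # sends the engine back to look for another way to match.
    sub collapse_mirrored {
        my ($text) = @_;
        my $changed = $text =~ s/(.*)
                                 (.*)
                                 (.*)
                                 .*
                                 \3+\2+\1+
                                 (??{ length($1) + length($2) + length($3) >= 12
                                      ? '' : '(?!)' })
                                /$1 $2 $3/x;
        return $changed ? $text : undef;
    }

    my $out = collapse_mirrored('qabczycfffdefxxxxxxxxxefcfffdabczynn');
    print defined $out ? "now >$out<\n" : "no match\n";
    ```

    With the sample string from the run above this should give the same
    rewritten string; with ff in place of fff the helper returns undef.
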
    ....
    --
    Bob Walton
    Email: http://bwalton.com/cgi-bin/emailbob.pl
    Bob Walton, Sep 30, 2005
    #7
  8. giampiero

    giampiero Guest

    >Please note carefully that (?:.{0,})+ is exactly the same as .*,

    ???????????
    (?:.{0,})+ equal (.*)+
    giampiero, Oct 7, 2005
    #8
  9. giampiero

    Matt Garrish Guest

    "giampiero" <> wrote in message
    news:...
    > >Please note carefully that (?:.{0,})+ is exactly the same as .*,

    >
    > ???????????
    > (?:.{0,})+ equal (.*)+
    >


    You seem to be misunderstanding the fundamental concept of a greedy operator.
    On its own, /.*/ will match nothing and everything. Consequently, writing
    /(.*)+/ is a useless redundancy as it will always and only ever match once,
    so the additional quantifier isn't doing anything (.*? and .*+ being
    completely other beasts).

    Moreover, /.*/ is equivalent to /.{0,}/ as the * quantifier means 0 or more
    occurrences. There is a difference between writing /(?:.{0,})/ and /(.*)/ and
    that is that the first will not result in any value being assigned to $1. If
    you look closely at what was written above, it is only stated that the two
    are the same without a grouping on .*.

    Matt
    Matt Garrish, Oct 7, 2005
    #9
  10. giampiero

    Bob Walton Guest

    giampiero wrote:

    >>Please note carefully that (?:.{0,})+ is exactly the same as .*,

    >
    >
    > ???????????


    Yes, the above is correct. Both will match any string of
    characters (with a caveat around a newline depending on whether
    the //s switch is active at the time the regexp is encountered --
    but that behavior will be the same between the two). As to why
    (?:.{0,})+ is the same as .* : {0,} is a longhand way of writing
    *, so .{0,} is the same as .* . (?:.{0,}) is then also the same
    as .* . Now, (?:.{0,}) will match any character string (see
    caveat above), hence (?:.{0,})+ will also, with the + interpreted
    as "once". Depending on the character string, it might also
    match, say, half of the string followed by the other half, or a
    quarter followed by the other three-fourths, etc etc. Note that
    there are a whole bunch of ways (?:.{0,})+ can match a character
    string -- but also note that the resulting match does in fact
    match the entire character string, just as .* would have.

    > (?:.{0,})+ equal (.*)+


    This is incorrect. (.*)+ contains grouping parentheses which
    will cause the last string matched by .* to be returned in $1 and
    other side reactions to occur in the various other
    regexp-grouping-related variables. (?:.{0,})+ does not contain
    any grouping parentheses pairs. Hence these two, while they will
    match the same strings (namely, all of them, subject to my caveat
    above), are not the same because they do not cause the same
    ultimate actions.
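
    As a sketch of that distinction: the three spellings below should
    match the same text, and differ only in how many capture groups they
    report. The sample text is arbitrary; @-, @+ and $#+ are the standard
    match-offset variables.

    ```perl
    use strict;
    use warnings;
    # Perl itself warns about quantified groups that can match nothing;
    # silenced here because that construct is the point of the demo.
    no warnings 'regexp';

    my $text = 'some text';
    my @rows;
    for my $source ( '.*', '(?:.{0,})+', '(.*)+' ) {
        next unless $text =~ /$source/;
        # Element 0 of @- and @+ gives the offsets of the whole match;
        # $#+ is the number of capture groups in that match.
        my $whole  = substr $text, $-[0], $+[0] - $-[0];
        my $groups = $#+;
        push @rows, [ $source, $whole, $groups ];
        printf "%-12s matched <%s> with %d capture group(s)\n",
            $source, $whole, $groups;
    }
    ```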

    You seem to be totally missing the idea of why one *never* wants
    to do something like (?:.*)+ . It is not just that it takes more
    time to type and to think about; it is that such an expression
    causes an extreme amount of backtracking when something
    subsequent to it fails to match in a regexp. That translates
    into computer time -- potentially *years* of it -- spent doing
    absolutely nothing worthwhile. Here is an example program that
    shows the backtracking I'm talking about as the execution of the
    regexps proceeds:

    use warnings;
    use strict;
    my $s='aaaaaaaaaaaaaaaaaaaaaaaaa';
    print "Matching re1:\n";
    $s=~/(.*)(??{print "$1\n";''})\1/;
    print "Done matching re1. Push return to continue.\n";
    <>;
    print "Matching re2:\n";
    $s=~/((?:.*)+)(??{print "$1\n";''})\1/;

    The result of running this should be most instructive as to why
    one should avoid unneeded backtracking in regexps. Note that the
    same result is achieved with both "re1" and "re2" above, but at
    substantially higher computational cost in the case of "re2".
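
    If scrolling output is inconvenient, the same comparison can be
    sketched with counters. This is an assumption-laden variant of the
    program above, not the same program: it uses (?{...}) to count how
    often each regexp reaches the backreference, and a deliberately short
    string so that the second pattern finishes quickly. The exact counts
    depend on the Perl version and its optimizations, so none are quoted.

    ```perl
    use strict;
    use warnings;
    # The nested quantifier triggers a "matches null string many times"
    # warning; silenced because that construct is the subject here.
    no warnings 'regexp';

    # Package variables, so the embedded code blocks can update them.
    our ( $plain_tries, $nested_tries ) = ( 0, 0 );

    my $s = 'a' x 8;

    my ($plain)  = $s =~ /(.*)(?{ $plain_tries++ })\1/;
    my ($nested) = $s =~ /((?:.*)+)(?{ $nested_tries++ })\1/;

    printf "re1 captured %s after %d tries\n", $plain,  $plain_tries;
    printf "re2 captured %s after %d tries\n", $nested, $nested_tries;
    ```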

    --
    Bob Walton
    Email: http://bwalton.com/cgi-bin/emailbob.pl
    Bob Walton, Oct 9, 2005
    #10
  11. giampiero

    giampiero Guest

    my intention was to match two substrings at the left and at right of .*
    that can be repeated different times . example
    ...abcabc.....(.*)...abcabcabc.....

    this can be done by (.*) and \1 ????
    thanx again.
    giampiero, Oct 14, 2005
    #11
  12. giampiero

    Bob Walton Guest

    giampiero wrote:

    > my intention was to match two substrings at the left and at right of .*
    > that can be repeated different times . example
    > ..abcabc.....(.*)...abcabcabc.....
    >
    > this can be done by (.*) and \1 ????

    ....

    Well, your example string will match your stated criterion with
    $1 matching . and .* matching all of the string except for the
    leading and trailing .'s. Is that what you intend? If one
    replaces the .'s with random non-repeating characters, as in:

    xyabcabczjtwvu(.*)mqzabcabcabcsukp

    then a match will occur with $1 matching abcabc, and .* matching
    zjtwvu(.*)mqzabc . That match still probably isn't what you
    intend -- you would apparently like to see $1 match abc . The
    problem is that while that would match, it isn't the first match
    encountered by the regexp engine. On the off chance that that is OK:

    use warnings;
    use strict;
    #my $string='..abcabc.....(.*)...abcabcabc.....';
    my $string= 'xyabcabczjtwv(.*)mqzabcabcabcsukpr';
    if($string=~s/(.+)\1*.*\1+//){
    print "Matched, \$1=$1, left: $string\n";
    }

    Note that this is probably not what you really want, since
    matches you probably aren't interested in will occur. In this
    one, $1 matches abcabc, the .* matches zjtwv(.*)mqzabc and \1+
    matches abcabc. I think you want $1 to match abc . Note that $1
    matching abcabc meets your stated criterion: a string that can
    be repeated, followed by any characters, followed by one or more
    repetitions of the first string. The abcabc match is the one the
    regexp engine will encounter first (unless non-greediness is used).

    For an example you most likely don't want: if the string contains
    an additional x (or y) anywhere in the "random junk" near the end
    of the string, like:

    my $string= 'xyabcabczjtwv(.*)mqzabcabcabcsxkpr';

    then $1 will match the first x (or y), the .* will match
    everything up to the second x (or y), \1+ will match the second x
    (or y), and the match will succeed. That match meets your stated
    criterion (a substring that can be repeated occurring on both
    sides of any string), but probably isn't what you want.

    It may help a lot if you can make a clearer statement of what you
    really want to match.

    --
    Bob Walton
    Email: http://bwalton.com/cgi-bin/emailbob.pl
    Bob Walton, Oct 15, 2005
    #12
  13. giampiero

    giampiero Guest

    As you point out, abcabc matches abcabc(abc).

    But what I need for further processing is abc, as the pattern repeated
    two and three times.
    giampiero, Oct 16, 2005
    #13
  14. giampiero

    Bob Walton Guest

    giampiero wrote:
    > as you argue abcabc match abcabc(abc)
    >
    > But what i need for others elaborations in abc as patter repeated two
    > and three times
    >


    Unquoted context from previous notes:

    [[[[[
    giampiero wrote:

    > my intention was to match two substrings at the left and at right of .*
    > that can be repeated different times . example
    > ..abcabc.....(.*)...abcabcabc.....
    >
    > this can be done by (.*) and \1 ????


    ....

    Well, your example string will match your stated criterion with
    $1 matching . and .* matching all of the string except for the
    leading and trailing .'s. Is that what you intend? If one
    replaces the .'s with random non-repeating characters, as in:

    xyabcabczjtwvu(.*)mqzabcabcabcsukp

    then a match will occur with $1 matching abcabc, and .* matching
    zjtwvu(.*)mqzabc . That match still probably isn't what you
    intend -- you would apparently like to see $1 match abc . The
    problem is that while that would match, it isn't the first match
    encountered by the regexp engine. On the off chance that that is OK:

    use warnings;
    use strict;
    #my $string='..abcabc.....(.*)...abcabcabc.....';
    my $string= 'xyabcabczjtwv(.*)mqzabcabcabcsukpr';
    if($string=~s/(.+)\1*.*\1+//){
    print "Matched, \$1=$1, left: $string\n";
    }

    ]]]]]

    Well, there are a couple of ways of getting that match, all
    involving further restrictions of your requirements. If you
    restrict the original capture (the (.+) ) so that it only matches
    strings three characters long (that is, (.{3,3}) ), that works.
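
    A minimal sketch of that fixed-length option, reusing the test string
    from the earlier example. It assumes the unit is known to be exactly
    three characters long; (.{3}) is just a shorter spelling of
    (.{3,3}), and the $unit variable is introduced here for illustration.

    ```perl
    use strict;
    use warnings;

    my $string = 'xyabcabczjtwv(.*)mqzabcabcabcsukpr';
    my $unit;
    # The capture is pinned to three characters, so $1 cannot grow to
    # abcabc the way it does with (.+).
    if ( $string =~ s/(.{3})\1*.*\1+// ) {
        $unit = $1;
        print "Matched, \$1=$unit, left: $string\n";
    }
    ```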

    Or if you make it so the part of the string before the .* is
    required to repeat at least once and the part of the string after
    the .* is required to also repeat at least once, that will also
    result in $1 matching abc . Example:

    use warnings;
    use strict;
    my $string='xyabcabczjtwv(.*)mqzabcabcabcsykpr';
    if($string=~s/(.+)\1+.*\1{2,}//){
    print "Matched, \$1=$1, left: $string\n";
    }

    But with your original statement of the desired regexp (a first
    string, possibly repeated, followed by any string, followed by
    the first string possibly repeated), other matches such as abcabc
    will be found first.
    --
    Bob Walton
    Email: http://bwalton.com/cgi-bin/emailbob.pl
    Bob Walton, Oct 16, 2005
    #14