another help

giampiero

I want to find three substrings of length 2 (each possibly repeated), followed
after a while by the same substrings in reverse order (also possibly repeated).


i use:
$a=~s/(.{2,})+(.{2,})+(.{2,})+.*\3{1,}\2{1,}\1{1,}/$1 $2 $3/o;

how to be sure in regular expression that length $1+$2+$3 must be more
l?
thanx a lot from deep of my soul
 
Dr.Ruud

giampiero wrote:
I want to find three substrings of length 2 (each possibly repeated), followed
after a while by the same substrings in reverse order (also possibly repeated).

Your message has a bad Subject. Keep posting in the same thread. No,
google is no excuse not to do that.

i use:
$a =~ s/(.{2,})+(.{2,})+(.{2,})+.*\3{1,}\2{1,}\1{1,}/$1 $2 $3/o;

The {2,} means two or more, is that what you want?
The {1,} means 1 or more, so is the same as '+'.

If you meant exactly 2:

$a =~ s/(..)+(..)+(..)+.*(\3)+(\2)+(\1)+/\1 \2 \3/o;

(untested)

how to be sure in regular expression that length $1+$2+$3 must be
more l?

That will always be 3 * 2 = 6.
 
Matt Garrish

Dr.Ruud said:
giampiero wrote:


Your message has a bad Subject. Keep posting in the same thread. No,
google is no excuse not to do that.



The {2,} means two or more, is that what you want?
The {1,} means 1 or more, so is the same as '+'.

If you meant exactly 2:

$a =~ s/(..)+(..)+(..)+.*(\3)+(\2)+(\1)+/\1 \2 \3/o;

(untested)

Capturing like that just isn't going to work. Something like the following
is probably what you wanted:

$a = 'AAAABBBBCCCCsometexthereCCCCBBBBAAAA';
$a =~ s/(..)\1*(..)\2*(..)\3*.*?\3+\2+\1+/$1 $2 $3/;
print $a;

Matt
 
Bob Walton

giampiero said:
I want to find three substrings of length 2 (each possibly repeated), followed
after a while by the same substrings in reverse order (also possibly repeated).


i use:
$a=~s/(.{2,})+(.{2,})+(.{2,})+.*\3{1,}\2{1,}\1{1,}/$1 $2 $3/o;

It seems doubtful that the above regex is actually what you want.
That's because the first (.{2,})+ will match any two or more
characters and assign them to $1, then any next two or more
characters and assign *them* to $1, etc. So portions of the
string which were matched (other than by the .*) will not be
present in $1 $2 or $3. If you want what I think you said, you
need to place the parenthetical groupings so they pick up the
entire repeated group, like:

$a=~s/((?:.{2,})+)
((?:.{2,})+)
((?:.{2,})+)
.*
\3{1,}\2{1,}\1{1,}
/$1 $2 $3/xo;

Note that this regex is particularly inefficient, with huge
amounts of backtracking, so give it a while to execute if the
string has any complication at all. This could be improved
immensely by removing the redundant repeats with no change to
what is matched except for the improvement in efficiency. Example:

use warnings;
use strict;
my $a='qabczycdefxxxxxxxxxefcdabczynn';
my $b=$a;
if( #original regexp
    $a=~s/(.{2,})+(.{2,})+(.{2,})+.*\3{1,}\2{1,}\1{1,}/$1 $2 $3/o
){
    print "\$a matched.\n";
    print "\$1=$1\n";
    print "\$2=$2\n";
    print "\$3=$3\n";
}
print "\$a is now $a\n";

if( #suggested regexp
    $b=~s/(.{2,})
          (.{2,})
          (.{2,})
          .*
          \3+\2+\1+
         /$1 $2 $3/xo
){
    print "\$b matched.\n";
    print "\$1=$1\n";
    print "\$2=$2\n";
    print "\$3=$3\n";
}
print "\$b is now $b\n";

When run:

D:\junk>perl junk544.pl
$a matched.
$1=ef
$2=xx
$3=xx
$a is now ef xx xxcdabczynn
$b matched.
$1=abczy
$2=cd
$3=ef
$b is now qabczy cd efnn

D:\junk>

how to be sure in regular expression that length $1+$2+$3 must be more
l?

Well, length $1+$2+$3 will always be 1 unless the strings are
numeric :). Assuming you actually mean
length($1)+length($2)+length($3), each of $1 $2 and $3 must have
matched at least two characters, so if the match succeeded then
length($1)+length($2)+length($3)>=6. Perhaps you should check to
see if the match succeeded, as per the example above. Don't ever
use $1 etc unless you know the match succeeded.
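
The point above about a quantified capture group can be checked with a minimal sketch. It is not from the original posts, and the variable names are made up for illustration. A group such as (..)+ keeps only its last iteration, while a capture wrapped around the whole repetition keeps the complete run. A failed match in list context returns no captures at all, which is one way to avoid reading a stale $1:

```perl
use strict;
use warnings;

my $text = 'abcdef';

# Quantified capture group: $1 is overwritten on every iteration,
# so only the last two characters survive.
my ($last_pair) = $text =~ /^(..)+$/;

# Capture around the whole repetition: the complete run is kept.
my ($whole_run) = $text =~ /^((?:..)+)$/;

# Odd length, so the match fails and the list assignment leaves undef.
my ($no_match) = 'abc' =~ /^(..)+$/;

print "last pair: $last_pair\n";
print "whole run: $whole_run\n";
print "no match:  ", (defined $no_match ? $no_match : 'undef'), "\n";
```

Assigning the captures in list context ties them to the success of that particular match, so there is no way to pick up values left over from an earlier one.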
 
borges2003xx

let me ask you:
and if
$a=~s/((?:.{0,})+)
((?:.{0,})+)
((?:.{0,})+)
.*
\3{1,}\2{1,}\1{1,}
/$1 $2 $3/xo;

and the total of length of $1+$2+$3>=12?

thanx again
 
Dr.Ruud

(e-mail address removed) wrote:
let me ask you:
and if
$a=~s/((?:.{0,})+)
((?:.{0,})+)
((?:.{0,})+)
.*
\3{1,}\2{1,}\1{1,}
/$1 $2 $3/xo;

and the total of length of $1+$2+$3>=12?

thanx again

{0,} is the same as *
{1,} is the same as +


Something like ((.*)+) hurts (the mind too). 1 or more of something that
can be empty, is not what was meant to be.

The usage of (?:, to cleanly use groups, looks OK.

I remember that your data had a basic grouplength of 2, like
'1212123456xxxxxxxx56343412'
Is that still true? If so, try:

$a=~s/((?:..)+)
((?:..)+)
((?:..)+)
.*
\3+\2+\1+
/$1 $2 $3/xo;

(untested)
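
Since the regex above is marked untested, here is a minimal sketch that runs it against the sample string quoted in the post. The variable names $data and @groups are only illustrative, and the /o flag is left out because the pattern contains no interpolated variables:

```perl
use strict;
use warnings;

my $data = '1212123456xxxxxxxx56343412';   # sample string from the post
my @groups;

if ($data =~ s/((?:..)+)
               ((?:..)+)
               ((?:..)+)
               .*
               \3+\2+\1+
              /$1 $2 $3/x) {
    # The capture variables still hold the values from the successful match.
    @groups = ($1, $2, $3);
    print "groups: @groups\n";
}
print "result: $data\n";
```

Traced by hand, the groups come out as 12, 34 and 56, and the result is "121212 34 56". The match cannot start at the first 12, because each capture holds its whole run and that run must reappear later in the string. The first two 12 pairs are therefore left in front of the replacement.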
 
Bob Walton

let me ask you:
and if
$a=~s/((?:.{0,})+)
((?:.{0,})+)
((?:.{0,})+)
.*
\3{1,}\2{1,}\1{1,}
/$1 $2 $3/xo;

Please note carefully that (?:.{0,})+ is exactly the same as .*,
with the exception that (?:.{0,})+ is grossly inefficient due to
the amount of backtracking it generates, particularly when
multiples of them appear in the same regexp. Also, note that
this regexp could match the null string. So you could
equivalently and much more efficiently write:

$a=~s/(.*)(.*)(.*).*\3+\2+\1+/$1 $2 $3/;

and the total of length of $1+$2+$3>=12?

I interpret this to mean that a successful match is intended to
occur only if the sum of the lengths of the three strings is
twelve or more characters total. If so:

use warnings;
use strict;
my $a='qabczycfffdefxxxxxxxxxefcfffdabczynn';
if(
    $a=~s/(.*)
          (.*)
          (.*)
          .*
          \3+\2+\1+
          #Note: '`' x 100 is intended to refer to a sequence
          #of characters which will never occur in the matched
          #string. Adjust as needed.
          (??{length($1)+length($2)+length($3)>=12?
              '':'`' x 100})
         /$1 $2 $3/xo
){
    print "\$a matched.\n";
    print "\$1=$1\n";
    print "\$2=$2\n";
    print "\$3=$3\n";
}
print "\$a is now>$a<\n";

When run, this prints:

d:\junk>perl junk545.pl
$a matched.
$1=abczy
$2=cfffd
$3=ef
$a is now>qabczy cfffd efnn<

d:\junk>

If the two sequences of fff in $a are replaced with ff, the match
will fail because the sum of the string lengths is less than 12.

It can be instructive to add a print "$1:$2:$3\n"; before the
conditional statement in the (??{}). That prints the progress of
the match as it proceeds.

Note: This will only work using recent versions of Perl.
....
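
As a possible alternative to the "string that never occurs" trick, the same length test can be sketched with a conditional whose condition is a code block, (?(?{ ... })yes|no), using (?!) as a branch that always fails. This is a variation, not the code from the post. It assumes a Perl recent enough to support code blocks in patterns, and the variable names are illustrative:

```perl
use strict;
use warnings;

my $text = 'qabczycfffdefxxxxxxxxxefcfffdabczynn';   # sample string from the post
my @parts;

if ($text =~ s/(.*) (.*) (.*) .* \3+ \2+ \1+
               # Succeed only if the three captures total 12 or more characters;
               # otherwise (?!) forces the engine to backtrack and try again.
               (?(?{ length($1) + length($2) + length($3) >= 12 })|(?!))
              /$1 $2 $3/x) {
    @parts = ($1, $2, $3);
    print "parts: @parts\n";
}
print "result: >$text<\n";
```

Because the condition sits at the same point in the pattern as the (??{ }) test in the post, the engine explores candidates in the same order, so this should print the same captures (abczy, cfffd, ef) and the same result string.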
 
giampiero

Please note carefully that (?:.{0,})+ is exactly the same as .*,

???????????
(?:.{0,})+ equal (.*)+
 
Matt Garrish

giampiero said:
???????????
(?:.{0,})+ equal (.*)+

You seem to be misunderstanding the fundamental concept of a greedy operator.
On its own, /.*/ will match nothing and everything. Consequently, writing
/(.*)+/ is a useless redundancy as it will always and only ever match once,
so the additional modifier isn't doing anything (.*? and .*+ being
completely other beasts).

Moreover, /.*/ is equivalent to /.{0,}/ as the * modifier means 0 or more
occurrences. There is a difference between writing /(?:.{0,})/ and /(.*)/, and
that is that the first will not result in any value being assigned to $1. If
you look closely at what was written above, it is only stated that the two
are the same without a grouping on .*.

Matt
 
Bob Walton

giampiero said:
???????????

Yes, the above is correct. Both will match any string of
characters (with a caveat around a newline depending on whether
the //s switch is active at the time the regexp is encountered --
but that behavior will be the same between the two). As to why
(?:.{0,})+ is the same as .* : {0,} is a longhand way of writing
*, so .{0,} is the same as .* . (?:.{0,}) is then also the same
as .* . Now, (?:.{0,}) will match any character string (see
caveat above), hence (?:.{0,})+ will also, with the + interpreted
as "once". Depending on the character string, it might also
match, say, half of the string followed by the other half, or a
quarter followed by the other three-fourths, etc etc. Note that
there are a whole bunch of ways (?:.{0,})+ can match a character
string -- but also note that the resulting match does in fact
match the entire character string, just as .* would have.

(?:.{0,})+ equal (.*)+

This is incorrect. (.*)+ contains grouping parentheses which
will cause the last string matched by .* to be returned in $1 and
other side reactions to occur in the various other
regexp-grouping-related variables. (?:.{0,})+ does not contain
any grouping parentheses pairs. Hence these two, while they will
match the same strings (namely, all of them, subject to my caveat
above), are not the same because they do not cause the same
ultimate actions.

You seem to be totally missing the idea of why one *never* wants
to do something like (?:.*)+ . It is not just that it takes more
time to type and to think about; it is that such an expression
causes an extreme amount of backtracking when something
subsequent to it fails to match in a regexp. That translates
into computer time -- potentially *years* of it -- spent doing
absolutely nothing worthwhile. Here is an example program that
shows the backtracking I'm talking about as the execution of the
regexps proceeds:

use warnings;
use strict;
my $s='aaaaaaaaaaaaaaaaaaaaaaaaa';
print "Matching re1:\n";
$s=~/(.*)(??{print "$1\n";''})\1/;
print "Done matching re1. Push return to continue.\n";
<>;
print "Matching re2:\n";
$s=~/((?:.*)+)(??{print "$1\n";''})\1/;

The result of running this should be most instructive as to why
one should avoid unneeded backtracking in regexps. Note that the
same result is achieved with both "re1" and "re2" above, but at
substantially higher computational cost in the case of "re2".
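
For a quantitative view of the same effect, the following sketch counts how often the engine reaches the point just after the first group instead of printing $1 each time. It is an illustration rather than part of the original post. The counter $steps and the shorter test string are assumptions chosen so the second match finishes quickly:

```perl
use strict;
use warnings;

our $steps;            # incremented each time the engine reaches the probe
my $s = 'a' x 12;      # shorter than the string in the post, to keep re2 fast

# re1: a plain greedy group followed by a backreference.
$steps = 0;
$s =~ /(.*)(?{ $steps++ })\1/;
my $plain_steps = $steps;

# re2: the same match, with the redundant nested quantifier.
$steps = 0;
$s =~ /((?:.*)+)(?{ $steps++ })\1/;
my $nested_steps = $steps;

print "re1 reached the probe $plain_steps times\n";
print "re2 reached the probe $nested_steps times\n";
```

Both patterns end up matching the same text, but the nested form should reach the probe many more times, and the gap grows quickly as the string gets longer.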
 
giampiero

My intention was to match a substring that appears both to the left and to the
right of .* and that can be repeated a different number of times on each side. Example:
...abcabc.....(.*)...abcabcabc.....

this can be done by (.*) and \1 ????
thanx again.
 
Bob Walton

giampiero said:
My intention was to match a substring that appears both to the left and to the
right of .* and that can be repeated a different number of times on each side. Example:
..abcabc.....(.*)...abcabcabc.....

this can be done by (.*) and \1 ????
....

Well, your example string will match your stated criterion with
$1 matching . and .* matching all of the string except for the
leading and trailing .'s. Is that what you intend? If one
replaces the .'s with random non-repeating characters, as in:

xyabcabczjtwvu(.*)mqzabcabcabcsukp

then a match will occur with $1 matching abcabc, and .* matching
zjtwvu(.*)mqzabc . That match still probably isn't what you
intend -- you would apparently like to see $1 match abc . The
problem is that while that would match, it isn't the first match
encountered by the regexp engine. On the off chance that that is OK:

use warnings;
use strict;
#my $string='..abcabc.....(.*)...abcabcabc.....';
my $string= 'xyabcabczjtwv(.*)mqzabcabcabcsukpr';
if($string=~s/(.+)\1*.*\1+//){
print "Matched, \$1=$1, left: $string\n";
}

Note that this is probably not what you really want, since
matches you probably aren't interested in will occur. In this
one, $1 matches abcabc, the .* matches zjtwv(.*)mqzabc and \1+
matches abcabc. I think you want $1 to match abc . Note that $1
matching abcabc meets your stated criterion: a string that can
be repeated, followed by any characters, followed by one or more
repetitions of the first string. The abcabc match is the one the
regexp engine will encounter first (unless non-greediness is used).

For an example you most likely don't want: if the string contains
an additional x (or y) anywhere in the "random junk" near the end
of the string, like:

my $string= 'xyabcabczjtwv(.*)mqzabcabcabcsxkpr';

then $1 will match the first x (or y), the .* will match
everything up to the second x (or y), \1+ will match the second x
(or y), and the match will succeed. That match meets your stated
criterion (a substring that can be repeated occurring on both
sides of any string), but probably isn't what you want.

It may help a lot if you can make a clearer statement of what you
really want to match.
 
giampiero

As you point out, abcabc matches abcabc(abc).

But what I need for further processing is abc as the pattern, repeated two
and three times.
 
Bob Walton

giampiero said:
As you point out, abcabc matches abcabc(abc).

But what I need for further processing is abc as the pattern, repeated two
and three times.

Unquoted context from previous notes:

[[[[[
giampiero said:
My intention was to match a substring that appears both to the left and to the
right of .* and that can be repeated a different number of times on each side. Example:
..abcabc.....(.*)...abcabcabc.....

this can be done by (.*) and \1 ????

....

Well, your example string will match your stated criterion with
$1 matching . and .* matching all of the string except for the
leading and trailing .'s. Is that what you intend? If one
replaces the .'s with random non-repeating characters, as in:

xyabcabczjtwvu(.*)mqzabcabcabcsukp

then a match will occur with $1 matching abcabc, and .* matching
zjtwvu(.*)mqzabc . That match still probably isn't what you
intend -- you would apparently like to see $1 match abc . The
problem is that while that would match, it isn't the first match
encountered by the regexp engine. On the off chance that that is OK:

use warnings;
use strict;
#my $string='..abcabc.....(.*)...abcabcabc.....';
my $string= 'xyabcabczjtwv(.*)mqzabcabcabcsukpr';
if($string=~s/(.+)\1*.*\1+//){
print "Matched, \$1=$1, left: $string\n";
}

]]]]]

Well, there are a couple of ways of getting that match, all
involving further restrictions of your requirements. If you make
the original string match (the (.+) ) so it only matches strings
three characters long (that is, (.{3,3}) ), that works.
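
The fixed-length restriction just described can be sketched as follows. This is only an illustration; it reuses the sample string from the earlier example, and (.{3}) is simply a shorter spelling of (.{3,3}):

```perl
use strict;
use warnings;

my $string = 'xyabcabczjtwv(.*)mqzabcabcabcsukpr';
my $unit;

# The repeating unit is forced to be exactly three characters long.
if ($string =~ /(.{3})\1*.*\1+/) {
    $unit = $1;
    print "Matched, \$1=$unit\n";
}
```

No three-character substring starting at x or y occurs again later in the string, so the first start position that can succeed is the first abc, and $1 ends up as abc.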

Or if you make it so the part of the string before the .* is
required to repeat at least once and the part of the string after
the .* is required to also repeat at least once, that will also
result in $1 matching abc . Example:

use warnings;
use strict;
my $string='xyabcabczjtwv(.*)mqzabcabcabcsykpr';
if($string=~s/(.+)\1+.*\1{2,}//){
print "Matched, \$1=$1, left: $string\n";
}

But with your original statement of the desired regexp (a first
string, possibly repeated, followed by any string, followed by
the first string possibly repeated), other matches such as abcabc
will be found first.
 
