Quantifier...bigger than 32766...in regex

leegee

Please help. When I say:

qr/^.{1,$Radio::AudioNotes::COL_SIZE->{TEXT}}$/m, ],

Perl says:

Quantifier in {,} bigger than 32766 before HERE mark in regex m/^.{ << HERE 1,65535}$/

It appears from the archives of this group that this is unavoidable;
does anyone know otherwise?

Or can anyone think of a work-around that would fit the above?

I'm trying to limit an input string to a specific length, that of a
MySQL MEDIUMTEXT column.

Thanks in anticipation,
lee
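A detail in the pattern itself is worth checking before the quantifier limit: with /m, ^ and $ match at every line boundary, so ^.{1,N}$ succeeds if any single line of the input is 1 to N characters long, and without /s the dot never matches a newline. A minimal sketch, using a scaled-down limit of 5 (an illustration value) so no quantifier comes near 32766, compares that shape with a whole-string version anchored by \A and \z:

```perl
use strict;
use warnings;

# Scaled-down limit (illustration only) so the 32766 ceiling is not involved.
my $limit = 5;

# Same shape as the pattern in the question: ^...$ under /m.
my $per_line = qr/^.{1,$limit}$/m;

# Whole-string check: \A and \z anchor to the string's ends,
# and /s lets "." match a newline too.
my $whole = qr/\A.{1,$limit}\z/s;

my $input = "ab\n" . ('x' x 100);   # 103 characters in total

print 'per-line pattern:     ', ($input =~ $per_line ? 'match' : 'no match'), "\n";
print 'whole-string pattern: ', ($input =~ $whole    ? 'match' : 'no match'), "\n";
```

The first pattern matches because the line "ab" is short enough. Even without /m, $ also matches just before a trailing newline, so "xxxxx\n" passes /^.{1,5}$/; \z has no such exception.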
 
A. Sinan Unur

(e-mail address removed) wrote in @y43g2000cwc.googlegroups.com:
Please help. When I say:

qr/^.{1,$Radio::AudioNotes::COL_SIZE->{TEXT}}$/m, ], ....

I'm trying to limit an input string to a specific length, that of a
MySQL MEDIUMTEXT column.

Yeeehaaaw! I think I spotted another SAQ:

See

perldoc -f length

Sinan

--
A. Sinan Unur <[email protected]>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
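For reference, the length-based check Sinan points to is short. One caveat, stated as an assumption about the setup: MySQL sizes the TEXT family in bytes, while Perl's length counts characters, so for text containing non-ASCII characters the encoded length is what matters. A minimal sketch, assuming the column stores UTF-8 and using a hypothetical helper name, fits_text_column:

```perl
use strict;
use warnings;
use Encode qw(encode);

# MySQL TEXT holds at most 65535 bytes (MEDIUMTEXT: 16777215).
use constant TEXT_MAX_BYTES => 65535;

# Hypothetical helper: true if $s is non-empty and its UTF-8 encoding
# (the assumed column encoding) fits in a TEXT column.
sub fits_text_column {
    my ($s) = @_;
    return 0 unless defined $s && length $s;
    return length(encode('UTF-8', $s)) <= TEXT_MAX_BYTES;
}

for my $s ('x' x 65535, 'x' x 65536, "\x{e9}" x 40000) {
    printf "%6d characters: %s\n", length $s,
        fits_text_column($s) ? 'fits' : 'too long';
}
```

The last case is 40000 characters but 80000 bytes in UTF-8, so a plain character count would wrongly accept it.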
 
leegee

I'd be grateful to hear an answer better than this (please!):

use strict;

my %COL_SIZE = (
    aTINYTEXT   => 255,          # 8-bit length
    bTEXT       => 65535,        # 16-bit length
    cMEDIUMTEXT => 16777215,     # 24-bit length
    dLONGTEXT   => 4294967295,   # 32-bit length
);

foreach (sort keys %COL_SIZE){
    print $_
        ."\n\t".
        ($COL_SIZE{$_} / 32766)
        ."\n\t".
        int ($COL_SIZE{$_} / 32766)
        ."\n\t".
        ($COL_SIZE{$_} % 32766)
        ."\n\n".
        'qr['.
        (
            '.{1,32766}' x int ($COL_SIZE{$_} / 32766)
        )
        .'.{1,'.($COL_SIZE{$_} % 32766).'}'
        .']'
        ."\n\n\n"
}

__END__

Outputs:

aTINYTEXT
0.00778245742537997
0
255

qr[.{1,255}]


bTEXT
2.00009155832265
2
3

qr[.{1,32766}.{1,32766}.{1,3}]

[...etc]
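The numbers printed above are a division with remainder: each size n splits as n = q * 32766 + r. A small sketch of that arithmetic for all four column sizes also shows why LONGTEXT is harder than the rest: its chunk count q = 131080 is itself larger than 32766, so it cannot go into a single {0,q} quantifier either. (Perl 5.30 later doubled the ceiling to 65534; the thread predates that.) The constant name MAX is an illustration choice.

```perl
use strict;
use warnings;

use constant MAX => 32766;   # per-quantifier ceiling on the perls in this thread

my %COL_SIZE = (
    TINYTEXT   => 255,
    TEXT       => 65535,
    MEDIUMTEXT => 16777215,
    LONGTEXT   => 4294967295,
);

for my $col (sort { $COL_SIZE{$a} <=> $COL_SIZE{$b} } keys %COL_SIZE) {
    my $n = $COL_SIZE{$col};
    my $q = int($n / MAX);   # number of full 32766-character chunks
    my $r = $n % MAX;        # characters left over after those chunks
    die "bad split for $col\n" unless $q * MAX + $r == $n;
    printf "%-10s %10s = %6s * %s + %5s%s\n", $col, $n, $q, MAX, $r,
        $q > MAX ? '   (q itself exceeds the limit)' : '';
}
```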
 
David Squire

I'd be grateful to hear an answer better than this (please!):

Than what? Please quote context when replying (see the posting
guidelines that are posted here frequently).

DS
 
LeeGee

Problem as posted at top of thread: when I say:

qr/^.{1,$Radio::AudioNotes::COL_SIZE->{TEXT}}$/m, ],

Perl says:

Quantifier in {,} bigger than 32766 before HERE mark in regex m/^.{ << HERE 1,65535}$/

(Described in perldoc perlre)

Best solution so far, but I would appreciate a "nicer" one:

use strict;

$|=0;
my %COL_SIZE = (
    bTINYTEXT   => 255,          # 8-bit length
    cTEXT       => 65535,        # 16-bit length
    dMEDIUMTEXT => 16777215,     # 24-bit length
#   eLONGTEXT   => 4294967295,   # 32-bit length
);

foreach (sort keys %COL_SIZE){
    my $s = '^';
    $s .= '(.{0,32766}){0,' . int($COL_SIZE{$_} / 32766) .'}'
        if int($COL_SIZE{$_} / 32766);
    $s .= '.{0,'.($COL_SIZE{$_} % 32766).'}' if ($COL_SIZE{$_} % 32766);
    $s .= '$';

    my $qr = qr[$s]m;
    die ref $qr if ref $qr ne 'Regexp';

    print $qr."\n";
    for my $i ( reverse sort values %COL_SIZE ){
        my $t = 'x' x $i;
        print "\tString with length $i ";
        if ($t !~ $qr){
            print "FAILS regex\n"
        } else {
            print "PASSES regex\n";
        }

    }
    print "\n\n";
}

__END__

Outputs:

(?m-xis:^.{0,255}$)
String with length 65535 FAILS regex
String with length 255 PASSES regex
String with length 16777215 FAILS regex


(?m-xis:^(.{0,32766}){0,2}.{0,3}$)
String with length 65535 PASSES regex
String with length 255 PASSES regex
String with length 16777215 FAILS regex


(?m-xis:^(.{0,32766}){0,512}.{0,1023}$)
String with length 65535 PASSES regex
String with length 255 PASSES regex
String with length 16777215 PASSES regex



Tool completed successfully
 
LeeGee

(e-mail address removed) wrote in news:[email protected]:
Please help. When I say:
qr/^.{1,$Radio::AudioNotes::COL_SIZE->{TEXT}}$/m, ], ...
I'm trying to limit an input string to a specific length, that of a
MySQL MEDIUMTEXT column.

Yeeehaaaw! I think I spotted another SAQ:

See
perldoc -f length

Sorry, Sinan - I forgot to say that I *must* use a regular expression.
For my own perverse reasons. But thanks for taking the time to be
helpful.

Lee
 
Dr.Ruud

(e-mail address removed) schreef:
Please help. When I say:

qr/^.{1,$Radio::AudioNotes::COL_SIZE->{TEXT}}$/m, ],

Perl says:

Quantifier in {,} bigger than 32766 before HERE mark in regex
m/^.{ << HERE 1,65535}$/

Generate it as
/^.{1,32766}.{0,32766}.{0,3}$/


#!/usr/bin/perl
use strict;
use warnings;

use constant RE_qmax => 32766;

my $n = 65536; # $Radio::AudioNotes::COL_SIZE->{TEXT} ;

my $re = q</^.{1,> ;

while ($n > RE_qmax)
{
    $re .= RE_qmax . q<}.{0,>;
    $n -= RE_qmax;
}
$re = qr<$re$n}\$/>;

print "$re\n";
 
Dr.Ruud

Dr.Ruud schreef:
(e-mail address removed) schreef:
When I say:

qr/^.{1,$Radio::AudioNotes::COL_SIZE->{TEXT}}$/m, ],

Perl says:

Quantifier in {,} bigger than 32766 before HERE mark in regex
m/^.{ << HERE 1,65535}$/

Generate it as
/^.{1,32766}.{0,32766}.{0,3}$/

Corrected code:

#!/usr/bin/perl
use strict;
use warnings;

use constant RE_qmax => 32766;

my $n = $Radio::AudioNotes::COL_SIZE->{TEXT} ;
my $re = q<^.{1,> ;

while ($n > RE_qmax)
{
    $re .= RE_qmax . q<}.{0,>;
    $n -= RE_qmax;
}
$re = qr<$re$n}$>;

print "$re\n";
 
Anno Siegel

LeeGee said:
Problem as posted at top of thread: when I say:

qr/^.{1,$Radio::AudioNotes::COL_SIZE->{TEXT}}$/m, ],

Perl says:

Quantifier in {,} bigger than 32766 before HERE mark in regex m/^.{ << HERE 1,65535}$/

(Described in perldoc perlre)

Best solution so far, but I would appreciate a "nicer" one:

[snip]

How about

# construct a regex that matches up to $n characters without using
# a quantifier > MAX
use constant MAX => 32766;
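sub mk_re {
    my $n = shift;
    my $q = int $n/MAX;
    die "Size $n too large" if $q > MAX; # $n > MAX**2
    # With no full chunk ($q == 0) the first alternative below would get
    # the invalid quantifier {0,-1}; a single .{0,$n} suffices then.
    return sprintf '^.{0,%d}$', $n unless $q;
    sprintf
        '^(?:(?:.{%d}){0,%d}.{0,%d}|(?:.{%d}){%d}.{0,%d})$',
        MAX, $q-1, MAX, MAX, $q, $n % MAX;
}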

This works up to a size of MAX**2. The regex matches an empty string,
which is not quite to specification. That can be repaired in various
ways. If you want to return the ready-made qr/.../, you can use

return qr/$_/ for sprintf ...;


Anno
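A different technique from Anno's, sketched here as an alternative rather than anything proposed in the thread: instead of describing every allowed length, reject the first disallowed one. A negative lookahead that tries to see n + 1 characters needs only fixed counts, split into 32766-sized pieces, so there is nothing for the engine to backtrack over. The helper name mk_neg_re and its optional $max argument (for testing with small chunks) are hypothetical; like Anno's version it tops out around MAX**2 characters.

```perl
use strict;
use warnings;

use constant MAX => 32766;

# Hypothetical helper: regex that accepts strings of 1..$n characters.
# (?!...) fails as soon as $n + 1 characters are present; every count
# inside it is fixed, so a too-long string is rejected without backtracking.
sub mk_neg_re {
    my ($n, $max) = @_;
    $max = MAX unless defined $max;   # smaller chunks are handy for testing
    my $q = int(($n + 1) / $max);     # full chunks in $n + 1 characters
    my $r = ($n + 1) % $max;          # remaining characters
    die "Size $n too large\n" if $q > $max;
    # /s so that "." also counts newlines as characters
    return qr/\A(?=.)(?!(?:.{$max}){$q}.{$r})/s;
}

my $re = mk_neg_re(65535);
for my $len (65535, 65536) {
    my $ok = ('x' x $len) =~ $re;
    print "$len: ", ($ok ? 'accepted' : 'rejected'), "\n";
}
```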
 
Brian McCauley

LeeGee said:
(?m-xis:^(.{0,32766}){0,512}.{0,1023}$)
Srting with length 65535 PASSES regex
Srting with length 255 PASSES regex
Srting with length 16777215 PASSES regex

String with length 16777217 would fail but probably not before the end
of the universe.

Looking at some much smaller numbers and counting the number of times
the RE engine backtracks you may start to see the problem...

local our $count;
('x' x 2000 ) =~ /(?m-xis:^(.{0,9}){0,4}(?{ $count++ })$)/;
print $count;

Prints 8201.
 
Ilya Zakharevich

[A complimentary Cc of this posting was NOT [per weedlist] sent to Dr.Ruud]
qr/^.{1,$Radio::AudioNotes::COL_SIZE->{TEXT}}$/m, ],
Perl says:
Quantifier in {,} bigger than 32766 before HERE mark in regex
m/^.{ << HERE 1,65535}$/
Generate it as
/^.{1,32766}.{0,32766}.{0,3}$/

This works by coincidence only. One needs (?>) or an alternative
without two consequent quantified expressions.

Hope this helps,
Ilya
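Ilya's point is about failure: when the string is too long, the engine retries every way of dividing it between the .{...} pieces before giving up. One way to apply his (?>) suggestion to Dr.Ruud's generator is to make each piece possessive; X{n,m}+ is perl 5.10+ shorthand for (?>X{n,m}), so it was not available to the perl 5.8 used in this thread. A sketch under those assumptions, with a hypothetical sub name and an optional chunk size for testing; it also anchors with \A and \z under /s so the whole string is counted:

```perl
use strict;
use warnings;

# Hypothetical generator: 1..$n characters, no quantifier above $max.
sub mk_possessive_re {
    my ($n, $max) = @_;
    $max = 32766 unless defined $max;
    die "n must be at least 1\n" if $n < 1;
    my $re  = '\A';
    my $min = 1;                     # only the first piece must be non-empty
    while ($n > $max) {
        $re .= ".{$min,$max}+";      # possessive: never gives characters back
        $n  -= $max;
        $min = 0;
    }
    $re .= ".{$min,$n}+\\z";
    return qr/$re/s;
}

print mk_possessive_re(65535), "\n";
```

Each piece greedily takes as much as it can, so any string of 1..n characters is split correctly on the first try, and a longer one fails at \z with nothing left to retry.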
 
Dr.Ruud

Ilya Zakharevich schreef:
Dr.Ruud:
[attribution repaired] (e-mail address removed):
qr/^.{1,$Radio::AudioNotes::COL_SIZE->{TEXT}}$/m, ],
Perl says:
Quantifier in {,} bigger than 32766 before HERE mark in regex
m/^.{ << HERE 1,65535}$/

Generate it as
/^.{1,32766}.{0,32766}.{0,3}$/

This works by coincidence only. One needs (?>) or an alternative
without two consequent quantified expressions.

Care to explain? Would a
'.?' x 32766
do any better?
 
Brian McCauley

Brian said:
String with length 16777217 would fail but probably not before the end
of the universe.

Looking at some much smaller numbers and counting the number of times
the RE engine backtracks you may start to see the problem...

local our $count;
('x' x 2000 ) =~ /(?m-xis:^(.{0,9}){0,4}(?{ $count++ })$)/;
print $count;

Prints 8201.

After reading what Ilya said (and re-reading it a dozen times) I now
realise this is trivially fixed.

local our $count;
('x' x 2000 ) =~ /(?m-xis:^(?>.{0,9}){0,4}(?{ $count++ })$)/;
print $count;

Prints 5.
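Applied to LeeGee's generator from earlier in the thread, Brian's fix means wrapping the repeated chunk group in (?>...). Because each chunk is .{0,32766}, the atomic group simply takes as much of the string as the chunks allow and the tail pattern takes the rest, so nothing is lost by forbidding backtracking. A sketch assuming \A, \z and /s in place of ^, $ and /m (so the whole string is measured), with a (?=.) lookahead for the one-character minimum and a hypothetical sub name:

```perl
use strict;
use warnings;

# Hypothetical generator: 1..$n characters using LeeGee's chunk layout,
# with the chunk group made atomic as Brian suggests.
sub mk_len_re {
    my ($n, $max) = @_;
    $max = 32766 unless defined $max;
    my $q = int($n / $max);   # number of chunk repetitions
    my $r = $n % $max;        # size of the tail
    die "Size $n too large\n" if $q > $max;
    my $chunks = $q ? "(?>(?:.{0,$max}){0,$q})" : '';
    return qr/\A(?=.)$chunks.{0,$r}\z/s;
}

print mk_len_re(16777215), "\n";
```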
 
Brian McCauley

Dr.Ruud said:
Ilya Zakharevich schreef:
Dr.Ruud:
[attribution repaired] (e-mail address removed):
qr/^.{1,$Radio::AudioNotes::COL_SIZE->{TEXT}}$/m, ],
Perl says:
Quantifier in {,} bigger than 32766 before HERE mark in regex
m/^.{ << HERE 1,65535}$/

Generate it as
/^.{1,32766}.{0,32766}.{0,3}$/

This works by coincidence only. One needs (?>) or an alternative
without two consequent quantified expressions.

Care to explain?

Consider the situation where the string is too long. On the first
attempt the 3 parts of your pattern match 32766,32766,3 characters
respectively. Then the RE engine hits the $ assertion and that fails.
So the RE engine backtracks and tries 32766,32766,2 then 32766,32766,1
then 32766,32766,0 then 32766,32765,3 and so on for 32766*32767*4
iterations.
Would a
'.?' x 32766
do any better?

Depends what you call better. That pattern would take 2**32766
iterations before it failed.
 
Ilya Zakharevich

[A complimentary Cc of this posting was NOT [per weedlist] sent to Brian McCauley]
local our $count;
('x' x 2000 ) =~ /(?m-xis:^(?>.{0,9}){0,4}(?{ $count++ })$)/;
print $count;

I think it is clearer to write it as

qr/ (.{9}){0,3} .{0,9} /x

This is slower than "the correct"

qr/ (?> (.{9}){0,3} ) .{0,9} /x

but unless in a tight loop, the difference should be tolerable...

Hope this helps,
Ilya
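Ilya's two spellings accept exactly the same strings; the (?>...) only changes how quickly a too-long string is rejected. A quick sketch checking that, with \A, \z and /s added (they are not in Ilya's snippet) so that the whole string is measured; the upper bound is 3 * 9 + 9 = 36 characters:

```perl
use strict;
use warnings;

# Ilya's patterns, anchored: up to three 9-character chunks, then 0..9 more.
my $plain  = qr/\A      (.{9}){0,3}   .{0,9} \z/xs;
my $atomic = qr/\A (?> (.{9}){0,3} )  .{0,9} \z/xs;

for my $len (34 .. 38) {
    my $s = 'x' x $len;
    printf "%2d chars: plain %s, atomic %s\n", $len,
        ($s =~ $plain  ? 'match' : 'no'),
        ($s =~ $atomic ? 'match' : 'no');
}
```

The atomic form is safe here because the chunks are fixed at 9 characters: whatever the group grabs leaves a remainder that the final .{0,9} can always cover when the total is 36 or less.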
 
Ilya Zakharevich

[A complimentary Cc of this posting was NOT [per weedlist] sent to Brian McCauley]
Depends what you call better. That pattern would take 2**32766
iterations before it failed.

Patterns with 16 or fewer "complicated repetitors" have a special
optimization; they are always "polynomial time in THESE repetitors".
But if too many repetitors, or if "simple repetitors" happen to "be
chained", one can still get into the exponential tar pit...

qr/.?/ being not "complicated", there should be no optimization, so
with many of them one can hit exponential behaviour quite soon
indeed...

Hope this helps,
Ilya
 
Dr.Ruud

Brian McCauley schreef:
local our $count;
('x' x 2000 ) =~ /(?m-xis:^(?>.{0,9}){0,4}(?{ $count++ })$)/;
print $count;

Prints 5.


(perl, v5.8.6 built for i386-freebsd-64int)

I compared

perl -le '("z"x55) =~ /^(.{0,10}){1,5}(?{$count++})$/m;
print "<$count>"'

to

perl -le '("z"x55) =~ /(?m-xis:^(.{0,10}){1,5}(?{$count++})$)/;
print "<$count>"'

and to

perl -le '("z"x55) =~ /(?m-xis)^(.{0,10}){1,5}(?{$count++})$/;
print "<$count>"'



and found that the first and the last variant return quickly (without
incrementing $count) and (only) the second variant takes more time.
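The three one-liners differ only in where the m modifier is written, and without the (?{ }) counter they accept exactly the same strings; as Ilya's reply further down suggests, the difference lies in what the optimizer can prove before the engine runs (perl -Mre=debug shows its notes). A small check of that equivalence:

```perl
use strict;
use warnings;

# Ruud's three spellings without the (?{ $count++ }) counter.
my @variants = (
    qr/^(.{0,10}){1,5}$/m,
    qr/(?m-xis:^(.{0,10}){1,5}$)/,
    qr/(?m-xis)^(.{0,10}){1,5}$/,
);

for my $len (50, 55) {
    my $s = 'z' x $len;
    my @res;
    for my $re (@variants) {
        push @res, ($s =~ $re ? 'match' : 'no');
    }
    print "$len chars: @res\n";
}
```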


I also found that you can't use the Perlish 32_766 for a quantifier in a
regex.
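The underscore separator belongs to Perl's numeric-literal syntax, which the regex compiler does not share. Interpolating a variable avoids the problem, because the number stringifies as plain digits. A minimal sketch, with $max as an illustration name:

```perl
use strict;
use warnings;

my $max = 32_766;                  # numeric literal: the value is 32766
my $re  = qr/\A.{0,$max}\z/s;      # interpolates as "32766", no underscore

print "$re\n";
for my $len (32766, 32767) {
    print "$len: ", (('x' x $len) =~ $re ? 'match' : 'no match'), "\n";
}
```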
 
Brian McCauley

Dr.Ruud said:
(perl, v5.8.6 built for i386-freebsd-64int)

I compared

perl -le '("z"x55) =~ /^(.{0,10}){1,5}(?{$count++})$/m;
print "<$count>"'

to

perl -le '("z"x55) =~ /(?m-xis:^(.{0,10}){1,5}(?{$count++})$)/;
print "<$count>"'

and to

perl -le '("z"x55) =~ /(?m-xis)^(.{0,10}){1,5}(?{$count++})$/;
print "<$count>"'

and found that the first and the last variant return quickly (without
incrementing $count) and (only) the second variant takes more time.

Gee, regex optimisations scare me sometimes. That's one for the next
installment of "Usenet Gems". Just as soon as I figure out what's
going on here.
 
Ilya Zakharevich

[A complimentary Cc of this posting was NOT [per weedlist] sent to Dr.Ruud]
Brian McCauley schreef:



(perl, v5.8.6 built for i386-freebsd-64int)

I compared

perl -le '("z"x55) =~ /^(.{0,10}){1,5}(?{$count++})$/m;
print "<$count>"'

to

perl -le '("z"x55) =~ /(?m-xis:^(.{0,10}){1,5}(?{$count++})$)/;
print "<$count>"'

and to

perl -le '("z"x55) =~ /(?m-xis)^(.{0,10}){1,5}(?{$count++})$/;
print "<$count>"'



and found that the first and the last variant return quickly (without
incrementing $count) and (only) the second variant takes more time.

perl -Mre=debugcolor -wle '...'

(I did it with -c only, and see that the first one finds
floating `'$ at 0..50
while the second one does not. Since the generated code looks identical,
this is most probably a bug in the optimizer: for some reason it gives
up hope too soon.)

Hope this helps,
Ilya
 
Dr.Ruud

Ilya Zakharevich schreef:
Dr.Ruud:

perl -Mre=debugcolor -wle '...'

(I did it with -c only, and see that the first one finds
floating `'$ at 0..50
while the second one does not. Since the generated code looks identical,
this is most probably a bug in the optimizer: for some reason it gives
up hope too soon.)

OK, I just reported it with perlbug, ID [perl #39096].
 
