Quantifier...bigger than 32766...in regex

Discussion in 'Perl Misc' started by leegee@gmail.com, May 4, 2006.

  1. Guest

    Please help. When I say:

    qr/^.{1,$Radio::AudioNotes::COL_SIZE->{TEXT}}$/m, ],

    Perl says:

    Quantifier in {,} bigger than 32766 before HERE mark in regex m/^.{
    << HERE 1,65535}$/

    It appears from the archives of this gruop that this is unavoidable -
    does anyone know otherwise?

    Or can anyone think of a work-around that would fit the above?

    I'm trying to limit an input string to a specific length, that of a
    MySQL MEDIUMTEXT column.

    Thanks in anticipation,
    lee
    , May 4, 2006
    #1
    1. Advertising

  2. wrote in news:1146742722.221433.143330
    @y43g2000cwc.googlegroups.com:

    > Please help. When I say:
    >
    > qr/^.{1,$Radio::AudioNotes::COL_SIZE->{TEXT}}$/m, ],

    ....

    > I'm trying to limit an input string to a specific length, that of a
    > MySQL MEDIUMTEXT column.


    Yeeehaaaw! I think I spotted another SAQ:

    See

    perldoc -f length

    Sinan

    --
    A. Sinan Unur <>
    (remove .invalid and reverse each component for email address)

    comp.lang.perl.misc guidelines on the WWW:
    http://augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
    A. Sinan Unur, May 4, 2006
    #2
    1. Advertising

  3. Guest

    I'd be grateful to hear an answer better than this (please!):

    use strict;

    my %COL_SIZE = (
    aTINYTEXT => 255, # 8-bit
    bTEXT => 65535, # 16-bit
    cMEDIUMTEXT => 16777215, # 32-bit
    dLONGTEXT => 4294967295, # 64-bit
    );

    foreach (sort keys %COL_SIZE){
    print $_
    ."\n\t".
    ($COL_SIZE{$_} / 32766)
    ."\n\t".
    int ($COL_SIZE{$_} / 32766)
    ."\n\t".
    ($COL_SIZE{$_} % 32766)
    ."\n\n".
    'qr['.
    (
    '.{1,32766}' x int ($COL_SIZE{$_} / 32766)
    )
    .'.{1,'.($COL_SIZE{$_} % 32766).'}'
    .']'
    ."\n\n\n"
    }

    __END__

    Outputs:

    aTINYTEXT
    0.00778245742537997
    0
    255

    qr[.{1,255}]


    bTEXT
    2.00009155832265
    2
    3

    qr[.{1,32766}.{1,32766}.{1,3}]

    [...etc]
    , May 4, 2006
    #3
  4. David Squire Guest

    wrote:
    > I'd be grateful to hear an answer better than this (please!):


    Than what? Please quote context when replying (see the posting
    guidelines that are posted here frequently)

    DS
    David Squire, May 4, 2006
    #4
  5. LeeGee Guest

    Problem as posted at top of thread: when I say:

    qr/^.{1,$Radio::AudioNotes::COL_SIZE->{TEXT}}$/m, ],

    Perl says:

    Quantifier in {,} bigger than 32766 before HERE mark in regex m/^.{

    << HERE 1,65535}$/

    (Described in perldoc prelre)

    Best solution so far, but I would appreciate a "nicer" one:

    use strict;

    $|=0;
    my %COL_SIZE = (
    bTINYTEXT => 255, # 8-bit
    cTEXT => 65535, # 16-bit
    dMEDIUMTEXT => 16777215, # 32-bit
    # eLONGTEXT => 4294967295, # 64-bit
    );

    foreach (sort keys %COL_SIZE){
    my $s = '^';
    $s .= '(.{0,32766}){0,' . int($COL_SIZE{$_} / 32766) .'}' if
    int($COL_SIZE{$_} / 32766);
    $s .= '.{0,'.($COL_SIZE{$_} % 32766).'}' if ($COL_SIZE{$_} % 32766);
    $s .= '$';

    my $qr = qr[$s]m;
    die ref $qr if ref $qr ne 'Regexp';

    print $qr."\n";
    for my $i ( reverse sort values %COL_SIZE ){
    my $t = 'x' x $i;
    print "\tSrting with length $i ";
    if ($t !~ $qr){
    print "FAILS regex\n"
    } else {
    print "PASSES regex\n";
    }

    }
    print "\n\n";
    }

    __END__

    Outputs:

    (?m-xis:^.{0,255}$)
    Srting with length 65535 FAILS regex
    Srting with length 255 PASSES regex
    Srting with length 16777215 FAILS regex


    (?m-xis:^(.{0,32766}){0,2}.{0,3}$)
    Srting with length 65535 PASSES regex
    Srting with length 255 PASSES regex
    Srting with length 16777215 FAILS regex


    (?m-xis:^(.{0,32766}){0,512}.{0,1023}$)
    Srting with length 65535 PASSES regex
    Srting with length 255 PASSES regex
    Srting with length 16777215 PASSES regex



    Tool completed successfully
    LeeGee, May 4, 2006
    #5
  6. LeeGee Guest

    > > wrote in news::
    > >
    > > Please help. When I say:
    > > qr/^.{1,$Radio::AudioNotes::COL_SIZE->{TEXT}}$/m, ],

    > ...
    > > I'm trying to limit an input string to a specific length, that of a
    > > MySQL MEDIUMTEXT column.

    >
    > Yeeehaaaw! I think I spotted another SAQ:
    >
    > See
    > perldoc -f length


    Sorry, Sinan - I forgot to say that I *must* use a regular expression.
    For my own perverse reasons. But thanks for taking the time to be
    helpful.

    Lee
    LeeGee, May 4, 2006
    #6
  7. Dr.Ruud Guest

    schreef:
    > Please help. When I say:
    >
    > qr/^.{1,$Radio::AudioNotes::COL_SIZE->{TEXT}}$/m, ],
    >
    > Perl says:
    >
    > Quantifier in {,} bigger than 32766 before HERE mark in regex
    > m/^.{ << HERE 1,65535}$/


    Generate is as
    /^.{1,32766}.{0,32766}.{0,3}$/


    #!/usr/bin/perl
    use strict;
    use warnings;

    use constant RE_qmax => 32766;

    my $n = 65536; # $Radio::AudioNotes::COL_SIZE->{TEXT} ;

    my $re = q</^.{1,> ;

    while ($n > RE_qmax)
    {
    $re .= RE_qmax . q<}.{0,>;
    $n -= RE_qmax;
    }
    $re = qr<$re$n}\$/>;

    print "$re\n";

    --
    Affijn, Ruud

    "Gewoon is een tijger."
    Dr.Ruud, May 4, 2006
    #7
  8. Dr.Ruud Guest

    Dr.Ruud schreef:
    > schreef:


    >> When I say:
    >>
    >> qr/^.{1,$Radio::AudioNotes::COL_SIZE->{TEXT}}$/m, ],
    >>
    >> Perl says:
    >>
    >> Quantifier in {,} bigger than 32766 before HERE mark in regex
    >> m/^.{ << HERE 1,65535}$/

    >
    > Generate is as
    > /^.{1,32766}.{0,32766}.{0,3}$/


    Corrected code:

    #!/usr/bin/perl
    use strict;
    use warnings;

    use constant RE_qmax => 32766;

    my $n = $Radio::AudioNotes::COL_SIZE->{TEXT} ;
    my $re = q<^.{1,> ;

    while ($n > RE_qmax)
    {
    $re .= RE_qmax . q<}.{0,>;
    $n -= RE_qmax;
    }
    $re = qr<$re$n}$>;

    print "$re\n";

    --
    Affijn, Ruud

    "Gewoon is een tijger."
    Dr.Ruud, May 4, 2006
    #8
  9. Anno Siegel Guest

    LeeGee <> wrote in comp.lang.perl.misc:
    > Problem as posted at top of thread: when I say:
    >
    > qr/^.{1,$Radio::AudioNotes::COL_SIZE->{TEXT}}$/m, ],
    >
    > Perl says:
    >
    > Quantifier in {,} bigger than 32766 before HERE mark in regex m/^.{
    >
    > << HERE 1,65535}$/
    >
    > (Described in perldoc prelre)
    >
    > Best solution so far, but I would appreciate a "nicer" one:


    [snip]

    How about

    # construct a regex that matches up to $n characters without using
    # a quantifier > MAX
    use constant MAX => 32766;
    sub mk_re {
    my $n = shift;
    my $q = int $n/MAX;
    die "Size $n too large" if $q > MAX; # $n > MAX**2
    sprintf
    '^(?:(?:.{%d}){0,%d}.{0,%d}|(?:.{%d}){%d}.{0,%d})$',
    MAX, $q-1, MAX, MAX, $q, $n % MAX;
    }

    This works up to a size of MAX**2. The regex matches an empty string,
    which is not quite to specification. That can be repaired in various
    ways. If you want to return the ready-made qr/.../, you can use

    return qr/$_/ for sprintf ...;


    Anno
    --
    If you want to post a followup via groups.google.com, don't use
    the broken "Reply" link at the bottom of the article. Click on
    "show options" at the top of the article, then click on the
    "Reply" at the bottom of the article headers.
    Anno Siegel, May 4, 2006
    #9
  10. LeeGee wrote:

    > (?m-xis:^(.{0,32766}){0,512}.{0,1023}$)
    > Srting with length 65535 PASSES regex
    > Srting with length 255 PASSES regex
    > Srting with length 16777215 PASSES regex


    String with length 16777217 would fail but probably not before the end
    of the universe.

    Looking at some much smaller numbers and counting the number of times
    the RE engine backtracks you may start to see the problem...

    local our $count;
    ('x' x 2000 ) =~ /(?m-xis:^(.{0,9}){0,4}(?{ $count++ })$)/;
    print $count;

    Prints 8201.
    Brian McCauley, May 4, 2006
    #10
  11. [A complimentary Cc of this posting was NOT [per weedlist] sent to
    Dr.Ruud
    <>], who wrote in article <>:
    > > qr/^.{1,$Radio::AudioNotes::COL_SIZE->{TEXT}}$/m, ],
    > > Perl says:
    > > Quantifier in {,} bigger than 32766 before HERE mark in regex
    > > m/^.{ << HERE 1,65535}$/


    > Generate is as
    > /^.{1,32766}.{0,32766}.{0,3}$/


    This works by coincidence only. One needs (?>) or an alternative
    without two consequent quantified expressions.

    Hope this helps,
    Ilya
    Ilya Zakharevich, May 5, 2006
    #11
  12. Dr.Ruud Guest

    Ilya Zakharevich schreef:
    > Dr.Ruud:
    >> [attribution repaired] :


    >>> qr/^.{1,$Radio::AudioNotes::COL_SIZE->{TEXT}}$/m, ],
    >>> Perl says:
    >>> Quantifier in {,} bigger than 32766 before HERE mark in regex
    >>> m/^.{ << HERE 1,65535}$/

    >>
    >> Generate is as
    >> /^.{1,32766}.{0,32766}.{0,3}$/

    >
    > This works by coincidence only. One needs (?>) or an alternative
    > without two consequent quantified expressions.


    Care to explain? Would a
    '.?' x 32766
    do any better?

    --
    Affijn, Ruud

    "Gewoon is een tijger."
    Dr.Ruud, May 5, 2006
    #12
  13. Brian McCauley wrote:
    > LeeGee wrote:
    >
    > > (?m-xis:^(.{0,32766}){0,512}.{0,1023}$)
    > > Srting with length 65535 PASSES regex
    > > Srting with length 255 PASSES regex
    > > Srting with length 16777215 PASSES regex

    >
    > String with length 16777217 would fail but probably not before the end
    > of the universe.
    >
    > Looking at some much smaller numbers and counting the number of times
    > the RE engine backtracks you may start to see the problem...
    >
    > local our $count;
    > ('x' x 2000 ) =~ /(?m-xis:^(.{0,9}){0,4}(?{ $count++ })$)/;
    > print $count;
    >
    > Prints 8201.


    After reading what Ilya said (and re-reading it a dozen time) I now
    realise this is triviallly fixed.

    local our $count;
    ('x' x 2000 ) =~ /(?m-xis:^(?>.{0,9}){0,4}(?{ $count++ })$)/;
    print $count;

    Prints 5.
    Brian McCauley, May 5, 2006
    #13
  14. Dr.Ruud wrote:
    > Ilya Zakharevich schreef:
    > > Dr.Ruud:
    > >> [attribution repaired] :

    >
    > >>> qr/^.{1,$Radio::AudioNotes::COL_SIZE->{TEXT}}$/m, ],
    > >>> Perl says:
    > >>> Quantifier in {,} bigger than 32766 before HERE mark in regex
    > >>> m/^.{ << HERE 1,65535}$/
    > >>
    > >> Generate is as
    > >> /^.{1,32766}.{0,32766}.{0,3}$/

    > >
    > > This works by coincidence only. One needs (?>) or an alternative
    > > without two consequent quantified expressions.

    >
    > Care to explain?


    Consider the situation where the string is too long. On the first
    attempt the 3 parts of your pattern match 32766,32766,3 characters
    respectively. Then the RE engine hits the $ assertion and that fails.
    So the RE engine back tracks and tries 32766,32766,2 then 32766,32766,1
    then 32766,32766,0 then 32766,32765,3 and so on for 32766*32767*4
    iterations.

    > Would a
    > '.?' x 32766
    > do any better?


    Depends what you call better. That pattern would take 2**32766
    iterations before it failed.
    Brian McCauley, May 5, 2006
    #14
  15. [A complimentary Cc of this posting was NOT [per weedlist] sent to
    Brian McCauley
    <>], who wrote in article <>:
    > local our $count;
    > ('x' x 2000 ) =~ /(?m-xis:^(?>.{0,9}){0,4}(?{ $count++ })$)/;
    > print $count;


    I think it is clearer to write it as

    qr/ (.{9}){0,3} .{0,9} /x

    This is slower than "the correct"

    qr/ (?> (.{9}){0,3} ) .{0,9} /x

    but unless in a tight loop, the difference should be tolerable...

    Hope this helps,
    Ilya
    Ilya Zakharevich, May 5, 2006
    #15
  16. [A complimentary Cc of this posting was NOT [per weedlist] sent to
    Brian McCauley
    <>], who wrote in article <>:
    > > Would a
    > > '.?' x 32766
    > > do any better?


    > Depends what you call better. That pattern would take 2**32766
    > iterations before it failed.


    Patterns with 16 or less "complicated repetitiors" have a special
    optimization; they are always "polynomial time in THESE repetitors".
    But if too many repetitors, or if "simple repetitors" happen to "be
    chained", one can still get into the exponential tar pit...

    qr/.?/ being not "complicated", there should be no optimization, so
    with many of them one can hit exponential behaviour quite soon
    indeed...

    Hope this helps,
    Ilya
    Ilya Zakharevich, May 5, 2006
    #16
  17. Dr.Ruud Guest

    Brian McCauley schreef:

    > local our $count;
    > ('x' x 2000 ) =~ /(?m-xis:^(?>.{0,9}){0,4}(?{ $count++ })$)/;
    > print $count;
    >
    > Prints 5.



    (perl, v5.8.6 built for i386-freebsd-64int)

    I compared

    perl -le '("z"x55) =~ /^(.{0,10}){1,5}(?{$count++})$/m;
    print "<$count>"'

    to

    perl -le '("z"x55) =~ /(?m-xis:^(.{0,10}){1,5}(?{$count++})$)/;
    print "<$count>"'

    and to

    perl -le '("z"x55) =~ /(?m-xis)^(.{0,10}){1,5}(?{$count++})$/;
    print "<$count>"'



    and found that the first and the last variant return quickly (without
    incrementing $count) and (only) the second variant takes more time.


    I also found that you can't use the Perlish 32_766 for a quantifier in a
    regex.

    --
    Affijn, Ruud

    "Gewoon is een tijger."
    Dr.Ruud, May 5, 2006
    #17
  18. Dr.Ruud wrote:
    > (perl, v5.8.6 built for i386-freebsd-64int)
    >
    > I compared
    >
    > perl -le '("z"x55) =~ /^(.{0,10}){1,5}(?{$count++})$/m;
    > print "<$count>"'
    >
    > to
    >
    > perl -le '("z"x55) =~ /(?m-xis:^(.{0,10}){1,5}(?{$count++})$)/;
    > print "<$count>"'
    >
    > and to
    >
    > perl -le '("z"x55) =~ /(?m-xis)^(.{0,10}){1,5}(?{$count++})$/;
    > print "<$count>"'
    >
    > and found that the first and the last variant return quickly (without
    > incrementing $count) and (only) the second variant takes more time.


    Gee, regex optomisations scare me sometimes. That's one for the next
    installment of "Usenet Gems". Just as soon as I figure out what's
    going on here.
    Brian McCauley, May 6, 2006
    #18
  19. [A complimentary Cc of this posting was NOT [per weedlist] sent to
    Dr.Ruud
    <>], who wrote in article <>:
    > Brian McCauley schreef:
    >
    > > local our $count;
    > > ('x' x 2000 ) =~ /(?m-xis:^(?>.{0,9}){0,4}(?{ $count++ })$)/;
    > > print $count;
    > >
    > > Prints 5.

    >
    >
    > (perl, v5.8.6 built for i386-freebsd-64int)
    >
    > I compared
    >
    > perl -le '("z"x55) =~ /^(.{0,10}){1,5}(?{$count++})$/m;
    > print "<$count>"'
    >
    > to
    >
    > perl -le '("z"x55) =~ /(?m-xis:^(.{0,10}){1,5}(?{$count++})$)/;
    > print "<$count>"'
    >
    > and to
    >
    > perl -le '("z"x55) =~ /(?m-xis)^(.{0,10}){1,5}(?{$count++})$/;
    > print "<$count>"'
    >
    >
    >
    > and found that the first and the last variant return quickly (without
    > incrementing $count) and (only) the second variant takes more time.


    perl -Mre=debugcolor -wle '...'

    (I did it with -c only, and see that the first one finds
    floating `'$ at 0..50
    the second one does not. Since the generated code looks identical,
    this is most probably a bug in the optimizer: by some reason it loses
    its hopes too soon.)

    Hope this helps,
    Ilya
    Ilya Zakharevich, May 6, 2006
    #19
  20. Dr.Ruud Guest

    Ilya Zakharevich schreef:
    > Dr.Ruud:
    >> Brian McCauley:


    >>> local our $count;
    >>> ('x' x 2000 ) =~ /(?m-xis:^(?>.{0,9}){0,4}(?{ $count++ })$)/;
    >>> print $count;
    >>>
    >>> Prints 5.

    >>
    >> perl -le '("z"x55) =~ /^(.{0,10}){1,5}(?{$count++})$/m;
    >> print "<$count>"'
    >>
    >> perl -le '("z"x55) =~ /(?m-xis:^(.{0,10}){1,5}(?{$count++})$)/;
    >> print "<$count>"'
    >>
    >> perl -le '("z"x55) =~ /(?m-xis)^(.{0,10}){1,5}(?{$count++})$/;
    >> print "<$count>"'
    >>
    >> the first and the last variant return quickly (without
    >> incrementing $count) and (only) the second variant takes more time.

    >
    > perl -Mre=debugcolor -wle '...'
    >
    > (I did it with -c only, and see that the first one finds
    > floating `'$ at 0..50
    > the second one does not. Since the generated code looks identical,
    > this is most probably a bug in the optimizer: by some reason it loses
    > its hopes too soon.)


    OK, I just reported it with perlbug, ID [perl #39096].

    --
    Affijn, Ruud

    "Gewoon is een tijger."
    Dr.Ruud, May 7, 2006
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Replies:
    2
    Views:
    767
    Filip Larsen
    Apr 10, 2007
  2. Yannick Turgeon

    App getting bigger and bigger

    Yannick Turgeon, Oct 13, 2003, in forum: Perl Misc
    Replies:
    1
    Views:
    133
    Yannick Turgeon
    Oct 14, 2003
  3. Jack
    Replies:
    2
    Views:
    279
    Tad McClellan
    Oct 4, 2006
  4. C3
    Replies:
    1
    Views:
    102
    Tad McClellan
    Oct 30, 2006
  5. cibalo
    Replies:
    1
    Views:
    178
    Damien Wyart
    May 17, 2013
Loading...

Share This Page