regular expression too big


brabuhr

I built a local version of Ruby 1.8.5 with the Oniguruma regex engine:
http://raa.ruby-lang.org/project/oniguruma/

Then I re-ran (a slight variation of) my test program against it, and thought I'd try running it under JRuby too. The stock build, the Oniguruma build, and JRuby, in that order:

$ ruby long_regex_test.rb
Took 0.000153 seconds to convert 1 words into a regex 17 bytes long.
Took 0.000381 seconds to convert 2 words into a regex 20 bytes long.
Took 0.000393 seconds to convert 4 words into a regex 36 bytes long.
Took 0.000629 seconds to convert 8 words into a regex 93 bytes long.
Took 0.001359 seconds to convert 16 words into a regex 180 bytes long.
Took 0.002261 seconds to convert 32 words into a regex 360 bytes long.
Took 0.007304 seconds to convert 64 words into a regex 741 bytes long.
Took 0.013601 seconds to convert 128 words into a regex 1348 bytes long.
Took 0.028273 seconds to convert 256 words into a regex 2746 bytes long.
Took 0.066228 seconds to convert 512 words into a regex 5345 bytes long.
Took 0.177105 seconds to convert 1024 words into a regex 10017 bytes long.
Took 0.330573 seconds to convert 2048 words into a regex 19597 bytes long.
Took 1.390542 seconds to convert 4096 words into a regex 37345 bytes long.
long_regex_test.rb:26:in `match': regular expression too big:
/(?:A(?:cr(?:edula|opora)|d(?:ar|elochorda|ventis[mt])|frogaean|hepatokla|ileen|l(?:adinist|l(?:a(?:manda|sch)|otheria)|ticamelus)|m(?:bystomidae|ericanly|ioidei|phioxidae)|n(?:chisaurus|d(?:aman|romache)|olympiad|t(?:echinomys|h(?:ophila|ropozoic)))|patornis|r(?:ab|chelenis|istarch)|s(?:caridia|elli|hantee|ilidae|terias)|tropa|u(?:riculidae|stroasiatic))|B(?:a(?:cchus|eria|haism|iera|k(?:shaish|wiri)|re|sili(?:ca|scus))|e(?:atrice|l(?:g(?:ae|ic)|shazzaresque)|mbex|rn(?:inesque|oullian))|i(?:elid|lati|smarck|tis)|lackfoot|o(?:hemia|llandist|rrovian)|ra(?:m|nchiopulmonata)|u(?:nga|phthalmum)|yroni(?:cs|te))|C(?:a(?:ctales|l(?:edonia|li(?:carpa|stephus)|ochortaceae|vados|ycophorae)|m(?:bodian|orra)|ntabri|p(?:ito(?:line)?|sidae)|r(?:olan|tist)|s(?:sandra|tanospermum)|thari)|e(?:ntrarchidae|strian)|h(?:arontas|e(?:lura|makuan)|rist(?:ianomastix|li(?:keness|ness)|mas))|lathrus|o(?:bleskill|fane|l(?:letidae|ossian)|m(?:melinaceae|us)|rybantic)|rocus|u(?:cumariidae|thbert)|y(?:clospondy
(RegexpError)
from long_regex_test.rb:26
from long_regex_test.rb:15:in `times'
from long_regex_test.rb:15

$ /opt/ruby/v1.8.5-oniguruma/bin/ruby long_regex_test.rb
Took 0.000211 seconds to convert 1 words into a regex 5 bytes long.
Took 0.000334 seconds to convert 2 words into a regex 24 bytes long.
Took 0.000215 seconds to convert 4 words into a regex 52 bytes long.
Took 0.000836 seconds to convert 8 words into a regex 92 bytes long.
Took 0.000885 seconds to convert 16 words into a regex 173 bytes long.
Took 0.002779 seconds to convert 32 words into a regex 345 bytes long.
Took 0.004934 seconds to convert 64 words into a regex 725 bytes long.
Took 0.009765 seconds to convert 128 words into a regex 1369 bytes long.
Took 0.020761 seconds to convert 256 words into a regex 2737 bytes long.
Took 0.088759 seconds to convert 512 words into a regex 5408 bytes long.
Took 0.144276 seconds to convert 1024 words into a regex 10131 bytes long.
Took 0.246762 seconds to convert 2048 words into a regex 19531 bytes long.
Took 0.667575 seconds to convert 4096 words into a regex 37498 bytes long.
Took 1.677037 seconds to convert 8192 words into a regex 71352 bytes long.
Took 2.971277 seconds to convert 16384 words into a regex 133499 bytes long.
Took 6.078681 seconds to convert 32768 words into a regex 245318 bytes long.
Took 13.001538 seconds to convert 65536 words into a regex 433611 bytes long.
Took 26.791838 seconds to convert 131072 words into a regex 713229 bytes long.
Took 47.691109 seconds to convert 262144 words into a regex 1061186 bytes long.
Took 71.050324 seconds to convert 524288 words into a regex 1354567 bytes long.

$ export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Home
$ ~/Desktop/jruby-0.9.1/bin/jruby long_regex_test.rb
Took 0.032 seconds to convert 1 words into a regex 9 bytes long.
Took 0.012 seconds to convert 2 words into a regex 18 bytes long.
Took 0.624 seconds to convert 4 words into a regex 40 bytes long.
Took 0.033 seconds to convert 8 words into a regex 95 bytes long.
Took 0.095 seconds to convert 16 words into a regex 156 bytes long.
Took 0.057 seconds to convert 32 words into a regex 358 bytes long.
Took 0.171 seconds to convert 64 words into a regex 743 bytes long.
Took 0.309 seconds to convert 128 words into a regex 1402 bytes long.
Took 0.40900000000000003 seconds to convert 256 words into a regex 2692 bytes long.
Took 1.863 seconds to convert 512 words into a regex 5341 bytes long.
Took 0.838 seconds to convert 1024 words into a regex 10328 bytes long.
Took 1.504 seconds to convert 2048 words into a regex 19733 bytes long.
Took 2.814 seconds to convert 4096 words into a regex 37334 bytes long.
Took 8.177 seconds to convert 8192 words into a regex 71593 bytes long.
Took 15.181000000000001 seconds to convert 16384 words into a regex 133779 bytes long.
Took 30.695 seconds to convert 32768 words into a regex 244280 bytes long.
Took 61.555 seconds to convert 65536 words into a regex 432751 bytes long.
Took 155.94400000000002 seconds to convert 131072 words into a regex 713573 bytes long.
Took 224.93 seconds to convert 262144 words into a regex 1060079 bytes long.
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
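
The test program itself is not reproduced in this post. For readers who want to try something similar, here is a minimal sketch of that kind of benchmark. It is not the original long_regex_test.rb: the helpers build_trie and trie_to_regex are hypothetical names, the trie-based conversion is only a guess based on the shape of the pattern in the error message, and it is written for a current Ruby rather than 1.8.5. It also does not collapse single-letter alternatives into character classes such as [mt], so byte counts will differ.

```ruby
require 'benchmark'

# Hypothetical helper: store each word in a nested-hash trie.
# The :end key marks a node where a complete word terminates.
def build_trie(words)
  trie = {}
  words.each do |word|
    node = trie
    word.each_char { |ch| node = (node[ch] ||= {}) }
    node[:end] = true
  end
  trie
end

# Hypothetical helper: turn a trie node into regex source text.
# Shared prefixes are emitted once; branches become (?:a|b) groups,
# and a group is made optional when a word also ends at this node.
def trie_to_regex(node)
  children = node.keys.reject { |k| k == :end }.sort
  return "" if children.empty?

  alts = children.map { |ch| Regexp.escape(ch) + trie_to_regex(node[ch]) }
  terminal = node.key?(:end)
  return alts.first if alts.size == 1 && !terminal

  group = "(?:#{alts.join('|')})"
  terminal ? "#{group}?" : group
end

# Tiny sample taken from words visible in the error message; a real run
# would load a large dictionary file instead.
words = %w[Acredula Acropora Adar Adventism Adventist Bacchus Baeria Basilica]

count = 1
while count <= words.size
  sample = words.first(count)
  source = nil
  elapsed = Benchmark.realtime { source = trie_to_regex(build_trie(sample)) }
  Regexp.new("\\A(?:#{source})\\z") # compile to confirm the engine accepts it
  puts "Took %.6f seconds to convert %d words into a regex %d bytes long." %
       [elapsed, count, source.bytesize]
  count *= 2
end
```

Doubling the word count on each pass mirrors the progression in the output above, which makes it easy to see roughly where a given interpreter gives up.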
 

x1

#would_never_do.rb
matched = `perl -e 'if ("Hello" =~ m/(Hello)/) {print "true";}'`
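
Short of shelling out, one way to stay inside Ruby when the engine rejects a pattern is to rescue RegexpError and fall back to several smaller patterns. This is only a sketch under assumptions: compile_in_chunks and match_any? are hypothetical helper names, and the probe match is there because the backtrace earlier in the thread shows the error surfacing in `match` rather than at construction.

```ruby
# Hypothetical helper: compile a word list into one or more anchored regexes.
# If the engine rejects the pattern, split the list in half and retry each half.
def compile_in_chunks(words)
  return [] if words.empty?

  source = words.map { |w| Regexp.escape(w) }.join('|')
  regex = Regexp.new("\\A(?:#{source})\\z")
  regex.match("") # probe, in case the engine only complains at match time
  [regex]
rescue RegexpError
  raise if words.size == 1 # a single failing word cannot be split further
  half = words.size / 2
  compile_in_chunks(words[0...half]) + compile_in_chunks(words[half..-1])
end

# Hypothetical helper: true if any of the chunked regexes matches the string.
def match_any?(regexes, string)
  regexes.any? { |re| re.match(string) }
end

regexes = compile_in_chunks(%w[Bacchus Baeria Basilica])
puts match_any?(regexes, "Baeria") # prints true
puts match_any?(regexes, "Perl")   # prints false
```

The trade-off is that a lookup may have to try several patterns instead of one, so this only helps when hitting the size limit is the actual problem.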


