I am trying to write a simple script to parse
an Apache log file, but it takes an extremely
long time. I ran it under profile, and the time
appears to go to the regular expression matcher.
To isolate this, I wrote a small script that runs
the match against lines of different lengths; the
matcher gets very slow as the line grows.
See the attached script.
Here are some timings:
~>time ./mklong.rb 1000
real 0m11.930s
~>time ./mklong.rb 2000
real 0m50.400s
~>time ./mklong.rb 3000
real 1m55.693s
~>time ./mklong.rb 4000
real 3m16.004s
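So, dividing the repetition count by the elapsed seconds:
1000/11.9 is about 84
2000/50.4 is about 40
3000/115.7 is about 26
4000/196.0 is about 20
So throughput drops as the input grows, and the matching
appears to be much slower than O(n). The times grow by
roughly 4x, 10x and 16x for 2x, 3x and 4x the input,
which looks closer to O(n^2).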
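To double-check the arithmetic, here is a small Ruby sketch that recomputes the throughput from the timings above and compares the measured slowdown with what linear and quadratic growth would predict. The names TIMINGS and growth_table are made up for this illustration; only the numbers come from the runs above.

```ruby
# Timings copied from the runs above: repetitions => wall-clock seconds.
TIMINGS = {
  1000 => 11.930,
  2000 => 50.400,
  3000 => 115.693, # 1m55.693s
  4000 => 196.004  # 3m16.004s
}.freeze

# For each run, report throughput and the slowdown relative to the first
# (smallest) run, next to the slowdown that O(n) and O(n^2) would predict.
def growth_table(timings)
  base_n, base_t = timings.first
  timings.map do |n, t|
    scale = n.to_f / base_n
    {
      n: n,
      per_second: (n / t).round(1),
      measured: (t / base_t).round(2),
      linear: scale.round(2),
      quadratic: (scale**2).round(2)
    }
  end
end

growth_table(TIMINGS).each do |row|
  puts format('%5d reps  %5.1f reps/s  measured x%.2f  linear x%.2f  quadratic x%.2f',
              row[:n], row[:per_second], row[:measured], row[:linear], row[:quadratic])
end
```

If the measured column tracks the quadratic column much more closely than the linear one, that would support the impression that the cost grows with the square of the line length.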
Isn't the whole point of regular expressions to be
fast and O(n)?
Whenever my script encounters a long string,
it grinds to a halt.
Why is this?
Did I make the regular expression correctly?
Is there some way to optimize it?
Is there a problem with the matcher?
Any help would be appreciated.
Attachment: mklong.rb
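#!/usr/bin/ruby
# Build one Apache log line whose request field is padded ARGV[0] times.
str = '67.39.177.137 - - [05/Jun/2004:12:54:44 -0500] "SEARCH '
# Single quotes: this appends the literal 8-character text \xb1\x02, not two raw bytes.
str << '\xb1\x02' * ARGV[0].to_i
str << '" 414 326 "-" "-"'
# Pattern for one log line (11 capture groups).
logformat = /(\S+)\s+(\S+)\s+(.+)\s+\[([^\]]+)\]\s+"(\S+) +([^"]+) +[A-Za-z\/]*([0-9.]+)"\s+(\S+)\s+(\S+)\s+"([^"]+)"\s+"([^"]+)"/
# This single call is what takes all the time.
str.match(logformat)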
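One variation that may be worth trying, offered as an assumption rather than something measured here: anchor the pattern at the start of the line with \A. An unanchored pattern that fails at offset 0 is retried from every later offset in the string, and each retry can itself scan a long run of non-space characters. The sketch below uses the same pattern with \A added; ANCHORED_LOGFORMAT, build_line and NORMAL_LINE are hypothetical names, and the "GET" line is a made-up example of an ordinary request.

```ruby
# Same pattern as in mklong.rb, with \A added so matching only starts at offset 0.
ANCHORED_LOGFORMAT = /\A(\S+)\s+(\S+)\s+(.+)\s+\[([^\]]+)\]\s+"(\S+) +([^"]+) +[A-Za-z\/]*([0-9.]+)"\s+(\S+)\s+(\S+)\s+"([^"]+)"\s+"([^"]+)"/

PREFIX = '67.39.177.137 - - [05/Jun/2004:12:54:44 -0500] "SEARCH '
SUFFIX = '" 414 326 "-" "-"'

# Build the same test line as mklong.rb for a given repetition count.
def build_line(repeats)
  PREFIX + ('\xb1\x02' * repeats) + SUFFIX
end

# A made-up ordinary request line, to check the anchored pattern still matches.
NORMAL_LINE = '67.39.177.137 - - [05/Jun/2004:12:54:44 -0500] ' \
              '"GET /index.html HTTP/1.1" 200 326 "-" "-"'

m = ANCHORED_LOGFORMAT.match(NORMAL_LINE)
puts "method=#{m[5]} path=#{m[6]} version=#{m[7]} status=#{m[8]}" if m

# The long SEARCH line has no space-separated protocol version before the
# closing quote, so the pattern does not match it at all.
puts ANCHORED_LOGFORMAT.match(build_line(1000)).inspect
```

Note that the long line is a non-match for this pattern whether or not it is anchored, so the time in mklong.rb is spent discovering that there is no match, not extracting fields.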