regex, number of matches


Dr.Ruud

Brian Wakem wrote:
The fastest way is to substitute a match with itself.


Not for counting random words: the destructive
substitution 's/\S+/./g' is faster than 's/(\S+)/\1/g'
(and even 50% faster than '++ while /\S+/g').

list => '$list = 0; $list += () = (/\S+/g) for @data;',
sub => '$sub = 0; $sub += s/(\S+)/$1/g for @data;',
sub2 => '$sub2 = 0; $sub2 += s/(\S+)/\1/g for @data;',
sub3 => '$sub3 = 0; $sub3 += s/\S+/./g for @data;',
while => '$while = 0; do {$while++ while /\S+/g} for @data;',

         Rate   sub  sub2  list while  sub3
sub    5318/s    --   -2%  -24%  -52%  -68%
sub2   5420/s    2%    --  -22%  -51%  -68%
list   6990/s   31%   29%    --  -37%  -58%
while 11120/s  109%  105%   59%    --  -34%
sub3  16809/s  216%  210%  140%   51%    --
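
Before comparing speeds, it may help to confirm that the candidates agree on the count. What follows is only a minimal sketch: the thread's real @data is not shown, so the sample lines are made up, and the sub names are hypothetical. The destructive variant works on a copy so that the input survives.

```perl
use strict;
use warnings;

# Made-up sample lines; the thread's real @data is not shown.
my @data = ("the quick brown fox\n", "  jumps over\n", "the lazy dog\n");

# 'list': a list assignment in scalar context yields the number of
# elements on its right-hand side, i.e. the number of matches.
sub count_list {
    my $n = 0;
    $n += () = /\S+/g for @_;
    return $n;
}

# 'while': step through the matches one at a time in scalar context.
sub count_while {
    my $n = 0;
    for (@_) { $n++ while /\S+/g }
    return $n;
}

# 'sub3': s///g returns the number of substitutions made. It rewrites
# its target, so it runs on a copy of the arguments.
sub count_subst {
    my @copy = @_;
    my $n = 0;
    $n += s/\S+/./g for @copy;
    return $n;
}

# Print each label with the count its sub reports for @data.
printf "%-5s %d\n", $_->[0], $_->[1]->(@data)
    for [list => \&count_list], [while => \&count_while], [subst => \&count_subst];
```

All three should report the same number for the same input; only then is a timing comparison meaningful.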
 

Dr.Ruud

John Bokma wrote:
Dr.Ruud:
Abigail:
[$&]
it's not easy to determine those special circumstances.

How about something like '\&' that can only be used in the
replace-part?

Just because you don't want to use () in the regexp part?

The answer is in "can only be used in the replace-part".

With () you need $1, and that involves setting up a variable.

So no, not only because I etc.
 

John Bokma

Dr.Ruud said:
John Bokma wrote:
Dr.Ruud:
Abigail:

[$&]
it's not easy to determine those special circumstances.

How about something like '\&' that can only be used in the
replace-part?

Just because you don't want to use () in the regexp part?

The answer is in "can only be used in the replace-part".

Yes, I understand that part :) Regexes are already extremely complex (I
often have to look things up), and adding more clutter doesn't seem to
be the answer (to me, that is).

How often do you need the \& thing?
With () you need $1, and that involves setting up a variable.

And what is \&? And where is the info it contains stored?
 

Dr.Ruud

Abigail wrote:
Dr.Ruud:

That's not a fair benchmark. After the first iteration, all the
sequences of non-space characters have been collapsed to single
characters, reducing the sizes of the strings to match against. And that's of
course faster. It will also influence all tests run after the 'sub3'
test.

Aaargh. The destructive sub was run on a copy of @data to prevent any
influence on the following runs; sorry for not making that clear.
But I hadn't thought of the side effects across iterations.
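
The iteration side effect can be seen directly. This is a small sketch with made-up sample lines and hypothetical helper names, not the benchmark from this thread: the second destructive pass over the same copy still finds the same number of words, but in far fewer characters, so it is not doing comparable work.

```perl
use strict;
use warnings;

# Made-up sample lines standing in for the thread's @data.
my @data = ("the quick brown fox\n", "  jumps over\n", "the lazy dog\n");

# Total number of characters in a list of lines.
sub total_length { return length join '', @_ }

# Destructive count: modifies the array it is given (via the reference).
sub count_destructive {
    my ($lines) = @_;
    my $n = 0;
    $n += s/\S+/./g for @$lines;
    return $n;
}

my @copy   = @data;                      # one copy, reused across passes
my $before = total_length(@copy);
my $first  = count_destructive(\@copy);  # sees the full-length words
my $after  = total_length(@copy);
my $second = count_destructive(\@copy);  # sees only single dots

print "pass 1: $first words in $before chars\n";
print "pass 2: $second words in $after chars\n";

# A fairer variant pays for a fresh copy on every call.
sub count_fresh_copy {
    my @fresh = @_;
    return count_destructive(\@fresh);
}
```

Taking a fresh copy inside the timed code, as in count_fresh_copy, keeps every iteration on full-length input, at the price of including the copy in the measurement.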

a way to speed up the counting of words,
and that involves entering the regexp engine once,
instead of once for each line: [...]

while => '$while = 0; do {$while ++ while /\S+/g} for @data;',
join => '$join = 0; my $c = join "" => @data;
$join ++ while $c =~ /\S+/g', [...]

while 5019/s 142% 74% 71% -- -16%
join 5973/s 188% 107% 103% 19% --

The joining "" needs to be a " " for lines that don't have leading or
trailing whitespace.
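
A short sketch of why the separator matters, using made-up lines and a hypothetical count_joined helper: when lines carry no trailing whitespace, joining on "" fuses the last word of one line with the first word of the next.

```perl
use strict;
use warnings;

# Count words in one joined string, so the match loop runs over a
# single scalar. $sep is the separator placed between lines.
sub count_joined {
    my ($sep, @lines) = @_;
    my $text = join $sep, @lines;
    my $n = 0;
    $n++ while $text =~ /\S+/g;
    return $n;
}

my @chomped = ("foo", "bar baz");       # no trailing newline
my @raw     = ("foo\n", "bar baz\n");   # newline still attached

print count_joined("",  @chomped), "\n";  # "foobar baz": foo and bar fuse
print count_joined(" ", @chomped), "\n";  # "foo bar baz"
print count_joined("",  @raw),     "\n";  # newlines already separate words
```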


Since we seem to be getting into a word counting contest ;)
consider this:

split => '$split = 0; $split += split for @data'

(but I didn't check the benchmarking side effects, again)
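
A standalone sketch of the split approach, with made-up sample lines and a hypothetical count_split helper. It splits into a named array instead of calling split in scalar context, because older Perls (before 5.12, as far as I know) also wrote the fields into @_ in that case.

```perl
use strict;
use warnings;

# Made-up sample lines; the thread's real @data is not shown.
my @data = ("the quick brown fox\n", "  jumps over\n", "the lazy dog\n");

sub count_split {
    my $n = 0;
    for my $line (@_) {
        # split ' ' is the awk-style split: it skips leading whitespace
        # and splits on runs of whitespace, so the fields are the words.
        my @fields = split ' ', $line;
        $n += @fields;   # array in numeric context: number of fields
    }
    return $n;
}

print count_split(@data), "\n";
```

The extra array costs something, so this form is likely slower than the one-liner above; it is meant to show the counting, not to win the benchmark.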
 

Dr.Ruud

Abigail:
Dr.Ruud:

All lines end with a newline, so the space is not needed.

Maybe you changed the
BEGIN {@data = split /\n/ => <<'--';
to
BEGIN {@data = <<'--';
and I didn't notice.

That doesn't modify the data, so that's fine.

OK. I think it changes somebody else's @_, but inside an eval that
should not be a problem.

It's even faster.

Yep.
 

Anno Siegel

Dr.Ruud said:
Abigail wrote:
[$&]
it's not easy to determine those special circumstances.

How about something like '\&' that can only be used in the replace-part?

In view of the (?{ ...}) and (??{ ...}) constructs the string could
change even then.

Anno
 
