regex, number of matches

Dr.Ruud · Sep 25, 2005

Abigail wrote:

[$&]
it's not easy to determine those special circumstances.

How about something like '\&' that can only be used in the replace-part?

Dr.Ruud · Sep 25, 2005

Brian Wakem wrote:

The fastest way is to substitute a match with itself.

Not for counting random words: the destructive
substitution 's/\S+/./g' is faster than 's/(\S+)/\1/'
(and even about 50% faster than the '++ while /\S+/g').

list => '$list = 0; $list += () = (/\S+/g) for @data;',
sub => '$sub = 0; $sub += s/(\S+)/$1/g for @data;',
sub2 => '$sub2 = 0; $sub2 += s/(\S+)/\1/g for @data;',
sub3 => '$sub3 = 0; $sub3 += s/\S+/./g for @data;',
while => '$while = 0; do {$while++ while /\S+/g} for @data;',

         Rate   sub  sub2  list while  sub3
sub    5318/s    --   -2%  -24%  -52%  -68%
sub2   5420/s    2%    --  -22%  -51%  -68%
list   6990/s   31%   29%    --  -37%  -58%
while 11120/s  109%  105%   59%    --  -34%
sub3  16809/s  216%  210%  140%   51%    --
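
For anyone who wants to rerun the comparison, here is a minimal sketch using the core Benchmark module. The sample @data and the count_* sub names are made up for illustration (the original data set was not posted), and the two substitution variants work on a fresh copy each time, so the copying cost is included in their timings. The '\1' variant is left out because it only differs from '$1' by a warning.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample data: 200 lines of 10 words each.
my @data = map { "the quick brown fox jumps over the lazy dog $_\n" } 1 .. 200;

# List-context match, counted via the empty-list assignment.
sub count_list  { my $n = 0; $n += () = /\S+/g for @data; return $n }

# Substitute each match with itself; s///g returns the number of substitutions.
sub count_sub   { my @copy = @data; my $n = 0; $n += s/(\S+)/$1/g for @copy; return $n }

# Destructive substitution, run on a copy so every iteration sees full-size lines.
sub count_sub3  { my @copy = @data; my $n = 0; $n += s/\S+/./g for @copy; return $n }

# Scalar-context //g, one increment per match.
sub count_while { my $n = 0; do { $n++ while /\S+/g } for @data; return $n }

# Run each variant for at least one CPU second and print the comparison table.
cmpthese(-1, {
    list  => \&count_list,
    sub   => \&count_sub,
    sub3  => \&count_sub3,
    while => \&count_while,
});
```

The rates will differ from the table above, since they depend on the data, the perl version and the machine.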

John Bokma · Sep 25, 2005

Dr.Ruud said:
Dr.Ruud wrote:

Looks good:
my $foo = +/.../; # numeric context, return count of matches

Yup, I would prefer that over the t thingy
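
For comparison, this is what existing Perl 5 offers instead of the proposed numeric-context syntax. It is a small sketch with a made-up $text: a match in scalar context only reports success, and the count comes from assigning the list-context match to an empty list.

```perl
use strict;
use warnings;

my $text = "one two  three\n";   # hypothetical input

# Scalar context: a plain match only says whether it matched.
my $matched = $text =~ /\S+/;

# List context under /g returns all matches; assigning them to ()
# and reading that assignment in scalar context yields their number.
my $count = () = $text =~ /\S+/g;

print "matched=$matched count=$count\n";   # prints matched=1 count=3
```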

John Bokma · Sep 25, 2005

Dr.Ruud said:
Abigail wrote:

[$&]
it's not easy to determine those special circumstances.

How about something like '\&' that can only be used in the replace-part?

Just because you don't want to use () in the regexp part?

Dr.Ruud · Sep 25, 2005

John Bokma wrote:

Dr.Ruud:

Yup, I would prefer that over the t thingy

See Apocalypse-5:
http://www.perl.com/pub/a/2002/06/04/apo5.html?page=18
"If it turns out we do need an option, it'll probably be: n."

Dr.Ruud · Sep 25, 2005

John Bokma wrote:

Dr.Ruud:

Abigail:

[$&]
it's not easy to determine those special circumstances.

How about something like '\&' that can only be used in the
replace-part?

Just because you don't want to use () in the regexp part?

The answer is in "can only be used in the replace-part".

With () you need $1, and that involves setting up a variable.

So no, not only because I etc.
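
To make the alternatives concrete, here is a sketch of the self-substitution written three ways. The first two are what the thread discusses. The third uses the /p modifier and ${^MATCH}, which were only added in Perl 5.10, after this thread, so it is not something the posters had available. The $line value and variable names are made up.

```perl
use strict;
use warnings;

my $line = "count the words here";   # hypothetical input

# 1. Capture group plus $1: the group has to be recorded for every match.
my $copy1 = $line;
my $n1 = ($copy1 =~ s/(\S+)/$1/g);

# 2. No parentheses, using $& for the whole match.
my $copy2 = $line;
my $n2 = ($copy2 =~ s/\S+/$&/g);

# 3. Perl 5.10 and later: /p asks for the match to be preserved in ${^MATCH}.
my $copy3 = $line;
my $n3 = ($copy3 =~ s/\S+/${^MATCH}/gp);

# Each substitution returns the number of matches and leaves the text unchanged.
print "$n1 $n2 $n3\n";
```

Which variant is cheapest depends on the perl version, so it is worth benchmarking rather than assuming.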

John Bokma · Sep 26, 2005

Dr.Ruud said:
John Bokma wrote:

See Apocalypse-5:
http://www.perl.com/pub/a/2002/06/04/apo5.html?page=18
"If it turns out we do need an option, it'll probably be: n."

Thanks. I have had reading "Perl 6 info" on my list for ages now; time to
start reading.

John Bokma · Sep 26, 2005

Dr.Ruud said:
John Bokma wrote:

Dr.Ruud:

Abigail:

[$&]
it's not easy to determine those special circumstances.

How about something like '\&' that can only be used in the
replace-part?

Just because you don't want to use () in the regexp part?

The answer is in "can only be used in the replace-part".

Yes, I understand that part

Regexes are already extremely complex (I often have to look things up),
and adding more clutter doesn't seem to be the answer (to me, that is).

How often do you need the \& thing?

With () you need $1, and that involves setting up a variable.

And what is \&? And where is the info it contains stored?

Dr.Ruud · Sep 26, 2005

Abigail wrote:

Dr.Ruud:

That's not a fair benchmark. After the first iteration, all the
sequences of non-space characters have been collapsed to single
characters, reducing the sizes of the strings to match against. And
that's of course faster. It will also influence all tests run after
the 'sub3' test.

Aaargh. The destructive sub was run on a copy of @data, to prevent any
influence on following runs; sorry for not making that clear.
But I hadn't thought of the side effects across iterations.

a way to speed up the counting of words,
and that involves entering the regexp engine once,
instead of once for each line: [...]

while => '$while = 0; do {$while ++ while /\S+/g} for @data;',
join => '$join = 0; my $c = join "" => @data;
$join ++ while $c =~ /\S+/g', [...]

while 5019/s 142% 74% 71% -- -16%
join 5973/s 188% 107% 103% 19% --

The joining "" needs to be a " " for lines that don't have leading or
trailing whitespace.
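
A small sketch of why the separator matters, using two made-up lines that have no leading or trailing whitespace: joining on "" glues the last word of one line to the first word of the next, so one word goes missing from the count.

```perl
use strict;
use warnings;

# Hypothetical lines without newlines or surrounding whitespace.
my @data = ("alpha beta", "gamma delta");

my $glued  = join "",  @data;   # "alpha betagamma delta"
my $spaced = join " ", @data;   # "alpha beta gamma delta"

my $n_glued  = () = $glued  =~ /\S+/g;
my $n_spaced = () = $spaced =~ /\S+/g;

print "glued=$n_glued spaced=$n_spaced\n";   # prints glued=3 spaced=4
```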

Since we seem to be getting into a word-counting contest, consider this:

split => '$split = 0; $split += split for @data'

(but I didn't check the benchmarking side effects, again)
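
Spelled out, the split approach could look like the sketch below. It splits into a named array instead of relying on split in scalar context, since the scalar-context form had side effects on the perls of that era and behaves differently on later ones. The sample lines are made up.

```perl
use strict;
use warnings;

# Hypothetical lines, including leading whitespace and an empty line.
my @data = ("  the quick brown\n", "fox\n", "\n");

my $words = 0;
for my $line (@data) {
    # split ' ' skips leading whitespace and splits on runs of whitespace,
    # so the fields are exactly the \S+ words of the line.
    my @fields = split ' ', $line;
    $words += @fields;
}

# Cross-check against the regex count.
my $by_regex = 0;
$by_regex += () = /\S+/g for @data;

print "split=$words regex=$by_regex\n";   # prints split=4 regex=4
```

Note that '() = split ...' is not a safe shortcut here: when split is assigned to a list, perl derives a field limit from the number of variables on the left, so the empty-list trick that works for matches undercounts with split.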

Dr.Ruud · Sep 26, 2005

John Bokma wrote:

And what is \&?

A message for the regex engine.

And where is the info it contains stored?

Not in something accessible on the 'Perl-level'.

Dr.Ruud · Sep 26, 2005

Abigail:

Dr.Ruud:

All lines end with a newline, so the space is not needed.

Maybe you changed the
BEGIN {@data = split /\n/ => <<'--';
to
BEGIN {@data = <<'--';
and I didn't notice.

That doesn't modify the data, so that's fine.

OK. I think it changes somebody else's @_, but inside an eval that
should not be a problem.

It's even faster.

Yep.

Anno Siegel · Sep 27, 2005

Dr.Ruud said:
Abigail wrote:

[$&]
it's not easy to determine those special circumstances.

How about something like '\&' that can only be used in the replace-part?

In view of the (?{ ...}) and (??{ ...}) constructs the string could
change even then.

Anno
