Don't know what is slowing down my program?


Harry

Hello,

I am trying to match tags in an html file, roughly as follows:

1
2  # Global variable.
3  my $lines = "slurped file content here";
4
5  sub rule_for_tag_a {
6
7      pos($lines) = 0;
8
9      while ($lines =~ m@ complex_pattern_without_any_\G_anchors @gsix) {
10
11         if (some_condition) {
12
13             # pos($lines) == x at this point.
14
15             # Retrace / backtrack the pos by a small enough y, where y >= 0.
16             pos($lines) = x - y;
17
18             next; # <--- STEPPING OVER THIS BECOMES **VERY** SLOW AFTER SOME TIME!
19         }
20
21         adhoc_processing_for_tag_a();
22     }
23 }
24 ...
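For experimenting outside the real program, here is a minimal runnable sketch of the loop above. Everything specific in it is a hypothetical placeholder for the unspecified originals: the tag pattern `<a\b([^>]*)>`, the "no href" condition, the rewind distance y = 1, and the @processed array standing in for adhoc_processing_for_tag_a().

```perl
use strict;
use warnings;

# Hypothetical stand-in for the slurped HTML file.
my $lines = '<a name="x"><b>bold</b><a href="y">link</a>';

my @processed;    # stand-in for adhoc_processing_for_tag_a()

sub rule_for_tag_a {
    pos($lines) = 0;    # start scanning from the beginning of the string

    while ($lines =~ m@<a\b([^>]*)>@gsix) {
        my $attrs = $1;
        my $x     = pos($lines);    # offset just past the match

        # Hypothetical condition: skip anchors without an href, rewinding
        # by y = 1 so the next search restarts inside the tag just seen.
        if ($attrs !~ /\bhref\b/i) {
            my $y = 1;
            pos($lines) = $x - $y;
            next;    # jumps back to the while condition: the match runs again
        }

        push @processed, $attrs;
    }
}

rule_for_tag_a();
print "$_\n" for @processed;
```

Because `next` goes straight back to the `while` condition, every rewind is followed immediately by another regex search starting at the new pos().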

There are rules for other tags ('b', 'c', 'd', etc.) that are *very*
similar to the rule for tag 'a' (you can trust me on this one); they
differ only in the
'adhoc_processing_for_tag_*()'
subroutines.

I call these rules one after the other as follows:

25 rule_for_tag_d ();
26 rule_for_tag_c ();
27 rule_for_tag_b ();
28 rule_for_tag_a ();
29 # Everything runs very slowly now and then from this point on!
30 ...

Now, what I'm noticing is that, after several of the tag rules (for,
let's say, tags 'd', 'c', and 'b') have run at the usual (and
expected) very high speed, something suddenly causes the program to
slow down substantially! I have only been able to narrow the problem
down to one particular statement, the 'next;' statement on line 18:
stepping over line 18 and arriving at line 11 takes longer than
expected (about 2 to 3 seconds, which is a lot compared to other
iterations)! The next 2 or 3 iterations after the slowdown run fine
before the slowdown surfaces again. This slowdown-fine-slowdown-fine
drama continues from this point on until the end of the program.
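In Perl, `next` inside this loop jumps back to the `while` condition, so stepping over line 18 in the debugger also executes the next regex match. The 2 to 3 seconds attributed to `next` are therefore most likely spent inside that match attempt. A minimal sketch for checking this, assuming Time::HiRes (a core module) and a hypothetical pattern, input, and 0.5 s threshold, times each match attempt separately and records where slow searches started:

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical input: 2000 small anchor tags.
my $lines = join '', map { qq{<a href="p$_.html">link $_</a>\n} } 1 .. 2000;

my $threshold = 0.5;    # seconds; hypothetical cut-off for a "slow" match
my @slow;               # [starting offset, elapsed seconds] per slow attempt
my $matches = 0;

pos($lines) = 0;
while (1) {
    my $start = pos($lines) // 0;                # where this search begins
    my $t0    = time;
    my $found = $lines =~ m@<a\b[^>]*>@gsix;     # scalar-context //g match
    my $spent = time - $t0;

    push @slow, [$start, $spent] if $spent > $threshold;
    last unless $found;
    $matches++;

    # The rule's condition, pos() rewind and `next` would go here;
    # `next` returns to the top of this loop, so the rematch is timed too.
}

printf "%d matches, %d slow attempts\n", $matches, scalar @slow;
printf "slow search from offset %d took %.2fs\n", @$_ for @slow;
```

If the slow attempts cluster at particular offsets, the input near those offsets is a good place to look for what makes the pattern expensive.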

Could it be that the complexity of my regex pattern and/or the nature
of the input data ($lines) is causing Perl's Garbage Collector to
suddenly kick in?

Don't know what else to try now?
/HS
 

A. Sinan Unur

> Hello,
>
> I am trying to match tags in an html file, roughly as follows:

You give no information of value.

What purpose do the line numbers below serve but to create clutter?
> 2  # Global variable.
> 3  my $lines = "slurped file content here";
> 4
> 5  sub rule_for_tag_a {
> 6
> 7      pos($lines) = 0;
> 8
> 9      while ($lines =~ m@ complex_pattern_without_any_\G_anchors @gsix) {
> 10
> 11         if (some_condition) {
> 12
> 13             # pos($lines) == x at this point.
> 14
> 15             # Retrace / backtrack the pos by a small enough y, where y >= 0.
> 16             pos($lines) = x - y;
> 17
> 18             next; # <--- STEPPING OVER THIS BECOMES **VERY** SLOW AFTER SOME TIME!
> 19         }

Maybe it is still lightning fast but it is happening waaaaaaaayyyy too
many times?
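Whether each pass is fast but runs far more often than intended can be checked by counting. The sketch below is hypothetical (pattern, condition, input and the 10,000-iteration safety cap are placeholders) and compares a rewind of one character with a rewind by the full length of the match. In the second case pos() lands back at the start of the tag just found, the same tag matches again, the condition is still true, and the loop never advances.

```perl
use strict;
use warnings;

my $lines = '<a name="top"><a href="x">';    # hypothetical input

# Runs the rewind loop once and counts how often each branch fires.
# A hypothetical cap stops a runaway loop so it can be reported.
sub scan_with_counters {
    my ($rewind_whole_match) = @_;
    my %count = (matches => 0, rewinds => 0, runaway => 0);
    my $cap   = 10_000;

    pos($lines) = 0;
    while ($lines =~ m@<a\b([^>]*)>@gsix) {
        my $attrs = $1;
        # y is either 1 or the full length of the match just found;
        # @- and @+ still describe that match at this point.
        my $y = $rewind_whole_match ? $+[0] - $-[0] : 1;

        if (++$count{matches} > $cap) {
            $count{runaway} = 1;
            last;
        }
        if ($attrs !~ /\bhref\b/i) {            # hypothetical condition
            $count{rewinds}++;
            pos($lines) = pos($lines) - $y;     # x - y
            next;
        }
        # adhoc processing would go here
    }
    return \%count;
}

my $small = scan_with_counters(0);
my $full  = scan_with_counters(1);
printf "y = 1:         %d matches, %d rewinds\n", @$small{qw(matches rewinds)};
printf "y = match len: %d rewinds, runaway = %d\n", @$full{qw(rewinds runaway)};
```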
> 21         adhoc_processing_for_tag_a();

Not very illuminating. This really does not help us help you.
> Could it be that the complexity of my regex pattern and/or the nature
> of the input data ($lines) is causing Perl's Garbage Collector to
> suddenly kick in?

I don't think the garbage collector works that way in Perl. AFAIK, it is
a simple reference counting scheme.
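Perl does manage memory by reference counting: a value is freed, and an object's DESTROY method runs, at the moment its last reference disappears, not in a separate collector pass that could start at an arbitrary time. (Circular references are the known exception; reference counting alone never frees them.) A minimal sketch with a hypothetical Tracker class shows when destruction happens:

```perl
use strict;
use warnings;

package Tracker;
my @log;    # file-scoped lexical: records events in order
sub new     { my ($class, $name) = @_; return bless { name => $name }, $class }
sub DESTROY { my ($self) = @_; push @log, "destroyed $self->{name}" }

package main;

{
    my $obj  = Tracker->new('first');
    my $copy = $obj;    # two references to the same object
    undef $obj;         # one reference left: nothing is freed yet
    push @log, 'after undef';
}                       # $copy goes out of scope: last reference gone, DESTROY runs
push @log, 'after block';

print "$_\n" for @log;
```

The log shows "destroyed first" between "after undef" and "after block", i.e. exactly at scope exit.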

I would recommend that you adopt a proper HTML parser. Given the
structure of your code, HTML::TokeParser may be particularly
appropriate.
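A rough sketch of the HTML::TokeParser approach, assuming the HTML-Parser distribution from CPAN is installed; the sample HTML and the "has an href" condition are hypothetical. get_tag('a') skips ahead to the next `<a>` start tag, which takes over the job of the hand-written pattern and the pos() bookkeeping:

```perl
use strict;
use warnings;
use HTML::TokeParser;    # from the HTML-Parser distribution on CPAN

my $html = '<p><a href="one.html">One</a> <b>bold</b> <a name="x">anchor</a></p>';

# new() accepts a filename, a filehandle, or a reference to a string.
my $p = HTML::TokeParser->new(\$html) or die "cannot create parser";

my @hrefs;
# get_tag('a') returns the next <a> start tag as [$tag, \%attr, \@attrseq, $text].
while (my $token = $p->get_tag('a')) {
    my $attr = $token->[1];
    next unless defined $attr->{href};        # hypothetical condition
    my $text = $p->get_trimmed_text('/a');    # text up to the closing </a>
    push @hrefs, "$attr->{href} => $text";
}

print "$_\n" for @hrefs;
```

Each rule_for_tag_* routine could then create its own parser over the same string instead of sharing pos($lines).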
> Don't know what else to try now?

First, read the posting guidelines for this group to find out how to
help others help you.

Sinan

--
A. Sinan Unur

comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/
 

Jürgen Exner

Harry said:
> I am trying to match tags in an html file, roughly as follows:

Which is A Bad Idea(TM).
[...]
> Don't know what else to try now?

Use a tool that is meant to parse HTML; REs are not.
There are several good HTML parsers on CPAN.

jue
 
