# Approach to match segments, opinions welcome

Discussion in 'Perl Misc' started by Frank Maas, Aug 21, 2003.

1. ### Frank MaasGuest

Hi Perl Group,

I am fussing about a problem and I would very much like some input of
others. I'll explain the problem:

given a list of numbers 1,2,3,15,13,19,5,6
and a set of data triples (1,15,9),(15,13,2),(13,6,4),(20,23,99)

I am looking for a fast algorithm that would convert the list into
the sum of 9, 2 and 4. To explain some more: the triples in effect
are the begin and endpoint in the list and the data that is valid
for that subsegment. No triple contains overlap, but for the begin
and end. There are more triples than subsegments and it is possible
that a subsegment cannot be found. In that case something special
must be done.

I have tried two approaches:
(1) use a series of regexp created out of the triple list (+/ 1000)
and match this with the list:

\$list="1:2:3:15:13:19:5:6"; \$d=0;
for (@triples) {
if (\$list =~ s/(?:^|(\$_->[0])?:.+?(\$_->[1])(?::|\$)/:\$1:-:\$2:/g)
{
\$d+=\$_->[2]
}
}
\$list=~s/:\d+-|\$)//;

(2) walk through the list (converted to an array) and find matches

%triples=( '1:15'=>9, ...);
@list=qw(1 2 3 15 13 19 5 6 X); \$d=0;
for \$i (shift @list) {
for \$j (shift @list) {
exit if \$j eq 'X';
\$d+=\$triples{"\$i:\$j"}, last if exists \$triples{"\$i:\$j"};
}
}
print "success \$d\n";

Both work (perhaps I have oversighted an error in the sample code
above), but a tad bit slow. Since this is must once be part of a
heavy webserver any time optimisation is very helpful. Any ideas?

--Frank

Frank Maas, Aug 21, 2003

2. ### Steven KuoGuest

On Thu, 21 Aug 2003, Steven Kuo wrote:

(snippped)

> I think the second approach is better. You've not mentioned what
> should be done if the first number in the list has no stopping
> delimiter. Do you stop? Or do you attempt to find a valid starting
> point? Assuming the former, here's what I'd do:
>
> #!/usr/local/bin/perl
>
> use strict;
> use warnings;
> my @triples = ( [1,15,9],[15,13,2],[13,6,4],[20,23,99]);
> my %find;
>
> for (@triples) {
> \$find{\$_->[0]} = {
> stop => \$_->[1],
> };
> }
>
> my @list = qw(1 2 3 15 13 19 5 6);
> my \$save = my \$start = shift @list;
> my \$stop;
>
> unless (\$start and exists \$find{\$start}->{stop}) {

^^^^^^^

I suppose this should be

unless (defined (\$start) and exists \$find{\$start}->{stop}) {

--
Cheers,
Steven

Steven Kuo, Aug 21, 2003

3. ### Frank MaasGuest

Hi Steven,

> You're using certain numbers to delimit a list of numbers?

I am not exactly sure what you mean by this. If your question
was to what I am doing exactly: this is about calculating travel
times in a network of nodes. Where the string is the node list and
the triples are travel times between two nodes.

> I think the second approach is better.

Yes... When I wrote this I had an error in the code that I found later.
The 2nd approach is way (way!) faster.

> You've not mentioned what
> should be done if the first number in the list has no stopping
> delimiter. Do you stop? Or do you attempt to find a valid starting
> point? Assuming the former, here's what I'd do:

> # Loop once and exit as soon as possible -- should be faster than
> # what you had previously written.

I can use this, but not exactly your solution. What I did not write
is that it is possible to have two triples with the same "start".
Your solution does not accommodate this. But the one time loop (w/o
shifting) seems promising.

Thanks!

Frank

Frank Maas, Aug 21, 2003