Creating and outputting a generic data structure?

Chris

Hi all,

I'm having difficulty trying to genericise this problem and
am looking for suggestions to point me in the right direction:

Take the following string:
LVFSGVLPLGTDTQNADI(SF)LWAKSFGAVIASKINM(KNA)TTHLVAGRNRT(DAGV)KVREATRYPKVKIVTTQWLLDCLTQWKRLAEEPYLLP

The letters in brackets are ambiguous positions where any of
the letters are allowed in that position. I want to output all
the allowed combinations of the ambiguous positions, e.g.

LVFSGVLPLGTDTQNADISLWAKSFGAVIASKINMKTTHLVAGRNRTDKVREATRYPKVKIVTTQWLLDCLTQWKRLAEEPYLLP
LVFSGVLPLGTDTQNADISLWAKSFGAVIASKINMNTTHLVAGRNRTAKVREATRYPKVKIVTTQWLLDCLTQWKRLAEEPYLLP
etc.              ^                ^           ^

There can be any number of ambiguous positions with up to 4
allowed letters per position.

The code below correctly outputs all the combinations for
the above example, but I want to make it more generic so it
handles more or fewer ambiguous positions. I would appreciate any
pointers as this has been bugging me for days now :-(
It only outputs the ambiguous letters - I can add in the
rest easily enough later.
TIA
Chris.


#!/usr/bin/perl

use strict;
use warnings;

my $string =
    'LVFSGVLPLGTDTQNADI(SF)LWAKSFGAVIASKINM(KNA)TTHLVAGRNRT(DAGV)KVREATRYPKVKIVTTQWLLDCLTQWKRLAEEPYLLP';

my @ambis;
my $hit = 0;

while ($string =~ /\((\w+)\)/g) {    # extract ambiguous regions
    my @F = split //, $1;

    foreach my $aa (@F) {
        $ambis[$hit]{$aa}++;
    }
    ++$hit;
}

if (scalar @ambis) {    # output all combinations

    # this bit doesn't work as desired :-(
    # my $i = 0;
    # while ($ambis[$i]) {
    #     foreach my $k (keys %{$ambis[$i]}) {
    #         print "$k\n";
    #     }
    #     ++$i;
    # }

    # this works, but only for when there are three ambiguous
    # regions. How can I change it?
    foreach my $k1 (keys %{$ambis[0]}) {
        foreach my $k2 (keys %{$ambis[1]}) {
            foreach my $k3 (keys %{$ambis[2]}) {
                print "$k1$k2$k3\n";
            }
        }
    }
}
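
One way to generalise the three nested loops above without recursion is to grow the list of partial combinations one group at a time. What follows is only a sketch: `combinations` is a hypothetical helper name, not part of the original code, and it takes each group as a plain string instead of the hashes stored in @ambis, so letter order is preserved and repeated letters are not collapsed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given one string of allowed letters per ambiguous
# position, return every combination of one letter from each group.
sub combinations {
    my @groups  = @_;
    my @results = ('');    # a single empty prefix to start from

    for my $group (@groups) {
        my @next;
        for my $prefix (@results) {
            # extend each prefix built so far with every letter of this group
            push @next, $prefix . $_ for split //, $group;
        }
        @results = @next;
    }
    return @results;
}

my $string =
    'LVFSGVLPLGTDTQNADI(SF)LWAKSFGAVIASKINM(KNA)TTHLVAGRNRT(DAGV)KVREATRYPKVKIVTTQWLLDCLTQWKRLAEEPYLLP';

# a global match in list context returns the captured groups directly
my @groups = $string =~ /\((\w+)\)/g;

print "$_\n" for combinations(@groups);
```

The number of nested loops no longer depends on the number of groups, since each pass of the outer loop plays the role of one more level of nesting.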
 
Anno Siegel

Chris said:
Hi all,

I'm having difficulty trying to genericise this problem and
am looking for suggestions to point me in the right direction:

Take the following string:
LVFSGVLPLGTDTQNADI(SF)LWAKSFGAVIASKINM(KNA)TTHLVAGRNRT(DAGV)KVREATRYPKVKIVTTQWLLDCLTQWKRLAEEPYLLP

Your example is exquisitely opaque. Why does it have to be a super-long
string in all upper case? I'll use

'XXX(sf)YYY(kna)ZZZ(dgav)UUU'

instead, which is a lot easier to follow.

The letters in brackets are ambiguous positions where any of
the letters are allowed in that position. I want to output all
the allowed combinations of the ambiguous positions, e.g.

I had to resort to your code to see that "any of the letters" are the
letters inside the parentheses (not any letter). It is also not clear
at this point that each group of parentheses stands for a single letter
in the final string.

LVFSGVLPLGTDTQNADISLWAKSFGAVIASKINMKTTHLVAGRNRTDKVREATRYPKVKIVTTQWLLDCLTQWKRLAEEPYLLP
LVFSGVLPLGTDTQNADISLWAKSFGAVIASKINMNTTHLVAGRNRTAKVREATRYPKVKIVTTQWLLDCLTQWKRLAEEPYLLP
etc.              ^                ^           ^

There can be any number of ambiguous positions with up to 4
allowed letters per position.

The code below correctly outputs all the combinations for
the above example, but I want to make this more generic for
more or less ambiguous positions. I would appreciate any
pointers as this has been bugging me for days now :-(
It only outputs the ambiguous letters - I can add in the
rest easily enough later.

[code snipped]

Here is a recursive solution:

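sub expand {
    my $template = shift;
    # no parenthesised group left: the template is fully expanded
    return $template unless $template =~ /\(([[:alpha:]]*)\)/;
    my @expanded;
    for my $char ( split //, $1 ) {
        push @expanded, $template;
        # replace the whole matched group, parentheses included, with one letter
        substr( $expanded[ -1], $-[ 0], $+[ 0] - $-[ 0]) = $char;
    }
    # expand the remaining groups in each partial result
    return map expand( $_), @expanded;
}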
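
To see the recursion at work, the sub can be called on the short template from earlier in this post. The driver lines after the sub are an illustrative sketch and not part of the original post; `@all` is just a name chosen here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub expand {
    my $template = shift;
    # stop when no parenthesised group is left
    return $template unless $template =~ /\(([[:alpha:]]*)\)/;
    my @expanded;
    for my $char ( split //, $1 ) {
        push @expanded, $template;
        # $-[0] and $+[0] are the start and end offsets of the whole match
        substr( $expanded[-1], $-[0], $+[0] - $-[0] ) = $char;
    }
    return map expand($_), @expanded;
}

my @all = expand('XXX(sf)YYY(kna)ZZZ(dgav)UUU');

# 2 * 3 * 4 letters give 24 expanded strings
print scalar(@all), " combinations\n";
print "$_\n" for @all[ 0 .. 2 ];
```

Each call resolves only the first remaining group, so the recursion depth equals the number of groups in the template.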

Anno
 
Chris

Anno said:
Your example is exquisitely opaque. Why does it have to be a super-long
string in all upper case? I'll use

'XXX(sf)YYY(kna)ZZZ(dgav)UUU'

instead, which is a lot easier to follow.

I can see that your generic example is clearer, but I
thought it better to use a real world example. BTW it needs
to be all upper case and my example is short in my context.

I had to resort to your code to see that "any of the letters" are the
letters inside the parentheses (not any letter). It is also not clear
at this point that each group of parentheses stands for a single letter
in the final string.

Apologies for the lack of clarity; that is why I prefer
explaining things with examples.

Thanks very much for your reply and very succinct answer :)
I did think recursion was a way to do it, but it screws with
my head ;-)

I did not know about the @- and @+ variables; I'm sure I'll
use them again. One question though, is there any reason why
you used this:

substr( $expanded[ -1], $-[ 0], $+[ 0] - $-[ 0]) = $char;

instead of:

substr( $expanded[ -1], $-[ 0], $+[ 0] - $-[ 0], $char);
?
 
Anno Siegel

Chris said:
Anno said:
Chris wrote in comp.lang.perl.misc:
[...]

... . One question though, is there any reason why
you used this:

substr( $expanded[ -1], $-[ 0], $+[ 0] - $-[ 0]) = $char;

instead of:

substr( $expanded[ -1], $-[ 0], $+[ 0] - $-[ 0], $char);
?

I find the assignment form more readable than the four-argument form
when replacing a substring. I restrict the four-argument form to
its specialty: returning the original substring as well as replacing
it in the string. That isn't needed all that often, but can come in
handy.
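
The difference between the two forms can be sketched on a small string. The variable names here are made up for illustration; both calls use only the documented behaviour of Perl's built-in substr.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $template = 'XXX(sf)YYY';

# Assignment form: substr is used as an lvalue and the string is
# changed in place; the old contents are not needed here.
my $by_assignment = $template;
substr( $by_assignment, 3, 4 ) = 's';

# Four-argument form: same replacement, but the call also returns
# the substring that was there before.
my $by_four_args = $template;
my $old = substr( $by_four_args, 3, 4, 's' );

print "$by_assignment\n";    # XXXsYYY
print "$by_four_args\n";     # XXXsYYY
print "$old\n";              # (sf)
```

Both forms leave the same result in the string; only the four-argument call hands back the replaced text.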

Anno
 
Chris

OK. Thanks again...
 
