elegant way to do the union of 2 strings


avilella

Hi,

I am looking for an elegant way to do the union of two strings. For
example:

$seq1 = "--- --- GAA --- GGA";
$seq2 = "AAC TGG --- --- ---";

The rule is to do the union of the letters, leaving the dash symbol
when both have a dash in a given position:

$union = "AAC TGG GAA --- GGA";

Anyone? Maybe using a regular expression?

Thanks,

Albert.
 

Danny Woods

avilella said:
Hi,

I am looking for an elegant way to do the union of two strings. For
example:

$seq1 = "--- --- GAA --- GGA";
$seq2 = "AAC TGG --- --- ---";

The rule is to do the union of the letters, leaving the dash symbol
when both have a dash in a given position:

$union = "AAC TGG GAA --- GGA";

Perhaps:

$seq1 =~ s/-/substr($seq2, pos($seq1), 1)/eg;

Works for your test case. Perhaps you have more data to test it
against?
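
A minimal sketch of the same substitution idea, written against the documented match-offset array @- instead of pos(): $-[0] holds the offset at which the current match starts, so it does not depend on how pos() behaves inside the replacement part of s///. It assumes both strings have the same length and the same spacing, as in the example; the copy into $union is only there to keep $seq1 unchanged.

```perl
use strict;
use warnings;

my $seq1 = "--- --- GAA --- GGA";
my $seq2 = "AAC TGG --- --- ---";

# Work on a copy so that $seq1 itself is left untouched.
# For every dash, $-[0] is the offset where that match starts, so the
# dash is replaced by the character at the same offset in $seq2.
# If $seq2 also has a dash there, the dash simply stays a dash.
(my $union = $seq1) =~ s/-/substr($seq2, $-[0], 1)/eg;

print "$union\n";
```

If $seq2 were shorter than $seq1, substr() would be asked for an offset past the end of the string, so a length check before the substitution would be a sensible addition.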

Cheers,
Danny.
 

Jürgen Exner

avilella said:
I am looking for an elegant way to do the union of two strings. For
example:

$seq1 = "--- --- GAA --- GGA";
$seq2 = "AAC TGG --- --- ---";

The rule is to do the union of the letters, leaving the dash symbol
when both have a dash in a given position:

$union = "AAC TGG GAA --- GGA";

Anyone? Maybe using a regular expression?

Sometimes just using different terminology solves the mystery:

In sequence 1 (logically those are not strings but as you indicated
yourself they are sequences) you want to replace each '---' element with
the corresponding element from sequence 2.

#untested, sketch only
my @a = split(/ /, $seq1);
my @b = split(/ /, $seq2);
for my $i (0..$#a) {
    # replace a gap element with the element at the same index in @b
    if ($a[$i] eq '---') { $a[$i] = $b[$i] }
}
my $res = join(' ', @a);

This can probably be optimized somewhat by using map() and computing the
result elements in the argument of the join() but this way the logic is
very accessible.
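
The map() variant mentioned above could look roughly like this. It is a sketch under the same assumptions as the loop (both sequences split into the same number of elements, gaps written as '---'), and it uses split ' ' rather than split / / so that runs of whitespace do not produce empty elements.

```perl
use strict;
use warnings;

my $seq1 = "--- --- GAA --- GGA";
my $seq2 = "AAC TGG --- --- ---";

my @a = split ' ', $seq1;
my @b = split ' ', $seq2;

# For each index, keep the element from @a unless it is a gap ('---'),
# in which case take the element at the same index from @b.
my $res = join ' ', map { $a[$_] eq '---' ? $b[$_] : $a[$_] } 0 .. $#a;

print "$res\n";
```

Unlike the loop, this leaves @a unmodified and builds the result list directly inside the join().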

jue
 

ccc31807

avilella said:
Hi,

I am looking for an elegant way to do the union of two strings. For
example:

$seq1 = "--- --- GAA --- GGA";
$seq2 = "AAC TGG --- --- ---";

The rule is to do the union of the letters, leaving the dash symbol
when both have a dash in a given position:

$union = "AAC TGG GAA --- GGA";

I would approach this from the standpoint of comparing arrays rather
than looking at strings. Split each sequence and then iterate through
the arrays.

use strict;
use warnings;

my $s1 = "--- --- GAA --- GGA";
my $s2 = "AAC TGG --- --- ---";
my @s1 = split / /, $s1;
my @s2 = split / /, $s2;
my @s3 = ();
my $len = @s1; # number of elements in @s1
for (my $i = 0; $i < $len; $i++)
{
    # both are gaps: keep the gap
    $s3[$i] = '---' if $s1[$i] eq '---' and $s2[$i] eq '---';
    # only one side has letters: take that side
    $s3[$i] = $s1[$i] if $s1[$i] =~ /[A-Z]{3}/ and $s2[$i] eq '---';
    $s3[$i] = $s2[$i] if $s1[$i] eq '---' and $s2[$i] =~ /[A-Z]{3}/;
    # both sides have letters: mark the conflict
    $s3[$i] = '***' if $s1[$i] =~ /[A-Z]{3}/ and $s2[$i] =~ /[A-Z]{3}/;
}
my $s3 = join ' ', @s3;
print "$s3\n";
exit(0);
 
