Help: Show specific part

Discussion in 'Perl Misc' started by Amy Lee, Aug 15, 2008.

  1. Amy Lee

    Hello,

    I'm new to Perl and do some work in bioinformatics. I wrote a small
    script to display the sequences. However, I have run into a problem with
    further processing.

    My output looks like this:
    >xxx
    IGRRQWASLVTPMAKFDPEIVLEFYANAWPTEEGVRDMRSWVRGQWIPFDADA
    IGQLLGYPLVLEEGQECEYGQRRNRSDGFDEEA
    >yyy
    gaggccatcaagggatggtcgtttctccgggagcaacgcgtccagctcagggacgacgag
    tatactgatttccaggaggaaatagggcgccggcagtgggcatcactggttactcccatg
    gccaagttcgatccggaaatagtccttgagttttatgccaatgcttggccaacagaggag
    >zzz
    EGDAHAVSSTPAWVKPQQTPHGTHQYAQHHPSFSAHAGNASSST
    PVQPKAPTQREAPQVPTPNTTRPAGNSNTTRNFPPRPLPEFTPLPMTYEDLLPSLIANHL
    AVVTPGRVLEPPFPKWYDPNATCKYHGGVPGHSVEKCLALKYKVQHLMDAGWLTFQEDRP
    NVRTNPLANHGGGAVNAVESD
    >qqq
    tggaagccgcagaagaatcgttagaaactgctttccag
    tcttttgaggtggtcagcatttcctccgtggactccctctttgggcaaccttgtctgtcc
    gatgcagcggtaatgatggcccgagttatgttggggaacggttttgaacccgggatgggt
    ttagaaaaaaacaacggcggcataactagc

    I'd like to save all the protein sequences, together with their
    tags (>blahblah), into one file, say "protein", and the DNA sequences
    into a "dna" file.

    So, from that output, "protein" should be:
    >xxx
    IGRRQWASLVTPMAKFDPEIVLEFYANAWPTEEGVRDMRSWVRGQWIPFDADA
    IGQLLGYPLVLEEGQECEYGQRRNRSDGFDEEA
    >zzz
    EGDAHAVSSTPAWVKPQQTPHGTHQYAQHHPSFSAHAGNASSST
    PVQPKAPTQREAPQVPTPNTTRPAGNSNTTRNFPPRPLPEFTPLPMTYEDLLPSLIANHL
    AVVTPGRVLEPPFPKWYDPNATCKYHGGVPGHSVEKCLALKYKVQHLMDAGWLTFQEDRP
    NVRTNPLANHGGGAVNAVESD

    And "dna" should be:
    >yyy
    gaggccatcaagggatggtcgtttctccgggagcaacgcgtccagctcagggacgacgag
    tatactgatttccaggaggaaatagggcgccggcagtgggcatcactggttactcccatg
    gccaagttcgatccggaaatagtccttgagttttatgccaatgcttggccaacagaggag
    >qqq
    tggaagccgcagaagaatcgttagaaactgctttccag
    tcttttgaggtggtcagcatttcctccgtggactccctctttgggcaaccttgtctgtcc
    gatgcagcggtaatgatggcccgagttatgttggggaacggttttgaacccgggatgggt
    ttagaaaaaaacaacggcggcataactagc
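
    Both target files keep every tag line together with the sequence lines that follow it, so a natural first step is to read the output as a list of tagged records. A minimal sketch, assuming the format shown above (tag lines start with ">", blank lines carry no meaning); `parse_records` is a hypothetical helper name, and the demo reads a shortened in-memory copy of the sample:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read FASTA-style text into a list of records. Each record is a hash ref
# { tag => '>xxx', lines => [ sequence lines without newlines ] }.
# Blank lines and any data before the first tag are ignored (assumption).
sub parse_records {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;                # skip blank lines
        if ($line =~ /^>/) {                     # a tag starts a new record
            push @records, { tag => $line, lines => [] };
        } elsif (@records) {                     # data belongs to the latest record
            push @{ $records[-1]{lines} }, $line;
        }
    }
    return @records;
}

# Demo on a shortened, in-memory copy of the sample output.
my $sample = <<'END';
>xxx
IGRRQWASLVTPMAKFDPEIVLEFYANAWPTEEGVRDMRSWVRGQWIPFDADA
IGQLLGYPLVLEEGQECEYGQRRNRSDGFDEEA
>yyy
gaggccatcaagggatggtcgtttctccgggagcaacgcgtccagctcagggacgacgag
END

open my $in, '<', \$sample or die "Cannot open in-memory data: $!";
for my $rec (parse_records($in)) {
    printf "%s: %d sequence line(s)\n", $rec->{tag}, scalar @{ $rec->{lines} };
}
```

    Keeping the tag and its lines in one record is what makes "save the sequences with their tags" straightforward: each record is classified once and written out as a unit.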

    Since I lack Perl knowledge, could you give me some tips?

    Thank you very much~

    Regards,

    Amy Lee
    Amy Lee, Aug 15, 2008

  2. Amy Lee <> wrote:
    >I'm new to Perl and do some work in bioinformatics. I wrote a small
    >script to display the sequences. However, I have run into a problem with
    >further processing.
    >
    >My output looks like this:

    [snip lengthy text]

    >I'd like to save all the protein sequences, together with their
    >tags (>blahblah), into one file, say "protein", and the DNA sequences
    >into a "dna" file.
    >
    >So, from that output, "protein" should be:

    [snip lengthy text]

    >Since I lack Perl knowledge, could you give me some tips?

    In what way is the text marked as "output" different from the part marked
    as "protein"? They appear to be identical to me. But then again, I did
    not compare each and every character in those lengthy sequences.

    jue
    Jürgen Exner, Aug 15, 2008

  3. Amy Lee

    On Fri, 15 Aug 2008 13:02:12 +0000, Jürgen Exner wrote:

    > Amy Lee <> wrote:
    >>I'm new to Perl and do some work in bioinformatics. I wrote a small
    >>script to display the sequences. However, I have run into a problem with
    >>further processing.
    >>
    >>My output looks like this:
    >
    > [snip lengthy text]
    >
    >>I'd like to save all the protein sequences, together with their
    >>tags (>blahblah), into one file, say "protein", and the DNA sequences
    >>into a "dna" file.
    >>
    >>So, from that output, "protein" should be:
    >
    > [snip lengthy text]
    >
    >>Since I lack Perl knowledge, could you give me some tips?
    >
    > In what way is the text marked as "output" different from the part marked
    > as "protein"? They appear to be identical to me. But then again, I did
    > not compare each and every character in those lengthy sequences.
    >
    > jue

    Well, actually, the protein sequences are in uppercase letters and the
    DNA sequences are in lowercase letters. So I suppose I can deal with it
    this way. But I don't know how to do that~
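
    The case convention described here can be turned into a small test on each sequence. A minimal sketch; `seq_type` is a hypothetical name, and counting any all-lowercase letters as DNA (rather than only a, c, g and t) is an assumption:

```perl
use strict;
use warnings;

# Classify a sequence by letter case: all uppercase -> 'protein',
# all lowercase -> 'dna', anything else (mixed case, digits, empty) -> 'unknown'.
sub seq_type {
    my ($seq) = @_;
    $seq =~ s/\s+//g;                        # ignore newlines and other whitespace
    return 'protein' if $seq =~ /^[A-Z]+$/;
    return 'dna'     if $seq =~ /^[a-z]+$/;
    return 'unknown';
}

print seq_type("IGRRQWASLVTPMAKF\n"), "\n";     # protein
print seq_type("gaggccatcaagggatgg\n"), "\n";   # dna
print seq_type("IGRRQtatactgat\n"), "\n";       # unknown
```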

    Thanks,

    Amy
    Amy Lee, Aug 15, 2008
  4. Amy Lee <> wrote:
    >On Fri, 15 Aug 2008 13:02:12 +0000, Jürgen Exner wrote:
    >>>Since I lack Perl knowledge, could you give me some tips?
    >>
    >> In what way is the text marked as "output" different from the part marked
    >> as "protein"? They appear to be identical to me. But then again, I did
    >> not compare each and every character in those lengthy sequences.
    >
    >Well, actually, the protein sequences are in uppercase letters and the
    >DNA sequences are in lowercase letters.

    What on earth are you talking about? I was asking what the difference is
    between your "output" and your "protein" character sequences, i.e. how
    you want your Perl script to manipulate/change/modify those character
    sequences.

    >So I suppose I can deal with it this way. But I don't know how
    >to do that~

    I have no idea what you are talking about. What "that" are you referring
    to?

    jue
    Jürgen Exner, Aug 15, 2008
  5. Amy Lee

    On Fri, 15 Aug 2008 13:26:53 +0000, Jürgen Exner wrote:

    > Amy Lee <> wrote:
    >>On Fri, 15 Aug 2008 13:02:12 +0000, Jürgen Exner wrote:
    >>>>Since I lack Perl knowledge, could you give me some tips?
    >>>
    >>> In what way is the text marked as "output" different from the part marked
    >>> as "protein"? They appear to be identical to me. But then again, I did
    >>> not compare each and every character in those lengthy sequences.
    >>
    >>Well, actually, the protein sequences are in uppercase letters and the
    >>DNA sequences are in lowercase letters.
    >
    > What on earth are you talking about? I was asking what the difference is
    > between your "output" and your "protein" character sequences, i.e. how
    > you want your Perl script to manipulate/change/modify those character
    > sequences.
    >
    >>So I suppose I can deal with it this way. But I don't know how
    >>to do that~
    >
    > I have no idea what you are talking about. What "that" are you referring
    > to?
    >
    > jue

    Hmm, sorry for my poor English. Anyway, let me describe my problem in
    detail.

    In fact, Perl does not need to modify any characters. As you may already
    know, the "output" consists of two kinds of parts: uppercase parts
    (protein sequences) and lowercase parts (DNA sequences). What I want to
    do is save the "protein" parts into one file and the "dna" parts into
    another file. I don't need to change any characters.

    Furthermore, each sequence is preceded by a tag like ">xxx". I'd like
    to keep these tags when I save the "dna" and "protein" parts.
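
    Stated this way (copy records unchanged, keep the tags, choose the file by letter case), one possible approach is to buffer a whole record and decide where it goes once the next tag or the end of input is reached. A minimal sketch, not taken from any reply in this thread; `split_records` is a hypothetical name, and the demo uses in-memory handles and shortened copies of the sample lines so it needs no files:

```perl
use strict;
use warnings;

# Copy each tagged record, unchanged, to $dna_fh or $prot_fh.
# The whole record is buffered first, so the choice depends on all of its
# sequence lines, not only the first one. Records that are neither all
# lowercase nor all uppercase are reported and skipped (an assumption).
sub split_records {
    my ($in_fh, $dna_fh, $prot_fh) = @_;
    my @record;                                   # tag line + sequence lines

    my $flush = sub {
        return unless @record;
        my ($tag, @seq) = @record;
        my $letters = join '', @seq;
        $letters =~ s/\s+//g;                     # sequence without newlines
        if ($letters =~ /^[a-z]+$/) {
            print {$dna_fh} @record;              # lowercase: DNA
        } elsif ($letters =~ /^[A-Z]+$/) {
            print {$prot_fh} @record;             # uppercase: protein
        } else {
            warn "Skipping mixed or empty record: $tag";
        }
        @record = ();
    };

    while (my $line = <$in_fh>) {
        next if $line =~ /^\s*$/;                 # ignore blank lines
        if ($line =~ /^>/) {
            $flush->();                           # a new tag ends the previous record
            push @record, $line;
        } elsif (@record) {
            push @record, $line;                  # data before the first tag is dropped
        }
    }
    $flush->();                                   # write out the last record
}

# Demo with in-memory handles; for real files open 'dna' and 'protein' instead.
my $sample = ">xxx\nIGRRQWASLV\nIGQLLGYPLV\n>yyy\ngaggccatca\ntatactgatt\n";
my ($dna, $protein) = ('', '');
open my $in,  '<', \$sample  or die "Cannot open input: $!";
open my $dfh, '>', \$dna     or die "Cannot open dna buffer: $!";
open my $pfh, '>', \$protein or die "Cannot open protein buffer: $!";
split_records($in, $dfh, $pfh);
close $dfh;
close $pfh;
print "dna:\n$dna", "protein:\n$protein";
```

    For real files, open the two output handles on paths such as 'dna' and 'protein' (for example `open my $dfh, '>', 'dna' or die ...`) and pass an input handle opened on the saved output.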

    Thank you very much~

    Regards,

    Amy
    Amy Lee, Aug 15, 2008
  6. Amy Lee <> wrote:
    >In fact, Perl does not need to modify any characters. As you may already
    >know, the "output" consists of two kinds of parts: uppercase parts
    >(protein sequences) and lowercase parts (DNA sequences).

    No, I did not know. It may have been obvious to you, but I did not notice
    that detail in the long, complicated character sequences. Thank you for
    the explanation.

    >What I want to
    >do is save the "protein" parts into one file and the "dna" parts into
    >another file.

    OK, those four lines of explanation make it quite clear what you want to
    do. Posting only samples doesn't help because it leaves too much room
    for confusion and misunderstanding.

    >Furthermore, each sequence is preceded by a tag like ">xxx". I'd like
    >to keep these tags when I save the "dna" and "protein" parts.


    Here's how I would do it (sketch of code only, details and error
    handling omitted):

    open() the input file, open() two output files 'dna' and 'protein' with
    properly named file handles $DNA and $PROTEIN.

    Then

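    use strict;
    use warnings;

    # 'output' is a placeholder name for the input file
    open my $IN,      '<', 'output'  or die "Cannot open output: $!";
    open my $DNA,     '>', 'dna'     or die "Cannot open dna: $!";
    open my $PROTEIN, '>', 'protein' or die "Cannot open protein: $!";

    my $isDNA;                             # flag: current record is DNA
    while (<$IN>) {                        # loop through input file
        if (substr($_, 0, 1) eq '>') {     # found tag in this line
            my $next = <$IN>;              # get next line for analysis
            last unless defined $next;     # tag was the last line
            $isDNA = $next eq lc($next);   # set flag for DNA or protein
            # print tag line and analysed line to $DNA or $PROTEIN
            print { $isDNA ? $DNA : $PROTEIN } $_, $next;
        } else {                           # not a tag line but regular data
            print { $isDNA ? $DNA : $PROTEIN } $_;   # print normal data line
        }
    }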
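
    Choosing the output handle at run time, as this loop does, requires Perl's block form of print, `print { EXPR } LIST`. Written as `print (EXPR) LIST`, the parentheses are parsed as the argument list of print and the statement does not compile. A small self-contained illustration, writing two made-up lines to in-memory handles:

```perl
use strict;
use warnings;

my ($dna, $protein) = ('', '');
open my $DNA,     '>', \$dna     or die "Cannot open dna buffer: $!";
open my $PROTEIN, '>', \$protein or die "Cannot open protein buffer: $!";

for my $line ("acgtacgt\n", "MKVLAT\n") {
    my $isDNA = $line eq lc $line;             # all-lowercase line => DNA
    # print ($isDNA ? $DNA : $PROTEIN) $line;  # would not compile: parsed as print(LIST)
    print { $isDNA ? $DNA : $PROTEIN } $line;  # block form selects the handle at run time
}
close $DNA;
close $PROTEIN;
print "dna: $dna", "protein: $protein";
```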


    jue
    Jürgen Exner, Aug 15, 2008
  7. Dr.Ruud

    Re: Show specific part

    Amy Lee wrote:

    > I'm new to Perl and do some work in bioinformatics. I wrote a small
    > script to display the sequences. However, I have run into a problem
    > with further processing. [...]
    > I'd like to save all the protein sequences, together with their
    > tags (>blahblah), into one file, say "protein", and the DNA sequences
    > into a "dna" file.

    The following code expects "good input". It will be fooled by mixed-up
    input like

    >xxx
    IGRRQWASLVTPMAKFDPEIVLEFYANAWPTEEGVRDMRSWVRGQWIPFDADA
    tatactgatttccaggaggaaatagggcgccggcagtgggcatcactggttactcccatg
    IGQLLGYPLVLEEGQECEYGQRRNRSDGFDEEA
    >yyy
    gaggccatcaagggatggtcgtttctccgggagcaacgcgtccagctcagggacgacgag
    IGRRQWASLVTPMAKFDPEIVLEFYANAWPTEEGVRDMRSWVRGQWIPFDADA
    tatactgatttccaggaggaaatagggcgccggcagtgggcatcactggttactcccatg
    gccaagttcgatccggaaatagtccttgagttttatgccaatgcttggccaacagaggag


    #!/usr/bin/perl
    use strict;
    use warnings;

    # DNA records go to STDOUT, protein records to STDERR, so run it as
    # e.g.: perl split.pl > dna 2> protein
    my ($fh_dna, $fh_pro) = (\*STDOUT, \*STDERR);

    my $tag;    # most recent tag line, not yet printed

    while ( <DATA> ) {
        if ( /^>.+/ ) {
            $tag = $_;
            next; ###
        } elsif ( /^[acgt]+$/ ) {
            select $fh_dna;      # lowercase: DNA
        } elsif ( /^[A-Z]+$/ ) {
            select $fh_pro;      # uppercase: protein
        } else {
            die "Unexpected input at line $.: $_";
        }
        $tag and print $tag and undef $tag;   # print a pending tag once
        print;                                # print the sequence line
    }

    __DATA__
    >xxx
    IGRRQWASLVTPMAKFDPEIVLEFYANAWPTEEGVRDMRSWVRGQWIPFDADA
    IGQLLGYPLVLEEGQECEYGQRRNRSDGFDEEA
    >yyy
    gaggccatcaagggatggtcgtttctccgggagcaacgcgtccagctcagggacgacgag
    tatactgatttccaggaggaaatagggcgccggcagtgggcatcactggttactcccatg
    gccaagttcgatccggaaatagtccttgagttttatgccaatgcttggccaacagaggag
    >zzz
    EGDAHAVSSTPAWVKPQQTPHGTHQYAQHHPSFSAHAGNASSST
    PVQPKAPTQREAPQVPTPNTTRPAGNSNTTRNFPPRPLPEFTPLPMTYEDLLPSLIANHL
    AVVTPGRVLEPPFPKWYDPNATCKYHGGVPGHSVEKCLALKYKVQHLMDAGWLTFQEDRP
    NVRTNPLANHGGGAVNAVESD
    >qqq
    tggaagccgcagaagaatcgttagaaactgctttccag
    tcttttgaggtggtcagcatttcctccgtggactccctctttgggcaaccttgtctgtcc
    gatgcagcggtaatgatggcccgagttatgttggggaacggttttgaacccgggatgggt
    ttagaaaaaaacaacggcggcataactagc
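
    The loop above never names a handle in its print calls; it relies on one-argument select, which makes the given handle the default for print and returns the previously selected one. A minimal illustration with two in-memory handles (the variable names are arbitrary):

```perl
use strict;
use warnings;

my ($out_a, $out_b) = ('', '');
open my $fh_a, '>', \$out_a or die "Cannot open buffer a: $!";
open my $fh_b, '>', \$out_b or die "Cannot open buffer b: $!";

my $old = select $fh_a;   # default output is now $fh_a; the previous default is returned
print "first\n";          # no handle given, so this goes to $fh_a
select $fh_b;             # switch the default again
print "second\n";         # goes to $fh_b
select $old;              # restore the original default (normally STDOUT)

close $fh_a;
close $fh_b;
print "a=$out_a", "b=$out_b";
```

    In the loop above the same switch happens on every sequence line, and because the pending tag is printed after the select, it ends up in the same stream as the sequence that follows it.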


    "shrunken code" variant of the while-loop:

    while ( <DATA> ) {
        /^>.+/ and $tag = $_ and next;
        /^[acgt]+$/ and select($fh_dna) or
            /^[A-Z]+$/ and select($fh_pro) or die;
        print $tag and undef $tag if $tag;
        print;
    }
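
    As noted above, both loops route each line by its own case, so a record that mixes cases is split across the two outputs without complaint. One way to catch that before splitting is to check that every sequence line in a record has the same case as its first line. A minimal sketch (`find_inconsistent_lines` is a hypothetical name; the `//=` operator needs Perl 5.10 or later), run on the mixed-up sample from this post:

```perl
use strict;
use warnings;

# Report sequence lines whose case differs from the first sequence line
# of the same record. Returns strings like ">xxx: line 3"; an empty list
# means every record is consistently upper- or lowercase.
sub find_inconsistent_lines {
    my ($fh) = @_;
    my ($tag, $expect, @problems);
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';                      # skip blank lines
        if ($line =~ /^>/) {                      # new record: reset the expectation
            ($tag, $expect) = ($line, undef);
            next;
        }
        my $kind = $line =~ /^[a-z]+$/ ? 'dna'
                 : $line =~ /^[A-Z]+$/ ? 'protein'
                 :                       'other';
        $expect //= $kind;                        # first sequence line sets the kind
        push @problems, sprintf('%s: line %d', $tag // '(no tag)', $.)
            if $kind ne $expect;
    }
    return @problems;
}

# Demo on the mixed-up sample from this post.
my $mixed = <<'END';
>xxx
IGRRQWASLVTPMAKFDPEIVLEFYANAWPTEEGVRDMRSWVRGQWIPFDADA
tatactgatttccaggaggaaatagggcgccggcagtgggcatcactggttactcccatg
IGQLLGYPLVLEEGQECEYGQRRNRSDGFDEEA
>yyy
gaggccatcaagggatggtcgtttctccgggagcaacgcgtccagctcagggacgacgag
IGRRQWASLVTPMAKFDPEIVLEFYANAWPTEEGVRDMRSWVRGQWIPFDADA
tatactgatttccaggaggaaatagggcgccggcagtgggcatcactggttactcccatg
gccaagttcgatccggaaatagtccttgagttttatgccaatgcttggccaacagaggag
END

open my $in, '<', \$mixed or die "Cannot open in-memory data: $!";
print "$_\n" for find_inconsistent_lines($in);   # prints ">xxx: line 3" and ">yyy: line 7"
```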

    --
    Anyway, Ruud

    "Gewoon is een tijger." ("Ordinary is a tiger.")
    Dr.Ruud, Aug 16, 2008

Similar Threads
  1. caleb
    Replies: 5, Views: 380
    Johan Poppe, May 31, 2005
  2. Jav
    Is ViwState Page-Specific or UserControl-Specific
    Jav, Aug 16, 2006, in forum: ASP .Net
    Replies: 2, Views: 532
    Jav, Aug 16, 2006
  3. Jack
    Replies: 8, Views: 277
  4. Marek Mänd
    Replies: 1, Views: 300
    Martin Honnen, Feb 20, 2005
  5. Replies: 4, Views: 232
    Thomas 'PointedEars' Lahn, May 9, 2005