Split a multi-sequence file into individual files

Discussion in 'Perl Misc' started by ela, Nov 8, 2008.

  1. ela

    ela Guest

    Found via Google. There's no need to reinvent the wheel, but this one-liner is
    too difficult to understand...

    perl -ne 'BEGIN{ $/=">"; } if(/^\s*(\S+)/){ open(F,">$1.fsa")||warn"$1 write failed:$!\n";chomp;print F ">", $_ }' fastafile

    Can anybody help?
    ela, Nov 8, 2008
    #1

  2. Tad J McClellan

    ela <> wrote:
    > From google, no need to reinvent the wheel but this one line code is too
    > difficult to understand...
    >
    > perl -ne 'BEGIN{ $/=">"; } if(/^\s*(\S+)/){ open(F,">$1.fsa")||warn"$1 write
    > failed:$!\n";chomp;print F ">", $_ }' fastafile
    >
    > anybody helps?



    BEGIN{ $/=">"; } # set the Input Record Separator (perlvar.pod)
    while ( <> ) { # -n wraps in a while-diamond loop
        if( /^\s*(\S+)/ ){ # grab the first non-whitespace characters
            open(F,">$1.fsa") || warn"$1 write failed:$!\n"; # open a file
            chomp; # remove ">" from end of string
            print F ">", $_; # print ">" at beginning of string
        }
    }
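
To see what the loop above actually receives on each pass, here is a minimal sketch that reads a hypothetical two-record FASTA string (the sample data and variable names are made up for illustration) with the input record separator set to ">". It only collects and prints the records, so the file-writing part is left out.

```perl
use strict;
use warnings;

# Hypothetical two-record FASTA input, held in memory for the demo.
my $fasta = ">seq1 first\nACGT\n>seq2 second\nGGCC\n";

my @records;
{
    local $/ = ">";    # each read now ends at the next ">"
    open my $in, '<', \$fasta or die "cannot open in-memory file: $!";
    while (my $rec = <$in>) {
        push @records, $rec;
    }
    close $in;
}

# Print each record with newlines made visible; the trailing
# separator is still attached because nothing has been chomped yet.
for my $i (0 .. $#records) {
    (my $shown = $records[$i]) =~ s/\n/\\n/g;
    print "record $i: [$shown]\n";
}
```

With this input the first record is just the separator itself, because the data begins with ">", and the last record has no trailing ">" because the input ends first.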



    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
    Tad J McClellan, Nov 8, 2008
    #2

  3. Mirco Wahab

    Mirco Wahab Guest

    Tad J McClellan wrote:
    > ela <> wrote:
    >> From google, no need to reinvent the wheel but this one line code is too
    >> difficult to understand...
    >>
    >> perl -ne 'BEGIN{ $/=">"; } if(/^\s*(\S+)/){ open(F,">$1.fsa")||warn"$1 write
    >> failed:$!\n";chomp;print F ">", $_ }' fastafile
    >>
    >> anybody helps?

    >
    >
    > BEGIN{ $/=">"; } # set the Input Record Separator (perlvar.pod)
    > while ( <> ) { # -n wraps in a while-diamond loop
    > if( /^\s*(\S+)/ ){ # grab the first non-whitespace characters
    > open(F,">$1.fsa") || warn"$1 write failed:$!\n"; # open a file
    > chomp; # remove ">" from end of string
    > print F ">", $_; # print ">" at beginning of string
    > }
    > }


    I don't understand the purpose of the chomp,
    maybe it needs to be in front of the if():

    ...
    local $/ = '>';
    while (<>) {
        chomp;
        if( /\s*(\S+)/ ) {
            open my $fh, '>', "$1.fsa" or warn "$1 $!";
            print $fh '>'.$_
        }
    }
    ...

    Regards

    M.
    Mirco Wahab, Nov 8, 2008
    #3
  4. Tim Greer

    Tim Greer Guest

    Mirco Wahab wrote:

    > Tad J McClellan wrote:
    >> ela <> wrote:
    >>> From google, no need to reinvent the wheel but this one line code is
    >>> too difficult to understand...
    >>>
    >>> perl -ne 'BEGIN{ $/=">"; } if(/^\s*(\S+)/){
    >>> open(F,">$1.fsa")||warn"$1 write failed:$!\n";chomp;print F ">", $_
    >>> }' fastafile
    >>>
    >>> anybody helps?

    >>
    >>
    >> BEGIN{ $/=">"; } # set the Input Record Separator (perlvar.pod)
    >> while ( <> ) { # -n wraps in a while-diamond loop
    >> if( /^\s*(\S+)/ ){ # grab the first non-whitespace characters
    >> open(F,">$1.fsa") || warn"$1 write failed:$!\n"; # open a file
    >> chomp; # remove ">" from end of string
    >> print F ">", $_; # print ">" at beginning of string
    >> }
    >> }

    >
    > I don't understand the purpose of the chomp,
    > maybe it needs to be in front of the if():
    >
    > ...
    > local $/ = '>';
    > while (<>) {
    > chomp;
    > if( /\s*(\S+)/ ) {
    > open my $fh, '>', "$1.fsa" or warn "$1 $!";
    > print $fh '>'.$_
    > }
    > }
    > ...
    >
    > Regards
    >
    > M.


    perldoc -f chomp

    Chomp removes any newline, if one exists (which it probably would on
    <>).

    It's the difference between trying to open:
    $1.fsa

    and

    $1
    .fsa


    --
    Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
    Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
    and Custom Hosting. 24/7 support, 30 day guarantee, secure servers.
    Industry's most experienced staff! -- Web Hosting With Muscle!
    Tim Greer, Nov 8, 2008
    #4
  5. Tim Greer

    Tim Greer Guest

    Tim Greer wrote:

    > Mirco Wahab wrote:
    >
    >> Tad J McClellan wrote:
    >>> ela <> wrote:
    >>>> From google, no need to reinvent the wheel but this one line code
    >>>> is too difficult to understand...
    >>>>
    >>>> perl -ne 'BEGIN{ $/=">"; } if(/^\s*(\S+)/){
    >>>> open(F,">$1.fsa")||warn"$1 write failed:$!\n";chomp;print F ">", $_
    >>>> }' fastafile
    >>>>
    >>>> anybody helps?
    >>>
    >>>
    >>> BEGIN{ $/=">"; } # set the Input Record Separator (perlvar.pod)
    >>> while ( <> ) { # -n wraps in a while-diamond loop
    >>> if( /^\s*(\S+)/ ){ # grab the first non-whitespace characters
    >>> open(F,">$1.fsa") || warn"$1 write failed:$!\n"; # open a file
    >>> chomp; # remove ">" from end of string
    >>> print F ">", $_; # print ">" at beginning of string
    >>> }
    >>> }

    >>
    >> I don't understand the purpose of the chomp,
    >> maybe it needs to be in front of the if():
    >>
    >> ...
    >> local $/ = '>';
    >> while (<>) {
    >> chomp;
    >> if( /\s*(\S+)/ ) {
    >> open my $fh, '>', "$1.fsa" or warn "$1 $!";
    >> print $fh '>'.$_
    >> }
    >> }
    >> ...
    >>
    >> Regards
    >>
    >> M.

    >
    > perldoc -f chomp
    >
    > Chomp removes any newline, if one exists


    Pardon... to be clear, it removes the newline at the end of the string
    (not just any newline).
    Tim Greer, Nov 8, 2008
    #5
  6. Mirco Wahab

    Mirco Wahab Guest

    Tim Greer wrote:
    > Mirco Wahab wrote:
    >> Tad J McClellan wrote:
    >>> BEGIN{ $/=">"; } # set the Input Record Separator (perlvar.pod)
    >>> while ( <> ) { # -n wraps in a while-diamond loop
    >>> if( /^\s*(\S+)/ ){ # grab the first non-whitespace characters
    >>> open(F,">$1.fsa") || warn"$1 write failed:$!\n"; # open a file
    >>> chomp; # remove ">" from end of string
    >>> print F ">", $_; # print ">" at beginning of string
    >>> }
    >>> }

    >> I don't understand the purpose of the chomp,
    >> maybe it needs to be in front of the if():
    >>
    >> ...
    >> local $/ = '>';
    >> while (<>) {
    >> chomp;
    >> if( /\s*(\S+)/ ) {
    >> open my $fh, '>', "$1.fsa" or warn "$1 $!";
    >> print $fh '>'.$_
    >> }
    >> }
    >> ...

    >
    > perldoc -f chomp
    >
    > Chomp removes any newline, if one exists (which it probably would on
    > <>).


    No, it doesn't. It removes the $/, which is
    here the '>'.
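
As a small illustration of that point, the sketch below chomps the same hypothetical record twice, once with $/ set to ">" and once with the default newline separator. The record text and variable names are assumptions made up for the demo.

```perl
use strict;
use warnings;

# Hypothetical record, shaped the way it is read when $/ is ">".
my $record = "seq1 first\nACGT\n>";

my $with_gt = $record;
my $removed_gt;
{
    local $/ = ">";
    $removed_gt = chomp $with_gt;    # strips the trailing ">" only
}

my $with_nl    = $record;
my $removed_nl = chomp $with_nl;     # default $/ is "\n"; the string ends in ">"

print "separator '>':     removed $removed_gt character(s)\n";
print "separator newline: removed $removed_nl character(s)\n";
```

chomp returns the number of characters it removed, so the two counts show directly that it follows the current value of $/ rather than always looking for a newline.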

    > It's the difference between (trying to) opening:
    > $1.fsa
    >
    > and
    >
    > $1
    > .fsa


    No way. In the above problem, on the first record it would
    get the '>' in $1, which leads to an open argument of
    ">>.fsa", which creates a file '.fsa' that contains
    nothing but a lone '>'.

    Regards

    M.
    Mirco Wahab, Nov 8, 2008
    #6
  7. Tim Greer

    Tim Greer Guest

    Mirco Wahab wrote:

    >> perldoc -f chomp
    >>
    >> Chomp removes any newline, if one exists (which it probably would on
    >> <>).

    >
    > No, it doesn't. It removes the $/, which is
    > here the '>'.


    My newsreader is interpreting / / and <> for some reason (and I'm not
    seeing what I should be seeing), so I didn't see all of the code for
    what it was, I guess. I saw while (<>) { chomp; ... } and hence my
    reply. Disregard if it wasn't relevant after all.
    Tim Greer, Nov 8, 2008
    #7
  8. ela

    Guest

    Mirco Wahab <> wrote:
    > Tad J McClellan wrote:
    > > ela <> wrote:
    > >> From google, no need to reinvent the wheel but this one line code is
    > >> too difficult to understand...
    > >>
    > >> perl -ne 'BEGIN{ $/=">"; } if(/^\s*(\S+)/){ open(F,">$1.fsa")||warn"$1
    > >> write failed:$!\n";chomp;print F ">", $_ }' fastafile
    > >>
    > >> anybody helps?

    > >
    > >
    > > BEGIN{ $/=">"; } # set the Input Record Separator (perlvar.pod)
    > > while ( <> ) { # -n wraps in a while-diamond loop
    > > if( /^\s*(\S+)/ ){ # grab the first non-whitespace characters
    > > open(F,">$1.fsa") || warn"$1 write failed:$!\n"; # open a file
    > > chomp; # remove ">" from end of string
    > > print F ">", $_; # print ">" at beginning of string
    > > }
    > > }

    >
    > I don't understand the purpose of the chomp,


    It is to remove the trailing ">", which is not wanted. In FASTA sequence
    files, ">" is the start of the next record, not the end of the current one.

    > maybe it needs to be in front of the if():


    I don't see how that would make a difference. If the if fails, nothing
    happens anyway. If the if succeeds, it makes no difference if the chomp
    is done before or after.

    Ah, but if the file starts with ">" as its first character (which it
    probably does), then the first record contains nothing but $/. If you
    don't chomp first, the conditional is true and you litter your file system
    with a hidden (on Linux) file named .fsa that holds nothing but a ">".
    If you do chomp first, the conditional is false and nothing happens, which
    is what one wants. So yes, the chomp should be before the if.
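
Putting the thread's conclusion together, here is a minimal sketch of the splitter as a standalone script with the chomp moved in front of the test. The helper name split_fasta, the sample data, and the use of a temporary output directory are all assumptions for the demo, not anything from the original one-liner. It also assumes the sequence IDs are safe to use as file names.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# split_fasta is a hypothetical helper name, not from the thread.
# Reads FASTA text from $in and writes one "<id>.fsa" file per record
# into $dir. Returns the list of paths written.
sub split_fasta {
    my ($in, $dir) = @_;
    my @written;
    local $/ = ">";                          # each read ends at the next ">"
    while (my $rec = <$in>) {
        chomp $rec;                          # drop the trailing ">" first
        next unless $rec =~ /^\s*(\S+)/;     # skips the empty first record
        my $path = "$dir/$1.fsa";
        my $out;
        unless (open $out, '>', $path) {
            warn "$path write failed: $!\n";
            next;                            # do not print to a failed handle
        }
        print {$out} ">", $rec;              # restore the leading ">"
        close $out or warn "$path close failed: $!\n";
        push @written, $path;
    }
    return @written;
}

# Demo on a hypothetical two-record input, written to a temporary directory.
my $fasta = ">seq1 first\nACGT\n>seq2 second\nGGCC\n";
open my $in, '<', \$fasta or die "cannot open in-memory file: $!";
my $dir   = tempdir(CLEANUP => 1);
my @files = split_fasta($in, $dir);
close $in;
print scalar(@files), " file(s) written\n";
```

The three-argument open and lexical filehandle follow the variant posted earlier in the thread. Because the mode is passed separately, an ID that happens to begin with ">" can no longer turn the open into an append.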


    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    The costs of publication of this article were defrayed in part by the
    payment of page charges. This article must therefore be hereby marked
    advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
    this fact.
    , Nov 10, 2008
    #8
