Old 07-28-2011, 12:20 AM   #10
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838

Here's an alternative version that does away with the duplicated code and the progress output. It doesn't handle a minimum length, though; you could pipe the output through fastx_clipper for that if necessary:

Code:
#!/usr/bin/perl
# contiguous_fasta.pl -- splits fasta-formatted files into contiguous
# sequences of non-ambiguous bases
# Author: David Eccles (gringer) 2011 <david.eccles@mpi-muenster.mpg.de>

use warnings;
use strict;

my $id = "";
my $first = 1; # true
my @sequences = ();
my $seqNum = 0;

while(<>){
    chomp;
    if(substr($_,0,1) eq ">"){
        $id = $_;
        $seqNum = 0;
        print((($first)?"":"\n").$_."/".$seqNum++."\n");
        $first = 0; # false
    } else {
        @sequences = split(/N+/, $_);
        print(shift(@sequences)) if @sequences; # list is empty when the whole line is N
        foreach my $sequence (@sequences){
            print("\n$id/".$seqNum++."\n$sequence");
        }
    }
}
print("\n");
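
For the minimum-length case, here is a rough sketch (not a tested replacement for the script above) that filters inside Perl instead of piping through another tool. The names split_on_n and $minLen are made up for this example, and the record is hard-coded rather than read from a file. It also makes two choices the script above does not: the whole sequence is handled as one string, so a run of N at the end of an input line still starts a new piece, and lower-case n is treated as ambiguous too.

```perl
#!/usr/bin/perl
# Sketch only: split one FASTA record into N-free pieces with a minimum length.
# split_on_n and $minLen are illustrative names, not part of contiguous_fasta.pl.

use warnings;
use strict;

sub split_on_n {
    my ($id, $seq, $minLen) = @_;
    my @records = ();
    my $seqNum = 0;
    # split on runs of N (either case); grep drops pieces shorter than
    # $minLen, so use at least 1 to skip empty pieces
    foreach my $piece (grep { length($_) >= $minLen } split(/N+/i, $seq)) {
        push(@records, "$id/" . $seqNum++ . "\n$piece");
    }
    return @records;
}

# hard-coded example record: header plus the sequence joined into one string
print(join("\n", split_on_n(">seq1", "ACGTNNNNGGCCTTAANNA", 4)), "\n");
```

With a minimum length of 4 this prints >seq1/0 with ACGT and >seq1/1 with GGCCTTAA; the trailing single A is dropped. Pieces are numbered after filtering, so the numbers won't match the ones the script above gives when short pieces are removed.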