Hello,
I have intended to write a script to extract unaligned reads from a fasta file.
The script takes the first line of the file which contains all the reads, and run throught the entire one (the one containing all aligned reads), until it finds a match between two reads IDs. If it doesn't, then it means the read is unaligned and it is stored into the file unaligned.fasta (aligned.fasta in the other case).
The problem is the script runs very, very slowly, and I don't know how to optimize it.
Any help would be very appreciated.
The code (I made no comments of it, do not hesitate to tell me if you want me to do so):
Thanks in advance.
I have intended to write a script to extract unaligned reads from a fasta file.
The script takes the first line of the file which contains all the reads, and run throught the entire one (the one containing all aligned reads), until it finds a match between two reads IDs. If it doesn't, then it means the read is unaligned and it is stored into the file unaligned.fasta (aligned.fasta in the other case).
The problem is the script runs very, very slowly, and I don't know how to optimize it.
Any help would be very appreciated.
The code (I made no comments of it, do not hesitate to tell me if you want me to do so):
Code:
#!/usr/bin/perl use Bio::SeqIO; use Bio::Root::Exception; my $usage = "unalignedreads.pl alignedreads totalreads\n"; my $alignedreads = shift or die $usage; my $totalreads = shift or die $usage; $inseq1 = Bio::SeqIO->new('-file' => "<$totalreads", '-format' => 'fasta'); $inseq2 = Bio::SeqIO->new('-file' => "<$alignedreads", '-format' => 'fasta'); my %outfiles = ( 'aligned' => Bio::SeqIO->new('-file'=> '>/home/biopuce11/Documents/reads/aligned.fasta', '-format'=> 'fasta'), 'unaligned' => Bio::SeqIO->new('-file'=> '>/home/biopuce11/Documents/reads/unaligned.fasta', '-format' => 'fasta') ); while ($seqin1 = $inseq1->next_seq) { $flag = 0; while (($seqin2 = $inseq2->next_seq) && $flag == 0) { $sequ2 = $seqin2->display_id; $sequ1 = $seqin1->display_id; chomp($sequ1); chomp($sequ2); if ( "$sequ1" eq "$sequ2"){ $flag = 1; } } $inseq2 = Bio::SeqIO->new('-file' => "<$alignedreads", '-format' => 'fasta'); if ( $flag == 1) { $outfiles{'aligned'}->write_seq($seqin1); } else { $outfiles{'unaligned'}->write_seq($seqin1); } } exit;
Comment