Hi guys!
I'm working on a project where I read values from a text file and then use them as search terms in a FASTA file. Ideally, every time the program encounters a search term in the header line of a FASTA record, it should copy that header, along with the associated sequence, into an output file. I've included examples of the kinds of data I am working with, along with the code I'm currently working on, so you can get a better sense of what I am talking about.
The problem I'm having is that when I run my code, it only outputs the results of the search conducted with the final term in the text file. It's as if the program ran through the entire text file without performing any searches on the FASTA file, before stopping on the last item and performing a search with it. I should warn you that I only recently started programming in Perl, so it is quite possible that my code has a few glaring errors that I've missed.
#Data from the text file I am using for search terms. Note these sequence ID names represent a subset of the sequences found in the FASTA file I am searching through. The actual text file is much larger, containing 1000s of terms.
BF01013B2E03.f1
BF01010B1A11.f1
BF01028B1E03.f1
BF01029A2C12.f1
BF01028B2D11.f1
#Data from input FASTA file. I am searching through this file using the items from the text file shown above. The actual input FASTA file contains 1000s more items.
> BF01013B2E03.f1 735 1 735 ABI
CAATCCAAGAACATTTTGAAGAAAAAATCTCTTAAAAAAAAGAAATCAAAACAAGTGATCAAAAATGAAATGAATGGTCA
> BF01010B1A11.f1 782 1 782 ABI
AACGGACNANNCGGCAACCAGGAGGCCTTCCAAGCTGAACTGGGAGAGTGGATCAAGAAG
> BF01028B2D11.f1 674 1 674 ABI
CCAGCACNNNNTNAGATATTAGCCTAGCCTCTATGTCGTATTTGTATTTCNNCTAGTTTTTCATCCGACTTTTTTGGATC
#Output file. Note that the output file has only one sequence in it. This is clearly not what I want. Something is wrong, I just can't put my finger on it.
> BF01028B2D11.f1 674 1 674 ABI
CCAGCACNNNNTNAGATATTAGCCTAGCCTCTATGTCGTATTTGTATTTCNNCTAGTTTTTCATCCGACTTTTTTGGATC
#perl program
<code>
#!/usr/bin/perl
use strict;
use File::Basename;
my $database;
my $data = shift(@ARGV);
my $input_file = shift(@ARGV);
my $infile;
my $match;
my $output_file;
my $ESTs_W_SNPs;
$output_file = "ESTsWsnpsANDgoodCoverage.fasta";
open ($database,'<', $data) or die "Cannot open $data\n";
while ($match = <$database>)
{
    chomp $match;
    open (IN, $input_file) || die "Can't open input file. Please provide a valid input filename.\n";
    open ($ESTs_W_SNPs, '>', $output_file) or die "Cannot write to $output_file\n";
    my ($seq, $prevhead) = (0, "", '');
    while(<IN>)
    {
        my $line = $_;
        $line =~ s/\r\n/\n/;
        chomp $line;
        $seq.= uc($line) if(eof(IN));
        if (/\>(.*)/ || eof(IN))
        {
            my $head=$1;
            printf $ESTs_W_SNPs ">$prevhead\n$seq\n" if($prevhead =~ /$match/);
            $prevhead = $head;
            $seq='';
        }
        else
        {
            $seq.=$line;
        }
    }
    close (IN);
    close ($ESTs_W_SNPs);
}
</code>
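For comparison, here is a minimal sketch of one way the intended behaviour could be structured. It rests on an assumption about the cause rather than a confirmed diagnosis: the output file is opened with '>' inside the outer loop, so it would be truncated for every search term and only the last term's matches would survive. The sketch loads all IDs into a hash, opens the output once, and makes a single pass over the FASTA file. The helper name `filter_fasta` is hypothetical, and treating the first whitespace-delimited token after '>' as the sequence ID is also an assumption based on the sample data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): copy every FASTA record
# whose ID appears in the ID list file into the output file.
# Returns the number of records written.
sub filter_fasta {
    my ($id_file, $fasta_file, $out_file) = @_;

    # Load all search terms once, so the FASTA file is read only one time.
    my %wanted;
    open(my $ids, '<', $id_file) or die "Cannot open $id_file: $!\n";
    while (my $id = <$ids>) {
        $id =~ s/\r?\n\z//;        # strip Unix or Windows line endings
        $id =~ s/^\s+|\s+$//g;     # trim surrounding whitespace
        $wanted{$id} = 1 if length $id;
    }
    close($ids);

    open(my $in, '<', $fasta_file) or die "Cannot open $fasta_file: $!\n";
    # Open the output once, outside any loop, so it is not truncated repeatedly.
    open(my $out, '>', $out_file) or die "Cannot write to $out_file: $!\n";

    my $keep    = 0;   # true while inside a record whose ID is wanted
    my $written = 0;
    while (my $line = <$in>) {
        $line =~ s/\r?\n\z//;
        if ($line =~ /^>\s*(\S+)/) {
            # Assumption: the ID is the first token after '>'.
            $keep = exists $wanted{$1};
            $written++ if $keep;
        }
        print {$out} "$line\n" if $keep;
    }
    close($in);
    close($out) or die "Cannot close $out_file: $!\n";
    return $written;
}

# Run as a script only when both file arguments are supplied.
if (@ARGV == 2) {
    my $n = filter_fasta(@ARGV, 'ESTsWsnpsANDgoodCoverage.fasta');
    print "Wrote $n records\n";
}
```

Called with the ID list and the FASTA file as its two arguments, in the same order as the script above, this writes every matching record to ESTsWsnpsANDgoodCoverage.fasta. The hash lookup is an exact match on the ID, unlike the regex match in the original, where the dots in an ID would match any character.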