Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • Ben Saville
    Junior Member
    • Aug 2010
    • 3

    Parsing BLAST results using BioPerl

    Hi Everyone,

    I'm very much new to the world of sequence data analysis (and this forum!), and have reached a roadblock.

    I have BLASTed some contigs against a series of databases that I created. From this I would like to parse through the data and separate it before extracting the information of interest at a later point. I would like to separate the data by query ID. I found the following Bioperl script;

    Code:
    #!/usr/bin/perl
    
    use Bio::Search::Result::BlastResult;
    use Bio::SearchIO;
     
    my $report = Bio::SearchIO->new( -file=>'All_BCM_results.bls', -format => blast);
    my $result = $report->next_result;
    my %hits_by_query;
    while (my $hit = $result->next_hit) {
      push @{$hits_by_query{$hit->name}}, $hit;
    }
     
    foreach my $qid ( keys %hits_by_query ) {
      my $result = Bio::Search::Result::BlastResult->new();
      $result->add_hit($_) for ( @{$hits_by_query{$qid}} );
      my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' );
      $blio->write_result($result);
    }
    running this script resulted in the following error;

    BlastResult::new(): Not adding iterations.

    ------------- EXCEPTION: Bio::Root::NoSuchThing -------------
    MSG: No such iteration number: 0. Valid range=1-0
    VALUE: The number zero (0)
    STACK: Error::throw
    STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.8/Bio/Root/Root.pm:368
    STACK: Bio::Search::Result::BlastResult::iteration /sw/lib/perl5/5.8.8/Bio/Search/Result/BlastResult.pm:328
    STACK: Bio::Search::Result::BlastResult::add_hit /sw/lib/perl5/5.8.8/Bio/Search/Result/BlastResult.pm:258
    STACK: /Users/bsaville/Desktop/Parsing_BLAST_by_query.pl:15
    -------------------------------------------------------------

    So I added
    Code:
    my $result = Bio::Search::Result::BlastResult->new(1);
    The 1 to the line shown above, as it told me this was within the valid range. This produced the following error;

    ------------- EXCEPTION: Bio::Root::Exception -------------
    MSG: Must define arrayref of Iterations when initializing a Bio::Search::Result::BlastResult

    STACK: Error::throw
    STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.8/Bio/Root/Root.pm:368
    STACK: Bio::Search::Result::BlastResult::new /sw/lib/perl5/5.8.8/Bio/Search/Result/BlastResult.pm:128
    STACK: /Users/bsaville/Desktop/Parsing_BLAST_by_query.pl:14
    -----------------------------------------------------------

    I know that it is my inexperience that is causing this problem, but I really can't figure this out.

    If anyone has any better suggestions for parsing BLAST results, I'm open to ideas.

    Thanks
    Ben
  • maubp
    Peter (Biopython etc)
    • Jul 2009
    • 1544

    #2
    You'd probably get a reply quickly on the BioPerl mailing list.

    Update:
    Last edited by maubp; 08-20-2010, 05:48 AM. Reason: Adding link to email on BioPerl mailing list

    Comment

    • Zigster
      Jeremy Leipzig
      • May 2009
      • 117

      #3
      The key to debugging is usually to start with something that works and make subtle changes until it doesn't.

      So either:
      1. find some blast output that works with this script and see how it differs from yours, or
      2. simplify the script to its most basic components and use print statements to verify it is parsing your blast output as expected
      --
      Jeremy Leipzig
      Bioinformatics Programmer
      --
      My blog
      Twitter

      Comment

      • westerman
        Rick Westerman
        • Jun 2008
        • 1104

        #4
        If anyone has any better suggestions for parsing BLAST results, I'm open to ideas.
        There are somewhere around 12 different Blast output formats. Feeding the wrong type to your script could cause it to blow up.

        My suggestion to newbies who are not doing much blast work is to use the blast format '-m 9' (tabular with comment lines) when doing their blast searches. This will allow you to import the work into a spreadsheet. Or, if you are unix-savvy, use the tools 'cut', 'paste', 'grep', 'uniq', etc. to obtain what you want.

        The default blast output (-m 0 ... pairwise) is almost impossible to parse correctly.

        XML output (-m 7) can be useful if you like XML.

        Comment

        • rglover
          rg
          • Dec 2008
          • 51

          #5
          I'm at the same institute as Ben - rather than use BioPerl (the Blast writer section has always been a bit of a pain to get working correctly) we hashed together a script that would do what he wanted (splitting a multiblast file into smaller blast output files named by the blast queryname):
          Code:
          #!/usr/bin/perl
          
          use strict;
          use warnings;
          
          open(BLAST,"All_All_results.bls");
          my @data = <BLAST>;
          close BLAST;
          
          my @querylines = ();
          for (my $i=0; $i<@data; $i++) {
              my $line = $data[$i];
              if ($line =~ m/Query=/g) {
                  push(@querylines,$i);
              }
          }
          
          my @queries = ();
          for (my $j=1; $j<@querylines; $j++) {
              my $endline = $querylines[$j];
              $endline--;
              my $startline = $querylines[$j-1];
              my @result = ();
              for (my $k=$startline; $k<=$endline; $k++) {
                  push(@result,$data[$k]);
              }
              push(@queries,[@result]);
          }
          
          my $numqueries = scalar(@queries);
          for (my $m=0; $m<$numqueries; $m++) {
              my $firstline = $queries[$m][0];
              my @linearray = split(' ',$firstline);
              my $name = $linearray[1];
              my @array = @{$queries[$m]};
              open(OUT,">$name.bls");
              foreach my $line (@array) {
                  print OUT $line;
              }   
              close OUT;    
          }
          It's based on the standard Blast+ blastn output.

          I'm sure there are more efficient perl ways of doing it, but I hope that helps anyone else who comes up with the same problem
          Last edited by rglover; 08-20-2010, 07:46 AM.

          Comment

          • kmcarr
            Senior Member
            • May 2008
            • 1181

            #6
            Ben,

            Could you please clarify something for me.

            Is the input BLAST file the result of a single BLAST run? In other words are you just trying to break up a single BLAST report into separate files, one corresponding to each query?

            OR

            Did you first concatenate several BLAST reports together, and were those BLAST reports the result of searching with the same set of query sequences against various target databases? If this is the case is your desire to aggregate all of the results for each query into a single report file?

            Comment

            • Ben Saville
              Junior Member
              • Aug 2010
              • 3

              #7
              Rachel was a little over generous saying that 'we' wrote the perl file, I just watched really.

              I set about this in completely the wrong way, I had several Multi-FASTA files to be BLASTed against each other. Originally I had 5 db's and 5 FASTA files and I BLASTed them all against each other.

              Then I realised that I could produce 5 larger BLAST reports by combining all of the FASTA sequences into one file. Then just before going to speak with Rachel I realised that I could create a database using my combined FASTA file too. So in the end I produced just the one BLAST report. This was then successfully parsed with the script shown above, and broken down by query ID into files using the query ID as the filename.

              The input blast report was produced using default -blastn settings on NCBI's blast++ command line program.

              Comment

              • kmcarr
                Senior Member
                • May 2008
                • 1181

                #8
                Ben,

                Combining the queries and making a single database is definitely the best way to go. Having a single output file, with each query represented only once, makes the task relatively simple.

                I know you and Rachel solved the problem but since you originally asked about using BioPerl I though I would throw together a demo script.

                Code:
                #!/usr/bin/perl
                
                ###########################
                # bp_simpleSplitBlast.pl
                # 
                # Simple script demonstrating spliting a multi query BLAST report
                # into individual query reports using the Bioperl SearchIO module.
                ###########################
                
                # Standard pragmas.
                
                use strict;
                use warnings;
                
                
                # The SearchIO module is used for both reading and writing the BLAST report files.
                # The Writer::TextResultWriter is used by the output SearchIO object to format the output
                # before writing.
                
                use Bio::SearchIO;
                use Bio::SearchIO::Writer::TextResultWriter;
                
                
                # Input BLAST report filename is passed as the only argument on the command line.
                
                my $fileName = $ARGV[0];
                
                
                # Instantiate a SearchIO object to read the input file.
                # The arguments are the file format and filename.
                
                my $blastReport = Bio::SearchIO->new(-format => 'blast',
                                                     -file => $fileName);
                
                
                # Instantiate a Writer object which will be used to format the output stream.
                # This object will be reused by each output writer.
                
                my $reportWriter = Bio::SearchIO::Writer::TextResultWriter->new();
                
                
                # Parse the input BLAST report one result object at a time.
                # The Bio::Search::Result::ResultI object exactly corresponds to one query,
                # which is the desired output.
                # For each result the query name is read to create unique file names.
                # A new SearchIO object is instantiated for each output,
                # passing the Writer object and file name.
                
                while (my $blastResult = $blastReport->next_result) {
                	my $qid = $blastResult->query_name;
                	my $reportFile = "${qid}.bls";
                	my $reportOutput = Bio::SearchIO->new(-writer => $reportWriter,
                                                             -file => ">$reportFile");
                	$reportOutput->write_result($blastResult);
                }

                Comment

                • Ben Saville
                  Junior Member
                  • Aug 2010
                  • 3

                  #9
                  Thanks Km,

                  I like to think that I'm not the only person who has / will fall at this hurdle and I hope that this thread will help others in the future!

                  I will be sure to try out your Bioperl script next time I'm parsing BLAST output.

                  Comment

                  Latest Articles

                  Collapse

                  ad_right_rmr

                  Collapse

                  News

                  Collapse

                  Topics Statistics Last Post
                  Started by SEQadmin2, 06-05-2026, 10:09 AM
                  0 responses
                  14 views
                  0 reactions
                  Last Post SEQadmin2  
                  Started by SEQadmin2, 06-04-2026, 08:59 AM
                  0 responses
                  28 views
                  0 reactions
                  Last Post SEQadmin2  
                  Started by SEQadmin2, 06-02-2026, 12:03 PM
                  0 responses
                  33 views
                  0 reactions
                  Last Post SEQadmin2  
                  Started by SEQadmin2, 06-02-2026, 11:40 AM
                  0 responses
                  23 views
                  0 reactions
                  Last Post SEQadmin2  
                  Working...