SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics

Similar Threads
Thread Thread Starter Forum Replies Last Post
BLAST+ creating custom blast database and using blast+ filtering features deniz Bioinformatics 2 10-26-2012 11:04 AM
Custom local blast results detq182 Bioinformatics 1 02-05-2012 06:49 AM
Blast > parsing result in Exel Giorgio C Bioinformatics 6 11-16-2011 12:16 AM
blast results nitinkumar Bioinformatics 8 05-16-2011 07:17 AM
BLAST output parsing in R? rdu Bioinformatics 3 01-25-2011 06:25 AM

Reply
 
Thread Tools
Old 08-20-2010, 01:50 AM   #1
Ben Saville
Junior Member
 
Location: Leeds, UK

Join Date: Aug 2010
Posts: 3
Default Parsing BLAST results using BioPerl

Hi Everyone,

I'm very much new to the world of sequence data analysis (and this forum!), and have reached a roadblock.

I have BLASTed some contigs against a series of databases that I created. From this I would like to parse through the data and separate it before extracting the information of interest at a later point. I would like to separate the data by query ID. I found the following Bioperl script;

Code:
#!/usr/bin/perl

use Bio::Search::Result::BlastResult;
use Bio::SearchIO;
 
my $report = Bio::SearchIO->new( -file=>'All_BCM_results.bls', -format => blast);
my $result = $report->next_result;
my %hits_by_query;
while (my $hit = $result->next_hit) {
  push @{$hits_by_query{$hit->name}}, $hit;
}
 
foreach my $qid ( keys %hits_by_query ) {
  my $result = Bio::Search::Result::BlastResult->new();
  $result->add_hit($_) for ( @{$hits_by_query{$qid}} );
  my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' );
  $blio->write_result($result);
}
running this script resulted in the following error;

BlastResult::new(): Not adding iterations.

------------- EXCEPTION: Bio::Root::NoSuchThing -------------
MSG: No such iteration number: 0. Valid range=1-0
VALUE: The number zero (0)
STACK: Error::throw
STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::Search::Result::BlastResult::iteration /sw/lib/perl5/5.8.8/Bio/Search/Result/BlastResult.pm:328
STACK: Bio::Search::Result::BlastResult::add_hit /sw/lib/perl5/5.8.8/Bio/Search/Result/BlastResult.pm:258
STACK: /Users/bsaville/Desktop/Parsing_BLAST_by_query.pl:15
-------------------------------------------------------------

So I added
Code:
my $result = Bio::Search::Result::BlastResult->new(1);
The 1 to the line shown above, as it told me this was within the valid range. This produced the following error;

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Must define arrayref of Iterations when initializing a Bio::Search::Result::BlastResult

STACK: Error::throw
STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::Search::Result::BlastResult::new /sw/lib/perl5/5.8.8/Bio/Search/Result/BlastResult.pm:128
STACK: /Users/bsaville/Desktop/Parsing_BLAST_by_query.pl:14
-----------------------------------------------------------

I know that it is my inexperience that is causing this problem, but I really can't figure this out.

If anyone has any better suggestions for parsing BLAST results, I'm open to ideas.

Thanks
Ben
Ben Saville is offline   Reply With Quote
Old 08-20-2010, 03:42 AM   #2
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,336
Default

You'd probably get a reply quickly on the BioPerl mailing list.

Update:
http://lists.open-bio.org/pipermail/...st/033948.html

Last edited by maubp; 08-20-2010 at 05:48 AM. Reason: Adding link to email on BioPerl mailing list
maubp is offline   Reply With Quote
Old 08-20-2010, 06:09 AM   #3
Zigster
(Jeremy Leipzig)
 
Location: Philadelphia, PA

Join Date: May 2009
Posts: 116
Default

The key to debugging is usually to start with something that works and make subtle changes until it doesn't.

So either:
  1. find some blast output that works with this script and see how it differs from yours, or
  2. simplify the script to its most basic components and use print statements to verify it is parsing your blast output as expected
__________________
--
Jeremy Leipzig
Bioinformatics Programmer
--
My blog
Twitter
Zigster is offline   Reply With Quote
Old 08-20-2010, 06:47 AM   #4
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 885
Default

Quote:
If anyone has any better suggestions for parsing BLAST results, I'm open to ideas.
There are somewhere around 12 different Blast output formats. Feeding the wrong type to your script could cause it to blow up.

My suggestion to newbies who are not doing much blast work is to use the blast format '-m 9' (tabular with comment lines) when doing their blast searches. This will allow you to import the work into a spreadsheet. Or, if you are unix-savvy, use the tools 'cut', 'paste', 'grep', 'uniq', etc. to obtain what you want.

The default blast output (-m 0 ... pairwise) is almost impossible to parse correctly.

XML output (-m 7) can be useful if you like XML.
westerman is offline   Reply With Quote
Old 08-20-2010, 07:42 AM   #5
rglover
Rachel Glover
 
Location: York, UK

Join Date: Dec 2008
Posts: 51
Default

I'm at the same institute as Ben - rather than use BioPerl (the Blast writer section has always been a bit of a pain to get working correctly) we hashed together a script that would do what he wanted (splitting a multiblast file into smaller blast output files named by the blast queryname):
Code:
#!/usr/bin/perl

use strict;
use warnings;

open(BLAST,"All_All_results.bls");
my @data = <BLAST>;
close BLAST;

my @querylines = ();
for (my $i=0; $i<@data; $i++) {
    my $line = $data[$i];
    if ($line =~ m/Query=/g) {
        push(@querylines,$i);
    }
}

my @queries = ();
for (my $j=1; $j<@querylines; $j++) {
    my $endline = $querylines[$j];
    $endline--;
    my $startline = $querylines[$j-1];
    my @result = ();
    for (my $k=$startline; $k<=$endline; $k++) {
        push(@result,$data[$k]);
    }
    push(@queries,[@result]);
}

my $numqueries = scalar(@queries);
for (my $m=0; $m<$numqueries; $m++) {
    my $firstline = $queries[$m][0];
    my @linearray = split(' ',$firstline);
    my $name = $linearray[1];
    my @array = @{$queries[$m]};
    open(OUT,">$name.bls");
    foreach my $line (@array) {
        print OUT $line;
    }   
    close OUT;    
}
It's based on the standard Blast+ blastn output.

I'm sure there are more efficient perl ways of doing it, but I hope that helps anyone else who comes up with the same problem

Last edited by rglover; 08-20-2010 at 07:46 AM.
rglover is offline   Reply With Quote
Old 08-20-2010, 09:53 AM   #6
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 956
Default

Ben,

Could you please clarify something for me.

Is the input BLAST file the result of a single BLAST run? In other words are you just trying to break up a single BLAST report into separate files, one corresponding to each query?

OR

Did you first concatenate several BLAST reports together, and were those BLAST reports the result of searching with the same set of query sequences against various target databases? If this is the case is your desire to aggregate all of the results for each query into a single report file?
kmcarr is online now   Reply With Quote
Old 08-20-2010, 02:37 PM   #7
Ben Saville
Junior Member
 
Location: Leeds, UK

Join Date: Aug 2010
Posts: 3
Default

Rachel was a little over generous saying that 'we' wrote the perl file, I just watched really.

I set about this in completely the wrong way, I had several Multi-FASTA files to be BLASTed against each other. Originally I had 5 db's and 5 FASTA files and I BLASTed them all against each other.

Then I realised that I could produce 5 larger BLAST reports by combining all of the FASTA sequences into one file. Then just before going to speak with Rachel I realised that I could create a database using my combined FASTA file too. So in the end I produced just the one BLAST report. This was then successfully parsed with the script shown above, and broken down by query ID into files using the query ID as the filename.

The input blast report was produced using default -blastn settings on NCBI's blast++ command line program.
Ben Saville is offline   Reply With Quote
Old 08-23-2010, 07:17 AM   #8
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 956
Default

Ben,

Combining the queries and making a single database is definitely the best way to go. Having a single output file, with each query represented only once, makes the task relatively simple.

I know you and Rachel solved the problem but since you originally asked about using BioPerl I though I would throw together a demo script.

Code:
#!/usr/bin/perl

###########################
# bp_simpleSplitBlast.pl
# 
# Simple script demonstrating spliting a multi query BLAST report
# into individual query reports using the Bioperl SearchIO module.
###########################

# Standard pragmas.

use strict;
use warnings;


# The SearchIO module is used for both reading and writing the BLAST report files.
# The Writer::TextResultWriter is used by the output SearchIO object to format the output
# before writing.

use Bio::SearchIO;
use Bio::SearchIO::Writer::TextResultWriter;


# Input BLAST report filename is passed as the only argument on the command line.

my $fileName = $ARGV[0];


# Instantiate a SearchIO object to read the input file.
# The arguments are the file format and filename.

my $blastReport = Bio::SearchIO->new(-format => 'blast',
                                     -file => $fileName);


# Instantiate a Writer object which will be used to format the output stream.
# This object will be reused by each output writer.

my $reportWriter = Bio::SearchIO::Writer::TextResultWriter->new();


# Parse the input BLAST report one result object at a time.
# The Bio::Search::Result::ResultI object exactly corresponds to one query,
# which is the desired output.
# For each result the query name is read to create unique file names.
# A new SearchIO object is instantiated for each output,
# passing the Writer object and file name.

while (my $blastResult = $blastReport->next_result) {
	my $qid = $blastResult->query_name;
	my $reportFile = "${qid}.bls";
	my $reportOutput = Bio::SearchIO->new(-writer => $reportWriter,
                                             -file => ">$reportFile");
	$reportOutput->write_result($blastResult);
}
kmcarr is online now   Reply With Quote
Old 08-24-2010, 07:43 AM   #9
Ben Saville
Junior Member
 
Location: Leeds, UK

Join Date: Aug 2010
Posts: 3
Default

Thanks Km,

I like to think that I'm not the only person who has / will fall at this hurdle and I hope that this thread will help others in the future!

I will be sure to try out your Bioperl script next time I'm parsing BLAST output.
Ben Saville is offline   Reply With Quote
Reply

Tags
bioperl, blast, parsing blast results

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 11:20 AM.


Powered by vBulletin® Version 3.8.6
Copyright ©2000 - 2014, Jelsoft Enterprises Ltd.