SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Species comparison of number of hits by dc-megablast Noa Bioinformatics 0 08-06-2013 12:57 AM
BLAST filtering out 'no hits' in output kga1978 Bioinformatics 16 06-01-2013 05:38 AM
Local BLAST no hits found PSchneeb Bioinformatics 6 04-18-2012 06:16 AM
BWA:getting hits with given number of mismatches and indels Chandana Bioinformatics 0 01-11-2012 11:03 AM
Blast table format hsp alignment !! empyrean Bioinformatics 0 03-17-2011 09:06 AM

Reply
 
Thread Tools
Old 11-14-2013, 07:39 PM   #1
Kennels
Senior Member
 
Location: Sydney

Join Date: Feb 2011
Posts: 149
Default how to identify blast hits with only 1 HSP (not limit the number of HSP)

Hi,

Is there a way to identify blast hits where there was only one HSP to the database?

i.e. I don't want to limit the number of HSP's displayed, but I want to identify all queries which only had one HSP to the database to begin with in the blast result.

Maybe there is an accessory script somewhere that parses a blast report and identify the queries with this property?

thanks
Kennels is offline   Reply With Quote
Old 11-15-2013, 12:23 AM   #2
whataBamBam
Member
 
Location: Italy

Join Date: May 2013
Posts: 27
Default

If I understand what your after correctly then I do the same using this..

cat BlastOutputFile | grep -B 3 -A 1 "# 1 hits found"

When BlastOutputFile is in outformat 7. Obviously if your using Xml output or any other format it'll be different.
whataBamBam is offline   Reply With Quote
Old 11-15-2013, 01:52 AM   #3
Kennels
Senior Member
 
Location: Sydney

Join Date: Feb 2011
Posts: 149
Default

Quote:
Originally Posted by whataBamBam View Post
If I understand what your after correctly then I do the same using this..

cat BlastOutputFile | grep -B 3 -A 1 "# 1 hits found"

When BlastOutputFile is in outformat 7. Obviously if your using Xml output or any other format it'll be different.
Thanks bambam, but that's not what I'm after.

Yes that will pull out those with 1 match, but what I'm after is those that have only have one alignment 'section' (i.e. high-scoring segment pair).

e.g. an isoform could match to only one target in the database, but half of it could be aligning to to the 5' end, then a gap, then the other half the 3' end of the target.

I'm after hits which aligns without any interruptions, i.e. only 1 HSP. Blastn options don't seem to do what I want (e.g. -ungapped, -culling_limit, -max_hsps_per_subject just mask the information).
Kennels is offline   Reply With Quote
Old 11-15-2013, 02:46 AM   #4
sphil
Senior Member
 
Location: Stuttgart, Germany

Join Date: Apr 2010
Posts: 192
Default

Quote:
Originally Posted by Kennels View Post

I'm after hits which aligns without any interruptions, i.e. only 1 HSP. Blastn options don't seem to do what I want (e.g. -ungapped, -culling_limit, -max_hsps_per_subject just mask the information).
So you mean you want to parse for full-length hsp?
sphil is offline   Reply With Quote
Old 11-15-2013, 02:56 AM   #5
Kennels
Senior Member
 
Location: Sydney

Join Date: Feb 2011
Posts: 149
Default

Yes and no.
Also looking for partial alignment, but only those with single hsp.
Kennels is offline   Reply With Quote
Old 11-15-2013, 11:33 AM   #6
atcghelix
Member
 
Location: CA

Join Date: Jul 2013
Posts: 74
Default

I usually use BioPerl for something like this. The following will get you close (the formatting of the output blast file generated by Bio::SearchIO::Writer::TextResultWriter is slightly different, but basically the same).

Usage would be:
perl script.pl --blast inputBlastFile.txt --out reducedBlastFile.txt

Code:
#!/usr/bin/perl

use strict;
use warnings;
use Getopt::Long;
use Bio::SearchIO;
use Bio::SearchIO::Writer::TextResultWriter;

my $blastFile;
my $outFile;

GetOptions("blast=s" => \$blastFile,
           "out=s"   => \$outFile) || die "Couldn't get options with GetOpt::Long.\n";

my $blastIO = Bio::SearchIO->new(-file => $blastFile,
                                 -format => 'blast'); #you might need to change this

my $searchIOWriter = Bio::SearchIO::Writer::TextResultWriter->new();
my $out = Bio::SearchIO->new(-writer => $searchIOWriter,
                             -file => '>outtest.txt');

while (my $result = $blastIO->next_result) {
    if ($result->num_hits() == 1) {
        my $hit = $result->next_hit();
        if ($hit->num_hsps() == 1) {
            print "A single HSP for a single hit found for " . $result->query_name() . "\n";
            $out->write_result($result);
        }
    }
}
atcghelix is offline   Reply With Quote
Old 11-16-2013, 02:44 AM   #7
Kennels
Senior Member
 
Location: Sydney

Join Date: Feb 2011
Posts: 149
Default

Thanks I'll give it a try.
Kennels is offline   Reply With Quote
Old 11-16-2013, 03:18 AM   #8
lindenb
Senior Member
 
Location: France

Join Date: Apr 2010
Posts: 143
Default

if you used the XML output of blast, you could use the following XSLT stylesheet:

Code:
<?xml version='1.0'  encoding="ISO-8859-1"?>
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0' >
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:apply-templates select="BlastOutput/BlastOutput_iterations/Iteration/Iteration_hits/Hit"/>
</xsl:template>
<xsl:template match="Hit">
<xsl:if test="count(Hit_hsps/Hsp) = 1">
<xsl:value-of select="Hit_id"/><xsl:text>	</xsl:text>
<xsl:value-of select="Hit_def"/><xsl:text>	</xsl:text>
<xsl:value-of select="Hit_accession"/><xsl:text>
</xsl:text>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
usage:

Code:
$ xsltproc --novalid  stylesheet.xsl  blast.xml
lindenb is offline   Reply With Quote
Old 11-17-2013, 05:06 PM   #9
whataBamBam
Member
 
Location: Italy

Join Date: May 2013
Posts: 27
Default

Quote:
Originally Posted by Kennels View Post
Thanks bambam, but that's not what I'm after.

Yes that will pull out those with 1 match, but what I'm after is those that have only have one alignment 'section' (i.e. high-scoring segment pair).

e.g. an isoform could match to only one target in the database, but half of it could be aligning to to the 5' end, then a gap, then the other half the 3' end of the target.

I'm after hits which aligns without any interruptions, i.e. only 1 HSP. Blastn options don't seem to do what I want (e.g. -ungapped, -culling_limit, -max_hsps_per_subject just mask the information).
Ah sorry I get you. Yes that would be really useful. I'll have a look at the script posted and see what that's doing.

I think Gmap will do something like that.. it isn't really designed to match pairs of transcript length sequences (it's designed to map a transcript length sequence to a genome) but I actually do use it to map transcripts to ESTs and it seems to work really well. I then filter the output based on length of the alignment and can find the matches where the EST aligned along it's whole length without any significant gaps. Which is probably quite similar to what your trying to do.

Possibly I really shouldn't be using Gmap to do this so if someone can tell me that then great because I actually do it all the time.
whataBamBam is offline   Reply With Quote
Old 11-17-2013, 11:17 PM   #10
Kennels
Senior Member
 
Location: Sydney

Join Date: Feb 2011
Posts: 149
Default

Quote:
Originally Posted by atcghelix View Post
I usually use BioPerl for something like this. The following will get you close (the formatting of the output blast file generated by Bio::SearchIO::Writer::TextResultWriter is slightly different, but basically the same).
Just letting you know the script did what I wanted.
I just modified the script a little:

Code:
#!/usr/bin/perl

use strict;
use warnings;
use Getopt::Long;
use Bio::SearchIO;
use Bio::SearchIO::Writer::TextResultWriter;

my $blastFile = $ARGV[0];     ## added one argument
my $outFile;

GetOptions("blast=s" => \$blastFile,
           "out=s"   => \$outFile) || die "Couldn't get options with GetOpt::Long.\n";

my $blastIO = Bio::SearchIO->new(-file => $blastFile,
                                 -format => 'blasttable'); ### changed from 'blast'

my $searchIOWriter = Bio::SearchIO::Writer::TextResultWriter->new();
my $out = Bio::SearchIO->new(-writer => $searchIOWriter,
                             -file => '>outtest.txt');

while (my $result = $blastIO->next_result) {
    if ($result->num_hits() == 1) {
        my $hit = $result->next_hit();
        if ($hit->num_hsps() == 1) {
            
	print $result->query_name() . "\n";   ## just print the result ID
            $out->write_result($result);
        }
    }
}
Since my output was in -outfmt 6 from blastn, I changed the format to 'blasttable'
Also I just wanted the ID of the query for single HSP, so I just modified the stdout and redirect to a file to get my list of IDs.
I ran as:

Code:
parse_blastreport_single_hsp.pl Result.blastn6 > singleHSPs.txt

For others, also need to make sure the output has '-max_target_seqs' in blastn is set to more than 1.

Many thanks!
Kennels is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 07:27 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO