Thread | Thread Starter | Forum | Replies | Last Post |
blastn: restrict output to same length as query | Mimoeschen | Bioinformatics | 3 | 04-18-2013 11:32 PM |
Get query coverage using CLI blast | hezichia | Bioinformatics | 1 | 03-05-2013 11:41 PM |
Any script to parse HMMsearch results? | Shishir | Bioinformatics | 2 | 02-11-2013 10:33 AM |
How to use the Bioperl to parse the parse flat file of UniProtKB database? | bewlib | Bioinformatics | 1 | 11-29-2012 05:30 PM |
query on various coverage terms | icebreaker | Bioinformatics | 0 | 11-15-2011 01:32 AM |
#1 |
Member
Location: Germany Join Date: Oct 2012
Posts: 48
I have been playing around with blast+ (blastn), a local installation and various custom databases.
I thought I had my workflow figured out, but some of the output is confusing me, specifically the qcovs field. As per the blast manual, 'qcovs means Query Coverage Per Subject', i.e. how much of my query is represented in an alignment. I assumed this to be a percentage value (maximum 100%), and I have used it for filtering. But now I have done a local blast against a genome db where the qcovs value goes up to 400! So clearly it is not calculated in %, which means my previous filtering is probably crap...

I basically want to do the following: blast a set of sequences against database 1, then filter the blast result for:
a) % identity,
b) alignment length, and
c) % of the query sequence covered by the alignment.

I am not interested in alignments that cover 100% of the query, as I am doing breakpoint/insertion mapping. So I want to filter these out and re-blast against database 2. Any ideas?
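The three filters described above can be sketched in Python against the default tabular output (`-outfmt 6`, i.e. the `std` columns). This is only a minimal sketch under assumptions: the function name `filter_hits`, the cut-off values, and the `query_lengths` dictionary (query ID to sequence length, which would have to be built from the query FASTA file) are hypothetical and not part of blast+. Coverage is computed per HSP from `qstart` and `qend` instead of relying on `qcovs`.

```python
def filter_hits(lines, query_lengths, min_pident=90.0, min_length=100,
                max_query_cov=99.0):
    """Keep HSPs that pass the identity and length cut-offs
    but do NOT span (almost) the whole query.

    lines         -- iterable of tab-separated rows in the default 'std' order
    query_lengths -- dict mapping qseqid to query length (assumed to be known)
    """
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):  # skip blanks and -outfmt 7 comments
            continue
        fields = line.split("\t")
        qseqid, sseqid = fields[0], fields[1]
        pident = float(fields[2])   # percentage of identical matches
        length = int(fields[3])     # alignment length
        qstart, qend = int(fields[6]), int(fields[7])
        # Percentage of the query covered by this single HSP.
        query_cov = 100.0 * (abs(qend - qstart) + 1) / query_lengths[qseqid]
        if (pident >= min_pident and length >= min_length
                and query_cov <= max_query_cov):
            kept.append((qseqid, sseqid, pident, length, round(query_cov, 1)))
    return kept
```

The query IDs that survive this filter could then be used to pull the matching sequences from the query file and blast them against database 2.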
#2 |
Member
Location: Germany Join Date: Oct 2012
Posts: 48
Nevermind...
#3 |
Junior Member
Location: HongKong Join Date: Jul 2013
Posts: 3
Which options do I need to use to get qcovs in the output?
#4 |
Senior Member
Location: sub-surface moon base Join Date: Apr 2013
Posts: 372
#5 |
Member
Location: Germany Join Date: Oct 2012
Posts: 48
Indeed, using the -outfmt parameter, you can add any of the fields specified in the manual. From the manual:
outfmt (type: string, default: 0)

alignment view options:

- 0 = pairwise,
- 1 = query-anchored showing identities,
- 2 = query-anchored no identities,
- 3 = flat query-anchored, show identities,
- 4 = flat query-anchored, no identities,
- 5 = XML Blast output,
- 6 = tabular,
- 7 = tabular with comment lines,
- 8 = Text ASN.1,
- 9 = Binary ASN.1
- 10 = Comma-separated values
- 11 = BLAST archive format (ASN.1)

Options 6, 7, and 10 can be additionally configured to produce a custom format specified by space delimited format specifiers. The supported format specifiers are:

- qseqid means Query Seq-id
- qgi means Query GI
- qacc means Query accession
- sseqid means Subject Seq-id
- sallseqid means All subject Seq-id(s), separated by a ';'
- sgi means Subject GI
- sallgi means All subject GIs
- sacc means Subject accession
- sallacc means All subject accessions
- qstart means Start of alignment in query
- qend means End of alignment in query
- sstart means Start of alignment in subject
- send means End of alignment in subject
- qseq means Aligned part of query sequence
- sseq means Aligned part of subject sequence
- evalue means Expect value
- bitscore means Bit score
- score means Raw score
- length means Alignment length
- pident means Percentage of identical matches
- nident means Number of identical matches
- mismatch means Number of mismatches
- positive means Number of positive-scoring matches
- gapopen means Number of gap openings
- gaps means Total number of gaps
- ppos means Percentage of positive-scoring matches
- frames means Query and subject frames separated by a '/'
- qframe means Query frame
- sframe means Subject frame
- btop means Blast traceback operations (BTOP)
- staxids means unique Subject Taxonomy ID(s), separated by a ';' (in numerical order)
- sscinames means unique Subject Scientific Name(s), separated by a ';'
- scomnames means unique Subject Common Name(s), separated by a ';'
- sblastnames means unique Subject Blast Name(s), separated by a ';' (in alphabetical order)
- sskingdoms means unique Subject Super Kingdom(s), separated by a ';' (in alphabetical order)
- stitle means Subject Title
- salltitles means All Subject Title(s), separated by a '<>'
- sstrand means Subject Strand
- qcovs means Query Coverage Per Subject
- qcovhsp means Query Coverage Per HSP

When not provided, the default value is: 'qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore', which is equivalent to the keyword 'std'
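As an illustration of the custom format described above, here is a small Python sketch that builds a blastn call with `-outfmt "6 std qcovs qcovhsp"` and maps each output row back to its column names. The helper names (`build_blastn_command`, `column_names`, `parse_row`) and the file names in the usage note are made up for this example; only the `-query`, `-db`, `-outfmt` and `-out` options and the format specifiers come from blast+.

```python
# Column order that the keyword 'std' expands to, as quoted from the manual.
STD_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
              "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def build_blastn_command(query, db, out, extra_fields=("qcovs", "qcovhsp")):
    """Return the argument list for a tabular blastn run with extra columns."""
    outfmt = " ".join(["6", "std", *extra_fields])
    return ["blastn", "-query", query, "-db", db, "-outfmt", outfmt, "-out", out]


def column_names(extra_fields=("qcovs", "qcovhsp")):
    """Column names in the order they appear in the tabular output."""
    return STD_FIELDS + list(extra_fields)


def parse_row(line, names):
    """Turn one tab-separated output line into a dict keyed by column name."""
    return dict(zip(names, line.rstrip("\n").split("\t")))
```

With hypothetical file names, `build_blastn_command("queries.fa", "db1", "hits.tsv")` gives a list that can be passed to `subprocess.run(..., check=True)`, provided blastn is on the PATH. Passing the arguments as a list keeps the space-delimited format string together as a single argument.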
#6 |
Junior Member
Location: Washington State Join Date: Apr 2013
Posts: 4
I think qcovs sums up the HSP lengths and divides the total by the query length. If there are repeats in your query, something bigger than 100% can show up, because the same stretch of the query is counted once for every HSP that covers it. Is that your case?
I have no solution for this problem; it seems complicated to program around and filter the results. It will give you a bias towards bigger qcovs values, but I don't mind too much about that. I do wonder what qcovhsp does, though.
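To make that guess concrete, here is a small Python sketch comparing the two ways of counting. It is not taken from the blast+ source; it only illustrates the hypothesis above. Both function names are made up, and `hsps` is assumed to be a list of (qstart, qend) pairs for one query/subject pair. The naive version adds up all HSP lengths, the merged version counts each query position only once.

```python
def naive_coverage(hsps, query_length):
    """Sum of HSP lengths on the query, in percent (can exceed 100)."""
    total = sum(abs(end - start) + 1 for start, end in hsps)
    return 100.0 * total / query_length


def merged_coverage(hsps, query_length):
    """Percent of query positions covered by at least one HSP (at most 100)."""
    # Normalise so that start <= end, then sort by start position.
    intervals = sorted((min(s, e), max(s, e)) for s, e in hsps)
    covered = 0
    current_start = current_end = None
    for start, end in intervals:
        if current_end is None or start > current_end:
            # No overlap with the running interval: close it and start a new one.
            if current_end is not None:
                covered += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            # Overlap: extend the running interval if this HSP reaches further.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start + 1
    return 100.0 * covered / query_length
```

For a 100 bp query that hits four repeat copies on the same subject over its full length, `naive_coverage([(1, 100)] * 4, 100)` gives 400.0, while `merged_coverage` gives 100.0 for the same input, so filtering on the merged value would avoid the bias towards bigger numbers.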