![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
BLAST+ creating custom blast database and using blast+ filtering features | deniz | Bioinformatics | 3 | 07-07-2019 08:04 AM |
sorting bam files | frymor | Bioinformatics | 23 | 02-10-2016 05:46 PM |
BLAST+ vs BLASTALL (legacy BLAST) | Symphysodon | Bioinformatics | 4 | 10-25-2011 02:52 PM |
BLAST database error - when changing to new BLAST+ local program | biobio | Bioinformatics | 4 | 06-15-2011 05:20 AM |
Sorting large files | scami | Bioinformatics | 3 | 09-20-2010 11:45 PM |
![]() |
|
Thread Tools |
![]() |
#1 |
not just another member
Location: Belgium Join Date: Aug 2010
Posts: 264
|
![]()
Hi,
I don't know to sort blastn output compare to the score ( or the e-value ). I've got to sequence ( one with some queries, one with the subjects ) I executed this command : blast2 -i query.fa -j subject.fa -o output.blast -p blastn -W 6 Here's a little example of my output : Code:
Sequences producing significant alignments: score(bits) E-Value 1_0 hsa-let-7a MIMAT0000062 Homo sapiens let-7a 36 4e-09 2_0 hsa-let-7a* MIMAT0004481 Homo sapiens let-7a* 28 1e-06 4_0 hsa-let-7b MIMAT0000063 Homo sapiens let-7b 24 2e-05 5_0 hsa-let-7b* MIMAT0004482 Homo sapiens let-7b* 20 3e-04 6_0 hsa-let-7c MIMAT0000064 Homo sapiens let-7c 28 1e-06 8_0 hsa-let-7d MIMAT0000065 Homo sapiens let-7d 26 4e-06 9_0 hsa-let-7d* MIMAT0004484 Homo sapiens let-7d* 12 0.064 10_0 hsa-let-7e MIMAT0000066 Homo sapiens let-7e 28 1e-06 11_0 hsa-let-7e* MIMAT0004485 Homo sapiens let-7e* 12 0.064 12_0 hsa-let-7f MIMAT0000067 Homo sapiens let-7f 44 2e-11 13_0 hsa-let-7f-1* MIMAT0004486 Homo sapiens let-7f-1* 24 2e-05 14_0 hsa-let-7f-2* MIMAT0004487 Homo sapiens let-7f-2* 20 3e-04 23_0 hsa-miR-19a* MIMAT0004490 Homo sapiens miR-19a* 12 0.064 24_0 hsa-miR-19a MIMAT0000073 Homo sapiens miR-19a 12 0.067 Thanks a lot, Nicolas |
![]() |
![]() |
![]() |
#2 |
Senior Member
Location: WashU Join Date: Aug 2010
Posts: 117
|
![]()
Could you provide a sample of your two .fa files and the version of BLAST that you are using?
Is it possible that those are the best hits from different queries. I'm very not familiar with this style of BLAST output. Have you considered using the tabular output format? If you do not need the actual alignments but only the alignment info (including score) you can specify tabular output using the '-m 8' option. |
![]() |
![]() |
![]() |
#3 |
not just another member
Location: Belgium Join Date: Aug 2010
Posts: 264
|
![]()
My version of blast is 2.2.21
here's a sample of the two .fa I use. the query : >04_x240274 TGAGGTAGTAGATTGTATAGTTT the subject (it's the homo sapiens mature mirna from mirbase 16 ) : >hsa-let-7a MIMAT0000062 Homo sapiens let-7a TGAGGTAGTAGGTTGTATAGTT >hsa-let-7a* MIMAT0004481 Homo sapiens let-7a* CTATACAATCTACTGTCTTTC >hsa-let-7a-2* MIMAT0010195 Homo sapiens let-7a-2* CTGTACAGCCTCCTAGCTTTCC >hsa-let-7b MIMAT0000063 Homo sapiens let-7b TGAGGTAGTAGGTTGTGTGGTT >hsa-let-7b* MIMAT0004482 Homo sapiens let-7b* CTATACAACCTACTGCCTTCCC >hsa-let-7c MIMAT0000064 Homo sapiens let-7c TGAGGTAGTAGGTTGTATGGTT >hsa-let-7c* MIMAT0004483 Homo sapiens let-7c* TAGAGTTACACCCTGGGAGTTA >hsa-let-7d MIMAT0000065 Homo sapiens let-7d AGAGGTAGTAGGTTGCATAGTT >hsa-let-7d* MIMAT0004484 Homo sapiens let-7d* CTATACGACCTGCTGCCTTTCT >hsa-let-7e MIMAT0000066 Homo sapiens let-7e TGAGGTAGGAGGTTGTATAGTT >hsa-let-7e* MIMAT0004485 Homo sapiens let-7e* CTATACGGCCTCCTAGCTTTCC >hsa-let-7f MIMAT0000067 Homo sapiens let-7f TGAGGTAGTAGATTGTATAGTT >hsa-let-7f-1* MIMAT0004486 Homo sapiens let-7f-1* CTATACAATCTATTGCCTTCCC >hsa-let-7f-2* MIMAT0004487 Homo sapiens let-7f-2* CTATACAGTCTACTGTCTTTCC |
![]() |
![]() |
![]() |
#4 |
Senior Member
Location: WashU Join Date: Aug 2010
Posts: 117
|
![]()
Your output is definitely strange... Try something like the following using the binaries within your BLAST installation:
Create a search database using formatdb formatdb -i subject.fa -t subject -p F -o F -n subject Now use blastall in a similar command to the one you posted using a word size of 6 and standard output option blastall -i query.fa -d subject -p blastn -W 6 -o alignments.txt Now try again using the tabular output option blastall -m 8 -i query.fa -d subject -p blastn -W 6 -o alignments.tabular.txt Sequences producing significant alignments: (bits) Value hsa-let-7f MIMAT0000067 Homo sapiens let-7f 44 2e-10 hsa-let-7a MIMAT0000062 Homo sapiens let-7a 36 5e-08 hsa-let-7e MIMAT0000066 Homo sapiens let-7e 28 1e-05 hsa-let-7c MIMAT0000064 Homo sapiens let-7c 28 1e-05 hsa-let-7a* MIMAT0004481 Homo sapiens let-7a* 28 1e-05 hsa-let-7d MIMAT0000065 Homo sapiens let-7d 26 5e-05 hsa-let-7f-1* MIMAT0004486 Homo sapiens let-7f-1* 24 2e-04 hsa-let-7b MIMAT0000063 Homo sapiens let-7b 24 2e-04 hsa-let-7f-2* MIMAT0004487 Homo sapiens let-7f-2* 20 0.003 hsa-let-7e* MIMAT0004485 Homo sapiens let-7e* 12 0.73 hsa-let-7d* MIMAT0004484 Homo sapiens let-7d* 12 0.73 In tabular format it looks like this: 04_x240274 hsa-let-7f 100.00 22 0 0 1 22 1 22 2e-10 44.1 04_x240274 hsa-let-7a 95.45 22 1 0 1 22 1 22 5e-08 36.2 04_x240274 hsa-let-7e 90.91 22 2 0 1 22 1 22 1e-05 28.2 04_x240274 hsa-let-7c 94.44 18 1 0 1 18 1 18 1e-05 28.2 04_x240274 hsa-let-7a* 100.00 14 0 0 7 20 14 1 1e-05 28.2 04_x240274 hsa-let-7d 90.48 21 2 0 2 22 2 22 5e-05 26.3 04_x240274 hsa-let-7f-1* 100.00 12 0 0 9 20 12 1 2e-04 24.3 04_x240274 hsa-let-7b 93.75 16 1 0 1 16 1 16 2e-04 24.3 04_x240274 hsa-let-7b 92.86 14 1 0 7 20 36 23 0.003 20.3 04_x240274 hsa-let-7f-2* 92.86 14 1 0 7 20 14 1 0.003 20.3 04_x240274 hsa-let-7e* 100.00 6 0 0 15 20 6 1 0.73 12.4 04_x240274 hsa-let-7d* 100.00 6 0 0 15 20 6 1 0.73 12.4 As you can see, the sorting does appear to be according to E-value by default... |
![]() |
![]() |
![]() |
#5 |
not just another member
Location: Belgium Join Date: Aug 2010
Posts: 264
|
![]()
works fine now !
Thanks a lot ! ![]() Nicolas |
![]() |
![]() |
![]() |
#6 |
not just another member
Location: Belgium Join Date: Aug 2010
Posts: 264
|
![]()
other question
I want to thresold on the e-value to discard the bad alignment. What is the best value for the thresold ? 0.1 , 1 ? Thanks |
![]() |
![]() |
![]() |
#7 |
Member
Location: Canada Join Date: Feb 2011
Posts: 61
|
![]()
I have a question following the thread.
When the output is in tabular format (-m 8), is there a way to mark the score by order with rank? Say the original output: Code:
hsa-let-7f MIMAT0000067 Homo sapiens let-7f 44 2e-10 hsa-let-7f MIMAT0000062 Homo sapiens let-7a 36 5e-08 hsa-let-7f MIMAT0000066 Homo sapiens let-7e 28 1e-05 Code:
hsa-let-7f MIMAT0000067 Homo sapiens let-7f 44 1 2e-10 hsa-let-7f MIMAT0000062 Homo sapiens let-7a 36 2 5e-08 hsa-let-7f MIMAT0000066 Homo sapiens let-7e 28 3 1e-05 Yifang Last edited by yifangt; 03-08-2011 at 03:35 PM. |
![]() |
![]() |
![]() |
#8 |
Member
Location: France Join Date: Jan 2010
Posts: 23
|
![]()
Hi,
You want the second best blast hit ? If so, no need to count the rank, just use a shell command : Code:
head -n 2 blast_output.m8 | tail -n 1 Will give you the second line in the blast_output.m8 file Code:
hsa-let-7f MIMAT0000062 Homo sapiens let-7a 36 5e-08 |
![]() |
![]() |
![]() |
#9 | |
Member
Location: Canada Join Date: Feb 2011
Posts: 61
|
![]() Quote:
I should have made myself clearer, actually my question is more general as for several thousands queries, each of which may give many hits, than the specific 3 rows example. What I meant is the second best hit (or the first hit, or third hit, etc) of each query in the output file. That is why I need the rank of the e-value or bits score of "each query" so that I can sort them later. There are parameters to ask BLAST only to output the first 3 hits (-v3 -b3), but without an extra rank column that I need. It seems there is a option in the blastall command line. Just could not figure it out myself. Iam aware of the output can be parsed later for this purpose, but the option (if any) could make the job much easier. |
|
![]() |
![]() |
![]() |
#10 | |
Member
Location: France Join Date: Jan 2010
Posts: 23
|
![]()
Sorry, now I get it...
Me neither ![]() Quote:
Code:
awk '{print $1" "$2" "$3" "$4" "$5" "$6" "NR" "$7}' |
|
![]() |
![]() |
![]() |
#11 |
Member
Location: Oregon Join Date: Feb 2011
Posts: 29
|
![]()
normal untested disclaimers apply but the pattern is adaptable
this is assuming the fields are grouped by first (queries) and ordered by whatever field (E value) you choose. and do not mind the rank number out front awk 'BEGIN{q=$1;c=0}'{if(q != $1){q=$1;c=0};c++;print c $0}' |
![]() |
![]() |
![]() |
Thread Tools | |
|
|