SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
BLAST+ creating custom blast database and using blast+ filtering features deniz Bioinformatics 3 07-07-2019 09:04 AM
sorting bam files frymor Bioinformatics 23 02-10-2016 06:46 PM
BLAST+ vs BLASTALL (legacy BLAST) Symphysodon Bioinformatics 4 10-25-2011 03:52 PM
BLAST database error - when changing to new BLAST+ local program biobio Bioinformatics 4 06-15-2011 06:20 AM
Sorting large files scami Bioinformatics 3 09-21-2010 12:45 AM

Reply
 
Thread Tools
Old 09-15-2010, 07:05 AM   #1
NicoBxl
not just another member
 
Location: Belgium

Join Date: Aug 2010
Posts: 264
Default blast e-value sorting

Hi,

I don't know to sort blastn output compare to the score ( or the e-value ). I've got to sequence ( one with some queries, one with the subjects )

I executed this command :

blast2 -i query.fa -j subject.fa -o output.blast -p blastn -W 6

Here's a little example of my output :

Code:
                                                                
Sequences producing significant alignments:                   score(bits)   E-Value

1_0 hsa-let-7a MIMAT0000062 Homo sapiens let-7a                        36   4e-09
2_0 hsa-let-7a* MIMAT0004481 Homo sapiens let-7a*                      28   1e-06
4_0 hsa-let-7b MIMAT0000063 Homo sapiens let-7b                        24   2e-05
5_0 hsa-let-7b* MIMAT0004482 Homo sapiens let-7b*                      20   3e-04
6_0 hsa-let-7c MIMAT0000064 Homo sapiens let-7c                        28   1e-06
8_0 hsa-let-7d MIMAT0000065 Homo sapiens let-7d                        26   4e-06
9_0 hsa-let-7d* MIMAT0004484 Homo sapiens let-7d*                      12   0.064
10_0 hsa-let-7e MIMAT0000066 Homo sapiens let-7e                       28   1e-06
11_0 hsa-let-7e* MIMAT0004485 Homo sapiens let-7e*                     12   0.064
12_0 hsa-let-7f MIMAT0000067 Homo sapiens let-7f                       44   2e-11
13_0 hsa-let-7f-1* MIMAT0004486 Homo sapiens let-7f-1*                 24   2e-05
14_0 hsa-let-7f-2* MIMAT0004487 Homo sapiens let-7f-2*                 20   3e-04
23_0 hsa-miR-19a* MIMAT0004490 Homo sapiens miR-19a*                   12   0.064
24_0 hsa-miR-19a MIMAT0000073 Homo sapiens miR-19a                     12   0.067
I highlight in red the best alignment. What's the parameter for sorting the matches compare to the e-value or the score ?

Thanks a lot,

Nicolas
NicoBxl is offline   Reply With Quote
Old 09-15-2010, 11:29 AM   #2
malachig
Senior Member
 
Location: WashU

Join Date: Aug 2010
Posts: 117
Default

Could you provide a sample of your two .fa files and the version of BLAST that you are using?

Is it possible that those are the best hits from different queries. I'm very not familiar with this style of BLAST output. Have you considered using the tabular output format? If you do not need the actual alignments but only the alignment info (including score) you can specify tabular output using the '-m 8' option.
malachig is offline   Reply With Quote
Old 09-15-2010, 11:46 PM   #3
NicoBxl
not just another member
 
Location: Belgium

Join Date: Aug 2010
Posts: 264
Default

My version of blast is 2.2.21

here's a sample of the two .fa I use.

the query :

>04_x240274
TGAGGTAGTAGATTGTATAGTTT


the subject (it's the homo sapiens mature mirna from mirbase 16 ) :

>hsa-let-7a MIMAT0000062 Homo sapiens let-7a
TGAGGTAGTAGGTTGTATAGTT
>hsa-let-7a* MIMAT0004481 Homo sapiens let-7a*
CTATACAATCTACTGTCTTTC
>hsa-let-7a-2* MIMAT0010195 Homo sapiens let-7a-2*
CTGTACAGCCTCCTAGCTTTCC
>hsa-let-7b MIMAT0000063 Homo sapiens let-7b
TGAGGTAGTAGGTTGTGTGGTT
>hsa-let-7b* MIMAT0004482 Homo sapiens let-7b*
CTATACAACCTACTGCCTTCCC
>hsa-let-7c MIMAT0000064 Homo sapiens let-7c
TGAGGTAGTAGGTTGTATGGTT
>hsa-let-7c* MIMAT0004483 Homo sapiens let-7c*
TAGAGTTACACCCTGGGAGTTA
>hsa-let-7d MIMAT0000065 Homo sapiens let-7d
AGAGGTAGTAGGTTGCATAGTT
>hsa-let-7d* MIMAT0004484 Homo sapiens let-7d*
CTATACGACCTGCTGCCTTTCT
>hsa-let-7e MIMAT0000066 Homo sapiens let-7e
TGAGGTAGGAGGTTGTATAGTT
>hsa-let-7e* MIMAT0004485 Homo sapiens let-7e*
CTATACGGCCTCCTAGCTTTCC
>hsa-let-7f MIMAT0000067 Homo sapiens let-7f
TGAGGTAGTAGATTGTATAGTT
>hsa-let-7f-1* MIMAT0004486 Homo sapiens let-7f-1*
CTATACAATCTATTGCCTTCCC
>hsa-let-7f-2* MIMAT0004487 Homo sapiens let-7f-2*
CTATACAGTCTACTGTCTTTCC
NicoBxl is offline   Reply With Quote
Old 09-16-2010, 09:03 PM   #4
malachig
Senior Member
 
Location: WashU

Join Date: Aug 2010
Posts: 117
Default

Your output is definitely strange... Try something like the following using the binaries within your BLAST installation:

Create a search database using formatdb
formatdb -i subject.fa -t subject -p F -o F -n subject

Now use blastall in a similar command to the one you posted using a word size of 6 and standard output option
blastall -i query.fa -d subject -p blastn -W 6 -o alignments.txt

Now try again using the tabular output option
blastall -m 8 -i query.fa -d subject -p blastn -W 6 -o alignments.tabular.txt

Sequences producing significant alignments: (bits) Value
hsa-let-7f MIMAT0000067 Homo sapiens let-7f 44 2e-10
hsa-let-7a MIMAT0000062 Homo sapiens let-7a 36 5e-08
hsa-let-7e MIMAT0000066 Homo sapiens let-7e 28 1e-05
hsa-let-7c MIMAT0000064 Homo sapiens let-7c 28 1e-05
hsa-let-7a* MIMAT0004481 Homo sapiens let-7a* 28 1e-05
hsa-let-7d MIMAT0000065 Homo sapiens let-7d 26 5e-05
hsa-let-7f-1* MIMAT0004486 Homo sapiens let-7f-1* 24 2e-04
hsa-let-7b MIMAT0000063 Homo sapiens let-7b 24 2e-04
hsa-let-7f-2* MIMAT0004487 Homo sapiens let-7f-2* 20 0.003
hsa-let-7e* MIMAT0004485 Homo sapiens let-7e* 12 0.73
hsa-let-7d* MIMAT0004484 Homo sapiens let-7d* 12 0.73

In tabular format it looks like this:
04_x240274 hsa-let-7f 100.00 22 0 0 1 22 1 22 2e-10 44.1
04_x240274 hsa-let-7a 95.45 22 1 0 1 22 1 22 5e-08 36.2
04_x240274 hsa-let-7e 90.91 22 2 0 1 22 1 22 1e-05 28.2
04_x240274 hsa-let-7c 94.44 18 1 0 1 18 1 18 1e-05 28.2
04_x240274 hsa-let-7a* 100.00 14 0 0 7 20 14 1 1e-05 28.2
04_x240274 hsa-let-7d 90.48 21 2 0 2 22 2 22 5e-05 26.3
04_x240274 hsa-let-7f-1* 100.00 12 0 0 9 20 12 1 2e-04 24.3
04_x240274 hsa-let-7b 93.75 16 1 0 1 16 1 16 2e-04 24.3
04_x240274 hsa-let-7b 92.86 14 1 0 7 20 36 23 0.003 20.3
04_x240274 hsa-let-7f-2* 92.86 14 1 0 7 20 14 1 0.003 20.3
04_x240274 hsa-let-7e* 100.00 6 0 0 15 20 6 1 0.73 12.4
04_x240274 hsa-let-7d* 100.00 6 0 0 15 20 6 1 0.73 12.4

As you can see, the sorting does appear to be according to E-value by default...
Attached Files
File Type: txt alignments.txt (5.6 KB, 5 views)
File Type: txt alignments.tabular.txt (695 Bytes, 10 views)
malachig is offline   Reply With Quote
Old 09-17-2010, 12:50 AM   #5
NicoBxl
not just another member
 
Location: Belgium

Join Date: Aug 2010
Posts: 264
Smile

works fine now !

Thanks a lot !

Nicolas
NicoBxl is offline   Reply With Quote
Old 09-17-2010, 01:04 AM   #6
NicoBxl
not just another member
 
Location: Belgium

Join Date: Aug 2010
Posts: 264
Default

other question

I want to thresold on the e-value to discard the bad alignment. What is the best value for the thresold ? 0.1 , 1 ?

Thanks
NicoBxl is offline   Reply With Quote
Old 03-08-2011, 04:19 PM   #7
yifangt
Member
 
Location: Canada

Join Date: Feb 2011
Posts: 61
Default How to mark the order of the output by e-value or score?

I have a question following the thread.
When the output is in tabular format (-m 8), is there a way to mark the score by order with rank?
Say the original output:
Code:
hsa-let-7f MIMAT0000067 Homo sapiens let-7f 44 2e-10
hsa-let-7f MIMAT0000062 Homo sapiens let-7a 36 5e-08
hsa-let-7f MIMAT0000066 Homo sapiens let-7e 28 1e-05
But I want it like this with the extra column (look at the second last):
Code:
hsa-let-7f MIMAT0000067 Homo sapiens let-7f 44 1 2e-10
hsa-let-7f MIMAT0000062 Homo sapiens let-7a 36 2 5e-08
hsa-let-7f MIMAT0000066 Homo sapiens let-7e 28 3 1e-05
The reason I want this is to pick up the second one only (instead of the first one, sounds silly?!) which could be done by sorting. I tried the b option which seemed not working. Not sure the parameter for this format of output, if any. Thanks a lot!

Yifang

Last edited by yifangt; 03-08-2011 at 04:35 PM.
yifangt is offline   Reply With Quote
Old 03-09-2011, 02:11 AM   #8
nicolallias
Member
 
Location: France

Join Date: Jan 2010
Posts: 23
Default

Hi,
You want the second best blast hit ? If so, no need to count the rank, just use a shell command :
Code:
head -n 2 blast_output.m8 | tail -n 1
(in fact, change 2 by the position of the line you want)
Will give you the second line in the blast_output.m8 file
Code:
hsa-let-7f MIMAT0000062 Homo sapiens let-7a 36 5e-08
Hope this could help.
nicolallias is offline   Reply With Quote
Old 03-09-2011, 05:20 AM   #9
yifangt
Member
 
Location: Canada

Join Date: Feb 2011
Posts: 61
Default How to mark the order of the output by e-value or score?

Quote:
Originally Posted by nicolallias View Post
Hi,
You want the second best blast hit ? If so, no need to count the rank, just use a shell command :
Code:
head -n 2 blast_output.m8 | tail -n 1
(in fact, change 2 by the position of the line you want)
Will give you the second line in the blast_output.m8 file
Code:
hsa-let-7f MIMAT0000062 Homo sapiens let-7a 36 5e-08
Hope this could help.
Thanks,
I should have made myself clearer, actually my question is more general as for several thousands queries, each of which may give many hits, than the specific 3 rows example.
What I meant is the second best hit (or the first hit, or third hit, etc) of each query in the output file. That is why I need the rank of the e-value or bits score of "each query" so that I can sort them later. There are parameters to ask BLAST only to output the first 3 hits (-v3 -b3), but without an extra rank column that I need. It seems there is a option in the blastall command line. Just could not figure it out myself.
Iam aware of the output can be parsed later for this purpose, but the option (if any) could make the job much easier.
yifangt is offline   Reply With Quote
Old 03-09-2011, 06:31 AM   #10
nicolallias
Member
 
Location: France

Join Date: Jan 2010
Posts: 23
Default

Sorry, now I get it...
Quote:
Originally Posted by yifangt View Post
Just could not figure it out myself.
Me neither
Quote:
Originally Posted by yifangt View Post
Iam aware of the output can be parsed later for this purpose, but the option (if any) could make the job much easier.
Anyway, you could pipe with a
Code:
awk '{print $1" "$2" "$3" "$4" "$5" "$6" "NR" "$7}'
nicolallias is offline   Reply With Quote
Old 03-09-2011, 08:31 AM   #11
tomc
Member
 
Location: Oregon

Join Date: Feb 2011
Posts: 29
Default

normal untested disclaimers apply but the pattern is adaptable
this is assuming the fields are grouped by first (queries) and ordered by whatever field (E value) you choose. and do not mind the rank number out front

awk 'BEGIN{q=$1;c=0}'{if(q != $1){q=$1;c=0};c++;print c $0}'
tomc is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 09:02 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2021, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO