Hey Folks,
Background: I am trying to shortlist the genes predicted by different gene prediction methods, using BLAST hits as the basis for shortlisting. I merged the results from the different gene prediction methods as follows:
AB_Contig140_G3 Start end
ABC_Contig131_G47 Start end
ABC_Contig67_G20 Start end
where A, B and C stand for gene prediction methods. For example, G3, with coordinates start and end on contig 140, was predicted by methods A and B.
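To make the ID scheme concrete, here is a minimal Python sketch that splits a merged ID into its parts. It assumes every ID has the form <methods>_Contig<number>_G<number> with one capital letter per method; the function name parse_gene_id and the regular expression are made up for illustration and are not part of any tool used here.

```python
import re

# Assumed ID layout: <method letters>_Contig<number>_G<number>, e.g. AB_Contig140_G3
ID_PATTERN = re.compile(r"^(?P<methods>[A-Z]+)_Contig(?P<contig>\d+)_G(?P<gene>\d+)$")


def parse_gene_id(gene_id):
    """Split a merged gene ID into method letters, contig number and gene number."""
    match = ID_PATTERN.match(gene_id)
    if match is None:
        raise ValueError(f"unexpected gene ID format: {gene_id!r}")
    return {
        "methods": sorted(match.group("methods")),  # one letter per prediction method
        "contig": int(match.group("contig")),
        "gene": int(match.group("gene")),
    }


if __name__ == "__main__":
    for gene_id in ["AB_Contig140_G3", "ABC_Contig131_G47", "ABC_Contig67_G20"]:
        info = parse_gene_id(gene_id)
        print(gene_id, "found by", len(info["methods"]), "method(s):", info)
```

Counting the method letters gives a quick way to see how many prediction methods agree on a gene before looking at its BLAST hits.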
Question: I am running BLAST against a large database, and I want only the best hit (say 1 or 2) per gene, just to confirm that each gene predicted by the gene prediction methods is actually a good gene with a good score.
I have tried various flags for the blastn command, as follows:
1) Command: blastn -query gene_Seq_22471 -out 24712_trial0.blast.txt -db Fastadb -outfmt 6
Output: 17 hits
ABC_Contig7_G17 gi|385856165|ref|NC_017518.1| 100.00 924 0 0 1 924 1523212 1524135 0.0 1707
ABC_Contig7_G17 gi|385340991|ref|NC_017514.1| 100.00 924 0 0 1 924 812018 811095 0.0 1707
ABC_Contig7_G17 gi|254804028|ref|NC_013016.1| 100.00 924 0 0 1 924 1364185 1365108 0.0 1707
ABC_Contig7_G17 gi|385339062|ref|NC_017513.1| 99.78 924 2 0 1 924 1412253 1413176 0.0 1696
ABC_Contig7_G17 gi|385852231|ref|NC_017516.1| 99.68 924 3 0 1 924 822476 821553 0.0 1690
ABC_Contig7_G17 gi|385850283|ref|NC_017515.1| 99.68 924 3 0 1 924 819387 818464 0.0 1690
ABC_Contig7_G17 gi|77358697|ref|NC_003112.2| 99.68 924 3 0 1 924 1508038 1508961 0.0 1690
ABC_Contig7_G17 gi|385854193|ref|NC_017517.1| 99.68 924 3 0 1 924 1490110 1491033 0.0 1690
ABC_Contig7_G17 gi|121633901|ref|NC_008767.1| 99.57 924 4 0 1 924 1396373 1397296 0.0 1685
ABC_Contig7_G17 gi|385327372|ref|NC_017505.1| 99.57 924 4 0 1 924 980909 979986 0.0 1685
ABC_Contig7_G17 gi|15793034|ref|NC_003116.1| 99.46 924 5 0 1 924 1593604 1594527 0.0 1679
ABC_Contig7_G17 gi|385337120|ref|NC_017512.1| 99.46 924 5 0 1 924 1370042 1370965 0.0 1679
ABC_Contig7_G17 gi|385323172|ref|NC_017501.1| 99.46 924 5 0 1 924 923048 922125 0.0 1679
ABC_Contig7_G17 gi|161869018|ref|NC_010120.1| 99.46 924 5 0 1 924 1390914 1391837 0.0 1679
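For reference, the twelve default columns of -outfmt 6 are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. One alternative to relying on BLAST flags alone is to keep the full tabular output and pick the top hits per query afterwards. The following is only a rough sketch of that idea, assuming tab-separated default -outfmt 6 output; the function name top_hits and the choice to rank by bitscore are my own assumptions, not something BLAST provides.

```python
import csv
import io
from collections import defaultdict

# Default column order of blastn -outfmt 6
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def top_hits(handle, n=2):
    """Return the n highest-bitscore hits per query from tabular BLAST output."""
    per_query = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])
        per_query[hit["qseqid"]].append(hit)
    # sorted() is stable, so hits with equal bitscore keep their file order
    return {
        query: sorted(hits, key=lambda h: h["bitscore"], reverse=True)[:n]
        for query, hits in per_query.items()
    }


# Three rows taken from the output above, deliberately out of order
sample = "\n".join("\t".join(fields) for fields in [
    ["ABC_Contig7_G17", "gi|15793034|ref|NC_003116.1|", "99.46", "924", "5", "0",
     "1", "924", "1593604", "1594527", "0.0", "1679"],
    ["ABC_Contig7_G17", "gi|385856165|ref|NC_017518.1|", "100.00", "924", "0", "0",
     "1", "924", "1523212", "1524135", "0.0", "1707"],
    ["ABC_Contig7_G17", "gi|385339062|ref|NC_017513.1|", "99.78", "924", "2", "0",
     "1", "924", "1412253", "1413176", "0.0", "1696"],
])

best = top_hits(io.StringIO(sample), n=2)

if __name__ == "__main__":
    for query, hits in best.items():
        for hit in hits:
            print(query, hit["sseqid"], hit["pident"], hit["bitscore"])
```

On a real run, the StringIO object would be replaced by an open file handle on the -out file, for example 24712_trial0.blast.txt.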
2) Command: blastn -query gene_Seq_22471 -out 22471_trial1.blast.txt -db Fastadb -num_threads 12 -evalue 1e-10 -best_hit_score_edge 0.05 -best_hit_overhang 0.25 -outfmt 6 -perc_identity 50 -max_target_seqs 1
Output: just 1 hit
ABC_Contig7_G17 gi|385856165|ref|NC_017518.1| 100.00 924 0 0 1 924 1523212 1524135 0.0 1707
3) Command: blastn -query gene_Seq_22471 -out 22471_trial2.blast.txt -db Fastadb -evalue 1e-10 -best_hit_score_edge 0.05 -best_hit_overhang 0.25 -outfmt 6 -perc_identity 50 -max_target_seqs 1
Output: ABC_Contig7_G17 gi|385856165|ref|NC_017518.1| 100.00 924 0 0 1 924 1523212 1524135 0.0 1707
My questions:
1) I just wanted to confirm whether I am doing everything right here, and what the significance of the BLAST flags I am using is.
2) Also, could you suggest good threshold values for the flags I am using to blast against the database?
Thanks in advance.