Dear all,
I have run blastx on the 30,000 contigs of my transcriptome.
I want to keep only the contigs for which the sequence description (the output of Blast2GO) is reliable. I often see papers in which people use only an e-value cut-off or a bit-score cut-off. With such a cut-off alone, contigs whose hits have a similarity of only 40% are not removed. I'm convinced that this is insufficient to trust the sequence description and to continue with GO, KEGG pathways and any further downstream application.
I would like to use a similarity cut-off so that I can reliably say that a contig really corresponds to the gene, and the related function, reported in the BLAST results. Does anybody know what a good cut-off value might be, or how else to proceed?
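To make the question concrete, the kind of filter I have in mind could be sketched roughly as below. This is only an illustration, assuming the hits are exported in the default BLAST+ tabular layout (-outfmt 6). The function name `filter_hits`, the threshold values and the example accessions are placeholders, not recommendations. Note that the default `pident` column is percent identity, not similarity; filtering on percent positives would need a custom output format that includes the `ppos` field.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(handle, min_pident=60.0, max_evalue=1e-5):
    """Yield hits (as dicts) that pass both thresholds.

    min_pident and max_evalue are placeholder values chosen for
    illustration only; which values are appropriate is the open question.
    """
    reader = csv.reader(handle, delimiter="\t")
    for row in reader:
        # Skip blank lines and comment lines (present with -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, row))
        if (float(hit["pident"]) >= min_pident
                and float(hit["evalue"]) <= max_evalue):
            yield hit


# Two made-up hits: one at 85% identity, one at 40% identity.
example = (
    "contig1\tP12345\t85.0\t200\t30\t0\t1\t600\t1\t200\t1e-50\t300\n"
    "contig2\tQ67890\t40.0\t150\t90\t2\t1\t450\t10\t159\t1e-8\t60\n"
)
kept = [hit["qseqid"] for hit in filter_hits(io.StringIO(example))]
print(kept)
```

With these placeholder thresholds the 40% hit is dropped even though its e-value alone would have passed, which is exactly the case I am worried about.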
I have yet to encounter a paper in which people really address the low-similarity problem. If you have any suggestions, feel free to share.
Cheers,