Hello all,
After searching the forum for some time trying to find an answer to my question, I decided to reach out to the community for some additional assistance. Currently, my group is working on SNP detection in bacterial species for phylogenetic analyses. I am fairly new to bioinformatics but am luckily working on these data within a local implementation of Galaxy.
At this point, I am trying to compare SNP calls between UnifiedGenotyper, mpileup/VarScan, and FreeBayes, all of which seem to be popular algorithms/programs. For our analysis, we are using short-read paired-end Illumina data (2x100), usually with 25-30 samples.
My first question is which algorithm/program is preferred for bacterial species, and whether there are any best practices for these analyses. Secondly, and most importantly, I am having trouble finding criteria/guidelines for SNP filtering. The GATK site and essentially everywhere else I have looked say that this process is highly subjective. However, I need a starting place, or some criteria on which to base my subjective decisions. For example, should I base the decision strictly on the QUAL value, or on a combination of values including the genotype quality, read depth, and mapping quality? My ultimate goal is to reduce false positive SNP calls (which I assume would come down to some probability threshold) as well as SNP calls in areas with low coverage.
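To make the filtering question concrete, here is a minimal Python sketch of the kind of multi-criteria hard filter I have in mind. Everything specific in it is an assumption for illustration: the function names are hypothetical, the thresholds (QUAL 30, depth 10, mapping quality 30) are placeholders rather than recommendations, and it assumes the caller writes read depth and mapping quality to the INFO column under the keys DP and MQ, which may not hold for every caller.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'DP=42;MQ=58.1' into a dict of strings."""
    info = {}
    for entry in info_field.split(";"):
        if "=" in entry:
            key, value = entry.split("=", 1)
            info[key] = value
    return info


def passes_filter(vcf_line, min_qual=30.0, min_depth=10, min_mapq=30.0):
    """Return True if a tab-separated VCF data line meets all three thresholds.

    The default thresholds are placeholders, not recommended values.
    DP and MQ are assumed INFO keys; sites lacking either are rejected.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    qual = fields[5]                # column 6 of a VCF record is QUAL
    info = parse_info(fields[7])    # column 8 of a VCF record is INFO
    if qual == ".":                 # missing QUAL: treat as failing
        return False
    try:
        depth = int(info["DP"])
        mapq = float(info["MQ"])
    except (KeyError, ValueError):
        return False
    return float(qual) >= min_qual and depth >= min_depth and mapq >= min_mapq


# Two made-up records, purely to show the intended behaviour.
good = "chr1\t100\t.\tA\tG\t50.0\t.\tDP=20;MQ=60.0"
low_depth = "chr1\t200\t.\tC\tT\t99.0\t.\tDP=3;MQ=60.0"
print(passes_filter(good), passes_filter(low_depth))
```

The second record has a high QUAL but is rejected on depth alone, which is exactly the case I am unsure about: whether QUAL by itself is enough, or whether several values should be required together.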
Any assistance would be greatly appreciated.