I am working with paired-end Illumina resequencing data from multiple populations.
My workflow uses samtools view -q to generate BAM files at specific MAPQ thresholds, then samtools mpileup, and then a pipeline designed for pooled DNA samples and multi-population comparisons (PoPoolation2), which computes FST values for SNPs, among other things.
I have run into a confusing aspect of MAPQ scores (perhaps). When I use a MAPQ threshold of 10 to generate the BAM files, I recover substantial numbers of SNPs in my workflow. However, if I use 20, ALL of them disappear. I have examined the MAPQ scores at various SNP locations in the data filtered at MAPQ 10, and given the read depth (correct term?), there are several reads, and hence several MAPQ scores, covering each SNP, most of which are >20.
Basically, I would like to understand how samtools view -q 20 versus -q 10 "removes" particular locations from the BAM file, because it seems like a very steep threshold.
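To make the question concrete, here is a minimal Python sketch of what I understand -q to do: it drops individual reads whose MAPQ is below the threshold, so a location only "disappears" if no reads covering it survive (or too few to pass a downstream coverage cutoff). The site names and MAPQ values are made up for illustration and are not from my data, and `depth_after_filter` is a hypothetical helper, not part of samtools.

```python
# Hypothetical (site, MAPQ) pairs for reads covering two SNP sites.
# These values are invented for illustration only.
reads = [
    ("site_A", 37), ("site_A", 25), ("site_A", 12), ("site_A", 60),
    ("site_B", 15), ("site_B", 11), ("site_B", 18),
]


def depth_after_filter(reads, min_mapq):
    """Count reads per site that would be kept with a MAPQ cutoff.

    Mimics the per-read rule of `samtools view -q min_mapq`:
    a read is kept only if its MAPQ is >= min_mapq.
    """
    depth = {}
    for site, mapq in reads:
        if mapq >= min_mapq:
            depth[site] = depth.get(site, 0) + 1
    return depth


depth_q10 = depth_after_filter(reads, 10)
depth_q20 = depth_after_filter(reads, 20)

print("depth at -q 10:", depth_q10)
print("depth at -q 20:", depth_q20)
```

In this toy case site_A only loses one read at the stricter cutoff, while site_B loses all of its reads. What I see in my real data looks like the site_B case everywhere, even though most reads at my SNPs seem to have MAPQ above 20, which is the part I cannot explain.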
I hope this makes sense.