I have noticed a problem with mpileup (samtools) that results in a large loss of reads at certain base positions, with some bases being entirely absent. Briefly, I created a pileup file from a merged bam file comprising 13 separate bams (combined using the merge option in samtools), then created an mpileup from the same 13 bams and compared the results. The pileup contained 133M unique bases, while the mpileup contained only 111M unique bases, a reduction of 16%. Looking at the output in more detail, I noticed that not only were many bases completely absent, but several others had massively reduced coverage. Moreover, in many cases the only remaining reads were those containing the alternate (non-reference) allele; all other reads were absent.
I was using samtools 0.1.9 in each case. All parameters were left at their defaults for the pileup, and only -B -Q 0 were used for the mpileup. Before the pileups were created, the bam files had been filtered to remove base and mapping qualities below 20, and all ambiguously mapped reads had been removed.
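In case it helps anyone reproduce the comparison, here is a minimal sketch of how the two outputs could be compared position by position. It is not the exact script I used. It assumes tab-separated output with chromosome, position and reference base in the first three columns, followed by one (depth, bases, qualities) triple per input bam, so 6 columns for the pileup and 3 + 3×13 for the mpileup. The function names `read_depths` and `compare` are made up for illustration.

```python
def read_depths(handle):
    """Map (chrom, pos) -> total read depth summed over all samples on a line.

    Assumes columns: chrom, pos, ref, then (depth, bases, quals) per sample.
    """
    depths = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        # Depth columns sit at index 3, 6, 9, ... (one triple per sample).
        total = sum(int(fields[i]) for i in range(3, len(fields), 3))
        depths[(fields[0], int(fields[1]))] = total
    return depths


def compare(pileup_depths, mpileup_depths):
    """Return positions missing from the mpileup and positions with lower depth."""
    missing = sorted(set(pileup_depths) - set(mpileup_depths))
    reduced = sorted(
        key
        for key, depth in pileup_depths.items()
        if key in mpileup_depths and mpileup_depths[key] < depth
    )
    return missing, reduced
```

Passing open file handles for the two outputs to `read_depths` and then both results to `compare` gives the list of positions that are absent from the mpileup and the list where coverage dropped.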
Any ideas on what might be causing this discrepancy?