I ran the following command thinking that it would return information for every site in my reference sequence (a transcriptome).
<code>
samtools mpileup -uf Ref.fasta aln.sorted.bam | bcftools view -bcg - > aln.sorted.bcf
</code>
However, ~2% of the sites are missing from the output. I wrote a perl script to determine which sites were missing and ~92% of the missing sites were from transcript ends. I guessed that perhaps sites that had no coverage were not returned; however, some internal sites with no coverage were returned. I have rulled out BAQ as being the problem as well as the same sites are consistently missing whether or not a BAQ calculation is used.
Does anyone know what would cause some of the sites to not be included in the output?
Thanks for your help!
<code>
samtools mpileup -uf Ref.fasta aln.sorted.bam | bcftools view -bcg - > aln.sorted.bcf
</code>
However, ~2% of the sites are missing from the output. I wrote a perl script to determine which sites were missing and ~92% of the missing sites were from transcript ends. I guessed that perhaps sites that had no coverage were not returned; however, some internal sites with no coverage were returned. I have rulled out BAQ as being the problem as well as the same sites are consistently missing whether or not a BAQ calculation is used.
Does anyone know what would cause some of the sites to not be included in the output?
Thanks for your help!