I used bwa and samtools mpileup to call SNPs in an exome dataset. The data came from targeted enrichment followed by HiSeq sequencing. However, I got about 80-90k SNPs. Even worse, 75% of these SNPs are non-exonic. If I use pileup instead of mpileup, there are still 60k SNPs.
FastQC reported the encoding of the original data as Illumina 1.5+, so I used -I in the bwa aln step. All the other steps followed the mpileup website.
Can anyone suggest why I am getting so many SNPs?
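One way to check the exonic versus non-exonic split is to count how many calls fall inside the capture targets. The following is only a minimal Python sketch of that check, not the exact procedure used above. It assumes the calls are in VCF format (1-based positions) and the targets are in a BED file (0-based, half-open intervals) with the same chromosome naming; the function names and the `flank` parameter are hypothetical, introduced here for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def load_targets(bed_lines):
    """Parse BED lines into merged, sorted intervals per chromosome."""
    raw = defaultdict(list)
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        raw[chrom].append((int(start), int(end)))

    merged = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                # Overlapping or adjacent: extend the previous interval.
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])
    return merged


def on_target(targets, chrom, pos, flank=0):
    """True if a 1-based position lies in a target widened by `flank` bp."""
    if chrom not in targets:
        return False
    starts, ends = targets[chrom]
    p = pos - 1  # convert the VCF position to 0-based
    # Last interval whose widened start is at or before p.
    i = bisect_right(starts, p + flank) - 1
    return i >= 0 and p < ends[i] + flank


def count_on_target(vcf_lines, targets, flank=0):
    """Return (calls inside targets, total calls) for VCF records."""
    inside = total = 0
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos = line.split("\t")[:2]
        total += 1
        inside += on_target(targets, chrom, int(pos), flank)
    return inside, total


if __name__ == "__main__":
    # Tiny in-memory example; real input would be read from the BED and VCF files.
    bed = ["chr1\t100\t200", "chr1\t150\t300", "chr2\t50\t60"]
    vcf = [
        "##fileformat=VCFv4.1",
        "chr1\t101\t.\tA\tG",
        "chr1\t301\t.\tC\tT",
        "chr2\t60\t.\tG\tA",
        "chr3\t10\t.\tT\tC",
    ]
    targets = load_targets(bed)
    inside, total = count_on_target(vcf, targets)
    print(f"{inside} of {total} calls are on target")
```

Passing a non-zero `flank` counts calls just outside the target boundaries as on target, which may help show how many of the off-target calls sit in the regions immediately flanking the exons.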