I am analysing sequencing data (exome and region capture) from ENU-mutagenised mice. The most recent sample is a capture of a ~28 Mb region to which the mutation had previously been mapped. After running the sample through our pipeline, I found that approx. 60% of the variants called within the target region were INDELs, compared with approx. 30% for broadly similar exome sequencing projects. All samples were sequenced on the Illumina GAII or HiSeq.
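A 60% vs 30% comparison only means something if the INDEL share is computed the same way for every call set. Below is a minimal sketch in POSIX shell and awk. It assumes a plain-text VCF on stdin (e.g. the output of `bcftools view`) and classes a record as an INDEL when any ALT allele differs in length from REF. The function name `count_indel_fraction` is hypothetical. It counts records, not alleles, so a multi-allelic site counts once.

```shell
# count_indel_fraction (hypothetical helper): report the share of INDEL
# records in a plain-text VCF read from stdin.
# Assumption: a record is an INDEL if any ALT allele differs in length
# from REF; all other records (SNVs, MNPs) are counted as "other".
count_indel_fraction() {
  awk -F'\t' '
    /^#/ { next }                      # skip meta-information and header lines
    {
      n_alt = split($5, alts, ",")     # ALT may list several alleles
      is_indel = 0
      for (i = 1; i <= n_alt; i++)
        if (length(alts[i]) != length($4)) is_indel = 1
      if (is_indel) indels++; else other++
    }
    END {
      total = indels + other
      if (total == 0) { print "no variant records"; exit }
      printf "indels=%d other=%d indel_fraction=%.2f\n", indels, other, indels / total
    }'
}
```

For example, `count_indel_fraction < calls.vcf` (with `calls.vcf` a placeholder name) prints a single summary line, which makes it easy to apply identical counting to the region capture and the exome samples.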
Our pipeline is roughly as follows:
Alignment with quality score recalibration with Novoalign -> remove multimapping reads -> duplicate removal with Picard's MarkDuplicates -> Variant calling with mpileup and bcftools
The mpileup and bcftools parameters are as follows:
samtools mpileup -q1 -C50 -d10000 -L10000 -ugf $reference $bamfile | bcftools view -bvcg - > $bcffile
bcftools view $bcffile | vcfutils.pl varFilter -D10000 -w0 -W0
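The commands above call variants wherever reads align, so the "within the target region" figure also depends on how calls are restricted to the capture. Below is a minimal sketch of that step. It assumes the targets are in a tab-separated BED file (0-based, half-open) and the VCF is plain text on stdin. `filter_vcf_to_bed` and `targets.bed` are hypothetical names. It does a linear scan per record, which is fine for a single ~28 Mb region but slow for exome-scale target lists. For those, `bedtools intersect` or passing the BED to `samtools mpileup -l` would be the more usual route.

```shell
# filter_vcf_to_bed BED_FILE (hypothetical helper): pass VCF header lines
# through and keep only records whose POS falls inside a BED interval.
# BED is 0-based half-open and VCF POS is 1-based, so POS lies in the
# interval [start, end) when start < POS <= end.
filter_vcf_to_bed() {
  awk -F'\t' -v bed="$1" '
    FILENAME == bed {                  # first input: the BED intervals
      if ($0 ~ /^(#|track|browser)/ || NF < 3) next
      n = ++count[$1]
      start[$1, n] = $2 + 0            # BED start, 0-based
      stop[$1, n] = $3 + 0             # BED end, exclusive
      next
    }
    /^#/ { print; next }               # keep VCF header lines
    {
      pos = $2 + 0                     # VCF POS, 1-based
      m = count[$1] + 0                # number of intervals on this contig
      for (i = 1; i <= m; i++)
        if (pos > start[$1, i] && pos <= stop[$1, i]) { print; break }
    }' "$1" -
}
```

Usage with the filtering step above might look like `bcftools view $bcffile | vcfutils.pl varFilter -D10000 -w0 -W0 | filter_vcf_to_bed targets.bed > on_target.vcf`.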
I realise there are considerable differences between my region capture sample and my comparison exome samples (e.g. different capture platforms and generally higher coverage for the region capture); however, this result still has me a little concerned.
Would local realignment (such as that offered in GATK) be advisable to resolve the status of these putative INDELs?
Does this INDEL proportion strike others as too high?
Any suggestions would be welcome.
Pete