Hey All,
So far I have used only three filters in my whole-exome pipeline (aligning to hg19). I tested it on the HapMap sample NA19240 from the paper below (Table 3), which reports ~196 variants (SNPs and INDELs).
However, with the filters listed below I get ~15,000 variants (counting only NON_SYNONYMOUS_CODING alterations) plus ~500 INDELs, so the total with INDELs included is even higher. What am I doing wrong?
My filters are:
1) vcfutils.pl varFilter -D1000
2) snpEff -minQ 20 -minCoverage 30
Could the authors have used additional filters, such as variant frequency? If so, how do I set those up? Also, what are the default thresholds for minimum number of reads and variant frequency in bwa and samtools?
Below is my pipeline:
* bwa aln hg19.fa S375_R1.fastq > S375_1.sai
* bwa aln hg19.fa S375_R2.fastq > S375_2.sai
* bwa sampe hg19.fa S375_1.sai S375_2.sai S375_R1.fastq S375_R2.fastq > S375_NoIndex_L007.sam
* samtools view -bS S375_NoIndex_L007.sam > S375_NoIndex_L007.bam
* samtools sort S375_NoIndex_L007.bam S375_NoIndex_L007.sorted
* Mark duplicates using Picard (output: S375_NoIndex_L007.marked.bam)
* samtools index S375_NoIndex_L007.marked.bam
* samtools mpileup -uf hg19.fa S375_NoIndex_L007.marked.bam | bcftools view -bvcg - > S375_NoIndex_L007.raw.bcf
* bcftools view S375_NoIndex_L007.raw.bcf | vcfutils.pl varFilter -D1000 > S375_NoIndex_L007_var_d200.flt.vcf
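To make the quality and depth thresholds explicit at the VCF level, something like the following could be applied after the last step. This is only a rough sketch, not what the paper did: `filter_vcf` is a hypothetical helper name, and it assumes the VCF carries the raw read depth as `DP=` in the INFO column (column 8) and the variant quality in QUAL (column 6). The thresholds 20 and 30 simply mirror the snpEff settings above.

```shell
# Hypothetical helper: keep VCF records with QUAL >= MIN_QUAL and INFO DP >= MIN_DP.
# Assumes depth is stored as DP=<int> in the INFO column (column 8).
# Usage: filter_vcf MIN_QUAL MIN_DP < in.vcf > out.vcf
filter_vcf() {
  awk -F'\t' -v min_qual="$1" -v min_dp="$2" '
    /^#/ { print; next }                  # pass header lines through unchanged
    {
      dp = 0
      n = split($8, info, ";")            # split INFO into key=value fields
      for (i = 1; i <= n; i++)
        if (info[i] ~ /^DP=/) dp = substr(info[i], 4) + 0
      # drop records with missing QUAL, low QUAL, or low depth
      if ($6 != "." && $6 + 0 >= min_qual && dp >= min_dp) print
    }'
}
```

For example, `filter_vcf 20 30 < S375_NoIndex_L007_var_d200.flt.vcf | grep -vc '^#'` would print how many records survive both thresholds, which can then be compared against the count reported in the paper.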