#1
Junior Member
Location: Turkey Join Date: Nov 2016
Posts: 1
I use GATK 4.0 for my variant calling pipeline. My steps are MarkDuplicates, BaseRecalibrator, ApplyBQSR and HaplotypeCaller. At one locus, IGV shows no mutation in the original BAM file, but there is a mutation in the final VCF, and the bamout from HaplotypeCaller also appears to show it. I then ran Sanger sequencing and saw that there is actually no mutation. So the original BAM file is correct and the variant in the bamout is a false positive.
How can I overcome this problem? It is a serious issue and occurs at several loci. Thanks in advance.
#2
Senior Member
Location: Cambridge Join Date: Sep 2010
Posts: 109
Since GATK HaplotypeCaller reassembles the reads locally, at repetitive loci it may pull in reads from other regions.

There may be multiple causes for the problem: repeats, SVs (structural variants), allelic variation, and mapping/assembly artifacts. If you are doing QC with Sanger, make sure your primers are NOT allele-specific; otherwise they would amplify only the reference allele. If your dataset is PCR-free and has low GC bias, try looking for signs of CNV (copy number variation).

PS: Ideally (if funding is not limited) I would also sequence the affected samples with third-generation sequencing technology (ONT/PacBio) and assemble them de novo, or at least get some 10X allele phasing data for the samples.

Given the existing dataset and no new funding/experiments:
1. Determine the areas affected by repeats and mask them for the time being.
2. Try to trace the origin of the reads in the affected area of the BAM file. If they are unmapped in the original BAM file, and the coverage there is half of what it is in the final BAM file, your other allele is probably too divergent from the reference to be mapped successfully by your mapper. You may also try making a custom reference (include the divergent version of the region as a separate sequence and retry mapping).
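To make step 2 a bit more concrete, here is a rough sketch of how one might trace where the reads at a suspicious locus came from. It assumes you have dumped both the original BAM and the bamout to SAM text for the region of interest (for example with `samtools view`); the function names, the window coordinates and the toy records are all invented for illustration and are not part of GATK. It only uses the mandatory SAM columns (QNAME, FLAG, RNAME, POS, MAPQ), and it counts read start positions as a crude stand-in for coverage.

```python
from collections import namedtuple

# Minimal view of a SAM record: only the mandatory fields used below.
Aln = namedtuple("Aln", "name flag chrom pos mapq")

# Standard SAM flag bits.
UNMAPPED = 0x4
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80
SECONDARY_OR_SUPPLEMENTARY = 0x100 | 0x800


def parse_sam(lines):
    """Parse SAM text lines, skipping header and blank lines."""
    records = []
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue
        f = line.rstrip("\n").split("\t")
        records.append(Aln(f[0], int(f[1]), f[2], int(f[3]), int(f[4])))
    return records


def read_key(aln):
    """Identify a read by name plus mate number (0 if unpaired)."""
    if aln.flag & FIRST_IN_PAIR:
        return (aln.name, 1)
    if aln.flag & SECOND_IN_PAIR:
        return (aln.name, 2)
    return (aln.name, 0)


def in_window(aln, chrom, start, end):
    """True if the alignment start lies inside chrom:start-end."""
    return aln.chrom == chrom and start <= aln.pos <= end


def trace_origins(original, bamout, chrom, start, end):
    """For each bamout read in the window, report its status in the original BAM."""
    primary = {read_key(a): a for a in original
               if not a.flag & SECONDARY_OR_SUPPLEMENTARY}
    report = {"same_window": [], "elsewhere": [], "unmapped": [], "absent": []}
    for b in bamout:
        if not in_window(b, chrom, start, end):
            continue
        o = primary.get(read_key(b))
        if o is None:
            report["absent"].append(b.name)
        elif o.flag & UNMAPPED:
            report["unmapped"].append(b.name)
        elif in_window(o, chrom, start, end):
            report["same_window"].append(b.name)
        else:
            report["elsewhere"].append((b.name, o.chrom, o.pos, o.mapq))
    return report


def start_count_ratio(original, bamout, chrom, start, end):
    """Mapped primary read starts in the window: original divided by bamout."""
    skip = UNMAPPED | SECONDARY_OR_SUPPLEMENTARY
    n_orig = sum(1 for a in original
                 if not a.flag & skip and in_window(a, chrom, start, end))
    n_out = sum(1 for b in bamout
                if not b.flag & skip and in_window(b, chrom, start, end))
    return n_orig / n_out if n_out else None


# Toy records, invented purely to exercise the functions.
original_sam = [
    "@HD\tVN:1.6",
    "r1\t99\tchr1\t1000\t60\t100M\t=\t1200\t300\t*\t*",
    "r2\t99\tchr1\t50000\t3\t100M\t=\t50200\t300\t*\t*",
    "r3\t69\tchr1\t1010\t0\t*\t=\t1010\t0\t*\t*",
]
bamout_sam = [
    "r1\t99\tchr1\t1000\t60\t100M\t=\t1200\t300\t*\t*",
    "r2\t99\tchr1\t1005\t60\t100M\t=\t1200\t300\t*\t*",
    "r3\t65\tchr1\t1010\t60\t100M\t=\t1200\t300\t*\t*",
    "r4\t65\tchr1\t1020\t60\t100M\t=\t1200\t300\t*\t*",
]
original = parse_sam(original_sam)
bamout = parse_sam(bamout_sam)
report = trace_origins(original, bamout, "chr1", 900, 1100)
ratio = start_count_ratio(original, bamout, "chr1", 900, 1100)
print(report)
print(ratio)
```

Reads that land in `unmapped` or `elsewhere` (especially with low MAPQ in the original) are the ones worth inspecting by hand in IGV, and a ratio well below 1 would be in line with the half-coverage pattern described in step 2. On real data a proper BAM reader would be preferable to parsing text, and true per-base depth would be a better measure than start counts.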
Tags: bam file, bamout, gatk haplotypecaller, vcf file