SEQanswers





04-11-2018, 03:37 AM   #1
mhmtgenc
Junior Member
 
Location: Turkey

Join Date: Nov 2016
Posts: 4
No mutations in BAM (IGV) but a mutation in final VCF?

I use GATK 4.0 for my variant calling pipeline. My steps are MarkDuplicates, base quality score recalibration (BaseRecalibrator, then ApplyBQSR) and HaplotypeCaller. When I inspect a particular locus in IGV, the original BAM file shows no mutation, yet the final VCF reports one, and the HaplotypeCaller bamout also appears to show it. Sanger sequencing of that locus confirmed that there is actually no mutation. So the original BAM file is the correct one, and the variant seen in the bamout is a false positive.
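For reference, the steps described above roughly correspond to the GATK4 invocations below. This is only a sketch that builds the command lines without running them; the file names, the known-sites VCF and the interval are hypothetical placeholders, and every argument should be checked against `gatk <Tool> --help` for the installed version. Restricting HaplotypeCaller to the suspicious interval with `-L` and writing `-bamout` makes it quick to load the input BAM and the reassembled reads side by side in IGV.

```python
"""Build (but do not run) GATK4 command lines for the pipeline in the post.

Assumption: sample names, reference, known-sites VCF and interval are
hypothetical placeholders; verify arguments with `gatk <Tool> --help`.
"""
from typing import List


def gatk_pipeline(sample: str, ref: str, known_sites: str, interval: str) -> List[List[str]]:
    """Return one argv list per step, in execution order."""
    dedup = f"{sample}.dedup.bam"
    recal_table = f"{sample}.recal.table"
    bqsr = f"{sample}.bqsr.bam"
    return [
        # Mark PCR/optical duplicates; -M writes the duplication metrics file.
        ["gatk", "MarkDuplicates", "-I", f"{sample}.bam", "-O", dedup,
         "-M", f"{sample}.dup_metrics.txt"],
        # Build the base quality recalibration model, masking known variant sites.
        ["gatk", "BaseRecalibrator", "-I", dedup, "-R", ref,
         "--known-sites", known_sites, "-O", recal_table],
        # Apply the recalibration model to produce a new BAM.
        ["gatk", "ApplyBQSR", "-I", dedup, "-R", ref,
         "--bqsr-recal-file", recal_table, "-O", bqsr],
        # Call only over the suspicious interval and write the reassembled
        # reads (-bamout) for comparison with the input BAM in IGV.
        ["gatk", "HaplotypeCaller", "-I", bqsr, "-R", ref, "-L", interval,
         "-O", f"{sample}.debug.vcf.gz", "-bamout", f"{sample}.bamout.bam"],
    ]


if __name__ == "__main__":
    for argv in gatk_pipeline("sample1", "ref.fasta", "known.vcf.gz", "chr1:1000-2000"):
        print(" ".join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` once the paths are real.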

So how can I overcome this problem? This is a serious issue, and it has happened several times. Thanks in advance.
04-11-2018, 05:42 AM   #2
Markiyan
Senior Member
 
Location: Cambridge

Join Date: Sep 2010
Posts: 115
Any repeats at the locus?

Since GATK HaplotypeCaller reassembles the reads in each active region, at repetitive loci it may pull in reads that actually come from different regions of the genome.
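One quick way to see whether a locus attracts reads from elsewhere is to look at the mapping qualities of the reads covering it: many aligners (BWA-MEM, for example) assign MAPQ 0 to reads that align equally well to more than one location. The sketch below assumes the input is SAM text for the locus, e.g. from `samtools view in.bam chr1:12345-12345`, so every mapped read already overlaps the position; the threshold of 10 is an arbitrary choice, not a GATK default.

```python
# Sketch: fraction of reads at a locus with low mapping quality.
# Assumption: input is SAM text already restricted to the locus of interest
# (e.g. `samtools view in.bam chr1:12345-12345`); threshold is arbitrary.
from typing import Iterable, Tuple


def low_mapq_fraction(sam_lines: Iterable[str], threshold: int = 10) -> Tuple[int, float]:
    """Return (number of primary mapped reads, fraction with MAPQ < threshold)."""
    mapqs = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        if flag & 0x4:  # unmapped read
            continue
        if flag & 0x900:  # secondary (0x100) or supplementary (0x800) alignment
            continue
        mapqs.append(mapq)
    if not mapqs:
        return 0, 0.0
    low = sum(1 for q in mapqs if q < threshold)
    return len(mapqs), low / len(mapqs)
```

A high fraction of MAPQ 0 reads under the called variant is a hint (not proof) that the locus is repetitive and the call may come from mismapped reads.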

There are several possible causes: repeats, structural variants (SVs), allelic variation, and mapping or assembly artifacts.

If you validate with Sanger sequencing, make sure your primers are NOT allele-specific; otherwise they will amplify only the reference allele.
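A simple pre-check along these lines is to confirm that no variant from the VCF falls inside a primer binding site, since a mismatch under a primer can stop one allele from amplifying. The sketch below assumes primer coordinates are known as 1-based inclusive intervals and treats each variant as a single position (SNVs only); the primer names and coordinates in the usage example are hypothetical.

```python
# Sketch: report variants lying inside primer binding sites.
# Assumptions: primer intervals are 1-based inclusive (chrom, start, end);
# variants are (chrom, pos) pairs, e.g. CHROM/POS from a VCF (SNVs only,
# indel lengths are ignored).
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]
Variant = Tuple[str, int]


def variants_under_primers(primers: Dict[str, Interval],
                           variants: List[Variant]) -> Dict[str, List[Variant]]:
    """Map each primer name to the variants overlapping it (only primers with hits)."""
    hits: Dict[str, List[Variant]] = {}
    for name, (chrom, start, end) in primers.items():
        inside = [v for v in variants if v[0] == chrom and start <= v[1] <= end]
        if inside:
            hits[name] = inside
    return hits


if __name__ == "__main__":
    # Hypothetical primer pair around a hypothetical target.
    primers = {"fwd": ("chr1", 100, 120), "rev": ("chr1", 300, 320)}
    print(variants_under_primers(primers, [("chr1", 110), ("chr1", 200)]))
```

Any primer that shows up in the result is a candidate for redesign before the Sanger run is trusted.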

If your library is PCR-free and has low GC bias, look for signs of copy number variation (CNV).
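A crude first look for a copy-number gain is to compare the mean depth across the locus with the sample's typical depth. The sketch assumes `samtools depth`-style lines (`chrom<TAB>pos<TAB>depth`) covering a region much wider than the locus, uses the median of all rows as the baseline, and assumes a diploid genome; as noted above, this only means something when coverage is not distorted by PCR or GC bias.

```python
# Sketch: local depth ratio as a rough CNV signal.
# Assumptions: input lines look like `samtools depth` output
# (chrom, 1-based pos, depth); the baseline is the median depth of all rows,
# which should span far more than the locus itself; ploidy defaults to 2.
import statistics
from typing import Iterable, List, Tuple

Row = Tuple[str, int, int]


def parse_depth(lines: Iterable[str]) -> List[Row]:
    rows = []
    for line in lines:
        if line.strip():
            chrom, pos, depth = line.rstrip("\n").split("\t")[:3]
            rows.append((chrom, int(pos), int(depth)))
    return rows


def depth_ratio(rows: List[Row], chrom: str, start: int, end: int) -> float:
    """Mean depth inside [start, end] divided by the median depth of all rows."""
    baseline = statistics.median([d for _, _, d in rows])
    local = [d for c, p, d in rows if c == chrom and start <= p <= end]
    if not local or baseline == 0:
        raise ValueError("no depth data for the region, or zero baseline depth")
    return statistics.mean(local) / baseline


def implied_copies(ratio: float, ploidy: int = 2) -> float:
    """Copy number implied by the depth ratio (about 3 for one extra copy in a diploid)."""
    return ratio * ploidy
```

A ratio near 1.5 would be consistent with one extra copy in a diploid sample, which is exactly the situation in which reads from two copies collapse onto one reference locus.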

PS: Ideally (if funding is not limited), I would also sequence the affected samples with a third-generation sequencing technology (ONT/PacBio) and assemble them de novo,

or at least get some 10x Genomics linked-read phasing data for the samples.

Given the existing dataset and no new funding or experiments:

1. Determine the regions affected by repeats and mask them for the time being.
2. Try to trace the origin of the reads in the affected region of the BAM file. If they are unmapped in the original BAM file, and the coverage there is half of what it is in the final BAM file, your other allele is probably too divergent from the reference to be mapped successfully by your aligner.
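The two steps above can be sketched with plain text processing. The assumptions: repeat regions come as a BED file (0-based, half-open), variants as VCF text (1-based POS), and per-position depth for the original and the final BAM has already been extracted into dictionaries (for example from `samtools depth`). GATK may also write artificial haplotype records into a bamout, which would need excluding before counting depth; the 0.15 tolerance is an arbitrary, hypothetical choice.

```python
# Sketch of (1) masking VCF records inside repeat regions and
# (2) flagging positions where original-BAM depth is about half the final depth.
# Assumptions: BED is 0-based half-open, VCF POS is 1-based; depth dicts map
# 1-based position -> depth for one chromosome; tolerance is arbitrary.
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Intervals = Dict[str, List[Tuple[int, int]]]


def load_bed(lines: Iterable[str]) -> Intervals:
    intervals: Intervals = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and UCSC track/browser lines
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        intervals[chrom].append((int(start), int(end)))
    return intervals


def in_repeat(intervals: Intervals, chrom: str, pos1: int) -> bool:
    # Convert the 1-based VCF position to 0-based before the half-open test.
    return any(s <= pos1 - 1 < e for s, e in intervals.get(chrom, []))


def mask_vcf(vcf_lines: Iterable[str], intervals: Intervals) -> Iterator[str]:
    """Yield header lines unchanged and drop records that fall in a repeat."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.split("\t")
        if not in_repeat(intervals, fields[0], int(fields[1])):
            yield line


def half_coverage_sites(orig_depth: Dict[int, int], final_depth: Dict[int, int],
                        tolerance: float = 0.15) -> List[int]:
    """Positions where original depth / final depth is within tolerance of 0.5."""
    flagged = []
    for pos, final in sorted(final_depth.items()):
        if final == 0:
            continue
        if abs(orig_depth.get(pos, 0) / final - 0.5) <= tolerance:
            flagged.append(pos)
    return flagged
```

Masking is deliberately blunt here: it hides calls rather than fixing them, which matches the "for the time being" suggestion.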

You could also build a custom reference: include the divergent version of the region as a separate sequence and redo the mapping.
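A minimal sketch of adding the divergent version of the region as an extra FASTA record is below. The contig name and sequence are hypothetical, and the sequence itself would have to come from somewhere trustworthy (e.g. a local assembly of the unmapped reads). After writing the new FASTA, the usual indexes have to be rebuilt (`bwa index`, `samtools faidx`, `gatk CreateSequenceDictionary`).

```python
# Sketch: append one extra contig (e.g. a divergent allele) to reference FASTA text.
# Assumptions: contig name and sequence are hypothetical; the caller supplies the
# whole reference as a string (fine for a toy, not for a full genome in memory).
def add_alt_contig(reference_fasta: str, name: str, seq: str, width: int = 60) -> str:
    """Return the reference FASTA text with one extra record appended."""
    if not name or any(c.isspace() for c in name):
        raise ValueError("contig name must be non-empty and contain no whitespace")
    existing = {line[1:].split()[0] for line in reference_fasta.splitlines()
                if line.startswith(">") and line[1:].split()}
    if name in existing:
        raise ValueError(f"contig {name!r} already present")
    seq = seq.upper()
    if set(seq) - set("ACGTN"):
        raise ValueError("sequence may contain only A, C, G, T, N")
    # Wrap the sequence at a fixed line width, as most FASTA tools expect.
    body = "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
    if reference_fasta and not reference_fasta.endswith("\n"):
        reference_fasta += "\n"
    return f"{reference_fasta}>{name}\n{body}\n"
```

One caveat worth keeping in mind: with a non-ALT-aware aligner, reads that match both the original region and the added contig equally well will be assigned low mapping quality, so some calls in that region may disappear rather than be corrected.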

Tags
bam file, bamout, gatk haplotypecaller, vcf file

All times are GMT -8.

