Go Back   SEQanswers > Bioinformatics > Bioinformatics

Similar Threads
Thread Thread Starter Forum Replies Last Post
bam file and vcf wildly disagree Mark Bioinformatics 0 05-25-2016 06:02 PM
Best tool to view sequence alignment? cacti Bioinformatics 1 07-31-2015 04:01 PM
View column header in VCF file Claudius Bioinformatics 0 07-09-2015 08:38 AM
How to view only the variants that are present in multiple VCF files? ronton Bioinformatics 2 10-15-2014 04:18 PM
MEGAN view alignment problems OllyBolly Bioinformatics 2 07-08-2013 12:29 AM

Thread Tools
Old 09-28-2020, 06:53 AM   #1
Location: Raleigh, NC

Join Date: Nov 2008
Posts: 51
Default alignment view and vcf disagree

I have a pacbio CCS fastq file that I have aligned to a small (5037 bp) reference which is part of a sequenced BAC clone. I have aligned this data with multiple aligners (minimap2 and smalt), called variants with multiple variant callers (bbmap, freebayes, and gatk) and viewed the bam file with multiple viewers (igv and tablet). All produce similar results. I will focus here on minimap2 and callvariants (bbmap), as the problem likely stems from the variant caller and seqanswers is the preferred avenue for support for bbmap.
Here is what I did:

minimap2 -t 5 -ax asm20 reference.fa  sample.ccs.fastq > out.sam
samtools view -O BAM -o out.bam out.sam && samtools sort -o out.sort.bam out.bam && samtools index out.sort.bam ploidy=1 in=out.sort.bam ref=reference.fa vcf=out.vcf
A partial reading of callvariants log output:
1 of 3311 variants passed primary filters (0.0302%).
Type            Count   Rate    AD      Depth   AF      Score   Qual
Substitutions:  0       0.0%    0.0     0.0     0.000   0.0     0.0
Deletions:      1       100.0%  619.0   1000.0  0.619   23.0    34.0
Insertions:     0       0.0%    0.0     0.0     0.000   0.0     0.0
Variation Rate: 1/5037
Homozygous:     1       100.0%
The single called variant (a deletion at the site of a homopolymer) is reported in the vcf.
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  out.sort
reference       1401    .       Tc      T       23.66   PASS    SN=0;STA=1401;STO=1402;TYP=DEL;R1P=338;R1M=281;R2P=0;R2M=0;AD=619;DP=1000;MCOV=-1;PPC=0;AF=0.6190;RAF=0.6190;LS=2996497;MQS=37140;MQM=60;BQS=21455;BQM=41;EDS=956749;EDM=4467;IDS=617766;IDM=999;NVC=0;FLG=0;CED=601;HMP=6;SB=0.9997    GT:DP:AD:AF:RAF:NVC:FLG:SB:SC:PF        1:1000:619:0.6190:0.6190:0:0:0.9997:23.66:PASS
To my interpretation of this, a single "c" has been lost from the poly-C which starts at this position. Moreover this deletions occurs in 619 of 1000 reads covering this position.
Now the strange part. When I view the alignment (out.sort.bam) in IGV (or Tablet), I see at most 1-2% indels at our near this site while the rest of the sequence is largely free of indels. Why is there this large discrepancy between what is observed in an alignment viewer and what is called by the variant caller (i.e. there are many more variant reads called than are observed on the viewer?

Thank you for your help

Last edited by Mark; 09-28-2020 at 12:37 PM. Reason: Added [code] tags
Mark is offline   Reply With Quote
Old 09-29-2020, 09:02 AM   #2
Location: Raleigh, NC

Join Date: Nov 2008
Posts: 51

Turns out the problem was, not surprisingly, in IGV. By default IGV will not show one base pair deletions for what it perceives as PacBio sequence. I had unchecked the "hide indels" box in the 'alignment' preferences but there is a equivalent checkbox in the ' Third Gen' preferences I had overlooked. Uncheck it and viola.
Mark is offline   Reply With Quote

bbmap, freebayes, igv, minimap2, smalt

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

All times are GMT -8. The time now is 11:51 PM.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO