I am unable to recover variants that I know exist when mapping to de novo assembled scaffolds, even though the BAMs (when checked in a genome browser (Artemis)) look great.
I am mapping using bwa-0.6.1 and SAMtools for variants (indels and SNPs) to a non-standard reference, some assembled scaffolds I made myself in Velvet + SSPACE.
Using standard methods, the VCFs end-up empty:
Bypassing BCF/vcfultils and going right to a VCF finds SNPs, but the reference is always "N" (even though I know these regions have actual sequence, not NNNNN connectors).
Example:
Looking at the BAM files with the scaffold reference in a browser and visualizing SNPs, I see good coverage and the SNPs are pretty obvious. Searching the no-BCF VCF files (example above) for positions in the reference where the sequence is good and I can see SNPs, the position is called as a SNP and the alternative allele is correct, but the reference is still N and Quality is 0.
Using the same pipeline with a standard reference recovers SNPs in the VCF just as expected.
I am mapping using bwa-0.6.1 and SAMtools for variants (indels and SNPs) to a non-standard reference, some assembled scaffolds I made myself in Velvet + SSPACE.
Using standard methods, the VCFs end-up empty:
Code:
samtools mpileup -Euf scaffolds.fasta myfile_sorted.bam | bcftools view -bvcg -> myfile.bcf bcftools view myfile.bcf | vcfutils.pl varFilter > myfile.vcf
Example:
Code:
##fileformat=VCFv4.1 ##samtoolsVersion=0.1.18 (r982:295) #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT scaf.1 74 . N A,X 0 . DP=111;I16=0,0,98,13,0,0,4267,165509,0,0,6133,352697,0,0,1284,21070;VDB=0.0808 PL 255,255,0,255,255,255 scaf.1 76 . N C,X 0 . DP=119;I16=0,0,105,14,0,0,4564,176946,0,0,6582,378738,0,0,1252,19002;VDB=0.0808 PL 255,255,0,255,255,255 scaf.1 77 . N A,X 0 . DP=129;I16=0,0,115,14,0,0,4881,186413,0,0,7182,414738,0,0,1242,18140;VDB=0.0801 PL 255,255,0,255,255,255 scaf.1 78 . N A,X 0 . DP=129;I16=0,0,115,13,0,0,4844,184614,0,0,7153,413897,0,0,1242,17464;VDB=0.0801 PL 255,255,0,255,255,255 scaf.1 79 . N A,X 0 . DP=138;I16=0,0,124,13,0,0,5152,195276,0,0,7662,443538,0,0,1246,16986;VDB=0.0801 PL 255,255,0,255,255,255 scaf.1 80 . N A,X 0 . DP=144;I16=0,0,131,13,0,0,5409,205033,0,0,8113,471497,0,0,1260,16742;VDB=0.0801 PL 255,255,0,255,255,255 scaf.1 81 . N A,X 0 . DP=143;I16=0,0,136,7,0,0,5411,206159,0,0,8239,484451,0,0,1236,16254;VDB=0.0801 PL 255,255,0,255,255,255 scaf.1 82 . N G,T,X 0 . DP=144;I16=0,0,136,8,0,0,5495,210839,0,0,8299,488051,0,0,1271,16481;VDB=0.0808 PL 255,255,0,255,255,255,255,255,255,255 scaf.1 83 . N A,X 0 . DP=146;I16=0,0,138,8,0,0,5587,214799,0,0,8419,495251,0,0,1307,16845;VDB=0.0808 PL 255,255,0,255,255,255 scaf.1 84 . N A,X 0 . DP=145;I16=0,0,137,8,0,0,5586,216040,0,0,8359,491651,0,0,1347,17347;VDB=0.0808 PL 255,255,0,255,255,255 scaf.1 85 . N G,X 0 . DP=151;I16=0,0,141,10,0,0,5817,225869,0,0,8688,510492,0,0,1390,18038;VDB=0.0808 PL 255,255,0,255,255,255 scaf.1 86 . N G,X 0 .
Using the same pipeline with a standard reference recovers SNPs in the VCF just as expected.
Comment