I've tried this a few times and I'm getting very occassional / odd behavior in the output from mpileup / bcftools / vcfutils varFilter. The output reports an indel, but upon examination of the reads (with IGV), a single read contains the indel, while all the others do not, yet the vcf output file for my data reports:
Chr1-gr2-PU5 662026 . ATGGAATGGTGGATTGG ATGGAATGGTGGATTGGAATGGTGGATTGG 99 . INDEL;DP=398;AF1=1;CI95=1,1;DP4=0,0,173,125;MQ=60 PL:GT:GQ 255,255,0:1/1:99
One problem is that this output appears to state that there are 173, and 125 reads supporting the indel, when in fact there is 1. Other than manual inspection, I can't tell this apart from other true indels.
I'm curious if this is due to a parameter in the command line? or if this is some kind of bug? (probably the former).
The versions and command line parameters are as follows:
This is a bacterial genome resequencing project with read depth > 300.
Paired-end, 100 nt reads were aligned with default BWA parameters to a reference genome (BWA version 0.5.9-r16. (The genome contains some IUPAC codes, but none in the region of the problem I'm reporting).
The resulting (sorted, indexed) bam file was used for input into samtools mpileup (0.1.12a)
>samtools mpileup -uf ref.fasta aln.sorted.bam |bcftools view -bvcg - > aln.raw.bcf
>bcftools view aln.raw.bcf | vcfutils.pl varFilter > aln.flt.vcf
Thanks for any help !
Jim
Chr1-gr2-PU5 662026 . ATGGAATGGTGGATTGG ATGGAATGGTGGATTGGAATGGTGGATTGG 99 . INDEL;DP=398;AF1=1;CI95=1,1;DP4=0,0,173,125;MQ=60 PL:GT:GQ 255,255,0:1/1:99
One problem is that this output appears to state that there are 173, and 125 reads supporting the indel, when in fact there is 1. Other than manual inspection, I can't tell this apart from other true indels.
I'm curious if this is due to a parameter in the command line? or if this is some kind of bug? (probably the former).
The versions and command line parameters are as follows:
This is a bacterial genome resequencing project with read depth > 300.
Paired-end, 100 nt reads were aligned with default BWA parameters to a reference genome (BWA version 0.5.9-r16. (The genome contains some IUPAC codes, but none in the region of the problem I'm reporting).
The resulting (sorted, indexed) bam file was used for input into samtools mpileup (0.1.12a)
>samtools mpileup -uf ref.fasta aln.sorted.bam |bcftools view -bvcg - > aln.raw.bcf
>bcftools view aln.raw.bcf | vcfutils.pl varFilter > aln.flt.vcf
Thanks for any help !
Jim
Comment