I am completely new to samtools mpileup and bcftools, and I hope someone here who knows more than I do can help!
I have two alignments and a reference genome, and here is how I run the samtools command:
samtools mpileup -d 200 -D -B -f ../hg19_bt2/hg19-bt2-index.fa -b bamlist.txt -u | bcftools view - -v -c -g > variants.vcf
With this I would like to output variant sites only and call genotypes as well.
The command is still running, but when I look at the result file I don't quite like what I see:
-> First of all, every base of each chromosome is listed. Haven't I asked for potential variant sites only (-v)?
Code:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT gcancer.srtd.reh.ddup.bam normal.srtd.reh.ddup.bam
chr1 10001 . T C,X 0 . DP=3;I16=2,0,0,1,75,2813,17,289,1,1,1,1,0,0,0,0 PL:DP 0,0,0,0,0,0:0 0,3,3,6,6,4:3
chr1 10002 . A X 0 . DP=4;I16=3,1,0,0,133,4657,0,0,2,2,0,0,3,3,0,0 PL:DP 0,0,0:0 0,12,13:4
chr1 10003 . A X 0 . DP=5;I16=4,0,0,0,151,5701,0,0,1,1,0,0,5,9,0,0 PL:DP 0,3,4:1 0,9,9:3
chr1 10004 . C X 0 . DP=5;I16=4,0,0,0,150,5626,0,0,1,1,0,0,9,23,0,0 PL:DP 0,3,4:1 0,9,9:3
chr1 10005 . C X 0 . DP=5;I16=4,1,0,0,180,6542,0,0,2,2,0,0,17,61,0,0 PL:DP 0,3,4:1 0,12,13:4
-> I don't understand these 'X's in the ALT column. If there is no variant, why not just omit the position?
-> Then, I'm completely lost as to why my QUAL is 0 at every position in this VCF file. What am I missing?
-> Then, in the genotype fields, I get these comma-separated PL values. From what I read, PL is "the phred-scaled genotype likelihoods rounded to the closest integer (and otherwise defined precisely as the GL field) (Integers)". But why are there several values rather than just one for each of the two samples?
-> And why does the DP in INFO not equal the DP in the genotype fields? For example, in the third row DP=5 in INFO, but it is 1 and 3 for the two samples.
Thank you in advance, guys!
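To make the last two points concrete, here is a small Python sketch (my own helper, not part of samtools or bcftools) that parses two of the rows above. It assumes diploid genotypes, for which the VCF specification gives n*(n+1)/2 genotype likelihoods for n alleles, and it assumes the 'X' in ALT is counted as an allele like any other. The name `summarize` and the whitespace splitting are just for illustration; a real VCF is tab-separated.

```python
# Two rows copied from the output above (whitespace-separated here).
ROWS = [
    "chr1 10001 . T C,X 0 . DP=3;I16=2,0,0,1,75,2813,17,289,1,1,1,1,0,0,0,0 "
    "PL:DP 0,0,0,0,0,0:0 0,3,3,6,6,4:3",
    "chr1 10003 . A X 0 . DP=5;I16=4,0,0,0,151,5701,0,0,1,1,0,0,5,9,0,0 "
    "PL:DP 0,3,4:1 0,9,9:3",
]


def summarize(row):
    """Count alleles, PL values and depths for one VCF data row."""
    fields = row.split()
    alt, info, fmt = fields[4], fields[7], fields[8]
    samples = fields[9:]

    # REF plus every comma-separated ALT entry (assumption: 'X' counts too).
    n_alleles = 1 + len(alt.split(","))
    # Number of unordered diploid genotypes for n alleles.
    expected_pl = n_alleles * (n_alleles + 1) // 2

    # INFO is a ';'-separated list; keep only key=value entries.
    info_map = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
    info_dp = int(info_map["DP"])

    # Pair the FORMAT keys (PL, DP) with each sample's values.
    keys = fmt.split(":")
    per_sample = [dict(zip(keys, s.split(":"))) for s in samples]

    return {
        "pos": int(fields[1]),
        "n_alleles": n_alleles,
        "expected_pl": expected_pl,
        "pl_counts": [len(s["PL"].split(",")) for s in per_sample],
        "info_dp": info_dp,
        "sample_dps": [int(s["DP"]) for s in per_sample],
    }


for row in ROWS:
    print(summarize(row))
```

For these two rows the PL counts match the diploid formula (6 values with three alleles at 10001, 3 values with two alleles at 10003), which suggests each PL entry corresponds to one possible genotype rather than one per sample. The per-sample DP values add up to the INFO DP at 10001 (0 + 3 = 3) but not at 10003 (1 + 3 = 4 versus 5), which is the mismatch I am asking about.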