Dear All,
I have three BAM files and want to call variants on all of them together. Here is the command I used:
java -Xmx4g -jar ~/GenomeAnalysisTK.jar -T UnifiedGenotyper -glm BOTH \
-R ~/hg19.GATK.fasta \
-I Test1.recal.bam -I Test2.recal.bam -I Test3.recal.bam \
-L /pipeline/resources/hg19.trueseq_exome_60mb_interval.bed \
--dbsnp dbSNP.vcf \
-stand_call_conf 50.0 -stand_emit_conf 10.0 -dcov 100 -l INFO -A AlleleBalance -A DepthOfCoverage -A FisherStrand \
-log testRun2/rawvariants.log -o testRun2/rawvariants.vcf &
===== It works! === Here is part of rawvariants.vcf ===
...
##contig=<ID=Y,length=59373566,assembly=hg19>
##reference=./annovar/hg19.GATK.fasta
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Homo_sapiens
1 14522 . G A 205.95 PASS AB=0.848;AC=1;AF=0.50;AN=2;BaseQRankSum=3.916;DP=125;Dels=0.00;FS=35.803;HRun=0;HaplotypeScore=1.6281;MQ=60.68;MQ0=0;MQRankSum=-1.207;QD=1.65;ReadPosRankSum=3.462 GT:AD:DP:GQ:PL 0/1:106,19:125:99:236,0,2845
1 14542 . A G 658.84 PASS AB=0.732;AC=1;AF=0.50;AN=2;BaseQRankSum=3.814;DP=127;Dels=0.00;FS=81.180;HRun=1;HaplotypeScore=4.7173;MQ=51.24;MQ0=0;MQRankSum=-2.094;QD=5.19;ReadPosRankSum=3.912 GT:AD:DP:GQ:PL 0/1:93,34:127:99:689,0,2561
....
The problem is that it doesn't report all three samples at every position: there is only one sample column (e.g. 0/1:93,34:127:99:689,0,2561 in the last column).
I would like to generate a VCF like this one, which has test1, test2 and test3 in the last three columns:
....
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT test1 test2 test3
20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,
20 17330 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3 0/0:41:3
20 1110696 rs6040355 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2 2/2:35:4
20 1230237 . T . 47 PASS NS=3;DP=13;AA=T GT:GQ:DP:HQ 0|0:54:7:56,60 0|0:48:4:51,51 0/0:61:2
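For reference, a quick way to see which sample columns a VCF actually contains is to print everything from column 10 onward on the #CHROM header line. This is only a minimal sketch, assuming awk is available; vcf_samples is a helper name made up for illustration and is not part of GATK.

```shell
# Hypothetical helper (not a GATK tool): list the sample names of a VCF.
# The fixed VCF columns are CHROM..FORMAT (1-9); samples start at column 10.
vcf_samples() {
  awk '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }' "$1"
}

# Example call on the output file from the command above:
# vcf_samples testRun2/rawvariants.vcf
```

Going by the header lines pasted here, the current output would list only Homo_sapiens, while the file I am after would list test1, test2 and test3.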
Thanks,
Ng