Hi,
We performed SNP calling for 8 individuals with GATK, using both a multi-sample approach and individual (per-sample) calling. Below are a few lines from each output.
Multisample SNP calling output:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT BC210 BC212 BC214 BC217 BC219 BC226 BC229 BC230
chr1 356 . G T 179.43 . AC=7;AF=0.583;AN=12;BaseQRankSum=-0.787;DP=1455;Dels=0.00;FS=0.000;HaplotypeScore=1.4705;MLEAC=8;MLEAF=0.667;MQ=2.17;MQ0=1408;MQRankSum=-1.740;QD=0.23;ReadPosRankSum=2.837 GT:AD:DP:GQ:PL 0/1:62,121:184:17:19,0,17 ./. 0/0:81,94:176:3:0,3,24 1/1:45,138:181:6:54,6,0 1/1:74,108:182:9:84,9,0 ./. 1/1:46,141:183:6:61,6,0 0/0:55,109:165:3:0,3,23
chr1 389 . T G 178.50 . AC=6;AF=0.375;AN=16;BaseQRankSum=-0.775;DP=202;Dels=0.00;FS=26.344;HaplotypeScore=0.7725;MLEAC=6;MLEAF=0.375;MQ=12.37;MQ0=75;MQRankSum=-4.220;QD=1.08;ReadPosRankSum=0.073 GT:AD:DP:GQ:PL 0/1:21,21:42:35:35,0,116 0/0:11,7:18:9:0,9,68 0/1:13,5:18:18:18,0,58 0/1:8,14:21:39:39,0,59 0/1:16,16:32:59:66,0,59 0/1:14,7:20:43:43,0,85 0/1:10,20:29:19:21,0,19 0/0:9,9:18:6:0,6,44
Output from individual SNP calling, merged into a single VCF file:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT BC210 BC212 BC214 BC217 BC219 BC226 BC229 BC230
chr1 356 . G T 45.01 . AC=4;AF=1.00;AN=4;DP=388;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ0=190;MQ=2.89;QD=0.17;SF=1,5 GT:GQ:DP:PL:AD . . . . 1/1:9:182:84,9,0:84,100 . 1/1:6:188:61,6,0:49,144 .
chr1 389 . T G 37.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=-0.189;DP=33;Dels=0.00;FS=3.979;HaplotypeScore=2.9800;MLEAC=1;MLEAF=0.500;MQ0=12;MQ=12.44;MQRankSum=-1.474;QD=1.14;ReadPosRankSum=-0.794;SF=5 GT:GQ:DP:PL:AD . . . . 0/1:59:32:66,0,59:16,16 . . .
As we can see, the multi-sample output has genotypes for most of the samples, whereas the merged output from single-sample calling shows "." for most individuals at the same sites.
How should we interpret the two outputs? Is single-sample SNP calling more specific than multi-sample SNP calling, or is it the other way around?
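To put a number on the difference in missing genotypes, here is a minimal Python sketch that counts how many samples have a called genotype in one VCF record. The function names are my own (hypothetical), and it assumes the record is whitespace-separated with no spaces inside the INFO field, as in the excerpts above. The example record is the chr1:356 line from the merged single-sample output with the INFO field abbreviated.

```python
def genotype_calls(vcf_line):
    """Return the GT string of every sample column in one VCF record."""
    fields = vcf_line.split()
    # Column 9 (index 8) is FORMAT; sample columns start at index 9.
    gt_index = fields[8].split(":").index("GT")
    calls = []
    for sample in fields[9:]:
        parts = sample.split(":")
        # A bare "." sample column has no sub-fields beyond the first.
        calls.append(parts[gt_index] if gt_index < len(parts) else ".")
    return calls


def is_missing(gt):
    """True for the usual 'no call' genotype spellings."""
    return gt in (".", "./.", ".|.")


def count_called(vcf_line):
    """Number of samples with a non-missing genotype in the record."""
    return sum(1 for gt in genotype_calls(vcf_line) if not is_missing(gt))


# chr1:356 from the merged single-sample output (INFO abbreviated).
record = ("chr1 356 . G T 45.01 . DP=388 GT:GQ:DP:PL:AD "
          ". . . . 1/1:9:182:84,9,0:84,100 . 1/1:6:188:61,6,0:49,144 .")
print(count_called(record))  # 2
```

On the excerpts above, this gives 6 of 8 samples called at chr1:356 in the multi-sample output versus 2 of 8 in the merged single-sample output.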
I would like to convert these genotypes to PED format and perform linkage analysis. Could someone suggest which of these outputs should be used for that?
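For context, the conversion I have in mind looks roughly like the Python sketch below: each VCF GT value becomes a pair of allele letters, and a missing call becomes "0 0", the usual missing-genotype code in PED files. The function name is hypothetical, and the sketch assumes diploid genotypes; it is not a complete VCF-to-PED converter.

```python
def gt_to_ped_alleles(gt, ref, alt):
    """Translate a VCF GT string into a pair of PED allele codes.

    Missing or non-diploid genotypes are returned as ("0", "0").
    """
    # Allele index 0 is REF; indices 1.. are the comma-separated ALT alleles.
    alleles = [ref] + alt.split(",")
    indices = gt.replace("|", "/").split("/")
    if len(indices) != 2 or "." in indices:
        return ("0", "0")
    return tuple(alleles[int(i)] for i in indices)


# chr1:389 (REF=T, ALT=G): a heterozygous call and a missing call.
print(gt_to_ped_alleles("0/1", "T", "G"))  # ('T', 'G')
print(gt_to_ped_alleles(".", "T", "G"))    # ('0', '0')
```

With this mapping, every "." in the merged single-sample file would end up as a missing genotype in the PED file, which is why the choice between the two outputs matters for the linkage analysis.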
Thanks in advance!!