Dear Seqanswers Community,
First, I have to apologize because I have posted a similar question on Biostars. However, I haven't received a satisfying answer yet.
My task is to merge many (about 200) VCF files produced by Ion Torrent. Our lab uses Torrent Suite version 5.0.2, which produces both a vcf and a genome.vcf file per sample.
Question 1: should I merge the vcf files or the genome.vcf (gVCF?) files? It seems to me that merging the vcf files will not be correct, because a vcf file reports only the polymorphic sites of one individual. Thus, if the reference is A, ind1 is A/A and ind2 is A/C, I will not get any information for ind1 at that position. I will then not know whether this is due to low coverage/unavailable information or to homozygosity for the reference. Therefore, in my opinion, the gVCFs are the only appropriate files. Is this correct?
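To make the concern in question 1 concrete, here is a minimal Python sketch. The function `merge_variant_only`, the sample names and the position are all hypothetical, and this toy merge is only an illustration, not what bcftools does internally. It shows that a site missing from a variant-only VCF can only be filled in as missing (./.), not as 0/0:

```python
# Toy illustration: a variant-only VCF lists a site only if the sample carries a variant.
def merge_variant_only(calls_by_sample):
    """Merge per-sample {(chrom, pos): genotype} dicts into one table.

    A site absent from a sample's calls is filled with "./." (missing),
    because a variant-only VCF cannot tell 0/0 apart from no coverage.
    """
    samples = sorted(calls_by_sample)
    sites = sorted({site for calls in calls_by_sample.values() for site in calls})
    return {
        site: {s: calls_by_sample[s].get(site, "./.") for s in samples}
        for site in sites
    }


calls = {
    "ind1": {},                       # A/A at the site, so nothing is reported
    "ind2": {("chr1", 1000): "0/1"},  # A/C at the site
}
merged = merge_variant_only(calls)
print(merged)
```

With the gVCF, ind1 would instead have a reference block covering the site, so a 0/0 call with its depth would be available.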
Question 2: I tried to merge the gVCF files produced by Torrent Suite v5.0.2 using bcftools. However, I got an error message from bcftools reporting that the REFERENCE nucleotide differs between two of the files I'm trying to merge, which is very weird. I then checked the files produced by Torrent Suite and found the following two records:
Code:
chr1 32052145 . G A 183.764 PASS AF=0.571429;AO=36;DP=63;FAO=36;FDP=63;FR=.;FRO=27;FSAF=16;FSAR=20;FSRF=19;FSRR=8;FWDB=-0.0607612;FXX=0;HRUN=1;LEN=1;MLLD=77.9064;OALT=A;OID=.;OMAPALT=A;OPOS=32052145;OREF=G;PB=0.5;PBP=1;QD=11.6676;RBI=0.0674491;REFB=-0.0729453;REVB=0.0292824;RO=27;SAF=16;SAR=20;SRF=19;SRR=8;SSEN=0;SSEP=0;SSSB=-0.19967;STB=0.609697;STBP=0.053;TYPE=snp;VARB=0.0471623 GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR 0/1:103:63:63:27:27:36:36:0.571429:20:16:19:8:20:16:19:8
chr1 32052145 . T . 0 PASS DP=185;END=32052348;MAX_DP=206;MIN_DP=173 GT:DP:MIN_DP:MAX_DP 0/0:185:173:206
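A minimal Python sketch for finding such sites before merging is given here. The helper `find_ref_conflicts` is hypothetical, and the two test records are shortened versions of the ones above. It compares only the first REF base per CHROM/POS, on the assumption that indel records may legitimately carry a longer REF at the same position:

```python
from collections import defaultdict


def find_ref_conflicts(vcf_lines):
    """Return {(chrom, pos): [ref bases]} for sites whose records disagree on REF."""
    refs = defaultdict(set)
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.split()
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        # Assumption: only the first REF base must agree across records.
        refs[(chrom, pos)].add(ref[0].upper())
    return {site: sorted(bases) for site, bases in refs.items() if len(bases) > 1}


records = [
    "chr1 32052145 . G A 183.764 PASS TYPE=snp GT 0/1",
    "chr1 32052145 . T . 0 PASS END=32052348 GT 0/0",
]
conflicts = find_ref_conflicts(records)
print(conflicts)
```

For a real file, the lines could come from an opened genome.vcf instead of the `records` list.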
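Obviously, Torrent Suite is buggy: the same position (chr1:32052145) is reported with reference G in the variant record and with reference T in the reference block.
The question is: what should I use to merge either vcf or gVCF files when I'm working with Ion Torrent data?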
P.S. People suggest that I should not use GATK for SNP calling with Ion Torrent data because of the technology, and advise using only Torrent Suite for SNP calling (but how do I merge the files then?).
thanks a lot
pavlos