Hello,
After I run the samtools variant calling pipeline, I'm not getting the conventional .vcf output that I was expecting. Any advice or insight would be greatly appreciated.
Here is my command:
$ samtools mpileup -gf /scratch/$USER/Trinity.adj.fasta NFS_6525_adj_bowtie2.coordSorted.bam NFS_50254_adj_bowtie2.coordSorted.bam SFS_CC1_adj_bowtie2.coordSorted.bam SFS_25428_adj_bowtie2.coordSorted.bam | bcftools view -bvcg - > NFS_6525_50254_SFS_CC1_25428_adj_var.bcf
$ bcftools view NFS_6525_50254_SFS_CC1_25428_adj_var.bcf | vcfutils.pl varFilter -D 100 - > NFS_6525_50254_SFS_CC1_25428_adj_var.flt.vcf
I used VIM to view my NFS_6525_50254_SFS_CC1_25428_adj_var.flt.vcf file and it does not look like any examples that I've seen online. It's a combination of alpha-numeric characters and a list of my Trinity contigs. I've copied a sample of my .vcf file below:
(there is a full list of my Trinity contigs)
##contig=<ID=TRINITY_DN311005_c0_g1_i1,length=296,IDX=667121>
##contig=<ID=TRINITY_DN311035_c0_g1_i1,length=368,IDX=667122>
##contig=<ID=TRINITY_DN311050_c0_g1_i1,length=302,IDX=667123>
##contig=<ID=TRINITY_DN311084_c0_g1_i1,length=319,IDX=667124>
##contig=<ID=TRINITY_DN311076_c0_g1_i1,length=250,IDX=667125>
##contig=<ID=TRINITY_DN311012_c0_g1_i1,length=316,IDX=667126>
##contig=<ID=TRINITY_DN311030_c0_g1_i1,length=269,IDX=667127>
##contig=<ID=TRINITY_DN311065_c0_g1_i1,length=201,IDX=667128>
##contig=<ID=TRINITY_DN311069_c0_g1_i1,length=285,IDX=667129>
##contig=<ID=TRINITY_DN311034_c0_g1_i1,length=283,IDX=667130>
(some of the alphanumeric characters were changed to symbols when I copied and pasted here)
T7<*> õ @ ¸B P–E °B rE HB @œD % @ 1 , { .
H G7<*> õ @ ÚB èÓE °B rE HB @œD % @ 1 , "{ .
I G7<*> õ @ äB áE °B rE HB @œD % @ 1 , &{ .
J G7<*> õ @ äB áE °B rE HB @œD % @ 1 , &{ .
K G7<*> õ @ äB áE °B rE HB @œD % @ 1 , &{ .
L G7<*> õ @ äB áE °B rE HB @œD % @ 1 , &{ .
M G7<*> õ @ äB áE °B rE HB @œD % @ 1 , &{ .
N A7<*> õ @ àB ÝE °B rE HB @œD % @ 1 , ${ .
O G7<*> õ @ ÞB ÈÚE °B rE HB @œD % @ 1 , #{ .
P T7<*> õ @ ÀB šE °B rE HB @œD % @ 1 , #{ .
Q C7<*> õ @ ÜB ÏE °B rE HB @œD % @ 1 , &{ .
R T7<*> õ @ àB @ØE °B rE HB @œD % @ 1 , &{ .
S A7<*> õ @ äB áE °B rE HB
However, my output looks nothing like this, which is an example I found online:
##fileformat=VCFv4.0
##fileDate=20090805
##source=myImputationProgramV3.1
##reference=1000GenomesPilot-NCBI36
##phasing=partial
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
##FILTER=<ID=q10,Description="Quality below 10">
##FILTER=<ID=s50,Description="Less than 50% of samples have data">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.
20 17330 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQP:HQ 0|0:49:3:58,50 0|1:3:5:65,3 0/0:41:3
20 1110696 rs6040355 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQP:HQ 1|2:21:6:23,27 2|1:2:0:18,2 2/2:35:4
20 1230237 . T . 47 PASS NS=3;DP=13;AA=T GT:GQP:HQ 0|0:54:7:56,60 0|0:48:4:51,51 0/0:61:2
20 1234567 microsat1 GTCT G,GTACT 50 PASS NS=3;DP=9;AA=G GT:GQP 0/1:35:4 0/2:17:2 1/1:40:3
Thank you.
After I run the samtools variant calling pipeline, I'm not getting the conventional .vcf output that I was expecting. Any advice or insight would be greatly appreciated.
Here is my command:
$ samtools mpileup -gf /scratch/$USER/Trinity.adj.fasta NFS_6525_adj_bowtie2.coordSorted.bam NFS_50254_adj_bowtie2.coordSorted.bam SFS_CC1_adj_bowtie2.coordSorted.bam SFS_25428_adj_bowtie2.coordSorted.bam | bcftools view -bvcg - > NFS_6525_50254_SFS_CC1_25428_adj_var.bcf
$ bcftools view NFS_6525_50254_SFS_CC1_25428_adj_var.bcf | vcfutils.pl varFilter -D 100 - > NFS_6525_50254_SFS_CC1_25428_adj_var.flt.vcf
I used VIM to view my NFS_6525_50254_SFS_CC1_25428_adj_var.flt.vcf file and it does not look like any examples that I've seen online. It's a combination of alpha-numeric characters and a list of my Trinity contigs. I've copied a sample of my .vcf file below:
(there is a full list of my Trinity contigs)
##contig=<ID=TRINITY_DN311005_c0_g1_i1,length=296,IDX=667121>
##contig=<ID=TRINITY_DN311035_c0_g1_i1,length=368,IDX=667122>
##contig=<ID=TRINITY_DN311050_c0_g1_i1,length=302,IDX=667123>
##contig=<ID=TRINITY_DN311084_c0_g1_i1,length=319,IDX=667124>
##contig=<ID=TRINITY_DN311076_c0_g1_i1,length=250,IDX=667125>
##contig=<ID=TRINITY_DN311012_c0_g1_i1,length=316,IDX=667126>
##contig=<ID=TRINITY_DN311030_c0_g1_i1,length=269,IDX=667127>
##contig=<ID=TRINITY_DN311065_c0_g1_i1,length=201,IDX=667128>
##contig=<ID=TRINITY_DN311069_c0_g1_i1,length=285,IDX=667129>
##contig=<ID=TRINITY_DN311034_c0_g1_i1,length=283,IDX=667130>
(some of the alphanumeric characters were changed to symbols when I copied and pasted here)
T7<*> õ @ ¸B P–E °B rE HB @œD % @ 1 , { .
H G7<*> õ @ ÚB èÓE °B rE HB @œD % @ 1 , "{ .
I G7<*> õ @ äB áE °B rE HB @œD % @ 1 , &{ .
J G7<*> õ @ äB áE °B rE HB @œD % @ 1 , &{ .
K G7<*> õ @ äB áE °B rE HB @œD % @ 1 , &{ .
L G7<*> õ @ äB áE °B rE HB @œD % @ 1 , &{ .
M G7<*> õ @ äB áE °B rE HB @œD % @ 1 , &{ .
N A7<*> õ @ àB ÝE °B rE HB @œD % @ 1 , ${ .
O G7<*> õ @ ÞB ÈÚE °B rE HB @œD % @ 1 , #{ .
P T7<*> õ @ ÀB šE °B rE HB @œD % @ 1 , #{ .
Q C7<*> õ @ ÜB ÏE °B rE HB @œD % @ 1 , &{ .
R T7<*> õ @ àB @ØE °B rE HB @œD % @ 1 , &{ .
S A7<*> õ @ äB áE °B rE HB
However, my output looks nothing like this, which is an example I found online:
##fileformat=VCFv4.0
##fileDate=20090805
##source=myImputationProgramV3.1
##reference=1000GenomesPilot-NCBI36
##phasing=partial
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
##FILTER=<ID=q10,Description="Quality below 10">
##FILTER=<ID=s50,Description="Less than 50% of samples have data">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.
20 17330 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQP:HQ 0|0:49:3:58,50 0|1:3:5:65,3 0/0:41:3
20 1110696 rs6040355 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQP:HQ 1|2:21:6:23,27 2|1:2:0:18,2 2/2:35:4
20 1230237 . T . 47 PASS NS=3;DP=13;AA=T GT:GQP:HQ 0|0:54:7:56,60 0|0:48:4:51,51 0/0:61:2
20 1234567 microsat1 GTCT G,GTACT 50 PASS NS=3;DP=9;AA=G GT:GQP 0/1:35:4 0/2:17:2 1/1:40:3
Thank you.
Comment