Hi all,
I'm very new to this forum, and haven't looked around much to see if this has been answered already.
But is there an easy way to extract a sample's genotype (the consensus across all reads) at one specific position (not a region), given its BAM file? I suspect this is simple to do with samtools, but my searching hasn't turned up anything useful.
For example: what is the base at chr1, position 45800, for that sample according to its BAM file?
And ideally only when certain coverage depth and quality thresholds are met (something like Q30 and DP3 in VCF terms).
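One possible approach, sketched here under assumptions that are not in the original post: print the pileup at that single position with `samtools mpileup -r chr1:45800-45800 -f ref.fa sample.bam` and summarise the read bases in Python. `parse_pileup_bases` and `consensus_call` are hypothetical helpers written for this example. They take a simple majority vote over bases with Phred quality >= 30 and return None below a depth of 3, which only loosely mirrors Q30/DP3 in a VCF.

```python
from collections import Counter
from typing import Optional


def parse_pileup_bases(ref_base: str, bases: str, quals: str, min_bq: int = 30) -> Counter:
    """Count A/C/G/T in one mpileup column, keeping bases with Phred quality >= min_bq."""
    counts: Counter = Counter()
    i = 0  # index into the read-bases column
    q = 0  # index into the base-quality column (Phred+33)
    while i < len(bases):
        c = bases[i]
        if c == "^":  # start of a read; the next char is its mapping quality
            i += 2
        elif c == "$":  # end of a read; has no quality character
            i += 1
        elif c in "+-":  # indel after this position: +N<seq> or -N<seq>
            j = i + 1
            while bases[j].isdigit():
                j += 1
            i = j + int(bases[i + 1:j])  # skip the length digits and the indel sequence
        else:
            # a base, deletion ('*'/'#') or reference skip ('>'/'<'): one quality char each
            bq = ord(quals[q]) - 33
            q += 1
            i += 1
            if bq < min_bq:
                continue
            if c in ".,":
                counts[ref_base.upper()] += 1  # matches the reference (fwd/rev strand)
            elif c.upper() in "ACGT":
                counts[c.upper()] += 1  # mismatching base (upper = fwd, lower = rev)
    return counts


def consensus_call(pileup_line: str, min_depth: int = 3, min_bq: int = 30) -> Optional[str]:
    """Majority base for one mpileup line, or None if too few bases pass min_bq."""
    fields = pileup_line.rstrip("\n").split("\t")
    ref_base, bases, quals = fields[2], fields[4], fields[5]
    counts = parse_pileup_bases(ref_base, bases, quals, min_bq)
    if sum(counts.values()) < min_depth:
        return None  # insufficient coverage: genuinely unknown, not "same as reference"
    return counts.most_common(1)[0][0]
```

`samtools mpileup` already drops bases below its own `-Q` minimum base quality (13 by default), so passing `-Q 30` on the command line would do the quality filtering up front. A majority vote is a haploid-style consensus; a heterozygous site in a diploid sample would need a real genotype caller.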
To give you some context: we have called SNPs for a large set of samples using bowtie/samtools/vcftools and reported the calls in VCF format. By design, each sample's VCF captures only its variants (relative to the reference). But when we compare the samples across the population to define haplotype groups, looking at the overlapping SNP loci in the VCF files, we cannot tell whether a missing value for a sample means its genotype matches the reference or that there was no coverage at that position at all.
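The ambiguity described above can be resolved per locus once a pileup-based consensus is available for each sample. The sketch below assumes three hypothetical inputs: the reference base at each locus, the ALT allele from the sample's own VCF (if any), and a consensus base from the BAM that is None when depth/quality cut-offs are not met. It treats each sample as a single consensus base, so it ignores heterozygous genotypes from the VCF `GT` field.

```python
from typing import Dict, List, Optional, Tuple

Locus = Tuple[str, int]  # (chromosome, 1-based position)


def locus_status(ref_base: str, vcf_alt: Optional[str], pileup_call: Optional[str]) -> Tuple[str, str]:
    """Decide what a sample's base is at one SNP locus, and why."""
    if vcf_alt is not None:
        return vcf_alt, "VCF"  # the sample's own VCF already has a variant here
    if pileup_call is None:
        return "?", "NO_COVERAGE"  # no usable reads: truly missing
    if pileup_call == ref_base.upper():
        return pileup_call, "REF"  # covered, and the reads agree with the reference
    # reads differ from the reference but there is no VCF record (e.g. filtered out)
    return pileup_call, "NON_REF_UNCALLED"


def fill_sample_row(
    loci: List[Locus],
    ref_bases: Dict[Locus, str],
    vcf_alts: Dict[Locus, str],
    pileup_calls: Dict[Locus, Optional[str]],
) -> List[Tuple[str, str]]:
    """(base, status) for every locus; loci absent from pileup_calls count as no coverage."""
    return [
        locus_status(ref_bases[locus], vcf_alts.get(locus), pileup_calls.get(locus))
        for locus in loci
    ]
```

Keeping the status next to the base makes it easy to drop or down-weight NO_COVERAGE cells when building haplotype groups, instead of silently treating them as reference.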
So now I want to go back and fetch these missing bases for each sample at all of the called SNP loci, to get more accurate haplotype groupings.
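To fetch bases at all called SNP loci, one option is to take the union of loci across every sample's VCF and write it as a positions list for `samtools mpileup -l`, which accepts a two-column, 1-based `chrom<TAB>pos` file. This is a minimal sketch assuming uncompressed VCF text; `snp_loci`, `union_positions` and `positions_file_text` are hypothetical names, and the SNP-only filter (single-base REF and ALT alleles) is an assumption.

```python
from typing import Iterable, List, Set, Tuple

Locus = Tuple[str, int]  # (chromosome, 1-based position)


def snp_loci(vcf_lines: Iterable[str]) -> Set[Locus]:
    """(CHROM, POS) of every single-base substitution in an uncompressed VCF."""
    loci: Set[Locus] = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # meta-information, header, or blank line
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        # keep SNPs only: one-base REF and every ALT allele a single A/C/G/T
        if len(ref) == 1 and all(a in {"A", "C", "G", "T"} for a in alts.upper().split(",")):
            loci.add((chrom, pos))
    return loci


def union_positions(per_sample_vcfs: Iterable[Iterable[str]]) -> List[Locus]:
    """Union of SNP loci across all samples, sorted by (chrom, pos)."""
    all_loci: Set[Locus] = set()
    for lines in per_sample_vcfs:
        all_loci |= snp_loci(lines)
    return sorted(all_loci)


def positions_file_text(loci: List[Locus]) -> str:
    """Two-column, 1-based 'chrom<TAB>pos' list, one locus per line."""
    return "".join(f"{chrom}\t{pos}\n" for chrom, pos in loci)
```

The resulting file could then be used per sample, e.g. `samtools mpileup -l positions.txt -f ref.fa -Q 30 sample.bam`; a locus with no output line for a sample can be treated as having no coverage.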
All I have is the sorted, indexed BAM file and the VCF file for each sample, plus the reference, of course. Hope someone can help.
Thanks so much in advance