I have used samtools/bcftools to generate diploid consensus sequences. The final output is a FASTQ file that contains only one sequence, along with its quality information.
My question is: how can a single sequence represent a diploid genome? How does it encode a site that is heterozygous? I was planning on manipulating the data (concatenating, subsampling, etc.), but I can't do that until I understand how it's stored.
Cheers!