  • Haploid inspiration

    Hi all

    I'm looking for some inspiration/guidance on processing NGS data from haploid individuals. In a nutshell, we've sequenced at ~ 6X coverage a number of individuals (~30) from a single population. Following mapping, processing, and SNP calling, each individual has ~ 600K SNPs.

    At this depth of coverage, ~70% of the genome is callable for each individual, and it stands to reason that the callable regions will not be identical across individuals. So, in order to generate a single VCF for the population without sacrificing an enormous number of SNPs, it would make sense to impute missing genotypes. To this end I have experimented with Beagle (v4) on a single chromosome, and it appeared to do the job, although on closer examination the results indicated ~25% heterozygosity in each haploid individual. In addition, all samples were on average 85% identical by state, including 2 individuals which had been sequenced twice and should provide a reliable control for the methodology.

    Is anyone with experience in processing NGS data for haploids able to offer any insights/suggestions?

    D
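
    For reference, pairwise identity by state between two haploid individuals can be taken as the fraction of jointly called sites at which they carry the same allele. The following is only a minimal sketch under that assumption: the function name and the '.' missing-data code are illustrative, and it is not necessarily how the 85% figure above was computed.

```python
def haploid_ibs(calls_a, calls_b, missing="."):
    """Fraction of sites called in both haploid samples where the alleles match.

    calls_a and calls_b are equal-length lists of allele codes ("0", "1", ...),
    with `missing` marking an uncalled site. Returns None if no site is shared.
    """
    if len(calls_a) != len(calls_b):
        raise ValueError("both samples must cover the same list of sites")
    shared = [(a, b) for a, b in zip(calls_a, calls_b)
              if a != missing and b != missing]
    if not shared:
        return None
    matches = sum(a == b for a, b in shared)
    return matches / len(shared)


# Toy example: two sequencing runs of the same individual.
rep1 = ["0", "1", "1", ".", "0"]
rep2 = ["0", "1", "0", "1", "0"]
print(haploid_ibs(rep1, rep2))  # 0.75 (3 matches out of 4 shared sites)
```

    Run on the raw (pre-imputation) calls of the two replicated individuals, a value well below 1 would point to problems upstream of imputation; a value near 1 would point to the imputation step itself.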

  • #2
    What exactly is your question?


    Perhaps "Why do my genotypes show some heterozygosity, and why are replicates not identical?"

    The answer would simply be artifacts in your data, whether from lab protocols, contamination, or sequencing. 25% seems high, but if you are 110% sure that these genomes are haploid, then what is stopping you from throwing out the minor allele (i.e. the error) in the heterozygous individuals?

    Another possibility is that you have duplicated regions aligning to the same part of your reference. Reference genomes are notorious for missing copy number variation or closely related paralogues, and the only way (as far as I'm aware) to detect them is through various programs that apply depth/SNP-based algorithms to determine whether it's likely that your genotypes contain errors or CNV. So in a nutshell, you may have particular genes that have been duplicated along the same chromosome. Something to think about.
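
    The suggestion above of discarding the minor allele at apparently heterozygous sites could be sketched as follows. This assumes per-site read counts for the REF and ALT alleles are available (for example from an AD-style field); the function name and the min_depth threshold are hypothetical choices for illustration, not a recommended setting.

```python
def haploid_call_from_depths(ref_reads, alt_reads, min_depth=2):
    """Collapse an apparently heterozygous site to a single haploid allele.

    Keeps the allele supported by more reads and treats the other as error.
    Returns "0" (REF), "1" (ALT), or "." when the site is too shallow or the
    two alleles are equally supported. min_depth is an illustrative threshold.
    """
    total = ref_reads + alt_reads
    if total < min_depth or ref_reads == alt_reads:
        return "."
    return "0" if ref_reads > alt_reads else "1"


print(haploid_call_from_depths(5, 1))  # 0
print(haploid_call_from_depths(1, 6))  # 1
print(haploid_call_from_depths(3, 3))  # .
```

    Sites where the two alleles have similar support are the ones most likely to reflect collapsed paralogues rather than sequencing error, so leaving them missing is the conservative option.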



    • #3
      Hi

      Sorry for not being clear. All of the SNPs fed into Beagle were filtered so that they include only homozygous calls, so the resulting heterozygosity has been introduced by Beagle during imputation. There's probably a perfectly reasonable statistical explanation for this. For instance, the genotype probabilities at the following SNP are 0.112,0.444,0.444, and Beagle assigns the genotype 0|1 to some individuals but 1|1 to others even though the probabilities are the same.

      Code:
      GroupUn1430	532	.	G	A	.	PASS	AR2=0;DR2=0.03;AF=0.677	GT:DS:GP	1|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	1|0:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	1|0:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	1|0:1.333:0.112,0.444,0.444	1|0:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	1|0:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	1|1:2:0,0,1
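
      As a quick sanity check on the record above, the DS (dosage) value is consistent with the GP triple: the expected number of ALT alleles is P(0/1) + 2 x P(1/1). A minimal sketch, assuming GP is ordered P(0/0), P(0/1), P(1/1) as in standard VCF; the function names are illustrative only.

```python
def dosage_from_gp(gp):
    """Expected ALT allele count from diploid genotype probabilities.

    gp is (P(0/0), P(0/1), P(1/1)).
    """
    _p_hom_ref, p_het, p_hom_alt = gp
    return p_het + 2.0 * p_hom_alt


def most_probable_genotypes(gp):
    """Indices (0 = 0/0, 1 = 0/1, 2 = 1/1) that share the highest probability."""
    best = max(gp)
    return [i for i, p in enumerate(gp) if p == best]


gp = (0.112, 0.444, 0.444)
print(round(dosage_from_gp(gp), 3))   # 1.332
print(most_probable_genotypes(gp))    # [1, 2]
```

      The result matches the reported DS of 1.333 up to rounding of the printed probabilities. Because 0/1 and 1/1 are tied, the GP values alone cannot determine the genotype at this site, which fits with the mixture of 0|1, 1|0 and 1|1 calls in the record.
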
      My thread is more of a general request for a discussion on processing VCF files from haploid individuals for downstream population genetics analyses. One thought was to use just the portion of the genome that is callable in all individuals; however, this reduces the SNPs across all samples from ~4 M to < 200 K. And as demonstrated above, imputation may not be a reliable solution.

      I guess I'm looking for some reliable instruction on merging VCF files from haploid individuals, imputing missing genotypes where possible.



      • #4
        On reflection, Beagle imputation works on haplotypes, which can explain the discordance between the genotype probabilities and the genotype called at an individual SNP.

        I think the simplest solution might be just to impute missing genotypes with the most common allele for each SNP in a given population...
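
        A minimal sketch of that idea, assuming the haploid calls at one SNP are held as a list of allele codes with '.' for missing; the names and the data layout are illustrative, not taken from any particular tool.

```python
from collections import Counter

MISSING = "."


def impute_major_allele(site_calls):
    """Fill missing haploid calls at one SNP with the most common observed allele.

    site_calls holds one allele code per individual ("0", "1", ...), with
    MISSING for uncalled individuals. If no individual is called, the site is
    returned unchanged. Ties are broken by whichever allele is seen first.
    """
    observed = [a for a in site_calls if a != MISSING]
    if not observed:
        return list(site_calls)
    major_allele, _count = Counter(observed).most_common(1)[0]
    return [major_allele if a == MISSING else a for a in site_calls]


calls = ["1", ".", "1", "0", ".", "1"]
print(impute_major_allele(calls))  # ['1', '1', '1', '0', '1', '1']
```

        Note that filling gaps this way pushes allele frequencies towards the major allele, so it may be worth recording which calls were imputed and checking how sensitive downstream diversity statistics are to them.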

