  • hard filtering with allele balance

    I am looking for some advice on hard filtering based on allele balance (either over the entire cohort or by sample).

    Sorry, this post may be long, but I wanted to give a little background. I have 94 exomes that have been put through the GATK pipeline following the most current best practices. I ran VQSR and kept only PASS variants, and I set a minimum depth of 20 for each genotype.

    After running association analysis in PLINK, it is obvious that I have false heterozygous genotypes, and they are producing significant associations that shouldn't be there. Sanger sequencing of these significant SNPs mostly gives ref/ref where GATK calls ref/alt. When I look at many of these sites in the VCF, they just look "wrong"; I don't know how to explain it any other way. Depth might be high, but there are cases where the read counts are something like 150/5 and it calls a het. Sometimes there are no alt reads at all and it still calls a het. I realize that GATK HaplotypeCaller outputs the most likely genotype based on a model rather than just looking at raw counts.

    It was suggested to me that I consider filtering by allele balance or B-allele frequency (BAF). In GATK I can annotate my VCF with the allele balance across all samples, and I can also annotate allele balance on a per-sample basis. Now that I have done that, I am unsure what threshold to set (if any). I want to filter out these "bad genotypes" before running association analysis. At this point I would rather have no significant associations than errors.

    I was just hoping for some community feedback on this. Thanks for your help!
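    The per-sample and cohort-wide allele-balance checks discussed above can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not GATK's implementation: it assumes biallelic sites, recomputes allele balance from the per-sample AD field as alt / (ref + alt) instead of reading GATK's own annotations, and uses a 0.2 to 0.8 het window that is a purely hypothetical placeholder, not a recommended threshold. All function names (`parse_ad`, `is_het`, `allele_balance`, `filter_genotype`, `cohort_het_ab`) are illustrative.

```python
from typing import List, Optional, Tuple

# Hypothetical thresholds for illustration only (not a validated recommendation).
MIN_DEPTH = 20              # per-genotype minimum depth, as used in the post
AB_LOW, AB_HIGH = 0.2, 0.8  # assumed acceptable alt-fraction window for hets


def parse_ad(ad: str) -> Optional[Tuple[int, int]]:
    """Parse a biallelic AD field such as '150,5' into (ref, alt) read counts."""
    parts = ad.split(",")
    if len(parts) != 2 or "." in parts:
        return None  # missing or multiallelic: not handled in this sketch
    return int(parts[0]), int(parts[1])


def is_het(gt: str) -> bool:
    """True for a diploid call with two different, non-missing alleles (0/1, 0|1)."""
    alleles = gt.replace("|", "/").split("/")
    return len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]


def allele_balance(ad: str) -> Optional[float]:
    """Alt-read fraction alt / (ref + alt); None if AD is unusable or has no reads."""
    counts = parse_ad(ad)
    if counts is None or sum(counts) == 0:
        return None
    return counts[1] / sum(counts)


def filter_genotype(gt: str, ad: str, dp: int,
                    min_depth: int = MIN_DEPTH,
                    lo: float = AB_LOW, hi: float = AB_HIGH) -> str:
    """Return gt unchanged, or './.' if it fails the depth or het allele-balance check."""
    if dp < min_depth:
        return "./."
    if is_het(gt):
        ab = allele_balance(ad)
        if ab is None or not (lo <= ab <= hi):
            return "./."
    return gt


def cohort_het_ab(calls: List[Tuple[str, str]]) -> Optional[float]:
    """Pooled alt fraction over all het calls at one site, from (GT, AD) pairs."""
    ref_sum = alt_sum = 0
    for gt, ad in calls:
        counts = parse_ad(ad)
        if is_het(gt) and counts is not None:
            ref_sum += counts[0]
            alt_sum += counts[1]
    total = ref_sum + alt_sum
    return alt_sum / total if total > 0 else None


# Usage with the kinds of calls described in the post:
print(filter_genotype("0/1", "150,5", 155))  # 150/5 "het" -> ./.
print(filter_genotype("0/1", "30,0", 30))    # "het" with no alt reads -> ./.
print(filter_genotype("0/1", "40,35", 75))   # balanced het, kept -> 0/1
```

    Setting a failing genotype to missing (./.) rather than dropping the whole site keeps the other samples' calls at that position; whether that suits the downstream association test is a separate judgment. Some tools report allele balance as the reference fraction rather than the alt fraction, so check which convention an annotation uses before reusing any cutoff.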
