01-12-2016, 05:29 PM   #1
How many multi-allelic SNPs to expect from large human cohorts?

Hi all:
I'm assessing a callset that aggregates a couple of thousand human germline samples. I'm concerned that the calls contain too many multi-allelic SNPs (roughly 15% of SNPs are multi-allelic) and that these sites are enriched for false positives.
Now, I get that at some point the usual infinite-sites model breaks down and mutations start hitting sites that are already polymorphic. But ExAC, for example, has only about 7% multi-allelic sites with 60k+ samples, whereas I have fewer than 1/20 of that.
Are there ways to assess the quality of these sites? Are there any results (empirical or theoretical) on how many multi-allelic SNPs to expect as a function of sample number? Is Ts/Tv meaningful at multi-allelic sites, and if so, which allele should it be computed on?
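
To make the two quantities in question concrete, here is a minimal sketch of one way they could be computed from VCF text: the fraction of SNP sites that are multi-allelic, and Ts/Tv counted per ALT allele (each ALT compared against REF), reported separately for biallelic and multi-allelic sites. This is an illustration under stated assumptions, not the pipeline behind the callset: the function names `summarize` and `report` are made up, the input is assumed to be uncompressed, tab-delimited VCF lines, and counting every ALT against REF is only one of several possible answers to the "which allele" question.

```python
from collections import Counter

# Each base has exactly one transition partner; everything else is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
BASES = set("ACGT")


def snp_alts(ref, alt_field):
    """Return the single-base ALT alleles of a record, or [] if REF is not a single base.

    Symbolic or non-SNP ALTs (e.g. '*', '<NON_REF>', indels) are ignored, so a
    record such as A -> G,* is treated as a biallelic SNP here (an assumption).
    """
    if ref not in BASES:
        return []
    return [a for a in alt_field.split(",") if a in BASES and a != ref]


def summarize(vcf_lines):
    """Count SNP sites and per-ALT transitions/transversions, split by bi/multi-allelic."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        ref, alt = fields[3].upper(), fields[4].upper()
        alts = snp_alts(ref, alt)
        if not alts:
            continue  # not a SNP record
        kind = "multi" if len(alts) > 1 else "bi"
        counts[kind + "_sites"] += 1
        for a in alts:
            counts[kind + ("_ts" if (ref, a) in TRANSITIONS else "_tv")] += 1
    return counts


def report(counts):
    """Turn raw counts into the multi-allelic fraction and per-class Ts/Tv."""
    n_sites = counts["bi_sites"] + counts["multi_sites"]
    out = {"multi_fraction": counts["multi_sites"] / n_sites if n_sites else None}
    for kind in ("bi", "multi"):
        tv = counts[kind + "_tv"]
        out[kind + "_tstv"] = counts[kind + "_ts"] / tv if tv else None
    return out
```

Something like `report(summarize(open("calls.vcf")))` would give the three numbers (the file name is a placeholder). One caveat with this per-ALT convention: since each reference base has only one transition partner, a tri-allelic SNP can contribute at most one transition and at least one transversion, so the multi-allelic Ts/Tv is bounded at 1 by construction and is not directly comparable with the biallelic value.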

Last edited by gda; 01-12-2016 at 06:00 PM.