

Posted by Newsbot!, 05-22-2011, 06:23 AM
PubMed: Inference of Site Frequency Spectra From High-throughput Sequence Data: Quant

Syndicated from PubMed RSS Feeds

Inference of Site Frequency Spectra From High-throughput Sequence Data: Quantification of Selection on Nonsynonymous and Synonymous Sites in Humans.

Genetics. 2011 May 19;

Authors: Keightley PD, Halligan DL

Sequencing errors and random sampling of nucleotide types among sequencing reads at heterozygous sites present challenges for accurate, unbiased inference of single nucleotide polymorphism genotypes from high-throughput sequence data. Here, we develop a maximum likelihood approach to estimate the frequency distribution of the number of alleles in a sample of individuals (the site frequency spectrum) using high-throughput sequence data. Our method assumes binomial sampling of nucleotide types in heterozygotes and random sequencing error. By simulations, we show that close to unbiased estimates of the site frequency spectrum can be obtained if the error rate per base read does not exceed the population nucleotide diversity. We also show that these estimates are reasonably robust if errors are non-random. We then apply the method to infer site frequency spectra for 0-fold degenerate, 4-fold degenerate and intronic sites of protein-coding genes using the low coverage human sequence data produced by the 1000 Genomes Project phase one pilot. By fitting a model to the inferred site frequency spectra that estimates parameters of the distribution of fitness effects of new mutations, we find evidence for significant natural selection operating on 4-fold sites. We also find that a model with variable effects of mutations at synonymous sites fits the data significantly better than a model with equal mutational effects. Under the variable effects model, we infer that 11% of synonymous mutations are subject to strong purifying selection.

PMID: 21596896 [PubMed - as supplied by publisher]
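
For readers who want a feel for the two assumptions named in the abstract (binomial sampling of nucleotide types in heterozygotes, and random sequencing error), here is a minimal Python sketch of a per-site read-count likelihood. This is not the authors' code or their full method. It assumes a diploid site with only two alleles and a single symmetric error rate that flips one allele to the other; the function name read_count_likelihood and its parameters are made up for illustration.

```python
from math import comb


def read_count_likelihood(n_alt, n_reads, n_alt_alleles, error_rate):
    """Probability of seeing n_alt alternate-allele reads out of n_reads.

    Illustrative assumptions (not taken from the paper):
      - diploid individual, so n_alt_alleles is 0, 1 or 2
      - two-allele site; an error turns one allele into the other
      - each read samples one of the two chromosomes at random
    """
    # Chance that a read originates from a chromosome carrying the alternate allele
    p_true_alt = n_alt_alleles / 2
    # Chance that the read is *observed* as alternate after random error
    p_obs_alt = p_true_alt * (1 - error_rate) + (1 - p_true_alt) * error_rate
    # Binomial sampling of observed nucleotide types among the reads
    return (
        comb(n_reads, n_alt)
        * p_obs_alt ** n_alt
        * (1 - p_obs_alt) ** (n_reads - n_alt)
    )


if __name__ == "__main__":
    # With low coverage, a heterozygote often shows reads of one type only
    for genotype in (0, 1, 2):
        print(genotype, read_count_likelihood(0, 4, genotype, 0.01))
```

At low coverage a heterozygote has a substantial chance of producing reads of only one nucleotide type, which is the sampling problem the abstract says complicates direct genotype calls. A spectrum-level estimator would combine likelihoods like this across individuals and sites, and that step is not shown here.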
