SEQanswers

03-02-2015, 12:39 AM   #1
netpumber
Member
 
Location: GER

Join Date: May 2014
Posts: 21
Lander-Waterman theory explanation

Hi.

Could someone post a reference or describe the logic of that theory (even with images)? I read the text on Wikipedia but couldn't understand much because I'm just beginning with NGS.

Is this a method to estimate the quality of your library and the size of its fragments? When do we use calculators for that?

Thank you.

Last edited by netpumber; 03-02-2015 at 12:41 AM.
03-02-2015, 03:28 AM   #2
dariober
Senior Member
 
Location: Cambridge, UK

Join Date: May 2010
Posts: 311

Quote:
Originally Posted by netpumber
Is this a method to estimate the quality of your library and the size of its fragments? When do we use calculators for that?
Assuming reads map randomly to the reference genome, you can plan your experiments and answer questions like: How much coverage do I expect if I sequence so many reads? How many genomic positions can I expect to be covered by at least n reads (useful for SNP detection)?

Say your genome is of size G and you sequence N reads of length L; this R code answers the two questions above (given all the required assumptions):

Code:
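library(ggplot2)  # provides qplot() and geom_line(); qplot() is deprecated in ggplot2 >= 3.4.0 but still works

L <- 100        # read length (bp)
G <- 3 * 1e9    # genome size (bp)
N <- 100 * 1e6  # number of reads

## Expected coverage
C <- (L * N) / G

## % genome covered with depth exactly n
depth <- 0:10
exp_cov <- dpois(depth, lambda = C) * 100
ggdepth <- qplot(x = depth, y = exp_cov, xlab = 'Depth', ylab = '% genome', main = 'Amount of genome\ncovered at depth n') + geom_line()

## % genome covered with depth n or more: P(X >= n) = P(X > n - 1)
## (the upper tail is used so that depths above max(depth) are not dropped)
cum_cov <- ppois(depth - 1, lambda = C, lower.tail = FALSE) * 100
ggCum <- qplot(x = depth, y = cum_cov, xlab = 'Depth', ylab = '% genome', main = 'Amount of genome\ncovered at least with depth n') + geom_line()

## Display the plots
print(ggdepth)
print(ggCum)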


This is just a use case example...
03-02-2015, 04:28 AM   #3
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

Just to make explicit something in dariober's great reply, the general theory is that if reads are drawn uniformly from the genome, then per-base coverage should follow a Poisson distribution.
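
As a rough illustration of that claim, here is a minimal simulation sketch. The toy genome size, read count and seed are arbitrary assumptions chosen so it runs quickly; read starts are sampled uniformly, per-base depth is computed, and the observed depth distribution is compared with dpois().

```r
set.seed(1)      # arbitrary seed, for reproducibility only

G <- 1e6         # toy genome size (bp), an assumption for the example
L <- 100         # read length (bp)
N <- 3e4         # number of reads
C <- L * N / G   # expected coverage (3x here)

# Uniformly sample read start positions; every read fits inside the genome
starts <- sample.int(G - L + 1, N, replace = TRUE)

# Per-base depth: +1 where a read starts, -1 just past where it ends,
# then a running sum over the genome
n_start <- tabulate(starts, nbins = G + 1)
n_end   <- tabulate(starts + L, nbins = G + 1)
depth   <- cumsum(n_start - n_end)[seq_len(G)]

# Observed vs. Poisson-expected % of the genome at depth 0..10
observed <- as.numeric(table(factor(depth, levels = 0:10))) / G * 100
expected <- dpois(0:10, lambda = C) * 100
print(round(cbind(depth = 0:10, observed, expected), 2))
```

The two columns should agree closely, apart from sampling noise and a small edge effect at the ends of the genome. The agreement holds only because the starts really are drawn uniformly here.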

It should be noted that in reality this isn't the case, and I don't think anyone actually uses this equation for these purposes anymore. In fact, it's vastly more reliable to just generate fake reads and then map them, since it turns out that not all regions are very mappable and there's also often a bias in what's even sequenced.

Having said that, the original context of the equations was more useful for assembly, since the equations can answer how many gaps one should expect given a certain number of reads (clones originally, but this was all pre-NGS). Again, though, I think people would be more likely to use k-mer frequency histograms for this sort of thing these days.
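
For the assembly use case, a minimal sketch of the classic Lander-Waterman result: with N reads of length L on a genome of size G, coverage is c = N*L/G, the expected number of contigs ("islands") is N*exp(-c*sigma) with sigma = 1 - T/L, where T is the minimum overlap needed to join two reads, and the expected fraction of the genome left uncovered is exp(-c). The number of gaps is roughly the number of contigs. The function and argument names below are made up for this example, and the same uniform-sampling assumption applies.

```r
# Expected number of contigs ("islands") under Lander-Waterman.
# min_overlap is T in the formula: the overlap needed to detect a join.
lw_contigs <- function(N, L, G, min_overlap = 0) {
  coverage <- N * L / G
  sigma <- 1 - min_overlap / L
  N * exp(-coverage * sigma)
}

# Expected fraction of the genome covered by no read at all
lw_uncovered <- function(N, L, G) {
  exp(-N * L / G)
}

# Same numbers as in the reply above: 100M reads of 100 bp, 3 Gb genome
N <- 100 * 1e6
L <- 100
G <- 3 * 1e9

print(lw_contigs(N, L, G))                    # any overlap counts as a join
print(lw_contigs(N, L, G, min_overlap = 20))  # need >= 20 bp overlap
print(lw_uncovered(N, L, G) * 100)            # % of genome with zero reads
```

Requiring a longer overlap lowers sigma, so the expected number of contigs goes up for the same amount of sequencing.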
03-02-2015, 09:16 AM   #4
netpumber
Member
 
Location: GER

Join Date: May 2014
Posts: 21

Thank you very much, guys.