
03-02-2015, 01:39 AM  #1 
Member
Location: GER Join Date: May 2014
Posts: 21

Lander-Waterman theory explanation
Hi.
Could someone post a reference or describe the logic of this theory (even with images)? I read the text on Wikipedia but couldn't understand much because I'm just starting out with NGS. Is this a method to estimate the quality of your library and the size of its fragments? When do we use calculators for that? Thank you. Last edited by netpumber; 03-02-2015 at 01:41 AM. 
03-02-2015, 04:28 AM  #2  
Senior Member
Location: Cambridge, UK Join Date: May 2010
Posts: 311

Quote:
Say your genome is of size G and you sequence N reads of length L. This R code answers the two questions above (given all the assumptions required):
Code:
    library(ggplot2)  # provides qplot() and geom_line()

    L <- 100          # read length (bp)
    G <- 3 * 1e9      # genome size (bp)
    N <- 100 * 1e6    # number of reads

    ## Expected coverage
    C <- (L * N) / G

    ## % genome covered with depth...
    depth <- 0:10
    exp_cov <- dpois(depth, lambda = C) * 100

    ggdepth <- qplot(x = depth, y = exp_cov, xlab = 'Depth', ylab = '% genome',
                     main = 'Amount of genome\ncovered at depth n') + geom_line()

    ggCum <- qplot(x = rev(depth), y = cumsum(rev(exp_cov)), xlab = 'Depth', ylab = '% genome',
                   main = 'Amount of genome \ncovered at least with depth n') + geom_line()
This is just a use case example... 

03-02-2015, 05:28 AM  #3 
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480

Just to make explicit something in dariober's great reply: the general theory is that if reads are drawn uniformly from the genome, then coverage should follow a Poisson distribution.
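For anyone who wants to see the uniform-sampling claim in action, here is a rough Python sketch (standard library only). It drops reads at uniformly random positions on a toy genome and compares the observed per-base depth with the Poisson prediction. The function names, the toy sizes, and the seed are all made up for illustration, and the simulation ignores everything that makes real data messy (mappability, sequencing bias, and so on).

```python
import math
import random
from collections import Counter


def simulate_depth(genome_size, read_length, n_reads, seed=0):
    """Place reads at uniformly random start positions; return per-base depth."""
    rng = random.Random(seed)
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_size + 1)
    for _ in range(n_reads):
        start = rng.randrange(genome_size - read_length + 1)
        diff[start] += 1
        diff[start + read_length] -= 1
    # A running sum over the difference array gives the depth at each base.
    depth = []
    running = 0
    for d in diff[:genome_size]:
        running += d
        depth.append(running)
    return depth


def poisson_pmf(k, lam):
    """Probability of observing exactly k under a Poisson with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# Toy values (assumptions for illustration only), giving C = 3.
G, L, N = 1_000_000, 100, 30_000
C = L * N / G

depth = simulate_depth(G, L, N)
observed = Counter(depth)

print("depth  observed_%  poisson_%")
for k in range(8):
    obs_pct = 100 * observed[k] / G
    exp_pct = 100 * poisson_pmf(k, C)
    print(f"{k:5d}  {obs_pct:10.2f}  {exp_pct:9.2f}")
```

The two percentage columns should come out close to each other, with small differences from sampling noise and from the ends of the toy genome, which get slightly less coverage because reads are not allowed to hang off the edge.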
It should be noted that in reality coverage isn't Poisson-distributed, and I don't think anyone actually uses this equation for these purposes anymore. In fact, it's vastly more reliable to just generate fake reads and then map them, since not all regions are equally mappable and there's often a bias in what gets sequenced in the first place. Having said that, the original context of the equations was more useful for assembly, since they can answer how many gaps one should expect given a certain number of reads (clones originally, but this was all pre-NGS). Again, though, I think people would be more likely to use k-mer frequency histograms for this sort of thing these days. 
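To illustrate the assembly use mentioned above, here is a small Python sketch of the standard Lander-Waterman estimate for the expected number of contigs ("islands"): N * exp(-c * (1 - T/L)), where c = N*L/G is the coverage and T is the minimum overlap needed to join two reads. The number of gaps is roughly the same as the number of contigs. The function names and the example overlap are my own choices for illustration, and the idealized assumptions (uniform, independent reads) apply here as well.

```python
import math


def coverage(n_reads, read_length, genome_size):
    """Expected coverage c = N * L / G."""
    return n_reads * read_length / genome_size


def expected_contigs(n_reads, read_length, genome_size, min_overlap=0):
    """Lander-Waterman expected number of contigs: N * exp(-c * (1 - T/L))."""
    c = coverage(n_reads, read_length, genome_size)
    theta = min_overlap / read_length  # fraction of a read needed to detect an overlap
    return n_reads * math.exp(-c * (1 - theta))


def uncovered_fraction(n_reads, read_length, genome_size):
    """Expected fraction of the genome with zero depth: exp(-c)."""
    return math.exp(-coverage(n_reads, read_length, genome_size))


# Same numbers as the R example earlier in the thread.
G, L, N = 3_000_000_000, 100, 100_000_000

print("coverage:", coverage(N, L, G))
print("expected contigs, no minimum overlap:", expected_contigs(N, L, G))
# min_overlap=20 is an arbitrary illustrative value, not a recommendation.
print("expected contigs, 20 bp minimum overlap:", expected_contigs(N, L, G, min_overlap=20))
print("fraction of genome uncovered:", uncovered_fraction(N, L, G))
```

Requiring a longer overlap raises the expected number of contigs, because reads that overlap by less than T can no longer be joined.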
03-02-2015, 10:16 AM  #4 
Member
Location: GER Join Date: May 2014
Posts: 21

Thank you very much guys.


