**amitm** · 02-14-2012, 11:56 AM

very interesting question..
any replies please... :-)

**Simon Anders** · 02-24-2012, 07:32 AM

You are halfway there, and a few extra preparatory steps make things easier. First, how many cDNA fragments do we get from your N transcript molecules? Let's say we fragment into pieces of lf=200 bp, and the average length of the genes is L=1000 bp. Then we should get roughly N' = N*L/lf = 500,000 cDNA molecules out of that. How many of these tell us about the isoform that you want to detect? If only a single stretch of l=200 bp is useful to ascertain that it is the isoform of interest and not another one, and if your gene has length 1000 bp, then it fragments, on average, into 5 pieces, only one of which is useful, i.e., we get n'=10 useful cDNA molecules which we have to fish out from N'=500,000 molecules. So, taking a random read, the probability that it is from the transcript stretch you are looking for is p = n'/N' = 1/50,000.
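For anyone who wants to replay this arithmetic, here is a minimal Python sketch. It assumes N = 100,000 transcript molecules and n = 10 copies of the rare isoform; these two values are back-calculated from N' = 500,000 and n' = 10 above and are not stated in this post. The function name and the `useful_fragments_per_copy` parameter are made up for illustration.

```python
from fractions import Fraction

def target_read_probability(N, n, L=1000, lf=200, useful_fragments_per_copy=1):
    """Return (n', N', p) for the fragment-counting argument.

    N  : total transcript molecules (assumed value, see lead-in)
    n  : copies of the rare isoform (assumed value, see lead-in)
    L  : average transcript length in bp
    lf : fragment length in bp
    useful_fragments_per_copy : informative fragments per isoform copy
    """
    # Each transcript breaks into L/lf fragments on average.
    N_prime = N * Fraction(L, lf)
    # Only the fragment covering the isoform-specific stretch is informative.
    n_prime = n * useful_fragments_per_copy
    # Chance that a randomly drawn read comes from an informative fragment.
    p = Fraction(n_prime) / N_prime
    return n_prime, N_prime, p

n_prime, N_prime, p = target_read_probability(N=100_000, n=10)
print(n_prime, N_prime, p)  # 10 500000 1/50000
```

Exact fractions are used only so that p prints as 1/50000 rather than a rounded float.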

Note that we did not need the number of cells; we just assume that we have enough so that we can ignore the possibility that the few cells we are looking at happen to not contain the rare isoform, or that we lose it during sample prep because there are so few.

Hence, the answer is simple now: If you get a total of, say, NR = 2,000,000 reads from your sequencing run, the probability that none of these contains the stretch of sequence that proves the existence of your isoform is given by (1-p)^NR ≈ 4e-18, i.e., it is nearly certain that you find it, using the numbers you suggested. This is because the expected number of reads from the stretch, p*NR, is 40, which is a lot. If n' were smaller, say, 1, this would look different.
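A short Python sketch of this step, using p = 1/50,000 and NR = 2,000,000 from the post. The helper name is hypothetical, and the n' = 1 case simply rescales p by a factor of ten to show how quickly the picture changes.

```python
import math

def prob_no_informative_read(p, n_reads):
    """Probability that none of n_reads hits the informative stretch: (1 - p)^n_reads."""
    # log1p keeps precision when p is tiny and n_reads is large.
    return math.exp(n_reads * math.log1p(-p))

NR = 2_000_000

# Numbers from the post: n' = 10 informative fragments out of N' = 500,000.
p10 = 10 / 500_000
expected10 = p10 * NR                          # about 40 expected reads
miss10 = prob_no_informative_read(p10, NR)     # roughly 4e-18

# Same setup with n' = 1: only about 4 reads are expected.
p1 = 1 / 500_000
expected1 = p1 * NR
miss1 = prob_no_informative_read(p1, NR)       # close to exp(-4), about 1.8%

print(f"n'=10: expected reads {expected10:.1f}, P(miss) {miss10:.1e}")
print(f"n'=1:  expected reads {expected1:.1f}, P(miss) {miss1:.3f}")
```

With n' = 1 the chance of missing the isoform entirely is no longer negligible, which is the sense in which things "look different".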

Finally, if you want to quantify the abundance n/N, the precision of your quantification is roughly 1/sqrt(p*NR), due to Poisson noise. Here 1/sqrt(40)=16%.
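The Poisson estimate can be checked the same way. This is only a sketch of the 1/sqrt(p*NR) rule of thumb quoted above, with a hypothetical helper name; it ignores every other source of noise (library prep, mapping, biological variation).

```python
import math

def relative_precision(p, n_reads):
    """Approximate relative error of the abundance estimate from Poisson counting noise."""
    expected_reads = p * n_reads
    return 1 / math.sqrt(expected_reads)

NR = 2_000_000
cv_10 = relative_precision(10 / 500_000, NR)   # about 40 expected reads
cv_1 = relative_precision(1 / 500_000, NR)     # about 4 expected reads

print(f"n'=10: {cv_10:.0%}")
print(f"n'=1:  {cv_1:.0%}")
```

Because the error scales with the inverse square root of the read count, halving it takes roughly four times as many reads.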

**Jon_Keats** · 02-24-2012, 08:53 PM

I think it's time to write up a book of Simon's replies. I'm constantly stunned by the things I still do not understand but greatly appreciate being educated.

**schelhorn** · 02-27-2012, 03:25 AM

Thanks, Simon, for setting this straight for me. And I agree with Jon's suggestion. Your answers are usually both precise and comprehensive, which makes you a great asset for this forum.

**tantramarseq** · 08-27-2013, 02:17 PM

Thanks

Hi - an outdated thanks for this informative thread.
I am setting up some course notes and the arithmetic was helpful.
I think there is a term switch between the OP and Simon, using 'p' for two different factors (OP - desired statistical power; Simon - probability of a random read coming from the target transcript).
cheers, Doug

Statistical model for RNA-Seq sensitivity estimation