SEQanswers

How many million reads required to have a 20x coverage for rat RNA (http://seqanswers.com/forums/showthread.php?t=50094)

wingtec 02-04-2015 09:27 AM

How many million reads required to have a 20x coverage for rat RNA
 
Hi All,

I'm looking for some advice: we are trying to obtain an average coverage of 20x for RNA-Seq of rat tissues. How many million reads should we aim for in each library to reach that kind of coverage?

Many thanks in advance!


Wing

ymc 02-04-2015 09:54 AM

Well, it is very hard to tell because different tissues will have different parts of the transcriptome expressed.

If $$$ is not an issue, 100M 2x100 reads would very likely be overkill for what you want.

Good luck!

RNA 02-04-2015 02:20 PM

"20X coverage" for RNA-Seq is difficult to define since the copy number varies for transcripts across at least 4 orders of magnitude within a tissue. Therefore estimating "coverage" for RNA-Seq is not nearly as straightforward as it is for DNA applications.

For very highly expressed transcripts, as few as 1 million reads will easily give you 20X coverage.

But for rare transcripts, you can collect 1 billion or more reads and still never reach 20X coverage.

And of course this varies with the tissue you are studying as well: a transcript may be easy to study in liver but virtually absent in brain.
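As a rough illustration of why a single "coverage" number is hard to pin down, here is a minimal Python sketch. It assumes that mean coverage of a transcript is simply (reads from that transcript x bases per read) / transcript length, and that a transcript receives a fixed fraction of all reads. The transcript lengths and read fractions below are made-up values chosen to span four orders of magnitude, not measurements from any tissue.

```python
def mean_coverage(total_reads, read_fraction, bases_per_read, transcript_length):
    """Approximate mean per-base coverage of one transcript.

    coverage = (reads from this transcript * bases per read) / transcript length
    """
    reads_on_transcript = total_reads * read_fraction
    return reads_on_transcript * bases_per_read / transcript_length


def reads_needed(target_coverage, read_fraction, bases_per_read, transcript_length):
    """Total library reads needed for one transcript to reach target_coverage."""
    return target_coverage * transcript_length / (bases_per_read * read_fraction)


# Hypothetical transcripts: (label, length in bases, fraction of all reads).
EXAMPLES = [
    ("abundant", 2000, 1e-3),
    ("moderate", 2000, 1e-5),
    ("rare", 2000, 1e-7),
]

# 150 bases per read pair corresponds to paired-end 2x75 bp (an assumption here).
for label, length, fraction in EXAMPLES:
    cov = mean_coverage(50_000_000, fraction, 150, length)
    need = reads_needed(20, fraction, 150, length)
    print(f"{label}: {cov:.2f}x at 50M reads; about {need:,.0f} reads for 20x")
```

Under these assumed numbers the abundant transcript passes 20X with well under 1 million reads, while the rare one would need more than 1 billion, which is the spread described above.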

For mRNA sequencing (TruSeq Stranded mRNA Kits) we usually recommend 50 million paired-end 2x75 bp reads. You can always go to 100M if you want deeper coverage, but beyond that the benefit of collecting more reads on a single sample falls off dramatically relative to the cost.

wingtec 02-04-2015 06:49 PM

Many thanks to y'all. These are very useful and practical pointers!

Wing

wingtec 02-05-2015 05:41 AM

With all that said, allow me to twist the question a bit.

Say I already have some Affy microarray data and I want to improve on, or at least confirm, the array data with RNA-Seq. The Affy chip used was the HG ST gene array and the experiment was done with n=3. We now want to do n=3 in RNA-Seq as well. Will 20M clean PE 2x100 reads give similar or better coverage than the array data?

Thanks

Wing

pmiguel 02-05-2015 07:28 AM

Quote:

Originally Posted by wingtec (Post 159637)
Say I already have some Affy microarray data and I want to improve on, or at least confirm, the array data with RNA-Seq. The Affy chip used was the HG ST gene array and the experiment was done with n=3. We now want to do n=3 in RNA-Seq as well. Will 20M clean PE 2x100 reads give similar or better coverage than the array data?

Long, long ago, when we did SOLiD runs, an ABI applications specialist told us 5M reads was equivalent to an Affy Chip. But I don't know what that was based on.
Possibly there are comparisons in the literature?
--
Phillip

AllSeq 02-05-2015 09:35 AM

Quote:

Originally Posted by wingtec (Post 159637)
Say I already have some Affy microarray data and I want to improve on, or at least confirm, the array data with RNA-Seq. The Affy chip used was the HG ST gene array and the experiment was done with n=3. We now want to do n=3 in RNA-Seq as well. Will 20M clean PE 2x100 reads give similar or better coverage than the array data?

People have a lot of opinions on the amount of coverage needed for RNA-Seq - it almost turns into a religious debate! Generally speaking, 10M reads should give you 'array-like' coverage. 20M PE reads (which I'm defining as 20M clusters) would be even better. If cost is a major issue, you could either reduce the number of clusters or go for SE reads. PE is nice, but unless you're going to do the hard work of trying to figure out splice isoforms, it's probably not necessary.

Good luck with the experiment!

RNA 02-05-2015 10:33 AM

I would highly recommend this blog post from CoreGenomics, which tries to address this issue using the data published by the SEQC group last year:

http://core-genomics.blogspot.com/20...not-quite.html

The bottom line is that many independent groups have come to the same conclusion: 10M to 20M single-end 50 bp reads (from libraries made with polyA mRNA preps) will give gene-level expression values that are better than an Affy array.

These days, what with the lower price of sequencing etc., I always try to default to 25M paired-end 2x75 bp reads. This data will persist for a long time and can be used by lots of different pipelines to do more advanced analysis of splicing, fusions, and novel transcript discovery than can be done with 50 bp SE reads alone.
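To put the run configurations mentioned so far in this thread on a common scale, here is a small Python sketch that converts each one to total sequenced bases. It assumes "N reads" means N clusters (the definition AllSeq uses above), so a paired-end cluster contributes two reads. The helper name and the list of labels are just for illustration.

```python
def total_bases(clusters, read_length, paired_end):
    """Bases sequenced: each cluster yields one read (SE) or two reads (PE)."""
    reads_per_cluster = 2 if paired_end else 1
    return clusters * reads_per_cluster * read_length


# (label, clusters, read length in bp, paired-end?) as quoted in the thread.
CONFIGS = [
    ("10M SE 50", 10_000_000, 50, False),
    ("20M SE 50", 20_000_000, 50, False),
    ("20M PE 2x100", 20_000_000, 100, True),
    ("25M PE 2x75", 25_000_000, 75, True),
    ("50M PE 2x75", 50_000_000, 75, True),
    ("100M PE 2x100", 100_000_000, 100, True),
]

for label, clusters, length, paired in CONFIGS:
    gigabases = total_bases(clusters, length, paired) / 1e9
    print(f"{label}: {gigabases:.2f} Gb")
```

Total bases are mainly a budgeting figure. For gene-level counts the number of clusters is what matters most; the extra bases from longer or paired reads pay off in the splicing, fusion, and novel-transcript analyses mentioned above.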

mbblack 02-05-2015 10:48 AM

Quote:

Originally Posted by wingtec (Post 159637)
Say I already have some Affy microarray data and I want to improve on, or at least confirm, the array data with RNA-Seq. The Affy chip used was the HG ST gene array and the experiment was done with n=3. We now want to do n=3 in RNA-Seq as well. Will 20M clean PE 2x100 reads give similar or better coverage than the array data?

Note that regardless of depth of coverage, you may well not be able to "confirm" some array results with an independent RNA-seq experiment. Just because you detect any given gene as significantly differentially expressed in one experiment does not mean you will do so in the other experiment. Sometimes the overlap in DEGs is great, but sometimes it can be quite low.

You may get better correspondence (better confirmation) in the end by ontology enrichment comparisons of the genes selected from the two experiments than you will with a direct comparison of significant gene lists, particularly given that n=3 is the bare minimum for biological replication.

Array equivalence is a two-part issue to my mind. First is the issue of equivalent sensitivity: how much RNA-seq coverage will give you equivalent statistical sensitivity of detection of change? But how much coverage you need to pick up either the equivalent number of DEGs or largely the same set of DEGs is a different issue. Typically, coverage for the former is far less than for the latter. 5-10M reads per sample will equal or exceed array sensitivity, but you'd be better off with 30-50M reads per sample if you want a good chance of getting high overlap in detected DEGs in both experiments (in my experience).
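For the direct gene-list comparison, a minimal Python sketch of how the overlap between the two DEG lists could be summarized is shown here. The gene symbols are placeholder examples rather than real results, and the function name and the choice of metrics (Jaccard index and fraction of array DEGs recovered) are assumptions for illustration only.

```python
def deg_overlap(array_degs, rnaseq_degs):
    """Summarize agreement between two lists of differentially expressed genes."""
    a, b = set(array_degs), set(rnaseq_degs)
    shared = a & b
    union = a | b
    return {
        "shared": sorted(shared),
        # Jaccard index: shared genes / genes called in either experiment.
        "jaccard": len(shared) / len(union) if union else 0.0,
        # Share of the array DEGs that the RNA-seq experiment also called.
        "fraction_of_array_confirmed": len(shared) / len(a) if a else 0.0,
    }


# Placeholder gene lists, not real data.
array_degs = ["Cyp1a1", "Fos", "Jun", "Myc"]
rnaseq_degs = ["Cyp1a1", "Fos", "Egr1", "Hmox1", "Nqo1"]

summary = deg_overlap(array_degs, rnaseq_degs)
print(summary["shared"])
print(f"Jaccard: {summary['jaccard']:.2f}")
print(f"Array DEGs confirmed: {summary['fraction_of_array_confirmed']:.0%}")
```

A low overlap here does not by itself mean either experiment is wrong; with n=3 per platform, the two lists can differ considerably just from limited statistical power.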


All times are GMT -8.