
**malachig** · 10-06-2010, 09:21 PM

There is some discussion of this topic for human transcriptomes sequenced by Illumina paired-end sequencing here: ALEXA-seq. Most of the relevant figures and text are in the supplementary materials. I'm sure there are comparable discussions for 454 and SOLiD.

I agree that there is not a consensus. Part of the problem is that the answer to the question is highly dependent on the end goals of your analysis and how you define these end points. For example, you mention X number of reads are required before the discovery rate 'flattens'. Flat is a highly subjective term. Unless the slope of the line is 0, it is not flat. How flat is flat enough?

The expression level difference between the most lowly expressed gene and the highest is very large (4 to 7 orders of magnitude depending on how you measure/estimate). This means that when sampling randomly and noting newly discovered genes, the line begins to flatten very quickly (as all the most highly expressed genes are observed). But many lowly expressed genes will still not have been observed or sequenced to your minimum depth requirement. The discovery rate slows, but unless you are only interested in the most highly expressed genes, you need to continue sequencing... If you want to cover 95% of base positions of 95% of expressed genes (including very lowly expressed genes), you may be surprised how much coverage you need. Unfortunately, it also seems to depend a fair bit on the tissue you are studying, the manner of library preparation (library normalized versus not?), etc.
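
To make the saturation argument concrete, here is a rough sketch of the expected gene discovery curve. It is not taken from ALEXA-seq or any of the linked posts. It assumes a hypothetical transcriptome of 10,000 genes whose relative abundances are spread log-uniformly over 6 orders of magnitude, and it treats the read count per gene as Poisson with mean (depth x relative abundance). The function name `expected_fraction_detected` and the parameter `min_reads` are made up for illustration. Gene length, mapping bias, and per-base coverage are all ignored.

```python
import math

def expected_fraction_detected(abundances, n_reads, min_reads=1):
    """Expected fraction of genes that receive at least `min_reads` reads.

    Assumes reads are sampled at random, so the count for each gene is
    approximately Poisson with mean n_reads * (abundance / total abundance).
    """
    total = sum(abundances)
    detected = 0.0
    for a in abundances:
        lam = n_reads * a / total  # expected number of reads for this gene
        # P(count < min_reads) under the Poisson approximation
        p_below = sum(
            math.exp(-lam) * lam ** k / math.factorial(k)
            for k in range(min_reads)
        )
        detected += 1.0 - p_below
    return detected / len(abundances)

# Hypothetical abundances: 10,000 genes, log-uniform across 6 orders of magnitude
n_genes = 10_000
abundances = [10 ** (6 * i / (n_genes - 1)) for i in range(n_genes)]

for depth in (10**5, 10**6, 10**7, 10**8, 10**9):
    seen_once = expected_fraction_detected(abundances, depth, min_reads=1)
    seen_10x = expected_fraction_detected(abundances, depth, min_reads=10)
    print(f"{depth:>13,} reads: >=1 read {seen_once:.3f}, >=10 reads {seen_10x:.3f}")
```

Under these assumptions the curve for "at least 1 read" rises quickly and then creeps upward for a long time, and raising `min_reads` pushes the whole curve toward deeper sequencing. Real abundance distributions differ by tissue and library preparation, so the numbers are only illustrative.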

You can search the forums, but quickly here are some more posts relevant to your question: one, two, three.

**adumitri** · 01-03-2013, 02:25 PM

Hi malachig,

I was wondering if there are any new insights that you could give me on the topic of RNA-Seq read depth. Assuming that the RNA samples are polyA-tail selected, and the sequencing is done with 100-nucleotide paired-end reads, what number of reads per sample would be optimal to explore transcript differential expression for a high proportion of the transcriptome (even when the genes are expressed at a low level)?

Are there any relevant review articles on this topic that you might be aware of? It is clear to me that tissue type (e.g. brain vs liver), RNA preparation protocols, RNA quality (e.g. RIN), and specific research questions for the RNA-Seq data will all have a great impact on the optimal read depth, and it would be great if some studies have already been performed to address some of these variables.

Thank you,
Alexandra

**schelhorn** · 01-04-2013, 03:58 AM

Thanks, malachig, for the insightful answer. Just to add to this thread, there is a recent paper for coverage estimates in monoculture bacterial transcriptomes that goes into some detail. It's on bacteria, so obviously the results are not applicable to human. Also, this Genome Research paper and this Bioinformatics paper may be of interest. Perhaps we and others could return to this thread in case new references turn up and add them here. Until then, 100M reads seem to be a good target for human.

**adumitri** · 01-17-2013, 07:15 AM

schelhorn, thank you for the references! They were very useful.

**sisch** · 01-22-2013, 04:00 AM

I was just reading a paper about NOIseq (Differential expression in RNA-seq: A matter of depth) and had to think of this thread. In the paper they state "Some recent reports suggest that in a mammalian genome, about 700 million reads would be required to obtain accurate quantification of >95% of expressed transcripts (Blencowe et al. 2009) ..."
I didn't check the primary source, but maybe you will find your answer there. Full reference is:
Blencowe et al. 2009: Current-generation high-throughput sequencing: deepening insights into mammalian transcriptomes. Genes Dev 23: 1379-1386

Best,
Simon

Required sequencing depth for finding (nearly) all unique human transcripts