Required sequencing depth for finding (nearly) all unique human transcripts

10-06-2010, 04:52 AM   #1
schelhorn (Member; Location: Germany; Join Date: Sep 2010; Posts: 10)

Dear SEQanswers community,

Does anyone know of a study that estimates the required sequencing depth (number of mapped reads) for different sequencing technologies (454, Illumina, ABI) needed to identify N% of the unique transcripts in the human genome? In other words, what depth would be required to cover 95% of the unique transcripts in my human sample? It strikes me that there does not seem to be a published consensus on the depth needed to reliably identify (nearly) all transcripts. This information seems necessary both for deciding whether several samples can be multiplexed within a run and for estimating the suitability of long-read technologies for whole-transcriptome RNA-Seq.

Literature on the topic seems to be sparse: reference [1] indicates that up to 80 million ABI reads in mouse may be necessary before the number of distinct transcripts identified reaches a plateau, while study [2] suggests that about 3 million mappable Illumina reads from human are required before the discovery rate flattens. Does anyone know of equivalent data for 454, or can anyone share more comprehensive insights on this problem?

[1] Wang et al. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet (2009) vol. 10 (1) pp. 57-63

[2] Li et al. Determination of tag density required for digital transcriptome analysis: application to an androgen-sensitive prostate cancer model. Proc Natl Acad Sci USA (2008) vol. 105 (51) pp. 20179-84
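
To illustrate what I mean by the discovery rate reaching a plateau, here is a toy simulation (my own sketch, not taken from either reference; the log-normal abundance profile and all parameters are illustrative assumptions, not measured values):

[CODE]
# Toy saturation curve: what fraction of transcripts is sampled at least
# once (or at least 10 times) at a given number of mapped reads?
# Abundances are drawn from a log-normal distribution as a crude stand-in
# for a real expression profile.
import numpy as np

rng = np.random.default_rng(42)
n_transcripts = 20_000                  # rough order of magnitude for human
abundance = rng.lognormal(mean=0.0, sigma=2.5, size=n_transcripts)
p = abundance / abundance.sum()         # per-read sampling probability

for depth in (1_000_000, 3_000_000, 10_000_000, 30_000_000, 100_000_000):
    counts = rng.multinomial(depth, p)  # reads landing on each transcript
    seen = (counts > 0).mean()
    covered = (counts >= 10).mean()     # arbitrary "reliably seen" cutoff
    print(f"{depth:>11,} reads: {seen:5.1%} seen at all, "
          f"{covered:5.1%} with >= 10 reads")
[/CODE]

Under these assumptions the "seen at all" fraction flattens early while the ">= 10 reads" fraction keeps climbing, which is exactly the ambiguity I am asking about.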

10-06-2010, 09:21 PM   #2
malachig (Senior Member; Location: WashU; Join Date: Aug 2010; Posts: 117)

There is some discussion of this topic for human transcriptomes sequenced by Illumina paired-end sequencing here: ALEXA-seq. Most of the relevant figures and text are in the supplementary materials. I'm sure there are comparable discussions for 454 and SOLiD.

I agree that there is no consensus. Part of the problem is that the answer depends heavily on the end goals of your analysis and on how you define those end points. For example, you mention that X reads are required before the discovery rate 'flattens'. But 'flat' is a highly subjective term: unless the slope of the curve is exactly 0, it is not flat. How flat is flat enough?

The expression difference between the most lowly and most highly expressed genes is very large (4 to 7 orders of magnitude, depending on how you measure or estimate it). This means that when you sample reads randomly and note newly discovered genes, the curve begins to flatten very quickly, as soon as all of the most highly expressed genes have been observed. But many lowly expressed genes will still not have been observed, or not sequenced to your minimum depth requirement. The discovery rate slows, but unless you are only interested in the most highly expressed genes, you need to continue sequencing. If you want to cover 95% of the base positions of 95% of expressed genes (including very lowly expressed ones), you may be surprised how much depth you need. Unfortunately, it also seems to depend a fair bit on the tissue you are studying, the manner of library preparation (normalized library or not?), and so on.
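
To put rough numbers on that last point, here is a back-of-the-envelope calculation (my own sketch, not from the ALEXA-seq paper): if a transcript is expressed at a given TPM, its read count at total depth N is approximately Poisson with mean TPM * N / 1e6, ignoring length bias and biological overdispersion. The depth needed to hit a minimum read count with 95% probability then looks like this:

[CODE]
# Minimum depth so that a transcript at a given expression level (TPM)
# receives at least k reads with the requested probability.  The Poisson
# model is a simplification, so treat the output as a rough lower bound.
from scipy.stats import poisson

def min_depth(tpm, k=10, prob=0.95):
    depth = 1_000_000
    while poisson.sf(k - 1, tpm * depth / 1e6) < prob:  # P(X >= k)
        depth *= 2          # double the depth until the target is met
    return depth

for tpm in (100, 10, 1, 0.1):
    print(f"{tpm:>6} TPM: ~{min_depth(tpm):>13,} reads")
[/CODE]

Under this model a transcript at 0.1 TPM needs on the order of a few hundred million reads before you can expect 10 reads from it, which is why the low tail of the expression distribution dominates the depth requirement.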

You can search the forums, but quickly here are some more posts relevant to your question: one, two, three.

01-03-2013, 01:25 PM   #3
adumitri (Member; Location: Cambridge, MA; Join Date: Jan 2010; Posts: 27)

Hi malachig,

I was wondering whether there are any new insights you could share on the topic of RNA-Seq read depth. Assuming that the RNA samples are polyA-selected and the sequencing is done with 100-nucleotide paired-end reads, what number of reads per sample would be optimal for exploring differential transcript expression across a high proportion of the transcriptome (even for genes expressed at a low level)?

Are there any relevant review articles on this topic that you might be aware of? It is clear to me that tissue type (e.g. brain vs. liver), RNA preparation protocols, RNA quality (e.g. RIN), and the specific research questions asked of the RNA-Seq data will all have a great impact on the optimal read depth, so it would be great if studies had already been performed to address some of these variables.

Thank you,
Alexandra

01-04-2013, 02:58 AM   #4
schelhorn (Member; Location: Germany; Join Date: Sep 2010; Posts: 10)

Thanks, malachig, for the insightful answer. Just to add to this thread: there is a recent paper on coverage estimates for monoculture bacterial transcriptomes that goes into some detail. Since it deals with bacteria, the results are obviously not directly applicable to human. Also, this Genome Research paper and this Bioinformatics paper may be of interest. Perhaps we and others could return to this thread as new references turn up and add them here. Until then, 100 million reads seem to be a good target for human.
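
For anyone wanting to sanity-check a figure like that against their own pilot data, one option is to downsample an existing gene-level counts table and watch whether the number of detected genes has stopped growing. A minimal sketch, assuming "counts.txt" is a placeholder for a file with one gene name and read count per line (e.g. from htseq-count); binomially thinning the counts is equivalent to randomly discarding reads:

[CODE]
# Saturation check by downsampling: if the detected-gene count is still
# climbing at 100% of the reads, the library is not yet deep enough for
# the low end of the expression range.
import numpy as np

rng = np.random.default_rng(1)
counts = np.loadtxt("counts.txt", usecols=1, dtype=int)  # adjust to your file

for frac in (0.05, 0.1, 0.25, 0.5, 0.75, 1.0):
    sub = rng.binomial(counts, frac)   # thin each gene's count independently
    print(f"{frac:4.0%} of reads: {(sub > 0).sum():>6} genes detected, "
          f"{(sub >= 10).sum():>6} with >= 10 reads")
[/CODE]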


01-17-2013, 06:15 AM   #5
adumitri (Member; Location: Cambridge, MA; Join Date: Jan 2010; Posts: 27)

schelhorn, thank you for the references! They were very useful.

01-22-2013, 03:00 AM   #6
sisch (Member; Location: Dusseldorf, Germany; Join Date: Jun 2011; Posts: 29)

I was just reading the NOIseq paper (Differential expression in RNA-seq: a matter of depth) and it reminded me of this thread. In the paper the authors state: "Some recent reports suggest that in a mammalian genome, about 700 million reads would be required to obtain accurate quantification of >95% of expressed transcripts (Blencowe et al. 2009) ..."
I didn't check the primary source, but maybe you will find your answer there. The full reference is:
Blencowe et al. Current-generation high-throughput sequencing: deepening insights into mammalian transcriptomes. Genes Dev (2009) vol. 23 pp. 1379-86

Best,
Simon
Tags: coverage, discovery rate, rna-seq, unique transcript
