  • schelhorn
    Member
    • Sep 2010
    • 10

    Required sequencing depth for finding (nearly) all unique human transcripts

    Dear SEQanswers community,

    Does anyone know of a study that estimates, for different sequencing technologies (454, Illumina, ABI), the sequencing depth (number of mapped reads) required to identify N% of the unique transcripts in the human genome? In other words, what depth would be needed to cover 95% of the unique transcripts in my human sample? It strikes me that there does not seem to be a published consensus on the depth needed to reliably identify (nearly) all transcripts. This kind of information seems necessary for deciding whether we can multiplex several samples within a run, as well as for estimating the suitability of long-read technology for whole-transcriptome RNA-Seq.

    Literature on the topic seems to be sparse: while reference [1] indicates that up to 80 million ABI reads in mouse could be necessary before the number of distinct transcripts identified reaches a plateau, study [2] suggests that about 3 million mappable Illumina reads from human are required before the discovery rate flattens. Does anyone know of equivalent data for 454, or could anyone share more comprehensive insights on this problem?

    [1] Wang et al. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet (2009) vol. 10 (1) pp. 57-63

    [2] Li et al. Determination of tag density required for digital transcriptome analysis: application to an androgen-sensitive prostate cancer model. Proc Natl Acad Sci USA (2008) vol. 105 (51) pp. 20179-84
  • malachig
    Senior Member
    • Aug 2010
    • 117

    #2
    There is some discussion of this topic for human transcriptomes sequenced by Illumina paired-end sequencing here: ALEXA-seq. Most of the relevant figures and text are in the supplementary materials. I'm sure there are comparable discussions for 454 and SOLiD.

    I agree that there is not a consensus. Part of the problem is that the answer to the question is highly dependent on the end goals of your analysis and how you define these end points. For example, you mention X number of reads are required before the discovery rate 'flattens'. Flat is a highly subjective term. Unless the slope of the line is 0, it is not flat. How flat is flat enough?

    The expression level difference between the most lowly expressed gene and the most highly expressed one is very large (4 to 7 orders of magnitude, depending on how you measure or estimate it). This means that when sampling randomly and noting newly discovered genes, the line begins to flatten very quickly (as all the most highly expressed genes are observed). But many lowly expressed genes will still not have been observed or sequenced to your minimum depth requirement. The discovery rate slows, but unless you are interested only in the most highly expressed genes, you need to continue sequencing... If you want to cover 95% of base positions of 95% of expressed genes (including very lowly expressed genes), you may be surprised how much coverage you need. Unfortunately, it also seems to depend a fair bit on the tissue you are studying, the manner of library preparation (normalized library versus not?), etc.
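To illustrate the point above about dynamic range, here is a minimal sketch (not taken from any of the cited papers) of how the expected fraction of detected genes grows with depth. It assumes a purely hypothetical transcriptome whose gene abundances are spread evenly on a log scale over 6 orders of magnitude, and it models read counts per gene as Poisson. The function names, the 20,000-gene figure, and the `min_reads` threshold are all assumptions made for illustration; real abundance distributions, tissues, and library preparations will behave differently.

```python
import math


def log_spaced_abundances(n_genes, orders_of_magnitude):
    """Hypothetical relative abundances, evenly spaced on a log10 scale
    and normalized so that they sum to 1."""
    raw = [10 ** (-orders_of_magnitude * i / (n_genes - 1)) for i in range(n_genes)]
    total = sum(raw)
    return [x / total for x in raw]


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)  # P(X = 0); underflows to 0.0 for very large lam
    acc = term
    for i in range(1, k + 1):
        term *= lam / i
        acc += term
    return acc


def expected_fraction_detected(abundances, n_reads, min_reads=1):
    """Expected fraction of genes that receive at least `min_reads` reads
    when `n_reads` mapped reads are sampled (Poisson approximation)."""
    detected = 0.0
    for p in abundances:
        lam = n_reads * p  # expected read count for this gene
        detected += 1.0 - poisson_cdf(min_reads - 1, lam)
    return detected / len(abundances)


# Assumed toy transcriptome: 20,000 genes over 6 orders of magnitude.
abundances = log_spaced_abundances(20000, 6)
for depth in (10**5, 10**6, 10**7, 10**8, 10**9):
    at_least_1 = expected_fraction_detected(abundances, depth, min_reads=1)
    at_least_10 = expected_fraction_detected(abundances, depth, min_reads=10)
    print(f"{depth:>13,d} reads: >=1 read {at_least_1:.3f}, >=10 reads {at_least_10:.3f}")
```

Under these assumptions the curve rises quickly at first and then creeps upward, and raising the per-gene minimum from 1 read to 10 shifts the whole curve toward deeper sequencing, which is why "flat enough" depends on the end point you choose.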

    You can search the forums, but quickly here are some more posts relevant to your question: one, two, three.


    • adumitri
      Member
      • Jan 2010
      • 27

      #3
      Hi malachig,

      I was wondering if there are any new insights that you could give me on the topic of RNA-Seq read depth. Assuming that the RNA samples are polyA-selected and the sequencing is done with 100-nucleotide paired-end reads, what number of reads per sample would be optimal to explore differential transcript expression for a high proportion of the transcriptome (even when the genes are expressed at a low level)?

      Are there any relevant review articles on this topic that you might be aware of? It is clear to me that tissue type (e.g. brain vs liver), RNA preparation protocols, RNA quality (e.g. RIN), and the specific research questions for the RNA-Seq data will all have a great impact on the optimal read depth, and it would be great if some studies have already been performed to address some of these variables.

      Thank you,
      Alexandra


      • schelhorn
        Member
        • Sep 2010
        • 10

        #4
        Thanks, malachig, for the insightful answer. Just to add to this thread, there is a recent paper on coverage estimates in monoculture bacterial transcriptomes that goes into some detail. It's on bacteria, so obviously the results are not applicable to human. Also, this Genome Research paper and this Bioinformatics paper may be of interest. Perhaps we and others could return to this thread when new references turn up and add them here. Until then, 100M reads seem to be a good target for human.
        Last edited by schelhorn; 01-04-2013, 04:01 AM.


        • adumitri
          Member
          • Jan 2010
          • 27

          #5
          schelhorn, thank you for the references! They were very useful.


          • sisch
            Member
            • Jun 2011
            • 29

            #6
            I was just reading a paper about NOIseq (Differential expression in RNA-seq: A matter of depth) and had to think of this thread. In the paper they state "Some recent reports suggest that in a mammalian genome, about 700 million reads would be required to obtain accurate quantification of >95% of expressed transcripts (Blencowe et al. 2009) ..."
            I didn't check the primary source, but maybe you will find your answer there. Full reference is:
            Blencowe et al. 2009: Current-generation high-throughput sequencing: deepening insights into mammalian transcriptomes. Genes Dev 23: 1379-1386

            Best,
            Simon
