  • Minimum short read required for transcriptome assembly

    I have Illumina short reads, 2x50 bp, around 14 Gb of data right now.
    I am curious whether there is any parameter or formula for calculating the minimum amount of short-read data a transcriptome assembler needs in order to assemble a comprehensive set of transcripts.
    E.g., must there be at least 1 Mb of Illumina short reads in order to assemble a transcript?

    Do we also need to consider coverage and depth when calculating the minimum amount of short-read data required for transcriptome assembly?

    Many thanks for any advice.

  • #2
    Ah, I should have noted that you are a "Senior Member" and thus undoubtedly already know more about sequencing than many of us. My response below was aimed more at the many new people we get on SeqAnswers, so it may not be applicable to you. I wish I had more than a rough guide to offer in place of an actual formula.

    -------------------

    Originally posted by edge
    Do we need consider coverage and depth of data...
    Yes, you do. In particular, for a non-normalized or non-rRNA-depleted sample you need to be concerned with picking up low-expression genes.

    You do not give enough information for us to make an intelligent decision for your particular case (e.g., we would need information on the organism you are sequencing, the complexity of the genes for the organism, whether your sample is normalized or not, etc.). However, we can play around with some very rough numbers.

    Let us assume that your sample is completely normalized. In other words each transcript (gene) is present once and only once in your sample. Assume a complex eukaryotic organism. Then our numbers could look like:

    100,000 genes at 1,000 bases each ... equals a sequence space of 100 Mbase

    Desire 30x sequencing coverage ... means we need 3 Gbase of sequence.

    Your 14 Gb will do quite nicely.

    On the other hand, let us assume that you do not have a normalized sample. Then some genes will be present thousands of times, others only once. I am sure that there is some graph out there that describes this behavior and provides a multiplication factor, but I'll make a wild guess that this increases the effective sequence space by a factor of at least 10. Thus you would need 30 Gbase of sequence.

    The numbers above are very, very rough so do not base your research off of them. The numbers are more meant as a way to say "... it depends ..."
    Last edited by westerman; 09-21-2011, 10:34 AM. Reason: Realized that 'edge' is not a newbie.
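The rough arithmetic in #2 can be written out as a short sketch. The function name and the `abundance_factor` parameter are made up for illustration; the inputs (100,000 genes, 1,000 bases each, 30x coverage, a guessed factor of 10 for a non-normalized sample) are the same wild guesses used in the post, not measured values.

```python
def required_bases(n_transcripts, mean_length_bp, coverage, abundance_factor=1.0):
    """Rough number of bases to sequence for a target mean coverage.

    abundance_factor is a hypothetical multiplier standing in for the
    uneven expression of a non-normalized sample (1.0 = fully normalized).
    """
    sequence_space = n_transcripts * mean_length_bp  # total transcript bases
    return sequence_space * coverage * abundance_factor


# Figures taken from the post: 100,000 genes x 1,000 bases, 30x coverage.
normalized = required_bases(100_000, 1_000, 30)
# The post's "wild guess" of a 10-fold increase for a non-normalized sample.
non_normalized = required_bases(100_000, 1_000, 30, abundance_factor=10)

print(f"normalized:     {normalized / 1e9:.0f} Gbase")
print(f"non-normalized: {non_normalized / 1e9:.0f} Gbase")
```

As in the post, the output is only as good as the guesses fed in; changing the gene count, mean length, or abundance factor shifts the answer proportionally.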


    • #3
      The following publication shows a number of simulations on transcriptome assembly and the effects of coverage and sequencing technology. It's a bit dated now but should help you out. I believe they also have some online software so you can do your own rough simulation.

      Wall PK, Leebens-Mack J, Chanderbali AS, Barakat A, Wolcott E, Liang H, Landherr L, Tomsho LP, Hu Y, Carlson JE, Ma H, Schuster SC, Soltis DE, Soltis PS, Altman N, dePamphilis CW. Comparison of next generation sequencing technologies for transcriptome characterization. BMC Genomics. 2009 Aug 1;10:347.


      • #4
        Many thanks, westerman.

        I have an RNA-seq human lung sample: 2x100 bp paired-end reads, with a total file size of 14 GB right now.
        I plan to map my RNA-seq data against a transcriptome database downloaded from NCBI.
        After that, I plan to cluster the short reads according to the transcript group they map to.
        My problem is determining the minimum number of paired-end reads to use as a cut-off for assembly.
        In the mapping results, some transcript groups are covered by only about a thousand read pairs.

        Thanks for any advice.
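One way to turn a read-pair count into something comparable with the 30x figure from #2 is to convert it to mean depth per transcript. The sketch below is only that conversion, not an established cut-off: the function names are made up, the 2,000 bp transcript length is a hypothetical example, and mean depth says nothing about how evenly the reads are spread along the transcript.

```python
import math


def mean_coverage(read_pairs, read_length_bp, transcript_length_bp):
    """Mean depth over a transcript from the read pairs mapped to it."""
    # Each pair contributes two reads of read_length_bp bases.
    return read_pairs * 2 * read_length_bp / transcript_length_bp


def pairs_needed(target_coverage, read_length_bp, transcript_length_bp):
    """Smallest number of read pairs giving at least the target mean depth."""
    return math.ceil(target_coverage * transcript_length_bp / (2 * read_length_bp))


# 1,000 read pairs of 2x100 bp on a hypothetical 2,000 bp transcript.
print(f"{mean_coverage(1_000, 100, 2_000):.0f}x mean coverage")
# Pairs needed to reach the 30x target used in #2 on the same transcript.
print(pairs_needed(30, 100, 2_000), "read pairs for 30x")
```

Under these assumptions a transcript group with a thousand 2x100 bp pairs is already well above 30x unless the transcript is very long, so the cut-off depends mainly on transcript length.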


        • #5
          Minimum depth of coverage in transcriptome assembly

          Hi everyone, I have 4.46 GB of data from sequencing transcripts in various tissues, as Illumina MiSeq paired-end reads. I have assembled all these reads and found that the mean depth of coverage is 27.9X (depth of coverage = efficiency of sequencing / efficiency of assembly).
          My question is: what is the minimum depth of coverage needed to obtain robust information from the assembled transcriptome in a de novo transcriptome analysis?

          Thanks!
          Best regards!
