  • RNA-Seq Experimental Design Questions

    Hello,

    My name is David Brohawn and I am new to RNA-Seq.

    My advisor and I are interested in doing an RNA-Seq experiment to compare the transcriptomes of iPSC neurons we generate from both ALS patients and controls. Ultimately we would like to identify molecular phenotypes based on transcriptome expression profiles for different instances of ALS (much like how cancer researchers now identify underlying molecular phenotypes for different instances of a given cancer).

    We are primarily interested in generating transcriptome profiles (involving both coding and non-coding RNA and novel transcripts), with a heavy interest in differential gene expression and less interest in mapping full transcript isoforms.

    As I understand it, a greater number of short reads is best for assessing differential gene expression (SOLiD and Illumina look most amenable to this), while a smaller number of long reads is best for assessing isoforms (Roche and PacBio look most amenable to this).

    I see the ENCODE project recommends “Experiments whose purpose is discovery of novel transcribed elements and strong quantification of known transcript isoforms… a minimum depth of 100-200 M 2 x 76 bp or longer reads is currently recommended.”

    We plan on using Illumina TruSeq total RNA prep kits followed by sequencing on the Illumina HiSeq 2500. An Illumina rep quoted 187 million reads per lane as typical output for a 2 x 100 run. If this is true, I am thinking we multiplex our 20 total samples (10 cases and 10 controls) and run 11 total lanes, which would average out to just over 100 million reads per sample.
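The depth arithmetic in the paragraph above can be checked with a minimal sketch. It assumes all samples are pooled evenly across every lane and that the quoted 187 million reads per lane holds; the function name `reads_per_sample` is only illustrative.

```python
def reads_per_sample(reads_per_lane: float, lanes: int, samples: int) -> float:
    """Average reads per sample when samples are pooled evenly across all lanes."""
    if lanes <= 0 or samples <= 0:
        raise ValueError("lanes and samples must be positive")
    return reads_per_lane * lanes / samples


# 187 M reads per lane is the figure quoted by the rep (an assumption, not a guarantee).
READS_PER_LANE = 187e6

depth = reads_per_sample(READS_PER_LANE, lanes=11, samples=20)
print(f"{depth / 1e6:.2f} M reads per sample")  # 102.85 M reads per sample
```

Real runs rarely balance perfectly, so individual samples will land above or below this average.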

    We would then analyze the data with the Tuxedo Suite bioinformatics package (we may substitute STAR for TopHat and Bowtie), and visualize our data using CummeRbund.

    We are considering purchasing a Linux-based machine or a Mac with these specs for processing:

    CPU – 2 quad-core processors
    HDD – 8 TB (RAID array of four 2 TB drives)
    RAM – 24 GB
    Clock speed – 3.2 GHz

    I have been told the number of reads per sample may be overkill given our goals, but I am really just following ENCODE's recommendations. Do you all have any suggestions based on what I have reported?

    Thanks for taking the time to read and respond!

    Dave Brohawn

  • #2
    Have a look at this paper: http://seqanswers.com/forums/showthread.php?t=40365



    • #3
      Some comments and analysis from the exciting and fast moving world of Genomics. This blog focuses on next-generation sequencing and microarray technologies, although it is likely to go off on tangents from time-to-time


      You could run all 20 of your samples across 2 lanes and get somewhere approaching 20M reads per sample. This should be more than adequate for differential expression analysis.



      • #4
        Hey Guys,

        It looks like for a run-of-the-mill differential gene expression analysis, 20-30M reads is more than sufficient, based on your response, Tony, and the paper that GenoMax kindly supplied.

        While we are most interested in differential gene expression, we still want to have a thorough representation of the transcriptome for both control and disease groups, including novel transcripts. We aren't overly concerned with the ability to capture transcripts expressed at very low levels. Does 20-30M still sound like a safe bet given these additional points?

        Further, while I understand that short reads are more amenable to differential gene expression analysis than to isoform detection or mapping, I would like to optimize our short-read study design in a way that most benefits the Tuxedo Suite algorithms in probabilistically inferring which isoforms are present. This led me to choose the Illumina platform over SOLiD (100 bp reads over 35 bp reads), and paired-end instead of single-end reads to aid in alignment efforts. Do my rationale and this aspect of the study design sound appropriate for my goal?

        I appreciate your helping a newbie.

        Dave



        • #5
          Originally posted by TonyBrooks
          http://core-genomics.blogspot.co.uk/...ions-need.html

          You could run all 20 of your samples across 2 lanes and get somewhere approaching 20M reads per sample. This should be more than adequate for differential expression analysis.
          @Tony: Can you correct this URL? It does not seem to be pointing to a specific link.
          Last edited by GenoMax; 01-31-2014, 01:01 PM.



          • #6
            Originally posted by dbroh11
            This led me to choose the Illumina platform over SOLiD (100 bp reads over 35 bp reads), and paired-end instead of single-end reads to aid in alignment efforts. Do my rationale and this aspect of the study design sound appropriate for my goal?

            I appreciate your helping a newbie.

            Dave
            Sequencing more reads is not going to hurt, but the general consensus is that you do not want to go overboard (e.g. 100 million reads per sample), since that is a case of diminishing returns.

            There has been past discussion of the benefits of single-end versus paired-end reads, but nothing of recent vintage. Here are a couple of links to peruse.

            Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc

            Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc



            • #7
              Our sequencing center most often aims for 30M reads per sample for RNA-Seq projects. However, balancing the samples to get 30M each is troublesome. The way we do this is to do one (partial) sequencing run that undershoots 30M and then re-cluster the samples so that the next run, combined with the first, brings the per-sample reads up to 30M. If you are going to do a 'one-shot' sequencing run, then you will have to aim for around 50M reads per sample in order to have at least 25M reads for every sample. I agree that aiming for 100M reads is overkill.
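As a rough sketch of the 'one-shot' planning described above: pick a minimum depth per sample, inflate it by an overshoot factor to absorb pooling imbalance, and round up to whole lanes. The factor of 2 comes from the 50M-versus-25M figures in this post; the 20 samples and 187M reads per lane are taken from the first post in the thread, and the function name `lanes_needed` is hypothetical.

```python
import math


def lanes_needed(samples: int, min_reads_per_sample: float,
                 reads_per_lane: float, overshoot: float = 2.0) -> int:
    """Whole lanes required so the *average* depth is overshoot x the minimum.

    The overshoot factor is a planning margin for uneven pooling; it does not
    guarantee every sample reaches the minimum.
    """
    if samples <= 0 or reads_per_lane <= 0 or overshoot < 1.0:
        raise ValueError("invalid planning parameters")
    target_mean = min_reads_per_sample * overshoot
    total_reads = samples * target_mean
    return math.ceil(total_reads / reads_per_lane)


# 20 samples, at least 25 M reads each, aiming for a 50 M average.
print(lanes_needed(samples=20, min_reads_per_sample=25e6, reads_per_lane=187e6))  # 6
```

With no safety margin (`overshoot=1.0`) the same inputs come to 3 lanes, which is why the two-run top-up approach can be cheaper when it is available.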



              • #8
                I appreciate your help with this, guys. Do you have literature, aside from the paper GenoMax sent, that supports using far fewer than 100M reads (what ENCODE proposed)? I understand ENCODE is not the be-all and end-all and that their recommendations are several years old, but I would like to better understand the rationale and see more empirical data suggesting 50M is sufficient before committing funds to the project.

                In addition, do you all know of any literature showing that using 100 bp reads over 35 bp reads (Illumina vs SOLiD) truly benefits Cufflinks' estimation of the prevalence of different isoforms? We are most interested in differential gene expression, so I have narrowed our design down to using shorter reads, but am still mulling over the pros and cons of these two platforms.

                Many thanks

                Dave



                • #9
                  If you want the best isoform detection capability and have lots of money, long paired-end reads on Illumina are the best option. Note that with 250bp reads and a 400bp fragment length, you should be able to get 400bp of continuous sequence for most read pairs, with overlap (for consistency checks) around 50bp. We've found that roughly 30M reads (i.e. anywhere from 10M to 100M) are fine for hypothesis-generating analysis, so go ahead and multiplex if you've got more than that.

                  The longer the sequence, the more chance you have of catching multiple splice points in a single read. If you don't do this you have to guess at possible isoforms based on frequency counts.
                  Last edited by gringer; 02-03-2014, 12:46 PM.
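The read-pair geometry mentioned in this post can be sketched as below. This is idealized arithmetic only: it assumes both reads are full length, ignores adapter read-through and quality trimming, and uses a hypothetical helper name `pair_coverage`. Under those assumptions, 250bp reads on a 400bp fragment give a nominal overlap of 100bp; the roughly 50bp quoted above may allow for trimmed read ends or fragment-size variation, though that is a guess rather than something stated in the thread.

```python
def pair_coverage(read_len: int, fragment_len: int) -> tuple[int, int, bool]:
    """Return (bases_covered, overlap, contiguous) for one paired-end fragment.

    Idealized model: read 1 starts at one end of the fragment, read 2 at the
    other, and neither read extends past the fragment.
    """
    if read_len <= 0 or fragment_len <= 0:
        raise ValueError("lengths must be positive")
    # Overlap cannot be negative and cannot exceed the fragment itself.
    overlap = max(0, min(fragment_len, 2 * read_len - fragment_len))
    bases_covered = min(fragment_len, 2 * read_len)
    contiguous = 2 * read_len >= fragment_len  # no unsequenced gap in the middle
    return bases_covered, overlap, contiguous


print(pair_coverage(250, 400))  # (400, 100, True)
print(pair_coverage(100, 400))  # (200, 0, False)
```

The second call shows the 2 x 100 case from the original design: the pair leaves an unsequenced gap in the middle of a 400bp fragment, so splice junctions there have to be inferred rather than read directly.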



                  • #10
                    Originally posted by gringer
                    If you want the best isoform detection capability and have lots of money, long paired-end reads on Illumina are the best option.
                    If you want the best isoform detection capability and have an INSANE amount of money, PacBio runs with a few different size selections would be the best option.
                    AllSeq - The Sequencing Marketplace
                    www.AllSeq.com



                    • #11
                      Hi,

                      I am using the TruSeq RNA Sample Prep Kit v2 for a WTA library. I started with 6 µg of total RNA, followed by the Elute-Prime-Fragment step for 2 min, first-strand cDNA synthesis, and then second-strand cDNA synthesis, and got the following Qubit readings:

                      Elute-Prime-Fragment (RNA BR assay): 15.6 ng/µl
                      ds cDNA synthesis (dsDNA HS assay), before 1.8x bead purification: 0.312 ng/µl
                      ds cDNA synthesis (dsDNA HS assay), after 1.8x bead purification: 0.225 ng/µl

                      On the basis of these Qubit readings, I wanted to know:

                      > Is this concentration of ds cDNA sufficient, or am I losing ds cDNA? I didn't check the ds cDNA profile on an HS chip.
                      > Are my mRNA enrichment process and its results satisfactory for cDNA conversion?
                      > Is the cDNA conversion complete?
                      > How can I check my first-strand cDNA product?

                      Basically, I want to know the checkpoints at each step that confirm the library preparation protocol is running correctly.
                      Last edited by mukeshwar; 02-04-2014, 10:21 AM.



                      • #12
                        Originally posted by mukeshwar
                        Hi,

                        I am using the TruSeq RNA Sample Prep Kit v2 for a WTA library. I started with 6 µg of total RNA, followed by the Elute-Prime-Fragment step for 2 min, first-strand cDNA synthesis, and then second-strand cDNA synthesis, and got the following Qubit readings:

                        Elute-Prime-Fragment (RNA BR assay): 15.6 ng/µl
                        ds cDNA synthesis (dsDNA HS assay), before 1.8x bead purification: 0.312 ng/µl
                        ds cDNA synthesis (dsDNA HS assay), after 1.8x bead purification: 0.225 ng/µl

                        On the basis of these Qubit readings, I wanted to know:

                        > Is this concentration of ds cDNA sufficient, or am I losing ds cDNA? I didn't check the ds cDNA profile on an HS chip.
                        > Are my mRNA enrichment process and its results satisfactory for cDNA conversion?
                        > Is the cDNA conversion complete?
                        > How can I check my first-strand cDNA product?

                        Basically, I want to know the checkpoints at each step that confirm the library preparation protocol is running correctly.
                        Please create a new thread since your question is unrelated to the thread you posted in.

                        New threads can be created by:

                        SeqAnswers.com --> Click "Forums" left navigation box --> Choose an appropriate forum to post question in --> "New Thread" button at top left.

                        You can then delete this post by choosing "Edit" --> "Go Advanced" --> Delete.
