
  • #16
    Originally posted by pmiguel View Post
    Not sure why this bothers you. I think they are trying to address the question: How few reads am I likely to be able to get away with while retaining most of the information I need. So you can look at the curves they generate and dial in a number that might work for the genes you are interested in.

    I don't think their results were intended to surprise.

    --
    Phillip
    I'm not completely sure why it bothers me, but it does nonetheless. I may eventually be able to put it into words. Maybe it's because within those plots you're drawn into comparing the different subsets of the complete data set to one another. So in a 100M read set you might be comparing what 50% of those reads quantifies to what 60% of the reads quantifies. I'd wager that is not as informative as looking at actual technical replicates sequenced to different depths. In the subset approach, 83% of the reads in the 60% subset were already quantified in the 50% subset, so very little new information is added, and the chance that the quantification differs isn't nearly as great as when comparing a 20% subset to a 10% subset, where only 50% of the data was in the previous subset. If you think of it in the other direction, starting with 100% of your data, how much of an impact on the quantification would you expect from removing 10% of the data? What about 70%?

    I guess what I'm getting at is that I think those plots show a property of sub-setting a fixed-size sample at different percentages. It's not a property of RNA-Seq data, and I'm afraid that the analysis shows the same results regardless of the actual number of reads collected. For example, if you've got 40 million reads, the analysis might show that once you're past 20 or 25 million reads you've got a pretty good picture of what's happening at 40 million reads. If you had 900 million reads, it might show you that around 400 or 500 million reads you've got a good picture of what's happening at 900 million reads. I suspect that in all cases, when you have N total reads, this analysis will show you that you have a pretty good picture of the quantification of N reads at N/2 or 2N/3 reads. It's just a property of percentages. Take 10% of your data and that's 100% different from no data. At 20%, 50% of the data came from the last subset. At 30%, 67% of the data was quantified in the 20% subset. By 60%, as I mentioned before, you quantified 83% of that sample in the 50% subset. At each additional subset, the chance of the quantification looking different from the previous subset gets smaller and smaller, independent of the number of reads involved in each subset.
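The percentages in the paragraph above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes the subsets are nested (each larger subset contains every read of the smaller one), and `shared_fraction` is a name made up for this illustration.

```python
def shared_fraction(prev_pct, curr_pct):
    """Fraction of reads in the current nested subset that were already
    present in the previous, smaller subset (assumes nested subsets)."""
    return prev_pct / curr_pct


# Walk the subsets in 10% steps and report how much of each one is "old" data.
for curr in range(20, 101, 10):
    prev = curr - 10
    frac = shared_fraction(prev, curr)
    print(f"{prev:3d}% -> {curr:3d}%: {frac:.0%} of reads already seen")
```

Note that the result depends only on the ratio of the two percentages, not on the total number of reads, which is the point being made about the plots.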

    Of course depth is important for other things - like guaranteeing good coverage of specific length transcripts down to some RPKM level and for splicing analysis. But to say that the gene expression values at 100 million reads are more correct than the values at 50 million seems to me irrelevant and it's certainly not demonstrated by comparing 50% of your reads to 100% of your reads within a single sample. The gene expressions of a single sample sequenced to 500 million reads are still just from a single sample. The distribution of expressions from 10 biological replicates sequenced at 50 million reads each would be much more reliable...which is obvious, right? It's just like any other kind of experiment.
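The difference between a nested subset and an independent technical replicate can be illustrated with a toy simulation. Everything here is an assumption made for the sketch: reads are drawn as simple multinomial samples from made-up log-normal gene abundances, the gene count and depth are arbitrary, and `simulate` is a hypothetical helper, not anyone's published method.

```python
import random
from collections import Counter


def proportions(reads, n_genes):
    """Per-gene fraction of reads (a crude stand-in for an expression estimate)."""
    counts = Counter(reads)
    total = len(reads)
    return [counts.get(g, 0) / total for g in range(n_genes)]


def mean_abs_diff(p, q):
    """Mean absolute difference between two per-gene proportion vectors."""
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)


def simulate(n_genes=200, depth=20000, seed=1):
    rng = random.Random(seed)
    genes = list(range(n_genes))
    # Hypothetical "true" abundances, skewed like expression data tends to be.
    weights = [rng.lognormvariate(0, 1) for _ in genes]

    full = rng.choices(genes, weights=weights, k=depth)
    # Nested subset: half of the very same reads that make up the full set.
    nested_half = rng.sample(full, depth // 2)
    # Independent run: a fresh draw from the same library at half the depth.
    independent_half = rng.choices(genes, weights=weights, k=depth // 2)

    p_full = proportions(full, n_genes)
    nested = mean_abs_diff(proportions(nested_half, n_genes), p_full)
    independent = mean_abs_diff(proportions(independent_half, n_genes), p_full)
    return nested, independent


nested, independent = simulate()
print(f"nested 50% subset vs full set:     {nested:.6f}")
print(f"independent half-depth run vs full: {independent:.6f}")
```

Under these assumptions the nested subset sits closer to the full set than the independent run does, simply because it shares reads with it. The sketch says nothing about biological replicates, which add a further source of variation.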
    /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
    Salk Institute for Biological Studies, La Jolla, CA, USA */

    Comment


    • #17
      So, before we go to page 2 of this interesting discussion (which makes it harder for someone coming along afterwards, or searching, to find the conclusive information):

      Shall we start collecting read levels people feel comfortable recommending (in "Mil" reads for a genome of "X" size)?

      Possible scenarios (feel free to add more):

      1. for a run of the mill (not looking for anything fancy, microarray replacement type of RNA-seq) ---
      2. a run looking for alternate splicing ---
      3. looking for an elusive gene ---
      4. nothing is known about this transcriptome --

      Comment


      • #18
        start with the question

        Originally posted by LP_SEP23 View Post
        Hi,

        I just got my first RNA-seq dataset alignment results today, and I am wondering what would be considered a "good run".
        This is a pilot experiment, whose objective is to determine how many samples we can multiplex in one lane of Illumina HiSeq to get the information we want and the best trade off between cost, specificity and sensitivity.
        I have some basic questions I ask everyone that asks me this.

        1) What is the biological question you are trying to answer? (Your answer will determine whether you want gene-level expression differences or differential splice variation.)
        2) What is your organism, and how different are your biological replicates from each other at the genomic level (clones, siblings, population)?
        - Organism matters because different species have different evolutionary histories and polyploidy-like events (more important for plants than animals).
        - Large variation (overdispersion) between biological replicates is common, and more so depending on how you controlled the environment.
        3) Are you hoping to find differential expression of a particular gene, or is it OK to find differential expression of gene families? (Multiple mapping can affect gene expression estimates.)
        4) How good is your annotation? Do you anticipate new genic information not contained in the current GFF file?

        This just scratches the surface. I had another thread discussing what people are using in the counting (paired concordant, too-long, scramble, inverse, half-mapping, translocations, unpaired reads that map individually but not together, and then normalization for paired vs. unpaired data).

        The fact that different programs cuffdiff/bayseq/DEseq/DEXseq give different lists of genes makes for yet another interesting discussion also in another thread.

        Comment


        • #19
          Originally posted by severin View Post
          The fact that different programs cuffdiff/bayseq/DEseq/DEXseq give different lists of genes makes for yet another interesting discussion also in another thread.
          indeed.
          /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
          Salk Institute for Biological Studies, La Jolla, CA, USA */

          Comment


          • #20
            This would be my estimates, looking at human, mouse or rat:

            1. for a run of the mill (not looking for anything fancy, microarray replacement type of RNA-seq) 60M
            2. a run looking for alternate splicing 150M
            3. looking for an elusive gene 100M
            4. nothing is known about this transcriptome ?

            About the PCR duplicates: yes, I agree that you will sequence more and more duplicates. But at greater depth I'm not interested in finding new reads. Let's say I capture one exon with 5-10 reads; these reads are present in my library, and it will still give me much better estimates of expression if I sequence several replicates of them. So for a comparison between samples I still consider the information on how many fragments are present in my library to be important.
            Therefore I do not discard duplicate reads but use them for quantification. For DNA sequencing I understand why one would discard duplicate reads, but not for quantitative analyses.
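The effect of discarding duplicates on counts can be seen in a toy example. This is a deliberately simplified sketch: alignments are represented as hypothetical `(gene, start)` tuples, and two reads count as duplicates when the tuples are identical. Real duplicate-marking tools use more information (strand, mate position, and so on).

```python
from collections import Counter


def count_per_gene(alignments, keep_duplicates=True):
    """Count reads per gene; optionally collapse identical (gene, start) pairs."""
    if not keep_duplicates:
        alignments = set(alignments)
    return Counter(gene for gene, _start in alignments)


# Made-up data: geneA is a short, well-covered exon with only 3 possible
# start positions; geneB has fewer reads spread over 5 positions.
alignments = (
    [("geneA", 100)] * 4 + [("geneA", 105)] * 3 + [("geneA", 110)] * 3
    + [("geneB", s) for s in (200, 210, 220, 230, 240)]
)

with_dups = count_per_gene(alignments, keep_duplicates=True)
without_dups = count_per_gene(alignments, keep_duplicates=False)
print(dict(with_dups))
print(dict(without_dups))
```

With duplicates kept, geneA has twice the count of geneB; after collapsing, geneA drops below geneB. Whether the extra copies reflect real fragments or PCR artifacts cannot be told from the coordinates alone, which is why the choice is debated.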

            Comment


            • #21
              Rather than open a new thread for my question:

              I was wondering what the optimal number of samples per lane on a GAIIx is if we want to do RNA-seq of an unsequenced plant?
              ------------
              SMART - bioinfo.uni-plovdiv.bg

              Comment


              • #22
                Originally posted by vebaev View Post
                Rather than open a new thread for my question:

                I was wondering what the optimal number of samples per lane on a GAIIx is if we want to do RNA-seq of an unsequenced plant?
                That is the wrong question. Unless you are doing a massive number of samples, you can and should multiplex all your samples and spread them across however many lanes you need to achieve the desired depth.

                The more important question is how many samples do you need and how many reads do you need per sample. That will be determined by the goal of the experiment, not the sequencer.
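The arithmetic behind pooling every sample across all lanes is simple enough to sketch. The per-lane yield below is a hypothetical placeholder (actual yield depends on the instrument, chemistry, and run quality), and both function names are made up for this illustration.

```python
def reads_per_sample(n_lanes, reads_per_lane, n_samples):
    """Expected reads per sample when all samples are pooled evenly across all lanes."""
    return n_lanes * reads_per_lane // n_samples


def max_samples(n_lanes, reads_per_lane, target_reads_per_sample):
    """Largest number of samples that still reaches the target depth per sample."""
    return n_lanes * reads_per_lane // target_reads_per_sample


LANES = 6
READS_PER_LANE = 30_000_000  # hypothetical yield; substitute your own run statistics

print(reads_per_sample(LANES, READS_PER_LANE, 12))      # 15000000
print(max_samples(LANES, READS_PER_LANE, 60_000_000))   # 3
```

The calculation starts from the depth the experiment needs and works back to the number of samples, rather than starting from the sequencer.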

                Comment


                • #23
                  Thanks.
                  The goal is to compare 2 RNA-seq samples from an unsequenced plant, so we do not know the size of the transcriptome. What would be the optimal number of reads (in millions) for this de novo RNA-seq task?
                  I asked because we will have 6 lanes, but other colleagues will also want to add their samples (similar to ours), and I wanted to decide how many samples in total we can multiplex and load on these 6 lanes for this task.
                  ------------
                  SMART - bioinfo.uni-plovdiv.bg

                  Comment


                  • #24
                    If there are only 2 samples is there any advantage in multiplexing them (unless you are worried about lane effect of some kind)?

                    You could just do 3 lanes of each sample. Since you do not know the size of the transcriptome you would want to get as many reads as you can.


                    Originally posted by vebaev View Post
                    Thanks.
                    The goal is to compare 2 RNA-seq samples from an unsequenced plant, so we do not know the size of the transcriptome. What would be the optimal number of reads (in millions) for this de novo RNA-seq task?
                    I asked because we will have 6 lanes, but other colleagues will also want to add their samples (similar to ours), and I wanted to decide how many samples in total we can multiplex and load on these 6 lanes for this task.

                    Comment


                    • #25
                      Originally posted by GenoMax View Post
                      If there are only 2 samples is there any advantage in multiplexing them (unless you are worried about lane effect of some kind)?

                      You could just do 3 lanes of each sample. Since you do not know the size of the transcriptome you would want to get as many reads as you can.
                      I see no reason not to multiplex. Lane effects may or may not be an issue, but why take the chance when it is so easy to multiplex.

                      Comment


                      • #26
                        Originally posted by chadn737 View Post
                        I see no reason not to multiplex. Lane effects may or may not be an issue, but why take the chance when it is so easy to multiplex.
                        I am not sure I understand the benefit of multiplexing in this case. What chance are you referring to?

                        Comment


                        • #27
                          Originally posted by GenoMax View Post
                          I am not sure I understand the benefit of multiplexing in this case. What chance are you referring to?
                          If there are lane-to-lane effects (which there usually are to some extent), then multiplexing all your samples means that those lane effects will affect all samples, not just one. It may be minor, but it's a matter of good experimental design to reduce the possible technical variables as much as possible. Statistically this is the optimal experimental design.



                          Let me reverse the question. What do you gain by NOT multiplexing?
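The lane-effect argument can be made concrete with a small calculation. This sketch assumes a lane effect is a single multiplicative factor on every count from that lane, which is a simplification, and the factors themselves are invented for illustration.

```python
# Hypothetical per-lane bias factors (1.0 means no bias).
LANE_FACTORS = [0.90, 0.95, 1.00, 1.00, 1.05, 1.10]


def observed_signal(true_signal, lanes):
    """Average observed signal for a sample run in equal parts on the given lanes."""
    lanes = list(lanes)
    return true_signal * sum(LANE_FACTORS[i] for i in lanes) / len(lanes)


TRUE_SIGNAL = 100.0  # both samples have identical true expression

# Design 1: sample A on lanes 0-2, sample B on lanes 3-5 (no multiplexing).
separate_a = observed_signal(TRUE_SIGNAL, [0, 1, 2])
separate_b = observed_signal(TRUE_SIGNAL, [3, 4, 5])

# Design 2: both samples barcoded, pooled, and run on all six lanes.
pooled_a = observed_signal(TRUE_SIGNAL, range(6))
pooled_b = observed_signal(TRUE_SIGNAL, range(6))

print(f"separate lanes: B/A = {separate_b / separate_a:.3f}")
print(f"multiplexed:    B/A = {pooled_b / pooled_a:.3f}")
```

With separate lanes, the lane bias shows up as an apparent expression difference between two identical samples. With pooling, both samples pick up the same average bias, so it cancels in the comparison.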

                          Comment


                          • #28
                            And coming back to the topic: what is the optimal number of reads (in millions) for de novo RNA-seq without a reference genome? If we have 12 samples across 6 lanes, that is ~15M per sample. Is this OK for this task?
                            ------------
                            SMART - bioinfo.uni-plovdiv.bg

                            Comment


                            • #29
                              Originally posted by vebaev View Post
                              And coming back to the topic: what is the optimal number of reads (in millions) for de novo RNA-seq without a reference genome? If we have 12 samples across 6 lanes, that is ~15M per sample. Is this OK for this task?
                              What do you want to do with the data? Are the 12 samples from the same plant? Do you have a close relative sequenced? Do you have any rough estimate of the genome and transcriptome size? You need to be clear about what exactly it is you want to do; there is no simple answer to these questions.

                              Even for Arabidopsis I would want more than 15M per sample.

                              Comment


                              • #30
                                No, each pair of samples is from a different plant: 1 control and 1 treatment. We have no info on transcriptome size (probably only one of them will be sequenced with a reference genome: tomato).
                                ------------
                                SMART - bioinfo.uni-plovdiv.bg

                                Comment
