  • Bacterial RNA-Seq Differential Gene Expression Biological Reps vs Exp Conditions

    We have 6 bacterial samples that we performed stranded RNA-Seq on: 2 sets of 3 biological replicates (condition A: 1, 2, 3 and condition B: 1, 2, 3). We are completely new to NGS and are working out pipelines and workflows, so before doing the differential gene expression analysis between conditions A and B, we decided to look at the differential gene expression between the biological replicates within condition A and within condition B. We are using Rockhopper (http://cs.wellesley.edu/~btjaden/Rockhopper/) for these comparisons, as it is designed specifically to handle bacterial RNA-Seq data. In theory there would be fewer differentially expressed genes between the biological replicates than between the conditions, right?

    When we look at the number of differentially expressed genes between biological replicates in condition A (1 vs 2 = 87; 1 vs 3 = 80; 2 vs 3 = 132), there is a much greater number than when we compare conditions A and B (each with 3 biological replicates = 58 genes). This was VERY surprising to us and we are concerned that our data may not be usable.

    Is it normal to have more differentially expressed genes between biological replicates than between experimental conditions? Could a batch effect be playing a role here? Does anyone have a lot of experience with BACTERIAL RNA-Seq and could give me some pointers? Any advice on how to proceed: is the data usable or not?

    Thank you!

  • #2
    This does not appear to me to be the correct way to analyze your data.

    You're much better off clustering your data, or doing a principal component analysis. If your replicates cluster together per condition, you can conclude that the results are valid. If they don't, you have reasons for concern.
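As a rough illustration of the check described above, here is a minimal PCA sketch in Python with NumPy. It is not part of Rockhopper or of any pipeline mentioned in this thread. The count matrix is simulated toy data, and the helper names `log_cpm` and `pca_samples` are made up for this example; with real data you would load your own genes x samples count table instead.

```python
import numpy as np

def log_cpm(counts, pseudocount=1.0):
    """Library-size normalise a genes x samples count matrix and log2-transform it."""
    counts = np.asarray(counts, dtype=float)
    cpm = counts / counts.sum(axis=0, keepdims=True) * 1e6
    return np.log2(cpm + pseudocount)

def pca_samples(counts, n_components=2):
    """Return (scores, explained_variance_ratio), one row of scores per sample."""
    x = log_cpm(counts).T            # samples x genes
    x = x - x.mean(axis=0)           # centre each gene across samples
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

# Hypothetical toy data: 200 genes, 3 replicates each of conditions A and B.
rng = np.random.default_rng(0)
base = rng.gamma(shape=2.0, scale=200.0, size=200)
shift = np.ones(200)
shift[:20] = 4.0                     # 20 genes truly change 4-fold in B
means = np.column_stack([base] * 3 + [base * shift] * 3)
counts = rng.poisson(means)
labels = ["A1", "A2", "A3", "B1", "B2", "B3"]

scores, explained = pca_samples(counts)
for name, (pc1, pc2) in zip(labels, scores):
    print(f"{name}\tPC1={pc1:8.2f}\tPC2={pc2:8.2f}")
```

If the replicates are behaving, the three A samples should sit close together on PC1 and away from the three B samples. A replicate that lands with the other condition would be a reason to look for a sample swap or batch effect.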

    If you're using the p-value or the adjusted p-value to determine which genes are differentially expressed, the sample size will affect the value computed. For individual comparisons, the p-value is nearly useless, since there is no statistically sound method to get an accurate measure of the variation from a single sample per group. So you cannot compare p-values obtained from one-to-one comparisons with p-values obtained from comparing sets of replicates.

    Using the fold change to determine differentially expressed genes is a more sensible approach for individual comparisons. If using the same cut-off, one would expect a greater number of differentially expressed genes for individual comparisons, given that averaging the counts over the replicates will smooth out the random variations. This is consistent with your results.
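The smoothing argument above can be sketched with a small simulation. This is only an illustration under stated assumptions: Gaussian replicate noise on the log2 scale, a made-up noise level (`noise_sd`), and no truly changed genes at all. It does not reproduce how Rockhopper calls differential expression.

```python
import random
import statistics

def count_above_cutoff(group_x, group_y, log2_cutoff=1.0):
    """Count genes whose |mean log2 difference| between two sample groups exceeds the cutoff."""
    hits = 0
    for gene_x, gene_y in zip(group_x, group_y):
        diff = statistics.fmean(gene_x) - statistics.fmean(gene_y)
        if abs(diff) > log2_cutoff:
            hits += 1
    return hits

random.seed(1)
n_genes, n_reps, noise_sd = 4000, 3, 0.5   # hypothetical values

# Two sets of replicates drawn from the SAME condition: each gene has one true
# log2 level plus independent replicate noise, so nothing is truly DE.
truth = [random.uniform(2, 12) for _ in range(n_genes)]
rep_set_1 = [[t + random.gauss(0, noise_sd) for _ in range(n_reps)] for t in truth]
rep_set_2 = [[t + random.gauss(0, noise_sd) for _ in range(n_reps)] for t in truth]

# One-to-one comparison: only the first replicate of each set.
single = count_above_cutoff([g[:1] for g in rep_set_1], [g[:1] for g in rep_set_2])
# Averaged comparison: all three replicates of each set.
averaged = count_above_cutoff(rep_set_1, rep_set_2)

print("genes past the 2-fold cutoff, single replicates:", single)
print("genes past the 2-fold cutoff, averaged over 3:  ", averaged)
```

With the same 2-fold cutoff, the one-to-one comparison flags many more genes than the averaged one, purely from replicate noise.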

    I'm not sure if I'm explaining myself clearly enough. I don't see any reason for alarm in your data. The best way to verify that your results are valid is to do the clustering, either hierarchical clustering or a Principal Component Analysis.

    Incidentally, there is nothing about this that is specific to bacterial RNA-Seq, other than using Rockhopper. Not sure why that tool is so popular with the bacterial community. Other than offering a graphical user interface, I wasn't particularly impressed with it, and went back to my standard pipeline.

    • #3
      Thank you for the feedback on many different things. I will look into doing a principal component analysis of the data to see how it clusters and will stop doing one-to-one comparisons of the biological replicates.

      We chose to work with Rockhopper because it was specifically designed to deal with bacterial issues (overlapping genes, no introns, etc.). We thought that it was probably best to use a tool designed to do bacterial alignments as opposed to using a tool that was designed for eukaryotic work and "adapted" to work for bacteria. Maybe we are not correct in our thinking. We also plan to do differential gene expression using EDGE-pro, another program designed to work with bacterial data, and then compare the DE genes identified by Rockhopper and EDGE-pro.

      If you don't mind sharing, what is your "standard" pipeline?

      • #4
        @beki: Is the ultimate aim to compare condition A vs condition B?

        • #5
          @GenoMaz - Yes, the ultimate aim is to compare conditions A and B (and eventually C and D).

          We did the comparisons between the biological replicates within condition A just to see how much variation there was between them, but as blancha pointed out, the way we did it was not correct and we should probably do a principal component analysis.
          Last edited by beki.renberg; 11-23-2015, 11:13 AM. Reason: clarification

          • #6
            Perhaps you should just go ahead and analyze the replicates together for DE as A vs B. What you are describing is only going to be useful for QC (making sure there are no sample swaps etc).

            • #7
              Yes, I did try both Rockhopper and EDGE-Pro for an E. coli RNA-Seq experiment.

              I ended up just using bowtie2, featureCounts, and DESeq.
              I remember coming to the conclusion that the overlapping genes were not going to be a significant issue after looking at the annotation file for E. coli in IGV.

              Relative to my eukaryote RNA-seq pipeline, I just switched from TopHat, a splice-junction aligner, to bowtie2, which does not take into account splicing. I only do the occasional bacterial analysis, and mostly focus on eukaryotes, so perhaps I am not the greatest expert on the issue.
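For readers who want to see what the alignment and counting steps might look like, here is a minimal sketch that only assembles the command lines. The file names, the `build_commands` helper, and the choices of single-end reads (`-U`), `-t gene`, `-g gene_id` and the strandness value are all assumptions for illustration, not the exact commands used in this thread. The differential expression step runs in R and is not shown.

```python
import shlex

def build_commands(sample, reads, index_prefix, annotation,
                   feature_type="gene", attribute="gene_id", strandness=0):
    """Return alignment, sorting and counting commands for one single-end sample."""
    sam, bam = f"{sample}.sam", f"{sample}.sorted.bam"
    return [
        # bowtie2 does not model splicing, which suits intron-free bacterial genes.
        ["bowtie2", "-x", index_prefix, "-U", reads, "-S", sam],
        ["samtools", "sort", "-o", bam, sam],
        # -s: 0 = unstranded, 1 = stranded, 2 = reversely stranded.
        # The right value depends on the library prep protocol.
        ["featureCounts", "-a", annotation, "-t", feature_type, "-g", attribute,
         "-s", str(strandness), "-o", f"{sample}.counts.txt", bam],
    ]

# Hypothetical file names; print the commands rather than running them.
for cmd in build_commands("A1", "A1.fastq.gz", "ref_index", "ref.gtf"):
    print(shlex.join(cmd))
```

The feature type and attribute names depend on the annotation file, so check a few lines of your GTF/GFF before settling on `-t` and `-g`.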

              Amongst other issues with Rockhopper, I got frustrated that the SAM files it outputs are in a format I found incomprehensible.

              I feel more comfortable using tools that I fully understand, rather than "black box" tools.
              I want to be able to view all the details of the alignments by loading the BAM files in IGV. If a researcher comes back to me questioning my results for a given gene, I want to be able to check those alignments directly.

              With DESeq2, I can also do multi-factorial analyses if needed.

              • #8
                @blancha - Thank you for the information and insight.
