  • beki.renberg
    Member
    • Oct 2015
    • 11

    Bacterial RNA-Seq Differential Gene Expression Biological Reps vs Exp Conditions

    We have 6 bacterial samples that we are working with: 2 sets of 3 biological replicates (condition A: 1, 2, 3 and condition B: 1, 2, 3) that we performed stranded RNA-Seq on. We are completely new to NGS and are working out pipelines and workflows, so before doing the differential gene expression analysis between conditions A and B, we decided to compare the biological replicates within condition A to each other, and likewise the replicates within condition B. We are using Rockhopper (http://cs.wellesley.edu/~btjaden/Rockhopper/) to perform these differential gene expression comparisons, as it is designed specifically to handle bacterial RNA-Seq data. In theory there should be fewer differentially expressed genes between the biological replicates than between the conditions, right?

    When we look at the number of differentially expressed genes between biological replicates in condition A (1 vs 2 = 87; 1 vs 3 = 80; 2 vs 3 = 132), there is a much greater number than when we compare conditions A and B (each with 3 biological replicates = 58 genes). This was VERY surprising to us and we are concerned that our data may not be usable.

    Is it normal to have more differentially expressed genes between biological replicates than between experimental conditions? Could a batch effect be playing a role here? Does anyone have a lot of experience with BACTERIAL RNA-Seq and could give me some pointers? Any advice on how to proceed: is the data usable or not?

    Thank you!
  • blancha
    Senior Member
    • May 2013
    • 367

    #2
    This does not appear to me to be the correct way to analyze your data.

    You're much better off clustering your data, or doing a principal component analysis. If your replicates cluster together per condition, you can conclude that the results are valid. If they don't, you have reasons for concern.
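    A minimal sketch of the PCA check described above, assuming a genes-by-samples matrix of raw counts. The counts here are simulated and the function name `pca_scores` is hypothetical; with real data you would load your own count table instead. It uses plain NumPy (log2 counts per million, then an SVD) rather than any particular RNA-Seq package.

```python
import numpy as np

def pca_scores(counts, n_components=2):
    """Return per-sample PCA scores and explained-variance ratios.

    counts: array of shape (n_genes, n_samples) holding raw read counts.
    """
    counts = np.asarray(counts, dtype=float)
    # Scale each sample to counts per million so library size does not dominate.
    cpm = counts / counts.sum(axis=0, keepdims=True) * 1e6
    logged = np.log2(cpm + 1.0)
    # Samples become rows; centre each gene across samples.
    x = logged.T
    x = x - x.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

# Simulated (made-up) data: 500 genes, 3 replicates each of conditions A and B.
rng = np.random.default_rng(0)
base = rng.gamma(shape=2.0, scale=200.0, size=500)
effect = np.ones(500)
effect[:50] = 4.0  # assumption: 50 genes are 4-fold higher in condition B
mean_a = np.tile(base[:, None], (1, 3))
mean_b = np.tile((base * effect)[:, None], (1, 3))
counts = rng.poisson(np.hstack([mean_a, mean_b]))
labels = ["A1", "A2", "A3", "B1", "B2", "B3"]

scores, explained = pca_scores(counts)
for name, (pc1, pc2) in zip(labels, scores):
    print(f"{name}\tPC1={pc1:8.2f}\tPC2={pc2:8.2f}")
print("explained variance ratios:", np.round(explained, 3))
```

    If the replicates are behaving, the three A samples should sit close together on PC1 and PC2, well apart from the three B samples. A replicate that lands with the other condition would be a reason to look for a sample swap or a batch effect.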

    If you're using the p-value or the adjusted p-value to determine which genes are differentially expressed, the sample size will affect the value computed. Essentially, for individual comparisons, the p-value is nearly useless, since there is no statistically sound method to get an accurate measure of the variation from a single sample per group. So, you cannot compare p-values obtained from one-to-one comparisons to p-values obtained from comparing sets of replicates.

    Using the fold change to determine differentially expressed genes is a more sensible approach for individual comparisons. If using the same cut-off, one would expect a greater number of differentially expressed genes for individual comparisons, given that averaging the counts over the replicates will smooth out the random variations. This is consistent with your results.
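    To illustrate the averaging argument, here is a small simulation sketch. Everything in it is an assumption made for illustration: the counts are simulated with no true differences at all, the dispersion value is invented, and `count_above_cutoff` is a hypothetical helper, not part of Rockhopper or any other tool.

```python
import numpy as np

def count_above_cutoff(group_x, group_y, cutoff=1.0, pseudocount=1.0):
    """Count genes whose |log2 fold change| between group means meets the cutoff.

    group_x, group_y: arrays of shape (n_genes, n_samples_in_group).
    """
    mean_x = np.asarray(group_x, dtype=float).mean(axis=1)
    mean_y = np.asarray(group_y, dtype=float).mean(axis=1)
    log2_fc = np.log2((mean_y + pseudocount) / (mean_x + pseudocount))
    return int(np.sum(np.abs(log2_fc) >= cutoff))

rng = np.random.default_rng(1)
n_genes = 4000
true_mean = rng.gamma(shape=1.5, scale=100.0, size=n_genes)
dispersion = 0.1  # assumed biological variability, chosen arbitrarily

def simulate(n_samples):
    """Draw negative-binomial-like counts as a gamma-Poisson mixture."""
    lam = rng.gamma(
        shape=1.0 / dispersion,
        scale=true_mean[:, None] * dispersion,
        size=(n_genes, n_samples),
    )
    return rng.poisson(lam)

# Both "conditions" share the same true means, so every hit is noise.
cond_a = simulate(3)
cond_b = simulate(3)

one_vs_one = count_above_cutoff(cond_a[:, :1], cond_a[:, 1:2])  # A1 vs A2
three_vs_three = count_above_cutoff(cond_a, cond_b)             # mean(A) vs mean(B)

print("genes passing cutoff, A1 vs A2:", one_vs_one)
print("genes passing cutoff, mean of A vs mean of B:", three_vs_three)
```

    With the same 2-fold cutoff, the single-replicate comparison flags far more genes than the comparison of replicate means, purely because averaging three samples shrinks the random variation in each group's estimate.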

    I'm not sure if I'm explaining myself clearly enough. I don't see any reason for alarm in your data. The best way to verify that your results are valid is to do the clustering, either hierarchical clustering or a principal component analysis.

    Incidentally, there is nothing about this that is specific to bacterial RNA-Seq, other than using Rockhopper. Not sure why that tool is so popular with the bacterial community. Other than offering a graphical user interface, I wasn't particularly impressed with it, and went back to my standard pipeline.

    • beki.renberg
      Member
      • Oct 2015
      • 11

      #3
      Thank you for the feedback on many different things. I will look into doing a principal component analysis of the data to see how it clusters and will stop doing one-to-one comparisons of the biological replicates.

      We chose to work with Rockhopper because it was specifically designed to deal with bacterial issues (overlapping genes, no introns, etc.). We thought that it was probably best to use a tool designed to do bacterial alignments as opposed to using a tool that was designed for eukaryotic work and "adapted" to work for bacteria. Maybe we are not correct in our thinking. We also plan to do differential gene expression using EDGE-pro, another program designed to work with bacterial data, and then compare the DE genes identified by Rockhopper and EDGE-pro.

      If you don't mind sharing, what is your "standard" pipeline?

      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        @beki: Is the ultimate aim to compare condition A vs condition B?

        • beki.renberg
          Member
          • Oct 2015
          • 11

          #5
          @GenoMax - Yes, the ultimate aim is to compare conditions A and B (and eventually C and D).

          We did the comparisons between the biological replicates within condition A just to see how much variation there was between them, but as blancha pointed out, the way we did it was not correct and we should probably do a principal component analysis.
          Last edited by beki.renberg; 11-23-2015, 11:13 AM. Reason: clarification

          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            Perhaps you should just go ahead and analyze the replicates together for DE as A vs B. What you are describing is only going to be useful for QC (making sure there are no sample swaps etc).

            • blancha
              Senior Member
              • May 2013
              • 367

              #7
              Yes, I did try both Rockhopper and EDGE-Pro for an E. coli RNA-Seq experiment.

              I ended up just using bowtie2, featureCounts, and DESeq.
              I remember coming to the conclusion that the overlapping genes were not going to be a significant issue after looking at the annotation file for E. coli in IGV.

              Relative to my eukaryote RNA-seq pipeline, I just switched from TopHat, a splice-junction aligner, to bowtie2, which does not take into account splicing. I only do the occasional bacterial analysis, and mostly focus on eukaryotes, so perhaps I am not the greatest expert on the issue.
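              As a rough sketch of the alignment and counting steps in that kind of pipeline, the snippet below only builds and prints the command lines; it does not run anything. The file names, the index prefix, the `build_pipeline` helper, the samtools sorting step, the strandedness setting (`-s 1`) and the `gene` / `gene_id` annotation fields are all assumptions that would need to be checked against your own library prep and annotation file.

```python
import shlex

def build_pipeline(sample, index_prefix, fastq, annotation, threads=4, strand=1):
    """Return command lines (as argument lists) for one single-end sample.

    strand: featureCounts -s value (0 unstranded, 1 stranded, 2 reversely
    stranded); the right value depends on the library prep (assumption here).
    """
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    counts = f"{sample}.counts.txt"
    return [
        # Unspliced alignment of single-end reads with bowtie2.
        ["bowtie2", "-p", str(threads), "-x", index_prefix, "-U", fastq, "-S", sam],
        # Coordinate-sort the alignments so they can be indexed and viewed in IGV.
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        # Count reads per gene; assumes 'gene' features with a 'gene_id' attribute.
        ["featureCounts", "-T", str(threads), "-s", str(strand),
         "-t", "gene", "-g", "gene_id", "-a", annotation, "-o", counts, bam],
    ]

# Hypothetical file names for one replicate.
commands = build_pipeline("A1", "ecoli_index", "A1.fastq.gz", "ecoli.gtf")
for cmd in commands:
    print(shlex.join(cmd))
```

              The resulting per-sample count tables would then be merged into one matrix and passed to DESeq for the A vs B comparison, which is done in R and is not shown here.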

              Among other issues with Rockhopper, I got frustrated that the SAM files it outputs are in an incomprehensible format.

              I feel more comfortable using tools that I fully understand, rather than "black box" tools.
              I want to be able to view all the details of the alignments by loading the BAM files in IGV. If a researcher comes back to me questioning my results for a given gene, I want to be able to check the underlying alignments there.

              With DESeq2, I can also do multi-factorial analyses if needed.

              • beki.renberg
                Member
                • Oct 2015
                • 11

                #8
                @blancha - Thank you for the information and insight.
