  • amcloon
    Member
    • Sep 2012
    • 15

    differential gene expression and variance issues

    I have 5 time-points with 2 biological replicates (collected and prepared on separate days following exactly the same protocol) of bacteria during starvation-induced development. I've analyzed the data using CLC Genomics Workbench and tophat-cufflinks-cuffdiff (yes, I realize I probably only need bowtie for bacteria, but I figured looking for nonexistent splice junctions would just cost computational time and shouldn't change anything).

    My problem is this: there are a number of genes that I know are differentially regulated (previously published, and validated by me with qPCR) that go up by many fold (one example goes from 50 RPKM to about 4000), but both programs say they are not statistically significantly regulated because there is high variability between replicates.

    Instead, the genes that are given as statistically significantly regulated are expressed at very low levels and don't have as much variability or a very high fold up-(or down) regulation (from 20 to 2 RPKM, for example). These seem less likely to be interesting biologically.

    So my question is, am I going to be able to get anything statistically valid out of this data, or if there's a lot of variation am I just out of luck? I am sure I could just cherry-pick genes for future work, but that seems like a waste of data.

    If I try DESeq, will I just have the same problem in a different format, or might the different ways the programs analyze the data change the way statistics are calculated?

    Thanks,
    Anna
  • Simon Anders
    Senior Member
    • Feb 2010
    • 995

    #2
    If you want to know whether DESeq will give you the same answer, you will just have to try.

    As for the qPCR validation: Have you only validated that the gene goes up in one replicate, or have you also validated that the variance is low by performing your qPCR on the time points of the second replicate, too?

    • amcloon
      Member
      • Sep 2012
      • 15

      #3
      I didn't do qPCR validation of the second data set, but if I do parallel analyses for each set of replicates (at least in the CLC software) I do see up-regulation of a number of known genes within each replicate set of timepoints. There is a bit of variation in timing, etc. but the genes I expect to go up do go up.


      The problem comes when I try to do statistics: the large variance in levels between the replicates makes the p-values really big for most of my "known" up-regulated genes.
      I'm considering whether I need to do some sort of paired comparison, but then I'm not sure whether I'll have to do separate analyses for each timepoint, comparing each one to 0 hrs. If I do that, do I have to apply an even more severe significance correction, since I'm effectively doing 4 separate tests? I wish I'd taken statistics more recently than 10 years ago.
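
On the "4 separate tests" worry above: one common way to handle several timepoint-vs-0 hrs comparisons is to collect the p-values from all comparisons and correct them together. The sketch below is only an illustration in plain Python. The p-values are made up, and `bonferroni` and `benjamini_hochberg` are hypothetical helper names, not functions from CLC, cuffdiff or DESeq.

```python
def bonferroni(pvals):
    """Family-wise correction: multiply each p-value by the number of tests."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """False-discovery-rate adjustment (Benjamini-Hochberg step-up)."""
    m = len(pvals)
    # Indices of the p-values, smallest p-value first.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values: one gene tested at 4 timepoints against 0 hrs.
# In a real analysis the list would hold every gene at every comparison.
example_pvals = [0.01, 0.04, 0.03, 0.005]
print("Bonferroni:", bonferroni(example_pvals))
print("Benjamini-Hochberg:", benjamini_hochberg(example_pvals))
```

Bonferroni is the "more severe" option here; the Benjamini-Hochberg values are never larger than the Bonferroni ones, which is why FDR control is usually preferred when thousands of genes are tested.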

      On a partly unrelated note, the more I look through my data, the more I feel like cufflinks/cuffdiff is just not ideal for bacterial genomes. I feel like it doesn't deal well with the whole "many genes are in operons" issue. Has anyone else had experience with this and did you find something better?

      And are there any programs that don't lump sense and antisense transcripts when counting reads mapping to a particular genomic region (also a somewhat bacteria-specific problem, I think)?
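
To make the sense/antisense question concrete, here is a minimal sketch of strand-aware counting in plain Python. It is not taken from any existing program: `count_by_strand`, the gene coordinates and the reads are all hypothetical, and it assumes a strand-specific library in which a read's strand equals the strand of the transcript it came from (some protocols report the opposite strand, in which case the two labels would swap).

```python
from collections import Counter


def count_by_strand(genes, reads):
    """Count reads overlapping each gene, split into sense and antisense.

    genes: dict mapping gene name -> (start, end, strand)
    reads: list of (start, end, strand) tuples, coordinates inclusive
    """
    counts = Counter()
    for r_start, r_end, r_strand in reads:
        for name, (g_start, g_end, g_strand) in genes.items():
            # Two inclusive intervals overlap if each starts before the other ends.
            if r_start <= g_end and r_end >= g_start:
                orientation = "sense" if r_strand == g_strand else "antisense"
                counts[(name, orientation)] += 1
    return counts


# Hypothetical gene on the + strand and four made-up reads.
example_genes = {"geneA": (100, 500, "+")}
example_reads = [
    (120, 170, "+"),  # overlaps geneA, same strand
    (300, 350, "-"),  # overlaps geneA, opposite strand
    (480, 530, "+"),  # partially overlaps geneA, same strand
    (600, 650, "+"),  # outside geneA
]
strand_counts = count_by_strand(example_genes, example_reads)
print(strand_counts)
```

A counter that ignores strand would report 3 reads for this gene; keeping the two orientations apart matters when an antisense transcript overlaps the gene of interest.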

      • Simon Anders
        Senior Member
        • Feb 2010
        • 995

        #4
        Yes, when a paired analysis is warranted, it can have much more power than a naive one. Then, you have to use a tool like DESeq, because cuffdiff does not offer functionality for designs that go beyond a two-group comparison.
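
A small numeric sketch of why pairing can help, assuming the replicates differ mostly by a day-to-day offset. The log2 expression values are made up and the helper names `unpaired_t` and `paired_t` are hypothetical. This is not how DESeq computes its statistics; it only illustrates the intuition with ordinary t statistics in plain Python.

```python
import math
import statistics


def unpaired_t(a, b):
    """Welch t statistic, treating the two groups as independent samples."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(b) - statistics.mean(a)) / se


def paired_t(a, b):
    """Paired t statistic computed on within-replicate differences."""
    diffs = [y - x for x, y in zip(a, b)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se


# Made-up log2 expression of one gene; position i is biological replicate i.
# Replicate 2 sits about 2 log2 units above replicate 1 at both timepoints.
time_0hr = [5.6, 8.0]
time_late = [10.0, 12.0]

print("unpaired t:", unpaired_t(time_0hr, time_late))
print("paired t:", paired_t(time_0hr, time_late))
```

Both replicates rise by about 4 log2 units, so the within-replicate differences are nearly identical and the paired statistic is large, while the unpaired statistic is dragged down by the offset between days. With only two replicates either test has very few degrees of freedom, so the numbers should be read as an illustration rather than a recommended test.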

        • amcloon
          Member
          • Sep 2012
          • 15

          #5
          Thanks, Simon, I'll give DESeq a try.

          • Illuminoid
            Junior Member
            • Oct 2011
            • 2

            #6
            Hi amcloon

            I am interested in the outcome of your analysis with DESeq, since I have a similar issue with multiple-timepoint analysis and variability between samples.

            Did you end up using the paired analysis, or staying with single analyses comparing everything to time zero?

            Cheers

            Sam
