SEQanswers

04-29-2013, 07:46 AM   #1
amcloon
Member
 
Location: Germany

Join Date: Sep 2012
Posts: 15
differential gene expression and variance issues

I have 5 timepoints with 2 biological replicates each (collected and prepared on separate days following exactly the same protocol) of bacteria during starvation-induced development. I've analyzed the data using CLC Genomics Workbench and TopHat-Cufflinks-Cuffdiff (yes, I realize I probably only need Bowtie for bacteria, but I figured looking for nonexistent splice junctions would just cost computation time and shouldn't change anything).

My problem is this: there are a number of genes that I know are differentially regulated (previously published, and validated by me by qPCR) and that go up many-fold (one example goes from 50 RPKM to about 4000), but both programs say they are not statistically significantly regulated because there is high variability between replicates.

Instead, the genes that are reported as statistically significantly regulated are expressed at very low levels and have neither much variability nor a very high fold up- (or down-) regulation (from 20 to 2 RPKM, for example). These seem less likely to be biologically interesting.

So my question is, am I going to be able to get anything statistically valid out of this data, or if there's a lot of variation am I just out of luck? I am sure I could just cherry-pick genes for future work, but that seems like a waste of data.

If I try DESeq, will I just have the same problem in a different format, or might the different ways the programs analyze the data change the way statistics are calculated?

Thanks,
Anna
04-30-2013, 03:00 AM   #2
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994

If you want to know whether DESeq will give you the same answer, you will just have to try.

As for the qPCR validation: Have you only validated that the gene goes up in one replicate, or have you also validated that the variance is low by performing your qPCR on the time points of the second replicate, too?
04-30-2013, 05:15 AM   #3
amcloon
Member
 
Location: Germany

Join Date: Sep 2012
Posts: 15

I didn't do qPCR validation of the second data set, but if I do parallel analyses for each set of replicates (at least in the CLC software) I do see up-regulation of a number of known genes within each replicate set of timepoints. There is a bit of variation in timing, etc., but the genes I expect to go up do go up.

The problem comes when I try to do statistics: the large variance in levels between the replicates makes the p-values really big for most of my "known" up-regulated genes.
I'm considering whether I need to do some sort of paired comparison, but then I'm not sure whether I'll have to do separate analyses for each timepoint, comparing each one to 0 hrs. If I do that, do I have to make an even more severe significance correction, since I'm effectively doing 4 separate tests? I wish I'd taken statistics more recently than 10 years ago.

On a partly unrelated note, the more I look through my data, the more I feel like cufflinks/cuffdiff is just not ideal for bacterial genomes. I feel like it doesn't deal well with the whole "many genes are in operons" issue. Has anyone else had experience with this and did you find something better?

And are there any programs that don't lump sense and antisense transcripts when counting reads mapping to a particular genomic region (also a somewhat bacteria-specific problem, I think)?
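
To make the sense/antisense question concrete, here is a minimal Python sketch of counting reads per strand for a single gene. It is only an illustration, not how any particular tool does it: the read and gene coordinates are made up, and it assumes a strand-specific library in which a read's strand matches the strand of the transcript it came from (some protocols report the opposite strand, in which case the two labels would swap).

```python
from collections import Counter

def count_by_strand(reads, gene_start, gene_end, gene_strand):
    """Count reads overlapping a gene, keeping sense and antisense separate.

    reads: iterable of (start, end, strand) with inclusive coordinates.
    Assumes read strand == transcript strand (hypothetical library type).
    """
    counts = Counter()
    for start, end, strand in reads:
        # Two inclusive intervals overlap if each starts before the other ends.
        if start <= gene_end and end >= gene_start:
            label = "sense" if strand == gene_strand else "antisense"
            counts[label] += 1
    return counts

# Hypothetical aligned reads and a hypothetical gene on the "+" strand.
reads = [
    (100, 150, "+"),
    (120, 170, "+"),
    (140, 190, "-"),  # overlaps the gene but on the opposite strand
    (400, 450, "+"),  # outside the gene
]
counts = count_by_strand(reads, gene_start=90, gene_end=200, gene_strand="+")
print(dict(counts))
```

A counter that ignores the strand column would report three reads for this gene; keeping the two labels apart is what prevents an antisense transcript from inflating the sense count.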
04-30-2013, 05:32 AM   #4
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994

Yes, when a paired analysis is warranted, it can have much more power than a naive one. Then, you have to use a tool like DESeq, because cuffdiff does not offer functionality for designs that go beyond a two-group comparison.
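
As a rough illustration of why pairing can add power, the Python sketch below compares an unpaired (Welch) and a paired t statistic on log2 expression for one gene. The RPKM values are hypothetical and chosen so that the two replicate series have different baselines but similar fold changes. This is a toy calculation only: DESeq fits negative binomial models to read counts rather than running t-tests, and with just two pairs a paired t-test has a single degree of freedom.

```python
import math
from statistics import mean, stdev

# Hypothetical RPKM values for one gene (not taken from the thread's data).
time0 = {"rep1": 50.0, "rep2": 200.0}
time_late = {"rep1": 4000.0, "rep2": 12000.0}

def unpaired_t(a, b):
    """Welch t statistic: treats the two timepoints as independent groups."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(b) - mean(a)) / se

def paired_t(a, b):
    """Paired t statistic: uses the within-replicate differences only."""
    diffs = [y - x for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Work on the log2 scale and keep the replicates in the same order in both lists.
reps = sorted(time0)
x = [math.log2(time0[r]) for r in reps]
y = [math.log2(time_late[r]) for r in reps]

t_unpaired = unpaired_t(x, y)
t_paired = paired_t(x, y)
print(f"unpaired t = {t_unpaired:.2f}, paired t = {t_paired:.2f}")
```

The between-replicate baseline difference inflates the standard error of the unpaired statistic, while the paired statistic depends only on how consistent the fold change is within each replicate series.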
04-30-2013, 05:44 AM   #5
amcloon
Member
 
Location: Germany

Join Date: Sep 2012
Posts: 15

Thanks, Simon, I'll give DESeq a try.
05-07-2013, 02:07 AM   #6
Illuminoid
Junior Member
 
Location: Geneva, Switzerland

Join Date: Oct 2011
Posts: 2

Hi amcloon

I am interested in the outcome of your analysis with DESeq, since I have a similar issue with multiple-timepoint analysis and variability between samples.

Did you end up using the paired analysis, or staying with single analyses comparing everything to time zero?

Cheers

Sam
All times are GMT -8.