SEQanswers

Old 11-20-2015, 09:55 AM   #1
beki.renberg
Member
 
Location: MD

Join Date: Oct 2015
Posts: 11
Question Bacterial RNA-Seq Differential Gene Expression Biological Reps vs Exp Conditions

We have 6 bacterial samples that we are working with: 2 sets of 3 biological replicates (condition A 1, 2, 3 and condition B 1, 2, 3) on which we performed stranded RNA-Seq. We are completely new to NGS and are working out pipelines and workflows, so we decided to look at differential gene expression among the biological replicates within condition A, and among the biological replicates within condition B, before doing the differential gene expression analysis between conditions A and B. We are using Rockhopper (http://cs.wellesley.edu/~btjaden/Rockhopper/) to perform these comparisons, as it is designed specifically to handle bacterial RNA-Seq data. In theory there would be fewer differentially expressed genes between the biological replicates than between the conditions, right?

When we look at the number of differentially expressed genes between biological replicates in condition A (1 vs 2 = 87; 1 vs 3 = 80; 2 vs 3 = 132), it is much greater than the number we get when we compare conditions A and B using all 3 biological replicates of each (58 genes). This was VERY surprising to us and we are concerned that our data may not be usable.

Is it normal to have more differentially expressed genes between biological replicates than between experimental conditions? Could a batch effect be playing a role here? Does anyone have a lot of experience with BACTERIAL RNA-Seq and could give me some pointers? Any advice on how to proceed - data usable or not?

Thank you!
Old 11-20-2015, 11:14 AM   #2
blancha
Senior Member
 
Location: Montreal

Join Date: May 2013
Posts: 367

This does not appear to me to be the correct way to analyze your data.

You're much better off clustering your data, or doing a principal component analysis. If your replicates cluster together per condition, you can conclude that the results are valid. If they don't, you have reasons for concern.
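For anyone wanting to try this kind of check, here is a minimal sketch in Python. It does not run a full PCA; instead it computes pairwise Pearson correlations on log-transformed counts, which is the sort of similarity a hierarchical clustering would typically be built on. The counts are simulated, and the function names, noise level, and 4-fold shift are all assumptions made for illustration, not anything from the data discussed here.

```python
import math
import random


def log_counts(counts):
    """log2(count + 1), so a few very highly expressed genes do not dominate."""
    return [math.log2(c + 1) for c in counts]


def pearson(x, y):
    """Plain Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def replicate_check(samples, condition):
    """Return (mean within-condition r, mean between-condition r)."""
    names = list(samples)
    logged = {name: log_counts(samples[name]) for name in names}
    within, between = [], []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(logged[a], logged[b])
            (within if condition[a] == condition[b] else between).append(r)
    return sum(within) / len(within), sum(between) / len(between)


# Simulated (hypothetical) counts: 500 genes, 3 replicates per condition.
random.seed(0)
n_genes = 500
base = [random.uniform(20, 2000) for _ in range(n_genes)]
# Assumption: in condition B the first 50 genes are shifted 4-fold.
shift = [4.0 if g < 50 else 1.0 for g in range(n_genes)]


def noisy(means):
    """One replicate: each gene gets roughly 10% Gaussian noise around its mean."""
    return [max(0, round(random.gauss(m, 0.1 * m))) for m in means]


samples = {f"A{i}": noisy(base) for i in (1, 2, 3)}
samples.update(
    {f"B{i}": noisy([m * s for m, s in zip(base, shift)]) for i in (1, 2, 3)}
)
condition = {name: name[0] for name in samples}

within_r, between_r = replicate_check(samples, condition)
print(f"mean within-condition r:  {within_r:.3f}")
print(f"mean between-condition r: {between_r:.3f}")
```

If replicates behave, the within-condition correlations should sit above the between-condition ones. With real data you would feed in your own count table, and a proper PCA or hierarchical clustering from a statistics package would give a fuller picture.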

If you're using the p-value or the adjusted p-value to determine which genes are differentially expressed, the sample size will affect the value computed. For individual one-to-one comparisons, the p-value is nearly useless, since there is no statistically sound method to get an accurate measure of the variation from a single sample per group. So, you cannot compare p-values obtained from one-to-one comparisons to p-values obtained from comparing sets of replicates.
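A tiny illustration of the underlying problem, using only the Python standard library: with a single observation per group there is simply no sample variance to compute, so any p-value has to lean on extra assumptions. The counts below are made up, and `sample_variance_or_none` is a hypothetical helper name.

```python
import statistics


def sample_variance_or_none(values):
    """Sample variance, or None when there are too few points to estimate it."""
    try:
        return statistics.variance(values)
    except statistics.StatisticsError:
        # statistics.variance needs at least two data points.
        return None


# Hypothetical counts for one gene.
one_replicate = sample_variance_or_none([120.0])
three_replicates = sample_variance_or_none([120.0, 95.0, 140.0])

print("variance from 1 replicate: ", one_replicate)
print("variance from 3 replicates:", three_replicates)
```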

Using the fold change to determine differentially expressed genes is a more sensible approach for individual comparisons. If using the same cut-off, one would expect a greater number of differentially expressed genes for individual comparisons, given that averaging the counts over the replicates will smooth out the random variations. This is consistent with your results.
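The smoothing effect can be sketched with a small simulation in Python. Here every replicate is drawn from the same underlying expression levels, so no gene is truly differentially expressed and every gene passing the cut-off is noise. The log-normal noise model, its spread, the pseudocount, and the 2-fold cut-off are assumptions chosen for illustration only.

```python
import math
import random


def log2_fold_change(a, b, pseudo=1.0):
    """log2 ratio with a pseudocount to avoid dividing by zero."""
    return math.log2((b + pseudo) / (a + pseudo))


def count_over_cutoff(group_a, group_b, cutoff=1.0):
    """Count genes whose |log2 fold change| between group means reaches the cutoff.

    group_a and group_b are lists of replicates; each replicate is a list of
    per-gene expression values in the same gene order.
    """
    n_genes = len(group_a[0])
    hits = 0
    for g in range(n_genes):
        mean_a = sum(rep[g] for rep in group_a) / len(group_a)
        mean_b = sum(rep[g] for rep in group_b) / len(group_b)
        if abs(log2_fold_change(mean_a, mean_b)) >= cutoff:
            hits += 1
    return hits


random.seed(1)
n_genes = 4000
true_mean = [random.uniform(5, 500) for _ in range(n_genes)]


def replicate():
    """One simulated replicate: true mean times log-normal noise (assumed sigma 0.5)."""
    return [m * random.lognormvariate(0, 0.5) for m in true_mean]


reps = [replicate() for _ in range(6)]

single = count_over_cutoff([reps[0]], [reps[1]])   # 1 vs 1
averaged = count_over_cutoff(reps[:3], reps[3:])   # 3 vs 3, means compared

print("genes over 2-fold cut-off, 1 vs 1:", single)
print("genes over 2-fold cut-off, 3 vs 3:", averaged)
```

Under these assumptions the 1 vs 1 comparison flags noticeably more genes than the 3 vs 3 comparison, even though nothing is really changing.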

I'm not sure if I'm explaining myself clearly enough. I don't see any reason for alarm in your data. The best way to verify that your results are valid is to do the clustering, either hierarchical clustering or a Principal Component Analysis.

Incidentally, there is nothing about this that is specific to bacterial RNA-Seq, other than using Rockhopper. Not sure why that tool is so popular with the bacterial community. Other than offering a graphical user interface, I wasn't particularly impressed with it, and went back to my standard pipeline.
Old 11-23-2015, 09:52 AM   #3
beki.renberg
Member
 
Location: MD

Join Date: Oct 2015
Posts: 11

Thank you for the feedback on many different things. I will look into doing a principal component analysis of the data to see how it clusters and will stop doing one-to-one comparisons of the biological replicates.

We chose to work with Rockhopper because it was specifically designed to deal with bacterial issues (overlapping genes, no introns, etc.). We thought that it was probably best to use a tool designed to do bacterial alignments as opposed to using a tool that was designed for eukaryotic work and "adapted" to work for bacteria. Maybe we are not correct in our thinking. We also plan to do differential gene expression using EDGE-pro - another program designed to work with bacterial data - and then compare the DE genes identified by Rockhopper and EDGE-pro.

If you don't mind sharing, what is your "standard" pipeline?
Old 11-23-2015, 09:58 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978

@beki: Is the ultimate aim to compare condition A vs condition B?
Old 11-23-2015, 10:09 AM   #5
beki.renberg
Member
 
Location: MD

Join Date: Oct 2015
Posts: 11

@GenoMax - Yes, the ultimate aim is to compare conditions A and B (and eventually C and D).

We did the comparisons between the biological replicates within condition A just to see how much variation there was between them, but as blancha pointed out, the way we did it was not correct and we should probably do a principal component analysis.

Last edited by beki.renberg; 11-23-2015 at 10:13 AM. Reason: clarification
Old 11-23-2015, 10:12 AM   #6
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978

Perhaps you should just go ahead and analyze the replicates together for DE as A vs B. What you are describing is only going to be useful for QC (making sure there are no sample swaps etc).
Old 11-23-2015, 10:37 AM   #7
blancha
Senior Member
 
Location: Montreal

Join Date: May 2013
Posts: 367

Yes, I did try both Rockhopper and EDGE-Pro for an E. coli RNA-Seq experiment.

I ended up just using bowtie2, featureCounts, and DESeq.
I remember coming to the conclusion that the overlapping genes were not going to be a significant issue after looking at the annotation file for E. coli in IGV.

Relative to my eukaryote RNA-seq pipeline, I just switched from TopHat, a splice-junction aligner, to bowtie2, which does not take into account splicing. I only do the occasional bacterial analysis, and mostly focus on eukaryotes, so perhaps I am not the greatest expert on the issue.
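For readers who want to see roughly what the alignment and counting steps look like, here is a hedged sketch in Python that only builds the command lines and prints them; it does not run anything. The file names, the helper name `build_commands`, single-end reads, the `-s 1` strandedness setting, and the samtools sort/index steps (added so the BAM files can be loaded in IGV) are all assumptions and not necessarily what was used for the analysis described here. The DESeq step runs in R and is left out.

```python
import shlex


def build_commands(samples, index_prefix, annotation_gtf, threads=4):
    """Build bowtie2 / samtools / featureCounts command lines for single-end reads.

    samples maps a sample name to its FASTQ path (hypothetical paths).
    index_prefix is a bowtie2 index built beforehand with bowtie2-build.
    """
    commands = []
    bams = []
    for name, fastq in samples.items():
        sam = f"{name}.sam"
        bam = f"{name}.sorted.bam"
        # Unspliced alignment; -U means unpaired (single-end) reads.
        commands.append(
            ["bowtie2", "-p", str(threads), "-x", index_prefix, "-U", fastq, "-S", sam]
        )
        # Sorted, indexed BAM so alignments can be inspected in IGV.
        commands.append(["samtools", "sort", "-o", bam, sam])
        commands.append(["samtools", "index", bam])
        bams.append(bam)
    # One featureCounts call over all BAMs gives a single count table.
    # -s 1 assumes forward-stranded libraries; use 2 for reverse, 0 for unstranded.
    # Bacterial annotations may also need -t / -g adjusted to match the GTF.
    commands.append(
        ["featureCounts", "-T", str(threads), "-s", "1",
         "-a", annotation_gtf, "-o", "counts.txt"] + bams
    )
    return commands


# Hypothetical sample sheet.
samples = {f"{cond}{i}": f"{cond}{i}.fastq.gz" for cond in "AB" for i in (1, 2, 3)}
commands = build_commands(samples, "genome_index", "genes.gtf")

for cmd in commands:
    print(shlex.join(cmd))
```

Building the commands as lists keeps the sketch easy to inspect and to hand to `subprocess.run` later; the resulting count table would then go into the differential expression step.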

Amongst other issues with Rockhopper, I got frustrated that the SAM files it outputs are in a format I found incomprehensible.
http://seqanswers.com/forums/showthr...ght=rockhopper
I feel more comfortable using tools that I fully understand, rather than "black box" tools.
I want to be able to view all the details of the alignments by loading the BAM files in IGV. If a researcher comes back to me questioning my results for a given gene, I want to be able to check those alignments directly.

With DESeq2, I can also do multi-factorial analyses if needed.
Old 11-24-2015, 09:32 AM   #8
beki.renberg
Member
 
Location: MD

Join Date: Oct 2015
Posts: 11

@blancha - Thank you for the information and insight.
Tags
bacterial rna seq, biological replicates, differential expression
