
**blancha** · 11-20-2015, 12:14 PM

This does not appear to me to be the correct way to analyze your data.

You're much better off clustering your data, or doing a principal component analysis. If your replicates cluster together per condition, you can conclude that the results are valid. If they don't, you have reasons for concern.
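As a rough illustration of the PCA check described above, here is a minimal sketch using NumPy. The count matrix, sample names, and the `pca_scores` helper are all hypothetical and only stand in for real data. It assumes raw counts with genes as rows and samples as columns, and uses a simple log2 transform rather than any particular package's normalization.

```python
import numpy as np

# Hypothetical toy counts: rows are genes, columns are samples.
samples = ["A1", "A2", "A3", "B1", "B2", "B3"]
counts = np.array([
    [100, 110,  95, 400, 420, 390],
    [500, 480, 510, 120, 130, 115],
    [ 50,  55,  52,  51,  49,  53],
    [ 10,  12,   9,  80,  85,  78],
    [300, 310, 290, 305, 295, 300],
], dtype=float)


def pca_scores(count_matrix, n_components=2):
    """Return per-sample PCA scores and the fraction of variance explained."""
    # Log-transform with a pseudocount so highly expressed genes do not dominate.
    log_counts = np.log2(count_matrix + 1.0)
    # Put samples in rows, then center each gene (column) at zero.
    x = log_counts.T
    x = x - x.mean(axis=0)
    # PCA via singular value decomposition of the centered matrix.
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]


scores, explained = pca_scores(counts)
for name, (pc1, pc2) in zip(samples, scores):
    print(f"{name}\tPC1={pc1:+.2f}\tPC2={pc2:+.2f}")
print("Fraction of variance on PC1:", round(float(explained[0]), 3))
```

If the replicates of each condition land close together on PC1/PC2 and the conditions separate, that is the pattern you are hoping for. A replicate sitting with the wrong group would be the cause for concern.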

If you're using the p-value or the adjusted p-value to determine which genes are differentially expressed, the sample size will affect the value computed. Essentially, for individual comparisons, the p-value is nearly useless, since there is no statistically sound method to get an accurate measure of the variation from a single sample per group. So, you cannot compare p-values obtained from one-to-one comparisons to p-values obtained from comparing sets of replicates.
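To make the point about variation concrete, here is a small sketch using only the Python standard library. The counts and the `sample_variance_or_none` helper are made up for illustration. It only shows that a sample variance cannot be computed from a single observation, which is why tools have to fall back on assumptions when there are no replicates.

```python
import statistics


def sample_variance_or_none(values):
    """Return the sample variance, or None when it cannot be estimated."""
    try:
        return statistics.variance(values)
    except statistics.StatisticsError:
        # Raised when fewer than two data points are given.
        return None


# Hypothetical counts for one gene in one condition.
one_sample = [120]
three_replicates = [120, 135, 110]

print("single sample:", sample_variance_or_none(one_sample))
print("three replicates:", sample_variance_or_none(three_replicates))
```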

Using the fold change to determine differentially expressed genes is a more sensible approach for individual comparisons. If using the same cut-off, one would expect a greater number of differentially expressed genes for individual comparisons, given that averaging the counts over the replicates will smooth out the random variations. This is consistent with your results.
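The smoothing effect can be sketched with a few made-up numbers. In the example below, the gene names, counts, the cut-off of 1 on the log2 scale, and the helper functions are all hypothetical. `geneX` has noisy replicates but no real change, `geneY` has a real change, and `geneZ` is stable.

```python
import math

# Hypothetical counts per gene: (condition A replicates, condition B replicates).
counts = {
    "geneX": ([100, 220, 150], [240, 110, 160]),  # noisy, no real change
    "geneY": ([50, 60, 55], [400, 380, 420]),     # real change
    "geneZ": ([300, 310, 290], [305, 295, 300]),  # stable
}

CUTOFF = 1.0  # assumed threshold on |log2 fold change|, i.e. a 2-fold change


def log2_fold_change(a, b, pseudocount=1.0):
    """log2 of b over a, with a pseudocount to avoid division by zero."""
    return math.log2((b + pseudocount) / (a + pseudocount))


def mean(values):
    return sum(values) / len(values)


# One-to-one comparison: first replicate of A against first replicate of B.
de_single = sorted(
    gene for gene, (a, b) in counts.items()
    if abs(log2_fold_change(a[0], b[0])) >= CUTOFF
)

# Replicate comparison: mean of A against mean of B.
de_averaged = sorted(
    gene for gene, (a, b) in counts.items()
    if abs(log2_fold_change(mean(a), mean(b))) >= CUTOFF
)

print("one-to-one:", de_single)
print("averaged:  ", de_averaged)
```

With these numbers the one-to-one comparison flags both `geneX` and `geneY`, while the averaged comparison flags only `geneY`, because the random swing in `geneX` cancels out across replicates.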

I'm not sure if I'm explaining myself clearly enough. I don't see any reason for alarm in your data. The best way to verify that your results are valid is to do the clustering, either hierarchical clustering or a Principal Component Analysis.

Incidentally, there is nothing about this that is specific to bacterial RNA-Seq, other than using Rockhopper. Not sure why that tool is so popular with the bacterial community. Other than offering a graphical user interface, I wasn't particularly impressed with it, and went back to my standard pipeline.

**beki.renberg** · 11-23-2015, 10:52 AM

Thank you for the feedback on many different things. I will look into doing a principal component analysis of the data to see how it clusters and will stop doing one-to-one comparisons of the biological replicates.

We chose to work with Rockhopper because it was specifically designed to deal with bacterial issues (overlapping genes, no introns, etc.). We thought that it was probably best to use a tool designed to do bacterial alignments as opposed to using a tool that was designed for eukaryotic work and "adapted" to work for bacteria. Maybe we are not correct in our thinking. We also plan to do differential gene expression using EDGE-pro, another program designed to work with bacterial data, and then compare the DE genes identified by Rockhopper and EDGE-pro.

If you don't mind sharing, what is your "standard" pipeline?

**GenoMax** · 11-23-2015, 10:58 AM

@beki: Is the ultimate aim to compare condition A vs condition B?

**beki.renberg** · 11-23-2015, 11:09 AM

@GenoMax - Yes, the ultimate aim is to compare conditions A and B (and eventually C and D).

We did the comparisons between the biological replicates within condition A just to see how much variation there was between them, but as blancha pointed out, the way we did it was not correct and we should probably do a principal component analysis.

**GenoMax** · 11-23-2015, 11:12 AM

Perhaps you should just go ahead and analyze the replicates together for DE as A vs B. What you are describing is only going to be useful for QC (making sure there are no sample swaps etc).

**blancha** · 11-23-2015, 11:37 AM

Yes, I did try both Rockhopper and EDGE-Pro for an E. coli RNA-Seq experiment.

I ended up just using bowtie2, featureCounts, and DESeq.
I remember coming to the conclusion that the overlapping genes were not going to be a significant issue after looking at the annotation file for E. coli in IGV.

Relative to my eukaryote RNA-seq pipeline, I just switched from TopHat, a splice-junction aligner, to bowtie2, which does not take into account splicing. I only do the occasional bacterial analysis, and mostly focus on eukaryotes, so perhaps I am not the greatest expert on the issue.

Amongst other issues with Rockhopper, I got frustrated that the SAM files it outputs are in an incomprehensible format.

Unfamiliar SAM file format outputted by Rockhopper program - SEQanswers

http://seqanswers.com/forums/showthread.php?t=61137&highlight=rockhopper


I feel more comfortable using tools that I fully understand, rather than "black box" tools.
I want to be able to view all the details of the alignments by loading the BAM files in IGV. If a researcher comes back to me questioning my results for a given gene, I want to be able to show exactly what the reads look like there.

With DESeq2, I can also do multi-factorial analyses if needed.

**beki.renberg** · 11-24-2015, 10:32 AM

@blancha - Thank you for the information and insight.


Bacterial RNA-Seq Differential Gene Expression Biological Reps vs Exp Conditions
