I'm aware of the ENCODE best practices and other recent research that give guidelines about the number of reads you need for RNA-seq in mammalian genomes. I generally recommend ~40-50M reads/sample for most applications, as low as 20M if the goal is just expression at the gene level, and 100M+ if the goal is rare/aberrant isoform identification. I'm taking on a bacterial RNA-seq project where the goal is to identify differentially expressed genes and isoforms in a WT vs mutant strain of pathogenic F. tularensis. While there isn't splicing, I'm coming to appreciate that the prokaryotic transcriptome is still complex: overlapping genes, strand specificity, sRNAs, etc.
1. How many reads do I need? What length? Is paired-end sequencing as necessary as with complex (spliced) mammalian genomes? A recent paper gave guidelines for another bacterium, P. syringae. With 3.5 million prefiltered reads they were able to cover 95% of the annotated genes with at least 10 reads (average 190). P. syringae has a larger genome and about 3 times as many annotated ORFs as our bacterium, F. tularensis. So can I get away with fewer reads, say, 2 million before any filtering?
2. If I'm about right on #1 above, needing ~2M reads/sample, and I want to sequence, say, 2-4 samples from each condition (WT vs Mut), what's my best choice for platform? Will MiSeq have the capacity to do this on a single flowcell, or should I use a single lane on our GAIIx?
3. What counts as a biological replicate in this case? I would imagine that aliquots from the same flask are closer to technical replicates, while two different flasks grown from two different colonies are biological replicates. Am I thinking about this correctly?
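To make the arithmetic behind #1 and #2 explicit, here is a rough back-of-the-envelope sketch. It assumes reads spread across genes in proportion to ORF count, which ignores rRNA contamination, skewed expression and filtering losses, so treat the output as a lower bound rather than a recommendation. The function names are my own, and the only inputs are the numbers quoted above.

```python
# Reference numbers quoted above for P. syringae.
REF_READS = 3_500_000      # prefiltered reads in the reference experiment
REF_MEAN_PER_GENE = 190    # average reads per annotated gene
ORF_RATIO = 3              # P. syringae has ~3x the ORFs of F. tularensis


def reads_for_same_depth(ref_reads=REF_READS, orf_ratio=ORF_RATIO):
    """Reads needed to keep the same average reads per gene.

    Assumption: depth per gene scales linearly with reads / number of ORFs.
    """
    return ref_reads / orf_ratio


def mean_reads_per_gene(reads, ref_reads=REF_READS,
                        ref_mean=REF_MEAN_PER_GENE, orf_ratio=ORF_RATIO):
    """Expected average reads per gene at a given depth (same assumption)."""
    return ref_mean * (reads / ref_reads) * orf_ratio


def total_reads(reads_per_sample, replicates_per_condition, conditions=2):
    """Total reads a run must deliver for a multiplexed WT vs Mut design."""
    return reads_per_sample * replicates_per_condition * conditions


if __name__ == "__main__":
    print(f"Reads for equal depth: {reads_for_same_depth():,.0f}")
    print(f"Mean reads/gene at 2M reads: {mean_reads_per_gene(2_000_000):.0f}")
    for reps in (2, 3, 4):
        print(f"{reps} replicates/condition: {total_reads(2_000_000, reps):,} reads")
```

Under these assumptions 2M reads/sample is more than the ~1.2M needed to match the reference depth, and the full design needs 8M to 16M reads in total. That total is the number to compare against the per-run yield quoted for whichever platform is used.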