SEQanswers

Old 02-29-2012, 04:36 AM   #1
turnersd
Senior Member
 
Location: Charlottesville, VA

Join Date: May 2011
Posts: 112
How many reads/replicates do I need for bacterial RNA-seq? MiSeq?

I'm aware of the ENCODE best practices and other recent research that give guidelines about the number of reads you need for RNA-seq in mammalian genomes. I generally recommend ~40-50M/sample for most applications, as low as 20M if the goal is just expression at the gene level, 100M+ if the goal is rare/aberrant isoform identification. I'm taking on a bacterial RNA-seq project where the goal is to identify differentially expressed genes and isoforms in a WT vs mutant strain of pathogenic F. tularensis. While there isn't splicing, I'm coming to appreciate that the prokaryotic transcriptome is still complex - overlapping genes, strand specificity, sRNAs, etc.

1. How many reads do I need? What length? Is paired-end sequencing as necessary as with complex (spliced) mammalian genomes? A recent paper gave guidelines for another bacterium, P. syringae. With 3.5 million prefiltered reads they were able to cover 95% of the annotated genes with at least 10 reads (average 190). P. syringae has a larger genome and about 3 times as many annotated ORFs as our bacterium, F. tularensis. So can I get away with fewer reads, say, 2 million before any filtering?

2. If I'm about right on #1 above, needing ~2M reads/sample, and I want to sequence, say, 2-4 samples from each condition (WT vs Mut), what's my best choice for platform? Will MiSeq have the capacity to do this on a single flowcell, or should I use a single lane on our GAIIx?

3. What counts as a biological replicate in this case? I would imagine taking aliquots from the same flask would be more like technical replication, and taking two different flasks grown from two different colonies to be biological replicates. Am I thinking about this correctly?
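The back-of-the-envelope reasoning in question 1 can be written out as a small calculation. This is only a sketch under strong assumptions that the thread does not confirm: that required depth scales linearly with the number of annotated ORFs, that the "average 190" figure is a mean read count per gene, and that expression distributions and rRNA content are comparable between the two organisms. The function names are made up for illustration.

```python
def scale_reads_by_orf_count(reference_reads: float, orf_ratio: float) -> float:
    """Naively scale a reference read count by the ratio of ORF counts.

    orf_ratio = (ORFs in target organism) / (ORFs in reference organism).
    Assumes depth requirements scale linearly with ORF count (an assumption).
    """
    if reference_reads <= 0 or orf_ratio <= 0:
        raise ValueError("reference_reads and orf_ratio must be positive")
    return reference_reads * orf_ratio


def scale_mean_reads_per_gene(reference_mean: float,
                              reference_reads: float,
                              planned_reads: float,
                              orf_ratio: float) -> float:
    """Estimate mean reads per gene at a planned depth, under the same assumptions."""
    if min(reference_mean, reference_reads, planned_reads, orf_ratio) <= 0:
        raise ValueError("all arguments must be positive")
    return reference_mean * (planned_reads / reference_reads) / orf_ratio


# Figures from the question: 3.5M reads in P. syringae, which has roughly
# 3x as many annotated ORFs as F. tularensis (so orf_ratio is about 1/3).
equivalent_reads = scale_reads_by_orf_count(3_500_000, 1 / 3)
mean_at_2m = scale_mean_reads_per_gene(190, 3_500_000, 2_000_000, 1 / 3)
print(f"Reads for comparable per-gene depth: {equivalent_reads:,.0f}")
print(f"Estimated mean reads per gene at 2M reads: {mean_at_2m:,.0f}")
```

On these assumptions, 2 million reads would sit above the naively scaled requirement, but the estimate says nothing about lowly expressed genes or sRNAs.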
Old 03-01-2012, 04:18 AM   #2
bioBob
Member
 
Location: Virginia

Join Date: Mar 2011
Posts: 72

Hi Turner,

Regarding the replicates, you are thinking about that correctly. The important thing is to replicate around your largest source of experimental variation, which is usually (not always) biological. As for sequencing 2-4 samples per condition, I would change that to 3-5. Two, IMO, is never an option and is really no better than one.

For read length and paired vs. single end, there are a few publications out there now stating that short single-end reads are sufficient. The RSEM paper describes this as well. We did a little study where we took 101-cycle PE data from mouse and created in silico a series of data sets ranging from 36-cycle SE and 36-cycle PE up to the full data set, including partial read subsets to explore multiplexing possibilities. We looked at our sensitivity to splice variants and at detection of known differentially expressed transcripts. What we found was that the optimum was somewhere between 50- and 76-cycle SE, which includes a little personal bias towards longer reads. The multiplexing question is a bit more ambiguous, so we really don't (yet? not sure) have a good handle on that. What we have been telling people is that if you have to choose between longer reads and more reads, choose more.
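For anyone wanting to try a similar in silico comparison, here is a minimal sketch of the two operations involved: truncating reads to a shorter cycle count, and randomly subsampling reads to mimic a multiplexed run. It is not the pipeline used in the study above; the record layout (header, sequence, quality) and the function names are assumptions made for illustration.

```python
import random
from typing import Iterable, Iterator, Tuple

# Hypothetical in-memory representation of one FASTQ record.
FastqRecord = Tuple[str, str, str]  # (header, sequence, quality string)


def truncate_reads(records: Iterable[FastqRecord], cycles: int) -> Iterator[FastqRecord]:
    """Keep only the first `cycles` bases (and qualities) of each read."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    for header, seq, qual in records:
        yield header, seq[:cycles], qual[:cycles]


def subsample_reads(records: Iterable[FastqRecord], fraction: float,
                    seed: int = 0) -> Iterator[FastqRecord]:
    """Keep each read independently with probability `fraction`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    for record in records:
        if rng.random() < fraction:
            yield record


# Toy example: three synthetic 101-base reads cut down to 36 cycles.
toy_reads = [(f"@read{i}", "A" * 101, "I" * 101) for i in range(3)]
short_reads = list(truncate_reads(toy_reads, 36))
print([len(seq) for _, seq, _ in short_reads])
```

For paired-end data, the same seed would need to be applied to both mate files so that pairs stay together.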

On MiSeq vs. GA: on the MiSeq you will be doing 2-3 samples at a time for 2-3M reads per replicate, while if Yongde has a good run on the GA, you should be able to do all 6 (thinking triplicates) in one go and get 2-3M+ per replicate. Tell your core you want >30M reads.
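The arithmetic behind that recommendation is simple enough to script. The sketch below only restates the numbers in this post (6 samples, a 30M-read lane, 2-3 samples per MiSeq run) and assumes reads are split evenly across barcodes, which real multiplexed runs rarely achieve. The function names are made up for illustration.

```python
import math


def reads_per_sample(total_reads: float, n_samples: int) -> float:
    """Expected reads per sample if a run is split evenly (an idealization)."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return total_reads / n_samples


def runs_needed(n_samples: int, samples_per_run: int) -> int:
    """Number of runs required when each run holds `samples_per_run` samples."""
    if samples_per_run <= 0:
        raise ValueError("samples_per_run must be positive")
    return math.ceil(n_samples / samples_per_run)


# Triplicates of WT and mutant = 6 samples.
print(reads_per_sample(30_000_000, 6))       # one lane yielding 30M reads
print(runs_needed(6, 3), runs_needed(6, 2))  # MiSeq at 3 or 2 samples per run
```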

Good luck.
GO CAVS!

Last edited by bioBob; 03-01-2012 at 04:20 AM.
Old 03-01-2012, 05:50 AM   #3
protist
Senior Member
 
Location: Ireland

Join Date: Jan 2009
Posts: 101

In your 2 million reads you have to take into account whether the original RNA has been rRNA depleted or not. If your libraries are from total RNA and there was no ribosomal RNA depletion you will not get sufficient mRNA coverage in 2 million reads.
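To make the point concrete, here is a rough sketch of how the rRNA fraction eats into a read budget. The rRNA fractions used below are placeholder values chosen for illustration, not measurements from this thread; substitute whatever your own libraries show. The function names are hypothetical.

```python
def mrna_reads(total_reads: float, rrna_fraction: float) -> float:
    """Reads left for non-rRNA transcripts, given the fraction lost to rRNA."""
    if not 0.0 <= rrna_fraction < 1.0:
        raise ValueError("rrna_fraction must be in [0, 1)")
    return total_reads * (1.0 - rrna_fraction)


def total_reads_needed(target_mrna_reads: float, rrna_fraction: float) -> float:
    """Total reads to sequence in order to retain `target_mrna_reads`."""
    if not 0.0 <= rrna_fraction < 1.0:
        raise ValueError("rrna_fraction must be in [0, 1)")
    return target_mrna_reads / (1.0 - rrna_fraction)


# Assumed rRNA fractions (placeholders): undepleted vs. depleted library.
for label, fraction in [("undepleted (assumed 0.90)", 0.90),
                        ("depleted (assumed 0.20)", 0.20)]:
    usable = mrna_reads(2_000_000, fraction)
    print(f"{label}: about {usable:,.0f} usable reads out of 2M")
```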

I agree that PE is not required; for all our bacterial libraries we find 42 cycles to be sufficient.

With regard to the biorep question, you are correct: sampling from the same flask constitutes a technical replicate, not a biological one.

Best of luck.
Old 03-08-2012, 09:25 AM   #4
polyatail
Member
 
Location: New York, NY

Join Date: Dec 2010
Posts: 25

You might get something out of this paper, and its supplemental:

http://www.ncbi.nlm.nih.gov/pubmed/20444704

They did a lot of trial and error to find the best way to do bacterial RNA-Seq on in vivo populations. Interestingly, they did a rarefaction analysis and found that above 300,000 reads aligning to mRNA, not much more information is gained.
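If you want to run the same kind of check on your own pilot data, a rarefaction curve is easy to sketch: subsample the mRNA-aligned reads at increasing depths and count how many genes are detected at each depth. The code below is a generic illustration, not the method from the linked paper. It assumes you already have one gene identifier per aligned read, and the function name and toy data are made up.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def rarefaction_curve(gene_assignments: Sequence[str],
                      depths: Sequence[int],
                      min_reads: int = 1,
                      seed: int = 0) -> List[Tuple[int, int]]:
    """Return (depth, genes detected) pairs for nested random subsamples.

    gene_assignments: one gene ID per aligned read.
    min_reads: reads required before a gene counts as detected.
    """
    rng = random.Random(seed)
    shuffled = list(gene_assignments)
    rng.shuffle(shuffled)  # prefixes of one shuffle give nested subsamples
    curve = []
    for depth in sorted(depths):
        depth = min(depth, len(shuffled))  # cannot sample more reads than exist
        counts = Counter(shuffled[:depth])
        detected = sum(1 for n in counts.values() if n >= min_reads)
        curve.append((depth, detected))
    return curve


# Toy data: three genes with very different abundance.
toy = ["geneA"] * 50 + ["geneB"] * 30 + ["geneC"] * 20
for depth, detected in rarefaction_curve(toy, [10, 50, 100]):
    print(depth, detected)
```

The depth at which the curve flattens is where extra reads stop adding newly detected genes, which is the kind of plateau described above.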

Tags
bacteria, coverage, miseq, rna-seq, transcriptome
