SEQanswers

Old 01-31-2014, 07:19 AM   #1
dbroh11
Junior Member
 
Location: Virginia, USA

Join Date: Jan 2014
Posts: 8
Default RNA-Seq Experimental Design Questions

Hello,

My name is David Brohawn and I am new to RNA-Seq.

My advisor and I are interested in doing an RNA-Seq experiment to compare the transcriptomes of iPSC neurons we generate from both ALS patients and controls. Ultimately we would like to identify molecular phenotypes based on transcriptome expression profiles for different instances of ALS (much like how cancer researchers now identify underlying molecular phenotypes for different instances of a given cancer).

We are primarily interested in generating transcriptome profiles (involving both coding and non-coding RNA and novel transcripts), with a heavy interest in differential gene expression and less interest in mapping full transcript isoforms.

As I understand it, a greater number of short reads is best for assessing differential gene expression (SOLiD and Illumina look most amenable to this), while a smaller number of long reads is best for assessing isoforms (Roche and PacBio look most amenable to this).

I see the ENCODE project recommends “Experiments whose purpose is discovery of novel transcribed elements and strong quantification of known transcript isoforms… a minimum depth of 100-200 M 2 x 76 bp or longer reads is currently recommended.”

We plan on using Illumina TruSeq total RNA prep kits followed by sequencing on the Illumina HiSeq 2500. An Illumina rep quoted 187 million reads per lane as typical output for a 2 x 100 run. If this is true, I am thinking we multiplex our 20 total samples (10 cases and 10 controls) and run 11 total lanes, which would average out to just over 100 million reads per sample.

We would then analyze the data with the Tuxedo Suite bioinformatics package (we may substitute STAR for TopHat and Bowtie), and visualize our data using CummeRbund.

We are considering purchasing a Linux-based machine or a Mac with these specs for processing:

CPU – 2 quad-core processors, 3.2 GHz
HDD – 8 TB (RAID array of four 2 TB drives)
RAM – 24 GB

I have been told the number of reads per sample may be overkill given our goals, but I am really just following ENCODE's recommendations. Do you all have any suggestions based on what I have described?

Thanks for taking the time to read and respond!

Dave Brohawn
Old 01-31-2014, 07:25 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,060
Default

Have a look at this paper: http://seqanswers.com/forums/showthread.php?t=40365
Old 01-31-2014, 07:33 AM   #3
TonyBrooks
Senior Member
 
Location: London

Join Date: Jun 2009
Posts: 298
Default

http://core-genomics.blogspot.co.uk/...ions-need.html

You could run all 20 of your samples across 2 lanes and get somewhere approaching 20m reads per sample. This should be more than adequate for differential expression analysis.
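
The lane arithmetic behind these numbers can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: it assumes the 187 million reads per lane quoted earlier in the thread and perfectly even multiplexing, and the function name `reads_per_sample` is made up for the example.

```python
def reads_per_sample(lanes, samples, reads_per_lane=187_000_000):
    """Average reads per sample, assuming perfectly even multiplexing.

    reads_per_lane defaults to the vendor-quoted figure from the thread;
    real yields vary from run to run, so treat the result as a rough guide.
    """
    if lanes <= 0 or samples <= 0:
        raise ValueError("lanes and samples must be positive")
    return lanes * reads_per_lane / samples


if __name__ == "__main__":
    # Compare the 2-lane and 11-lane designs discussed in the thread.
    for lanes in (2, 11):
        millions = reads_per_sample(lanes, samples=20) / 1e6
        print(f"{lanes} lanes, 20 samples: about {millions:.1f} M reads per sample")
```

With 20 samples, 2 lanes work out to a little under 19 M reads per sample and 11 lanes to just over 100 M, which matches the figures quoted above. Uneven pooling would lower the count for the worst-covered sample.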
Old 01-31-2014, 10:45 AM   #4
dbroh11
Junior Member
 
Location: Virginia, USA

Join Date: Jan 2014
Posts: 8
Default

Hey Guys,

It looks like for a run-of-the-mill differential gene expression analysis, 20-30 M reads is more than sufficient, based on your response, Tony, and the paper that GenoMax kindly supplied.

While we are most interested in differential gene expression, we still want to have a thorough representation of the transcriptome for both control and disease groups including novel transcripts. We aren't overly concerned with the ability to capture transcripts expressed at very low levels. Does 20-30 M still sound like a safe bet given these additional points?

Further, while I understand that short reads are more amenable to differential gene expression analysis than to isoform detection or mapping, I would like to optimize our short-read study design in a way that most benefits the Tuxedo Suite algorithms in probabilistically inferring which isoforms are present. This led me to choose the Illumina platform over SOLiD (100 bp reads over 35 bp reads), and paired-end instead of single-end reads to aid alignment. Do my rationale and this aspect of the study design sound appropriate for my goal?

I appreciate your helping a newbie

Dave
Old 01-31-2014, 11:56 AM   #5
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,060
Default

Quote:
Originally Posted by TonyBrooks View Post
http://core-genomics.blogspot.co.uk/...ions-need.html

You could run all 20 of your samples across 2 lanes and get somewhere approaching 20m reads per sample. This should be more than adequate for differential expression analysis.
@Tony: Can you correct this URL? It does not seem to be pointing to a specific link.

Last edited by GenoMax; 01-31-2014 at 12:01 PM.
Old 01-31-2014, 12:12 PM   #6
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,060
Default

Quote:
Originally Posted by dbroh11 View Post
This led me to choose the Illumina platform over SOLiD (100 bp reads over 35 bp reads), and paired-end instead of single-end reads to aid alignment. Do my rationale and this aspect of the study design sound appropriate for my goal?

I appreciate your helping a newbie

Dave
Sequencing more reads is not going to hurt, but the general consensus is that you do not want to go overboard (i.e. 100 million), since that is a case of diminishing returns.

There has been past discussion on benefits of single-end and paired-end reads but nothing that is of recent vintage. Here are a couple of links to peruse.

http://seqanswers.com/forums/showthread.php?t=13474
http://seqanswers.com/forums/showthread.php?t=9116
Old 02-03-2014, 08:33 AM   #7
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104
Default

Our sequencing center most often aims for 30M reads per sample for RNA-Seq projects. However, balancing the samples to get 30M each is troublesome. The way we do this is to do one (partial) sequencing run that undershoots 30M and then re-cluster the samples so that the next run, combined with the first, brings the per-sample reads up to 30M. If you were going to do a 'one-shot' sequencing run, then you would have to aim for around 50M reads in order to have at least 25M reads per sample. I agree that aiming for 100M reads is overkill.
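
As a rough illustration of the two-run top-up approach described above, the sketch below works out how many more reads each sample needs after a first run, and what share of a second pool each sample would get if reads scaled exactly with pooling fraction. That proportionality is an idealizing assumption, and the function names and sample counts are invented for the example, not taken from any sequencing center's actual procedure.

```python
def top_up_needed(first_run_reads, target=30_000_000):
    """Reads each sample still needs after a first run to reach the target.

    first_run_reads maps sample name -> reads obtained in the first run.
    Samples already at or above the target need nothing more.
    """
    return {name: max(0, target - reads) for name, reads in first_run_reads.items()}


def pooling_fractions(needed):
    """Share of the second pool for each sample, proportional to its shortfall."""
    total = sum(needed.values())
    if total == 0:
        return {name: 0.0 for name in needed}
    return {name: reads / total for name, reads in needed.items()}


if __name__ == "__main__":
    # Made-up first-run counts for three samples (illustrative only).
    first_run = {"sample_A": 22_000_000, "sample_B": 14_000_000, "sample_C": 31_000_000}
    needed = top_up_needed(first_run)
    for name, fraction in pooling_fractions(needed).items():
        print(f"{name}: needs {needed[name] / 1e6:.0f} M more, {fraction:.0%} of second pool")
```

The point of splitting the work across two runs is that the second pool can be weighted toward the samples that came up short, rather than overshooting every sample in a single run.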
Old 02-03-2014, 09:02 AM   #8
dbroh11
Junior Member
 
Location: Virginia, USA

Join Date: Jan 2014
Posts: 8
Default

I appreciate your help with this. Do you have literature, aside from the paper GenoMax sent, that supports using far fewer than 100M reads (what ENCODE proposed)? I understand ENCODE is not the be-all and end-all and that their recommendations are several years old, but I would like to better understand the rationale and see more empirical data suggesting 50M is sufficient before committing funds to the project.

In addition, do you all know of any literature showing that 100 bp reads rather than 35 bp reads (Illumina vs SOLiD) truly benefit Cufflinks' estimation of the prevalence of different isoforms? We are most interested in differential gene expression, so I have narrowed our design down to using shorter reads, but am still mulling over the pros and cons of these two platforms.

Many thanks

Dave
Old 02-03-2014, 11:43 AM   #9
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838
Default

If you want the best isoform detection capability and have lots of money, long paired-end reads on Illumina are the best option. Note that with 250bp reads and a 400bp fragment length, you should be able to get 400bp of continuous sequence for most read pairs, with overlap (for consistency checks) around 50bp. We've found that 30M-ish reads (i.e. 10M~100M) are fine for hypothesis-generating analysis, so go ahead and multiplex if you've got more than that.

The longer the sequence, the more chance you have of catching multiple splice points in a single read. If you don't do this you have to guess at possible isoforms based on frequency counts.
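
The read-length versus fragment-length geometry can be checked with a small sketch. It is idealized (no adapters, quality trimming or indels) and the function name is made up for the example. For a fragment of exactly 400bp, two 250bp reads nominally overlap by 100bp. An overlap nearer 50bp corresponds to fragments around 450bp, or to some bases being trimmed from the read ends, though that reading of the figure is an assumption.

```python
def paired_end_coverage(read_length, fragment_length):
    """Nominal overlap and contiguous span for one paired-end fragment.

    Returns (overlap_bp, contiguous_bp). A negative overlap is the size of
    the unsequenced gap in the middle of the fragment, in which case no
    single contiguous sequence is obtained and contiguous_bp is None.
    """
    if read_length <= 0 or fragment_length <= 0:
        raise ValueError("lengths must be positive")
    # Two reads run inward from each end; overlap cannot exceed the fragment.
    overlap = min(fragment_length, 2 * read_length - fragment_length)
    contiguous = fragment_length if overlap >= 0 else None
    return overlap, contiguous


if __name__ == "__main__":
    for read_len, frag_len in [(250, 400), (250, 450), (100, 400)]:
        overlap, contiguous = paired_end_coverage(read_len, frag_len)
        print(f"2 x {read_len}bp on {frag_len}bp fragment: "
              f"overlap={overlap}bp, contiguous={contiguous}")
```

The 2 x 100bp case leaves a gap in the middle of a 400bp fragment, which is why shorter reads leave more to be inferred about which splice junctions belong to the same transcript.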

Last edited by gringer; 02-03-2014 at 11:46 AM.
Old 02-04-2014, 07:36 AM   #10
AllSeq
Registered Vendor
 
Location: San Diego, CA

Join Date: Oct 2013
Posts: 138
Default

Quote:
Originally Posted by gringer View Post
If you want the best isoform detection capability and have lots of money, long paired-end reads on Illumina are the best option.
If you want the best isoform detection capability and have an INSANE amount of money, PacBio runs with a few different size selections would be the best option.
Old 02-04-2014, 09:05 AM   #11
mukeshwar
Junior Member
 
Location: India

Join Date: Apr 2013
Posts: 4
Default

Hi,

I am using the TruSeq RNA Sample Prep Kit v2 for a WTA library. I started with 6 ug of total RNA, followed by the Elute-prime fragment step for 2 min, first-strand cDNA synthesis and then second-strand cDNA synthesis, and got the following Qubit readings:

Elute primer fragment (RNA BR Assay): 15.6 ng/ul
dsCDNA synthesis (DNA dsHS assay)_before 1.8x bead purification: 0.312 ng/ul
dsCDNA synthesis (DNA dsHS assay)_after 1.8x bead purification: 0.225 ng/ul

Based on the Qubit readings, I would like to know:

> Is this a sufficient concentration of dscDNA, or am I losing dscDNA? I didn't check the dscDNA profile on an HS chip.
> Are my mRNA enrichment process and its results satisfactory for cDNA conversion?
> Is cDNA conversion done?
> How can I check my first-strand cDNA product?

Basically, I would like to know the checkpoints at each step to confirm that the library preparation protocol is running correctly.

Last edited by mukeshwar; 02-04-2014 at 09:21 AM.
Old 02-04-2014, 11:09 AM   #12
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,060
Default

Please create a new thread since your question is unrelated to the thread you posted in.

New threads can be created by:

SeqAnswers.com --> Click "Forums" left navigation box --> Choose an appropriate forum to post question in --> "New Thread" button at top left.

You can then delete this post by choosing "Edit" --> "Go Advanced" --> Delete.
Tags
rna-seq advice, rna-seq help, rna-seq recommendations, rna-seq suggestions
