SEQanswers

Old 08-28-2008, 10:00 AM   #1
melano
Junior Member
 
Location: USA

Join Date: Aug 2008
Posts: 3
Methods for comparing RNA sequencing libraries?

Hi everyone,

I'll briefly explain what I'd like to do. I want to see which genes are differentially expressed between sterile and fertile plants. To do so, I will do massively parallel sequencing (454) of RNA from anthers of each line (fertile and sterile). My question is: what protocol should I follow to compare the two libraries? That is, which software do I need, and what steps should I follow?

Thanks very much for your attention
Old 08-28-2008, 10:54 AM   #2
ECO
--Site Admin--
 
Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,358

I'm sure others will comment, but first, if your main interest is differential expression, 454 seems to me to be the wrong technology. A couple hundred thousand reads will likely not give you the resolution necessary...unless you're planning on making concatenated SAGE libraries.
Old 08-28-2008, 11:39 AM   #3
melano
Junior Member
 
Location: USA

Join Date: Aug 2008
Posts: 3

Thanks ECO,

Which method would you advise for high-resolution differential expression? Would it work if I did a subtractive hybridization in both directions and then sequenced the product?
Old 08-28-2008, 01:50 PM   #4
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482

Solexa might be ..
See if this is of any use:

BioTechniques mappability article
http://www.biotechniques.com/default...full&id=112900
http://www.biotechniques.com/supplem...112900/307.pdf

RNA-seq (U of Chicago)
http://genome.cshlp.org/cgi/content/...r.079558.108v2
Old 08-29-2008, 11:16 AM   #5
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,177

Illumina sequencing is preferred for differential expression analysis IF you have a reference genome or transcriptome to map your reads to. I have had to explain to a few P.I.s that we really can't do expression analysis using the Illumina on their favorite non-model plant when the sum total of transcriptome information is the 500 EST sequences they generated 10 years ago.

If you have a reference and you have access to an Illumina then go with that, using the RNA-seq method referenced above by bioinfosm. You could try the Illumina DGEx method, but this is a lot of work both in the sample prep and data analysis. It also requires a very well annotated transcriptome to make the correct assignments.

A consideration in plants, though, is that they may contain large families of very closely related genes. Short reads from the Illumina may not be unambiguously assignable to a single family member. (Of course this is only an issue if you are interested in examining differential expression within a gene family.) In this case the longer reads from the 454 may be more useful, in particular when the sequencing is targeted at the 3'-UTR (http://www.plantphysiol.org/cgi/content/full/146/1/32). This may also be a protocol to consider if you only have access to a 454 instrument.

You did not say what species of plant you are working with, but if it is a completely novel species then all of the above protocols would be hindered by the lack of a reference. In situations like this we have done whole transcriptome shotgun sequencing from multiple libraries (e.g. different tissues, developmental stages, mutant vs. wt, etc.) and then assembled putative transcripts from the reads. For putative transcripts with a sufficient number of reads assigned you can get some differential expression information. Putative gene product IDs for the transcript assemblies may be assigned using BLAST. This method really only provides expression information for moderately to highly expressed genes.
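What counts as "a sufficient number of reads" can be made concrete with a per-transcript test on the read counts from two libraries. The sketch below uses a two-sided Fisher's exact test. That choice of test, the function names, and the example counts are all assumptions for illustration, not part of the protocol described above.

```python
import math


def _log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_two_sided(gene_a, total_a, gene_b, total_b):
    """Two-sided Fisher's exact test for one transcript in two libraries.

    gene_a, gene_b   -- reads assigned to the transcript in library A / B
    total_a, total_b -- all assigned reads in library A / B
    """
    n = total_a + total_b
    gene_total = gene_a + gene_b

    def log_pmf(x):
        # Hypergeometric log-probability that x of the transcript's reads
        # come from library A, with both margins held fixed.
        return (_log_comb(gene_total, x)
                + _log_comb(n - gene_total, total_a - x)
                - _log_comb(n, total_a))

    lowest = max(0, gene_total - total_b)
    highest = min(gene_total, total_a)
    observed = log_pmf(gene_a)
    # Sum every outcome that is no more likely than the observed one.
    p = sum(math.exp(lp)
            for lp in (log_pmf(x) for x in range(lowest, highest + 1))
            if lp <= observed + 1e-7)
    return min(1.0, p)


# Hypothetical counts: 40 vs. 10 reads for one putative transcript,
# out of 400,000 assigned reads in each library.
p_value = fisher_exact_two_sided(40, 400_000, 10, 400_000)
print(f"p = {p_value:.3g}")
```

With thousands of transcripts tested at once, the raw p-values would still need a multiple-testing correction before calling anything differentially expressed.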
Old 08-29-2008, 11:56 AM   #6
melano
Junior Member
 
Location: USA

Join Date: Aug 2008
Posts: 3

Thanks to bioinfosm and kmcarr,

I work with wheat (hexaploid wheat); I think that information is relevant for suggesting the best method for differential expression. Although there is some information about the transcriptome (1,034,368 ESTs), I don't know if it is enough.

On the other hand, I only have access to 454 technology.

Is it possible to estimate the coverage of a 454 library?

Thanks again
Old 09-03-2008, 08:37 AM   #7
apfejes
Senior Member
 
Location: Oakland, California

Join Date: Feb 2008
Posts: 236

Just a quick note,

I have some software that I've been using to do this type of analysis, based on alignments to the transcriptome using the exonerate aligner and Solexa sequences (it was about a year ago, so it was still Solexa back then, and exonerate was pretty much "state of the art" for a while...). It generates Excel-format output that can be used to compare gene expression and is relatively easy to do statistics on.

I imagine that the same process could be used with 454 (albeit, the resolving power and the statistics would be dramatically limited compared to Illumina reads simply because of the sampling depth - 4 to 8M reads per lane for Illumina, and I'm not sure what you'd get with 454).
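To put a rough number on how sampling depth limits resolving power, the sketch below treats the read count for one transcript as Poisson-distributed. The Poisson model, the transcript abundance (1 in 100,000 reads), the 10-read threshold, and the 400,000-read figure standing in for a 454 run are all assumptions chosen for illustration; only the 8M-reads-per-lane figure comes from the post above.

```python
import math


def expected_reads(n_reads, fraction):
    """Expected reads from a transcript making up `fraction` of the library."""
    return n_reads * fraction


def prob_at_least(k, n_reads, fraction):
    """P(X >= k) for X ~ Poisson(n_reads * fraction)."""
    lam = n_reads * fraction
    below = sum(math.exp(-lam) * lam ** i / math.factorial(i)
                for i in range(k))
    return 1.0 - below


# Hypothetical low-abundance transcript: 1 read in every 100,000.
FRACTION = 1e-5

for label, depth in [("assumed 454 run", 400_000),
                     ("Illumina lane", 8_000_000)]:
    print(f"{label}: expect {expected_reads(depth, FRACTION):.0f} reads, "
          f"P(>=10 reads) = {prob_at_least(10, depth, FRACTION):.3f}")
```

Under these assumptions the shallower run expects only a handful of reads for such a transcript, which is too few to compare two libraries with any confidence.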

The issue, however, is that it sounds like you don't have a transcript database/fasta/etc. for wheat that can be used for aligning against (exactly as kmcarr pointed out), which means that doing anything with high-throughput genomics will be nearly impossible.

With those two hurdles, it might make more sense to use your 454 machine to assemble a transcriptome reference as your first step, before trying to do comparative genomics. You may not get the depth you need (I don't know how deep a transcriptome sampling you can do with 454 at this point), but you'd definitely obtain a reference that would allow you to start working on this problem.

Hopefully my comments aren't completely off-base!

Anthony
__________________
The more you know, the more you know you don't know. —Aristotle
Old 09-03-2008, 10:39 AM   #8
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,177

Anthony, your basic idea is sound and it is what we have done a number of times with non-model plants. However, as melano mentioned, there are already >1 million EST sequences for T. aestivum, which should provide adequate coverage of the transcriptome. And the work of assembling a putative transcriptome has already been done by JCVI (née TIGR). Check out their Plant Transcript Assemblies site (http://plantta.tigr.org/). Melano, you can download the wheat assemblies at ftp://ftp.tigr.org/pub/data/plantta/Triticum_aestivum. The wheat assembly was done a little over two years ago when there were ~840,000 ESTs (plus a few fl-cDNAs and mRNAs), but I don't think the additional ESTs would make a significant difference. The last release contains ~62,000 assemblies plus ~350,000 singleton ESTs.

Note that the assemblies are simply shotgun assemblies of ESTs. There is no attempt to identify ORFs; the assemblies are error prone, including mis-calls and indels; some assemblies are chimeric, and there is a significant amount of redundancy in the data set (i.e. multiple assemblies apparently representing the same transcript.) Given all of that, at least they give you something to align to. The assemblies are annotated by BLAST vs the UniRef database.

Melano, as I described above, you can either do shotgun cDNA sequencing or use the targeted 3'-UTR approach. The advantage of the 3'-UTR approach is that each "read" you generate will be a "count" of a transcript in your sample, whereas with shotgun sequencing you could be generating multiple reads from a single transcript, which does not provide you with any additional information. The 3'-UTR sequencing may also be better at distinguishing closely related transcripts. To identify a read, its sequence must be represented in your reference set. The shotgun sequencing approach provides a better chance of identifying reads (you're not limiting the sequence you gather to one part of the transcript). Given the size of the EST set, though, I think the 3'-UTRs should be well represented.

You mentioned doing a subtractive hybridization above; what did you mean by this? Normally one would never do any sort of normalization on a sample to be used in a differential expression experiment. However, if you know that there are certain transcripts which a) are very highly expressed and b) you are certain that you don't care about, then it may be o.k. to try to remove them because they will "waste" a lot of read capacity. I don't know about anthers, but in an experiment we did with Arabidopsis leaves we found that >50% of the reads were from the 10 most abundant transcripts (all photosynthesis-related genes, obviously). It really hurts to have >50% of your data be essentially worthless.
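A quick way to see how much read capacity the most abundant transcripts consume is to compute their share of all assigned reads from a small pilot run. This is a minimal sketch; the function name and the toy counts are hypothetical.

```python
from collections import Counter


def top_n_fraction(counts, n=10):
    """Fraction of all assigned reads that belong to the n most abundant
    transcripts. `counts` maps transcript ID -> read count."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(c for _, c in Counter(counts).most_common(n))
    return top / total


# Toy example with made-up transcript IDs and counts.
pilot_counts = {"tx_a": 50, "tx_b": 30, "tx_c": 15, "tx_d": 5}
print(f"top 2 transcripts: {top_n_fraction(pilot_counts, n=2):.0%} of reads")
```

If a handful of uninteresting transcripts dominates this number, that is the situation where depleting them before sequencing might be worth considering.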

Once you have your reads you can align them to the TIGR-TA reference using your favorite aligner (exonerate, BLAT, megablast) and count reads. Complications will be reads aligning equally well to more than one assembly and, if you use the shotgun approach, having to normalize the counts to the cDNA length (of course, you don't actually know the true cDNA length for most of the transcripts). 454 will generate nowhere near as many reads as Illumina: ~300,000 - 400,000 for a whole picotiter plate vs. ~32,000,000 - 48,000,000 for a whole flow cell. For moderately to highly expressed genes you should be able to measure differential expression with some degree of confidence. For genes with very low levels of expression, or if the difference in expression between your samples is small, you may not be able to make statistically confident determinations.
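The counting and normalization steps can be sketched as follows. The input is assumed to be a list of (read ID, assembly ID, alignment score) tuples already parsed from whatever aligner was used; that layout, the function names, and the choice to set aside tied reads are assumptions made for this example. The assembly length stands in for the unknown true cDNA length.

```python
from collections import Counter


def count_unique_best_hits(hits):
    """Count reads per assembly, using each read's single best-scoring hit.

    hits -- iterable of (read_id, assembly_id, score) tuples.
    Reads whose best score is tied across several assemblies are not
    counted; they are tallied separately as ambiguous.
    """
    best = {}
    for read_id, assembly_id, score in hits:
        top = best.get(read_id)
        if top is None or score > top[0]:
            best[read_id] = (score, {assembly_id})
        elif score == top[0]:
            top[1].add(assembly_id)

    counts = Counter()
    ambiguous = 0
    for _, targets in best.values():
        if len(targets) == 1:
            counts[next(iter(targets))] += 1
        else:
            ambiguous += 1
    return counts, ambiguous


def reads_per_kb_per_million(counts, lengths_bp):
    """Normalize counts by assembly length (kb) and library size (millions)."""
    total = sum(counts.values())
    return {t: c * 1e9 / (lengths_bp[t] * total) for t, c in counts.items()}


# Toy data with made-up IDs: r3 ties between TA2 and TA3, so it is set aside.
hits = [("r1", "TA1", 100), ("r1", "TA2", 90), ("r2", "TA1", 80),
        ("r3", "TA2", 70), ("r3", "TA3", 70), ("r4", "TA2", 60)]
counts, n_ambiguous = count_unique_best_hits(hits)
normalized = reads_per_kb_per_million(counts, {"TA1": 1000, "TA2": 500})
```

For the targeted 3'-UTR approach the length normalization would be skipped, since each read is already one count of a transcript.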

I hope this is enough information to get you started.

Kevin

Last edited by kmcarr; 09-03-2008 at 10:42 AM. Reason: Clarity
Old 09-03-2008, 11:45 AM   #9
apfejes
Senior Member
 
Location: Oakland, California

Join Date: Feb 2008
Posts: 236

Quote:
Originally Posted by kmcarr View Post
Note that the assemblies are simply shotgun assemblies of ESTs. There is no attempt to identify ORFs; the assemblies are error prone, including mis-calls and indels; some assemblies are chimeric, and there is a significant amount of redundancy in the data set (i.e. multiple assemblies apparently representing the same transcript.) Given all of that, at least they give you something to align to. The assemblies are annotated by BLAST vs the UniRef database.
Basically, we're saying the same thing, and using the same methods, with the single difference that I'm suggesting it might be worth building a transcriptome which isn't error-prone and would have more confidence than a transcriptome built from ESTs.

Anyhow, for almost all 2nd gen sequencing, you either assemble your own reference, or you use a reference alignment - and the quality of the reference is a major factor in the success of your experiment, thus my suggestion was simply a way to bootstrap using the available tools. As long as melano is stuck using 454, he's not going to get the depth he needs for this to work, so this thread may be a moot point anyhow.

Anthony
__________________
The more you know, the more you know you don't know. —Aristotle
Old 09-04-2008, 06:23 PM   #10
Melissa
Senior Member
 
Location: Switzerland

Join Date: Aug 2008
Posts: 124

I don't think the reference will be a problem here. There's a large EST database available, derived from different wheat genotypes. There's no need to do transcriptome sequencing again. The transcriptome assembled from the EST databases should be good enough as long as you can map transcripts. After all, you're not looking for SNPs, so you don't need an error-free reference transcriptome.

Solexa is obviously the better choice. You can always send your samples to a Solexa service provider. Try to find the best deal around.

By estimating the size of your transcriptome, you can calculate the coverage: coverage = total amount of data generated (Mb) / transcriptome size (Mb)
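As a worked example of that formula, here is a small sketch. The read count, mean read length, and transcriptome size below are placeholder values picked only to make the arithmetic easy to follow; they are not measurements for wheat.

```python
def fold_coverage(n_reads, mean_read_length_bp, transcriptome_size_mb):
    """Average fold coverage = total data generated (Mb) / transcriptome size (Mb)."""
    total_mb = n_reads * mean_read_length_bp / 1e6
    return total_mb / transcriptome_size_mb


# Placeholder inputs: 400,000 reads of 250 bp on average gives 100 Mb of data;
# an assumed 50 Mb transcriptome then gives 2x average coverage.
coverage = fold_coverage(400_000, 250, 50)
print(f"{coverage:.1f}x")
```

Keep in mind this is an average over the whole transcriptome. Because transcript abundance is so uneven, highly expressed genes will be covered far more deeply than this figure and rare ones far less.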

Anyway, hexaploid wheat is difficult to work with. But the advantage of working with a major crop is that there's extensive study on ESTs because genome sequencing is impossible. I hope you have a very good protocol to isolate RNA from anthers. Best of luck.

Old 09-05-2008, 12:19 AM   #11
Roald
Director at CLC bio
 
Location: Denmark

Join Date: Aug 2008
Posts: 26
DeepSAGE using 454 platform

Hi Melano,

If 454 is the platform you will be doing this on, you may wish to explore the DeepSAGE approach for linking tags before pyrosequencing. This way you can increase sampling depth.
The approach is described in Nielsen et al. 2006. DeepSAGE--digital transcriptomics with high sensitivity, simple experimental protocol and multiplexing of samples.

Best of luck on your research.

Roald, CLC bio
Old 09-05-2008, 09:09 AM   #12
apfejes
Senior Member
 
Location: Oakland, California

Join Date: Feb 2008
Posts: 236

Thanks for the additional information - I've never worked with wheat before, so I didn't know what resources were available. Hexaploidy does sound like a challenge though. I'm looking forward to hearing (reading) how this turns out.
__________________
The more you know, the more you know you don't know. —Aristotle