SEQanswers

05-10-2010, 10:25 AM #1
liux
Member
Location: Midwest

Join Date: Mar 2009
Posts: 30
Papers on the sensitivity of mRNA-seq?

Is there an article discussing the sensitivity of mRNA-seq?

I am looking for the answer to this question:

For a given transcriptome, how many reads of length x are needed to reliably discover a rare transcript (say, 1 to 2 copies per cell)?

Thanks!

Last edited by liux; 05-10-2010 at 02:11 PM.
05-17-2010, 05:46 AM #2
Simon Anders
Senior Member
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994

Hi

I don't know of any papers, but it should be possible to calculate this yourself.

Let's say a typical cell has N transcript molecules; then the concentration of your rare transcript (at one copy per cell) is roughly 1/N. If your sequencing run produces M reads (typically, M is up to 20 million), the probability that a given read comes from your transcript is 1/N.

The probability that none of the M reads shows your transcript is (1-1/N)^M; hence, the probability of seeing it at least once is 1-((1-1/N)^M). If you want to see it at least, say, k=10 times, you can easily calculate this with the Poisson distribution, whose mean (the expected number of reads from your transcript) is M/N.
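A minimal Python sketch of the calculation above, assuming (as in the post) that each read independently comes from the rare transcript with probability copies/N. This ignores transcript length and positional read biases, and the function names are illustrative only. The Poisson tail P(X >= k) is summed directly, and the exact "at least once" case is included for comparison:

```python
import math


def prob_seen_at_least_k(n_transcripts: float, n_reads: int, k: int = 1,
                         copies: float = 1.0) -> float:
    """P(the rare transcript contributes >= k of n_reads reads), Poisson approximation.

    Assumes every read independently comes from the rare transcript with
    probability copies / n_transcripts (transcript length and biases ignored).
    """
    lam = n_reads * copies / n_transcripts  # expected reads from the transcript (M/N for one copy)
    term = math.exp(-lam)                   # Poisson P(X = 0)
    cdf = 0.0
    for i in range(k):                      # accumulate P(X = 0) + ... + P(X = k - 1)
        cdf += term
        term *= lam / (i + 1)               # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)              # P(X >= k); clamp tiny negative rounding


def prob_seen_at_least_once_exact(n_transcripts: float, n_reads: int,
                                  copies: float = 1.0) -> float:
    """Exact 1 - (1 - copies/N)^M, computed stably with log1p/expm1."""
    p = copies / n_transcripts
    return -math.expm1(n_reads * math.log1p(-p))
```

For example, `prob_seen_at_least_k(n_transcripts=300_000, n_reads=20_000_000, k=10)` evaluates the k=10 case; the value 300,000 is only a placeholder for N, not a measured figure.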

Now, how do you know how many transcripts there are in a cell, i.e., what is the value of N? For such questions, the following nice paper and its accompanying web site, which collects many such numbers, might be useful: Phillips and Milo, A feeling for numbers in biology, PNAS, Vol. 106, 21465-71 (2009).
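Once a value for N has been chosen, the original question ("how many reads are needed?") can be answered by searching for the smallest M that reaches a target detection probability. A minimal sketch, under the same independence assumption as the Poisson calculation above; `min_reads` and `poisson_tail` are hypothetical helper names, and N must be supplied from a source such as the one cited:

```python
import math


def poisson_tail(lam: float, k: int) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    term = math.exp(-lam)   # P(X = 0)
    cdf = 0.0
    for i in range(k):      # sum P(X = 0) .. P(X = k - 1)
        cdf += term
        term *= lam / (i + 1)
    return max(0.0, 1.0 - cdf)


def min_reads(n_transcripts: float, k: int, target: float, copies: float = 1.0) -> int:
    """Smallest read count M with P(>= k reads from the transcript) >= target (k >= 1)."""
    rate = copies / n_transcripts  # per-read probability of hitting the transcript
    hi = 1
    while poisson_tail(hi * rate, k) < target:  # double until the target is reached
        hi *= 2
    lo = hi // 2  # invariant: lo misses the target (or is 0), hi reaches it
    while hi - lo > 1:                           # binary search; the tail grows with M
        mid = (lo + hi) // 2
        if poisson_tail(mid * rate, k) >= target:
            hi = mid
        else:
            lo = mid
    return hi
```

The binary search is valid because, for k >= 1, the Poisson tail P(X >= k) increases monotonically with the mean M/N.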

Finally, as you are looking for rare transcripts, you might also be interested in this new method for reducing the number of common transcripts, which a colleague happened to show me just an hour ago: Bogdanov et al., Normalizing cDNA Libraries, Curr Protoc Mol Biol, 5.12.1, Apr 2010

Simon
05-17-2010, 06:04 AM #3
dariober
Senior Member
Location: Cambridge, UK

Join Date: May 2010
Posts: 311

Quote:
Originally Posted by liux
Is there an article discussing the sensitivity of mRNA-seq?

I am looking for the answer to this question:

For a given transcriptome, how many reads of length x are needed to reliably discover a rare transcript (say, 1 to 2 copies per cell)?

Thanks!
Hello,
See if Trapnell et al. 2010 (Nature Biotech) helps. Figure 4 shows how many reads you need to recover a transcript expressed at a given RPKM.
Maybe not exactly what you are asking, but it may give you a feel for it.
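To relate an RPKM value to read counts, the definition RPKM = reads / (transcript length in kb × millions of mapped reads) can be inverted to give an expected read count. The sketch below does that and then applies a Poisson model for detection. The Poisson assumption is a simplification (positional read rates are not uniform along a transcript), and the function names are hypothetical:

```python
import math


def expected_reads(rpkm: float, length_bp: float, total_mapped_reads: float) -> float:
    """Invert RPKM = reads / ((length_bp / 1e3) * (total_mapped_reads / 1e6))."""
    return rpkm * (length_bp / 1_000) * (total_mapped_reads / 1_000_000)


def detection_probability(rpkm: float, length_bp: float,
                          total_mapped_reads: float, k: int = 1) -> float:
    """P(>= k reads), assuming a Poisson count with the RPKM-implied mean."""
    lam = expected_reads(rpkm, length_bp, total_mapped_reads)
    term = math.exp(-lam)   # P(X = 0)
    cdf = 0.0
    for i in range(k):      # sum P(X = 0) .. P(X = k - 1)
        cdf += term
        term *= lam / (i + 1)
    return max(0.0, 1.0 - cdf)
```

For example, a 2 kb transcript at 1 RPKM in a run with 20 million mapped reads is expected to receive `expected_reads(1.0, 2000, 20_000_000)` = 40 reads.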

Dario
05-17-2010, 11:27 AM #4
krobison
Senior Member
Location: Boston area

Join Date: Nov 2007
Posts: 747

Three apropos papers

Genome Biol. 2010 May 11;11(5):R50. [Epub ahead of print]
Modeling non-uniformity in short-read rates in RNA-Seq data.
Li J, Jiang H, Wong WH.

Abstract
After mapping, RNA-Seq data can be summarized by a sequence of read counts commonly modeled as Poisson variables with constant rates along each transcript, which actually fit data poorly. We suggest using variable rates for different positions, and propose two models to predict these rates based on local sequences. These models explain more than 50% of the variations and can lead to improved estimates of gene and isoform expressions for both Illumina and Applied Biosystems (ABI) data.

PMID: 20459815




BMC Bioinformatics. 2010 Apr 29;11 Suppl 3:S6.
Towards reliable isoform quantification using RNA-SEQ data.
Howard BE, Heber S.

Bioinformatics Research Center, North Carolina State University, Raleigh, 27606, USA. itsbehoward@hotmail.com
Abstract
BACKGROUND: In eukaryotes, alternative splicing often generates multiple splice variants from a single gene. Here we explore the use of RNA sequencing (RNA-Seq) datasets to address the isoform quantification problem. Given a set of known splice variants, the goal is to estimate the relative abundance of the individual variants.

METHODS: Our method employs a linear models framework to estimate the ratios of known isoforms in a sample. A key feature of our method is that it takes into account the non-uniformity of RNA-Seq read positions along the targeted transcripts.

RESULTS: Preliminary tests indicate that the model performs well on both simulated and real data. In two publicly available RNA-Seq datasets, we identified several alternatively-spliced genes with switch-like, on/off expression properties, as well as a number of other genes that varied more subtly in isoform expression. In many cases, genes exhibiting differential expression of alternatively spliced transcripts were not differentially expressed at the gene level.

CONCLUSIONS: Given that changes in isoform expression level frequently involve a continuum of isoform ratios, rather than all-or-nothing expression, and that they are often independent of general gene expression changes, we anticipate that our research will contribute to revealing a so far uninvestigated layer of the transcriptome. We believe that, in the future, researchers will prioritize genes for functional analysis based not only on observed changes in gene expression levels, but also on changes in alternative splicing.

PMID: 20438653


BMC Genomics. 2010 May 5;11(1):282. [Epub ahead of print]
A comparison of massively parallel nucleotide sequencing with oligonucleotide microarrays for global transcription profiling.
Bradford JR, Hey Y, Yates T, Li Y, Pepper SD, Miller CJ.

Abstract
BACKGROUND: RNA-Seq exploits the rapid generation of gigabases of sequence data by Massively Parallel Nucleotide Sequencing, allowing for the mapping and digital quantification of whole transcriptomes. Whilst previous comparisons between RNA-Seq and microarrays have been performed at the level of gene expression, in this study we adopt a more fine-grained approach. Using RNA samples from a normal human breast epithelial cell line (MCF-10a) and a breast cancer cell line (MCF-7), we present a comprehensive comparison between RNA-Seq data generated on the Applied Biosystems SOLiD platform and data from Affymetrix Exon 1.0ST arrays. The use of Exon arrays makes it possible to assess the performance of RNA-Seq in two key areas: detection of expression at the granularity of individual exons, and discovery of transcription outside annotated loci.

RESULTS: We found a high degree of correspondence between the two platforms in terms of exon-level fold changes and detection. For example, over 80% of exons detected in RNA-Seq were also detected on the Exon array, and 91% of exons flagged as changing from Absent to Present on at least one platform had fold-changes in the same direction. The greatest detection correspondence was seen when the read count threshold at which to flag exons Absent in the SOLiD data was set to t<1 suggesting that the background error rate is extremely low in RNA-Seq. We also found RNA-Seq more sensitive to detecting differentially expressed exons than the Exon array, reflecting the wider dynamic range achievable on the SOLiD platform. In addition, we find significant evidence of novel protein coding regions outside known exons, 93% of which map to Exon array probesets, and are able to infer the presence of thousands of novel transcripts through the detection of previously unreported exon-exon junctions.

CONCLUSIONS: By focusing on exon-level expression, we present the most fine-grained comparison between the RNA-Seq and microarrays to date. Overall, our study demonstrates that data from a SOLiD RNA-Seq experiment are sufficient to generate results comparable to those produced from Affymetrix Exon arrays, even using only a single replicate from each platform, and when presented with a large genome.

PMID: 20444259
All times are GMT -8.