
**Simon Anders** · 05-17-2010, 04:46 AM

Hi

I don't know of any papers, but it should be possible to calculate this yourself.

Let's say a typical cell has N transcript molecules; a transcript present at about one copy per cell then makes up roughly 1/N of them. If your sequencing run produces M reads (typically, M is up to 20 million), the probability that a given read comes from your transcript is 1/N, so you expect about M/N reads from it.

The probability that none of the M reads comes from your transcript is (1-1/N)^M; hence, the probability of seeing it at least once is 1-(1-1/N)^M. If you want to see it at least, say, k=10 times, you can easily calculate this with the Poisson distribution with mean M/N.
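As a rough sketch of this calculation, assuming every read independently comes from the rare transcript with probability copies/N (which ignores transcript length, mapping losses and library bias), something like the following could be used. The function names and the value of N in the example are illustrative placeholders, not figures from this thread or from any paper.

```python
import math


def prob_at_least_once_exact(M, N, copies=1):
    """Exact binomial probability that at least one of M reads hits the transcript."""
    p = copies / N  # chance that a single read comes from the transcript
    return 1.0 - (1.0 - p) ** M


def prob_at_least_k(M, N, k=1, copies=1):
    """Poisson approximation: P(X >= k) with X ~ Poisson(M * copies / N)."""
    lam = M * copies / N       # expected number of reads from the transcript
    term = math.exp(-lam)      # P(X = 0)
    below_k = 0.0
    for i in range(k):
        below_k += term        # add P(X = i)
        term *= lam / (i + 1)  # step to P(X = i + 1)
    return 1.0 - below_k


if __name__ == "__main__":
    N = 300_000  # hypothetical transcripts per cell, a placeholder only
    for M in (1_000_000, 5_000_000, 20_000_000):
        print(M, prob_at_least_once_exact(M, N), prob_at_least_k(M, N, k=10))
```

Setting `copies=2` covers the "1~2 copies / cell" case from the question, and plugging in a value of N appropriate for your cell type is the part that needs real data.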

Now, how do you know how many transcripts there are in a cell, i.e., what is the value of N? For such questions, the following nice paper and its web site, that collects a lot of such numbers, might be useful: Phillips and Milo, A feeling for numbers in biology, PNAS, Vol. 106, 21465-71 (2009).

Finally, as you are looking for rare transcripts, you might also be interested in this new method to reduce the number of common transcripts, that a colleague happened to have shown me just an hour ago: Bogdanov et al., Normalizing cDNA Libraries, Curr Prot Mol Biol, 5.12.1, Apr 2010

Simon

**dariober** · 05-17-2010, 05:04 AM

Originally posted by liux

is there an article discussing the sensitivity of mRNA-seq?

I am looking for the answer to this question:

for a given transcriptome, how many reads of length x are needed to reliably discover a rare transcript (say 1~2 copies / cell)?

Thanks!

Hello,
See if Trapnell et al. 2010 (Nature Biotech) helps. Figure 4 shows how many reads you need to recover a transcript expressed at a given RPKM.
Maybe not exactly what you are asking but possibly you can get a feel for it.
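To turn an RPKM value into a read count by hand, the standard definition (reads per kilobase of transcript per million mapped reads) can simply be inverted. The sketch below only applies that definition; it is not taken from Trapnell et al., and the function names and example numbers are made up for illustration.

```python
import math


def expected_reads(rpkm, transcript_length_bp, total_mapped_reads):
    """Expected reads on a transcript, from the definition of RPKM."""
    kilobases = transcript_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return rpkm * kilobases * millions


def reads_needed(rpkm, transcript_length_bp, target_reads):
    """Total mapped reads required to expect target_reads on the transcript."""
    return math.ceil(target_reads * 1e9 / (rpkm * transcript_length_bp))


if __name__ == "__main__":
    # Hypothetical example: a 2 kb transcript at 1 RPKM, aiming for 10 reads.
    print(reads_needed(rpkm=1, transcript_length_bp=2_000, target_reads=10))
```

This gives an expectation only; the actual count varies from run to run, so a safety margin on top of the computed depth is sensible.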

Dario

**krobison** · 05-17-2010, 10:27 AM

Three apropos papers

Genome Biol. 2010 May 11;11(5):R50. [Epub ahead of print]
Modeling non-uniformity in short-read rates in RNA-Seq data.
Li J, Jiang H, Wong WH.

Abstract
After mapping, RNA-Seq data can be summarized by a sequence of read counts commonly modeled as Poisson variables with constant rates along each transcript, which actually fit data poorly. We suggest using variable rates for different positions, and propose two models to predict these rates based on local sequences. These models explain more than 50% of the variations and can lead to improved estimates of gene and isoform expressions for both Illumina and Applied Biosystems (ABI) data.

PMID: 20459815

BMC Bioinformatics. 2010 Apr 29;11 Suppl 3:S6.
Towards reliable isoform quantification using RNA-SEQ data.
Howard BE, Heber S.

Bioinformatics Research Center, North Carolina State University, Raleigh, 27606, USA.
Abstract
BACKGROUND: In eukaryotes, alternative splicing often generates multiple splice variants from a single gene. Here we explore the use of RNA sequencing (RNA-Seq) datasets to address the isoform quantification problem. Given a set of known splice variants, the goal is to estimate the relative abundance of the individual variants. METHODS: Our method employs a linear models framework to estimate the ratios of known isoforms in a sample. A key feature of our method is that it takes into account the non-uniformity of RNA-Seq read positions along the targeted transcripts. RESULTS: Preliminary tests indicate that the model performs well on both simulated and real data. In two publicly available RNA-Seq datasets, we identified several alternatively-spliced genes with switch-like, on/off expression properties, as well as a number of other genes that varied more subtly in isoform expression. In many cases, genes exhibiting differential expression of alternatively spliced transcripts were not differentially expressed at the gene level. CONCLUSIONS: Given that changes in isoform expression level frequently involve a continuum of isoform ratios, rather than all-or-nothing expression, and that they are often independent of general gene expression changes, we anticipate that our research will contribute to revealing a so far uninvestigated layer of the transcriptome. We believe that, in the future, researchers will prioritize genes for functional analysis based not only on observed changes in gene expression levels, but also on changes in alternative splicing.

PMID: 20438653

BMC Genomics. 2010 May 5;11(1):282. [Epub ahead of print]
A comparison of massively parallel nucleotide sequencing with oligonucleotide microarrays for global transcription profiling.
Bradford JR, Hey Y, Yates T, Li Y, Pepper SD, Miller CJ.

Abstract
BACKGROUND: RNA-Seq exploits the rapid generation of gigabases of sequence data by Massively Parallel Nucleotide Sequencing, allowing for the mapping and digital quantification of whole transcriptomes. Whilst previous comparisons between RNA-Seq and microarrays have been performed at the level of gene expression, in this study we adopt a more fine-grained approach. Using RNA samples from a normal human breast epithelial cell line (MCF-10a) and a breast cancer cell line (MCF-7), we present a comprehensive comparison between RNA-Seq data generated on the Applied Biosystems SOLiD platform and data from Affymetrix Exon 1.0ST arrays. The use of Exon arrays makes it possible to assess the performance of RNA-Seq in two key areas: detection of expression at the granularity of individual exons, and discovery of transcription outside annotated loci. RESULTS: We found a high degree of correspondence between the two platforms in terms of exon-level fold changes and detection. For example, over 80% of exons detected in RNA-Seq were also detected on the Exon array, and 91% of exons flagged as changing from Absent to Present on at least one platform had fold-changes in the same direction. The greatest detection correspondence was seen when the read count threshold at which to flag exons Absent in the SOLiD data was set to t<1, suggesting that the background error rate is extremely low in RNA-Seq. We also found RNA-Seq more sensitive to detecting differentially expressed exons than the Exon array, reflecting the wider dynamic range achievable on the SOLiD platform. In addition, we find significant evidence of novel protein coding regions outside known exons, 93% of which map to Exon array probesets, and are able to infer the presence of thousands of novel transcripts through the detection of previously unreported exon-exon junctions. CONCLUSIONS: By focusing on exon-level expression, we present the most fine-grained comparison between the RNA-Seq and microarrays to date. Overall, our study demonstrates that data from a SOLiD RNA-Seq experiment are sufficient to generate results comparable to those produced from Affymetrix Exon arrays, even using only a single replicate from each platform, and when presented with a large genome.

PMID: 20444259

Papers on the sensitivity of mRNA-seq?
