**cedance** · 11-11-2011, 04:23 AM

I'd guess it depends on the analysis you want to do on the data, or the purpose of your experiment. For SNP calling, I'd suppose this number of reads is generally sufficient. However, if you are looking at gene expression, especially at differential expression of lowly expressed genes, then more reads might help.

I'd love to see the FastQC results, to see how good RNA-Seq data can look. The libraries I am working with are fine after preprocessing (adapter clipping + quality trimming), but I have never seen one sequenced well enough to look good in the raw data.

Also, it would be great if you could tell us how much total RNA you used, and a bit about pre-amplification of the library: if it was performed, how many cycles, etc.

Thank you.

**kopi-o** · 11-11-2011, 04:46 AM

This is a hotly debated topic; see e.g. http://blog.fejes.ca/?p=607, where Anthony Fejes discusses a paper claiming that 500 million reads are needed to estimate transcription levels. There has been a kind of mini-trend lately of papers claiming that RNA-seq is actually not that much better than microarrays unless you have very deep coverage.

As cedance said, it really depends on what you are interested in. I have performed some simulations where I downsampled the data and looked at the resulting abundance estimates for isoforms from Cufflinks and other tools, and haven't seen that much difference beyond 10 million paired-end reads so far. Looking at the number of detected transcripts, it always grows with sequencing depth, but again the curve is almost flat after 10-20M reads in the cases I've looked at.
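The downsampling check described above can be sketched roughly as follows. This is a minimal illustration, not kopi-o's actual pipeline: it assumes each read has already been assigned to a single transcript ID (the `read_assignments` list is a hypothetical input), and it only counts detected transcripts rather than re-estimating abundances the way Cufflinks would.

```python
import random
from collections import Counter


def saturation_curve(read_assignments, depths, min_reads=1, seed=0):
    """Subsample reads at increasing depths and count detected transcripts.

    read_assignments: one transcript ID per read (hypothetical input; in
    practice this would come from alignment plus a quantification step).
    The reads are shuffled once, so every smaller subsample is nested in
    the larger ones and the detection curve can only stay flat or rise.
    """
    shuffled = list(read_assignments)
    random.Random(seed).shuffle(shuffled)
    curve = []
    for depth in sorted(depths):
        counts = Counter(shuffled[:depth])
        # A transcript counts as "detected" once it has min_reads reads.
        detected = sum(1 for c in counts.values() if c >= min_reads)
        curve.append((depth, detected))
    return curve


if __name__ == "__main__":
    # Toy library: one very abundant transcript plus a long tail of rare ones.
    reads = ["high"] * 900 + [f"rare{i}" for i in range(100)]
    for depth, detected in saturation_curve(reads, [10, 100, 500, 1000]):
        print(depth, detected)
```

Plotting `detected` against `depth` gives the kind of saturation curve mentioned above; where it flattens depends entirely on the library, so the 10-20M figure should not be read as a general rule.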

**adameur** · 11-11-2011, 07:04 AM

To make it even more complex, we have seen that polyA+ RNA gives a much higher fraction of reads mapping to exons compared to total RNA (rRNA depleted) where there are instead lots of intronic reads. Our explanation is that total RNA-seq captures lots of nascent transcripts that have not yet been fully transcribed while PolyA+ RNA-seq captures mainly mature transcripts (see http://dx.doi.org/10.1038/nsmb.2143).

So I think fewer reads are required for polyA+ RNA-seq compared to total RNA-seq if you are interested in mRNA expression.
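To compare polyA+ and total RNA libraries in this way, one needs the fraction of reads landing in exons. A minimal sketch, assuming each read is reduced to a single coordinate (e.g. its alignment start) and exons are given as 0-based half-open intervals; the input layout and function names are hypothetical, and dedicated read-distribution tools handle spliced and overlapping reads more carefully.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def exonic_fraction(read_positions, exons):
    """Fraction of reads whose (single) position falls inside an exon.

    read_positions: {chrom: [pos, ...]}           e.g. read start coordinates
    exons:          {chrom: [(start, end), ...]}  0-based, half-open
    """
    merged = {chrom: merge_intervals(iv) for chrom, iv in exons.items()}
    starts = {chrom: [s for s, _ in iv] for chrom, iv in merged.items()}
    total = exonic = 0
    for chrom, positions in read_positions.items():
        intervals = merged.get(chrom, [])
        for pos in positions:
            total += 1
            # Index of the rightmost exon starting at or before pos.
            i = bisect_right(starts.get(chrom, []), pos) - 1
            if i >= 0 and pos < intervals[i][1]:
                exonic += 1
    return exonic / total if total else 0.0
```

A low exonic fraction in a total RNA library means fewer reads are informative about mature mRNA levels, which is the reasoning behind needing more reads for total RNA-seq.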

**harryzs** · 11-11-2011, 08:58 AM

you should read this:

RNA-Seq Blog: http://rna-seqblog.com/information/how-many-reads-are-enough/

**pettervikman** · 11-18-2011, 01:50 AM

Thanks for all the answers. I've decided to resequence a couple of samples to a much higher depth, as well as pool some data, to see how things look in our system. I'm assuming that the coverage needed will depend on read length as well as read depth, and since we have 101 bp reads we might be better off. I'm also uncertain about the number of transcripts to expect. We're working with a highly specialised cell type, not a cell line, so I'm expecting fewer transcripts, far from all that could exist, compared with the vast numbers found in immortalised cell lines.

I'm also curious whether it depends a lot on the highly expressed genes in the sample, since they "steal" a lot of the data being produced. I know that it's possible to select the genes one is interested in, but has anyone tried removing the uninteresting/highly expressed genes to increase the coverage of the other genes? This would give higher coverage even of genes you don't know exist, compared with positive selection, where you only find what you expected to find.

I also wanted to attach a figure showing what I call high-quality data, since cedance asked for it, but the forum asks for a URL and I only have the figures on my computer. Are there any nice (fast and simple) ways of doing this?
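A back-of-the-envelope way to estimate what such depletion could buy: measure the share of reads taken by the top-n genes and assume the freed reads would be spread proportionally over the remaining genes at the same total sequencing depth. That proportional redistribution is an idealised assumption (real depletion is never complete and may introduce its own bias), and the function names and `gene_counts` input (gene ID to read count) are hypothetical.

```python
def top_gene_share(gene_counts, n):
    """Fraction of all reads taken by the n most highly expressed genes."""
    total = sum(gene_counts.values())
    top = sorted(gene_counts.values(), reverse=True)[:n]
    return sum(top) / total if total else 0.0


def coverage_gain_if_depleted(gene_counts, n):
    """Idealised factor by which reads on the remaining genes would grow if
    the top-n genes were removed before sequencing and the freed reads were
    spread proportionally over everything else (same total read budget)."""
    share = top_gene_share(gene_counts, n)
    return 1.0 / (1.0 - share) if share < 1.0 else float("inf")


if __name__ == "__main__":
    counts = {"A": 500, "B": 300, "C": 100, "D": 100}
    print(top_gene_share(counts, 1), coverage_gain_if_depleted(counts, 1))
```

For example, if the most abundant genes take half of all reads, removing them would at best roughly double the reads on everything else.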

**cedance** · 11-18-2011, 02:00 AM

Pettervikman,
About posting images: I upload them to ImageShack and paste the URL here with the URL button.

**pettervikman** · 11-18-2011, 02:38 AM

A new try for the figures

**cedance** · 11-18-2011, 04:16 AM

That looks really great. Could you also post the plots for "Sequence duplication levels" and "Per base sequence content"? These are the ones I'm not quite satisfied with in our data.

**pettervikman** · 11-18-2011, 04:58 AM

Here are the per base content and duplication levels. Since we've used poly-A tail pulldown, I'm not surprised by the initial increase in A/T. The duplication levels are much higher than I'd accept for a genomic project, but since there's much less diversity in the transcriptome I'm fine with this. Consider that there are hard end points that really can't be changed (the 5' and 3' ends of transcripts), and maybe only 10-15k transcripts to start with.

Another question, though. After running Cufflinks with RABT (-g), the transcript assembly looks a lot nicer. That said, does anyone know why some transcripts are labelled OK even though their FPKM_low is 0? I'm also wondering about transcripts labelled FAIL that have positive values for coverage, fpkm and fpkm_high.

To sum it up: why are transcripts with positive values for coverage, fpkm and fpkm_high and 0 for fpkm_low sometimes labelled OK, sometimes LOWDATA and sometimes FAIL?
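The duplication plot summarises how many reads belong to sequences seen once, twice, and so on. A simplified sketch of that idea, assuming exact-match duplicates over whole read sequences and pooling everything at or above `max_level`; this is not FastQC's exact method (FastQC samples and bins differently), and the function is hypothetical.

```python
from collections import Counter


def duplication_levels(sequences, max_level=10):
    """Map duplication level -> fraction of all reads at that level.

    A read whose exact sequence occurs k times contributes to level k;
    levels >= max_level are pooled into max_level.
    """
    per_sequence = Counter(sequences)
    reads_at_level = Counter()
    for count in per_sequence.values():
        reads_at_level[min(count, max_level)] += count
    total = len(sequences)
    return {level: n / total for level, n in sorted(reads_at_level.items())}


if __name__ == "__main__":
    reads = ["AAA"] * 3 + ["CCC"] + ["GGG"] * 2
    print(duplication_levels(reads))
```

With a few thousand highly expressed transcripts and fixed transcript ends, many reads necessarily share start positions and sequences, so a large fraction at high duplication levels is expected for RNA-seq in a way it would not be for genomic DNA.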

**cedance** · 11-18-2011, 05:06 AM

Thanks again. I am sorry I don't/haven't used cufflinks, yet.
1 more question!!: why is poly-A pulldown responsible for initial increase in A/T?

**kopi-o** · 11-18-2011, 05:09 AM

Petter, those data look super. Did you get them sequenced in Uppsala?

**pettervikman** · 11-18-2011, 05:23 AM

Thanks! They were sequenced here on "my" HiSeq. We have a HiSeq here at CRC in Malmö, and we're part of Lund University/LUDC (Lund University Diabetes Center).

To be super clear: the pulldown uses a poly-T oligo, and this will bind somewhere in the poly-A tail, hopefully close to the 3' end of the CDS/3' non-coding region. But if it binds further down, a few As or Ts will be sequenced before the actual transcript sequence, hence the slight increase in A/T.

**pmiguel** · 11-18-2011, 05:34 AM

Originally posted by cedance

Thanks again. I am sorry I don't/haven't used cufflinks, yet.
1 more question!!: why is poly-A pulldown responsible for initial increase in A/T?

It isn't. The non-random base distribution in the first 10 bases is attributed to hexamer-primed 2nd strand synthesis. (The hexamers do not prime perfectly randomly.)

--
Phillip
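Whichever explanation applies, the per-position bias itself is easy to quantify. A minimal sketch, assuming plain read sequences as strings (non-ACGT characters such as N are ignored); the function names are hypothetical, and the 10-point A/T or G/C difference threshold is an assumption chosen to resemble FastQC's warning level for this module, not a documented standard.

```python
from collections import Counter


def per_base_content(reads, length=None):
    """Percentage of A, C, G, T at each read position (1 dict per position)."""
    if length is None:
        length = max(len(r) for r in reads)
    profile = []
    for i in range(length):
        # Only count reads long enough to cover position i, and only ACGT.
        counts = Counter(r[i] for r in reads if i < len(r) and r[i] in "ACGT")
        total = sum(counts.values())
        profile.append({b: 100.0 * counts[b] / total if total else 0.0 for b in "ACGT"})
    return profile


def biased_positions(profile, threshold=10.0):
    """1-based positions where |A-T| or |G-C| exceeds threshold percentage points."""
    flagged = []
    for i, pct in enumerate(profile):
        if max(abs(pct["A"] - pct["T"]), abs(pct["G"] - pct["C"])) > threshold:
            flagged.append(i + 1)
    return flagged


if __name__ == "__main__":
    print(biased_positions(per_base_content(["ACGT", "TGCA", "AAGT"])))
```

If the flagged positions are confined to roughly the first 10 bases of every read, that pattern is consistent with the random-hexamer priming bias Phillip describes.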

**pettervikman** · 11-18-2011, 05:38 AM

Thanks, pmiguel. I didn't know that. But I've heard that it's much more common in RNA-seq experiments than in DNA-seq, hence the poly-A tail story. So you're saying that it depends only on the 2nd strand synthesis?


How many reads are acceptable from an RNA seq experiment