SEQanswers

08-28-2013, 06:40 PM   #1
rnastar
Member
 
Location: Boston, MA

Join Date: Aug 2013
Posts: 13
RNA-seq read depths: observed vs. expected

Dear all,

We recently submitted RNA-seq samples to a local facility for sequencing (20 cancer samples, 20 controls), aiming for approximately 100 million paired-end 50 bp reads per sample. However, after sequencing we found that several samples contained only 6 million reads and others 50-80 million reads. Only about 5 samples had 90 million reads or more, with one sample having 180 million reads.

I am not sure what to make of this. There were some concerns regarding the RNA quality of some of the samples, but I am not sure whether that could lead to such low output. Our contact at the facility seems to suggest it is the RNA quality, but I wanted to ask you experts just to be sure.

The FastQC analysis of the sequences does not show any significant quality issues; however, it appears that filtering may be occurring during or just after the sequencing itself. If anyone has any ideas, I would be very appreciative.
08-28-2013, 07:19 PM   #2
Wallysb01
Senior Member
 
Location: San Francisco, CA

Join Date: Feb 2011
Posts: 286

If you're pooling 40 samples together to spread across all the lanes, then it's very important to get the molar ratios correct. I suppose that much is obvious. It is also important that all the samples have a similar size distribution. Illumina tech just prefers smaller inserts, so if you had some samples with 250 bp average sizes and some with 600 bp, that could make a big difference in the clustering efficiency of each sample. RNA integrity is an issue too, but of the three things, RNA integrity shouldn't have as much effect on sequencing depth in pooled samples. If you got the other two things right, it just shouldn't be an issue in this case (it could lead to poor data for other reasons, just not really in relative sequencing depth).
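
As a rough illustration of the molar-ratio point, here is a minimal Python sketch. It assumes the common approximation of about 660 g/mol per base pair of double-stranded DNA. The function names, concentrations, and fragment sizes are made up for illustration and do not come from this thread.

```python
# Approximate average mass of one base pair of double-stranded DNA, in g/mol.
AVG_BP_MASS = 660.0

def library_molarity_nM(conc_ng_per_ul, avg_fragment_bp):
    """Convert a library concentration (ng/uL) to molarity (nM)."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * avg_fragment_bp)

def equimolar_volumes(libraries, target_fmol):
    """Volume (uL) of each library that adds `target_fmol` femtomoles to the pool.

    `libraries` maps a name to (concentration in ng/uL, average fragment size in bp).
    Note that 1 nM is the same as 1 fmol/uL.
    """
    return {
        name: target_fmol / library_molarity_nM(conc, size)
        for name, (conc, size) in libraries.items()
    }

# Hypothetical libraries: same mass concentration, different fragment sizes.
libs = {"short_insert": (10.0, 250), "long_insert": (10.0, 600)}
for name, vol in equimolar_volumes(libs, target_fmol=100).items():
    print(f"{name}: {vol:.2f} uL")
```

At equal ng/uL the 600 bp library holds fewer molecules per microlitre, so pooling equal volumes would under-represent it even before any clustering bias comes into play.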

Also, if only 5 samples came within 90% of your expected sequencing depth, I would suspect something went wrong with the actual sequencing. Either clustering didn't work well or barcodes weren't read correctly for a lot of the reads. Though I am curious: you say you have 40 samples total and you're sequencing to 100M PE reads each. That would be 20 HiSeq lanes, as you shouldn't expect more than 200M PE reads per lane. Is that what you actually did? Or are you counting them like single-end reads, leading to 400M total reads per lane?

Most people are now sequencing about 50M reads per sample (either 2x100, so really 100M reads, but they are paired so statistically it's still 50M, or 1x50). So if most of your samples are around 50M-80M, that should be fine.
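
The lane arithmetic in this post can be sketched in a few lines of Python. The 200M pairs per lane figure is the upper bound quoted in this post, not a measured value, and the function names are only illustrative.

```python
import math

# Upper bound on read pairs per HiSeq lane as quoted in this post (an assumption).
PAIRS_PER_LANE = 200_000_000

def lanes_needed(n_samples, pairs_per_sample, pairs_per_lane=PAIRS_PER_LANE):
    """Lanes required to give every sample the requested number of read pairs."""
    return math.ceil(n_samples * pairs_per_sample / pairs_per_lane)

def total_reads(read_pairs):
    """Individual reads when each pair is counted as two reads."""
    return 2 * read_pairs

print(lanes_needed(40, 100_000_000))  # 40 samples at 100M pairs each -> 20
print(total_reads(PAIRS_PER_LANE))    # 200M pairs counted singly -> 400000000
```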
08-28-2013, 07:48 PM   #3
rnastar
Member
 
Location: Boston, MA

Join Date: Aug 2013
Posts: 13

Thank you for the reply! For clarification, I meant that we are getting only 50-80 million paired-end reads, that is, only 100-160 million reads total for a given sample. It sounds like something went wrong with the sequencing, but the facility may not want to tell us (this is not through Illumina, but a local university).

In terms of downstream analyses, we tried to look at alternative splicing (our main interest) with cuffdiff on all samples, and when we did so we found no significant alternative splicing events. When I filtered out samples that had fewer than 90 million paired-end reads off the sequencer, we got about 600 significant alternatively spliced genes and a lot of DE genes. I am wondering whether filtering out samples based on the resulting sequencing depth is the way to go, or whether we should question the entire set to begin with. In mapping with TopHat, in almost all samples I am seeing a lot of reads mapping to multiple places in the genome. So where we had 200M sequenced reads (100M paired-end reads), we observe almost 300-400M reads in the accepted_hits.bam file. This is all making me a bit nervous.
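
One way to see why a BAM file can hold more records than sequenced reads is to count alignment records and distinct reads separately. The following is a minimal Python sketch that works on SAM-formatted text lines (for example the output of `samtools view`). The function name and the toy records are hypothetical, and it keeps every read name in memory, so it is meant for a subsample rather than a full 300M-record file.

```python
from collections import Counter

def summarize_alignments(sam_lines):
    """Count alignment records and distinct mapped reads in SAM text lines."""
    hits_per_read = Counter()
    for line in sam_lines:
        if line.startswith("@"):   # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & 0x4:             # read is unmapped
            continue
        # Tell mates apart using the first-in-pair / second-in-pair flag bits.
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        hits_per_read[(qname, mate)] += 1
    return {
        "records": sum(hits_per_read.values()),
        "distinct_reads": len(hits_per_read),
        "multi_mapped_reads": sum(1 for n in hits_per_read.values() if n > 1),
    }

# Toy records: read r2/1 aligns to two places, so 5 records describe 4 reads.
toy = [
    "@HD\tVN:1.0",
    "r1\t67\tchr1\t100\t50\t50M\t=\t300\t250\t*\t*",
    "r1\t131\tchr1\t300\t50\t50M\t=\t100\t-250\t*\t*",
    "r2\t65\tchr1\t500\t1\t50M\t=\t700\t250\t*\t*",
    "r2\t321\tchr2\t900\t1\t50M\t=\t700\t0\t*\t*",
    "r2\t129\tchr1\t700\t1\t50M\t=\t500\t-250\t*\t*",
]
print(summarize_alignments(toy))
```

If the distinct read count is close to the number of reads that came off the sequencer, the extra records are multiple alignments of the same reads rather than extra data.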
08-28-2013, 08:43 PM   #4
Wallysb01
Senior Member
 
Location: San Francisco, CA

Join Date: Feb 2011
Posts: 286

50-80M paired-end reads per sample and 20 replicates each for control and cancer cells is a huge data set for RNA-seq. Even if that's not what you paid for, you should be able to find plenty of differentially expressed isoforms, if they are there to find. And TopHat -> cuffdiff is probably the best way to go with isoforms. The other option is to use DESeq for exon-level tests to find differentially expressed exons, then track them back to the isoforms they could be from. It's interesting you chose 2x50 reads for isoform tests. While it's good you went for paired end, an extra 50 bp on each read would have been pretty helpful when it comes to resolving isoforms.

I think you are right to set a read depth cutoff for including your replicates. I'd suggest maybe 20M-30M PE reads, but it might depend on what your per-sample read depth distribution looks like.
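
A depth cutoff like this is easy to apply before building the cuffdiff sample lists. Here is a minimal Python sketch. The sample names and depths are hypothetical (only loosely shaped like the numbers in this thread), and the 20M cutoff is just the low end of the range suggested above.

```python
def split_by_depth(pairs_by_sample, min_pairs):
    """Split samples into (kept, dropped) by a minimum number of read pairs."""
    kept, dropped = {}, {}
    for sample, n_pairs in pairs_by_sample.items():
        (kept if n_pairs >= min_pairs else dropped)[sample] = n_pairs
    return kept, dropped

# Hypothetical per-sample read-pair counts.
depths = {
    "cancer_01": 6_000_000,
    "cancer_02": 55_000_000,
    "control_01": 80_000_000,
    "control_02": 180_000_000,
}
kept, dropped = split_by_depth(depths, min_pairs=20_000_000)
print(sorted(kept))     # ['cancer_02', 'control_01', 'control_02']
print(sorted(dropped))  # ['cancer_01']
```

Whatever cutoff is used, it is worth checking how many replicates remain in each condition afterwards, since dropping samples from only one group changes the balance of the comparison.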

As for whether there was a problem with the sequencing, do you know how many lanes you paid for? Without knowing that, it's hard to judge just how wrong the sequencing might have gone.
08-29-2013, 09:45 AM   #5
rnastar
Member
 
Location: Boston, MA

Join Date: Aug 2013
Posts: 13

I just followed up on this, and it looks like we sequenced two individuals per lane, so we duplexed the sequencing. From what we are seeing, it looks like the variability in sequencing depth is specific to this set of samples and is not seen as much in other projects we have done.

Tags
low depth, low quality, rna-seq

All times are GMT -8.

