SEQanswers

Old 04-20-2017, 07:01 AM   #1
gabe_rosser
Junior Member
 
Location: London, UK

Join Date: Mar 2017
Posts: 6
Low gene count assignment for ribo-depleted RNA-Seq data

I originally posted this question on biostars but received little response. I'll be sure to update either post if I receive any more detail.

Until recently, we have used a poly(A) selection process to prepare our RNA-Seq libraries. In our last run we had to use a ribo-depletion approach instead, as we want to study some formalin-fixed (FF) material with degraded RNA. The facility use Illumina's Ribo-Zero kit. We otherwise kept the same sequencing parameters: paired-end 75bp reverse stranded on an Illumina HiSeq 4000.

Since we don't know how well the FF material represents the original tissue, we also sequenced a few frozen tissue samples, with the intention of comparing the two (though they are _not_ perfectly matched). In total we have 3 FF samples and 2 frozen samples.

Short version:

Both frozen tissue and FFPE results show a low proportion of reads being assigned to an exon. This is ~25% for FFPE and ~60% for frozen samples, which I did not expect. Is this an issue and can I still compare the two after normalisation for different effective library sizes?

More detail:

I ran the reads through my usual pipeline:

FastQC: all looked OK; some highly duplicated sequences, probably rRNA-associated, but nothing too major.
STAR alignment resulted in ~90% of reads being uniquely mapped in all cases (similar to our poly(A) samples).
I had STAR run gene counts during alignment. The results differed from what I've typically seen in the poly(A) data in terms of the % of reads that are assigned to a (unique) gene:

Poly(A): we usually get 80-85%
Ribo-depleted FF samples: 24%, 24%, 26%
Ribo-depleted frozen samples: 58%, 59%
So in both cases the numbers assigned are far lower than for poly(A), and this is especially bad for the FF samples. Most of the reads that were not assigned belonged in the 'no feature' category, i.e. they didn't overlap with any exon.
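
For anyone wanting to reproduce this kind of percentage, here is a minimal sketch of how it could be computed from a STAR ReadsPerGene.out.tab file. It assumes the standard four-column layout (gene ID, unstranded counts, forward-stranded counts, reverse-stranded counts) with the N_unmapped, N_multimapping, N_noFeature and N_ambiguous summary rows at the top. The function name, the choice of denominator (uniquely mapped reads only) and the toy numbers are my own assumptions for illustration, not the exact calculation used above.

```python
import csv
import io


def assigned_fraction(handle, column=3):
    """Fraction of uniquely mapped reads that STAR assigned to a gene.

    column=3 selects the reverse-stranded counts (the last column),
    which is an assumption matching a reverse-stranded library.
    """
    summary = {}
    assigned = 0
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        name, value = row[0], int(row[column])
        if name.startswith("N_"):
            # Summary rows: N_unmapped, N_multimapping, N_noFeature, N_ambiguous
            summary[name] = value
        else:
            assigned += value
    # Assumed denominator: uniquely mapped reads only
    unique = assigned + summary.get("N_noFeature", 0) + summary.get("N_ambiguous", 0)
    return assigned / unique if unique else 0.0


# Made-up toy table, purely to show the file layout
TOY_ROWS = [
    ("N_unmapped", "100", "100", "100"),
    ("N_multimapping", "50", "50", "50"),
    ("N_noFeature", "300", "700", "300"),
    ("N_ambiguous", "20", "5", "20"),
    ("GENE_A", "400", "150", "400"),
    ("GENE_B", "280", "145", "280"),
]
TOY_TABLE = "\n".join("\t".join(r) for r in TOY_ROWS)

if __name__ == "__main__":
    print(f"{assigned_fraction(io.StringIO(TOY_TABLE)):.2%}")
```

For a real file, open it with open(path, newline="") and pass the handle in; a high N_noFeature value in the chosen column corresponds to the 'no feature' category described above.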

It occurs to me that this difference is probably due to the larger variety of RNA species: poly(A) should enrich primarily for mRNA, while ribo-depletion leaves in ncRNA species, etc. Therefore fewer reads will be mRNA and fall within an exon for gene counting purposes. I ran ezBAMqc to check the distribution of the aligned reads in the BAM files:

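[Figure: ezBAMqc read distribution, FF sample]

[Figure: ezBAMqc read distribution, frozen sample]

I dug out a similar plot for one of our poly(A) samples (below). The % of intronic reads is indeed much lower.

[Figure: ezBAMqc read distribution, poly(A) sample]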

My hypothesis: the FF library is dominated by species other than mRNA.

Does this sound like a reasonable explanation?

Is the very low proportion of exon-assigned counts a problem (other than being wasteful)?

Is it still reasonable to compare the gene counts of the FF and frozen samples? I would normalise for the total number of reads, but is that sufficient?
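
To make the normalisation I have in mind concrete, here is a rough sketch of simple library-size scaling (counts per million), using the number of gene-assigned reads as the effective library size. The function name and the toy counts are hypothetical. Note that this only corrects for depth: it does not account for differences in RNA composition between samples, which is part of what I am unsure about.

```python
def counts_per_million(counts):
    """Scale raw gene counts so that they sum to one million.

    `counts` maps gene ID -> raw read count for a single sample.
    The effective library size is taken to be the sum of assigned
    counts (an assumption; total sequenced reads is another option).
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no assigned reads")
    return {gene: count * 1e6 / total for gene, count in counts.items()}


# Made-up counts: same relative expression, different depth
ffpe_sample = {"GENE_A": 250, "GENE_B": 750}
frozen_sample = {"GENE_A": 1500, "GENE_B": 4500}

if __name__ == "__main__":
    print(counts_per_million(ffpe_sample))
    print(counts_per_million(frozen_sample))
```

In this toy case the two samples end up with identical scaled values because they differ only in depth. Real FF and frozen samples may also differ in which RNA species dominate the library, which this scaling would not correct.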

Thanks for any thoughts.
Old 04-20-2017, 08:22 AM   #2
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,204

What do you mean by your comment that the frozen and FF samples are "not perfectly matched"?

The only unexpected aspect of these results to me is the difference between the FF and frozen numbers of intron-mapping reads. But given the large size of animal introns, it would not take a large percentage of non-spliced transcripts to produce a large number of intron-mapping reads.
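
A back-of-the-envelope sketch of that intron-size argument, under loud assumptions: reads are sampled in proportion to the amount of sequence present, every transcript of the gene is either fully spliced or fully unspliced, and the lengths below are invented round numbers, not measurements from any real gene.

```python
def expected_intronic_fraction(exon_bp, intron_bp, unspliced_fraction):
    """Expected fraction of a gene's reads that fall in introns.

    Assumes reads are drawn uniformly along each molecule, so the
    read yield is proportional to sequence length. Every molecule
    contributes exon_bp of exonic sequence; only the unspliced
    fraction contributes intron_bp of intronic sequence.
    """
    if not 0.0 <= unspliced_fraction <= 1.0:
        raise ValueError("unspliced_fraction must be between 0 and 1")
    intronic = unspliced_fraction * intron_bp
    total = exon_bp + intronic
    return intronic / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical gene: 3 kb of exons, 50 kb of introns, 5% unspliced
    print(f"{expected_intronic_fraction(3_000, 50_000, 0.05):.1%}")
```

With these made-up lengths, a small unspliced fraction already yields a sizeable share of intronic reads, simply because the introns are much longer than the exons.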

But as to why the number of intron-mapping reads would be different between FF and frozen cell RNA preps -- that has me mystified. Unless the FF cells were treated with something that would stop transcription but allow transcript maturation to continue.

--
Phillip
Old 04-21-2017, 01:14 AM   #3
gabe_rosser
Junior Member
 
Location: London, UK

Join Date: Mar 2017
Posts: 6

Quote:
Originally Posted by pmiguel View Post
What do you mean by your comment that the frozen and FF samples are "not perfectly matched"?
Sorry, that was a bit cryptic in retrospect! I mean that the samples are clinical; the surgeon was not attempting to capture the exact same tissue in the FF and frozen samples. So whilst they are 'patient matched', we can't be certain that their epigenetic profiles are comparable, because they may have differing tumour content and differing amounts of healthy cells. I wouldn't imagine that this could lead to such a large difference in the intronic content, though?

Quote:
The only unexpected aspect of these results to me is the difference between the FF and frozen numbers of intron-mapping reads. But given the large size of animal introns, it would not take a large percentage of non-spliced transcripts to produce a large number of intron-mapping reads.

But as to why the number of intron-mapping reads would be different between FF and frozen cell RNA preps -- that has me mystified. Unless the FF cells were treated with something that would stop transcription but allow transcript maturation to continue.
I'll check, but I think the formalin fixing process was fairly standard clinical practice. Also quite straightforward for frozen tissue: mash it up (a joyful task, I'm assured), extract total RNA, run library preparation.

Thanks for your thoughts.

Last edited by gabe_rosser; 04-21-2017 at 02:27 AM. Reason: Accidentally referred to frozen as FF
Old 04-21-2017, 08:45 AM   #4
cmbetts
Member
 
Location: Bay Area

Join Date: Jun 2012
Posts: 85

You should expect to see considerably fewer counts assigned to exons, mostly replaced by intron-derived reads, when comparing a ribo-depleted library to a dT-purified one.
I'm surprised that your FF has lower exon counts than the FFPE, though. Generally, I've seen decreased exon and increased intron counts in FFPE samples, supposedly because retained introns are protected from degradation by the nuclear envelope.

[Figure: expected mapping statistics from dT vs depletion, from Clontech]
Old 04-24-2017, 02:10 AM   #5
gabe_rosser
Junior Member
 
Location: London, UK

Join Date: Mar 2017
Posts: 6

Quote:
Originally Posted by cmbetts View Post
You should expect to see considerably fewer counts assigned to exons, mostly replaced by intron-derived reads, when comparing a ribo-depleted library to a dT-purified one.
I'm surprised that your FF has lower exon counts than the FFPE, though. Generally, I've seen decreased exon and increased intron counts in FFPE samples, supposedly because retained introns are protected from degradation by the nuclear envelope.
Apologies, I think I've accidentally caused confusion by using the abbreviation FF = FFPE in my original post! This means my results do agree with what you've said: frozen has higher exon counts than FFPE.

Thanks for the plot, that's helpful.
Tags
fastqc, poly(a), ribo-zero, rna-seq
