SEQanswers, RNA Sequencing forum

tokikake 04-20-2015 02:13 AM

Strange mapping bias inside/outside genes in prokaryotic RNA-Seq
1 Attachment(s)
Hi everyone,

As the title says, I have run into a strange problem in a recent RNA-Seq experiment that I have never seen before.

Some background:
Stranded RNA-Seq of prokaryotic mRNA; three samples (1 and 2: same condition, different harvest time points) with three replicates each; total RNA prep; DNase digestion; rRNA depletion; library prep with the dUTP method; Illumina sequencing; trimming; filtering of remaining tRNA/rRNA reads; mapping with bowtie2.

After read counting, normalization and differential expression analysis with baySeq I saw a strange bias towards one sample, which seems to dominate the rest. In the end I found that the summed read counts (mapped reads inside genes) are significantly lower in 6 out of 9 samples, which affects the results of the differential expression analysis. I have never had this problem before. I spot-checked about 6 other RNA-Seq projects, and even where the input read numbers differ, the normalized read counts are evenly distributed.
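For anyone who wants to run the same check on their own data, here is a minimal sketch of the comparison described above: the share of mapped reads that fall inside genes, per sample, with the low samples flagged. The function names, the 0.5 cutoff and the example numbers are made up for illustration and are not taken from this experiment.

```python
def genic_fractions(genic_counts, mapped_counts):
    """Share of mapped reads that fall inside annotated genes, per sample."""
    return {s: genic_counts[s] / mapped_counts[s] for s in genic_counts}


def flag_low_samples(fractions, ratio=0.5):
    """Samples whose genic share is below `ratio` times the best sample.

    The best sample is used as the yardstick (not the median), because
    the median itself is low when most samples are affected.
    The cutoff of 0.5 is an arbitrary assumption.
    """
    best = max(fractions.values())
    return sorted(s for s, f in fractions.items() if f < ratio * best)


# Made-up example counts (NOT the real data from this thread).
genic = {"1.1": 800, "2.1": 300, "3.1": 250}
mapped = {"1.1": 1000, "2.1": 1000, "3.1": 1000}
fractions = genic_fractions(genic, mapped)
low = flag_low_samples(fractions)
```

Comparing fractions rather than raw sums keeps differences in sequencing depth out of the picture.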

To find the cause I counted all reads (sense and antisense, inside and outside genes; see attachment). To my surprise there is a heavy bias towards reads mapped outside genes in the problematic samples (2 and 3, all replicates).
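The counting itself can be sketched roughly like this. It is not the script behind the attachment, just a plain-Python illustration with assumed inputs: reads and genes are given as (start, end, strand) tuples with 0-based half-open coordinates, and the read strand is assumed to be already converted to the transcript strand (with dUTP libraries the first read maps to the opposite strand of the transcript). The linear scan over all genes is slow for real data; an interval index would be the obvious replacement.

```python
from collections import Counter


def classify_reads(reads, genes):
    """Count reads as inside_sense, inside_antisense or outside (per strand).

    reads, genes: iterables of (start, end, strand), 0-based half-open,
    strand is '+' or '-'. A read overlapping at least one gene on its own
    strand counts as sense; overlapping only opposite-strand genes counts
    as antisense.
    """
    genes = list(genes)
    counts = Counter()
    for r_start, r_end, r_strand in reads:
        overlapping = [g for g in genes if r_start < g[1] and g[0] < r_end]
        if not overlapping:
            key = "outside_plus" if r_strand == "+" else "outside_minus"
        elif any(g[2] == r_strand for g in overlapping):
            key = "inside_sense"
        else:
            key = "inside_antisense"
        counts[key] += 1
    return counts


def fraction_outside(counts):
    """Share of all classified reads that map outside any gene."""
    total = sum(counts.values())
    outside = counts["outside_plus"] + counts["outside_minus"]
    return outside / total if total else 0.0


# Toy input, invented for illustration only.
genes = [(100, 200, "+"), (300, 400, "-")]
reads = [(110, 150, "+"), (120, 160, "-"), (250, 280, "+"),
         (390, 420, "-"), (500, 550, "-")]
counts = classify_reads(reads, genes)
```

Running this per sample and comparing `fraction_outside` across samples gives the kind of overview shown in the attachment.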

Does anyone have a clue what the reason for this could be?

I thought about DNA contamination, but then I would expect an equal distribution of sense and antisense reads. Or it could be the dUTP library prep, because it is not sufficient for high-GC organisms, but again there should be an equal distribution over the genome and, of course, over all samples. All samples except 3.1 and 3.2 were treated (DNase, depletion, library prep) in parallel.
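The DNA contamination argument can be turned into a rough number. This is only a sketch under two simplifying assumptions that I am making here: reads derived from genomic DNA split 50/50 between sense and antisense, and all reads derived from RNA are sense. Under that model the antisense share a of the reads inside genes gives at most 2a as the DNA-derived share. Real antisense transcription and imperfect strand specificity both make the true value smaller, so treat it as an upper bound. The function names and the numbers in the comments are invented for illustration.

```python
def sense_fraction(sense, antisense):
    """Share of in-gene reads on the annotated (sense) strand."""
    total = sense + antisense
    if total == 0:
        raise ValueError("no reads inside genes")
    return sense / total


def max_dna_fraction(sense, antisense):
    """Upper bound on the share of in-gene reads that could come from DNA.

    Assumes DNA reads are 50% sense / 50% antisense and RNA reads are
    100% sense, so every antisense read is paired with one sense DNA read.
    """
    antisense_share = 1.0 - sense_fraction(sense, antisense)
    return min(1.0, 2.0 * antisense_share)


# Invented example: 900 sense and 100 antisense reads inside genes
# would allow at most about 20% of those reads to come from DNA.
bound = max_dna_fraction(900, 100)
```

If the antisense share inside genes is similar in the good and the problematic samples, plain DNA contamination becomes a less likely explanation for the extra reads outside genes.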

I would really appreciate any hint or suggestion!


bastianwur 04-21-2015 04:49 AM

No clue what the biological cause could be.
What counts as a gene in this case? Protein-coding sequences only, or also noncoding RNA? If noncoding RNA is included, does that cover noncoding RNAs other than tRNA and rRNA?
I ask because rRNA read removal (assuming you don't do it by mapping to your reference, but with e.g. SortMeRNA) is not fully reliable (depending on how close your organism is to anything in the database), and you might still end up with half a ton of rRNA/tRNA. And if that's not the case, tmRNA might also have a very high expression value.

And what does the FastQC report say?

tokikake 04-23-2015 06:48 AM

Hi bastian,

thank you for your fast answer!
The rRNA/tRNA reads were filtered by mapping against the reference (not with SortMeRNA). But the tmRNA was a good point. I added it to my annotations (so genes in general, not only protein-coding ones) and found up to 13% more reads mapping inside genes. I am now trying to annotate the RNAs with Rfam and Infernal and hope to reduce the number of reads mapped outside genes.
Nevertheless it is strange, and I cannot see the same mapping bias in other projects where I also didn't annotate the tmRNA (or any RNA prior to mapping, except tRNAs/rRNAs of course). But: you never stop learning!

Has anyone else ever checked the number of reads mapped inside/outside genes? It would be interesting to know!
