SEQanswers

Old 04-20-2015, 02:13 AM   #1
tokikake
Member
 
Location: Germany

Join Date: Nov 2011
Posts: 24
Strange mapping bias inside/outside genes in prokaryotic RNASeq

Hi everyone,

As mentioned in the title, I have encountered a strange problem in a recent RNASeq experiment that I have never seen before.

Some background:
Stranded RNASeq of prokaryotic mRNA; three samples (1 and 2: same condition, different harvest time points) with three replicates each; total RNA prep; DNase digestion; rRNA depletion; library prep with the dUTP method; Illumina sequencing; trimming; filtering of remaining tRNA/rRNA reads; mapping with bowtie2.

After read counting, normalization and differential expression analysis with baySeq, I saw a strange bias towards one sample, which seems to dominate the rest. In the end I found that the summed read counts (of reads mapped inside genes) are significantly lower in 6 out of 9 samples, which affects the results of the differential expression analysis. I have never had such a problem before. I spot-checked ~6 other RNASeq projects, and even where the input read numbers differ, the normalized read counts are evenly distributed.

To find the problem I counted all reads (sense and antisense reads, inside and outside genes; see attachment). To my surprise, there is a heavy bias towards reads mapped outside genes in the problematic samples (2 and 3, all replicates).
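For anyone who wants to repeat this kind of count, here is a minimal Python sketch of the inside/outside and sense/antisense classification. It is not the script used for the attachment. The function names, the tuple layout, and the toy coordinates are made up for illustration, and it assumes single-end reads from a dUTP-style library, where a read maps to the strand opposite its transcript (check this for your own kit). Reads outside genes are lumped into one class here rather than split by strand.

```python
from collections import Counter

def classify_read(start, end, strand, genes, reverse_stranded=True):
    """Return 'sense', 'antisense', or 'outside' for one mapped read.

    genes: list of (gene_start, gene_end, gene_strand), 1-based inclusive.
    reverse_stranded: True for dUTP-style libraries, where a single-end
    read maps to the strand opposite its transcript (an assumption).
    """
    # Strand of the transcript the read most likely came from.
    if reverse_stranded:
        origin = "-" if strand == "+" else "+"
    else:
        origin = strand
    # Any overlap of at least one base counts as "inside".
    overlapping = [g for g in genes if start <= g[1] and end >= g[0]]
    if not overlapping:
        return "outside"
    # Sense if any overlapping gene lies on the strand of origin.
    if any(g[2] == origin for g in overlapping):
        return "sense"
    return "antisense"

def summarize(reads, genes, reverse_stranded=True):
    """Fraction of reads per class; reads are (start, end, strand) tuples."""
    counts = Counter(
        classify_read(s, e, st, genes, reverse_stranded) for s, e, st in reads
    )
    total = sum(counts.values())
    classes = ("sense", "antisense", "outside")
    if total == 0:
        return {k: 0.0 for k in classes}
    return {k: counts[k] / total for k in classes}

# Toy data, invented for illustration only.
genes = [(100, 500, "+"), (800, 1200, "-")]
reads = [
    (150, 200, "-"),    # inside gene 1, opposite strand -> sense (dUTP)
    (300, 350, "+"),    # inside gene 1, same strand -> antisense (dUTP)
    (600, 650, "+"),    # intergenic
    (900, 950, "+"),    # inside gene 2, opposite strand -> sense (dUTP)
    (1300, 1350, "-"),  # intergenic
]
print(summarize(reads, genes))
```

Comparing the "outside" fraction across all nine libraries would show whether the shift is shared by every replicate of a sample. For real data the linear scan over genes is slow, so an interval index or an existing read-counting tool is the better choice.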

Does anyone have a clue what the reason for this could be?

I thought about DNA contamination, but then I would expect an equal distribution of sense and antisense reads. Or the dUTP library prep, because it is not sufficient for high-GC organisms, but again there should be an equal distribution over the genome and, of course, across all samples. All samples except 3.1 and 3.2 were treated (DNase, depletion, library prep) in parallel.
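The strand argument can be turned into a quick per-sample check. This is a rough sketch: the helper name and the counts below are invented for illustration and are not taken from readcounts.txt. The idea is that genomic DNA carries no strand information, so a library dominated by DNA-derived reads should show an antisense fraction inside genes near 0.5, while a clean stranded RNA library should stay well below that.

```python
def antisense_fraction(sense, antisense):
    """Fraction of in-gene reads that map antisense to the annotation."""
    total = sense + antisense
    if total == 0:
        raise ValueError("no reads inside genes")
    return antisense / total

# Made-up counts for illustration only; not taken from readcounts.txt.
toy = {
    "sample_A": (9500, 500),   # mostly sense: looks like stranded RNA
    "sample_B": (5200, 4800),  # close to 50:50: strand information lost
}
for name, (sense, antisense) in toy.items():
    print(name, round(antisense_fraction(sense, antisense), 2))
```

Genuine antisense transcription also raises this fraction, so it only helps as a comparison between samples from the same organism and library prep.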

I would really appreciate any hint or suggestion!

Tokikake
Attached Files
File Type: txt readcounts.txt (1.6 KB, 5 views)
Old 04-21-2015, 04:49 AM   #2
bastianwur
Member
 
Location: Germany/Netherlands

Join Date: Feb 2014
Posts: 98

No clue what the biological cause could be.
What counts as a gene in your case? Only protein-coding sequences, or also noncoding RNA? If noncoding RNA is included, does that cover anything besides tRNA and rRNA?
I ask because rRNA read removal (assuming you don't do it by mapping to your reference, but with e.g. SortMeRNA) is not fully reliable (it depends on how close your organism is to anything in the database), and you might still end up with a ton of rRNA/tRNA reads. And even if that's not the case, tmRNA might also have a very high expression value.

And what does the FastQC report say?
Old 04-23-2015, 06:48 AM   #3
tokikake
Member
 
Location: Germany

Join Date: Nov 2011
Posts: 24

Hi bastian,

Thank you for your fast answer!
The rRNA/tRNA reads were filtered by mapping against the reference (not with SortMeRNA). But the tmRNA was a good point. I added it to my annotation (so "genes" now means genes in general, not only protein-coding ones) and found up to 13% more reads mapping inside genes. I am now trying to annotate the remaining RNAs with Rfam and Infernal, and hope this reduces the number of reads mapped outside genes.
Nevertheless it is strange: I cannot see the same mapping bias in other projects where I also did not annotate the tmRNA (or any other RNA prior to mapping, except tRNAs/rRNAs of course). But: you never stop learning!

Has anyone else ever checked the number of reads mapped inside/outside genes? It would be interesting to know!
All times are GMT -8.

