Hi everyone,
As mentioned in the title, I have encountered a strange problem in a recent RNASeq experiment that I have never seen before.
Some background:
Stranded RNASeq of prokaryotic mRNA; three samples (1 and 2 same condition, different harvest timepoint) with three replicates each; total RNA prep; DNase digestion; rRNA depletion; library prep with the dUTP method; Illumina sequencing; trimming; filtering of remaining tRNA/rRNA reads; mapping with bowtie2.
After read counting, normalization and differential expression analysis with baySeq, I saw a strange bias towards one sample, which seems to dominate the rest. In the end I found out that the summed read counts (mapped reads inside genes) are significantly lower in 6 out of 9 samples, which affects the results of the differential expression analysis. I have never had such a problem before. I spot-checked ~6 other RNASeq projects, and even where the input read numbers differ, the normalized read counts are evenly distributed.
To find out what the problem is, I counted all reads (sense and antisense reads, inside and outside genes; see attachment). To my surprise, there is a heavy bias towards reads mapping outside genes in the problematic samples (2 and 3, all replicates).
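For anyone who wants to reproduce this kind of breakdown, the counting can be sketched roughly as below. This is only a minimal illustration, not the script behind the attachment: the function names `classify_read` and `category_fractions`, the tuple layout for genes and reads, and the `reverse_stranded` flag are all made up for this example, and it assumes a single replicon with inclusive coordinates. For dUTP libraries the first read aligns to the strand opposite the transcript, which is why the strand is flipped by default.

```python
from collections import Counter


def classify_read(start, end, strand, genes, reverse_stranded=True):
    """Label one aligned read as 'sense', 'antisense' or 'intergenic'.

    genes: list of (gene_start, gene_end, gene_strand), inclusive coordinates.
    reverse_stranded: True for dUTP libraries (assumption: single-end or
    first-in-pair reads, which align opposite to the transcript strand).
    """
    if reverse_stranded:
        # Convert the read strand into the inferred transcript strand.
        strand = "-" if strand == "+" else "+"
    label = "intergenic"
    for g_start, g_end, g_strand in genes:
        if start <= g_end and end >= g_start:  # any overlap with this gene
            if g_strand == strand:
                return "sense"
            label = "antisense"  # keep looking for a sense overlap
    return label


def category_fractions(reads, genes, reverse_stranded=True):
    """Return the fraction of reads in each category for one sample."""
    counts = Counter(
        classify_read(s, e, st, genes, reverse_stranded) for s, e, st in reads
    )
    total = sum(counts.values())
    categories = ("sense", "antisense", "intergenic")
    if total == 0:
        return {c: 0.0 for c in categories}
    return {c: counts[c] / total for c in categories}


# Toy data (hypothetical): two genes and four reads as (start, end, strand).
genes = [(100, 500, "+"), (800, 1200, "-")]
reads = [(150, 200, "-"), (300, 350, "+"), (600, 650, "+"), (900, 950, "+")]
fractions = category_fractions(reads, genes)
print(fractions)
```

Comparing these fractions per sample rather than the raw counts makes it easier to see whether the intergenic share itself shifts between samples or only the sequencing depth does.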
Does anyone have a clue what the reason for this is?
I thought about DNA contamination, but then I would expect an equal distribution of sense and antisense reads. Or the dUTP library prep, because it is not sufficient for high-GC organisms, but again there should be an equal distribution over the genome and, of course, over all samples. All samples except 3.1 and 3.2 were processed (DNase, depletion, library prep) in parallel.
I would really appreciate any hint or suggestion!
Tokikake