I have been looking at some Illumina RNA-seq samples (84 bp single-end reads, bovine, 28 samples, 5 timepoints in a developmental time course).
In 9 of the samples from this experiment we are seeing a high rate of reads aligning to intergenic regions. Because of this high background, the RPKM values for transcripts differ dramatically between the "good" samples and the "bad" samples.
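To make the RPKM effect concrete: RPKM divides a gene's read count by its length in kilobases and by the total mapped reads in millions. If intergenic reads inflate the total, every gene's RPKM drops even when its own read count is unchanged. A minimal sketch with hypothetical numbers (not our data), assuming the standard RPKM definition with all mapped reads in the denominator:

```python
def rpkm(gene_reads: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads."""
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)

# Hypothetical gene: 500 reads on a 2,000 bp transcript in both samples.
gene_reads, gene_length = 500, 2_000

# Hypothetical "good" sample: 20 M mapped reads.
good = rpkm(gene_reads, gene_length, 20_000_000)

# Hypothetical "bad" sample: the same genic reads plus 20 M extra intergenic reads.
bad = rpkm(gene_reads, gene_length, 40_000_000)

print(good, bad)  # 12.5 6.25
```

Doubling the denominator with background alone halves the RPKM, so a sample-wide shift like the one we see is what this metric would be expected to produce.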
The bad samples do not cluster into a single group: all samples in one timepoint are affected and none in another, while the remaining bad samples are scattered among the other 3 timepoints.
Has anyone else come across this problem before?
Could it be caused by problems in tissue collection or sample prep?
Alternatively could there be a biological explanation for this?
Are there any suggested lab or bioinformatic approaches to dealing with this problem?
We are evaluating applying normalization techniques to make the samples more comparable. Does anyone have any thoughts on the validity of this approach?
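One normalization option that keeps intergenic reads out of the scaling entirely is a median-of-ratios size factor (the approach used by DESeq/DESeq2), computed only from gene-level counts. A minimal sketch, assuming a genes x samples NumPy count matrix; the function name and toy counts are hypothetical:

```python
import numpy as np

def median_of_ratios_size_factors(counts: np.ndarray) -> np.ndarray:
    """Size factors from a genes x samples matrix of raw gene-level counts.

    Genes with a zero count in any sample are excluded, because their
    geometric mean is zero and their log is undefined.
    """
    counts = np.asarray(counts, dtype=float)
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    # Log of each gene's geometric mean across samples (the pseudo-reference).
    log_geo_means = log_counts.mean(axis=1)
    # Each sample's size factor is its median ratio to the pseudo-reference.
    log_ratios = log_counts - log_geo_means[:, None]
    return np.exp(np.median(log_ratios, axis=0))

# Toy data: sample 2 has exactly half the genic reads of sample 1.
counts = np.array([
    [100, 50],
    [200, 100],
    [40, 20],
    [0, 5],  # excluded from the size-factor estimate: zero in one sample
])
sf = median_of_ratios_size_factors(counts)
normalized = counts / sf  # divides each sample (column) by its size factor
```

Rescaling like this corrects the library-size effect only; it cannot restore information lost when fewer reads fall in genes, so the affected samples would still have noisier estimates for low-count genes.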