I've discovered that my RNA-seq libraries contain varying levels of a very abundant transcript that is derived from mitochondria but seems to be present even after polyA selection. It accounts for about 10-20% of the reads in most libraries, although in one library it is as low as 3% and in another it is 50%. Clustering the libraries with DESeq shows that the library with 50% contamination is an outlier: it doesn't cluster with the other replicates. I'm concerned that this transcript will skew the normalization procedure used by DESeq. Would it be best to remove the counts for this gene before running DESeq? How are people dealing with libraries that have unusually high levels of rRNA contamination?
Cheers