Hi guys,
I am struggling with an issue that I am not sure how to solve.
In brief, I have run 4 sessions of paired-end (PE) sequencing on human cells (3 cohorts: Young, Old and Control). I followed the TopHat-Cufflinks-Cuffmerge-Cuffdiff pipeline and visualized the data with cummeRbund.
In the MDS plot I see 3 distinct clusters, and the samples clearly cluster by sequencing session rather than by cohort. I've checked the quality of the data several times, and all the logs look OK. The only "weird" behavior is the rate of properly paired reads, which averages 80% for one group, 72% for another and 60% for the third.
Could this alone cause such a strange clustering, or is it rather a "batch effect"? Any suggestions for solving it?
BTW, I also have HTSeq counts for this data... do you think I should use them in a different program, such as DESeq or edgeR?
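To make the question concrete, here is a minimal sketch of the kind of check I have in mind on the HTSeq counts: for each gene, how much of the variance in log2 CPM is explained by sequencing session versus cohort. Everything in it is an assumption for illustration only. The function names (`log_cpm`, `eta_squared`, `median_eta`), the 12-sample layout, and the simulated counts are made up and are not my real data or any library's API.

```python
import math
import random
from statistics import mean, median

def log_cpm(counts, prior=0.5):
    """Convert a genes x samples list of count rows to log2 counts-per-million."""
    n_samples = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [
        [math.log2((row[j] + prior) / (lib_sizes[j] + 2 * prior) * 1e6)
         for j in range(n_samples)]
        for row in counts
    ]

def eta_squared(values, labels):
    """Fraction of the variance in `values` explained by the grouping `labels`."""
    grand = mean(values)
    total = sum((v - grand) ** 2 for v in values)
    if total == 0:
        return 0.0
    between = 0.0
    for g in set(labels):
        group = [v for v, lab in zip(values, labels) if lab == g]
        between += len(group) * (mean(group) - grand) ** 2
    return between / total

def median_eta(expr, labels):
    """Median over genes of the variance fraction explained by `labels`."""
    return median(eta_squared(row, labels) for row in expr)

# Hypothetical layout: 3 sessions x 4 samples, cohorts spread across sessions.
session = ["S1"] * 4 + ["S2"] * 4 + ["S3"] * 4
cohort = ["Young", "Old", "Control", "Young",
          "Old", "Control", "Young", "Old",
          "Control", "Young", "Old", "Control"]

# Simulated counts with a strong per-gene session effect and no cohort effect.
random.seed(0)
counts = []
for _ in range(200):
    base = random.uniform(50, 500)
    effect = {s: random.lognormvariate(0, 0.5) for s in ("S1", "S2", "S3")}
    row = []
    for s in session:
        mu = base * effect[s]
        # Normal approximation to count noise; clipped at zero.
        row.append(max(0, round(random.gauss(mu, math.sqrt(mu)))))
    counts.append(row)

expr = log_cpm(counts)
eta_session = median_eta(expr, session)
eta_cohort = median_eta(expr, cohort)
print(f"median variance explained by session: {eta_session:.2f}")
print(f"median variance explained by cohort:  {eta_cohort:.2f}")
```

With my real counts matrix in place of the simulated one, a session fraction much larger than the cohort fraction would point to a batch effect rather than biology.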
Please help!!!
Manu