**dpryan** · 06-30-2014, 06:57 AM

That's a pretty classic batch effect. Use the SVA package (specifically, ComBat()) with DESeq2/edgeR/limma and you'll get more meaningful results. As to why this occurred, who knows. I've seen prominent library creation date batch effects before, so if the libraries were made on different dates then that could certainly be the original source of the problem.
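ComBat itself is an R/Bioconductor function (in the sva package) that estimates per-gene location and scale batch effects with empirical Bayes shrinkage. The sketch below is a much simpler Python version of the core idea. It assumes a genes × samples matrix of log-scale expression and one batch label (for example, the library prep date) per sample. It only removes each batch's per-gene mean shift, with no scale adjustment and no shrinkage, so it is not a substitute for ComBat. The function name `center_by_batch` and the toy counts are hypothetical.

```python
import numpy as np


def center_by_batch(log_expr, batches):
    """Remove each batch's per-gene mean shift (location-only; NOT ComBat).

    log_expr: genes x samples array on a log scale, e.g. log2(count + 1).
    batches:  one batch label per sample, e.g. library prep date.
    No variance (scale) adjustment and no empirical Bayes shrinkage is done.
    """
    log_expr = np.asarray(log_expr, dtype=float)
    batches = np.asarray(batches)
    grand_mean = log_expr.mean(axis=1, keepdims=True)  # per-gene mean over all samples
    adjusted = log_expr.copy()
    for b in np.unique(batches):
        cols = batches == b
        batch_mean = log_expr[:, cols].mean(axis=1, keepdims=True)
        adjusted[:, cols] += grand_mean - batch_mean  # move this batch's mean onto the grand mean
    return adjusted


# Toy example (hypothetical numbers): 2 genes x 4 samples, two prep dates.
counts = np.array([[10, 12, 40, 44],
                   [100, 90, 300, 310]])
batches = ["day1", "day1", "day2", "day2"]
adjusted = center_by_batch(np.log2(counts + 1), batches)
print(np.round(adjusted, 2))
```

If the biological condition is confounded with prep date, any adjustment like this will also remove real signal. For the differential expression test itself, a common alternative is to keep the raw counts and add batch as a term in the DESeq2/edgeR/limma design (e.g. `~ batch + condition`).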

**emolinari** · 06-30-2014, 07:34 AM

Thanks for your comment, dpryan.
The libraries were indeed prepped on different dates, and they cluster accordingly (the only exception is one sample that was prepped alone on a separate day and clusters with the large group on the right). I will try what you suggest. Still, don't you think the rate of properly paired mates could influence the clustering? I'm asking because there might be a way to fix it...

Thanks again!

**dpryan** · 06-30-2014, 07:50 AM

That could be the cause as well (my guess would be that it's not, but that's just a guess). Just subset the alignments to contain only properly paired alignments and then look at the PCA plot. If the clustering goes away then you know that's the cause and will have also solved the problem (though you'd be throwing information away, so you still might get slightly better results using ComBat()).
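In SAM/BAM, "properly paired" is bit 0x2 of the FLAG field, as set by the aligner. The subsetting described here is therefore usually a one-liner such as `samtools view -b -f 2 input.bam > proper.bam`. The Python sketch below applies the same logic to SAM text lines (for example, the output of `samtools view -h`). It also computes one possible definition of the proper-pair rate: the fraction of primary records (neither secondary 0x100 nor supplementary 0x800) with 0x2 set. The function names and toy records are hypothetical, and `samtools flagstat` or the aligner's own report may count slightly differently.

```python
PROPER_PAIR = 0x2      # SAM FLAG: each segment properly aligned according to the aligner
SECONDARY = 0x100      # SAM FLAG: secondary alignment
SUPPLEMENTARY = 0x800  # SAM FLAG: supplementary alignment


def sam_flag(line):
    """Return the FLAG (second mandatory column) of a SAM record line."""
    return int(line.split("\t")[1])


def keep_proper_pairs(sam_lines):
    """Yield header lines unchanged plus only records with the proper-pair bit set."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line
        elif sam_flag(line) & PROPER_PAIR:
            yield line


def proper_pair_rate(sam_lines):
    """Fraction of primary records flagged as properly paired (one possible definition)."""
    primary = [
        sam_flag(line)
        for line in sam_lines
        if not line.startswith("@")
        and not (sam_flag(line) & (SECONDARY | SUPPLEMENTARY))
    ]
    if not primary:
        return 0.0
    return sum(1 for f in primary if f & PROPER_PAIR) / len(primary)


# Toy records: r1 is a proper pair (FLAG 99/147); r2 is paired but not proper (FLAG 65).
sam = [
    "@HD\tVN:1.6\tSO:unsorted\n",
    "r1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII\n",
    "r1\t147\tchr1\t200\t60\t4M\t=\t100\t-104\tACGT\tIIII\n",
    "r2\t65\tchr1\t300\t60\t4M\tchr2\t500\t0\tACGT\tIIII\n",
]

kept = list(keep_proper_pairs(sam))
print(len(kept))              # 3 (header + 2 records)
print(proper_pair_rate(sam))  # 0.6666666666666666
```

After filtering, the reads would be recounted with HTSeq or featureCounts and the PCA redone. If the two clusters collapse, the proper-pair rate was driving them.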

**emolinari** · 06-30-2014, 08:04 AM

I see what you mean, I'll try that! A quick question, though: should I use FPKM values or HTSeq counts?

**dpryan** · 06-30-2014, 08:09 AM

HTSeq (or featureCounts, which is faster).
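DESeq2 and edgeR model raw counts directly. FPKM values have already been divided by feature length and sequencing depth, so they do not fit those count-based models. Given a genes × samples count matrix from HTSeq or featureCounts, the PCA check mentioned above can be sketched in Python. This sketch assumes a simple counts-per-million scaling and a log2(x + 1) transform. DESeq2's own route would be a variance-stabilizing transform (`vst()`) followed by `plotPCA()`. The function name `pca_scores` and the toy matrix are hypothetical.

```python
import numpy as np


def pca_scores(counts, n_components=2):
    """PCA of samples from a genes x samples raw count matrix.

    Counts are scaled to counts per million (CPM) per sample, log2(x + 1)
    transformed, centered per gene, and decomposed with an SVD.
    Returns (samples x n_components scores, variance fraction per component).
    """
    counts = np.asarray(counts, dtype=float)
    cpm = counts / counts.sum(axis=0, keepdims=True) * 1e6  # per-sample depth scaling
    x = np.log2(cpm + 1.0).T                                 # samples x genes
    x -= x.mean(axis=0)                                      # center each gene
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


# Toy data: 3 genes x 4 samples; samples 1-2 and 3-4 were prepped on different days.
counts = np.array([
    [10, 11, 100, 105],
    [200, 210, 20, 22],
    [50, 52, 51, 49],
])
scores, explained = pca_scores(counts)
print(np.round(scores[:, 0], 2))  # PC1 separates the two prep-date groups
print(np.round(explained, 3))
```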

Batch Effect