**Simon Anders** · 05-21-2010, 08:34 AM

I'm not quite sure what you mean by switching. Are you now comparing treatment from batch 1 with control from batch 2?

But to answer your question: Both DESeq and edgeR adjust for library size. While edgeR uses the library sizes that you tell it, DESeq tries to estimate them from the data.

To see whether this worked well, I'd suggest that you choose pairs of samples and divide all the counts from one sample by the size factor for this sample (for DESeq; for edgeR, take the total read count) and do likewise for the other. Then plot one against the other in a log-log scatter plot and mark the diagonal (with abline(a=0, b=1)). Check that the points scatter symmetrically around the diagonal. Do this for a couple of sample pairs.
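The check described above can also be done numerically. The following is only a rough plain-Python sketch, not DESeq or edgeR code: the function name `normalized_log_ratios`, the toy counts and the size factors are all made up for illustration. If normalisation worked, the per-gene log ratios of the normalised counts should be centred near zero, which corresponds to points scattering symmetrically around the diagonal.

```python
import math
import statistics

def normalized_log_ratios(counts_a, counts_b, size_a, size_b):
    """Per-gene log2 ratios of normalised counts for two samples.

    Genes with a zero count in either sample are skipped, since they
    cannot be placed on a log-log plot.
    """
    ratios = []
    for a, b in zip(counts_a, counts_b):
        if a > 0 and b > 0:
            ratios.append(math.log2((a / size_a) / (b / size_b)))
    return ratios

# Hypothetical toy data: sample B was sequenced about twice as deeply as A.
sample_a = [10, 20, 0, 40, 80]
sample_b = [22, 38, 5, 80, 170]
size_factor_a = 1.0   # assumed size factors, e.g. as estimated by DESeq
size_factor_b = 2.0

log_ratios = normalized_log_ratios(sample_a, sample_b,
                                   size_factor_a, size_factor_b)
# A median close to 0 means the cloud is centred on the diagonal.
median_log_ratio = statistics.median(log_ratios)
print(median_log_ratio)
```

A median far from zero for some pair of samples would suggest that the size factor (or library size) of one of the two does not reflect its actual sequencing depth.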

In my experience, however, the library size normalisation works well and is unlikely to be the culprit.

A good idea might be to check sample distances: With DESeq, make a CountDataSet containing all 12 of your samples. Then perform a variance stabilizing transformation, get a distance matrix for the variance-stabilized matrix and plot it as a heatmap. I have described this procedure in the DESeq vignette. If all is well, the replicates should cluster together. If a sample does not cluster with its replicates, you might want to exclude it from the analysis.
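To illustrate the idea behind the sample distance check, here is a minimal plain-Python sketch. It is an assumption-laden stand-in, not the procedure from the DESeq vignette: a simple log2(x + 1) transform replaces the variance stabilizing transformation, and the sample names, counts and helper functions are invented for the example.

```python
import math

def log_transform(counts, size_factor):
    """Stand-in for a variance stabilizing transformation (assumption):
    log2 of the normalised count plus a pseudocount of 1."""
    return [math.log2(c / size_factor + 1.0) for c in counts]

def distance_matrix(samples):
    """Euclidean distances between all pairs of transformed samples."""
    names = list(samples)
    return {(p, q): math.dist(samples[p], samples[q])
            for p in names for q in names}

# Hypothetical toy counts for three genes; all size factors taken as 1.
raw = {
    "treated_1": [100, 10, 50],
    "treated_2": [110, 12, 45],
    "control_1": [10, 100, 50],
}
transformed = {name: log_transform(c, 1.0) for name, c in raw.items()}
distances = distance_matrix(transformed)

# Replicates should be closer to each other than to the other condition.
print(distances[("treated_1", "treated_2")])
print(distances[("treated_1", "control_1")])
```

In a real analysis the resulting matrix would be drawn as a heatmap with clustering, and a sample that sits far from its own replicates would stand out.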

Lastly, have a look at the scvPlots in your four batch-condition combinations. What is the raw SCV value in the region of highest count density, i.e., at the peak of the black density curve? Is it maybe much larger in some cases than in others?

Cheers
Simon

**markrobinsonca** · 05-23-2010, 04:01 PM

A couple comments from the edgeR camp ...

I agree with Simon that a simple pairs() plot of read counts is a useful initial diagnostic, especially if you think you might have sample switching (I didn't fully understand what was switched from your description). Also, M-vs-A plots (edgeR does 'smear' plots) would be quite useful.

One clarification of what Simon said with respect to edgeR. While it's true that edgeR uses the library sizes "that you tell it", there is a function in there for calculating normalization factors from the data -- calcNormFactors() -- and a description in the manual of how to build that into your library sizes. I haven't compared directly, but it's roughly similar to the DESeq calculation for this. The normalization (which goes beyond just accounting for library size) is described at:

http://genomebiology.com/2010/11/3/R25/abstract

Another alternative to explore sample relations is the plotMDS.dge() function in edgeR. This is essentially a principal components plot, but specific to count data.

Hope that helps.

Cheers,
Mark

**Simon Anders** · 05-26-2010, 04:45 AM

Hi Mark

It seems I haven't looked into the edgeR vignette for a while and missed that you have added a size estimation by now.

You are right, your scheme and ours are very much the same. We all had the same idea of looking at the quotient between individual gene counts and taking some robust location estimator of their distribution. The only difference is that you used a trimmed mean and we went all the way to maximal trimming, i.e., used the median. This definitely shouldn't make much of a difference.
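The comparison above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the code of either package: both estimators here work on log ratios to the per-gene geometric mean, whereas edgeR's actual method compares against a reference sample and uses weights and additional trimming. The function names and toy counts are hypothetical.

```python
import math
import statistics

def log_ratios_to_reference(counts):
    """For each sample, log ratios of its counts to the per-gene geometric
    mean across samples. Genes with a zero in any sample are skipped."""
    n_genes = len(counts[0])
    usable = [g for g in range(n_genes) if all(s[g] > 0 for s in counts)]
    log_geo_mean = {g: sum(math.log(s[g]) for s in counts) / len(counts)
                    for g in usable}
    return [[math.log(s[g]) - log_geo_mean[g] for g in usable]
            for s in counts]

def median_size_factors(counts):
    """Maximal trimming: take the median of the log ratios."""
    return [math.exp(statistics.median(r))
            for r in log_ratios_to_reference(counts)]

def trimmed_mean_size_factors(counts, trim=0.3):
    """Unweighted trimmed mean of the log ratios (simplified assumption);
    `trim` is the fraction cut from each tail and must be below 0.5."""
    factors = []
    for r in log_ratios_to_reference(counts):
        s = sorted(r)
        k = int(len(s) * trim)
        kept = s[k:len(s) - k]
        factors.append(math.exp(sum(kept) / len(kept)))
    return factors

# Toy data: sample 2 is twice as deep, except for one outlier gene
# that is strongly expressed in sample 1 only.
toy_counts = [
    [10, 20, 30, 40, 1000],
    [20, 40, 60, 80, 500],
]
by_median = median_size_factors(toy_counts)
by_trimmed_mean = trimmed_mean_size_factors(toy_counts)
print(by_median, by_trimmed_mean)
```

On this toy input both estimators ignore the outlier gene and recover the factor of two between the samples, which is the sense in which the two schemes should rarely differ much.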

Simon

Can DESeq and edgeR deal with in-balanced RNA-seq data?
