What do you understand by normalization of RNA-seq data? What tools are available for it?
-
Hi, do you mean experimental (library) normalization or data normalization for quantification analysis?
If you mean cDNA library normalization, one approach uses duplex-specific nuclease (DSN), which is based on the kinetics of cDNA reassociation. (See: P. A. Zhulidov et al., A Method for the Preparation of Normalized cDNA Libraries Enriched with Full-Length Sequences. Russian Journal of Bioorganic Chemistry, Vol. 31, No. 2, 2005; and Irina Shagina et al., Normalization of genomic DNA using duplex-specific nuclease. BioTechniques 48:455-459, June 2010.)
If you mean the latter, there are two general formulas for RNA-seq data normalization: RPKM (reads per kilobase per million mapped reads) and FPKM (fragments per kilobase per million mapped fragments), and a useful tool, Cufflinks. You can follow this earlier SEQanswers thread for more details: RNA-seq and normalization numbers (http://seqanswers.com/forums/showthr...p?t=586&page=1)
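The RPKM definition above can be written out as a small calculation. This is only a sketch of the arithmetic in plain Python, not how Cufflinks computes its values; the function name and the example gene are made up for illustration. FPKM follows the same formula with fragments counted in place of reads.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase per million mapped reads (illustrative helper)."""
    kilobases = gene_length_bp / 1_000
    millions_mapped = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions_mapped)

# Hypothetical gene: 500 reads, 2,000 bp long, library of 10 million mapped reads.
example = rpkm(500, 2_000, 10_000_000)
print(example)  # 25.0
```

The value depends on both gene length and library depth, which is why it is used to compare expression within and between samples.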
-
Originally posted by harshinamdar: "Hi BENM, I meant the latter one. Thank you for providing the link to this old post. That is what I was looking for. Thanks."
Hi everyone,
I want to use the TMM method for normalization, but I have a question: how can I get the normalized counts after TMM? Thank you very much.
-
You can use edgeR to get TMM-normalized data using calcNormFactors() in R.
What do you want to use the normalized data as input for?
-
Hi chadn737,
Thank you very much for your reply. What I mean is that after I use calcNormFactors() in edgeR to do the TMM normalization, I want to see the difference between the normalized data and the raw data. For example, in DESeq you get normalized counts by dividing the raw counts by the appropriate size factor. But in edgeR, how can I get these normalized counts?
Thank you
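The DESeq-style step described here (dividing each sample's raw counts by that sample's size factor) is sketched below. It is plain Python rather than DESeq itself; the sample names, counts and size factors are hypothetical values chosen so that the two samples differ only in depth.

```python
def normalize_counts(raw_counts, size_factors):
    """Divide every raw count by its sample's size factor (illustrative helper)."""
    return {
        sample: [count / size_factors[sample] for count in counts]
        for sample, counts in raw_counts.items()
    }

# Hypothetical data: sampleB has 4x the counts of sampleA for every gene.
raw = {"sampleA": [10, 200, 35], "sampleB": [40, 800, 140]}
size_factors = {"sampleA": 0.5, "sampleB": 2.0}  # assumed, not computed here

normalized = normalize_counts(raw, size_factors)
print(normalized["sampleA"])  # [20.0, 400.0, 70.0]
print(normalized["sampleB"])  # [20.0, 400.0, 70.0]
```

After division the two samples sit on the same scale, which makes it easy to compare normalized and raw values side by side.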
-
Originally posted by chadn737: "Do the same thing with the normalization factors from edgeR. You can even feed DESeq the normalization factors from edgeR by using sizeFactors(cds) = normalization factors from edgeR."
Sorry, I do not understand what you mean. Can you give me some more detail?
Did you mean: cds = calcNormFactors(cds), then sizeFactors(cds)?
Thank you very much.
-
When you first use DESeq, you combine a table of counts and a list of conditions to create a count data set:
Code:
cds <- newCountDataSet(countTable, conditions)
You can give the count data set your own size factors using:
Code:
sizeFactors(cds) <- #input
If you wanted to use TMM-normalized size factors from edgeR rather than those given by DESeq, then you can first compute them:
Code:
x <- calcNormFactors(as.matrix(countTable))
and then give this to the count data set:
Code:
sizeFactors(cds) <- x
-
Thank you very much. I did as you said, but the result is not what I expected.
-
size factors in DESeq and edgeR
Yes, both DESeq and edgeR have functions to normalize the data. However, it is wrong to assign the size factors calculated in edgeR to DESeq, though it looks conceptually fine at first sight. In DESeq, the size factor is used to 'transform' the raw reads onto a 'common' scale, and you can use the normalized counts for differential analysis. The size factor in edgeR (norm.factors), by contrast, adjusts the library size so that gene abundance (= counts / "effective library size", where "effective library size" = "library size" * "size factor") is comparable across samples.
To illustrate this point, see the example below.
Code:
library(edgeR)
library(DESeq)
# data: 100 genes, identical in both samples except gene 1
y <- x <- rep(1,100)
y[1] <- 101
xy <- data.frame(x=x,y=y)
# edgeR
edger <- DGEList(counts=xy)
edger <- calcNormFactors(edger)
edger$samples
# DESeq
deseq = newCountDataSet( xy, conditions=c("c1","c2") )
deseq = estimateSizeFactors( deseq )
sizeFactors( deseq )
> sizeFactors( deseq )
x y
1 1
> edger$samples
  group lib.size norm.factors
x     1      100       1.4142
y     1      200       0.7071
Last edited by Shanrong; 02-13-2013, 07:36 PM.
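The arithmetic behind this toy example can be checked without either package. The sketch below is plain Python under stated assumptions: `median_of_ratios` is my own simplified take on a median-of-ratios size factor (it assumes no zero counts), and the two norm.factors are copied from the output above rather than recomputed, since TMM itself is not implemented here.

```python
import math
from statistics import median

# Toy data from the post: 100 genes, identical except gene 1 in sample y.
x = [1] * 100
y = [101] + [1] * 99

def median_of_ratios(samples):
    """Per-sample median of count / per-gene geometric mean (no zero counts assumed)."""
    n = len(samples)
    geo_means = [math.exp(sum(math.log(c) for c in gene) / n) for gene in zip(*samples)]
    return [median(c / g for c, g in zip(s, geo_means)) for s in samples]

size_factors = median_of_ratios([x, y])
print(size_factors)  # [1.0, 1.0]

lib_sizes = [sum(x), sum(y)]      # [100, 200]
norm_factors = [1.4142, 0.7071]   # copied from edger$samples, not recomputed
effective = [l * f for l, f in zip(lib_sizes, norm_factors)]
print([round(e, 2) for e in effective])  # [141.42, 141.42]

# Rescale the effective library sizes so their geometric mean is 1.
geo = math.sqrt(effective[0] * effective[1])
rescaled = [e / geo for e in effective]
print([round(r, 3) for r in rescaled])  # [1.0, 1.0]
```

The two norm.factors differ, but once multiplied by the library sizes both samples get the same effective library size. Rescaled to a geometric mean of 1, those effective sizes match the size factors of 1 and 1. So the quantity comparable to a DESeq size factor appears to be the effective library size, not the raw norm.factors value.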
-
Thank you for this. For my own work I have not done this, but in a project where I am a collaborator, the statistician in the group did use the edgeR-normalized data as input into DESeq. I know it gives very different results, and I have avoided it in my own work because the DESeq size factors seemed to give more conservative results, and I prefer working with fewer genes that I am very confident in over more genes of lower confidence. I'll have to bring this up on the project that I am collaborating on.
-
Hi everyone,
I'm working with the two normalization methods from DESeq and edgeR.
I have two conditions and only one replicate per condition (I know, bad experimental design...) and I tried to normalize the raw counts.
With both normalization methods I obtain very different size factors:
- using DESeq: 0.095 for one library and 10.85 for the other.
- using edgeR: 0.14 and 7.2, respectively.
Obviously, when dividing the raw counts by the corresponding size factor, the counts change dramatically, sometimes inverting the starting conditions (an upregulated gene becomes downregulated).
Does this make sense?
Do you think it's correct to use these normalization methods despite the weird results?
Thank you all
-
Looks like you have a huge difference (is that a 20-50 fold difference?) in read count between conditions. This is a problem because the normalization will significantly amplify the noise of the smaller sample, making the data (already unreliable without replicates) even less reliable.
But yes, you need to normalize. Is more sequencing an option?
-
Hi Jeremy,
Yes, I have a huge difference in read count between conditions: 36 vs 64 million total reads.
Unfortunately I do not have other options, and I want to do a simple differential expression analysis, maybe with only a few differentially expressed genes.
I know that, without replicates, it's difficult to do a DE analysis, and I don't want to reach false conclusions. I know I have to be very conservative to say something really reliable... but how?
In your opinion, can I discard some genes, for example those with low read counts, and normalize the remaining ones?

Thanks a lot

Marianna
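To put the figures quoted in this exchange side by side, the fold differences they imply can be computed directly. This is only arithmetic on the numbers given in the posts above (the variable names are mine), and it makes no claim about what causes the gap.

```python
# Numbers as quoted in the thread.
deseq_factors = (0.095, 10.85)
edger_factors = (0.14, 7.2)
total_reads_millions = (36, 64)

def fold(pair):
    """Ratio of the larger value to the smaller one."""
    return max(pair) / min(pair)

print(round(fold(deseq_factors), 1))         # 114.2
print(round(fold(edger_factors), 1))         # 51.4
print(round(fold(total_reads_millions), 2))  # 1.78
```

The total read counts differ by less than 2-fold, while the reported factors differ by roughly 50- to 100-fold. That suggests the factors reflect more than total sequencing depth, though the thread does not establish why.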