SEQanswers

Old 07-04-2011, 10:35 PM   #1
harshinamdar
harry
 
Location: India

Join Date: Jun 2010
Posts: 14
Default RNA Seq normalization

What is meant by normalization of RNA-seq data, and what tools are available for it?
Old 07-05-2011, 12:19 AM   #2
BENM
Member
 
Location: PRC

Join Date: May 2009
Posts: 33
Default

Hi, do you mean experimental (library) normalization, or data normalization for quantification analysis?

If you mean normalization of cDNA libraries, one approach is treatment with duplex-specific nuclease (DSN), which is based on the kinetics of cDNA reassociation. (See: P. A. Zhulidov et al., A Method for the Preparation of Normalized cDNA Libraries Enriched with Full-Length Sequences. Russian Journal of Bioorganic Chemistry, Vol. 31, No. 2, 2005; and Irina Shagina et al., Normalization of genomic DNA using duplex-specific nuclease. BioTechniques 48:455-459, June 2010.)

If you mean the latter, there are two common measures for RNA-seq data normalization: RPKM (reads per kilobase per million mapped reads) and FPKM (fragments per kilobase per million mapped fragments), and a useful tool for computing them, Cufflinks. You can find more details in an earlier SEQanswers thread: RNA-seq and normalization numbers (http://seqanswers.com/forums/showthr...p?t=586&page=1)
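The RPKM definition spelled out above can be written as a one-line calculation. The following is only a minimal sketch in base R, not Cufflinks' implementation; the function name, argument names, and the example gene are made up for illustration.

```r
# RPKM for one gene: reads per kilobase of transcript per million mapped reads.
# Function and argument names are illustrative, not taken from any package.
rpkm <- function(read_count, gene_length_bp, total_mapped_reads) {
  read_count / (gene_length_bp / 1e3) / (total_mapped_reads / 1e6)
}

# Hypothetical gene: 500 reads, 2 kb long, library of 20 million mapped reads
example_rpkm <- rpkm(500, 2000, 20e6)
print(example_rpkm)
```

FPKM follows the same arithmetic with fragments (read pairs) counted in place of single reads.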
Old 07-05-2011, 02:30 AM   #3
harshinamdar
harry
 
Location: India

Join Date: Jun 2010
Posts: 14
Default

Hi BENM,
I meant the latter.
Thank you for the link to that old thread. That is what I was looking for.
Old 12-27-2012, 06:05 PM   #4
luoye
Junior Member
 
Location: 厦门

Join Date: Dec 2012
Posts: 6
Default

Hi everyone,
I want to use the TMM method for normalization, but I have a question: how can I get the normalized counts after TMM? Thank you very much.
Old 12-27-2012, 09:35 PM   #5
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392
Default

You can use edgeR to compute TMM normalization factors with calcNormFactors() in R.

What do you want to use the normalized data as input for?
Old 12-28-2012, 12:20 AM   #6
luoye
Junior Member
 
Location: 厦门

Join Date: Dec 2012
Posts: 6
Default

Hi chadn737,
Thank you very much for your reply. What I mean is this: after using calcNormFactors() in edgeR for TMM normalization, I want to see the difference between the normalized data and the raw data. For example, in DESeq you get normalized counts by dividing the raw counts by the appropriate size factor. How can I get normalized counts in edgeR?
Thank you
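For reference, the DESeq step mentioned here (raw counts divided by a per-sample size factor) can be sketched in base R. This is a minimal re-implementation of the median-of-ratios idea, written under the assumption that it mirrors what DESeq's estimateSizeFactors() does; the function name and the toy count matrix are hypothetical.

```r
# Median-of-ratios size factors, sketched in base R (no packages needed).
# estimate_size_factors is a hypothetical helper, not a DESeq function.
estimate_size_factors <- function(counts) {
  counts <- as.matrix(counts)
  # Per-gene log geometric mean across samples (-Inf if any count is zero)
  log_geo_means <- rowMeans(log(counts))
  apply(counts, 2, function(col) {
    keep <- is.finite(log_geo_means) & col > 0
    exp(median(log(col[keep]) - log_geo_means[keep]))
  })
}

# Toy data: sample b is sample a sequenced exactly twice as deeply
counts <- cbind(a = c(10, 20, 30, 40), b = c(20, 40, 60, 80))
size_factors <- estimate_size_factors(counts)

# Normalized counts: divide each column by its size factor
normalized <- sweep(counts, 2, size_factors, "/")
print(size_factors)
print(normalized)
```

In this toy case the two normalized columns come out identical, because the only difference between the samples is sequencing depth.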
Old 12-28-2012, 06:20 AM   #7
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392
Default

Do the same thing with the normalization factors from edgeR. You can even feed DESeq the normalization factors from edgeR by assigning them with sizeFactors(cds) <- (normalization factors from edgeR).
Old 12-28-2012, 04:39 PM   #8
luoye
Junior Member
 
Location: 厦门

Join Date: Dec 2012
Posts: 6
Default

Sorry, I don't understand what you mean. Can you give me some more detail?
Do you mean: cds = calcNormFactors(cds), then sizeFactors(cds)?
Thank you very much.
Old 12-31-2012, 07:59 AM   #9
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392
Default

When you first use DESeq, you combine a table of counts and a vector of conditions to create a count data set:

Code:
cds <- newCountDataSet(countTable, conditions)
You can give the count data set your own size factors using

Code:
# mySizeFactors is a placeholder: a numeric vector with one value per sample
sizeFactors(cds) <- mySizeFactors
If you wanted to use TMM normalization factors from edgeR rather than the size factors estimated by DESeq, you can first compute:

Code:
x <- calcNormFactors(as.matrix(countTable))
and then give this to the count data set:

Code:
sizeFactors(cds) <- x
Old 12-31-2012, 08:16 PM   #10
luoye
Junior Member
 
Location: 厦门

Join Date: Dec 2012
Posts: 6
Default

Thank you very much. I did as you said, but the result is not what I expected.
Old 02-13-2013, 06:28 PM   #11
Shanrong
Junior Member
 
Location: San Diego

Join Date: Dec 2012
Posts: 7
Default size factors in DESeq and edgeR

Yes, both DESeq and edgeR have functions to normalize the data. However, it is wrong to assign the normalization factors calculated by edgeR to DESeq as size factors, even though it looks conceptually fine at first sight. In DESeq, the size factor is used to transform the raw counts onto a common scale (normalized count = raw count / size factor), and you can use the normalized counts for differential analysis. The normalization factor in edgeR instead adjusts the library size, so that gene abundance (= counts / effective library size, where effective library size = library size * normalization factor) is comparable across samples.

To illustrate this point, see the example below.

Code:
library(edgeR)
library(DESeq)

# data: two samples that differ only in one highly expressed gene
y <- x <- rep(1, 100)
y[1] <- 101
xy <- data.frame(x = x, y = y)

# edgeR
edger <- DGEList(counts = xy)
edger <- calcNormFactors(edger)
edger$samples

# DESeq
deseq <- newCountDataSet(xy, conditions = c("c1", "c2"))
deseq <- estimateSizeFactors(deseq)
sizeFactors(deseq)

> sizeFactors( deseq )
x y
1 1

> edger$samples
  group lib.size norm.factors
x     1      100       1.4142
y     1      200       0.7071
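The relationship between the two sets of numbers can be checked by hand. The sketch below uses only base R and the values printed in the output above. Rescaling the effective library sizes by their geometric mean is an assumption on my part (it is one common way to put them on a size-factor-like scale), not something stated in this thread.

```r
# Values copied from the edgeR output in the example above
lib_size     <- c(x = 100, y = 200)
norm_factors <- c(x = 1.4142, y = 0.7071)

# Effective library size = library size * normalization factor
eff_lib_size <- lib_size * norm_factors

# Assumed convention: rescale so that the geometric mean is 1, which makes
# the result comparable in scale to DESeq size factors
size_factor_like <- eff_lib_size / exp(mean(log(eff_lib_size)))

print(eff_lib_size)
print(size_factor_like)
```

Both effective library sizes come out at about 141.4, so the rescaled values are both about 1, in line with the DESeq size factors shown above. The raw edgeR factors (1.4142 and 0.7071) only make sense once multiplied by the library sizes.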

Last edited by Shanrong; 02-13-2013 at 06:36 PM.
Old 02-13-2013, 07:22 PM   #12
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392
Default

Thank you for this. I have not done this in my own work, but in a project where I am a collaborator, the statistician in the group did use the edgeR normalization factors as input to DESeq. I know it gives very different results, and I have avoided it in my own work because the DESeq size factors seemed to give more conservative results, and I prefer working with fewer genes that I am very confident in over more genes of lower confidence. I'll have to bring this up on the project I am collaborating on.
Old 03-03-2013, 12:49 PM   #13
Marianna85
Member
 
Location: Italy

Join Date: Mar 2012
Posts: 32
Default

Hi everyone,
I'm working with the two normalization methods, DESeq and edgeR.
I have two conditions and only one replicate per condition (I know, bad experimental design...) and I tried to normalize the raw counts.
With the two normalization methods I obtain very different size factors:
- using DESeq: 0.095 for one library and 10.85 for the other.
- using edgeR: 0.14 and 7.2, respectively.

Obviously, when I divide the raw counts by the corresponding size factor, the counts change dramatically, sometimes inverting the starting direction (an upregulated gene becomes downregulated).

Does this make sense?
Do you think it is correct to use these normalization methods despite the weird results?

Thank you all
Old 03-03-2013, 08:32 PM   #14
Jeremy
Senior Member
 
Location: Pathum Thani, Thailand

Join Date: Nov 2009
Posts: 190
Default

It looks like you have a huge difference in read count between conditions (is that a 20-50 fold difference?). This is a problem because the normalization will significantly amplify the noise of the smaller sample, making the data (already unreliable without replicates) even less reliable.
But yes, you need to normalize. Is more sequencing an option?
Old 03-04-2013, 01:23 AM   #15
Marianna85
Member
 
Location: Italy

Join Date: Mar 2012
Posts: 32
Default

Hi Jeremy,
Yes, I have a large difference in read count between conditions: 36 vs 64 million total reads.
Unfortunately I do not have other options, and I want to do a simple differential expression analysis, even if it yields only a few differentially expressed genes.
I know that, without replicates, it is difficult to do a DE analysis, and I don't want to reach false conclusions. I know I have to be very conservative to say anything really reliable... but how?
In your opinion, can I discard some genes, for example those with low read counts, and normalize using the remaining ones?

Thanks a lot
Marianna
Old 03-04-2013, 10:27 AM   #16
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Quote:
Originally Posted by Marianna85 View Post
With the two normalization methods I obtain very different size factors:
- using DESeq: 0.095 for one library and 10.85 for the other.
- using edgeR: 0.14 and 7.2, respectively.
This is because edgeR gives its normalization factors relative to the total read count. To get expression values on a scale comparable across samples, you have to divide the counts
- for DESeq, just by the size factor
- for edgeR, by the total read count and by the normalization factor
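The two division rules above can be sketched in base R as follows. The helper names are hypothetical, and the multiplication by 1e6 (giving counts per million) is just a convenient scaling I have assumed. As far as I know, edgeR's own cpm() function is the packaged way to obtain values of this kind. The numbers are taken from the toy example in post #11.

```r
# DESeq convention: divide counts by the size factor only
deseq_scale <- function(counts, size_factor) {
  counts / size_factor
}

# edgeR convention: divide by library size AND normalization factor,
# then scale to counts per million (the 1e6 is an assumed convention)
edger_scale <- function(counts, lib_size, norm_factor) {
  counts / (lib_size * norm_factor) * 1e6
}

# Gene 1 from the toy example: 1 read in sample x, 101 reads in sample y.
# DESeq size factors were 1 and 1; edgeR gave lib.size 100/200 and
# norm.factors 1.4142/0.7071.
fc_deseq <- deseq_scale(101, 1) / deseq_scale(1, 1)
fc_edger <- edger_scale(101, 200, 0.7071) / edger_scale(1, 100, 1.4142)

print(c(DESeq = fc_deseq, edgeR = fc_edger))
```

When each set of factors is applied according to its own convention, the two fold changes agree. Applying the edgeR factors with the DESeq rule (dividing by 1.4142 and 0.7071 directly) would not give the same answer.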
Old 03-04-2013, 11:57 AM   #17
Marianna85
Member
 
Location: Italy

Join Date: Mar 2012
Posts: 32
Default

Hi Simon,
happy to read your answer

Here is an example, to check that I understand.

For DESeq:
gene A raw counts: 5 reads in library 1, 70 reads in library 2
library 1: 36 million reads, size factor 0.09
library 2: 64 million reads, size factor 10
gene A normalized counts: lib 1 = 5/0.09, lib 2 = 70/10

For edgeR:
gene A raw counts: 5 reads in library 1, 70 reads in library 2
library 1: 36 million reads, normalization factor 0.14
library 2: 64 million reads, normalization factor 7.2
gene A normalized values: lib 1 = (5/36 million)/0.14, lib 2 = (70/64 million)/7.2

In this case, with such a large difference in library size, it seems better to normalize with edgeR, doesn't it?

Thanks a lot.
I really appreciate your answer.

Marianna
Old 03-04-2013, 04:36 PM   #18
Shanrong
Junior Member
 
Location: San Diego

Join Date: Dec 2012
Posts: 7
Default

I am using both edgeR and DESeq to analyze my own dataset. In general, the fold changes reported by the two are close (as they should be, right?).

If your original libraries have 36 vs 64 million total reads, the normalization factors from both methods are very strange (or at least quite unusual). Please check whether you are analyzing your dataset properly.

The ideas behind normalization in edgeR and DESeq are very similar (although the implementations differ). In practice, I don't see that one method is superior to the other. However, it is definitely a mistake to feed DESeq the normalization factors from edgeR, or vice versa.
Old 03-04-2013, 04:46 PM   #19
Jeremy
Senior Member
 
Location: Pathum Thani, Thailand

Join Date: Nov 2009
Posts: 190
Default

Something is going wrong somewhere: 36M and 64M reads should give normalization factors with less than a 2-fold difference. The normalization factors you listed differ by about 50-fold, which would suggest a much greater difference in read count.
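The comparison being made here is simple arithmetic on the numbers reported earlier in the thread, and can be reproduced in base R. This is only a rough sanity check that assumes sequencing depth is the main difference between the two libraries; the variable names are made up.

```r
# Library sizes reported in post #15
depth_ratio <- 64e6 / 36e6

# Ratios of the reported factors from post #13
deseq_ratio <- 10.85 / 0.095
edger_ratio <- 7.2 / 0.14

# If depth were the main difference, DESeq size factors would be expected to
# differ by roughly depth_ratio; edgeR factors multiply the library size, so
# they would typically sit much closer to 1.
print(round(c(depth = depth_ratio, DESeq = deseq_ratio, edgeR = edger_ratio), 2))
```

The depth ratio is under 2, while the reported factors differ by roughly 50-fold (edgeR) and over 100-fold (DESeq), which is the mismatch pointed out above.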

Last edited by Jeremy; 03-04-2013 at 04:50 PM.
Old 03-12-2013, 08:48 AM   #20
Marianna85
Member
 
Location: Italy

Join Date: Mar 2012
Posts: 32
Default

Hi Jeremy,
Indeed, I was surprised to get such a difference...
I have not yet worked out where the mistake in the size factor calculation is.