  • harshinamdar
    harry
    • Jun 2010
    • 14

    RNA Seq normalization

    What do you understand by normalization of RNA-seq data? What tools are available for it?
  • BENM
    Member
    • May 2009
    • 33

    #2
    Hi, do you mean experimental (library) normalization or data normalization for quantification analysis?

    If you mean cDNA library normalization, one approach uses duplex-specific nuclease (DSN), which is based on the kinetics of cDNA reassociation. (See P. A. Zhulidov et al., A Method for the Preparation of Normalized cDNA Libraries Enriched with Full-Length Sequences. Russian Journal of Bioorganic Chemistry, Vol. 31, No. 2, 2005; and Irina Shagina et al., Normalization of genomic DNA using duplex-specific nuclease. BioTechniques 48:455-459, June 2010.)

    If you mean the latter, there are two general formulas for RNA-seq data normalization: RPKM (reads per kilobase per million mapped reads) and FPKM (fragments per kilobase per million mapped fragments), and a useful tool, Cufflinks. You can find more details in an earlier SEQanswers thread: RNA-seq and normalization numbers (http://seqanswers.com/forums/showthr...p?t=586&page=1)
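
For reference, the RPKM formula mentioned above can be sketched in base R. The counts, gene lengths, and total of mapped reads are made-up illustration values, and the helper name `rpkm` is hypothetical (this is not the Cufflinks implementation).

```r
# Minimal RPKM sketch in base R (illustrative values, not real data).
# RPKM = reads mapped to gene * 1e9 / (gene length in bp * total mapped reads)
rpkm <- function(counts, gene_length_bp, total_mapped_reads) {
  counts * 1e9 / (gene_length_bp * total_mapped_reads)
}

counts         <- c(geneA = 500, geneB = 1000)    # hypothetical read counts
gene_length_bp <- c(geneA = 1000, geneB = 2000)   # hypothetical gene lengths in bp
total_mapped   <- 1e7                             # hypothetical library: 10 million mapped reads

rpkm_values <- rpkm(counts, gene_length_bp, total_mapped)
print(rpkm_values)
```

geneB has twice the reads of geneA but is also twice as long, so both end up with the same RPKM. FPKM uses the same formula with fragments (read pairs) counted in place of individual reads.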

    Comment

    • harshinamdar
      harry
      • Jun 2010
      • 14

      #3
      Hi BENM,
      I meant the latter.
      Thank you for the link to the old post; that is what I was looking for. Thanks.

      Comment

      • luoye
        Junior Member
        • Dec 2012
        • 6

        #4
        Hi everyone,
        I want to use the TMM method for normalization, but I have a question: how can I get the normalized counts after TMM? Thank you very much.

        Comment

        • chadn737
          Senior Member
          • Jan 2009
          • 392

          #5
          You can use edgeR to get TMM normalization factors using calcNormFactors() in R.

          What do you want to use the normalized data as input for?

          Comment

          • luoye
            Junior Member
            • Dec 2012
            • 6

            #6
            Hi chadn737,
            thank you very much for your reply. What I mean is that when I use calcNormFactors() in edgeR for TMM normalization, I want to see the difference between the normalized data and the raw data. For example, in DESeq you get normalized counts by dividing the raw counts by the appropriate size factor. How can I get the equivalent normalized counts in edgeR?
            Thank you
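
The DESeq-style step described here (dividing raw counts by a per-sample size factor) can be sketched in base R. This is a simplified, from-scratch version of the median-of-ratios idea behind estimateSizeFactors(), run on made-up counts; the function name `median_ratio_size_factors` is hypothetical, and the real package should be used for actual analyses.

```r
# Sketch: median-of-ratios size factors, then normalized counts = raw / size factor.
median_ratio_size_factors <- function(counts) {
  log_geo_means <- rowMeans(log(counts))     # log geometric mean of each gene across samples
  usable <- is.finite(log_geo_means)         # skip genes with a zero count in any sample
  apply(counts, 2, function(sample_counts) {
    exp(median(log(sample_counts[usable]) - log_geo_means[usable]))
  })
}

# Made-up counts: sampleB was sequenced exactly twice as deeply as sampleA.
counts <- cbind(sampleA = c(10, 20, 30, 40),
                sampleB = c(20, 40, 60, 80))
rownames(counts) <- paste0("gene", 1:4)

size_factors <- median_ratio_size_factors(counts)
normalized   <- sweep(counts, 2, size_factors, "/")   # divide each column by its size factor

print(size_factors)
print(normalized)
```

Because the only difference between the two made-up samples is depth, the two normalized columns come out identical, which is the "common scale" the size factors are meant to provide.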

            Comment

            • chadn737
              Senior Member
              • Jan 2009
              • 392

              #7
              Do the same thing with the normalization factors from edgeR. You can even feed DESeq the normalization factors from edgeR by using sizeFactors(cds) <- (normalization factors from edgeR).

              Comment

              • luoye
                Junior Member
                • Dec 2012
                • 6

                #8
                Sorry, I don't understand what you mean. Can you give me some more detail?
                Did you mean: cds = calcNormFactors(cds), then sizeFactors(cds)?
                Thank you very much.

                Comment

                • chadn737
                  Senior Member
                  • Jan 2009
                  • 392

                  #9
                  When you first use DESeq, you combine a table of counts and a list of conditions to create a count data set:

                  Code:
                  cds <- newCountDataSet(countTable, conditions)
                  You can give the count data set your own size factors using:

                  Code:
                  sizeFactors(cds) <- mySizeFactors  # placeholder name: numeric vector, one value per sample
                  If you want to use TMM normalization factors from edgeR rather than the size factors estimated by DESeq, you can first run:

                  Code:
                  x <- calcNormFactors(as.matrix(countTable))
                  and then give this to the count data set:

                  Code:
                  sizeFactors(cds) <- x

                  Comment

                  • luoye
                    Junior Member
                    • Dec 2012
                    • 6

                    #10
                    Thank you very much. I did as you said, but the result is not what I expected.

                    Comment

                    • Shanrong
                      Junior Member
                      • Dec 2012
                      • 7

                      #11
                      size factors in DESeq and edgeR

                      Yes, both DESeq and edgeR have functions to normalize the data. However, it is wrong to assign the factors calculated in edgeR to DESeq as size factors, even though it looks conceptually fine at first sight. In DESeq, the size factor is used to 'transform' the raw reads onto a 'common' scale, and you can use the normalized counts for differential analysis. The factor in edgeR (norm.factors), by contrast, adjusts the library size so that gene abundance (= counts / "effective library size", where "effective library size" = "library size" * "norm factor") is comparable across samples.

                      To illustrate this point, see the example below.

                      Code:
                      library(edgeR)
                      library(DESeq)

                      # data: two samples with identical counts except for gene 1
                      y <- x <- rep(1,100)
                      y[1] <- 101
                      xy <- data.frame(x=x,y=y)

                      # edgeR: TMM normalization factors
                      edger <- DGEList(counts=xy)
                      edger <- calcNormFactors(edger)
                      edger$samples

                      # DESeq: size factors
                      deseq <- newCountDataSet( xy, conditions=c("c1","c2") )
                      deseq <- estimateSizeFactors( deseq )
                      sizeFactors( deseq )

                      Output:
                      > sizeFactors( deseq )
                      x y
                      1 1

                      > edger$samples
                        group lib.size norm.factors
                      x     1      100       1.4142
                      y     1      200       0.7071
                      Last edited by Shanrong; 02-13-2013, 07:36 PM.
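
One way to read the numbers in this post: edgeR's norm.factors are meant to be multiplied by the library size, not used directly as divisors for the counts. A base R sketch of that arithmetic follows, using the lib.size and norm.factors values reported in this post. Rescaling the effective library sizes by their geometric mean to get DESeq-like size factors is an assumption about how the two scales could be compared, not a step either package is stated to require.

```r
# Values copied from the edger$samples output in this post.
lib_size     <- c(x = 100, y = 200)
norm_factors <- c(x = 1.4142, y = 0.7071)

# edgeR-style effective library size = library size * normalization factor
eff_lib_size <- lib_size * norm_factors

# Counts from the toy data: gene 1 differs in y, the other 99 genes are all 1.
counts <- cbind(x = rep(1, 100), y = c(101, rep(1, 99)))

# Counts per million computed on the effective library size
cpm_tmm <- sweep(counts, 2, eff_lib_size, "/") * 1e6

# Assumption: rescale effective library sizes by their geometric mean
# so they sit on the same scale as DESeq size factors.
deseq_like_sf <- eff_lib_size / exp(mean(log(eff_lib_size)))

print(eff_lib_size)
print(cpm_tmm[1:3, ])
print(round(deseq_like_sf, 3))
```

Both effective library sizes come out at about 141.4, so the 99 unchanged genes get the same CPM in both samples, and the rescaled factors are 1 and 1, in line with the DESeq size factors shown in this post. Passing 1.4142 and 0.7071 directly as size factors would instead divide the two samples by different amounts and push the unchanged genes apart. As far as I know, calling cpm() on the DGEList after calcNormFactors() uses these effective library sizes by default.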

                      Comment

                      • chadn737
                        Senior Member
                        • Jan 2009
                        • 392

                        #12
                        Thank you for this. I have not done this in my own work, but in a project where I am a collaborator, the statistician in the group did use the edgeR-normalized data as input to DESeq. I know it gives very different results, and I have avoided it in my own work because the DESeq size factors seemed to give more conservative results; I prefer working with fewer genes that I am very confident in to more genes of lower confidence. I'll have to bring this up on the project I am collaborating on.

                        Comment

                        • Marianna85
                          Member
                          • Mar 2012
                          • 32

                          #13
                          Hi everyone,
                          I'm dealing with the two normalization methods, DESeq and edgeR.
                          I have two conditions and only one replicate per condition (I know, bad experimental design...) and I tried to normalize the raw counts.
                          With both normalization methods I obtain very different size factors:
                          - using DESeq, 0.095 for one library and 10.85 for the other;
                          - using edgeR, 0.14 and 7.2 respectively.

                          Obviously, when I divide the raw counts by the corresponding size factor, the counts change dramatically, sometimes inverting the starting conditions (an upregulated gene becomes downregulated).

                          Does this make sense?
                          Do you think it's correct to use these normalization methods despite the weird results?

                          Thank you all

                          Comment

                          • Jeremy
                            Senior Member
                            • Nov 2009
                            • 190

                            #14
                            It looks like you have a huge difference in read count between conditions (is that a 20-50 fold difference?). This is a problem because normalization will significantly amplify the noise in the smaller sample, making the data (already unreliable without replicates) even less reliable.
                            But yes, you do need to normalize. Is more sequencing an option?

                            Comment

                            • Marianna85
                              Member
                              • Mar 2012
                              • 32

                              #15
                              Hi Jeremy,
                              yes, I have a huge difference in read count between conditions: 36 vs 64 million total reads.
                              Unfortunately I do not have other options, and I want to do a simple differential expression analysis, perhaps with only a few differentially expressed genes.
                              I know that without replicates it is difficult to do a DE analysis, and I don't want to reach false conclusions. I know I have to be very conservative to say anything really reliable... but how?
                              In your opinion, can I discard some genes, for example those with low read counts, and normalize the remaining ones?

                              Thanks a lot
                              Marianna
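
A quick arithmetic check on the numbers quoted in this exchange, in base R. All values are copied from the posts; nothing here is re-estimated from the actual data, and the helper name `fold` is just for illustration.

```r
# Size factors and read totals as reported in the thread.
deseq_sf    <- c(0.095, 10.85)
edger_sf    <- c(0.14, 7.2)
total_reads <- c(36e6, 64e6)

# Fold difference between the two libraries (larger value / smaller value)
fold <- function(v) max(v) / min(v)

deseq_fold <- fold(deseq_sf)
edger_fold <- fold(edger_sf)
reads_fold <- fold(total_reads)

print(round(deseq_fold, 1))   # fold difference implied by the DESeq factors
print(round(edger_fold, 1))   # fold difference implied by the edgeR factors
print(round(reads_fold, 2))   # fold difference in total reads
```

The factors differ by roughly 50- to 114-fold while the total reads differ by less than 2-fold, so the gap does not look like it is explained by sequencing depth alone. That mismatch may be worth investigating (for example, a few very highly expressed genes, or many zero counts) before trusting either set of factors.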

                              Comment
