
  • RNA Seq normalization

    What do you understand by normalization of RNA-seq data? What tools are available for it?

  • #2
    Hi, do you mean normalization at the experimental (library preparation) level, or normalization of the data for quantification analysis?

    If you mean normalization of cDNA libraries, one approach uses duplex-specific nuclease (DSN), which is based on the kinetics of cDNA reassociation. (See P. A. Zhulidov et al., A Method for the Preparation of Normalized cDNA Libraries Enriched with Full-Length Sequences, Russian Journal of Bioorganic Chemistry, Vol. 31, No. 2, 2005; and Irina Shagina et al., Normalization of genomic DNA using duplex-specific nuclease, BioTechniques 48:455-459, June 2010.)

    If you mean the latter, there are two common measures for RNA-seq data normalization: RPKM (reads per kilobase per million mapped reads) and FPKM (fragments per kilobase per million mapped fragments), and a useful tool, Cufflinks. You can find more details in an earlier SEQanswers thread: RNA-seq and normalization numbers (http://seqanswers.com/forums/showthr...p?t=586&page=1)
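
As a rough illustration of the RPKM definition given in this post, here is a minimal base-R sketch. The function name `rpkm` and the numbers are made up for the example; FPKM uses the same arithmetic with fragments in place of reads.

```r
# RPKM = reads on the gene / (gene length in kb * mapped reads in millions)
# Equivalent form: reads * 1e9 / (gene length in bp * total mapped reads)
rpkm <- function(reads, gene_length_bp, total_mapped_reads) {
  reads * 1e9 / (gene_length_bp * total_mapped_reads)
}

# Hypothetical gene: 500 reads, 2,000 bp long, library of 10 million mapped reads
rpkm(500, 2000, 1e7)  # 25
```

Because the gene length appears in the denominator, RPKM is meant for comparing expression between genes of different lengths within a sample, which is a different goal from the between-sample scaling discussed later in this thread.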



    • #3
      Hi BENM,
      I meant the latter.
      Thank you for the link to that old post; that is what I was looking for. Thanks.



      • #4
        Replying to harshinamdar:
        Hi everyone,
        I want to use the TMM method for normalization, but I have run into a question: how can I get the normalized counts after TMM? Thank you very much.



        • #5
          Replying to luoye:
          You can use edgeR to get TMM-normalized data using calcNormFactors() in R.

          What do you want to use the normalized data as input for?



          • #6
            Hi chadn737,
            Thank you very much for your reply. What I mean is that when I use calcNormFactors() in edgeR for TMM normalization, I want to see the difference between the normalized data and the raw data. For example, in DESeq you get normalized counts by dividing the raw counts by the appropriate size factor. How can I get normalized counts in edgeR?
            Thank you.
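
To make the DESeq side of this question concrete, here is a minimal base-R sketch of the "divide raw counts by the size factor" step. It assumes the median-of-ratios idea behind DESeq's size factor estimation; the helper name `size_factors` and the toy counts are invented for the example, and this is not the package's own code.

```r
# Median-of-ratios size factors: for each sample, take the median over genes
# of (count / geometric mean of that gene across samples).
size_factors <- function(counts) {
  log_geo_means <- rowMeans(log(counts))   # per-gene log geometric mean
  usable <- is.finite(log_geo_means)       # skip genes with a zero in any sample
  apply(counts, 2, function(col) {
    exp(median((log(col) - log_geo_means)[usable]))
  })
}

# Toy data: sample s2 was sequenced exactly twice as deeply as s1
counts <- cbind(s1 = c(10, 20, 30, 40), s2 = c(20, 40, 60, 80))

sf <- size_factors(counts)               # one factor per sample
normalized <- sweep(counts, 2, sf, "/")  # divide each column by its size factor
```

Comparing `counts` with `normalized` side by side is the raw-versus-normalized comparison asked about here; in this toy case the two normalized columns come out identical because the samples differ only in depth.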



            • #7
              Do the same thing with the normalization factors from edgeR. You can even feed DESeq the normalization factors from edgeR by assigning them with sizeFactors(cds).



              • #8
                Replying to chadn737:
                Sorry, I don't understand what you mean. Could you give some more detail?
                Did you mean: cds=calcNormFactors(cds), then sizeFactors(cds)?
                Thank you very much.



                • #9
                  Replying to luoye:
                  When you first use DESeq, you combine a table of counts and a vector of conditions to create a count data set:

                  Code:
                  cds <- newCountDataSet(countTable, conditions)
                  You can give the count data set your own size factors using

                  Code:
                  sizeFactors(cds) <- mySizeFactors  # placeholder name: your own numeric vector, one value per sample
                  If you wanted to use the TMM normalization factors from edgeR rather than the size factors estimated by DESeq, you can first run:

                  Code:
                  x <- calcNormFactors(as.matrix(countTable))
                  and then give this to the count data set:

                  Code:
                  sizeFactors(cds) <- x



                  • #10
                    Replying to chadn737:
                    Thank you very much. I did as you suggested, but the result is not what I expected.



                    • #11
                      size factors in DESeq and edgeR

                      Replying to chadn737:
                      Yes, both DESeq and edgeR have functions to normalize the data. However, it is wrong to assign the factors calculated by edgeR directly as DESeq size factors, even though it looks conceptually fine at first sight. In DESeq, the size factor is used to 'transform' the raw counts onto a 'common' scale, and you can then use the normalized counts for differential analysis. In edgeR, the size factor (reported as norm.factors) instead adjusts the library size so that gene abundance (= counts / "effective library size", where "effective library size" = "library size" * "size factor") is comparable across samples.

                      To illustrate this point, see the example below.

                      Code:
                      library(edgeR)
                      library(DESeq)

                      # data: two samples with identical counts except for gene 1
                      y <- x <- rep(1,100)
                      y[1] <- 101
                      xy <- data.frame(x=x,y=y)

                      # edgeR: TMM normalization factors
                      edger <- DGEList(counts=xy)
                      edger <- calcNormFactors(edger)
                      edger$samples

                      # DESeq: size factors
                      deseq <- newCountDataSet( xy, conditions=c("c1","c2") )
                      deseq <- estimateSizeFactors( deseq )
                      sizeFactors( deseq )

                      # output
                      > sizeFactors( deseq )
                      x y
                      1 1

                      > edger$samples
                        group lib.size norm.factors
                      x     1      100       1.4142
                      y     1      200       0.7071
                      Last edited by Shanrong; 02-13-2013, 07:36 PM.
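
The arithmetic behind this point can be checked in base R using only the numbers printed in the `edger$samples` output above. The last step, rescaling the effective library sizes so that their geometric mean is 1, is an assumption added here to put them on the same scale as DESeq size factors; it is not something stated in the post.

```r
# Values copied from the edger$samples output above
lib_size     <- c(x = 100, y = 200)
norm_factors <- c(x = 1.4142, y = 0.7071)

# Effective library size = library size * normalization factor
eff_lib_size <- lib_size * norm_factors   # about 141.42 for both samples

# Normalized abundance (counts per million) of a gene with raw count 1 in both samples
cpm_gene <- 1 / eff_lib_size * 1e6

# Assumption: divide by the geometric mean to get DESeq-like size factors
deseq_like <- eff_lib_size / exp(mean(log(eff_lib_size)))
```

Here `deseq_like` comes out as 1 for both samples, in line with the DESeq size factors shown above, whereas the raw `norm.factors` (1.4142 and 0.7071) would not. With edgeR loaded, calling `cpm()` on the `DGEList` should give TMM-normalized values directly, since by default it uses the effective library sizes.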



                      • #12
                        Replying to Shanrong:
                        Thank you for this. I have not done this in my own work, but in a project where I am a collaborator, the statistician in the group did use the edgeR-normalized data as input to DESeq. I know it gives very different results, and I have avoided it in my own work because the DESeq size factors seemed to give more conservative results; I prefer working with fewer genes that I am very confident in over more genes of lower confidence. I'll have to bring this up on the project that I am collaborating on.



                        • #13
                          Hi everyone,
                          I'm working with two normalization methods, DESeq and edgeR.
                          I have two conditions and only one replicate per condition (I know, bad experimental design...), and I tried to normalize the raw counts.
                          With both normalization methods I obtain very different size factors:
                          - using DESeq, 0.095 for one library and 10.85 for the other;
                          - using edgeR, 0.14 and 7.2 respectively.

                          Obviously, when I divide the raw counts by the corresponding size factor, the counts change dramatically, sometimes inverting the original direction (an upregulated gene becomes downregulated).

                          Does this make sense?
                          Do you think it is correct to use these normalization methods despite the weird results?

                          Thank you all



                          • #14
                            Replying to Marianna85:
                            It looks like you have a huge difference in read count between conditions (is that a 20-50 fold difference?). This is a problem because the normalization will significantly amplify the noise of the smaller sample, making the data (already unreliable without replicates) even less reliable.
                            But yes, you need to normalize. Is more sequencing an option?
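
A small base-R sketch of the noise amplification described in this reply. It uses the two DESeq size factors quoted earlier in the thread, made-up raw counts for a single gene, and the simplifying assumption that raw counts are Poisson-distributed (standard deviation = square root of the count).

```r
# DESeq size factors quoted earlier in the thread
size_factor <- c(small = 0.095, large = 10.85)

# Hypothetical raw counts for one gene in the two libraries
raw <- c(small = 4, large = 400)

normalized <- raw / size_factor

# Under the Poisson assumption, dividing by the size factor scales the
# standard deviation by the same amount as the count itself
sd_normalized <- sqrt(raw) / size_factor

# Relative noise (coefficient of variation) is unchanged by the scaling
cv <- sd_normalized / normalized   # 0.5 for the small library, 0.05 for the large one
```

The two normalized values end up on a similar scale, but the one from the small library carries ten times the relative uncertainty, which is why a sign flip between raw and normalized counts is not surprising for low-count genes.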



                            • #15
                              Hi Jeremy,
                              Yes, I have a huge difference in read count between conditions: 36 vs. 64 million total reads.
                              Unfortunately I do not have other options, and I want to do a simple differential expression analysis, perhaps reporting only a few differentially expressed genes.
                              I know that without replicates it is difficult to do a DE analysis, and I don't want to reach false conclusions. I know I have to be very conservative to say something really reliable... but how?
                              In your opinion, can I discard some genes, for example those with low read counts, and normalize the remaining ones?

                              Thanks a lot
                              Marianna
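
For reference, the filtering step asked about here could look like the following in base R. This is only a sketch of the mechanics, not a recommendation from the thread: the cutoff `min_count`, the rule "at least one sample reaches the cutoff", and the toy counts are all assumptions.

```r
# Toy count table: rows are genes, columns are the two conditions
counts <- cbind(cond1 = c(0, 3, 25, 140), cond2 = c(1, 2, 40, 90))
rownames(counts) <- paste0("gene", 1:4)

min_count <- 10   # hypothetical cutoff

# Keep a gene if at least one sample reaches the cutoff
keep <- rowSums(counts >= min_count) >= 1
filtered <- counts[keep, , drop = FALSE]

rownames(filtered)   # "gene3" "gene4"
```

The size factors would then be estimated on `filtered` rather than on the full table.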
