  • #16
    Originally posted by Marianna85
    With both normalization methods I obtain very different size factors:
    - using DESeq: 0.095 for one library and 10.85 for the other.
    - using edgeR: 0.14 and 7.2, respectively.
    This is because edgeR gives its norm factors relative to the total read count. To get expression values on a scale comparable across samples, you have to divide the counts
    - for DESeq, just by the size factor
    - for edgeR, by the total read counts and by the normalization factors

    Comment


    • #17
      Hi Simon,
      happy to read your answer

      Originally posted by Simon Anders
      This is because edgeR gives its norm factors relative to the total read count. To get expression values on a scale comparable across samples, you have to divide the counts
      - for DESeq, just by the size factor
      - for edgeR, by the total read counts and by the normalization factors
      Just an example to better understand.
      For DESeq:
      gene A raw counts: 5 reads in library 1, 70 reads in library 2
      library 1: 36 million reads, size factor 0.09
      library 2: 64 million reads, size factor 10
      gene A normalized counts: lib 1 = 5/0.09, lib 2 = 70/10

      For edgeR:
      gene A raw counts: 5 reads in library 1, 70 reads in library 2
      library 1: 36 million reads, norm factor 0.14
      library 2: 64 million reads, norm factor 7.2
      gene A normalized counts: lib 1 = (5/36 million)/0.14, lib 2 = (70/64 million)/7.2
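
The arithmetic in this example can be checked with a short base R sketch. It uses only the numbers quoted in the post; the variable names are made up for illustration, and nothing here calls DESeq or edgeR.

```r
# Numbers taken from the example in the post; all names are illustrative.
raw_counts <- c(lib1 = 5, lib2 = 70)        # gene A raw counts
lib_sizes  <- c(lib1 = 36e6, lib2 = 64e6)   # total reads per library

# DESeq convention: divide the counts by the size factors only.
deseq_size_factors <- c(lib1 = 0.09, lib2 = 10)
deseq_norm <- raw_counts / deseq_size_factors

# edgeR convention: divide by the total read count times the norm factor.
edger_norm_factors <- c(lib1 = 0.14, lib2 = 7.2)
edger_norm <- raw_counts / (lib_sizes * edger_norm_factors)

# The two results live on different scales (counts vs. fraction of reads),
# so compare the lib1/lib2 ratio rather than the values themselves.
deseq_ratio <- deseq_norm[["lib1"]] / deseq_norm[["lib2"]]
edger_ratio <- edger_norm[["lib1"]] / edger_norm[["lib2"]]
print(c(DESeq = deseq_ratio, edgeR = edger_ratio))
```

Multiplying the edgeR-style values by 1e6 would put them on a counts-per-million scale.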

      In this case, with such a large difference in library size, it seems better to normalize with edgeR, doesn't it?

      Thanks a lot.
      I really appreciate your answer.

      Marianna

      Comment


      • #18
        I am using both edgeR and DESeq to analyze my own dataset. In general, the fold changes reported by both are close (they should be, right?).

        If your original libraries have 36 vs. 64 million total reads, the normalization factors from both methods are very weird (or at least quite unusual). Please check whether you are analyzing your dataset properly.

        The idea behind normalization in edgeR and DESeq is very similar (but the implementations differ). In practice, I don't see that one method is superior to the other. However, it is definitely a mistake to feed DESeq with the normalization factors from edgeR, or vice versa.

        Comment


        • #19
          Something is going wrong somewhere: 36M and 64M reads should give normalization factors with less than a 2-fold difference. The normalization factors you listed have a 50-fold difference and suggest a much greater difference in read count.
          Last edited by Jeremy; 03-04-2013, 05:50 PM.
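
As a rough check of this expectation, the sketch below computes what the factors would look like if they reflected sequencing depth alone, scaling the two library sizes by their geometric mean. This is an illustration under that assumption, not what DESeq or edgeR computes; the reported factors are the ones quoted earlier in the thread, and all names are made up.

```r
# Library sizes from the thread.
lib_sizes <- c(lib1 = 36e6, lib2 = 64e6)

# Depth-only factors: each library size divided by the geometric mean of both.
depth_only <- lib_sizes / exp(mean(log(lib_sizes)))
depth_fold <- depth_only[["lib2"]] / depth_only[["lib1"]]

# Factors reported earlier in the thread.
reported_deseq <- c(lib1 = 0.095, lib2 = 10.85)
reported_edger <- c(lib1 = 0.14, lib2 = 7.2)

# Fold difference between the largest and smallest factor.
fold <- function(x) max(x) / min(x)
print(c(depth_only = depth_fold,
        DESeq = fold(reported_deseq),
        edgeR = fold(reported_edger)))
```

Since edgeR states its factors relative to the total read count, depth alone would be expected to leave them close to 1, so the comparison is most direct for the DESeq size factors.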

          Comment


          • #20
            Originally posted by Jeremy
            Something is going wrong somewhere: 36M and 64M reads should give normalization factors with less than a 2-fold difference. The normalization factors you listed have a 50-fold difference and suggest a much greater difference in read count.
            Hi Jeremy,
            in fact, I was surprised to obtain such a difference...
            I haven't yet figured out where the mistake in the size factor calculation is.

            Comment


            • #21
              Have you already looked at a scatter plot comparing the counts for the two samples? This should clarify what is going on.

              Comment


              • #22
                Originally posted by Simon Anders
                Have you already looked at a scatter plot comparing the counts for the two samples? This should clarify what is going on.
                Simon, do you mean the estimateDispersions?

                Comment


                • #23
                  This is the script I used

                  # Read the count table: genes in rows, one column per sample
                  CountTable = read.table( "decEggs.txt", header=TRUE, row.names=1 )
                  head( CountTable )
                  # Describe the two samples
                  decDesign = data.frame( row.names = colnames( CountTable ), condition = c( "stripped", "spawned" ), libType = c( "paired-end", "paired-end" ) )
                  decDesign
                  # Both libraries are paired-end, so this selection keeps both samples
                  pairedSamples = decDesign$libType == "paired-end"
                  condition = decDesign$condition[ pairedSamples ]
                  library( "DESeq" )
                  # Build the count data set and estimate the size factors
                  cds = newCountDataSet( CountTable, condition )
                  cds = estimateSizeFactors( cds )
                  sizeFactors( cds )
                  head( counts( cds, normalized=TRUE ) )


                  and the size factors have a 100-fold difference...
                  What should I do?
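
To see where such size factors can come from, here is a minimal base R sketch of the median-of-ratios idea described in the DESeq paper: each sample's size factor is the median, over genes, of its count divided by the gene's geometric mean across samples. The function name and the toy matrix are hypothetical, and this is a simplified illustration rather than the package's actual code.

```r
# Hypothetical helper: median-of-ratios size factors for a count matrix
# with genes in rows and samples in columns.
median_of_ratios <- function(count_matrix) {
  # Log geometric mean per gene; -Inf if any sample has a zero count.
  log_geo_means <- rowMeans(log(count_matrix))
  usable <- is.finite(log_geo_means)  # drop genes with a zero in any sample
  apply(count_matrix, 2, function(sample_counts) {
    exp(median(log(sample_counts[usable]) - log_geo_means[usable]))
  })
}

# Toy data (made up): s2 has twice the counts of s1 for every usable gene.
toy <- cbind(s1 = c(10, 20, 30, 0, 500),
             s2 = c(20, 40, 60, 5, 1000))
sf <- median_of_ratios(toy)
print(sf)
```

With only two samples, the ratio of the two size factors equals the median per-gene count ratio, so a ratio near 100 would mean the typical gene has about 100 times more reads in one library than in the other. That makes inspecting the raw counts a sensible next step.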

                  Comment


                  • #24

                    Comment


                    • #25
                      Originally posted by Marianna85
                      Simon, do you mean the estimateDispersions?
                      No, I mean a scatter plot of the reads.

                      Try, e.g.,

                      Code:
                      plot( log10( 1 + counts(cds)[1,] ), log10( 1 + counts(cds)[2,] ), pch="." )
                      to plot the raw, unnormalized read counts of the second sample versus the first on a log scale.

                      Comment


                      • #26
                        Hi Simon,
                        the plot seems empty...
                        may I change the axis scale?

                        Comment


                        • #27
                          This will be hard to debug via the forum. You may need to get some local help.

                          To try one thing: If you simply type "counts(cds)", you get your table of raw counts (or, if you just want the first 100 lines, try "head( counts(cds), 100 )"). Check whether they make sense.
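
A few summary numbers can make this check quicker than reading the table by eye. The sketch below works on any plain count matrix, such as the one returned by counts(cds); the helper name and the toy data are made up for illustration.

```r
# Hypothetical helper: per-sample summaries of a raw count matrix
# (genes in rows, samples in columns).
summarize_counts <- function(count_matrix) {
  data.frame(
    total_reads  = colSums(count_matrix),          # should match library sizes
    zero_genes   = colSums(count_matrix == 0),     # genes with no reads
    median_count = apply(count_matrix, 2, median)  # typical count per gene
  )
}

# Toy data (made up), standing in for counts(cds).
toy <- cbind(stripped = c(0, 3, 12, 7),
             spawned  = c(5, 250, 900, 0))
count_summary <- summarize_counts(toy)
print(count_summary)
```

If the totals are far from the expected 36 and 64 million reads, the problem probably lies in the input table rather than in the normalization step.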

                          Comment


                          • #28
                            Sorry, I made a typo. It's

                            Code:
                            plot( log10( 1 + counts(cds)[,1] ), log10( 1 + counts(cds)[,2] ), pch="." )
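
The difference between the two commands comes down to matrix indexing: in a count matrix with genes in rows and samples in columns, [1,] selects one gene across all samples, while [,1] selects one sample across all genes. A small sketch with a made-up matrix:

```r
# Toy count matrix (made up): 4 genes in rows, 2 samples in columns.
toy <- matrix(c( 4,  40,
                 0,   3,
                15, 160,
                 7,  65),
              ncol = 2, byrow = TRUE,
              dimnames = list(paste0("gene", 1:4), c("sampleA", "sampleB")))

first_gene   <- toy[1, ]   # one gene, all samples: as many values as samples
first_sample <- toy[, 1]   # one sample, all genes: as many values as genes

print(length(first_gene))
print(length(first_sample))
```

With two samples, the row-indexed version plots only two points, and with pch="." they are easy to miss, which would explain a plot that looks empty.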

                            Comment


                            • #29
                              Of course! I indexed the rows of cds, not the columns.
                              So this is the plot...

                              Does anything look strange to you?

                              Comment


                              • #30
                                ".emf"? That's Windows extended metafile, right? Haven't seen this graphics file format in ten years, and frankly, I have no idea how to open it. Could you use something more common, please, maybe png?
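
For reference, base R can write a plot directly to a PNG file with the png() graphics device. A minimal sketch, using made-up counts and a temporary file name; with real data, the plot call from earlier in the thread would go between png() and dev.off().

```r
# Made-up counts standing in for two columns of counts(cds).
set.seed(1)
toy_a <- rpois(1000, lambda = 50)
toy_b <- rpois(1000, lambda = 80)

# Hypothetical output location; any writable path ending in .png works.
out_file <- file.path(tempdir(), "count_scatter.png")

png(out_file, width = 600, height = 600)   # open the PNG device
plot(log10(1 + toy_a), log10(1 + toy_b), pch = ".",
     xlab = "log10(1 + counts), sample 1",
     ylab = "log10(1 + counts), sample 2")
invisible(dev.off())                       # close the device to write the file
```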

                                Comment
