  • Cuffdiff normalization using 2 conditions

    Hi,

    I have a question. In my project, I have reads from two conditions (Control/Infected) for both leaf and root. I am using Cuffdiff to normalize the data and test for differential gene expression, but I noticed something odd.

    I tested my data in two different ways:

    1. I gave Cuffdiff 4 conditions (LeafControl, LeafInfected, RootControl and RootInfected), each with 3 replicates, and Cuffdiff reported 350 differentially expressed genes between LeafControl and LeafInfected.

    2. I gave Cuffdiff 2 conditions (LeafControl and LeafInfected), each with 3 replicates, and Cuffdiff reported 101 differentially expressed genes.

    Why? I think it is because of the way it normalizes. In the first run the normalization includes all conditions, with both leaf and root replicates, while the second uses only the leaf samples.

    In my opinion, if you want to see the DE within one tissue (leaf or root), you have to normalize the read mappings coming from Cufflinks separately.

    Could someone help me and explain why I get different values for the same comparison? Which value is right, 350 or 101? I don't know if it helps, but the two results have 98 genes in common.
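
To make the normalization question concrete, here is a minimal Python sketch of DESeq-style median-of-ratios size factors. It is only an illustration under assumptions: the counts are invented toy numbers, the group names are hypothetical, and Cuffdiff's own implementation may differ in detail. It shows that the relative scaling of the two leaf libraries can change once root libraries enter the reference.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors for a dict of sample -> per-gene counts."""
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    log_ratios = {s: [] for s in samples}
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if min(values) <= 0:
            continue  # a gene with a zero count has no usable geometric mean
        log_geo_mean = sum(math.log(v) for v in values) / len(values)
        for s in samples:
            log_ratios[s].append(math.log(counts[s][g]) - log_geo_mean)
    return {s: math.exp(median(r)) for s, r in log_ratios.items()}


# Invented counts for five hypothetical genes, one library per group.
toy = {
    "leaf_control": [100, 200, 50, 400, 80],
    "leaf_infected": [120, 180, 100, 380, 90],
    "root_control": [10, 900, 60, 50, 300],
    "root_infected": [15, 800, 55, 70, 280],
}

leaf_only = size_factors({k: v for k, v in toy.items() if k.startswith("leaf")})
all_samples = size_factors(toy)

ratio_leaf_only = leaf_only["leaf_infected"] / leaf_only["leaf_control"]
ratio_all = all_samples["leaf_infected"] / all_samples["leaf_control"]
print("infected/control scaling, leaf only:  ", round(ratio_leaf_only, 3))
print("infected/control scaling, all samples:", round(ratio_all, 3))
```

The two printed ratios differ because the per-gene reference (the geometric mean) is computed over whichever samples are supplied, so a different gene ends up at the median.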

  • #2
    Cuffdiff only does pairwise comparisons (two conditions at a time). For a more complex experimental design you may need to use more powerful software like DESeq2, which will let you fit an additive model (read count ~ tissue + treatment), or an interaction model, etc.
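
As a rough illustration of what "read count ~ tissue + treatment" means, the Python sketch below builds the design matrix for the 12 samples described in the first post. The function name and sample names are hypothetical, and it handles only two-level factors; DESeq2 builds this matrix itself from the formula, so this is just to show what the additive and interaction models encode.

```python
from itertools import product


def additive_design(tissues, treatments, replicates, interaction=False):
    """Return sample name -> design row [intercept, tissue, treatment(, interaction)].

    Assumes exactly two levels per factor; the first level is the reference.
    """
    rows = {}
    for tissue, treatment in product(tissues, treatments):
        is_second_tissue = int(tissue == tissues[1])
        is_second_treatment = int(treatment == treatments[1])
        row = [1, is_second_tissue, is_second_treatment]
        if interaction:
            # Extra column that is 1 only for the second tissue under the second treatment.
            row.append(is_second_tissue * is_second_treatment)
        for rep in range(1, replicates + 1):
            rows[f"{tissue}_{treatment}_{rep}"] = list(row)
    return rows


design = additive_design(["leaf", "root"], ["control", "infected"], replicates=3)
for name, row in design.items():
    print(name, row)
```

In the additive model the treatment effect is estimated once and shared by both tissues; passing `interaction=True` adds the column that lets the infection effect differ between leaf and root.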

    • #3
      Right, but Cuffdiff normalizes differently when it is given 4 conditions versus 2 conditions, right? I think this is the cause of the different numbers of DE genes.

      Thanks for your answer.

      • #4
        I think you're talking about significance testing, not normalization, so of course there are more significant results when you provide more data.

        • #5
          The Cufflinks manual discusses the normalization and dispersion estimation methods (all the way at the bottom at http://cufflinks.cbcb.umd.edu/manual.html). There are actually multiple options to choose from.
          Cuffdiff works by modeling the variance in fragment counts across replicates as a function of the mean fragment count across replicates. Strictly speaking, it models a quantity called dispersion - the variance present in a group of samples beyond what is expected from a simple Poisson model of RNA-Seq. You can control how Cuffdiff constructs its model of dispersion in locus fragment counts. Each condition that has replicates can receive its own model, or Cuffdiff can use a global model for all conditions. All of these policies are identical to those used by DESeq (Anders and Huber, Genome Biology, 2010).
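
To illustrate "variance beyond what Poisson predicts", here is a small Python sketch that estimates dispersion for a single gene by the method of moments. It assumes the negative binomial form variance = mean + dispersion * mean^2 used in the DESeq paper. The function name and counts are made up, and the real tools fit a curve across all genes instead of estimating each gene in isolation.

```python
from statistics import mean, variance


def moment_dispersion(counts):
    """Method-of-moments dispersion for one gene across replicates.

    Assumes variance = mean + dispersion * mean**2, so a value of 0 means
    the replicates look no noisier than a Poisson model would predict.
    """
    mu = mean(counts)
    if mu == 0:
        return 0.0
    var = variance(counts)  # sample variance across replicates
    return max((var - mu) / mu ** 2, 0.0)


tight_replicates = [90, 100, 110]   # variance equals the mean
noisy_replicates = [50, 100, 150]   # variance far above the mean
print(moment_dispersion(tight_replicates))
print(moment_dispersion(noisy_replicates))
```

A gene with higher dispersion needs a larger fold change before a difference between conditions counts as significant.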

          Dispersion method: description
          pooled: Each replicated condition is used to build a model, then these models are averaged to provide a single global model for all conditions in the experiment. (Default)
          per-condition: Each replicated condition receives its own model. Only available when all conditions have replicates.
          blind: All samples are treated as replicates of a single global "condition" and used to build one model.
          poisson: The Poisson model is used, where the variance in fragment count is predicted to equal the mean across replicates. Not recommended.

          Which method you choose largely depends on whether you expect variability in each group of samples to be similar. For example, if you are comparing two groups, A and B, where A has low cross-replicate variability and B has high variability, it may be best to choose per-condition. However, if the conditions have similar levels of variability, you might stick with the default, which sometimes provides a more robust model, especially in cases where each group has few replicates. Finally, if you only have a single replicate in each condition, you must use blind, which treats all samples in the experiment as replicates of a single condition. This method works well when you expect the samples to have very few differentially expressed genes. If there are many differentially expressed genes, Cuffdiff will construct an overly conservative model and you may not get any significant calls. In this case, you will need more replicates in your experiment.
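
The guidance in that manual excerpt can be written down as a tiny decision helper. This Python sketch is just one reading of the paragraph above, not anything shipped with Cufflinks; the function name and the `similar_variability` flag are hypothetical, and the returned string is the dispersion method name from the list above.

```python
def suggest_dispersion_method(replicates_per_condition, similar_variability=True):
    """Pick a dispersion method name following the manual's rules of thumb.

    replicates_per_condition: dict of condition label -> number of replicates.
    similar_variability: whether the groups are expected to be about equally noisy.
    """
    counts = list(replicates_per_condition.values())
    if all(n == 1 for n in counts):
        return "blind"  # no replicates anywhere: treat all samples as one condition
    if not similar_variability and all(n > 1 for n in counts):
        return "per-condition"  # only available when every condition has replicates
    return "pooled"  # the default


print(suggest_dispersion_method({"LeafControl": 3, "LeafInfected": 3}))
print(suggest_dispersion_method({"A": 3, "B": 3}, similar_variability=False))
print(suggest_dispersion_method({"A": 1, "B": 1}))
```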

          • #6
            Hello all,

            I am trying to process an RNA-Seq sample that I received. I followed the method described in the Nature Protocols paper (Trapnell et al., 2012) exactly, and now I am confused at the cuffdiff step.
            Could anyone please suggest the command for getting my desired output?

            I need Cuffdiff to generate output for each sample (separate FPKM values for each replicate as well).


            When I ran cuffdiff as in the line below, I got replicate-merged output. That is, the two replicates are merged, so there is ultimately one output each for control, treatment 1 and treatment 2.

            cuffdiff -o phos -b Syn.fa -p 8 -L c1,2t,4t -u merged_phos/merged.gtf \
                ./ctrl_rep-1/accepted_hits.bam,./ctrl_rep-2/accepted_hits.bam \
                ./treat_1_rep-1/accepted_hits.bam,./treat_1_rep-2/accepted_hits.bam \
                ./treat_2_rep-1/accepted_hits.bam,./treat_2_rep-2/accepted_hits.bam



            My samples are as follows,

            ctrl_rep-1
            ctrl_rep-2

            treat_1_rep-1
            treat_1_rep-2

            treat_2_rep-1
            treat_2_rep-2



            Thanks
            Han
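
For anyone assembling this kind of command by script, here is a hedged Python sketch that rebuilds the call above from a mapping of condition label to replicate BAM files. The function name is hypothetical and only the options already present in the command are used; replicates of one condition are joined with commas and conditions are passed as separate arguments.

```python
import shlex


def build_cuffdiff_command(conditions, gtf, out_dir, genome_fasta, threads=8):
    """Build the cuffdiff argument list from condition label -> list of BAM paths."""
    labels = ",".join(conditions)  # dicts keep insertion order, so labels match the BAM groups
    bam_groups = [",".join(bams) for bams in conditions.values()]
    return ["cuffdiff", "-o", out_dir, "-b", genome_fasta, "-p", str(threads),
            "-L", labels, "-u", gtf, *bam_groups]


conditions = {
    "c1": ["./ctrl_rep-1/accepted_hits.bam", "./ctrl_rep-2/accepted_hits.bam"],
    "2t": ["./treat_1_rep-1/accepted_hits.bam", "./treat_1_rep-2/accepted_hits.bam"],
    "4t": ["./treat_2_rep-1/accepted_hits.bam", "./treat_2_rep-2/accepted_hits.bam"],
}
command = build_cuffdiff_command(conditions, "merged_phos/merged.gtf", "phos", "Syn.fa")
print(shlex.join(command))  # pass `command` to subprocess.run to actually execute it
```

Dropping an entry from `conditions` gives the reduced two-condition run without retyping the paths.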

            • #7
              Cuffdiff is very limited in the kinds of comparisons it can do. It doesn't let you see inter-replicate variation like you see inter-group variation. If you want to do a more powerful analysis like that, you need to switch software. I would use featureCounts + DESeq2 for this.

              That will also give you better normalizations than FPKM (DESeq2's variance-stabilizing transformation and regularized log) if you want to do more than just significance testing. Here is the inventor of FPKM explaining why you shouldn't use FPKM: https://www.youtube.com/watch?v=5NiFibnbE8o&t=30m38s
              Last edited by jwfoley; 08-12-2014, 06:05 AM.

              • #8
                Hi Han.

                Cuffdiff doesn't write a separate FPKM output per replicate. It has one output file that shows the FPKM per condition. You only have to parse that file and split it into the samples you want.

                Or, for a quick analysis, you could run Cuffdiff using only:
                ctrl_rep-1
                ctrl_rep-2

                treat_1_rep-1
                treat_1_rep-2

                to see the difference between those two groups.

                I once used Cuffdiff to analyze differential gene expression with all the samples I had (Root_ctrl, Root_treat, Leaf_ctrl and Leaf_treat), and afterwards I ran Cuffdiff again using only the leaf data (ctrl and treat).

                When I compared the differentially expressed leaf genes between these two analyses, the numbers were different, because the normalization and the dispersion model change when you remove or add samples.



                Lucas

                • #9
                  Thanks for making me aware of the limitations of cuffdiff.
                  Based on your instructions, I modified the strategy as follows. Kindly tell me whether I am correct.

                  Input the SAM/BAM files to featureCounts. Then the count table (generated as the output of featureCounts) is given as input to DESeq2 to analyze the expression of each sample, including the replicates of each condition.

                  Han,
                  ROK
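
As a sketch of the hand-off between the two tools, the Python below reads a featureCounts-style table into a gene-by-sample count structure. It assumes the usual layout: a leading comment line starting with `#`, then a tab-separated header whose first six columns are Geneid, Chr, Start, End, Strand and Length, followed by one count column per BAM file. The gene IDs and counts in the example are invented.

```python
import csv
import io


def read_featurecounts(handle):
    """Return (sample_names, {gene_id: [counts...]}) from a featureCounts table."""
    lines = (line for line in handle if not line.startswith("#"))
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    samples = header[6:]  # columns after Geneid, Chr, Start, End, Strand, Length
    counts = {}
    for row in reader:
        if row:
            counts[row[0]] = [int(value) for value in row[6:]]
    return samples, counts


# Toy table in the assumed layout (not real output).
example = (
    "# Program:featureCounts (toy example)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tctrl_rep-1.bam\ttreat_1_rep-1.bam\n"
    "geneA\tchr1\t100\t900\t+\t801\t25\t40\n"
    "geneB\tchr1\t2000\t2600\t-\t601\t0\t7\n"
)
samples, counts = read_featurecounts(io.StringIO(example))
print(samples)
print(counts)
```

Only the gene ID and the count columns matter for DESeq2; the annotation columns in between are dropped.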

                  • #10
                    Yes, that's the idea. Of course you'll also need a GTF for featureCounts. You can use the transcripts.gtf from Cufflinks, though of course you'll get a lot of unannotated transcripts this way; or you can use a database annotation, which will be missing a lot of transcripts or parts of transcripts.

                    • #11
                      Hi jwfoley,

                      Thank you very much for the quick reply..

                      Han

                      • #12
                        Following the suggestion, I obtained a count matrix from featureCounts. However, I have 2 questions to ask.

                        1. In the read counting step, only 47% of reads were successfully assigned to the meta-feature "gene". Is that a low value?

                        2. In the DESeq2 analysis, I have trouble setting up the input for control and treatment because of my lack of knowledge of R. My samples are:

                        control-1    drought 2 days-1    drought 4 days-1
                        control-2    drought 2 days-2    drought 4 days-2

                        I tried to follow a method explained in the manual by Love et al., and I saw sample code for importing and setting up the count matrix as follows:

                        1.library("pasilla")
                        2.library("Biobase")
                        3.data("pasillaGenes")
                        4.countData <- counts(pasillaGenes)
                        5.colData <- pData(pasillaGenes)[,c("condition","type")]

                        6.dds <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design = ~ condition)

                        7.dds$condition <- factor(dds$condition,levels=c("untreated","treated"))

                        Since I am using two drought-treated groups, I think I should modify line 5 and line 7. Can anyone suggest how to set those parameters?

                        I modified the header of the count matrix to: gene_id untreated1 untreated2 treated1 treated2 treated3 treated4

                        Thanks,

                        Han
                        Last edited by anikng; 08-18-2014, 11:25 PM.

                        • #13
                          Lines 1 through 5 are all for importing an example data set. If you want to use your data instead of the example, you don't need any of those.

                          You need to import your own data, create your own data frame of factors, and set your own model design, then use DESeqDataSetFromMatrix to create a DESeqDataSet object and proceed normally.
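
To show what "your own data frame of factors" could look like for the control / 2-day / 4-day drought design, here is a small Python sketch that writes a sample table as CSV text, one row per count-matrix column. The sample names and condition labels are hypothetical, and the R side (reading the table and calling DESeqDataSetFromMatrix with design = ~ condition) is not shown. Keeping three condition levels preserves the 2-day versus 4-day distinction that a plain "treated" label would lose.

```python
import csv
import io


def build_sample_table(sample_conditions, reference):
    """Return (factor levels with the reference first, CSV text of sample,condition)."""
    levels = [reference] + sorted(set(sample_conditions.values()) - {reference})
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["sample", "condition"])
    for sample, condition in sample_conditions.items():
        writer.writerow([sample, condition])
    return levels, buffer.getvalue()


# Hypothetical column names; they must match the count matrix header, in the same order.
sample_conditions = {
    "control_1": "control",
    "control_2": "control",
    "drought2d_1": "drought_2d",
    "drought2d_2": "drought_2d",
    "drought4d_1": "drought_4d",
    "drought4d_2": "drought_4d",
}
levels, table = build_sample_table(sample_conditions, reference="control")
print(levels)
print(table)
```

Putting the reference level first mirrors what line 7 of the example does for "untreated" and "treated".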
