  • Comparing multiple samples with cuffdiff

    I have samples of 4 different conditions. Let's say untreated/treated and tissue A/tissue B.
    #1: tissue A untreated
    #2: tissue B untreated
    #3: tissue A treated
    #4: tissue B treated.

    I need to compare treated vs. untreated samples and also tissue A vs. tissue B.
    What is the proper way to use cuffdiff for these comparisons?
    Should I run cuffdiff on just the pair of samples to compare, or run it with all samples and use cummeRbund to select the pairs I want?

    For example, if I want to compare condition #3 and #4, do I need to run cuffdiff with only #3 and #4, or run it with all samples and select #3 and #4 using cummeRbund?

    Thanks,

  • #2
    I would use htseq-count and then DESeq2 (or edgeR), since DESeq2 allows you to use a multi-factor design.

    Here is an example for a model:
    ~ tissue + treatment + tissue:treatment

    Cuffdiff does not allow you to do a multi-factor design.
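
To make the model formula above concrete, here is a minimal Python sketch of the design matrix it implies for the four conditions in this thread. It assumes dummy coding with tissue A and "untreated" as the reference levels (R picks reference levels alphabetically by default, so real DESeq2 output may be coded differently); `design_row` is a hypothetical helper, not part of any package.

```python
from itertools import product


def design_row(tissue, treatment):
    """Return [intercept, tissueB, treated, tissueB:treated] for one sample.

    Assumption: tissue A and "untreated" are the reference levels.
    """
    is_b = 1 if tissue == "B" else 0
    is_treated = 1 if treatment == "treated" else 0
    # The interaction column is the product of the two main-effect columns.
    return [1, is_b, is_treated, is_b * is_treated]


samples = list(product(["A", "B"], ["untreated", "treated"]))
design = [design_row(tissue, treatment) for tissue, treatment in samples]

for (tissue, treatment), row in zip(samples, design):
    print(f"tissue {tissue} {treatment:<9} -> {row}")
```

With one sample per condition, this gives four rows and four coefficients, so nothing is left over to estimate variability.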

    ---

    If you just want to compare individual samples with Cuffdiff, follow the new workflow.
    Run Cuffquant on all the samples.
    Specify the sample names in the samples file.
    Specify the contrasts of interest in the contrasts file (e.g. condition #3 vs. condition #4).
    Run Cuffdiff with the samples file and the contrasts file.

    The results will all be in the same files, with the comparisons one after the other.
    If you want, you can make separate files by extracting the results for each comparison with CummeRbund.
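
The workflow above could be sketched roughly as follows. This is only an illustration: the CXB paths, group labels, `merged.gtf`, and the output directory are made-up placeholders, and the file headers (`sample_id`/`group_label`, `condition_A`/`condition_B`) and the `--use-sample-sheet` / `--contrast-file` options are written from memory of the Cufflinks 2.2 manual, so please check them against `cuffdiff --help` before relying on them. The script only writes the two files and prints the command; it does not run Cuffdiff.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical Cuffquant outputs, one per condition (no replicates here).
samples = [
    ("cuffquant/A_untreated/abundances.cxb", "A_untreated"),
    ("cuffquant/B_untreated/abundances.cxb", "B_untreated"),
    ("cuffquant/A_treated/abundances.cxb", "A_treated"),
    ("cuffquant/B_treated/abundances.cxb", "B_treated"),
]
# Contrast of interest: condition #3 vs. condition #4.
contrasts = [("A_treated", "B_treated")]


def write_tsv(path, header, rows):
    """Write a tab-delimited file with one header line."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)


workdir = Path(tempfile.mkdtemp())
sample_sheet = workdir / "samples.txt"
contrast_file = workdir / "contrasts.txt"
write_tsv(sample_sheet, ("sample_id", "group_label"), samples)
write_tsv(contrast_file, ("condition_A", "condition_B"), contrasts)

# Assumed command line; verify the option names for your Cufflinks version.
command = [
    "cuffdiff", "--use-sample-sheet",
    "--contrast-file", str(contrast_file),
    "-o", "diff_out",
    "merged.gtf", str(sample_sheet),
]
print(" ".join(command))
```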

    Comment


    • #3
      Thank you for the DESeq2 suggestion. I'll check it out.

      Regarding Cuffdiff and the contrasts file, I'd like to ask one more question.

      If I want to compare condition #1 vs condition #2 and condition #3 vs condition #4, then I can either
      1) include all samples from the 4 conditions and specify the comparisons
      in the contrasts file,
      or
      2) run Cuffdiff twice with condition #1 vs condition #2 and condition #3 vs condition #4 separately.

      I assumed the results from 1) and 2) would be the same, but they are not identical. I found a thread asking the same question as mine, but it has no answer.




      • #4
        I just noticed in your experimental design that you don't have any replicates.
        A multi-factor design may therefore not be appropriate in your case.
        You may even want to ignore the p-values calculated by Cuffdiff or DESeq2 altogether, and only take into account the fold changes (not the estimated fold changes calculated by DESeq2, but the actual fold changes).

        To try and answer your question directly, I'd need more information.
        Both ways of running Cuffdiff are correct, incidentally, so you're not making any mistakes either way.
        What version of Cufflinks are you using?
        What are the differences in the results between running Cuffdiff in the 2 manners you described? Are the FPKM values different, or just the p-values in the comparisons?

        One possibility is that, in the absence of replicates, Cuffdiff is calculating the variance levels across all the conditions, resulting in different results when you include fewer conditions. It's just a hypothesis. Cuffdiff is a bit of a black box, so others may come up with better answers.

        I would be more concerned with the absence of replicates in your experimental design, than with Cuffdiff's quirks. To get more consistent results than with Cuffdiff, you can always use htseq-count to count the reads, and then DESeq2 to normalize the counts relative to the total number of reads. But, the determination of the significance of the differences between the samples will always be unreliable in the absence of replicates.
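
As a rough illustration of normalizing counts relative to the total number of reads, here is a minimal counts-per-million sketch in Python. The counts are invented for the example, and this is the simple total-count scaling described in the post, not DESeq2's own procedure (DESeq2 estimates size factors with a median-of-ratios method).

```python
def counts_per_million(counts):
    """Scale raw read counts so that each library sums to one million."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


# Invented htseq-count style counts for two libraries of different depth.
untreated = {"Lypla1": 400, "Tcea1": 1100, "Rb1cc1": 500}
treated = {"Lypla1": 800, "Tcea1": 2300, "Rb1cc1": 900}

for name, library in (("untreated", untreated), ("treated", treated)):
    cpm = counts_per_million(library)
    print(name, {gene: round(value, 1) for gene, value in cpm.items()})
```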



        • #5
          Originally posted by blancha View Post
          I just noticed in your experimental design that you don't have any replicates.
          A multi-factor design may therefore not be appropriate in your case.
          You may even want to ignore the p-values calculated by Cuffdiff or DESeq2 altogether, and only take into account the fold changes (not the estimated fold changes calculated by DESeq2, but the actual fold changes).
          Yes, you are right. Unfortunately, this experiment has no replicates. But I have other experiments that do have replicates, and the multi-factor design will be useful for those.

          Originally posted by blancha View Post
          To try and answer your question directly, I'd need more information.
          Both ways of running Cuffdiff are correct, incidentally, so you're not making any mistakes either way.
          That is reassuring.

          Originally posted by blancha View Post
          What version of Cufflinks are you using?
          What are the differences in the results between running Cuffdiff in the 2 manners you described? Are the FPKM values different, or just the p-values in the comparisons?
          Cufflinks 2.2.1.

          The FPKM values, p-values, and q-values are all different. The values and trends are similar but not identical. As a result, the genes marked as significant differ.

          Here are snippets of gene_exp.diff from the two Cuffdiff runs: one with two conditions and one with four conditions.

          1) Between two conditions.
            test_id     gene_id     gene    locus                sample_1 sample_2 status
          1 XLOC_000001 XLOC_000001 Lypla1  chr1:4797973-4836816 dmso1hr  treat1hr OK
          2 XLOC_000002 XLOC_000002 Tcea1   chr1:4847405-5060366 dmso1hr  treat1hr OK
          3 XLOC_000003 XLOC_000003 Atp6v1h chr1:5073171-5152630 dmso1hr  treat1hr OK
          4 XLOC_000004 XLOC_000004 Oprk1   chr1:5578573-5596214 dmso1hr  treat1hr NOTEST
          5 XLOC_000005 XLOC_000005 Rb1cc1  chr1:6199977-6266185 dmso1hr  treat1hr OK
          6 XLOC_000006 XLOC_000006 Fam150a chr1:6349411-6384812 dmso1hr  treat1hr NOTEST

            value_1 value_2 log2.fold_change.  test_stat p_value q_value significant
          1 21.4748 21.7251         0.0167203  0.0807323  0.9345  0.9996 no
          2 57.7808 58.5267         0.0185049  0.0665288  0.9472  0.9996 no
          3 60.2174 58.9854        -0.0298249 -0.1459160  0.8805  0.9996 no
          4  0.0000  0.0000         0.0000000  0.0000000  1.0000  1.0000 no
          5 10.6696 11.6501         0.1268330  0.6153200  0.5282  0.9996 no
          6  0.0000  0.0000         0.0000000  0.0000000  1.0000  1.0000 no

          2) Four conditions
            test_id     gene_id     gene    locus                sample_1 sample_2 status
          1 XLOC_000001 XLOC_000001 Lypla1  chr1:4797973-4836816 dmso1hr  treat1hr OK
          2 XLOC_000002 XLOC_000002 Tcea1   chr1:4847405-5060366 dmso1hr  treat1hr OK
          3 XLOC_000003 XLOC_000003 Atp6v1h chr1:5073171-5152630 dmso1hr  treat1hr OK
          4 XLOC_000004 XLOC_000004 Oprk1   chr1:5578573-5596214 dmso1hr  treat1hr NOTEST
          5 XLOC_000005 XLOC_000005 Rb1cc1  chr1:6199977-6266185 dmso1hr  treat1hr OK
          6 XLOC_000006 XLOC_000006 Fam150a chr1:6349411-6384812 dmso1hr  treat1hr NOTEST

            value_1 value_2 log2.fold_change.  test_stat p_value q_value significant
          1 21.3881 21.5543         0.0111661  0.0193608 0.98325 0.99965 no
          2 57.5476 58.0665         0.0129507  0.0167425 0.98480 0.99965 no
          3 59.9744 58.5215        -0.0353790 -0.0630910 0.94635 0.99965 no
          4  0.0000  0.0000         0.0000000  0.0000000 1.00000 1.00000 no
          5 10.6266 11.5585         0.1212790  0.2202230 0.81570 0.99965 no
          6  0.0000  0.0000         0.0000000  0.0000000 1.00000 1.00000 no
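
For anyone wanting to check the numbers, the fold-change column in both snippets can be recomputed from value_1 and value_2 as log2(value_2 / value_1). A small Python sketch using the FPKM values posted above (the dictionary names are just labels for this example):

```python
import math

# (value_1, value_2) FPKM pairs copied from the two gene_exp.diff snippets.
two_conditions = {
    "Lypla1": (21.4748, 21.7251),
    "Tcea1": (57.7808, 58.5267),
    "Atp6v1h": (60.2174, 58.9854),
    "Rb1cc1": (10.6696, 11.6501),
}
four_conditions = {
    "Lypla1": (21.3881, 21.5543),
    "Tcea1": (57.5476, 58.0665),
    "Atp6v1h": (59.9744, 58.5215),
    "Rb1cc1": (10.6266, 11.5585),
}


def log2_fold_change(value_1, value_2):
    """Plain log2 ratio of the second condition over the first."""
    return math.log2(value_2 / value_1)


for gene in two_conditions:
    lfc_two = log2_fold_change(*two_conditions[gene])
    lfc_four = log2_fold_change(*four_conditions[gene])
    print(f"{gene:<8} 2 conditions: {lfc_two:+.4f}   4 conditions: {lfc_four:+.4f}")
```

The recomputed ratios agree with the reported column up to rounding of the FPKM values, so the fold-change differences between the runs come from the slightly different FPKM estimates, not from a different formula.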


          Originally posted by blancha View Post

          One possibility is that, in the absence of replicates, Cuffdiff is calculating the variance levels across all the conditions, resulting in different results when you include less conditions. It's just a hypothesis. Cuffdiff is a bit of a black box, so others may come up with better answers.
          I think you are right. The strange thing is that I also got different results in other experiments that had replicates.
          I looked up the manual again and found how the dispersion is calculated.
          I think this may explain the situation.

          Code:
          Dispersion Method	Description
          pooled	Each replicated condition is used to build a model, then these models are averaged to provide a single global model for all conditions in the experiment. (Default)
          per-condition	Each replicated condition receives its own model. Only available when all conditions have replicates.
          blind	All samples are treated as replicates of a single global "condition" and used to build one model.
          poisson	The Poisson model is used, where the variance in fragment count is predicted to equal the mean across replicates. Not recommended.
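
To illustrate why the set of conditions included can matter under the "pooled" and "blind" descriptions above, here is a deliberately simplified Python toy. It is not Cuffdiff's actual model (Cuffdiff fits dispersion models across genes); it just uses a single per-gene variance, invented replicate values, and hypothetical extra conditions `dmso4hr` and `treat4hr` to show that both kinds of estimate change when conditions are added.

```python
from statistics import mean, variance


def pooled_variance(groups):
    """Toy 'pooled': average the per-condition sample variances."""
    return mean(variance(values) for values in groups.values())


def blind_variance(groups):
    """Toy 'blind': treat every sample as a replicate of one condition."""
    return variance([v for values in groups.values() for v in values])


# Invented expression values for one gene, two replicates per condition.
two = {"dmso1hr": [20.0, 22.0], "treat1hr": [21.0, 25.0]}
four = dict(two, dmso4hr=[18.0, 26.0], treat4hr=[30.0, 31.0])

for label, groups in (("two conditions", two), ("four conditions", four)):
    print(label,
          "pooled:", round(pooled_variance(groups), 3),
          "blind:", round(blind_variance(groups), 3))
```

In this toy, the variance used to compare dmso1hr and treat1hr depends on which other conditions are in the run, which would be consistent with the different p-values seen above.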

          Originally posted by blancha View Post
          I would be more concerned with the absence of replicates in your experimental design, than with Cuffdiff's quirks. To get more consistent results than with Cuffdiff, you can always use htseq-count to count the reads, and then DESeq2 to normalize the counts relative to the total number of reads. But, the determination of the significance of the differences between the samples will always be unreliable in the absence of replicates.
          Thank you very much for your answers. It helps me understand Cuffdiff more and know about DESeq2.



          • #6
            contrast file for cuffdiff

            Hi,
            I have a number of RNA-Seq samples classified into three groups "1", "2" and "3".
            Following the new workflow (cuffquant) I have provided cuffdiff with a sample sheet and have received the results of the three contrasts "1vs2", "1vs3" and "2vs3". So far so good...
            But now I would like to test "1vs2+3", "2vs1+3" and "3vs1+2" - so each group in contrast to the other two.
            Can I do this using a contrast file? The example in the manual does not suggest this...
            Or do I have to provide three new samplesheets with two groups in each (e.g. "1" and "2+3" for one of the paired contrasts)?
            Thanks,
            Jakob
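
The second option mentioned in this question (new sample sheets with two groups each) could be generated along these lines. This is only a sketch: the CXB file names are placeholders, `one_vs_rest` is a hypothetical helper, and the `not_<group>` label is an arbitrary naming choice. Whether merging groups this way is statistically appropriate is a separate question that the thread does not answer.

```python
def one_vs_rest(sample_groups, target):
    """Relabel every sample outside `target` into a single merged group."""
    rest_label = f"not_{target}"
    return {
        sample: (group if group == target else rest_label)
        for sample, group in sample_groups.items()
    }


# Placeholder sample sheet: CXB file -> original group label.
sample_groups = {
    "s1.cxb": "1", "s2.cxb": "1",
    "s3.cxb": "2", "s4.cxb": "2",
    "s5.cxb": "3", "s6.cxb": "3",
}

for target in ("1", "2", "3"):
    relabeled = one_vs_rest(sample_groups, target)
    print(f"--- sheet for {target} vs. the other two groups ---")
    print("sample_id\tgroup_label")
    for sample, group in relabeled.items():
        print(f"{sample}\t{group}")
```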



            • #7
              Jakob, you may get more joy posting this as a new question rather than appending it to a thread that hasn't been posted to in several months.
