  • Inconsistency in Cuffdiff results

    Hi all,

    I am using cuffdiff to compare my RNA-Seq samples, and the results I get are inconsistent.

    For example, I have three samples: S1, S2, and S3. I first ran cuffdiff on a single pair, S1 vs. S2. Then I ran cuffdiff on all three samples. Since cuffdiff performs pairwise comparisons, it reports all pairs: S1 vs. S2, S1 vs. S3, and S2 vs. S3.

    The results for S1 vs. S2 differ between these two runs, although I assumed they would be the same. Is there anything I did wrong, or does cuffdiff take more factors into account when there are more samples?

    Thanks,
    Xiaoyu

  • #2
    I have noticed this as well. I also notice that in the new version of Cufflinks (2.0.2), cuffdiff produces a file with the individual RPKM values for replicates. I have 14 disease samples and 6 controls, so when I run cuffdiff I have two conditions with replicates (14 and 6, respectively). If I run the analysis disease vs. control, I get different individual RPKM values than if I split the disease samples into "drug responder" and "drug non-responder" and re-run cuffdiff with 3 conditions (responder, non-responder, control). I would expect the individual RPKM values to be the same irrespective of the number of conditions.

    Or am I misunderstanding something?

    Thanks
    Helen

    • #3
      I guess my problem is not exactly the same as yours, but it is similar. In your case, after you split the disease samples, each condition has a different number of replicates than in the first run. In my case, the samples are exactly the same; I just added one more sample for the second run.

      • #4
        Xiaoyu

        Yes, I have a different number of replicates when I run the analysis the second time, so I would expect the gene-level RPKM values in the genes.fpkm_tracking file (one RPKM per condition per gene) to differ. However, shouldn't the individual RPKM values for each sample (in the genes.read_group_tracking file) be the same no matter how the analysis was run?

        Helen
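One way to check this question directly is to compare the per-replicate values from two runs. The sketch below assumes the genes.read_group_tracking files are tab-separated with a header that includes tracking_id, condition, replicate and FPKM columns (check the header of your own files); the function names are hypothetical. Rows are matched on (gene, condition, replicate), so the comparison only lines up when both runs use the same condition labels and replicate numbering, as in the S1/S2 case above; after regrouping samples into new conditions, the keys would have to be remapped by hand.

```python
import csv
import io
import math


def load_replicate_fpkm(handle):
    """Map (tracking_id, condition, replicate) -> FPKM for one read_group_tracking file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        (row["tracking_id"], row["condition"], row["replicate"]): float(row["FPKM"])
        for row in reader
    }


def changed_values(run_a, run_b, rel_tol=1e-6, abs_tol=1e-9):
    """Return {key: (fpkm_a, fpkm_b)} for keys present in both runs whose FPKM differs."""
    changed = {}
    for key in sorted(run_a.keys() & run_b.keys()):
        a, b = run_a[key], run_b[key]
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
            changed[key] = (a, b)
    return changed


if __name__ == "__main__":
    # Made-up toy rows; a real file has more columns, but only these four are read here.
    header = "tracking_id\tcondition\treplicate\tFPKM\n"
    pair_run = io.StringIO(header + "G1\tS1\t0\t10.0\nG1\tS2\t0\t4.0\n")
    three_run = io.StringIO(header + "G1\tS1\t0\t10.0\nG1\tS2\t0\t4.4\nG1\tS3\t0\t7.0\n")
    diffs = changed_values(load_replicate_fpkm(pair_run), load_replicate_fpkm(three_run))
    for (gene, cond, rep), (a, b) in diffs.items():
        print(f"{gene} {cond} rep{rep}: {a} -> {b}")
```

With real files, open each one with `open(path, newline="")` and pass the handle to `load_replicate_fpkm`.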

        • #5
          Honestly, I don't know the answer. But if you are looking at the Cuffdiff output, I would guess the values can differ, since your replicate groups are different and cuffdiff will normalize the samples differently.
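To illustrate why normalization can change per-sample values, here is a minimal sketch of DESeq-style median-of-ratios ("geometric") normalization, which to my understanding is the default library normalization in Cuffdiff 2 (check the manual for your version). Each sample's size factor is computed against the geometric mean of every sample in the run, so adding S3 changes S1's size factor, and therefore S1's normalized values, even though S1's raw counts are identical. The counts and the function name size_factors are made up for illustration; this is not cuffdiff's code and will not reproduce its FPKM numbers.

```python
import math
from statistics import median


def size_factors(counts):
    """DESeq-style median-of-ratios size factors.

    counts maps sample name -> list of raw counts, one entry per gene, in the
    same gene order for every sample. Genes with a zero count in any sample are
    skipped because their geometric mean would be zero.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # Log geometric mean of each gene across all samples *in this run*:
    # this pseudo-reference changes whenever a sample is added or removed.
    reference = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            reference.append((g, sum(math.log(v) for v in values) / len(values)))
    factors = {}
    for s in samples:
        log_ratios = [math.log(counts[s][g]) - log_ref for g, log_ref in reference]
        factors[s] = math.exp(median(log_ratios))
    return factors


if __name__ == "__main__":
    # Made-up raw counts for four genes.
    s1 = [100, 200, 50, 400]
    s2 = [120, 180, 60, 500]
    s3 = [300, 100, 150, 200]
    pair = size_factors({"S1": s1, "S2": s2})
    trio = size_factors({"S1": s1, "S2": s2, "S3": s3})
    # Same raw count for gene 1 in S1, but a different normalized value per run.
    print("S1 gene 1, pair run:        ", s1[0] / pair["S1"])
    print("S1 gene 1, three-sample run:", s1[0] / trio["S1"])
```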

          • #6
            Can you share the command you used to run cuffdiff?

            • #7
              This is the command I used. Thanks

              Code:
              cuffdiff -p 8 -o dfout -L S1,S2,S3 merged.gtf ./S1/accepted_hits.bam ./S2_R1/accepted_hits.bam,./S2_R2/accepted_hits.bam ./S3_R1/accepted_hits.bam,./S3_R2/accepted_hits.bam

              • #8
                Keep in mind that in the absence of replicates, cuffdiff pools the conditions to derive its dispersion estimate. Your dispersion estimates may therefore be very different when you run with only a pair of samples versus all three. That inherently affects your significance estimates when computing differences between pairs of samples, so you would not expect to get the same significance from those two analyses.
                Michael Black, Ph.D.
                ScitoVation LLC. RTP, N.C.
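A simplified sketch of what pooling means here: with no replicates, every sample in the run is treated as a replicate of a single group when estimating a gene's dispersion, so adding S3 changes the estimate that is then used for S1 vs. S2. This is an illustrative assumption, not cuffdiff's actual model (which, as I understand it, fits a mean-variance relationship across genes rather than a per-gene estimate). The method-of-moments negative binomial formula var = mu + phi * mu^2, the function name pooled_nb_dispersion, and the counts are all chosen for illustration.

```python
from statistics import mean, variance


def pooled_nb_dispersion(normalized_counts):
    """Per-gene negative binomial dispersion by method of moments.

    All samples passed in are treated as replicates of a single group (the
    "pooled" case). Solves var = mu + phi * mu**2 for phi, clipped at 0.
    Needs at least two values, because variance() uses the n - 1 denominator.
    """
    mu = mean(normalized_counts)
    if mu <= 0:
        return 0.0
    s2 = variance(normalized_counts)
    return max(0.0, (s2 - mu) / mu**2)


if __name__ == "__main__":
    # Made-up normalized counts for one gene in three unreplicated samples.
    counts = {"S1": 100.0, "S2": 120.0, "S3": 300.0}
    print("pair run (S1, S2):", pooled_nb_dispersion([counts["S1"], counts["S2"]]))
    print("three-sample run: ", pooled_nb_dispersion(list(counts.values())))
```

Because S3 is far from S1 and S2 for this gene, the pooled dispersion jumps, which would make the S1 vs. S2 test less significant in the three-sample run.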

                • #9
                  Thank you for answering my questions, mbblack.

                  Then should I expect to get the same result for S2 vs. S3? I have replicates for both of those samples.
