  • Cuffdiff with biological replicates problem

    I am trying to perform the cufflinks/cuffdiff workflow on the Galaxy server. I am comparing expression at 2 times (Day 2 and Day 6) and 2 conditions at each time (P and D). I have 2 biological replicates for each condition at Day 2, 3 replicates for each condition at Day 6.

    for example, group labels are:

    D2P (2 samples)
    D2D (2 samples)
    D6P (3 samples)
    D6D (3 samples)

    I ran Cufflinks v2.1.1 and Cuffdiff v2.1.1, then downloaded the 11 output files for cummeRbund analysis. Most of the analyses I carry out in cummeRbund give reasonable output, such as density plots, heat maps, and PC analyses. My problem is that I cannot run any cummeRbund analysis that shows differences between the replicate samples. For example, I can make a density plot or dendrogram, but if I use replicates='T' or replicates=T I get error messages.

    > cuff<-readCufflinks()
    > cuff
    CuffSet instance with:
    4 samples
    26911 genes
    65068 isoforms
    30546 TSS
    30084 CDS
    161466 promoters
    213180 splicing
    117294 relCDS

    > cd<-csDendro(genes(cuff))   # works

    > cdr<-csDendro(genes(cuff), replicates='T')
    Error in sqliteExecStatement(con, statement, bind.data) :
    RS-DBI driver: (error in statement: near "from": syntax error)

    I have rebuilt the database and tried different syntax for replicates, but any time I try to include replicates it fails. When I run pairwise volcano plots of triplicate samples, I get great-looking plots in which some genes show very large (>20 log2(fold change)) and significant (>4 -log10(p)) differences, which I can also see in the tabular diff data in Excel, but every point is drawn as a black "not significant" dot. This makes me think that either cuffdiff is not correctly characterizing the replicates, my db is still somehow incorrect, or I am not using cummeRbund correctly.
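One way to check whether the black dots come from the data rather than from the plotting is to tabulate the `significant` column of gene_exp.diff directly. As far as the Cuffdiff manual describes it, that flag is set after multiple-testing correction (the q_value), so a small raw p_value alone does not guarantee a "yes". The following is a minimal Python sketch, assuming the usual gene_exp.diff column names (`test_id`, `status`, `p_value`, `q_value`, `significant`); `summarize_diff` is a hypothetical helper and the rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def summarize_diff(handle, p_cutoff=1e-4):
    """Count 'significant' flags and list tests with a small p_value still flagged as not significant."""
    flags = Counter()
    small_p_not_sig = []
    for row in csv.DictReader(handle, delimiter="\t"):
        flags[row["significant"]] += 1
        try:
            p_value = float(row["p_value"])
        except ValueError:
            continue  # skip rows whose p_value is not numeric
        if p_value < p_cutoff and row["significant"] != "yes":
            small_p_not_sig.append((row["test_id"], row["status"], row["q_value"]))
    return flags, small_p_not_sig


# Made-up rows for illustration only, not real Cuffdiff output.
EXAMPLE = (
    "test_id\tsample_1\tsample_2\tstatus\tlog2(fold_change)\tp_value\tq_value\tsignificant\n"
    "geneA\tD6P\tD6D\tOK\t2.5\t0.00001\t0.2\tno\n"
    "geneB\tD6P\tD6D\tOK\t0.1\t0.8\t0.9\tno\n"
    "geneC\tD6P\tD6D\tOK\t3.0\t0.000001\t0.001\tyes\n"
)

flags, suspicious = summarize_diff(io.StringIO(EXAMPLE))
print(dict(flags))
print(suspicious)
```

For a real run, the same function can be given an open file handle, for example `open("gene_exp.diff")`. If every row comes back flagged "no", the volcano plot is reflecting the table and the question moves upstream to the Cuffdiff run itself.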

    The last clue is that none of the 11 cuffdiff output files contain any notation with replicate names. There are no columns labeled with the individual sample bam file names and their counts. The online cuffdiff manual suggests there should be 4 read count files that show replicate data, but I have never seen these files.

    Any help would be welcome.

    mike
    Last edited by mshamblott; 07-21-2013, 02:17 PM. Reason: Clarified question

  • #2
    Sorry mike, I've never used cuffdiff on Galaxy before so I'm not sure what to expect from it. If you ran cuffdiff on all of these samples at the same time (prompting it to perform all pairwise comparisons) it may refer to your conditions as q1, q2, q3, and q4. It probably would not include the BAM file names.
    /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
    Salk Institute for Biological Studies, La Jolla, CA, USA */


    • #3
      Missing replicates-Still a mystery

      I see no evidence that any replicate calculations were made. None of the tabular files have "qn" notations or .bam names, and replicates(cuff) in cummeRbund returns an empty set. From this, I presume that the problem is in my cufflinks run.

      Alternatively, I may not be running cummeRbund on a complete dataset. When you run cummeRbund, do you supply it with only the 11 data files below, or do you also include some of the other files that cufflinks generates, like the read group files? As far as I can see, these other files are not available for download from Galaxy.

      isoforms.fpkm_tracking
      genes.fpkm_tracking
      cds.fpkm_tracking
      tss_groups.fpkm_tracking

      isoform_exp.diff
      gene_exp.diff
      tss_group_exp.diff
      cds_exp.diff
      CDS.diff
      promoters.diff

      thanks

      mike


      • #4
        I have run several experiments with Galaxy Cuffdiff and local computer Cuffdiff. The results demonstrate that if you use biological replicates (you should) AND you want to use cummeRbund as the next step in your workflow (you should) then you have to run Cuffdiff locally, not on Galaxy. Cuffdiff generates several read_group_tracking files that cannot be downloaded from Galaxy (as of 7/31/2013). CummeRbund requires these files to analyze replicate data.

        The clearest test of this is to run replicates(cuff) in cummeRbund. If it returns an empty set, cummeRbund is not using replicates.
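Before loading anything into cummeRbund, it can also help to confirm that the replicate-level tracking files are actually present in the download. Below is a minimal Python pre-flight sketch. The four file names follow the read_group_tracking pattern discussed in this thread and are assumptions about what a local Cuffdiff 2.x run writes; `missing_replicate_files` is a hypothetical helper, not part of Cufflinks or cummeRbund.

```python
import tempfile
from pathlib import Path

# Replicate-level files expected from Cuffdiff (assumed names, see lead-in).
REPLICATE_FILES = [
    "genes.read_group_tracking",
    "isoforms.read_group_tracking",
    "tss_groups.read_group_tracking",
    "cds.read_group_tracking",
]


def missing_replicate_files(out_dir):
    """Return the expected replicate-level files that are absent or empty in out_dir."""
    out_dir = Path(out_dir)
    missing = []
    for name in REPLICATE_FILES:
        path = out_dir / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


# Demonstration on a throwaway directory that holds only one of the files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "genes.read_group_tracking").write_text("tracking_id\tcondition\treplicate\n")
    demo_missing = missing_replicate_files(tmp)

print(demo_missing)
```

An empty list from `missing_replicate_files("path/to/cuffdiff_output")` only shows that the files exist. Whether cummeRbund picks them up still has to be confirmed with replicates(cuff) after rebuilding the database.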

        I hope this thread helps.


        • #5
          Read_group_tracking files not enough?

          Originally posted by mshamblott
          Cuffdiff generates several read_group_tracking files that cannot be downloaded from Galaxy (as of 7/31/2013). CummeRbund requires these files to analyze replicate data.
          Now (20/Nov/13) Galaxy produces the "read_group_tracking" files. However, I wonder whether these files are enough to use 'replicates=T' in cummeRbund, or whether we also need the "read_groups.info" file. I've tried 'replicates(cuff)' with the 'read_group_tracking' files in my directory and I get back an empty set. At this point I don't see any option other than running cuffdiff locally, as you suggested.
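One hedged pointer on the read_groups.info question: to my knowledge, cummeRbund's readCufflinks() has a repTableFile argument that defaults to "read_groups.info", which would suggest the replicate table is built from that file and would be consistent with replicates(cuff) coming back empty when it is missing. That should be verified against the cummeRbund documentation. If a read_groups.info file is available, the Python sketch below lists the replicates recorded per condition. The column names `condition` and `replicate_num` are assumptions about the file layout, `replicates_per_condition` is a hypothetical helper, and the BAM file names are invented.

```python
import csv
import io
from collections import defaultdict


def replicates_per_condition(handle):
    """Group replicate numbers by condition from a read_groups.info-style table."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        groups[row["condition"]].append(int(row["replicate_num"]))
    return dict(groups)


# Made-up table for illustration; the file names are hypothetical.
EXAMPLE = (
    "file\tcondition\treplicate_num\n"
    "d2p_rep1.bam\tD2P\t0\n"
    "d2p_rep2.bam\tD2P\t1\n"
    "d6d_rep1.bam\tD6D\t0\n"
    "d6d_rep2.bam\tD6D\t1\n"
    "d6d_rep3.bam\tD6D\t2\n"
)

groups = replicates_per_condition(io.StringIO(EXAMPLE))
print(groups)
```

If the real file lists only one replicate per condition, or the file is absent, the replicate information was lost before cummeRbund was involved.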
