  • cummeRbund get significantly differentially expressed genes

    Hello,

    I am conducting an RNA-Seq analysis to identify significantly differentially expressed genes (SDEGs) among 5 conditions. I followed the Tuxedo protocol (tophat/cufflinks/cuffmerge/cuffdiff/cummeRbund; http://www.nature.com/nprot/journal/...t.2012.016.pdf) and I am at the final step, where I need to determine the number of SDEGs.
    I followed the cummeRbund instructions in the manual (http://compbio.mit.edu/cummeRbund/manual_2_0.html) and here's my output:
    Code:
    > cuff_data<-readCufflinks(genome="genome.fa",gtfFile="merged.gtf")
    CuffSet instance with:
    	 5 samples
    	 16011 genes
    	 36460 isoforms
    	 24184 TSS
    	 15312 CDS
    	 160110 promoters
    	 241840 splicing
    	 135440 relCDS
    > gene.diff<-diffData(genes(cuff_data))
    > sig.gene.diff<-subset(gene.diff, significant=="yes")
    > nrow(sig.gene.diff)
    [1] 4500
    > mySigGeneIds<-getSig(cuff_data,alpha=0.05,level='genes')
    > length(mySigGeneIds)
    [1] 1386
    > mySigIsoformIds<-getSig(cuff_data,alpha=0.05,level='isoforms')
    > length(mySigIsoformIds)
    [1] 900
    So when I use diffData() and keep only the significant results with subset(), I get a total of 4500 SDEGs (this covers all pairwise comparisons, which I can extract separately using another subset() on the sample names). BUT when I use the getSig() method with alpha=0.05, I only get 1386 gene IDs (not 4500), which is less than a third of what I get with diffData(). I also tested at the "isoforms" level to see if it would match, but it didn't.
    The part in the cummeRbund manual about getSig() says "a alpha value can be provided on which to filter the resulting list (the default is 0.05 to match the default of cuffdiff)."
    From what I understood, the diffData() method is supposed to just extract the information from cuffdiff without running additional tests, so shouldn't I get the same number of genes?
    I would be grateful if someone could clearly explain the difference between these two methods and which one should be used to determine SDEGs.
    Thanks a lot for your help,
    Oli
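
    One possible reading of the gap between 4500 and 1386, offered as a hypothesis and not as a confirmed answer: diffData() returns one row per gene per pairwise comparison (the sample_1 and sample_2 columns), so nrow() counts gene-comparison pairs, whereas getSig() returns a vector of IDs. The sketch below uses a made-up data frame (toy.diff and all of its values are invented; only the column names mirror diffData() output) to show how the two counts can diverge.

    ```r
    # Toy stand-in for diffData(genes(cuff_data)): one row per gene per comparison.
    # Every value here is invented for illustration.
    toy.diff <- data.frame(
      gene_id     = c("g1", "g1", "g1", "g2", "g2", "g2", "g3", "g3", "g3"),
      sample_1    = rep(c("cond1", "cond1", "cond2"), times = 3),
      sample_2    = rep(c("cond2", "cond3", "cond3"), times = 3),
      significant = c("yes", "yes", "yes", "yes", "no", "no", "no", "no", "no"),
      stringsAsFactors = FALSE
    )

    # Same filter as in the post: keep only rows flagged as significant.
    toy.sig <- subset(toy.diff, significant == "yes")

    # Number of significant gene-by-comparison rows.
    n.sig.rows <- nrow(toy.sig)
    # Number of distinct genes behind those rows.
    n.sig.genes <- length(unique(toy.sig$gene_id))

    print(c(rows = n.sig.rows, genes = n.sig.genes))
    ```

    On the real data, length(unique(sig.gene.diff$gene_id)) would give the number of distinct genes, which may be the fairer figure to compare with length(mySigGeneIds).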

  • #2
    I realized that if I use the getSig() method one pair of conditions at a time with the command:
    cond1_vs_cond2_getSig<-getSig(cuff_data,x="cond1",y="cond2",alpha=0.05,level="genes")
    and sum the lengths of the vectors (= number of SDEGs) for the 10 possible pairwise comparisons (I have 5 conditions), I get a total of 4773 SDEGs.
    So I think that, with 5 conditions, getSig(cuff_data,alpha=0.05,level='genes') corrects for multiple testing across all conditions, which is why I get a smaller number of genes, and that I therefore shouldn't use it to determine SDEGs for a particular pair of conditions. Can someone confirm this?
    On the other hand, I am still not sure whether I should use diffData() (filter for significant results, then filter by pairwise comparison) or getSig(cuff_data,x="cond1",y="cond2",alpha=0.05,level="genes") for each pair of conditions.
    I thank anyone who may help me with this!!
    Last edited by Olioli; 11-25-2014, 10:22 AM.
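
    The larger pairwise total (4773 against 4500) would be consistent with the p-values being re-adjusted within each pair when x and y are given. That behaviour is an assumption here; the thread does not confirm it. The base-R sketch below, using invented p-values and p.adjust() with the Benjamini-Hochberg method, only illustrates the general effect: correcting over fewer tests can let more of them pass the same alpha.

    ```r
    # Invented raw p-values for two pairwise comparisons, three tests each.
    p.comp1 <- c(0.001, 0.012, 0.030)
    p.comp2 <- c(0.200, 0.500, 0.900)
    alpha <- 0.05

    # Benjamini-Hochberg adjustment over all six tests pooled together.
    q.pooled <- p.adjust(c(p.comp1, p.comp2), method = "BH")

    # Benjamini-Hochberg adjustment over the first comparison's tests only.
    q.single <- p.adjust(p.comp1, method = "BH")

    # Significant tests in the first comparison under each scheme.
    n.pooled <- sum(q.pooled[1:3] <= alpha)
    n.single <- sum(q.single <= alpha)

    print(c(pooled = n.pooled, single = n.single))
    ```

    In this toy case the third test passes only when the first comparison is corrected on its own, so per-pair counts can add up to more than the pooled count.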

    • #3
      After more than 400 views, nobody has answered or helped me at all.
      I am sure that someone on this forum should be able to help me; I will be glad to answer any questions you may have.
      Last edited by Olioli; 01-09-2015, 05:14 AM.
