
  • running cufflinks with a genome annotation as a strict reference or as a guide.

    Hi folks, I am running cufflinks in two ways: (1) with a genome annotation as a strict reference (option -G), and (2) with a genome annotation as a guide (option -g).
    After cufflinks, I run cuffcompare and cuffdiff with exactly the same options in both runs. For cuffdiff I use a minimum alignment count of 10 and quartile normalization.
    Here is an example of the cuffdiff transcript-level differential expression output from each run:

    RUN1 = annotation as a STRICT reference (option -G)
    test_id         gene_id      gene        locus                 status  sample_1  sample_2  ln(fold_change)  test_stat      p_value     q_value     significant
    TCONS_00000808  XLOC_000674  OBP51       2L:26101763-26135649  OK      0         3.59443   -1.79769e+308    -1.79769e+308  0.00048021  0.00886047  yes
    TCONS_00000244  XLOC_000192  AGAP005079  2L:9735320-9746596    OK      0         12.3904   -1.79769e+308    -1.79769e+308  1.40E-10    1.18E-08    yes
    TCONS_00000245  XLOC_000192  AGAP005079  2L:9735320-9746596    OK      0         26.4766   -1.79769e+308    -1.79769e+308  1.03E-11    1.09E-09    yes
    TCONS_00001658  XLOC_001422  AGAP007633  2L:48492236-48511972  OK      0         4.76816   -1.79769e+308    -1.79769e+308  4.54E-09    3.07E-07    yes




    RUN2 = annotation as a GUIDE (option -g)
    test_id         gene_id      gene        locus                 status  sample_1  sample_2  ln(fold_change)  test_stat      p_value     q_value     significant
    TCONS_00001429  XLOC_000713  OBP51       2L:26101763-26135649  OK      0         1.05238   1.79769e+308     1.79769e+308   0.0100919   0.0436812   yes
    TCONS_00001430  XLOC_000713  OBP51       2L:26101763-26135649  OK      0         1.87352   1.79769e+308     1.79769e+308   0.00140353  0.0102479   yes
    TCONS_00006075  XLOC_003687  AGAP005079  2L:9725746-9755495    OK      461420    0         -1.79769e+308    -1.79769e+308  0.00735375  0.0350816   yes
    TCONS_00009083  XLOC_001472  AGAP007633  2L:48492236-48512012  OK      0.94828   10.0326   2.35895          -7.9951        1.33E-15    2.11E-13    yes


    Can somebody help me understand:
    1-Why is there so much difference between the two runs?
    2-Which is the more stringent option, and which transcripts should I consider differentially expressed? I get 1094 significantly differentially expressed transcripts in the first run and many more in the second run, and the overlap is minimal. Which should I use?

    thanks
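
A minimal sketch of one way to measure the overlap mentioned above, assuming the rows are whitespace-separated exactly as pasted in this post. In the two excerpts the same gene (for example OBP51) carries different TCONS identifiers in each run, so matching on test_id may understate the overlap; matching on the gene column is one alternative. The helper name `significant` and the column constants are hypothetical, chosen only for illustration.

```python
# Data rows copied from the RUN1 and RUN2 excerpts in this post.
RUN1 = """\
TCONS_00000808 XLOC_000674 OBP51 2L:26101763-26135649 OK 0 3.59443 -1.79769e+308 -1.79769e+308 0.00048021 0.00886047 yes
TCONS_00000244 XLOC_000192 AGAP005079 2L:9735320-9746596 OK 0 12.3904 -1.79769e+308 -1.79769e+308 1.40E-10 1.18E-08 yes
TCONS_00000245 XLOC_000192 AGAP005079 2L:9735320-9746596 OK 0 26.4766 -1.79769e+308 -1.79769e+308 1.03E-11 1.09E-09 yes
TCONS_00001658 XLOC_001422 AGAP007633 2L:48492236-48511972 OK 0 4.76816 -1.79769e+308 -1.79769e+308 4.54E-09 3.07E-07 yes
"""
RUN2 = """\
TCONS_00001429 XLOC_000713 OBP51 2L:26101763-26135649 OK 0 1.05238 1.79769e+308 1.79769e+308 0.0100919 0.0436812 yes
TCONS_00001430 XLOC_000713 OBP51 2L:26101763-26135649 OK 0 1.87352 1.79769e+308 1.79769e+308 0.00140353 0.0102479 yes
TCONS_00006075 XLOC_003687 AGAP005079 2L:9725746-9755495 OK 461420 0 -1.79769e+308 -1.79769e+308 0.00735375 0.0350816 yes
TCONS_00009083 XLOC_001472 AGAP007633 2L:48492236-48512012 OK 0.94828 10.0326 2.35895 -7.9951 1.33E-15 2.11E-13 yes
"""

# Column positions in each data row (hypothetical constant names).
TEST_ID, GENE = 0, 2


def significant(rows, column):
    """Return the set of values in `column` for rows whose last field is 'yes'."""
    values = set()
    for line in rows.strip().splitlines():
        fields = line.split()
        if fields[-1] == "yes":
            values.add(fields[column])
    return values


# Overlap by transcript identifier versus overlap by gene name.
shared_ids = significant(RUN1, TEST_ID) & significant(RUN2, TEST_ID)
shared_genes = significant(RUN1, GENE) & significant(RUN2, GENE)

print(len(shared_ids))       # 0
print(sorted(shared_genes))  # ['AGAP005079', 'AGAP007633', 'OBP51']
```

On these eight rows no transcript identifier is shared, yet all three genes appear in both runs. Whether that pattern holds for the full output is not something the excerpt can show.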

  • #2
    Some Thoughts

    It depends on how well annotated you think your genome is. If I believe the genes I am interested in are covered by the annotation for my organism, I use the strict option.

    Sometimes I run the guided mode, or remove the reads that aligned to annotated genes and then run de novo cufflinks, to identify potential genes that the annotation missed and that could be relevant to my particular biological question.

    De novo cufflinks, however, tends to find exons rather than full genes and then tests those exons for differential expression, which could explain why you see a much higher number of differentially expressed "genes" in your guided cufflinks output. Exon-level comparisons can also produce the awkward case where, because of poor sampling, one exon suggests a significant increase in expression between two conditions while a different exon of the same gene shows a significant decrease.
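
The conflicting-direction case described above can be screened for with a short script. This is only a sketch under stated assumptions: rows are whitespace-separated as in the first post, direction is taken from the two sample values rather than from the ln(fold_change) column, and the function name `conflicting_genes` and the `GENE_X` rows are hypothetical.

```python
from collections import defaultdict

# Data rows copied from the RUN2 excerpt in the first post.
RUN2 = """\
TCONS_00001429 XLOC_000713 OBP51 2L:26101763-26135649 OK 0 1.05238 1.79769e+308 1.79769e+308 0.0100919 0.0436812 yes
TCONS_00001430 XLOC_000713 OBP51 2L:26101763-26135649 OK 0 1.87352 1.79769e+308 1.79769e+308 0.00140353 0.0102479 yes
TCONS_00006075 XLOC_003687 AGAP005079 2L:9725746-9755495 OK 461420 0 -1.79769e+308 -1.79769e+308 0.00735375 0.0350816 yes
TCONS_00009083 XLOC_001472 AGAP007633 2L:48492236-48512012 OK 0.94828 10.0326 2.35895 -7.9951 1.33E-15 2.11E-13 yes
"""

# Made-up rows (not from any real run) where one gene moves in both directions.
HYPOTHETICAL = """\
T1 X1 GENE_X chr:1-100 OK 1.0 5.0 1.6 3.0 0.001 0.01 yes
T2 X1 GENE_X chr:1-100 OK 8.0 2.0 -1.4 -3.0 0.001 0.01 yes
"""


def conflicting_genes(rows):
    """Return genes whose significant features change in opposite directions."""
    directions = defaultdict(set)
    for line in rows.strip().splitlines():
        fields = line.split()
        if fields[-1] != "yes":
            continue  # only consider rows flagged significant
        sample_1, sample_2 = float(fields[5]), float(fields[6])
        if sample_1 == sample_2:
            continue  # no direction to record
        directions[fields[2]].add("up" if sample_2 > sample_1 else "down")
    return sorted(gene for gene, seen in directions.items() if len(seen) > 1)


print(conflicting_genes(RUN2))          # []
print(conflicting_genes(HYPOTHETICAL))  # ['GENE_X']
```

None of the genes in the RUN2 excerpt are flagged; the made-up rows show what a flagged gene would look like.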






    • #3
      1.79769e+308

      What does a fold change of 1.79769e+308 mean? Can anyone explain that?
      Thanks!
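
One possible reading, not confirmed anywhere in this thread: 1.79769e+308 is the largest finite double-precision value rounded to six significant digits, and in every row of the first post where it appears one of the two samples has an expression value of 0, so the log ratio has no finite value. The sketch below shows the numeric coincidence and one way to treat such entries as infinite when parsing. The names `SENTINEL` and `parse_log_fold_change` are hypothetical.

```python
import math
import sys

# Largest finite double on this platform.
print(sys.float_info.max)  # 1.7976931348623157e+308

# Value as printed in the cuffdiff excerpts in the first post.
SENTINEL = 1.79769e+308


def parse_log_fold_change(text):
    """Parse a log fold change, mapping the +/- sentinel to +/- infinity."""
    value = float(text)
    if abs(value) >= SENTINEL:
        return math.copysign(math.inf, value)
    return value


print(parse_log_fold_change("-1.79769e+308"))  # -inf
print(parse_log_fold_change("2.35895"))        # 2.35895
```

Mapping the sentinel to infinity keeps such rows from distorting downstream summaries such as a mean log fold change, but whether to keep, filter, or separately report them is a judgment call.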
