
  • running cufflinks with a genome annotation as a strict reference or as a guide.

    Hi folks, I am running cufflinks in two ways: (1) with a genome annotation as a strict reference (option -G), and (2) with a genome annotation as a guide (option -g).
    After cufflinks, I run cuffcompare and cuffdiff, using exactly the same options in both runs. For cuffdiff I use a minimum alignment count of 10 and quartile normalization.
    Here is an example of my cuffdiff transcript-level differential expression output:

    RUN1 = annotation as a STRICT reference (option -G)
    test_id gene_id gene locus status sample 1 sample 2 ln(fold_change) test_stat p_value q_value significant
    TCONS_00000808 XLOC_000674 OBP51 2L:26101763-26135649 OK 0 3.59443 -1.79769e+308 -1.79769e+308 0.00048021 0.00886047 yes
    TCONS_00000244 XLOC_000192 AGAP005079 2L:9735320-9746596 OK 0 12.3904 -1.79769e+308 -1.79769e+308 1.40E-10 1.18E-08 yes
    TCONS_00000245 XLOC_000192 AGAP005079 2L:9735320-9746596 OK 0 26.4766 -1.79769e+308 -1.79769e+308 1.03E-11 1.09E-09 yes
    TCONS_00001658 XLOC_001422 AGAP007633 2L:48492236-48511972 OK 0 4.76816 -1.79769e+308 -1.79769e+308 4.54E-09 3.07E-07 yes




    RUN2 = annotation as a GUIDE (option -g)
    test_id gene_id gene locus status sample 1 sample 2 ln(fold_change) test_stat p_value q_value significant
    TCONS_00001429 XLOC_000713 OBP51 2L:26101763-26135649 OK 0 1.05238 1.79769e+308 1.79769e+308 0.0100919 0.0436812 yes
    TCONS_00001430 XLOC_000713 OBP51 2L:26101763-26135649 OK 0 1.87352 1.79769e+308 1.79769e+308 0.00140353 0.0102479 yes
    TCONS_00006075 XLOC_003687 AGAP005079 2L:9725746-9755495 OK 461420 0 -1.79769e+308 -1.79769e+308 0.00735375 0.0350816 yes
    TCONS_00009083 XLOC_001472 AGAP007633 2L:48492236-48512012 OK 0.94828 10.0326 2.35895 -7.9951 1.33E-15 2.11E-13 yes


    Can somebody help me understand:
    1- Why is there so much difference?
    2- Which is the more stringent option? Which transcripts should I consider differentially expressed? I get 1094 significantly differentially expressed transcripts in the first run and many more in the second run, and the overlap is minimal. Which should I use?

    thanks

  • #2
    Some Thoughts

    It depends on how well you feel your genome is annotated. If I feel the genes I am interested in are contained within the annotation for my organism, then I use strict.

    I sometimes run the guided mode, or remove the reads that aligned to annotated genes and then run de novo cufflinks, to identify potential genes that the annotation might have missed and that could be of interest to my particular biological question.

    De novo cufflinks, however, tends to find exons rather than full genes and will test those exons for differential expression, which could explain why you are finding a significantly higher number of differentially expressed "genes" in your guided cufflinks output. Exon-level comparisons can also produce contradictory results: because of poor sampling, one exon may suggest a significant increase in expression between two conditions while a different exon of the same gene shows a significant decrease.
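One more thing worth checking, assuming the "minimal overlap" was counted by matching test_id between the two runs: the TCONS_ and XLOC_ identifiers differ between your runs even for the same gene (OBP51 is TCONS_00000808 in run 1 but TCONS_00001429 and TCONS_00001430 in run 2), so matching on them will understate the overlap. A minimal sketch in Python, using only the rows you posted and a hypothetical helper named `overlap`, compares the two runs by gene name instead:

```python
# Rows copied from the post: (test_id, gene) for each run.
run1 = [
    ("TCONS_00000808", "OBP51"),
    ("TCONS_00000244", "AGAP005079"),
    ("TCONS_00000245", "AGAP005079"),
    ("TCONS_00001658", "AGAP007633"),
]
run2 = [
    ("TCONS_00001429", "OBP51"),
    ("TCONS_00001430", "OBP51"),
    ("TCONS_00006075", "AGAP005079"),
    ("TCONS_00009083", "AGAP007633"),
]


def overlap(rows_a, rows_b, column):
    """Sorted values present in both runs for one column (0 = test_id, 1 = gene).

    Hypothetical helper for illustration; not part of cufflinks.
    """
    return sorted({row[column] for row in rows_a} & {row[column] for row in rows_b})


by_id = overlap(run1, run2, 0)    # matching on run-specific transcript ids
by_gene = overlap(run1, run2, 1)  # matching on the gene name column

print("shared test_ids:", by_id)
print("shared genes:", by_gene)
```

For these eight rows the id-based overlap is empty while all three genes are shared. If you already matched by gene or locus, this does not apply and the difference is real.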






    • #3
      1.79769e+308

      What does an ln(fold_change) of 1.79769e+308 mean? Can anyone explain that?
      Thanks!
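For what it is worth, 1.79769e+308 is the largest finite double-precision number, and in the rows posted above it only appears where one of the two samples has an expression value of 0, so the log ratio has no finite value. The one row with two non-zero values (0.94828 and 10.0326) is consistent with ln(sample 2 / sample 1). The sketch below checks both points in Python; `ln_fold_change` is a hypothetical helper written for illustration, not cuffdiff's own code, and it returns infinities where cuffdiff appears to print the placeholder.

```python
import math
import sys

reported = 1.79769e+308  # value as printed in the cuffdiff rows above

# Compare against the largest finite double on this platform.
is_dbl_max = math.isclose(reported, sys.float_info.max, rel_tol=1e-5)
print("matches largest finite double:", is_dbl_max)


def ln_fold_change(value_1, value_2):
    """Natural log of value_2 / value_1 (assumed convention, illustration only)."""
    if value_1 == 0 and value_2 == 0:
        return math.nan   # 0/0: ratio undefined
    if value_1 == 0:
        return math.inf   # division by zero: no finite log ratio
    if value_2 == 0:
        return -math.inf  # log of zero: no finite log ratio
    return math.log(value_2 / value_1)


# Row with two non-zero values (AGAP007633, run 2): reported ln(fold_change) is 2.35895.
finite_case = ln_fold_change(0.94828, 10.0326)
print("finite case:", round(finite_case, 4))

# Row with sample 1 equal to 0 (OBP51, run 2): no finite answer exists.
print("zero case:", ln_fold_change(0, 1.05238))
```

Note that in run 1 the placeholder is negative even though sample 1 is the zero value, so I would not read much into its sign; the practical meaning is just "expressed in one condition, not detected in the other".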
