
  • High level of novel exons/introns in cuffcompare data

    Hello, all.

    I am new to the forums and the whole of bioinformatics (I've been at it two weeks), but I have done a good deal of reading and have been playing around with the tophat->cufflinks pipeline.

    Currently, I have RNA-seq libraries constructed from pineal glands of 3 aged patients. I am attempting to identify novel transcripts in this relatively small library, at which point I will move up to a larger library.

    However, after assembling the transcripts with cufflinks (using the latest Ensembl human genome as my reference for the RABT), running cuffcompare to compare this pooled data back to the same Ensembl genome results in a very high percentage of transcripts identified as novel. Specifically, 47.4% of exons and 25.8% of introns are identified as novel, as are 82.4% of loci.

    Now, I am fairly certain these numbers cannot be correct. I recognize that we expect to find some number of new annotations, but this seems ludicrously high. I was wondering:

    1) What could account for this very high report of novel transcripts? Could it just be lousy coverage resulting in many sparse transcripts being 'false positives'? I know that we did not have large amounts of RNA from these pineal glands (they're small, of course). If it's the data that is indeed the problem, how could I demonstrate this fact?

    2) Do you have any suggestions on enhancing this method to identify novel transcripts? I had hoped to use Cuffcompare's 'j' tag to look at possible novel transcripts, but at this point I am getting either identity to the reference annotation (code =) or totally unknown transcripts (code u), with very few exceptions. I had an idea to run the cufflinks assembly with the reference listed as both the reference and the mask. I think I will try that and see how it works until I get a better idea.

    Hopefully this is enough information. I look forward to any advice the sages of this forum can give.

    -RP
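
One way to see where the novel fraction comes from is to tally the class codes that cuffcompare assigns per transcript. The following is a minimal Python sketch, assuming a whitespace-separated table with a `class_code` column like the one in cuffcompare's .tmap output. The function name `class_code_fractions` and the toy rows are made up for illustration and are not data from this thread.

```python
from collections import Counter


def class_code_fractions(lines):
    """Return {class_code: fraction of transcripts} for a tmap-style table.

    Assumes the first non-empty line is a header containing 'class_code'
    and that fields are separated by whitespace (hypothetical input layout).
    """
    rows = [ln.split() for ln in lines if ln.strip()]
    if len(rows) < 2:
        return {}
    header, body = rows[0], rows[1:]
    col = header.index("class_code")
    counts = Counter(r[col] for r in body)
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()}


# Toy rows invented for illustration only.
toy_tmap = """ref_gene_id ref_id class_code cuff_gene_id cuff_id
GENE_A TX_A.1 = XLOC_000001 TCONS_00000001
- - u XLOC_000002 TCONS_00000002
- - u XLOC_000003 TCONS_00000003
GENE_B TX_B.1 j XLOC_000004 TCONS_00000004
""".splitlines()

fractions = class_code_fractions(toy_tmap)
print(fractions)
```

If most transcripts turn out to be `u`, a plausible next step is to check whether they are short, single-exon, low-coverage fragments, which would point to sparse data rather than genuinely new loci.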

  • #2
    Have you looked at the location of some of the predictions in IGV or another browser? You might get a better idea of why these metrics are so high by doing so.

    • #3
      hello Devon,

      How can I identify novel transcripts when I run cuffcompare? These are the commands I used:
      tophat -o output arabidopsis.fa file1_R1.fq file1_R2.fq
      cufflinks -o output accepted_hits.bam
      cuffmerge -s arabidopsis.fa assemblies.txt
      (assemblies.txt lists transcripts_1.gtf........transcripts_n.gtf)
      cuffcompare -s arabidopsis.fa -r known_annotation.gtf merged.gtf

      When I run these commands, I don't get any FPKM values in the output file. Could anyone suggest how I can identify novel transcripts?
      The output file:
      ref_gene_id ref_id class_code cuff_gene_id cuff_id FMI FPKM FPKM_conf_lo FPKM_conf_hi cov len major_iso_id ref_match_len
      ANAC001 AT1G01010.1 = XLOC_000001 TCONS_00000002 0 0.000000 0.000000 0.000000 0.000000 1694 TCONS_00000002 1688
      ANAC001 AT1G01010.1 j XLOC_000001 TCONS_00000001 0 0.000000 0.000000 0.000000 0.000000 1674 TCONS_00000002 1688
      DCL1 AT1G01040.1 j XLOC_000002 TCONS_00000004 0 0.000000 0.000000 0.000000 0.000000 6611 TCONS_00000004 6251
      DCL1 AT1G01040.1 = XLOC_000002 TCONS_00000003 0 0.000000 0.000000 0.000000 0.000000 6251 TCONS_00000004 6251
      DCL1 AT1G01040.2 = XLOC_000002 TCONS_00000005 0 0.000000 0.000000 0.000000 0.000000 5984 TCONS_00000004 5877
      AT1G01073 AT1G01073.1 = XLOC_000003 TCONS_00000006 0 0.000000 0.000000 0.000000 0.000000 111 TCONS_00000006 111
      IQD18 AT1G01110.2 = XLOC_000004 TCONS_00000007 0 0.000000 0.000000 0.000000 0.000000 1782 TCONS_00000007 1782
      AT1G01115 AT1G01115.1 = XLOC_000005 TCONS_00000008 0 0.000000 0.000000 0.000000 0.000000 117 TCONS_00000008 117
      GIF2 AT1G01160.1 = XLOC_000006 TCONS_00000009 0 0.000000 0.000000 0.000000 0.000000 1045 TCONS_00000010 1045
      GIF2 AT1G01160.2 = XLOC_000006 TCONS_00000010 0 0.000000 0.000000 0.000000 0.000000 1129 TCONS_00000010 1129
      AT1G01180 AT1G01180.1 = XLOC_000007 TCONS_00000011 0 0.000000 0.000000 0.000000 0.000000 1176 TCONS_00000011 1176
      MIR165A AT1G01183.1 x XLOC_000008 TCONS_00000012 0 0.000000 0.000000 0.000000 0.000000 651 TCONS_00000012 101
      F6F3.2 AT1G01210.1 = XLOC_000009 TCONS_00000013 0 0.000000 0.000000 0.000000 0.000000 616 TCONS_00000013 616
      FKGP AT1G01220.1 = XLOC_000010 TCONS_00000014 0 0.000000 0.000000 0.000000 0.000000 3532 TCONS_00000014 3532
      Last edited by am@i; 04-15-2014, 04:27 AM.
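
For the table above, the `class_code` column is what separates known from potentially novel transcripts: `=` is a match to the reference, `j` is a potentially novel isoform that shares at least one splice junction with a reference transcript, `u` is an unknown intergenic transcript, and `x` is exonic overlap with the reference on the opposite strand. The sketch below is a hedged Python example that picks out candidate rows by class code. The function name `novel_candidates` and the choice of `j` and `u` as the codes of interest are assumptions for illustration, and the sample rows are copied from the post. The zero FPKM values are most likely because the merged.gtf written by cuffmerge carries no expression estimates, so class codes rather than FPKM are the thing to filter on here.

```python
# Assumption: treat 'j' (novel isoform) and 'u' (intergenic) as candidates.
NOVEL_CODES = frozenset({"j", "u"})


def novel_candidates(lines, codes=NOVEL_CODES):
    """Return (cuff_id, class_code) pairs whose class code is in `codes`.

    Assumes a whitespace-separated table whose header has 'class_code'
    and 'cuff_id' columns, as in the output pasted in the post.
    """
    rows = [ln.split() for ln in lines if ln.strip()]
    if len(rows) < 2:
        return []
    header, body = rows[0], rows[1:]
    code_col = header.index("class_code")
    id_col = header.index("cuff_id")
    return [(r[id_col], r[code_col]) for r in body if r[code_col] in codes]


# Rows copied from the post above.
table = """ref_gene_id ref_id class_code cuff_gene_id cuff_id FMI FPKM FPKM_conf_lo FPKM_conf_hi cov len major_iso_id ref_match_len
ANAC001 AT1G01010.1 = XLOC_000001 TCONS_00000002 0 0.000000 0.000000 0.000000 0.000000 1694 TCONS_00000002 1688
ANAC001 AT1G01010.1 j XLOC_000001 TCONS_00000001 0 0.000000 0.000000 0.000000 0.000000 1674 TCONS_00000002 1688
DCL1 AT1G01040.1 j XLOC_000002 TCONS_00000004 0 0.000000 0.000000 0.000000 0.000000 6611 TCONS_00000004 6251
DCL1 AT1G01040.1 = XLOC_000002 TCONS_00000003 0 0.000000 0.000000 0.000000 0.000000 6251 TCONS_00000004 6251
MIR165A AT1G01183.1 x XLOC_000008 TCONS_00000012 0 0.000000 0.000000 0.000000 0.000000 651 TCONS_00000012 101
""".splitlines()

for cuff_id, code in novel_candidates(table):
    print(cuff_id, code)
```

Passing a different set, for example `codes={"x"}`, would select the antisense overlap instead.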
