  • Cuffmerge output: do merged.gtf and transcript.gtf always differ?

    Hi there,

    Could anyone please help me understand the difference between two of the cuffmerge output files, merged.gtf (which is supposed to be the final output) and transcript.gtf?

    I used Cufflinks to reconstruct gene models from 3 time points and then merged them using cuffmerge (with the reference annotation). Among the cuffmerge output files are transcript.gtf and merged.gtf. Aren't transcript.gtf and merged.gtf supposed to be similar? If not, why not?

    I converted both .gtf files to .bed and overlapped them, and found that around 2000 transcripts from transcript.gtf were missing from merged.gtf. When I looked into these missing cases, the transcripts have good coverage, and the gene models are also present at the individual time points.
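
    A comparison like the one above can also be done on exon structure instead of BED overlap. The following is only a minimal sketch, assuming both files are standard 9-column GTF with exon rows and a transcript_id attribute. The function names and the toy records are made up for illustration, and this is not how cuffmerge itself compares transcripts.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def read_exons(lines):
    """Collect exon intervals per (transcript_id, chrom, strand) from GTF lines."""
    exons = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        key = (match.group(1), fields[0], fields[6])
        exons[key].append((int(fields[3]), int(fields[4])))
    return exons


def structure(chrom, strand, intervals):
    """Return a hashable description of a transcript's splicing structure."""
    ordered = sorted(intervals)
    if len(ordered) == 1:
        # Single-exon transcript: no introns, so fall back to the exon itself.
        return (chrom, strand, "exon", tuple(ordered))
    introns = tuple(
        (left[1] + 1, right[0] - 1) for left, right in zip(ordered, ordered[1:])
    )
    return (chrom, strand, "introns", introns)


def structures(lines):
    """Map each transcript ID to its splicing structure."""
    return {
        tid: structure(chrom, strand, intervals)
        for (tid, chrom, strand), intervals in read_exons(lines).items()
    }


def missing_from(query_lines, reference_lines):
    """IDs in the query whose structure has no exact match in the reference."""
    reference = set(structures(reference_lines).values())
    return sorted(
        tid for tid, s in structures(query_lines).items() if s not in reference
    )


# Toy records (hypothetical, not taken from the thread).
TRANSCRIPTS = [
    'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tCufflinks\texon\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T2";',
    'chr1\tCufflinks\texon\t300\t450\t.\t+\t.\tgene_id "G1"; transcript_id "T2";',
    'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T3";',
    'chr1\tCufflinks\texon\t350\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T3";',
]
MERGED = [
    'chr1\tCufflinks\texon\t90\t200\t.\t+\t.\tgene_id "X1"; transcript_id "M1";',
    'chr1\tCufflinks\texon\t300\t400\t.\t+\t.\tgene_id "X1"; transcript_id "M1";',
]

print(missing_from(TRANSCRIPTS, MERGED))  # prints ['T3']
```

    With real files, open file handles can be passed in place of the lists. Transcripts reported as missing here differ in at least one splice junction (or, for single-exon transcripts, in their exon coordinates), so differences in the outer ends of multi-exon transcripts are ignored.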

    So, why do we see this difference?

    Thank you in advance for your help!

    regards
    Chirag

  • #2
    I'm not sure of the answer, but the transcripts.gtf file may be an intermediate step in the process of generating merged.gtf.

    All I know is the file you want to use after running cuffmerge is merged.gtf.
    /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
    Salk Institute for Biological Studies, La Jolla, CA, USA */


    • #3
      The TopHat paper published in Nature Protocols says to use merged.gtf, but the user's guide on the Cufflinks website says to use <transcripts.gtf>. For that reason I am still unsure which file to use.


      • #4
        merged.gtf is the file to be used.
        The transcript.gtf file can contain multiple redundant transcripts (i.e., transcripts with similar splicing patterns), which cuffmerge merges.


        • #5
          It would be nice if somebody could explain what transcripts.gtf is.


          • #6
            In my opinion: for example, if a gene has multiple isoforms (coming from multiple samples), all of these isoforms will be in transcripts.gtf. If any isoforms have exactly the same splicing signals, cuffmerge merges them, and the redundant transcripts no longer appear in merged.gtf. So I would suggest using merged.gtf, which is what I use myself.
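
            To make the idea of "same splicing signals" concrete, here is a minimal sketch that groups transcripts sharing an identical intron chain. It only illustrates the redundancy described above and is not cuffmerge's actual merging procedure; the record layout (`id`, `chrom`, `strand`, `exons`) and the toy transcript names are assumptions made for the example.

```python
from collections import defaultdict


def intron_chain(exons):
    """Return the introns implied by a list of (start, end) exon intervals."""
    ordered = sorted(exons)
    return tuple((a[1] + 1, b[0] - 1) for a, b in zip(ordered, ordered[1:]))


def collapse_by_splicing(transcripts):
    """Group transcript IDs that share chromosome, strand and intron chain."""
    groups = defaultdict(list)
    for t in transcripts:
        chain = intron_chain(t["exons"])
        if not chain:
            # Single-exon transcripts have no introns; keep each one separate here.
            key = (t["chrom"], t["strand"], "single", t["id"])
        else:
            key = (t["chrom"], t["strand"], "spliced", chain)
        groups[key].append(t["id"])
    return sorted(sorted(ids) for ids in groups.values())


# Toy isoforms from three hypothetical samples.
ISOFORMS = [
    {"id": "tp1.A", "chrom": "chr1", "strand": "+", "exons": [(100, 200), (300, 400)]},
    {"id": "tp2.A", "chrom": "chr1", "strand": "+", "exons": [(120, 200), (300, 380)]},
    {"id": "tp3.B", "chrom": "chr1", "strand": "+", "exons": [(100, 200), (250, 400)]},
]

print(collapse_by_splicing(ISOFORMS))  # prints [['tp1.A', 'tp2.A'], ['tp3.B']]
```

            Here tp1.A and tp2.A differ only in their outer ends, so they fall into one group, while tp3.B uses a different splice site and stays separate.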

            Secondly, try filtering merged.gtf by read count, since many of the transcripts that show up in merged.gtf come straight from the reference gene annotation.
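
            A filter of this kind could look roughly like the sketch below. It assumes the per-transcript read counts come from a separate quantification step and are already loaded into a dictionary keyed by transcript_id; the function name, the threshold and the toy IDs are all hypothetical.

```python
import re

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def filter_by_count(gtf_lines, counts, min_count):
    """Keep GTF lines whose transcript has at least min_count reads.

    Lines without a transcript_id (for example comments) pass through unchanged.
    Transcripts absent from the counts table are treated as having 0 reads.
    """
    kept = []
    for line in gtf_lines:
        match = TRANSCRIPT_ID.search(line)
        if match is None:
            kept.append(line)
            continue
        if counts.get(match.group(1), 0) >= min_count:
            kept.append(line)
    return kept


# Toy input (hypothetical IDs and counts).
GTF = [
    "# toy merged annotation",
    'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "XLOC_1"; transcript_id "TCONS_1";',
    'chr1\tCufflinks\texon\t300\t400\t.\t+\t.\tgene_id "XLOC_1"; transcript_id "TCONS_1";',
    'chr2\tCufflinks\texon\t500\t600\t.\t-\t.\tgene_id "XLOC_2"; transcript_id "TCONS_2";',
]
COUNTS = {"TCONS_1": 50, "TCONS_2": 2}

for line in filter_by_count(GTF, COUNTS, min_count=10):
    print(line)
```

            With these toy values the comment line and both TCONS_1 exons are kept, and TCONS_2 is dropped. A sensible threshold depends on sequencing depth and is left open here.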


            • #7
              I have the same question. The transcripts missing from merged.gtf are not only isoforms but also whole genes. I assembled my RNA-seq data de novo and BLASTed the result against the genes annotated in each of the two files. I found that the genes in the transcripts.gtf file have more hits than those in merged.gtf. Therefore, I think the genes that are in transcripts.gtf but not in merged.gtf are not abundant.
              github:
              https://github.com/Bioinformatics-and-Genomics
