  • capricy
    Senior Member
    • Apr 2012
    • 125

    Annotate Cufflinks-assembled transcripts with a reference GTF

    Hello, there,

    1. I ran a genome-guided de novo transcript assembly on my RNA-seq data using Cufflinks. The .sam file comes from STAR mapping:

    cufflinks -p 8 /mapping/mapped.sam

    2. I then merged the resulting GTF files from the same tissue into merged.gtf, without including reference.gtf:

    cuffmerge -p 8 gtf.filelist.DeNovo

    3. I tried to find the closest gene ID for those de novo assembled transcripts:

    cuffcompare merged.gtf -r reference.gtf

    What I've found is that none of my de novo assembled transcripts map to the reference GTF, even though some introns are apparently identical between merged.gtf and reference.gtf.

    For example, from the Cufflinks merged.gtf, I have:

    more XLOC_005458.gtf
    chr2 Cufflinks exon 25289899 25290661 . . . gene_id "XLOC_005458"; transcript_id "TCONS_00010739"; exon_number "1"; oId "CUFF.5451.1"; tss_id "TSS7438";
    chr2 Cufflinks exon 25290738 25290883 . . . gene_id "XLOC_005458"; transcript_id "TCONS_00010739"; exon_number "2"; oId "CUFF.5451.1"; tss_id "TSS7438";
    chr2 Cufflinks exon 25290976 25291190 . . . gene_id "XLOC_005458"; transcript_id "TCONS_00010739"; exon_number "3"; oId "CUFF.5451.1"; tss_id "TSS7438";
    chr2 Cufflinks exon 25289938 25290082 . . . gene_id "XLOC_005458"; transcript_id "TCONS_00010740"; exon_number "1"; oId "CUFF.5451.2"; tss_id "TSS7438";
    chr2 Cufflinks exon 25290388 25291177 . . . gene_id "XLOC_005458"; transcript_id "TCONS_00010740"; exon_number "2"; oId "CUFF.5451.2"; tss_id "TSS7438";

    From reference.gtf, I have:
    2 ensembl_havana CDS 25289989 25290661 . + 0 ccds_id "CCDS15763"; exon_number "1"; gene_biotype "protein_coding"; gene_id "ENSMUSG00000026961"; gene_name "Lrrc26"; gene_source "ensembl_havana"; gene_version "6"; havana_gene "OTTMUSG00000011934"; havana_gene_version "1"; havana_transcript "OTTMUST00000028197"; havana_transcript_version "1"; p_id "P45943"; protein_id "ENSMUSP00000028337"; protein_version "6"; tag "basic"; transcript_biotype "protein_coding"; transcript_id "ENSMUST00000028337"; transcript_name "Lrrc26-001"; transcript_source "ensembl_havana"; transcript_support_level "1"; transcript_version "6"; tss_id "TSS86428";

    2 ensembl_havana CDS 25290738 25291057 . + 2 ccds_id "CCDS15763"; exon_number "2"; gene_biotype "protein_coding"; gene_id "ENSMUSG00000026961"; gene_name "Lrrc26"; gene_source "ensembl_havana"; gene_version "6"; havana_gene "OTTMUSG00000011934"; havana_gene_version "1"; havana_transcript "OTTMUST00000028197"; havana_transcript_version "1"; p_id "P45943"; protein_id "ENSMUSP00000028337"; protein_version "6"; tag "basic"; transcript_biotype "protein_coding"; transcript_id "ENSMUST00000028337"; transcript_name "Lrrc26-001"; transcript_source "ensembl_havana"; transcript_support_level "1"; transcript_version "6"; tss_id "TSS86428";

    Apparently the same intron (25290661 .. 25290738) exists in both the de novo assembled transcript and the reference. So my question is: why is XLOC_005458 from the Cufflinks output not mapped to Lrrc26 in reference.gtf, even though they share the same gene region?

    Thanks for any input!

    C.
    Last edited by capricy; 02-01-2017, 08:15 AM.
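
One thing visible in the excerpts in post #1 that may be worth checking: the merged.gtf records name the chromosome `chr2`, while the reference.gtf records name it `2`. A tool that pairs features by sequence name would treat these as different sequences. The sketch below lists the sequence names in each file and prints the ones they share. The two sample files are hypothetical stand-ins built from one record of each excerpt, with shortened attributes.

```shell
# Hypothetical one-record samples based on the excerpts in post #1
# (tab-separated, attribute column shortened).
printf 'chr2\tCufflinks\texon\t25289899\t25290661\t.\t.\t.\tgene_id "XLOC_005458";\n' > merged.sample.gtf
printf '2\tensembl_havana\tCDS\t25289989\t25290661\t.\t+\t0\tgene_id "ENSMUSG00000026961";\n' > reference.sample.gtf

# Distinct sequence names (column 1) of each file, ignoring comment lines.
grep -v '^#' merged.sample.gtf | cut -f 1 | sort -u > merged.seqnames
grep -v '^#' reference.sample.gtf | cut -f 1 | sort -u > reference.seqnames

# Names found in both files; an empty result means no sequence name is shared.
shared=$(comm -12 merged.seqnames reference.seqnames)
echo "shared sequence names: ${shared:-none}"
```

On the real data, the same commands can be pointed at merged.gtf and reference.gtf. If no name is shared, renaming the sequences consistently in one of the files before comparing would be one thing to try.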
  • shunyip
    Member
    • Oct 2013
    • 20

    #2
    RNA molecules can suffer from degradation. However, introns are identified by splice junctions, which often fall in the middle of RNA reads, so introns are more likely to be identified correctly. If you want all assembled genes to match the reference closely, you might need higher sequencing depth and/or higher-quality data.

    • capricy
      Senior Member
      • Apr 2012
      • 125

      #3
      Then what is an easy way to annotate those assembled transcripts? I mean, I would like to find the closest reference gene IDs for the transcripts.

      Thanks.

      C.

      • shunyip
        Member
        • Oct 2013
        • 20

        #4
        Hi Capricy,

        You can supply your gene annotation (reference.gtf) to Cufflinks during assembly using the -g argument.
        Alternatively, you can use bedtools intersect to overlap and combine merged.gtf and reference.gtf; see the bedtools intersect documentation. For this method you need to convert the GTF files into BED files.


        I hope this helps,

        • capricy
          Senior Member
          • Apr 2012
          • 125

          #5
          According to the cuffcompare documentation, if I use -r <reference.gtf>, the output should identify the overlapping transfrags. But it did not in my case.

          I just wonder whether something is wrong with my steps.

          C.

          • capricy
            Senior Member
            • Apr 2012
            • 125

            #6
            I didn't use -g since I would only like to see the de novo assembled transfrags.

            • shunyip
              Member
              • Oct 2013
              • 20

              #7
              Then the easiest way for you is probably to use bedtools.

              You can convert a gtf file to bed file using:
              Code:
              cut -f 1,4,5,9 yourfile.gtf > yourfile.bed
              This extracts the 1st, 4th, 5th and 9th columns from the GTF file and writes them to a new file.
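
Note that GTF coordinates are 1-based and inclusive, whereas BED uses a 0-based start and an exclusive end, so a plain column cut leaves the start one base off. Below is a minimal awk sketch of a conversion that adjusts the start and keeps only the gene_id value as the name column. The sample file is a hypothetical one-record stand-in taken from the excerpt in post #1, and restricting the output to exon records is an assumption about which features should be compared.

```shell
# Hypothetical one-record GTF sample (tab-separated), based on the excerpt in post #1.
printf 'chr2\tCufflinks\texon\t25290738\t25290883\t.\t.\t.\tgene_id "XLOC_005458"; transcript_id "TCONS_00010739";\n' > sample.gtf

# Keep exon records, shift the 1-based GTF start to a 0-based BED start,
# and use the gene_id value (or "." if absent) as the BED name column.
awk 'BEGIN { FS = OFS = "\t" }
     $3 == "exon" {
         name = "."
         if (match($9, /gene_id "[^"]+"/))
             name = substr($9, RSTART + 9, RLENGTH - 10)
         print $1, $4 - 1, $5, name
     }' sample.gtf > sample.bed

cat sample.bed
```

The end coordinate is left unchanged because an inclusive 1-based end and an exclusive 0-based end are the same number.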

              Then, you can use bedtools intersect to overlap the two files.
              It seems that the -loj and -wao arguments suit your case well. You can take a look.
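
A sketch of what those two calls could look like. The file names merged.bed and reference.bed are placeholders for the BED files converted from merged.gtf and reference.gtf, and the output names are arbitrary. The guard only keeps the snippet from failing where bedtools or the inputs are missing.

```shell
# merged.bed and reference.bed are placeholder names for the converted files.
if command -v bedtools >/dev/null 2>&1 && [ -f merged.bed ] && [ -f reference.bed ]; then
    # -wao: write each A entry with every overlapping B entry plus the number of
    # overlapping bases; A entries without any overlap get a null B entry and 0.
    bedtools intersect -a merged.bed -b reference.bed -wao > merged_vs_reference.wao.txt

    # -loj: left outer join; every A entry is reported, paired with each
    # overlapping B entry or with a null B entry when nothing overlaps.
    bedtools intersect -a merged.bed -b reference.bed -loj > merged_vs_reference.loj.txt
else
    echo "bedtools or the input BED files were not found; nothing was run" >&2
fi
```

With the assembled transcripts as the -a file, each assembled feature appears at least once in the output, so features without a reference overlap stay visible rather than being dropped.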
