Hello there,
1. I ran a genome-guided de novo transcript assembly on my RNA-seq data using Cufflinks. The .sam file comes from STAR mapping:
cufflinks -p 8 /mapping/mapped.sam
2. I then merged the resulting GTF files from the same tissue into merged.gtf, without including reference.gtf:
cuffmerge -p 8 gtf.filelist.DeNovo
3. I then tried to find the closest reference gene ID for those de novo assembled transcripts:
cuffcompare merged.gtf -r reference.gtf
What I've found is that none of my de novo assembled transcripts are matched to the reference GTF, even though some introns are apparently identical between merged.gtf and reference.gtf.
For example:
From the Cufflinks merged.gtf, I have:
more XLOC_005458.gtf
chr2 Cufflinks exon 25289899 25290661 . . . gene_id "XLOC_005458"; transcript_id "TCONS_00010739"; exon_number "1"; oId "CUFF.5451.1"; tss_id "TSS7438";
chr2 Cufflinks exon 25290738 25290883 . . . gene_id "XLOC_005458"; transcript_id "TCONS_00010739"; exon_number "2"; oId "CUFF.5451.1"; tss_id "TSS7438";
chr2 Cufflinks exon 25290976 25291190 . . . gene_id "XLOC_005458"; transcript_id "TCONS_00010739"; exon_number "3"; oId "CUFF.5451.1"; tss_id "TSS7438";
chr2 Cufflinks exon 25289938 25290082 . . . gene_id "XLOC_005458"; transcript_id "TCONS_00010740"; exon_number "1"; oId "CUFF.5451.2"; tss_id "TSS7438";
chr2 Cufflinks exon 25290388 25291177 . . . gene_id "XLOC_005458"; transcript_id "TCONS_00010740"; exon_number "2"; oId "CUFF.5451.2"; tss_id "TSS7438";
From reference.gtf, I have:
2 ensembl_havana CDS 25289989 25290661 . + 0 ccds_id "CCDS15763"; exon_number "1"; gene_biotype "protein_coding"; gene_id "ENSMUSG00000026961"; gene_name "Lrrc26"; gene_source "ensembl_havana"; gene_version "6"; havana_gene "OTTMUSG00000011934"; havana_gene_version "1"; havana_transcript "OTTMUST00000028197"; havana_transcript_version "1"; p_id "P45943"; protein_id "ENSMUSP00000028337"; protein_version "6"; tag "basic"; transcript_biotype "protein_coding"; transcript_id "ENSMUST00000028337"; transcript_name "Lrrc26-001"; transcript_source "ensembl_havana"; transcript_support_level "1"; transcript_version "6"; tss_id "TSS86428";
2 ensembl_havana CDS 25290738 25291057 . + 2 ccds_id "CCDS15763"; exon_number "2"; gene_biotype "protein_coding"; gene_id "ENSMUSG00000026961"; gene_name "Lrrc26"; gene_source "ensembl_havana"; gene_version "6"; havana_gene "OTTMUSG00000011934"; havana_gene_version "1"; havana_transcript "OTTMUST00000028197"; havana_transcript_version "1"; p_id "P45943"; protein_id "ENSMUSP00000028337"; protein_version "6"; tag "basic"; transcript_biotype "protein_coding"; transcript_id "ENSMUST00000028337"; transcript_name "Lrrc26-001"; transcript_source "ensembl_havana"; transcript_support_level "1"; transcript_version "6"; tss_id "TSS86428";
Apparently the same intron (between the exon boundaries 25290661 and 25290738) exists in both the de novo assembled transcript TCONS_00010739 and the reference. So my question is: why is XLOC_005458 from the Cufflinks output not matched to Lrrc26 in reference.gtf, even though they occupy the same gene region?
Thanks for any input!
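The intron comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the coordinates are copied by hand from the records shown, the reference CDS records stand in for exons (only CDS lines are shown for ENSMUST00000028337), and the helper name `introns` is made up for this example. It compares coordinates only and ignores the first column (written as chr2 in merged.gtf and as 2 in reference.gtf) and the strand column.

```python
def introns(exons):
    """Return the set of (start, end) introns implied by a list of
    (start, end) exons, using 1-based inclusive GTF coordinates."""
    exons = sorted(exons)
    return {
        (left_end + 1, right_start - 1)
        for (_, left_end), (right_start, _) in zip(exons, exons[1:])
    }


# Exon coordinates copied from the merged.gtf records above.
tcons_00010739 = [(25289899, 25290661), (25290738, 25290883), (25290976, 25291190)]
tcons_00010740 = [(25289938, 25290082), (25290388, 25291177)]

# Assumption: the reference CDS records are used in place of exon records.
lrrc26_cds = [(25289989, 25290661), (25290738, 25291057)]

# Introns shared between each assembled transcript and the reference.
shared_739 = introns(tcons_00010739) & introns(lrrc26_cds)
shared_740 = introns(tcons_00010740) & introns(lrrc26_cds)

print("TCONS_00010739 shared introns:", shared_739)
print("TCONS_00010740 shared introns:", shared_740)
```

With these coordinates, only TCONS_00010739 shares an intron with the reference records, spanning 25290662 to 25290737, i.e. the gap between the exon boundaries 25290661 and 25290738.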
C.