SEQanswers

12-03-2010, 10:53 AM   #1
silin284
Member
 
Location: ny

Join Date: Jul 2009
Posts: 23
Bug? Duplicated genes in cufflinks output genes.expr

Hi

When I supplied a reference GTF to Cufflinks (-G), I found duplicated gene IDs in the output "genes.expr". This seems odd to me, and it is very rare (3 out of 50k genes). I checked those 3, and it turns out that Cufflinks treats their isoforms as individual genes but still uses the same gene_id supplied in the GTF file. All 3 genes share one characteristic: the genomic positions of each isoform's transcript/exon/CDS features are completely different. I guess Cufflinks uses this positional information, rather than the gene_id supplied in the GTF, to decide whether different transcripts belong to the same gene.

I can remove them by hand, but is there a way to "force" Cufflinks to recognize them as a single gene?

cheers
silin

original GTF file
chr06 SZ transcript 3851140 3853473 . + . gene_id "Os06g07923"; transcript_id "Os06g07923.2";
chr06 SZ CDS 3851140 3851247 . + 0 gene_id "Os06g07923"; transcript_id "Os06g07923.2";
chr06 SZ CDS 3853062 3853304 . + 0 gene_id "Os06g07923"; transcript_id "Os06g07923.2";
chr06 SZ exon 3853305 3853473 . + . gene_id "Os06g07923"; transcript_id "Os06g07923.2";
###
chr06 SZ transcript 3851392 3852964 . + . gene_id "Os06g07923"; transcript_id "Os06g07923.1";
chr06 SZ exon 3851392 3851900 . + . gene_id "Os06g07923"; transcript_id "Os06g07923.1";
chr06 SZ CDS 3851901 3852434 . + 0 gene_id "Os06g07923"; transcript_id "Os06g07923.1";
chr06 SZ exon 3852435 3852964 . + . gene_id "Os06g07923"; transcript_id "Os06g07923.1";

cufflinks output "genes.expr"
Os06g07923 141826 chr06 3851139 3853473 0 0 0 OK
Os06g07923 141826 chr06 3851391 3852964 0 0 0 OK
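
For anyone who wants to list such cases automatically, here is a minimal Python sketch. It assumes that genes.expr is whitespace-separated with the gene ID in the first column, as in the two rows above; the function name and the third example row are made up for illustration and are not part of Cufflinks.

```python
from collections import defaultdict

def duplicated_gene_ids(lines):
    """Return {gene_id: [rows]} for IDs that occur on more than one row."""
    rows_by_gene = defaultdict(list)
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            rows_by_gene[fields[0]].append(fields)
    return {gid: rows for gid, rows in rows_by_gene.items() if len(rows) > 1}

# The two rows from the post plus one hypothetical row for contrast.
example = [
    "Os06g07923 141826 chr06 3851139 3853473 0 0 0 OK",
    "Os06g07923 141826 chr06 3851391 3852964 0 0 0 OK",
    "HypotheticalGene 1 chr01 100 200 0 0 0 OK",
]
for gene_id, rows in duplicated_gene_ids(example).items():
    # Print each duplicated ID with the left/right coordinates of its rows.
    print(gene_id, [(r[3], r[4]) for r in rows])
```

To run it on a real file, pass an open file handle instead of the example list, e.g. duplicated_gene_ids(open("genes.expr")).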
12-13-2011, 07:12 AM   #2
apadr007
Member
 
Location: washington DC

Join Date: Oct 2011
Posts: 21

I have the same question. Why is Cufflinks repeating genes?
02-24-2012, 01:48 AM   #3
kenphi
Junior Member
 
Location: Heidelberg, Germany

Join Date: Nov 2009
Posts: 2

Dear silin

I think this is because your reference annotation contains "unrelated" transcripts annotated to the same gene. I have noticed that this happens when there are independent transcript groups, i.e. groups of transcripts that do not overlap in exon coordinates. They can be side by side, or one can lie in the intron of the other. Some examples from Ensembl 64:

ENSMUSG00000086255
ENSMUSG00000062352
ENSMUSG00000021879
ENSMUSG00000033705
ENSMUSG00000087461
ENSMUSG00000022105
ENSMUSG00000073791
ENSMUSG00000052675
ENSMUSG00000055407
ENSMUSG00000056856
ENSMUSG00000027203

In some of these cases, I would say that Ensembl did not follow its own guideline of assigning the same gene identifier to transcripts with overlapping positions, because there are clearly independent clusters.

I keep them and use the gene_id column of the Cufflinks output to make the tables unique.

Philip
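
To find such genes in an annotation before running Cufflinks, something like the following Python sketch may help. It groups the exon/CDS features of each transcript by gene_id and counts how many groups of transcripts share no overlapping feature. This is only a heuristic built on the description above, not Cufflinks' actual locus logic; it ignores strand, and all function names are made up for illustration.

```python
import re
from collections import defaultdict

GENE_RE = re.compile(r'gene_id "([^"]+)"')
TX_RE = re.compile(r'transcript_id "([^"]+)"')

def read_features(gtf_lines):
    """Collect exon/CDS intervals as {gene_id: {transcript_id: [(start, end)]}}."""
    genes = defaultdict(lambda: defaultdict(list))
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/separator lines
        fields = line.rstrip("\n").split(None, 8)
        if len(fields) < 9 or fields[2] not in ("exon", "CDS"):
            continue
        gene, tx = GENE_RE.search(fields[8]), TX_RE.search(fields[8])
        if gene and tx:
            genes[gene.group(1)][tx.group(1)].append((int(fields[3]), int(fields[4])))
    return genes

def overlaps(a, b):
    """True if any interval in a overlaps any interval in b (inclusive coordinates)."""
    return any(s1 <= e2 and s2 <= e1 for s1, e1 in a for s2, e2 in b)

def count_clusters(transcripts):
    """Count groups of transcripts linked by at least one overlapping feature."""
    clusters = []  # each cluster is the pooled interval list of its transcripts
    for intervals in transcripts.values():
        merged, untouched = list(intervals), []
        for cluster in clusters:
            if overlaps(cluster, merged):
                merged.extend(cluster)
            else:
                untouched.append(cluster)
        clusters = untouched + [merged]
    return len(clusters)

def split_genes(gtf_lines):
    """Return {gene_id: cluster_count} for genes with more than one cluster."""
    result = {}
    for gene_id, transcripts in read_features(gtf_lines).items():
        n = count_clusters(transcripts)
        if n > 1:
            result[gene_id] = n
    return result

# The GTF records from the first post (whitespace-separated here).
SAMPLE = '''chr06 SZ transcript 3851140 3853473 . + . gene_id "Os06g07923"; transcript_id "Os06g07923.2";
chr06 SZ CDS 3851140 3851247 . + 0 gene_id "Os06g07923"; transcript_id "Os06g07923.2";
chr06 SZ CDS 3853062 3853304 . + 0 gene_id "Os06g07923"; transcript_id "Os06g07923.2";
chr06 SZ exon 3853305 3853473 . + . gene_id "Os06g07923"; transcript_id "Os06g07923.2";
###
chr06 SZ transcript 3851392 3852964 . + . gene_id "Os06g07923"; transcript_id "Os06g07923.1";
chr06 SZ exon 3851392 3851900 . + . gene_id "Os06g07923"; transcript_id "Os06g07923.1";
chr06 SZ CDS 3851901 3852434 . + 0 gene_id "Os06g07923"; transcript_id "Os06g07923.1";
chr06 SZ exon 3852435 3852964 . + . gene_id "Os06g07923"; transcript_id "Os06g07923.1";
'''.splitlines()

print(split_genes(SAMPLE))  # {'Os06g07923': 2}
```

On silin's example the two isoforms have overlapping transcript spans but no overlapping exon/CDS features, so the sketch reports two clusters, which matches the two rows in genes.expr.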
05-17-2014, 11:19 PM   #4
emanlee
Member
 
Location: Xi'an

Join Date: Apr 2013
Posts: 15

Another thread on this issue:
http://seqanswers.com/forums/showthread.php?t=5224

A solution based on mgogol's code:
https://sourceforge.net/projects/col...?source=navbar