
**epi** · 03-05-2012, 07:09 AM

To all Cufflinks users and the author: can anyone please comment on whether I have stated my problem clearly?

Unfortunately, it may be a serious bug in this software, which is causing it to mix FPKMs between different genes. Not only does this make Cufflinks highly unreliable, it also puts a question mark over published findings.

Thanks for your attention and replies.

**jcgrenier** · 03-09-2012, 12:12 PM

Hi,

The option "-g" that you are using is usually used to discover new transcripts. So the CUFF.* names that you got there are for those newly found transcripts.
If you only want to do it on known transcripts, use the "-G" option.

That said, I don't know why Cuffmerge gives certain oId values CUFF.* names.

**jcgrenier** · 03-09-2012, 12:17 PM

For the tss_id, there is probably a bug, because it is numbered in ascending order but the TSS number has nothing to do with the gene itself.

***Update***
Cuffmerge merges together multiple transcripts.gtf files coming from different analyses. The TSS information is not present in those files. Maybe it doesn't use the information contained in the given reference GTF file and just renames the tss_id...

**Cole Trapnell** · 03-12-2012, 05:10 AM

As far as I can tell, Cufflinks is working correctly and as described in the manual. As noted above, you want -G, not -g.

**Luyi Tian** · 03-12-2012, 06:37 AM

There is a Nature Protocols paper published several days ago, which may help you:
"Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks"

**epi** · 03-12-2012, 07:48 AM

Thank you all for the replies, and I apologize if I was panicking more than needed. Perhaps I should explain my question a bit further.

As I understand, -g adds new transcripts "in addition to" existing ones, which is my goal under this analysis.

From the manual:
Output will include all reference transcripts as well as any novel genes and isoforms that are assembled

While for many other genes it is using the original gene name, for Trib3 it is converting it to CUFF.9451, and after cuffmerge to CUFF.7092.2. This is the source of all the confusion: from the manual and the published protocol it is not clear why it should do that. After all, the annotation of the Trib3 transcript and its exons is identical to CUFF.9451. Then why does this behave like a novel gene now?

Following jcgrenier's comments, I went back and found that it is indeed correct that transcript_ids are incremented in ascending order, which has produced a chance identity of IDs between chr2 and chrY. There could very well be other cases which I don't know of.

Actually, apart from a specific case, the real question here is how come cufflinks is not using the transcript_ID, p_ID and in some cases gene_ID from the original GFF as it is supposed to.
Thanks again for comments.

**Cole Trapnell** · 03-13-2012, 04:36 AM

Originally posted by epi:

Actually, apart from a specific case, the real question here is how come cufflinks is not using the transcript_ID, p_ID and in some cases gene_ID from the original GFF as it is supposed to.

Hmm, that's annoying... can you show us how you're running cuffmerge? One limitation of the current version of Cuffmerge is that it doesn't do any of its own ORF prediction. So new isoforms of coding genes won't get assigned a new p_id or attached to an existing one. New genes will not be declared as coding or non-coding (this is a hard problem in general). However, they will get grouped under a tss_id. The thing is, Cuffmerge will reassign tss_ids just like it reassigns transcript_ids, because those also have to be unique. The only thing it really tries to propagate from the reference in terms of metadata is the gene short name.

**epi** · 03-15-2012, 08:40 AM

Originally posted by Cole Trapnell:

Hmm, that's annoying... can you show us how you're running cuffmerge? One limitation of the current version of Cuffmerge is that it doesn't do any of its own ORF prediction. So new isoforms of coding genes won't get assigned a new p_id or attached to an existing one. New genes will not be declared as coding or non-coding (this is a hard problem in general). However, they will get grouped under a tss_id. The thing is, Cuffmerge will reassign tss_ids just like it reassigns transcript_ids, because those also have to be unique. The only thing it really tries to propagate from the reference in terms of metadata is the gene short name.

Hi Cole, thanks again for replying. While your comment clarifies some things, it also raises some more questions. First, here is my command:

Code:

cuffmerge -o test1 -g genes.gtf -p 10 -s ~/path/mm9 assemblies1.txt

assemblies1.txt contains the absolute paths to the Cufflinks GTF files:

Code:

path-to-dir/transcripts_sample1.gtf
path-to-dir/transcripts_sample2.gtf
.
.
.

The behavior you described is what I am observing as well. As I mentioned earlier, I am using -g and not -G to allow known as well as novel genes. Should I not expect it to preserve the transcript, TSS and p_ids for the known genes?

Also, I am still unclear why there are new predictions (with CUFF. IDs) that are identical to existing annotations. I have manually looked at one of the test runs, and all the new predictions (with CUFF. IDs) that I checked were original genes/transcripts in the source GTF annotation (I only looked at about 20-25). So far I have not encountered any novel predictions which do not correspond to an existing annotation. In some cases the original annotation still exists and in some cases it does not, similar to the Trib3 example I posted above.
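The manual check described above (does a CUFF.* transcript have exactly the same exons as a reference transcript?) could be automated roughly as follows. This is only a sketch under stated assumptions: both files are ordinary GTF with `exon` records and a `transcript_id` attribute, and "identical" means same chromosome, same strand and exactly the same exon coordinates. The function names are made up for illustration and are not part of Cufflinks.

```python
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def exon_structures(gtf_lines):
    """Return {transcript_id: (chrom, strand, ((start, end), ...))} from exon records."""
    exons = defaultdict(list)
    location = {}
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip comments, blank lines and non-exon features.
        if len(fields) < 9 or fields[2] != "exon":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        tid = attrs.get("transcript_id")
        if tid is None:
            continue
        exons[tid].append((int(fields[3]), int(fields[4])))
        location[tid] = (fields[0], fields[6])
    return {
        tid: (*location[tid], tuple(sorted(coords)))
        for tid, coords in exons.items()
    }


def identical_to_reference(reference_lines, assembled_lines):
    """Map assembled transcript IDs to reference IDs with the same exon coordinates."""
    by_structure = defaultdict(list)
    for tid, structure in exon_structures(reference_lines).items():
        by_structure[structure].append(tid)
    matches = {}
    for tid, structure in exon_structures(assembled_lines).items():
        if structure in by_structure:
            matches[tid] = sorted(by_structure[structure])
    return matches
```

With real files this would be called as `identical_to_reference(open("genes.gtf"), open("transcripts.gtf"))`. The comparison is deliberately strict: an assembled transcript whose terminal exon is a few bases longer, or whose strand is reported as `.`, will not be listed, so an empty result does not prove that a transcript is novel.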

**Lee Sam** · 04-05-2012, 09:46 AM

Has anyone figured this issue out? I'm running a small study with cancer samples and I don't even see TP53 annotated in the genes.fpkm_tracking using either the Illumina-provided UCSC or Ensembl files (with chromosome names edited for compatibility). I'm running using the "-g" option as well and I was expecting that existing genes observed by cufflinks would have entries with the annotated gene names.

**AsoBioInfo** · 05-14-2012, 08:50 PM

Hi, I also encountered the same issue. Cufflinks also gave me some CUFF.* IDs.
I used the -g option, as I am interested in novel transcripts. How can you get all novel transcripts or genes?

**epi** · 05-15-2012, 05:38 AM

Thanks for commenting here; it is good to know I am not the only one in this boat. I have pinged Cole again and hope he writes something here.

**goudurix** · 06-04-2012, 03:31 AM

Hi all,
Epi, did you find a workaround? I ran into the same problem. Running cufflinks with the -g option, some original gene_ids were converted to new identifiers sharing the CUFF prefix.
Il2ra (NM_008367) became CUFF.265243...

Regards

**andremho** · 06-07-2012, 12:37 AM

Hi, does anyone have an answer to this question?

I am fairly new to this game, but I am running a small RNA-seq study on cancer samples.
I have mapped the reads using tophat, and quantified transcript/gene expression using cufflinks with the -g option and an ENSEMBL annotation .gtf file.

However, I also can't find many of the gene IDs I expect to be expressed, such as TP53, in my genes.fpkm_tracking file. But when I manually inspect the assembly in IGV, I clearly see that these genes are highly covered by reads...

Also, in my isoforms.fpkm_tracking file I can find some of the transcript IDs for these genes but, as I said, not their gene IDs.

Can anyone explain this/give a solution to my problem?
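One way to check whether a gene such as TP53 is really absent from a tracking file, or only filed under a CUFF.* identifier, is to search several columns at once. The sketch below assumes the usual tab-separated `*.fpkm_tracking` layout with a header row containing `tracking_id`, `gene_id` and `gene_short_name`, and it assumes that a locus covering several genes may list their names comma-separated in one cell. `find_gene` is a hypothetical helper, not a Cufflinks tool.

```python
import csv


def find_gene(tracking_lines, name,
              columns=("tracking_id", "gene_id", "gene_short_name")):
    """Return the rows of a *.fpkm_tracking table that mention `name`.

    A cell matches when `name` equals the whole cell or one of its
    comma-separated entries.
    """
    reader = csv.DictReader(tracking_lines, delimiter="\t")
    hits = []
    for row in reader:
        for col in columns:
            values = (row.get(col) or "").split(",")
            if name in values:
                hits.append(row)
                break  # one hit per row is enough
    return hits
```

For example, `find_gene(open("genes.fpkm_tracking"), "TP53")` returns the matching rows as dictionaries, so the `tracking_id` and `FPKM` under which the gene was reported can be read off directly.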

**manianslab** · 06-29-2012, 12:29 PM

Unfortunately, cuffdiff does not seem to carry over the original transcript IDs from the merged.gtf file into the output files. However, you can use Excel (VLOOKUP), a Perl script or awk to populate an additional field in your 'diff' output files by searching merged.gtf. The original transcript IDs are listed as 'oId' in the merged.gtf created by cuffmerge.

I will post a script if I find time to get one done.

hope this helps!
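The lookup described above could be sketched in Python along these lines. It is an illustration, not the script promised in the post: it assumes merged.gtf carries `transcript_id` and `oId` attributes on each record, and that the cuffdiff table is tab-separated with a header whose ID column is called `test_id` (pass another name via `id_column` if yours differs). Both function names are invented for this example.

```python
import csv
import re

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def oid_map(gtf_lines):
    """Map each transcript_id in a merged.gtf to its oId attribute."""
    mapping = {}
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:  # comments, blank or malformed lines
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "transcript_id" in attrs and "oId" in attrs:
            # Every exon of a transcript repeats the same pair; keep the first.
            mapping.setdefault(attrs["transcript_id"], attrs["oId"])
    return mapping


def add_oid_column(table_lines, mapping, id_column="test_id"):
    """Return the table rows as dicts with an extra 'oId' field ('-' if unknown)."""
    rows = []
    for row in csv.DictReader(table_lines, delimiter="\t"):
        row["oId"] = mapping.get(row[id_column], "-")
        rows.append(row)
    return rows
```

Typical use would be `add_oid_column(open("isoform_exp.diff"), oid_map(open("merged.gtf")))`, after which the rows can be written back out with `csv.DictWriter`. Gene-level tables would need a gene-level key instead, since oId is recorded per transcript.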


Cufflinks annotation handling business
