SEQanswers > Applications Forums > RNA Sequencing

03-06-2013, 05:48 AM   #1
amcloon
Location: Germany
A post-doc and the case of the disappearing gene names (in Cufflinks/cuffdiff)

I've successfully made scatter plots, etc. in cummeRbund. But when I try to actually dig into the data, I don't see any gene IDs in the cuffdiff output files. I want to check whether individual transcripts match the patterns we've seen previously with qPCR, to convince myself and my PI that this RNA-seq experiment is working.

I don't have a .gtf file for my genome, only a .gff file, and converting it to .gtf sounds beyond my abilities. Cuffmerge only takes .gtf files for annotation. You can run cuffmerge without "-r <ref.gtf>", but if I do, will my later files never have annotations?

Can I use cuffcompare with my .gff annotations to get gene names in the cuff_diff output?

Also, if I don't have specific gene annotations in my cuffdiff files, then what is showing up as the points in my cummeRbund scatter plots?

Thanks for your help,
Anna
03-07-2013, 08:45 AM   #2
amcloon
Location: Germany
Not very satisfying possible workaround

Well, I've kept playing around, and many things still confuse me. If you have the same problem (annotations in a .gff file, and you want an annotated list from cuffdiff), there seems to be a workaround. Use cuffcompare instead of cuffmerge to build the transcript list, since cuffcompare accepts .gff files. Your gene annotations then show up in the cuffdiff output, and you can make shiny graphs in cummeRbund.
03-13-2013, 04:17 PM   #3
benjamir
Location: Houston, TX

The Tuxedo suite is confusing, and I find R even more so.

The points on your cummeRbund plots correspond to the genes or transcripts in gene_exp.diff or isoform_exp.diff, respectively. (I used cuffmerge/cuffdiff.) The corresponding genes.fpkm_tracking and isoforms.fpkm_tracking files connect Cufflinks' internal test_ids and gene_ids to either reference transcript identifiers (like RefSeq) or HUGO gene symbols. So the names are there; it's just not obvious.
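To make that link concrete, here is a minimal Python sketch. It copies the tracking file's gene_short_name and nearest_ref_id columns onto each row of gene_exp.diff, matching test_id against tracking_id. It assumes the tab-separated, header-first layout these cuffdiff files normally have; check the column names against your own files before relying on it. The function name is illustrative.

```python
import csv
import io


def annotate_diff(diff_text, tracking_text):
    """Add gene_short_name and nearest_ref_id from genes.fpkm_tracking to each
    row of gene_exp.diff, matching test_id against tracking_id."""
    tracking = {
        row["tracking_id"]: row
        for row in csv.DictReader(io.StringIO(tracking_text), delimiter="\t")
    }
    annotated = []
    for row in csv.DictReader(io.StringIO(diff_text), delimiter="\t"):
        info = tracking.get(row["test_id"], {})
        # Fall back to "-", the placeholder Cufflinks uses for empty fields
        row["gene_short_name"] = info.get("gene_short_name", "-")
        row["nearest_ref_id"] = info.get("nearest_ref_id", "-")
        annotated.append(row)
    return annotated


# Example use on real files (the 'diff_out' directory is the one used later in this thread):
# with open("diff_out/gene_exp.diff") as d, open("diff_out/genes.fpkm_tracking") as t:
#     rows = annotate_diff(d.read(), t.read())
```

Joining on the ID, rather than relying on row order, means genes missing from one file simply get "-" instead of shifting everything else.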

cummeRbund has a function to add annotation to the object's database, but I haven't used it yet. I will say that the cummeRbund documentation at Bioconductor is more thorough than the documentation on the MIT website.

I found it helpful to review the Cufflinks manual carefully first. I also read any post from the Cufflinks developer, Cole Trapnell, for further insights.
04-09-2013, 04:39 AM   #4
Illuminoid
Location: Geneva, Switzerland

I was having a similar problem and couldn't figure out how to get my gene names listed in the cuffdiff output. After a lot of messing about, I found that cuffmerge is the program that does this, using the gtf file. Once I had the gtf file, everything was great. I think using cuffmerge (which runs cuffcompare for you anyway) is a much cleaner method.

I highly recommend converting your gff file to gtf for future analyses. Maybe you have seen it already, but it seems that Cufflinks includes a utility (gffread) to convert between the gff and gtf formats. I have not used it myself, but the instructions seem straightforward (http://cufflinks.cbcb.umd.edu/gff.html).

Also, out of interest, which genome are you using? Maybe a gtf annotation is already available for it?

Cheers

Sam
05-08-2013, 01:33 PM   #5
emmm
Location: Toronto, Canada

I had this problem too and solved it by changing my gff to a gtf (following Illuminoid's suggestion).

This is the command: gffread -E myspecies.gff3 -T -o- > myspecies.gtf

And I ran cuffmerge with the option '-g myspecies.gtf'
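Here is roughly what such a conversion does to the attribute column. GFF3 writes key=value pairs and links exons to transcripts (and transcripts to genes) through Parent. GTF instead repeats gene_id "..."; transcript_id "..."; on every exon line. The toy Python sketch below only handles a gene → mRNA → exon hierarchy and ignores GFF3 escaping, so it is not a substitute for gffread, which covers far more cases. The function names are illustrative.

```python
def parse_gff3_attrs(field):
    """Turn a GFF3 column-9 string such as 'ID=t1;Parent=g1' into a dict."""
    attrs = {}
    for part in field.strip().split(";"):
        key, sep, value = part.partition("=")
        if sep:
            attrs[key.strip()] = value.strip()
    return attrs


def gff3_to_gtf(lines):
    """Toy GFF3 -> GTF conversion for a gene -> mRNA -> exon hierarchy only."""
    records, transcript_gene, gene_label = [], {}, {}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9:
            continue  # skip comments/directives and malformed lines
        attrs = parse_gff3_attrs(cols[8])
        records.append((cols, attrs))
        if cols[2] == "gene":
            gene_label[attrs["ID"]] = attrs.get("Name", attrs["ID"])
        elif cols[2] in ("mRNA", "transcript"):
            transcript_gene[attrs["ID"]] = attrs["Parent"]
    gtf = []
    for cols, attrs in records:
        if cols[2] != "exon":
            continue
        # An exon may list several parent transcripts, separated by commas
        for tid in attrs["Parent"].split(","):
            gid = transcript_gene.get(tid, tid)
            field = f'gene_id "{gid}"; transcript_id "{tid}";'
            if gid in gene_label:
                field += f' gene_name "{gene_label[gid]}";'
            gtf.append("\t".join(cols[:8] + [field]))
    return gtf
```

The gene_name written here comes from the GFF3 Name attribute. That is the attribute later posts in this thread discuss, because it ends up as the displayed label.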

Thanks!!
06-17-2013, 02:38 PM   #6
Baoqing
Location: Texas

Hi, guys

I just wanted to clarify this question from the first post:

“You can run cuffmerge without a "-r <ref.gtf>" but if I do this, will I never have annotations in my later files?”

I looked at the manual, and it does have an option -g to specify the annotation file. Am I wrong? I am interested in this discussion because I followed the instructions and ran cuffmerge with the following command:
cuffmerge -g genes.gtf -s genome.fa -p 4 assemblies_mt.txt

At the end, in my differentially expressed gene file (gene_exp.diff), I got the gene_name (or is it the transcript_id?) for each of the transcripts rather than the gene_id, which is what I really wanted. Is there any way to get that in Cufflinks? Thanks a lot for your help!
06-18-2013, 12:47 AM   #7
amcloon
Location: Germany

Baoqing,
I don't remember, but I might have meant "-g" and not "-r". In any case, I found that it really matters how your .gtf file is coded. It isn't enough that the file is a .gtf; how the final (attribute) column is coded also matters a lot. So maybe play around with what each term in that column is called, to make sure Cufflinks pulls out the label you want?

Why not just re-code the file so that "gene_name" holds what you want to show up? I think that is what finally worked for me. To be honest, though, I switched to DESeq, which ended up being much clearer for me. It works fine because splice variants aren't something I need to worry about in my bacterium.
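A minimal Python sketch of that re-coding: it rewrites each GTF line so that gene_name holds the value of another attribute (gene_id by default). It assumes, as amcloon found, that Cufflinks takes the displayed label from gene_name. The function names and any file paths you pass in are illustrative, not part of Cufflinks.

```python
def parse_gtf_attrs(field):
    """Split a GTF column-9 string into (key, value) pairs, keeping their order."""
    pairs = []
    for part in field.split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition(" ")
        pairs.append((key, value.strip().strip('"')))
    return pairs


def set_gene_name(gtf_line, source_key="gene_id"):
    """Return the GTF line (without newline) with gene_name set to source_key's value."""
    line = gtf_line.rstrip("\n")
    cols = line.split("\t")
    if line.startswith("#") or len(cols) != 9:
        return line  # comments and malformed lines pass through unchanged
    pairs = parse_gtf_attrs(cols[8])
    values = dict(pairs)
    if source_key not in values:
        return line
    # Drop any existing gene_name and append the new one at the end
    pairs = [(k, v) for k, v in pairs if k != "gene_name"]
    pairs.append(("gene_name", values[source_key]))
    cols[8] = " ".join(f'{k} "{v}";' for k, v in pairs)
    return "\t".join(cols)


def relabel_file(src_path, dst_path, source_key="gene_id"):
    """Write a copy of a GTF file with every gene_name replaced."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(set_gene_name(line, source_key) + "\n")
```

The relabeled copy would then be passed to cuffmerge with -g in place of the original. Rebuilding the attribute column also quotes unquoted values such as exon_number, which GTF readers accept.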
06-18-2013, 08:04 AM   #8
Baoqing
Location: Texas

Thank you, amcloon

My programming skills are not good enough to change Cufflinks' settings. It seems that gene names are the default for the output, or there is some argument I don't know about yet. Anyway, I was planning to write a script that matches my gene names against the original .gtf file to pull out the gene_id, but I am not sure this is the smart way to do it. Do you know of anything related you could share? I don't want to do something redundant if the information is already there.
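A script like the one Baoqing describes can be quite small: build a lookup from gene_name to gene_id straight from the GTF. Below is a minimal Python sketch, assuming a standard GTF whose column 9 carries quoted gene_id and gene_name attributes. It keeps a set per name because gene_name is not guaranteed to be unique across genes. The function name is illustrative.

```python
import re
from collections import defaultdict

# \b stops these from matching inside e.g. ref_gene_id or ref_gene_name
GENE_ID_RE = re.compile(r'\bgene_id "([^"]+)"')
GENE_NAME_RE = re.compile(r'\bgene_name "([^"]+)"')


def name_to_ids(gtf_lines):
    """Map each gene_name in a GTF to the set of gene_ids that carry it."""
    mapping = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        gid = GENE_ID_RE.search(line)
        gname = GENE_NAME_RE.search(line)
        if gid and gname:
            mapping[gname.group(1)].add(gid.group(1))
    return dict(mapping)


# Example use (genes.gtf is the annotation file named in this thread):
# with open("genes.gtf") as fh:
#     lookup = name_to_ids(fh)
# ambiguous = {name: ids for name, ids in lookup.items() if len(ids) > 1}
```

Names that map to more than one gene_id are worth checking by hand before merging them into a results table.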
Best,
06-18-2013, 08:22 AM   #9
amcloon
Location: Germany

As I said above, what worked for me was changing the way the .gtf file was coded. But still, cuffdiff wasn't the best option for me in the end, even when the gene names showed up. Good luck.
07-12-2013, 01:58 AM   #10
gailjjj
Location: Hong Kong

Here is my stupid method:
########## in R ##########
library(cummeRbund)
# Load the cuffdiff output directory
cuff <- readCufflinks('diff_out')

# Gene level: annotation, FPKM matrix and count matrix, each written as a tab-separated file
gene.features <- annotation(genes(cuff))
write.table(gene.features, 'gene_anno.txt', sep='\t', row.names=F, col.names=T, quote=F)
gene.matrix <- fpkmMatrix(genes(cuff))
write.table(gene.matrix, 'gene_matrix.txt', sep='\t', row.names=F, col.names=T, quote=F)
gene.count.matrix <- countMatrix(genes(cuff))
write.table(gene.count.matrix, 'gene_count_matrix.txt', sep='\t', row.names=F, col.names=T, quote=F)

# Isoform level: the same three tables
isoform.features <- annotation(isoforms(cuff))
write.table(isoform.features, 'isoform_anno.txt', sep='\t', row.names=F, col.names=T, quote=F)
isoform.matrix <- fpkmMatrix(isoforms(cuff))
write.table(isoform.matrix, 'isoform_matrix.txt', sep='\t', row.names=F, col.names=T, quote=F)
isoform.count.matrix <- countMatrix(isoforms(cuff))
write.table(isoform.count.matrix, 'isoform_count_matrix.txt', sep='\t', row.names=F, col.names=T, quote=F)

q()
########## quit R, then in the shell ##########
# paste joins the files line by line, so this assumes all three files list the features in the same order
# (row.names=F above drops the matrices' feature IDs, so that order can't be checked afterwards)
paste isoform_anno.txt isoform_count_matrix.txt isoform_matrix.txt > isoform_count_fpkm_matrix
paste gene_anno.txt gene_count_matrix.txt gene_matrix.txt > gene_count_fpkm_matrix
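paste glues files together purely by line position. An alternative is to keep the feature IDs when writing the matrices, for example write.table(..., row.names=T, col.names=NA), which writes the row names under a blank header cell. The tables can then be joined by ID instead. Below is a minimal Python sketch of that key-based join. It assumes every input file's first column is the feature ID and that IDs are unique within each file. The function name and output path are illustrative.

```python
import csv


def join_on_first_column(paths, out_path):
    """Join tab-separated tables on their first column (the feature ID)."""
    tables = []
    for path in paths:
        with open(path, newline="") as fh:
            rows = list(csv.reader(fh, delimiter="\t"))
        header, body = rows[0], rows[1:]
        tables.append((header, {r[0]: r[1:] for r in body if r}))
    # Keep the IDs, in file order, from the first table (e.g. the annotation)
    first_header, first_rows = tables[0]
    out_header = ["id"] + first_header[1:]
    for header, _ in tables[1:]:
        out_header += header[1:]
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(out_header)
        for feature_id, values in first_rows.items():
            row = [feature_id] + values
            for header, lookup in tables[1:]:
                # A missing ID becomes "NA" instead of silently shifting rows
                row += lookup.get(feature_id, ["NA"] * (len(header) - 1))
            writer.writerow(row)
```

For example, join_on_first_column(["gene_anno.txt", "gene_count_matrix.txt", "gene_matrix.txt"], "gene_count_fpkm_matrix") would rebuild the gene table. This only works if the matrices were written with their IDs as described above.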
07-12-2013, 11:01 AM   #11
Baoqing
Location: Texas

Thanks a lot! That worked brilliantly! I should have explored the cummeRbund package more. I resolved this by using:

samtools view sample_1_name_sorted.bam | htseq-count -i gene_id - ~/Desktop/rnaseq/trimmed/genes.gtf > sample_1.txt

But when I compare the counts I got from cummeRbund with the results I obtained from htseq-count, they are not entirely the same. Usually they differ by only a few counts, but in some cases the difference is several hundred counts. Is that a problem?
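One likely reason, as far as I understand the two tools: the counts cummeRbund reports come from cuffdiff's model. They are estimated fragment counts that can be fractional and can include a share of ambiguous or multi-mapped fragments. htseq-count, in its default union mode, assigns each read to at most one gene and sets ambiguous and multi-mapped reads aside. It may also be worth checking that htseq-count's -s (strandedness) setting, which defaults to yes, matches your library. To see which genes differ by a meaningful amount, here is a minimal Python sketch. It assumes both tables are keyed by the same gene IDs, and the thresholds and function names are arbitrary placeholders.

```python
def read_htseq_counts(lines):
    """Parse htseq-count output (ID<TAB>count), skipping the '__' summary rows."""
    counts = {}
    for line in lines:
        feature, _, value = line.rstrip("\n").partition("\t")
        if not feature or feature.startswith("__"):
            continue
        counts[feature] = float(value)
    return counts


def compare_counts(a, b, min_abs=10, min_rel=0.1):
    """Return (id, a, b) for shared IDs whose counts differ by at least
    min_abs AND by at least min_rel of the larger of the two values."""
    flagged = []
    for feature in sorted(set(a) & set(b)):
        x, y = a[feature], b[feature]
        diff = abs(x - y)
        if diff >= min_abs and diff >= min_rel * max(x, y):
            flagged.append((feature, x, y))
    return flagged
```

A difference of a few counts on a well-expressed gene falls under both thresholds. A difference of several hundred counts on a gene with a few thousand reads would be flagged for a closer look.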

There are also some other discrepancies between the results from htseq-count and cummeRbund, for example:
1. I notice that the gene names displayed in the file generated from cummeRbund come from "gene_short_name", while the names I used for htseq-count come from the "gene_id" attribute in genes.gtf.
However, I did not find any "gene_short_name" attribute in genes.gtf; there is a "gene_name" attribute instead. I assume that is the one you used?

2. Some names are present in one file and absent from the other. This might be explained by the two different name attributes we were using. I did some pattern matching, and this does seem to be the case: the names missing from the cummeRbund results can always be matched back to the gene_id on the same line of genes.gtf. Could you confirm this?

Best,

Baoqing

Last edited by Baoqing; 07-12-2013 at 11:33 AM.

All times are GMT -8.