SEQanswers

08-22-2011, 01:30 PM   #1
camelbbs
Member
 
Location: United States

Join Date: Jun 2011
Posts: 49
How to get a GTF file for Cufflinks

Hi,

Can I ask how to get a GTF file for TopHat or Cufflinks?

I just used the UCSC Table Browser to get a GTF file; the content looks like this:

Code:
chr1	hg19_refGene	start_codon	67000042	67000044	0.000000	+	.	gene_id "NM_032291"; transcript_id "NM_032291";
Is that right? Do I need to add the gene symbol to this table?

Thanks,
Peter
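One way to check a record like the refGene line above is to split it into the nine tab-separated GTF columns and read the `key "value";` attributes. The sketch below is an illustration only: `parse_gtf_line` and `gene_id_is_transcript_id` are hypothetical helper names (not part of TopHat or Cufflinks), and it assumes well-formed lines with double-quoted attribute values. It flags records whose `gene_id` just repeats the `transcript_id`, as in the posted line where both are `NM_032291`.

```python
import re
from typing import Dict, NamedTuple

# Matches key "value" pairs in the 9th (attributes) column of a GTF line.
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


class GtfRecord(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    score: str
    strand: str
    frame: str
    attributes: Dict[str, str]

    @property
    def length(self) -> int:
        # Both coordinates are inclusive, hence the +1.
        return self.end - self.start + 1


def parse_gtf_line(line: str) -> GtfRecord:
    """Split one GTF line into its nine tab-separated columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    attrs = dict(ATTR_RE.findall(fields[8]))
    return GtfRecord(fields[0], fields[1], fields[2], int(fields[3]),
                     int(fields[4]), fields[5], fields[6], fields[7], attrs)


def gene_id_is_transcript_id(rec: GtfRecord) -> bool:
    """True when gene_id only repeats transcript_id (no gene-level grouping)."""
    gene_id = rec.attributes.get("gene_id")
    return gene_id is not None and gene_id == rec.attributes.get("transcript_id")


if __name__ == "__main__":
    # The refGene record from the post (tabs written as \t).
    line = ("chr1\thg19_refGene\tstart_codon\t67000042\t67000044\t0.000000\t+\t.\t"
            'gene_id "NM_032291"; transcript_id "NM_032291";')
    rec = parse_gtf_line(line)
    print(rec.feature, rec.length)        # start_codon 3
    print(gene_id_is_transcript_id(rec))  # True
```

GTF coordinates are 1-based and inclusive, which is why the start codon spans 67000042 to 67000044 (3 bases). When `gene_id` merely repeats the transcript accession, any "gene-level" summary computed from that file is really per transcript.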
08-22-2011, 05:25 PM   #2
ShaunMahony
Member
 
Location: University Park, PA

Join Date: Apr 2008
Posts: 27

I recommend downloading GTF annotation from this page:
http://cufflinks.cbcb.umd.edu/igenomes.html

These files were designed to go with TopHat/Cufflinks and have all the expected fields.
08-23-2011, 07:29 AM   #3
camelbbs
Member
 
Location: United States

Join Date: Jun 2011
Posts: 49

Thanks so much!
08-23-2011, 02:28 PM   #4
JueFish
Member
 
Location: Connecticut

Join Date: May 2010
Posts: 42

Hi camelbbs,

Just saw your post and thought I'd give you a word of caution. Make sure you take a close look at your annotation. We just started to play around with the iGenomes stuff, but I'll tell you right now that our use of different annotations from UCSC (RefSeq, Ensembl, and Gencode, which should be pretty much the same as Ensembl) and iGenomes has led to very different results in different cases. It sort of depends on your question, but make sure that the annotation you are looking at is good for the things you're most concerned about. One would hope that the choice of annotation would be a robust parameter in these types of analyses, but we haven't found that to be the case. In the end, those of us who can't spend an inordinate amount of time vetting these things have to take a close look at them, then pick one and stick with it. Good luck.
08-23-2011, 09:21 PM   #6
camelbbs
Member
 
Location: United States

Join Date: Jun 2011
Posts: 49

Quote:
Originally Posted by JueFish
Hi camelbbs,

Just saw your post and thought I'd give you a word of caution. Make sure you take a close look at your annotation. We just started to play around with the iGenomes stuff, but I'll tell you right now that our use of different annotations from UCSC (RefSeq, Ensembl, and Gencode, which should be pretty much the same as Ensembl) and iGenomes has led to very different results in different cases. It sort of depends on your question, but make sure that the annotation you are looking at is good for the things you're most concerned about. One would hope that the choice of annotation would be a robust parameter in these types of analyses, but we haven't found that to be the case. In the end, those of us who can't spend an inordinate amount of time vetting these things have to take a close look at them, then pick one and stick with it. Good luck.
Thanks, JueFish.

So do you have a recommendation for which GTF to choose? Which one has been proven by the most work? I'm also curious how the gene annotation is linked to the alignment sequences: is it by the coordinates?
08-24-2011, 07:35 AM   #7
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

Quote:
Originally Posted by camelbbs
Thanks, JueFish.

So do you have a recommendation for which GTF to choose? Which one has been proven by the most work? I'm also curious how the gene annotation is linked to the alignment sequences: is it by the coordinates?
As far as I know, GTF/GFF annotations are indeed related to the reference sequences (which is what I presume you mean by 'alignment sequences') solely by coordinate positions. This, of course, means that you need to be very careful to pick and use the GTF version that was created for your reference version.
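Because the link between annotation and reference is only sequence name plus coordinates, a cheap sanity check before running TopHat/Cufflinks is to compare the GTF against the reference index. The sketch below assumes a samtools-style `.fai` index (sequence name in column 1, length in column 2); `check_gtf_against_reference` is a hypothetical helper. It reports GTF sequence names absent from the reference (for example UCSC-style `chr1` versus Ensembl-style `1`) and counts features that run past the end of their sequence. Passing it does not prove the GTF matches your genome build, since coordinates from a different build can still fall inside the chromosome lengths.

```python
from typing import Dict, Iterable, Set, Tuple


def fai_lengths(fai_lines: Iterable[str]) -> Dict[str, int]:
    """Sequence name -> length from a samtools .fai index (columns 1 and 2)."""
    lengths = {}
    for line in fai_lines:
        if line.strip():
            name, length = line.split("\t")[:2]
            lengths[name] = int(length)
    return lengths


def check_gtf_against_reference(gtf_lines: Iterable[str],
                                fai_lines: Iterable[str]) -> Tuple[Set[str], int]:
    """Return (GTF seqnames absent from the reference, features past sequence end)."""
    lengths = fai_lengths(fai_lines)
    missing: Set[str] = set()
    past_end = 0
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        seqname, end = fields[0], int(fields[4])
        if seqname not in lengths:
            missing.add(seqname)
        elif end > lengths[seqname]:
            past_end += 1
    return missing, past_end


if __name__ == "__main__":
    # Toy data (hypothetical): a 1,000 bp "chr1" reference and three GTF records.
    fai = ["chr1\t1000\t6\t60\t61\n"]
    gtf = [
        '1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1";\n',     # Ensembl-style name
        'chr1\tsrc\texon\t900\t1200\t.\t+\t.\tgene_id "g2";\n',  # runs past 1,000
        'chr1\tsrc\texon\t10\t20\t.\t+\t.\tgene_id "g3";\n',
    ]
    print(check_gtf_against_reference(gtf, fai))  # ({'1'}, 1)
```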
08-24-2011, 07:56 AM   #8
JueFish
Member
 
Location: Connecticut

Join Date: May 2010
Posts: 42

Well, camelbbs, "work" is an interesting way to put it. I can let you in on what little I know, but I'm still working through these things myself; someone else out there might have more info or insight on this issue than I do.

We've looked at four different annotations so far: Gencode, Ensembl, RefSeq, and iGenomes (which should be a derivative of Ensembl, I think). We haven't really vetted Gencode or iGenomes, because Gencode should (hypothetically) be very similar to Ensembl, and iGenomes we only just found and have only briefly run through some analyses. That brings us to RefSeq and Ensembl, and I think most people find those two databases generally acceptable for whatever you might be interested in.

From some rough calculations, Ensembl appears to have about twice as many nucleotides annotated as RefSeq. This is likely because Ensembl annotates more isoforms, so some nucleotides may be doubly annotated. (NOTE: when you download the Ensembl ensGene GTF from UCSC and use it with Cufflinks, you end up with transcript IDs for your genes, not gene IDs. This can be very important depending on what you want.) Just open up a chromosome of hg19 in the UCSC Genome Browser and you can see how different they look.

As to why they are different, again I'm not incredibly knowledgeable here, but each of these projects uses slightly different evidence to add to its database. I think they mostly vary in two ways: 1) the computational methods they use for predicted gene tracks, and 2) the curation of the databases. In the past, RefSeq was more submission-based, so its evidence requirements would appear to have been higher, but that's all conjecture on my part. In the end, we've basically come to view RefSeq as more conservative and gene-oriented, and Ensembl as more computationally driven and more transcript-oriented. Does anyone else out there have better insight than me? I would love to hear it.
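A rough nucleotide comparison like this can be made clearer by separating two numbers: the summed length of all exon records (which counts a base once for every isoform that includes it) and the number of distinct bases covered by any exon. The sketch below is an assumed method, not necessarily how the calculation above was done; it counts only `exon` rows, ignores strand (so overlapping genes on opposite strands are merged), and uses hypothetical helper names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based, inclusive


def exon_intervals(gtf_lines: Iterable[str]) -> Dict[str, List[Interval]]:
    """Group exon coordinates by sequence name; strand is ignored."""
    by_seq: Dict[str, List[Interval]] = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[2] == "exon":
            by_seq[fields[0]].append((int(fields[3]), int(fields[4])))
    return by_seq


def covered_bases(intervals: List[Interval]) -> int:
    """Number of distinct bases covered by a list of inclusive intervals."""
    total = 0
    cur_start, cur_end = None, None
    for start, end in sorted(intervals):
        if cur_end is not None and start <= cur_end + 1:
            cur_end = max(cur_end, end)  # overlapping or adjacent: extend the run
        else:
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def annotation_footprint(gtf_lines: Iterable[str]) -> Tuple[int, int]:
    """(summed exon length, distinct bases covered by any exon)."""
    by_seq = exon_intervals(gtf_lines)
    summed = sum(end - start + 1 for ivs in by_seq.values() for start, end in ivs)
    distinct = sum(covered_bases(ivs) for ivs in by_seq.values())
    return summed, distinct


if __name__ == "__main__":
    # Toy example (hypothetical): two isoforms sharing their first exon.
    gtf = [
        'chr1\tsrc\texon\t100\t199\t.\t+\t.\ttranscript_id "t1";',
        'chr1\tsrc\texon\t300\t399\t.\t+\t.\ttranscript_id "t1";',
        'chr1\tsrc\texon\t100\t199\t.\t+\t.\ttranscript_id "t2";',
        'chr1\tsrc\texon\t350\t449\t.\t+\t.\ttranscript_id "t2";',
    ]
    print(annotation_footprint(gtf))  # (400, 250)
```

Running `annotation_footprint` on each downloaded GTF and comparing both numbers would indicate whether a larger count comes mostly from isoforms re-counting the same bases or from additional annotated sequence.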

To answer your other question, yes, coordinates, chromosome, and ID determine genomic location and annotation.
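To illustrate how an aligned position is tied to the annotation through chromosome, coordinate, and ID, the sketch below indexes `exon` rows by sequence name and returns the `gene_id` values whose exons contain a given 1-based position. It is a deliberately simple linear scan with hypothetical helper names, assuming attributes written as `gene_id "...";`; it is not how Cufflinks or other quantification tools assign reads, which also have to handle spliced alignments and strand. SAM `POS` is 1-based like GTF, whereas some libraries (for example pysam's `reference_start`) report 0-based positions that need 1 added first.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# gene_id at the start of the attributes column or after a "; " separator.
GENE_ID_RE = re.compile(r'(?:^|;\s*)gene_id\s+"([^"]*)"')

Exon = Tuple[int, int, str]  # (start, end, gene_id), 1-based inclusive


def index_exons(gtf_lines: Iterable[str]) -> Dict[str, List[Exon]]:
    """Map sequence name -> exon intervals labelled with their gene_id."""
    index: Dict[str, List[Exon]] = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[2] != "exon":
            continue
        match = GENE_ID_RE.search(fields[8])
        if match:
            index[fields[0]].append((int(fields[3]), int(fields[4]), match.group(1)))
    return index


def genes_at(index: Dict[str, List[Exon]], seqname: str, pos: int) -> List[str]:
    """gene_ids whose exons contain 1-based position pos on seqname (linear scan)."""
    return sorted({gene for start, end, gene in index.get(seqname, [])
                   if start <= pos <= end})


if __name__ == "__main__":
    # Toy annotation (hypothetical gene IDs).
    gtf = [
        'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "geneA"; transcript_id "a1";',
        'chr1\tsrc\texon\t150\t250\t.\t-\t.\tgene_id "geneB"; transcript_id "b1";',
        'chr2\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "geneC"; transcript_id "c1";',
    ]
    idx = index_exons(gtf)
    print(genes_at(idx, "chr1", 160))  # ['geneA', 'geneB']
    print(genes_at(idx, "1", 160))     # [] (sequence names must match exactly)
```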
07-07-2019, 05:30 AM   #10
brojee
Member
 
Location: Bhopal

Join Date: Jul 2019
Posts: 19

Just saw your post and thought I'd give you a word of warning. Make sure you examine your annotation closely. We just began to play around with the iGenomes stuff, but I'll tell you right now that our use of various annotations from UCSC (RefSeq, Ensembl, and Gencode, which should be essentially equivalent to Ensembl) and iGenomes has led to quite different outcomes in various cases. It kind of depends on your question, but make sure that the annotation you are looking at is suitable for the things you're most concerned about.
One would hope that the choice of annotation would be a robust parameter in these sorts of analyses, but we haven't found that to be the case. In the end, those of us who don't have time to invest an inordinate amount of effort vetting these things need to examine them and then settle on one to stick with. Good luck.

All times are GMT -8.