SEQanswers

Old 08-20-2012, 09:14 AM   #1
slowsmile
Member
 
Location: long island

Join Date: May 2011
Posts: 22
noiseq, where to download hg19 gene feature length?

Dear all,
This might be a very naive question, but I really want to find out the answer ASAP.
I am trying to use the NOISeq tool on human genomic data and would like to supply the function with a file of gene feature lengths. (The initial hg19 GTF file I downloaded was from UCSC.) The gene counts are in gene symbol format, i.e.
"A1BG"
"A1BG-AS1"
"A1CF"

Does anyone know which site I can use to directly download the gene length table, such as
Gene-ID Length
"A1BG" 2543
"A1BG-AS1" 248
"A1CF" 669

*I fabricated the lengths in the example just to show what format I am looking for.
Also, does any NOISeq user believe the gene length information is really important?

Thanks a lot
Old 08-21-2012, 04:44 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,576

You should be able to use "Table Browser" from UCSC or BioMart tool from Ensembl for getting this info. See example filter below. I am using "Uniprot GeneName" filter for this example. Substitute with the appropriate Gene Name filter you are looking for.



You will need to subtract the "start" value from the "end" to get the length. You can easily export the table in "csv" format from BioMart and then edit in excel.
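
The start/end arithmetic can also be scripted instead of done by hand in a spreadsheet. The following is only a minimal Python sketch under stated assumptions: the column headers ("Gene name", "Gene start (bp)", "Gene end (bp)") are guesses at what an export might contain and should be adjusted to match the actual file, and the coordinates in the sample are made up. Because Ensembl coordinates are 1-based and inclusive, the sketch uses end - start + 1. Note that this is the genomic span of the gene, so introns are included.

```python
import csv
import io


def gene_lengths_from_csv(handle,
                          name_col="Gene name",
                          start_col="Gene start (bp)",
                          end_col="Gene end (bp)"):
    """Return {gene name: genomic span in bp} from a CSV export.

    The default column names are assumptions; pass the headers that
    appear in your own export.
    """
    lengths = {}
    for row in csv.DictReader(handle):
        name = row[name_col]
        if not name:
            continue  # skip rows without a gene symbol
        # 1-based inclusive coordinates: span = end - start + 1
        span = int(row[end_col]) - int(row[start_col]) + 1
        # If a symbol appears more than once, keep the longest span
        # (an arbitrary choice made for this sketch).
        lengths[name] = max(span, lengths.get(name, 0))
    return lengths


# Made-up rows, only to show the expected input shape.
example = io.StringIO(
    "Gene name,Gene start (bp),Gene end (bp)\n"
    "GENE_A,1001,3543\n"
    "GENE_B,500,747\n"
)

# Print a two-column, tab-separated table of symbol and length.
for name, length in sorted(gene_lengths_from_csv(example).items()):
    print(f"{name}\t{length}")
```

With a real export, open the file with `open(path, newline="")` and pass the handle in place of the `io.StringIO` object.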
Old 01-22-2014, 01:59 AM   #3
eastasiasnow
Junior Member
 
Location: China

Join Date: Jan 2014
Posts: 8

Quote:
Originally Posted by GenoMax
You should be able to use "Table Browser" from UCSC or BioMart tool from Ensembl for getting this info. See example filter below. I am using "Uniprot GeneName" filter for this example. Substitute with the appropriate Gene Name filter you are looking for.
Yeah, thanks. I got nearly all the features I need.

Quote:
Originally Posted by GenoMax
You will need to subtract the "start" value from the "end" to get the length. You can easily export the table in "csv" format from BioMart and then edit in excel.
As for "feature length", I am a little confused. Your suggestion above works for "gene feature length", but what about a "protein_coding" feature or a "transcript" feature? Your method will include the length of the introns. In an RNA-seq analysis with NOISeq, doesn't the feature length we provide mean the length of the corresponding transcript? And if I want transcript lengths, how do I get a file of lengths for all transcripts?

Last edited by eastasiasnow; 01-22-2014 at 02:01 AM. Reason: mis-meaning in my sencentes
Old 01-22-2014, 04:20 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,576

Under the attributes filter (instead of the gene start/end as in the example above), under "sequence", you can select either "cDNA sequence" (to get all transcripts) or "exon sequence" (to get all possible exons) for a specific gene. Depending on the feature you choose, "CDS length" (Exon features) or "Transcript Start/End" (Transcript Information) should give you the length in the FASTA headers. You will have to parse those out with a simple "grep".

OR

Under the attributes --> select "Features" and then the following.
Attached Images
File Type: png transcript.PNG (27.8 KB, 5 views)
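
For the sequence-based route described in this post, one alternative to grepping lengths out of the FASTA headers is to measure the downloaded sequences directly. This is a different technique from the one suggested above, shown only as a hedged Python sketch: it assumes a plain FASTA file with one header line per record and makes no assumption about what the header contains. The record names in the usage line are hypothetical.

```python
import io


def fasta_lengths(handle):
    """Yield (header text, sequence length) for each FASTA record."""
    header, length = None, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, length  # emit the previous record
            header, length = line[1:].strip(), 0
        else:
            length += len(line)  # sequence may wrap over several lines
    if header is not None:
        yield header, length  # emit the final record


# Hypothetical records, only to show the call.
demo = io.StringIO(">tx1|geneA\nACGT\nAC\n>tx2|geneA\nGGG\n")
for header, length in fasta_lengths(demo):
    print(f"{header}\t{length}")
```

Since the whole header is kept as the record name, any gene or transcript identifiers in it can be split out afterwards according to whatever delimiter the export uses.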

Last edited by GenoMax; 01-22-2014 at 04:26 AM.
Old 01-22-2014, 06:43 AM   #5
eastasiasnow
Junior Member
 
Location: China

Join Date: Jan 2014
Posts: 8

@GenoMax
Thanks, this should work. I found a script in this post:
http://seqanswers.com/forums/showthread.php?t=4914

I made small changes to that awk script to apply it to the latest Ensembl GFF file for Zea mays AGPv3.21. It works fine so far, and I can get the lengths of all transcripts.
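
The awk script from the linked thread is not reproduced here. As a rough illustration of the same idea (summing exon lengths per transcript so that introns are excluded), here is a minimal Python sketch. It is not the script used above, and it rests on assumptions: the annotation is a 9-column tab-separated GTF or GFF3 file, exon rows have "exon" in column 3, and the transcript is named either by a GTF-style `transcript_id "..."` attribute or a GFF3-style `Parent=...` attribute. Files that label things differently will need the pattern adjusted.

```python
import io
import re
from collections import defaultdict

# Matches GTF (transcript_id "X") or GFF3 (Parent=X) attributes.
# Assumed conventions; adjust for your annotation file.
_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"|Parent=([^;]+)')


def transcript_lengths(handle):
    """Return {transcript id: summed exon length in bp}."""
    totals = defaultdict(int)
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon rows contribute to transcript length
        match = _TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        # GTF/GFF coordinates are 1-based and inclusive.
        exon_len = int(fields[4]) - int(fields[3]) + 1
        ids = match.group(1) or match.group(2)
        # A GFF3 Parent attribute may list several transcripts.
        for transcript in ids.split(","):
            totals[transcript] += exon_len
    return dict(totals)


# Made-up GTF rows, only to show the call.
demo = io.StringIO(
    'chr1\tsrc\texon\t100\t199\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n'
    'chr1\tsrc\texon\t300\t349\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n'
)
for transcript, length in sorted(transcript_lengths(demo).items()):
    print(f"{transcript}\t{length}")
```

This gives one length per transcript. A tool that expects one length per gene would still need the transcripts collapsed in some way, and how to do that is a separate choice not covered by this sketch.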

Last edited by eastasiasnow; 01-22-2014 at 06:54 AM. Reason: grammer errors
Tags
gene length, noiseq
