SEQanswers

-   -   noiseq, where to download hg19 gene feature length? (http://seqanswers.com/forums/showthread.php?t=22647)

slowsmile 08-20-2012 08:14 AM

noiseq, where to download hg19 gene feature length?
 
Dear all
This might be a very naive question, but I really want to find the answer ASAP.
I am trying to use the NOISeq tool on human genomic data and would like to supply the function with a file of gene feature lengths. The hg19 GTF file I initially downloaded was from UCSC, and the gene counts are keyed by gene symbol, i.e.
"A1BG"
"A1BG-AS1"
"A1CF"

Does anyone know which site I can use to directly download a gene length table, such as
Gene-ID Length
"A1BG" 2543
"A1BG-AS1" 248
"A1CF" 669

*I fabricated the lengths in this example just to show the format I am looking for.
Also, do any NOISeq users believe the gene length information is really important?

Thanks a lot

GenoMax 08-21-2012 03:44 AM

You should be able to use "Table Browser" from UCSC or BioMart tool from Ensembl for getting this info. See example filter below. I am using "Uniprot GeneName" filter for this example. Substitute with the appropriate Gene Name filter you are looking for.

http://i.imgur.com/Hx1eA.png

You will need to subtract the "start" value from the "end" to get the length. You can easily export the table in "csv" format from BioMart and then edit in excel.
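
For anyone who would rather script the subtraction than do it in a spreadsheet, here is a minimal Python sketch. It assumes the BioMart CSV export has columns named "Gene name", "Gene start (bp)" and "Gene end (bp)"; those header names are an assumption and should be changed to match whatever your export actually contains. Ensembl coordinates are 1-based and inclusive, so the sketch adds 1 after subtracting. The gene names and coordinates in `sample` are made up for illustration, and the result is the genomic span of the gene, introns included.

```python
import csv
import io


def gene_lengths(csv_text,
                 name_col="Gene name",
                 start_col="Gene start (bp)",
                 end_col="Gene end (bp)"):
    """Return {gene symbol: genomic span in bp} from a BioMart-style CSV export.

    The column names are assumptions; pass your own if the export differs.
    """
    lengths = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row[name_col]
        if not name:
            continue  # skip rows without a gene symbol
        # Coordinates are 1-based and inclusive, hence the +1.
        length = int(row[end_col]) - int(row[start_col]) + 1
        # If a symbol appears more than once, keep the longest span.
        lengths[name] = max(length, lengths.get(name, 0))
    return lengths


# Made-up rows, only to show the expected layout.
sample = """Gene name,Gene start (bp),Gene end (bp)
GENE_A,1000,1100
GENE_B,5000,5009
"""

for symbol, length in sorted(gene_lengths(sample).items()):
    print(f"{symbol}\t{length}")
```

With a real export, read the file contents and pass them in place of `sample`.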

eastasiasnow 01-22-2014 12:59 AM

Quote:

Originally Posted by GenoMax (Post 81999)
You should be able to use "Table Browser" from UCSC or BioMart tool from Ensembl for getting this info. See example filter below. I am using "Uniprot GeneName" filter for this example. Substitute with the appropriate Gene Name filter you are looking for.

Yeah, thanks. I got nearly all the features I need.

Quote:

Originally Posted by GenoMax (Post 81999)
You will need to subtract the "start" value from the "end" to get the length. You can easily export the table in "csv" format from BioMart and then edit in excel.

As for "feature length", I am a little confused. Your suggestion above is fine for "gene feature length", but what about a "protein_coding feature" or "transcript feature"? Your method includes the length of the introns. In an RNA-seq analysis with NOISeq, shouldn't the feature length we provide be the length of the corresponding transcript? And if I want transcript lengths, how do I get a length file for all transcripts?

GenoMax 01-22-2014 03:20 AM

1 Attachment(s)
Under the attributes filter (instead of the gene start/end used in the example above), under "sequence", you can select either "cDNA sequence" (to get all transcripts) or "exon sequence" (to get all possible exons) for a specific gene. Depending on the feature you choose, "CDS length" (Exon features) or "Transcript Start/End" (Transcript Information) should give you the length in the FASTA headers. You will have to parse those out with a simple "grep".

OR

Under the attributes --> select "Features" and then the following.
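
If parsing the FASTA headers turns out to be awkward, a different route to the same numbers is to count the bases of each record directly. The sketch below is that swapped-in approach, not the grep method described above. It assumes a plain FASTA export where each record starts with a ">" header line, and it uses the whole header text as the key because the header layout depends on which attributes were selected. The records in `sample_fasta` are made up.

```python
def fasta_lengths(lines):
    """Return {header text: sequence length} for FASTA-formatted lines."""
    lengths = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            current = line[1:]  # header text without the leading ">"
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)  # sequence may wrap over many lines
    return lengths


# Made-up records, only to show the expected layout.
sample_fasta = """>transcript_1
ACGTACGTAC
ACGT
>transcript_2
GGGCCC
""".splitlines()

for header, length in fasta_lengths(sample_fasta).items():
    print(f"{header}\t{length}")
```

For a cDNA export this gives the spliced transcript length, so introns are not counted.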

eastasiasnow 01-22-2014 05:43 AM

@GenoMax
Thanks, this should work. I found a script in this post:
http://seqanswers.com/forums/showthread.php?t=4914

I made small changes to that awk script to apply it to the latest Ensembl GFF file for Zea mays AGPv3.21. It works fine so far, and I can get the lengths of all transcripts.
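
For readers who want the general idea without following the link, here is a rough Python sketch of summing exon lengths per transcript. It is not the awk script from the linked thread. It assumes a GTF-style file: nine tab-separated columns, "exon" in the third column, and a `transcript_id "..."` entry in the attribute column. A GFF3 file uses a different attribute syntax (for example `Parent=...`), so the regular expression would need adjusting. The annotation lines in `sample_gtf` are made up.

```python
import re
from collections import defaultdict

# GTF attribute syntax; an assumption that does not hold for GFF3 files.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcript_lengths(gtf_lines):
    """Return {transcript_id: summed exon length in bp} from GTF lines."""
    lengths = defaultdict(int)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon records contribute to transcript length
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        start, end = int(fields[3]), int(fields[4])
        # GTF coordinates are 1-based and inclusive, hence the +1.
        lengths[match.group(1)] += end - start + 1
    return dict(lengths)


# Made-up annotation lines, only to show the expected layout.
sample_gtf = [
    'chr1\tsrc\texon\t100\t199\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tsrc\texon\t300\t349\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tsrc\tCDS\t120\t199\t.\t+\t0\tgene_id "G1"; transcript_id "T1";',
    'chr1\tsrc\texon\t100\t149\t.\t+\t.\tgene_id "G1"; transcript_id "T2";',
]

for tid, length in sorted(transcript_lengths(sample_gtf).items()):
    print(f"{tid}\t{length}")
```

To run it on a real annotation, pass an open file handle instead of `sample_gtf`.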

