09-12-2012, 09:49 AM #4
Senior Member
Location: Cambridge, UK

Join Date: May 2010
Posts: 311

Originally Posted by glados
Yes, by using a reference annotation GTF. How do you suggest I use it? I have a long list of genes I want to look at, and perhaps I can compare them somehow. I'm new to bioinformatics, so it's not intuitive to me yet.
If you want to do it in R, this sample code reads the GTF file, extracts the rows matching your list of genes, and then computes the chromosome, strand, start and end of each of those genes:

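## List (vector) of differentially expressed genes
degenes<- c('TNFRSF18', 'WASH7P')

## Read the GTF (tab-separated, no header): V1 = chromosome, V4 = start, V5 = end, V7 = strand, V9 = attributes
gtf<- read.table('genes.gtf', stringsAsFactors= FALSE, sep= '\t', quote= '')

## Keep only the value of the gene_name attribute from column 9.
## NOTE: Replace gene_name with the attribute to extract (e.g. gene_id, gene_symbol).
## Rows without that attribute end up with a leftover string that won't match anything in degenes.
gene_id<- sub('.*(gene_name \")', '', gtf$V9, perl= TRUE)
gene_id<- sub('\".*', '', gene_id, perl= TRUE)
gtf$gene_id<- gene_id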

## All features in the GTF file for each DE gene
degtf<- gtf[gtf$gene_id %in% degenes,]

## Get chromosome, strand, start (smallest V4) and end (largest V5) for each DE gene
decoords<- data.frame(aggregate(degtf[, c('V1', 'V7', 'V4')], by= list(gene_id= degtf$gene_id), min),
    gene_end= aggregate(degtf$V5, by= list(gene_id= degtf$gene_id), max)$x)
names(decoords)<- c('gene_id', 'chrom', 'strand', 'gene_start', 'gene_end')
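If you want to try the same steps without a genes.gtf on disk, here is a minimal, self-contained sketch. The GTF lines are a made-up toy example, and gtf_attr() is a hypothetical helper (not from any package) that returns NA when a row lacks the attribute instead of leaving a leftover string. It also groups by chromosome and strand explicitly rather than taking min() of those text columns, which is an assumption on my part about what you want if a gene name appears on more than one chromosome.

```r
## Made-up toy GTF lines (two exons of GENE_A, one of GENE_B, one unrelated gene)
gtf_lines<- c(
  'chr1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "G1"; gene_name "GENE_A";',
  'chr1\ttoy\texon\t300\t450\t.\t+\t.\tgene_id "G1"; gene_name "GENE_A";',
  'chr2\ttoy\texon\t50\t80\t.\t-\t.\tgene_id "G2"; gene_name "GENE_B";',
  'chr3\ttoy\texon\t10\t20\t.\t+\t.\tgene_id "G3"; gene_name "GENE_C";')

## Same read.table() settings as above; text= reads from the vector instead of a file
gtf<- read.table(text= gtf_lines, stringsAsFactors= FALSE, sep= '\t', quote= '')

## Hypothetical helper: value of attribute `key` in column 9, or NA if absent.
## Assumes `key` contains no regex metacharacters.
gtf_attr<- function(attrs, key) {
  pattern<- paste0('(^|;)[[:space:]]*', key, ' "([^"]*)"')
  m<- regmatches(attrs, regexec(pattern, attrs))
  ## Each match is c(full match, group 1, group 2); no match gives character(0)
  vapply(m, function(x) if (length(x) == 3) x[3] else NA_character_, character(1))
}

gtf$gene_name<- gtf_attr(gtf$V9, 'gene_name')

degenes<- c('GENE_A', 'GENE_B')
degtf<- gtf[gtf$gene_name %in% degenes, ]

## One row per gene, chromosome and strand: smallest start, largest end
starts<- aggregate(V4 ~ gene_name + V1 + V7, data= degtf, FUN= min)
ends<- aggregate(V5 ~ gene_name + V1 + V7, data= degtf, FUN= max)
decoords<- merge(starts, ends)
names(decoords)<- c('gene_name', 'chrom', 'strand', 'gene_start', 'gene_end')
print(decoords)
```

With your real data, replace text= gtf_lines with the path to your GTF file and degenes with your own list.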
Hope it helps!
dariober