SEQanswers

SEQanswers (http://seqanswers.com/forums/index.php)
-   Bioinformatics (http://seqanswers.com/forums/forumdisplay.php?f=18)
-   -   DESeq - get chromosome (http://seqanswers.com/forums/showthread.php?t=23233)

glados 09-11-2012 02:09 AM

DESeq - get chromosome
 
Hello all.

I've been searching for this question for hours now, and I give up and post it here instead. It should not be very complicated.

I have a long list of sig. genes in DESeq I would like to look at in more detail where the genes are located in the genome. How do I receive chromosome number (possibly location) for the genes in DESeq? I know how to get this information in Cuffdiff, but in DESeq I wonder if it's possible.

Simon Anders 09-12-2012 02:46 AM

How have you made the count table that you used as input for DESeq? I suppose, by using a GTF file. And that one contains all the information you need.

glados 09-12-2012 02:55 AM

Yes, by using a reference annotation gtf. How do you suggest I use it? If I have a long list with genes I want to look at. Perhaps I can compare them somehow. I'm new to bioinformatics so it's not intuitive to me yet.

dariober 09-12-2012 09:49 AM

Quote:

Originally Posted by glados (Post 83816)
Yes, by using a reference annotation gtf. How do you suggest I use it? If I have a long list with genes I want to look at. Perhaps I can compare them somehow. I'm new to bioinformatics so it's not intuitive to me yet.

If you want to do it in R, this sample code will read the gtf file and extract the rows matching your list of genes:

Code:

## List (vector) of differentially expr. genes
degenes<- c('TNFRSF18', 'WASH7P')

gtf<- read.table('genes.gtf', stringsAsFactors= FALSE, sep= '\t', quote= '')
gene_id<- sub('.*(gene_name \")', '', gtf$V9, perl= TRUE) ## NOTE: Replace gene_name with the feature to extract (e.g. gene_id, gene_symbol)
gene_id<- sub('\".*', '', gene_id, perl=TRUE)
gtf$gene_id<- gene_id

## All features in the GTF file for each DE gene
degtf<- gtf[gtf$gene_id %in% degenes,]

## Get start and end coordinates for each DE gene
decoords<- data.frame(aggregate(degtf[, c('V1', 'V7', 'V4')], by= list(gene_id= degtf$gene_id), min),
    gene_end= aggregate(degtf$V5, by= list(gene_id= degtf$gene_id), max)$x)

Hope it helps!
Dario

glados 09-13-2012 08:51 AM

Thank you dariober for contributing with your code! It works great!


All times are GMT -8. The time now is 03:35 PM.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2021, vBulletin Solutions, Inc.