SEQanswers

BM7 09-10-2015 05:14 AM

Annotate gene list
 
I would like to annotate the results file from DESeq2 so that it contains gene names and symbols. The RNA-seq count file I used comes from DEXSeq and contains Ensembl gene IDs:
ENSMUSG00000000001:001
ENSMUSG00000000001:002
ENSMUSG00000000001:003
etc.
These refer to the different exons of the gene.
I cannot annotate the results file because its rows are exons rather than genes. How can I combine the exon counts for each gene into a single count for that gene?
Thanks in advance

cmccabe 09-11-2015 06:46 AM

I am not sure I understand completely, but if you have a file

Code:

ENSMUSG00000000001:001
ENSMUSG00000000001:002
ENSMUSG00000000001:003
ENSMUSG00000000002:001
ENSMUSG00000000002:002
ENSMUSG00000000002:002


you could use:

Code:

awk -F':' -v OFS='\t' '{sum[$1]+=$2} END{for (key in sum) print key, sum[key]}' file.txt
ENSMUSG00000000001      6
ENSMUSG00000000002      5
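Note that because this one-liner splits on ':', `$2` is the exon-bin number, so the totals above (6 and 5) are sums of the bin numbers (1+2+3 and 1+2+2), not read counts. A count file written by DEXSeq's dexseq_count.py script typically has a second, tab-separated column holding the count for each bin. A minimal sketch for that layout, assuming lines of the form `GENE:BIN<TAB>COUNT` and that summary lines such as `_ambiguous` start with an underscore (the function name `sum_counts_per_gene` is hypothetical):

```shell
# Sum DEXSeq exon-bin counts into one total per gene.
# Assumed input layout (check your own file): tab-separated lines
#   GENE:BIN<TAB>COUNT
# as written by DEXSeq's dexseq_count.py.
sum_counts_per_gene() {
  awk -F'\t' -v OFS='\t' '
    /^_/ { next }               # skip summary lines such as _ambiguous
    {
      split($1, id, ":")        # id[1] = gene ID, id[2] = exon-bin number
      sum[id[1]] += $2          # add this count to the running gene total
    }
    END { for (g in sum) print g, sum[g] }
  ' "$1" | sort                 # awk for-in order is unspecified, so sort the output
}
```

Usage: `sum_counts_per_gene counts.txt > gene_counts.txt` (file names hypothetical). One caveat: DEXSeq counts a read that overlaps several exon bins once in each of them, so these per-gene sums can be larger than a gene-level count from a tool such as htseq-count or featureCounts.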

Hope this helps.
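For the original goal of adding gene names and symbols, one option is to strip the `:BIN` suffix from each row ID and look the gene ID up in a two-column table of Ensembl gene IDs and symbols, for example one exported from Ensembl BioMart; this also lets the exon-level rows be annotated without merging them first. A minimal sketch, assuming a hypothetical tab-separated mapping file (`GENE_ID<TAB>SYMBOL`) and a tab-separated results table whose first column holds the gene or exon-bin IDs; the function name `annotate_with_symbols` is also hypothetical:

```shell
# Append a gene-symbol column to a tab-separated results table.
# Hypothetical inputs (adjust to your own files):
#   $1  mapping table "GENE_ID<TAB>SYMBOL", one gene per line
#   $2  results table whose first column is a gene ID or a DEXSeq
#       exon-bin ID "GENE:BIN"
annotate_with_symbols() {
  awk -F'\t' -v OFS='\t' '
    NR == FNR { symbol[$1] = $2; next }   # first file: build the ID -> symbol map
    {
      split($1, id, ":")                  # drop any ":BIN" suffix
      print $0, ((id[1] in symbol) ? symbol[id[1]] : "NA")
    }
  ' "$1" "$2"
}
```

Usage: `annotate_with_symbols mapping.tsv results.tsv > results_annotated.tsv`. Rows whose ID has no match in the mapping, including a header line if the results file has one, get `NA`.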


All times are GMT -8.