SEQanswers

03-22-2017, 01:41 AM   #1
heso
Member
Location: Sweden
Join Date: May 2014
Posts: 19

Collapsing gene names based on partial string overlap

I have a long list of gene names with corresponding read counts. I mainly want to collapse the tRNAs that share an identical anticodon and sum their read counts.

So, something like this:
For lines containing "tRNA", collapse the gene names that match exactly on their last 6 characters (e.g. GluCTC) and sum the corresponding read counts. The new gene name can be "tRNA-" followed by those 6 characters (e.g. tRNA-GluCTC).

The input (tab-delimited) looks like this:
Code:
Gm26624	                5761
Bre                     5658
chr10.tRNA90-GluCTC	5573
chr3.tRNA303-GluCTC	5558
chr1.tRNA709-GluCTC	5489
chr1.tRNA706-GlyGCC	4891
chr1.tRNA704-GlyGCC	4838
chr1.tRNA702-GlyGCC	4796
chr13.tRNA110-GlyGCC	4753
Gm13247	                4105
Rny3	                3736
chr1.tRNA485-LysTTT	3548
Rn7s2	                3385
chr19.tRNA107-LysTTT	3363
Any ideas how to do this? Awk?
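
For illustration, here is a minimal sketch of the collapsing rule described above, written in Python rather than awk. The names `collapse_trnas` and `SAMPLE` are made up for this example. It assumes every line holds exactly two whitespace-separated columns (name, integer count), and that every name containing "tRNA" ends in the 6-character amino acid + anticodon key. Rows are emitted in first-seen order, not re-sorted by count.

```python
def collapse_trnas(lines):
    """Sum read counts, merging tRNA rows that share the last 6 name characters.

    Assumption: each line is "name<whitespace>count" with no header row.
    Returns a dict mapping (possibly collapsed) names to summed counts,
    in order of first appearance.
    """
    totals = {}
    for line in lines:
        fields = line.split()  # splits on tabs and/or spaces
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        name, count = fields[0], int(fields[1])
        if "tRNA" in name:
            # e.g. chr10.tRNA90-GluCTC -> tRNA-GluCTC
            name = "tRNA-" + name[-6:]
        # Side effect: non-tRNA names that occur more than once are summed too.
        totals[name] = totals.get(name, 0) + count
    return totals


# The example input from the post.
SAMPLE = """
Gm26624 5761
Bre 5658
chr10.tRNA90-GluCTC 5573
chr3.tRNA303-GluCTC 5558
chr1.tRNA709-GluCTC 5489
chr1.tRNA706-GlyGCC 4891
chr1.tRNA704-GlyGCC 4838
chr1.tRNA702-GlyGCC 4796
chr13.tRNA110-GlyGCC 4753
Gm13247 4105
Rny3 3736
chr1.tRNA485-LysTTT 3548
Rn7s2 3385
chr19.tRNA107-LysTTT 3363
"""

if __name__ == "__main__":
    for name, total in collapse_trnas(SAMPLE.splitlines()).items():
        print(f"{name}\t{total}")
```

On the example input this should give tRNA-GluCTC 16620, tRNA-GlyGCC 19278 and tRNA-LysTTT 6911, with the other genes unchanged. To run it on a real file, the lines of an open file object can be passed in place of `SAMPLE.splitlines()`.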