Collapsing gene names based on partial string overlap

heso


03-22-2017, 01:41 AM

I have a long list of gene names with corresponding read counts. Mainly, I'd like to collapse the tRNAs that share an identical anticodon and calculate the sum of their read counts.

So, something like this:
For lines containing "tRNA", collapse the gene names based on a perfect match of the last 6 characters of the gene name (e.g. GluCTC) and sum up the corresponding read counts. The new gene name can be "tRNA-" followed by those 6 characters (e.g. tRNA-GluCTC). All other lines should stay as they are.

The input (tab-delimited) looks like this:

Code:

Gm26624	5761
Bre	5658
chr10.tRNA90-GluCTC	5573
chr3.tRNA303-GluCTC	5558
chr1.tRNA709-GluCTC	5489
chr1.tRNA706-GlyGCC	4891
chr1.tRNA704-GlyGCC	4838
chr1.tRNA702-GlyGCC	4796
chr13.tRNA110-GlyGCC	4753
Gm13247	4105
Rny3	3736
chr1.tRNA485-LysTTT	3548
Rn7s2	3385
chr19.tRNA107-LysTTT	3363

Any ideas how to do this? Awk?
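
To make the intended logic concrete, here is a rough Python sketch of the rule described above. It is only an illustration under a few assumptions: every line has exactly two whitespace-separated columns (name, count), counts are integers, and every tRNA name really does end in a 6-character amino acid plus anticodon suffix. The function name collapse_trnas and the sample list are made up for this example.

```python
from collections import defaultdict


def collapse_trnas(lines):
    """Sum read counts per gene, merging tRNA entries that share their last 6 characters.

    Assumes two whitespace-separated columns per line: gene name, integer count.
    """
    totals = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, count = line.split()
        if "tRNA" in name:
            # Assumption from the post: the last 6 characters identify the
            # amino acid + anticodon (e.g. GluCTC). Longer suffixes would be cut.
            name = "tRNA-" + name[-6:]
        totals[name] += int(count)
    # A plain dict keeps the order in which names were first seen.
    return dict(totals)


# Hypothetical sample taken from the first lines of the input shown above.
sample = [
    "Gm26624\t5761",
    "chr10.tRNA90-GluCTC\t5573",
    "chr3.tRNA303-GluCTC\t5558",
    "chr1.tRNA709-GluCTC\t5489",
]

for gene, total in collapse_trnas(sample).items():
    print(f"{gene}\t{total}")
```

For the real data, the same function could be fed an open file handle instead of the sample list, since it only iterates over lines.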
