Degenerate collapsing of specific motifs

drhicks

Junior Member

Join Date: Sep 2014

Posts: 1


09-27-2014, 10:41 AM

Hi,

Some background:
I scanned the Arabidopsis thaliana genome for unique k-mer sites, in this instance 20-mer sites. I started with a highly degenerate 20-mer, NHNHNHNHNHNHNHNHNHGG (N = any base, H = A, C, or T), which covers 12^9, or ~5x10^9, possible sequences. Scanning the genome, I found ~90,000 unique sites that match it.
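For anyone who wants to check that count, here is a minimal sketch of the arithmetic. The function name `motif_size` is just something made up for illustration; the table is the standard IUPAC nucleotide code sizes.

```python
from math import prod

# Number of bases each IUPAC nucleotide code stands for.
IUPAC_SIZE = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}

def motif_size(motif: str) -> int:
    """Count the concrete sequences a degenerate motif expands to
    (hypothetical helper: product of the per-position code sizes)."""
    return prod(IUPAC_SIZE[c] for c in motif.upper())

# Nine NH pairs contribute (4 * 3)**9, and the fixed GG contributes 1.
print(motif_size("NHNHNHNHNHNHNHNHNHGG"))  # 5159780352
```

So the starting motif covers roughly 5.2x10^9 sequences, of which only ~90,000 actually occur as unique sites.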

Problem:
I want to collapse these 90,000 unique 20-mers into a smaller set of ambiguous (IUPAC) sequences, ideally 100 or fewer in total, that together contain all 90,000 sequences while excluding as many as possible of the other ~5x10^9 sequences originally covered by my NHNHNHNHNHNHNHNHNHGG motif.
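To make the goal concrete, here is a rough sketch of how a candidate set of IUPAC sequences could be scored: how many of the target 20-mers it covers, and how many sequences it expands to in total. This is not a solution, only a way to state the objective. The names `matches`, `collapse`, and `score` are hypothetical, and the three 4-mers are toy data rather than real sites.

```python
from math import prod

# Standard IUPAC nucleotide codes and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Reverse lookup: set of bases -> IUPAC code.
CODE_FOR = {frozenset(bases): code for code, bases in IUPAC.items()}

def matches(pattern, seq):
    """True if the concrete sequence seq is covered by the IUPAC pattern."""
    return len(pattern) == len(seq) and all(
        base in IUPAC[code] for code, base in zip(pattern, seq)
    )

def collapse(seqs):
    """Tightest single IUPAC pattern covering all seqs (column-wise union)."""
    return "".join(CODE_FOR[frozenset(column)] for column in zip(*seqs))

def score(patterns, targets):
    """Return (targets covered, total sequences the patterns expand to).
    The second number is an upper bound: overlaps are counted once per pattern."""
    covered = sum(any(matches(p, t) for p in patterns) for t in targets)
    expanded = sum(prod(len(IUPAC[c]) for c in p) for p in patterns)
    return covered, expanded

# Toy data (hypothetical), not real Arabidopsis sites.
targets = ["ACGG", "ATGG", "GCGG"]
pattern = collapse(targets)
print(pattern)                    # RYGG
print(score([pattern], targets))  # (3, 4)
```

In the toy case one pattern covers all three targets but also admits one extra sequence (GTGG). That is the trade-off at full scale: fewer patterns means more unwanted sequences, so the 90,000 sites would presumably need to be grouped somehow before each group is collapsed.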

I have no idea how to go about solving this problem with a script, and I don't know of any existing tools that do this.

If anyone can give me any advice, thanks a bunch!

Best,
Derrick