
**Melissa** · 12-10-2019, 05:32 AM

What I would do is write my own script to:
1) blastn the sequences against themselves (hopefully your sequences are long enough to justify using blast)
2) filter the results to remove self-hits (a sequence matching itself) and hits that do not pass an e-value cutoff
3) do single-linkage clustering based on the remaining blastn hits
4) choose the longest sequence in each cluster
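
Steps 2 to 4 could be sketched roughly like this in Python. This is only a minimal sketch, assuming the all-vs-all blastn was run with tabular output (`-outfmt 6`, where columns 1, 2 and 11 are query id, subject id and e-value). The function names `read_fasta` and `longest_per_cluster` and the default cutoff `max_evalue=1e-10` are made up for illustration, so adjust them to your data.

```python
from collections import defaultdict


def read_fasta(lines):
    """Parse FASTA lines into {id: sequence}; the id is the first word of the header."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line
    return seqs


def longest_per_cluster(seqs, blast_lines, max_evalue=1e-10):
    """Single-linkage cluster sequences by BLAST hits and keep the longest of each cluster.

    blast_lines: lines of tabular BLAST output (assumed -outfmt 6).
    max_evalue: hypothetical cutoff; hits with a larger e-value are ignored.
    """
    # Union-find: every sequence starts as its own cluster.
    parent = {name: name for name in seqs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        # Step 2: drop self-hits and hits that fail the e-value cutoff.
        if query == subject or evalue > max_evalue:
            continue
        if query not in parent or subject not in parent:
            continue
        # Step 3: single linkage, any passing hit merges the two clusters.
        parent[find(query)] = find(subject)

    clusters = defaultdict(list)
    for name in seqs:
        clusters[find(name)].append(name)

    # Step 4: keep the longest member of each cluster.
    result = {}
    for members in clusters.values():
        best = max(members, key=lambda n: len(seqs[n]))
        result[best] = seqs[best]
    return result
```

To use it, pass open file handles, e.g. `longest_per_cluster(read_fasta(open("seqs.fasta")), open("hits.tsv"))` (both file names are placeholders). Keep in mind that single linkage can chain loosely related sequences into one cluster, so a strict e-value cutoff, or an additional identity/coverage filter, is probably a good idea.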

There should be an easier way using k-mers?!

**yzzhang** · 12-29-2019, 10:01 PM

Have you tried CD-HIT?


Recover only longest version of sequence from multiple sequence fasta file - help
