I have run CD-HIT on a set of 171 million DNA sequences. They are now nicely clustered in a .clstr file, but how can I efficiently retrieve all the DNA sequences that belong to clusters of a particular size (frequency class)?
There is a nice utility in CD-HIT (plot_len1.pl) which gives me a table with sequence frequencies for various length classes. So all the frequency information is in the .clstr file, but how do I extract only the part I want, and how do I then link it back to the original sequences? Let's say I want to retrieve all sequences that occur 10 to 19 times in my input dataset.
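One possible approach, as a minimal Python sketch rather than an official CD-HIT tool: stream through the .clstr file one cluster at a time, collect the IDs of clusters whose member count is in the wanted range, then stream through the original FASTA file and keep only records with those IDs. The function names (`iter_clusters`, `ids_in_size_range`, `filter_fasta`) are made up for this example. It assumes member lines look like `0	100nt, >seqA... *`, and that the ID between `>` and `...` equals the first whitespace-delimited word of the FASTA header. CD-HIT may truncate names in the .clstr file depending on the description-length setting, so check that assumption against your own files first.

```python
import re
from typing import Iterable, Iterator, List, Set, TextIO

# Assumed member-line layout: "0<TAB>100nt, >seqA... *"
# The ID is taken as the text between ">" and the first "...".
MEMBER_RE = re.compile(r">(\S+?)\.\.\.")


def iter_clusters(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the member IDs of one cluster at a time from .clstr lines."""
    members: List[str] = []
    started = False
    for line in lines:
        if line.startswith(">Cluster"):
            if started:
                yield members
            members = []
            started = True
        else:
            match = MEMBER_RE.search(line)
            if match:
                members.append(match.group(1))
    if started:
        yield members


def ids_in_size_range(lines: Iterable[str], min_size: int, max_size: int) -> Set[str]:
    """Collect IDs from clusters with min_size <= member count <= max_size."""
    wanted: Set[str] = set()
    for members in iter_clusters(lines):
        if min_size <= len(members) <= max_size:
            wanted.update(members)
    return wanted


def filter_fasta(fasta_lines: Iterable[str], wanted: Set[str], out: TextIO) -> int:
    """Write FASTA records whose ID is in `wanted`; return how many were written."""
    keep = False
    written = 0
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            # Compare only the first word of the header with the .clstr ID.
            keep = bool(fields) and fields[0] in wanted
            if keep:
                written += 1
        if keep:
            out.write(line)
    return written
```

To use it on real data, open the .clstr file and pass the handle to `ids_in_size_range(handle, 10, 19)`, then open the original FASTA file and an output file and pass them to `filter_fasta`. Only the IDs in the selected clusters are held in memory, and both files are read line by line, which should matter with 171 million sequences.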