SEQanswers

10-03-2012, 02:00 AM   #1
Tectona
Member
 
Location: Thailand

Join Date: Feb 2011
Posts: 11
Retrieving sequences from CD-HIT clusters

I have run CD-HIT on a set of 171 million DNA sequences. They are now nicely clustered in a file, but how can I efficiently retrieve all DNA sequences that belong to clusters in particular frequency classes?
CD-HIT includes a handy utility (plot_len1.pl) that gives me a table of sequence frequencies for various length classes. So all the frequency information is in the .clstr file, but how do I extract only the part I want, and how do I then link it back to the original sequences? Let's say I want to retrieve all sequences that occur 10 to 19 times in my input dataset.
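
To make the question concrete, here is a rough two-pass sketch in Python of what such a retrieval might look like. It rests on several assumptions that are not confirmed above: that "occurs 10 to 19 times" means "belongs to a cluster with 10 to 19 members", that the .clstr file uses the usual layout (a ">Cluster N" header followed by member lines such as "0	120nt, >seqA... *"), and that the names in the .clstr file were not truncated, so they match the first word of each FASTA header. The function names and the file names in the usage comment are made up for illustration.

```python
import re

# Member lines look like "0\t120nt, >seqA... *"; capture the name between '>' and '...'.
MEMBER_RE = re.compile(r">(.+?)\.\.\.")

def ids_in_size_range(clstr_lines, min_size, max_size):
    """Collect IDs of all sequences in clusters with min_size..max_size members."""
    selected = set()
    current = []  # IDs of the cluster currently being read
    for line in clstr_lines:
        if line.startswith(">Cluster"):
            # A new cluster begins: decide whether to keep the previous one.
            if min_size <= len(current) <= max_size:
                selected.update(current)
            current = []
        else:
            match = MEMBER_RE.search(line)
            if match:
                current.append(match.group(1))
    # Do not forget the last cluster in the file.
    if min_size <= len(current) <= max_size:
        selected.update(current)
    return selected

def filter_fasta(fasta_lines, wanted_ids):
    """Yield the lines of every FASTA record whose ID is in wanted_ids."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            keep = bool(fields) and fields[0] in wanted_ids
        if keep:
            yield line

# Hypothetical usage (file names are placeholders):
# with open("input.clstr") as clstr:
#     wanted = ids_in_size_range(clstr, 10, 19)
# with open("input.fasta") as fasta, open("selected.fasta", "w") as out:
#     out.writelines(filter_fasta(fasta, wanted))
```

Both passes stream their input line by line, so only the set of selected IDs is held in memory. With 171 million input sequences that set could still be large, and if the names in the .clstr file were shortened during clustering, the ID matching in the second pass would need to be adapted.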

Tags
cd-hit
