SEQanswers

07-09-2013, 12:53 AM   #1
Shishir
Member
 
Location: Germany

Join Date: Nov 2012
Posts: 22
Clustering sequences based on sequence similarity

I have 8,000 protein sequences that I want to cluster based on similarity (not identity), and then select the longest sequence from each cluster as its representative. I checked several tools, such as HiFix, SiliX, and ClusTR, but did not find a suitable solution. I want to reduce the dataset the way CD-HIT does, but based on sequence similarity rather than sequence identity.
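
To make the goal concrete, here is a minimal sketch of greedy, longest-first clustering driven by a similarity score instead of identity. It is not how CD-HIT or any tool named in this thread works internally. The residue groups, the scoring values, the `similarity` and `greedy_cluster` names, and the 0.7 threshold are all assumptions chosen for illustration. Similarity here means the fraction of positions of the shorter sequence that are aligned to an identical or same-group residue in a simple global alignment.

```python
from typing import Dict, List

# Assumed conservative-substitution groups (illustrative only, not from any tool).
GROUPS = ["AVLIM", "FWY", "STNQ", "KRH", "DE", "C", "G", "P"]
GROUP_OF = {aa: i for i, group in enumerate(GROUPS) for aa in group}


def column_score(a: str, b: str) -> int:
    """Score one aligned pair: identical = 2, same group = 1, otherwise -1."""
    if a == b:
        return 2
    if a in GROUP_OF and GROUP_OF.get(a) == GROUP_OF.get(b):
        return 1
    return -1


def similarity(s: str, t: str, gap: int = -2) -> float:
    """Global alignment (Needleman-Wunsch with a linear gap penalty).

    Returns the number of identical-or-similar aligned pairs on the
    best-scoring path, divided by the length of the shorter sequence.
    """
    if not s or not t:
        return 0.0
    # Each cell holds (alignment score, similar pairs on that path).
    prev = [(j * gap, 0) for j in range(len(t) + 1)]
    for i, a in enumerate(s, 1):
        cur = [(i * gap, 0)]
        for j, b in enumerate(t, 1):
            sc = column_score(a, b)
            diag = (prev[j - 1][0] + sc, prev[j - 1][1] + (sc > 0))
            up = (prev[j][0] + gap, prev[j][1])
            left = (cur[j - 1][0] + gap, cur[j - 1][1])
            cur.append(max(diag, up, left))
        prev = cur
    return prev[-1][1] / min(len(s), len(t))


def greedy_cluster(seqs: Dict[str, str], threshold: float = 0.7) -> Dict[str, List[str]]:
    """Map each representative ID to the IDs of its cluster members.

    Sequences are visited longest first, so every representative is the
    longest member of its cluster.
    """
    order = sorted(seqs, key=lambda name: len(seqs[name]), reverse=True)
    clusters: Dict[str, List[str]] = {}
    for name in order:
        for rep in clusters:
            if similarity(seqs[rep], seqs[name]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # No existing representative is similar enough: start a new cluster.
            clusters[name] = [name]
    return clusters


if __name__ == "__main__":
    toy = {"a": "MKVLITGAS", "b": "MRVLVTGA", "c": "WWPPGGCC"}
    for rep, members in greedy_cluster(toy).items():
        print(rep, members)
```

The pure-Python alignment is quadratic in sequence length and compares every sequence against every existing representative, so it is only meant to show the idea. For 8,000 proteins a dedicated tool with a proper substitution matrix would be the practical route.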
07-09-2013, 02:47 AM   #2
rhinoceros
Senior Member
 
Location: sub-surface moon base

Join Date: Apr 2013
Posts: 372

Perhaps USEARCH or, if you want something much more complicated, OrthoMCL.

Last edited by rhinoceros; 07-09-2013 at 02:58 AM.
07-09-2013, 03:07 AM   #3
JackieBadger
Senior Member
 
Location: Halifax, Nova Scotia

Join Date: Mar 2009
Posts: 381

So you want to group by functional similarity?
You can do this based on the physicochemical properties of the amino acids. You translate each amino acid into 5 different metrics and then use Discriminant Analysis of Principal Components to cluster based on these properties. Described here: http://www.biomedcentral.com/1471-2148/12/68

I can provide you with file formats/tips if need be.
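
As a rough illustration of the encoding step only (not the discriminant analysis, and not the pipeline from the linked paper), the sketch below turns each residue into a tuple of numeric property values and averages them per sequence. The five metrics are not listed in this thread, so a single well-known scale, Kyte-Doolittle hydropathy, stands in for them here. The `encode` and `sequence_features` names are hypothetical, and further property tables could be passed through the `tables` argument.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

# Kyte-Doolittle hydropathy index, used as a stand-in for the five metrics
# (an assumption: the thread does not say which properties are used).
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(seq: str,
           tables: Sequence[Dict[str, float]] = (KYTE_DOOLITTLE,)) -> List[Tuple[float, ...]]:
    """Translate each residue into one value per property table.

    Residues missing from any table (for example X) are skipped.
    """
    return [
        tuple(table[aa] for table in tables)
        for aa in seq.upper()
        if all(aa in table for table in tables)
    ]


def sequence_features(seq: str,
                      tables: Sequence[Dict[str, float]] = (KYTE_DOOLITTLE,)) -> List[float]:
    """Average each property over the sequence to get a fixed-length vector."""
    rows = encode(seq, tables)
    if not rows:
        return [0.0] * len(tables)
    return [mean(column) for column in zip(*rows)]


if __name__ == "__main__":
    print(sequence_features("MKVLITGAS"))
```

Averaging over the whole sequence discards positional information. Analyses that work on aligned sequences would instead keep one set of values per alignment column.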

Last edited by JackieBadger; 07-09-2013 at 03:55 AM.
07-11-2013, 01:12 AM   #4
Shishir
Member
 
Location: Germany

Join Date: Nov 2012
Posts: 22

Thanks, it looks interesting. I plan to use it later, as I am not currently focused on functional similarity.
Quote:
Originally Posted by JackieBadger
So you want to group by functional similarity?
You can do this based on the physicochemical properties of the amino acids. You translate each amino acid into 5 different metrics and then use Discriminant Analysis of Principal Components to cluster based on these properties. Described here: http://www.biomedcentral.com/1471-2148/12/68
I can provide you with file formats/tips if need be.
07-11-2013, 01:13 AM   #5
Shishir
Member
 
Location: Germany

Join Date: Nov 2012
Posts: 22

Thanks for the reply!

Quote:
Originally Posted by rhinoceros
Perhaps USEARCH or, if you want something much more complicated, OrthoMCL.

Tags
bioinformatics, clustering, orthologues, sequence analysis, sequence comparison
