SEQanswers What does Unique and Distinct K-mers mean?

 07-04-2017, 02:12 AM #1 Elakkiya Junior Member   Location: India Join Date: Jul 2017 Posts: 4 What does Unique and Distinct K-mers mean? Hello! I am new to bioinformatics. I have generated the k-mers and unique k-mers from the reads. What does "distinct k-mers" mean, and how does it differ from unique k-mers? Can anyone help clarify with an example, please?
 07-05-2017, 11:46 AM #2 Brian Bushnell Super Moderator   Location: Walnut Creek, CA Join Date: Jan 2014 Posts: 2,695 Consider "AAAAAA". When counting 3-mers, there are 4 of them. But there is only one unique 3-mer: "AAA".
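The count in this example can be checked with a minimal Python sketch. The helper name `kmers` is illustrative and not taken from any tool mentioned in this thread; it simply slides a window of length k across the sequence.

```python
def kmers(seq, k):
    """Return every k-mer in seq, left to right, including overlapping windows."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

all_3mers = kmers("AAAAAA", 3)

total = len(all_3mers)           # every window is counted, duplicates included
different = len(set(all_3mers))  # identical k-mers collapse into one entry

print(total, different)
```

A sequence of length n yields n - k + 1 windows, which is where the 4 comes from for n = 6 and k = 3.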
 07-05-2017, 09:01 PM #3 Elakkiya Junior Member   Location: India Join Date: Jul 2017 Posts: 4 distinct k-mer Thanks Brian! But what does "distinct k-mer" mean? How does it differ from unique k-mers? Thanks, Elakkiya
07-05-2017, 09:52 PM   #4
Brian Bushnell
Super Moderator

Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695

Quote:
 Originally Posted by Elakkiya Thanks Brian! But what does "distinct k-mer" mean? How does it differ from unique k-mers? Thanks, Elakkiya
I don't use that term because I find it confusing. But I assume the authors mean, by "distinct kmers", the total number of counted kmers, whether unique or not. In my example, that would mean distinct kmers are 4 and unique kmers are 1. I discourage using the term "distinct kmers" since "distinct" is essentially synonymous with "unique", just less precise in this case. I suggest you call unique kmers "unique kmers". And I suggest you call the total number of kmers counted (whether unique or not) "total kmers" or "counted kmers" or "total kmers counted". But never call non-unique kmers "distinct kmers", since that's misleading. If two kmers are identical, nothing distinguishes them. Therefore, neither is unique from the other. And, by definition, they cannot be distinct while being identical. I'm not sure what software you are using that defines "unique kmers" and "distinct kmers" differently, but that definition is misleading and not useful.

I think that probably the authors think of what they call "distinct kmers" as "total kmers counted" and "unique kmers" as "unique kmers". But I suggest you contact them and inquire.

Last edited by Brian Bushnell; 07-05-2017 at 10:03 PM.

 07-05-2017, 10:47 PM #5 Elakkiya Junior Member   Location: India Join Date: Jul 2017 Posts: 4 Thanks for the clarification, Brian! I have contacted the authors for clarity. Their table lists three values: total k-mers, unique k-mers, and distinct k-mers. Seeing that is what confused me. Let us wait for the reply from the authors. Thanks, Elakkiya
07-05-2017, 11:32 PM   #6
Brian Bushnell
Super Moderator

Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695

Quote:
 Originally Posted by Elakkiya Thanks for the clarification, Brian! I have contacted the authors for clarity. Their table lists three values: total k-mers, unique k-mers, and distinct k-mers. Seeing that is what confused me. Let us wait for the reply from the authors. Thanks, Elakkiya
I can only think of two kmer counts... total, and unique. So, it seems like they may have a new category that I have not heard of, or there might be a misunderstanding. Please post the results of your investigation!

 07-07-2017, 03:50 AM #8 Elakkiya Junior Member   Location: India Join Date: Jul 2017 Posts: 4 Hi Brian, the author replied: "Distinct k-mers should be count of k-mers that occur at least once in reads/data". Their example:
k-mers: AAA, AAA, CCA, CCC, CCC, GGG, GGG, GGG, TTT
total k-mers: 9
unique k-mers: 2 (CCA, TTT)
distinct k-mers: 5 (AAA, CCA, CCC, GGG, TTT)
Thanks, Elakkiya
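For anyone who wants to reproduce the three numbers, here is a small Python sketch using the author's definitions as quoted in this post. The variable names are illustrative, and the input list is just the nine example k-mers, not output from any particular tool.

```python
from collections import Counter

# The nine example k-mers, duplicates included.
observed = ["AAA", "AAA", "CCA", "CCC", "CCC", "GGG", "GGG", "GGG", "TTT"]

counts = Counter(observed)  # k-mer -> number of occurrences

# Total: every occurrence counted.
total_kmers = sum(counts.values())

# Distinct (author's usage): k-mers that occur at least once.
distinct_kmers = sorted(counts)

# Unique (author's usage): k-mers that occur exactly once.
unique_kmers = sorted(kmer for kmer, n in counts.items() if n == 1)

print("total:", total_kmers)
print("unique:", len(unique_kmers), unique_kmers)
print("distinct:", len(distinct_kmers), distinct_kmers)
```

Under these definitions, "unique" is always a subset of "distinct", and "distinct" can never exceed "total".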
 07-07-2017, 09:49 AM #9 Brian Bushnell Super Moderator   Location: Walnut Creek, CA Join Date: Jan 2014 Posts: 2,695 Oh, I see. I normally use the term "unique kmers" where he uses "distinct kmers", and "singleton kmers" or "depth-1 kmers" where he uses "unique kmers".