01-09-2014, 01:11 PM   #2
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,178

Instead of trying to pre-select kmers you deem potentially interesting (which is just another way of saying pre-biasing your result), you should look at digital normalization of your data. Digital normalization reduces the size of the input data, and thus the computational requirements, by discarding reads that consist mostly of high-abundance kmers. Logically, the 1000th copy of the same kmer does not add any new information to a de novo transcript assembly, so you can safely remove reads whose kmers are already covered above a certain threshold without adversely affecting your assembly.
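
To make the idea concrete, here is a minimal Python sketch of the median-kmer-abundance filter that digital normalization is based on. It is only an illustration, not the code used by Trinity or khmer: the function names, the default `k` and `cutoff` values, and the exact in-memory `Counter` are my own assumptions (khmer, for instance, uses memory-efficient approximate counting so it can handle real data sets), and it ignores read pairing and sequencing errors.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every k-length substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def digital_normalize(reads, k=20, cutoff=20):
    """Toy digital normalization (hypothetical helper, not a library API).

    A read is kept only if the median count of its kmers, counted over
    the reads kept so far, is still below `cutoff`.
    """
    counts = Counter()  # exact counts; fine for a toy, too large for real data
    kept = []
    for read in reads:
        read_kmers = list(kmers(read, k))
        if not read_kmers:
            continue  # read is shorter than k; this sketch simply drops it
        # Counter returns 0 for kmers it has not seen yet.
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)  # only kept reads add to the counts
    return kept
```

With 1000 copies of one read plus a single unique read, `k=4` and `cutoff=5`, only the first few copies of the abundant read get through, while the unique read is always kept. That is the whole point: coverage of highly expressed transcripts is capped, and low-abundance transcripts are left alone.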

Trinity has a digital normalization module built in, described here.

There is also digital normalization functionality in Titus Brown's khmer suite. Some links:

http://ged.msu.edu/angus/diginorm-2012/tutorial.html
http://ivory.idyll.org/blog/what-is-diginorm.html