Old 01-09-2014, 10:15 AM   #1
Junior Member
Location: Maryland

Join Date: Dec 2011
Posts: 3
De Novo Transcriptome Assembly from Kmers only


I have RNA-Seq data from 8 time points comparing study to control, with each sample run on its own lane. I do not trust the reference sequence. What I'm interested in are the transcripts whose expression changes most significantly.


The kmers that are associated with each transcript should change in a coherent manner as the transcript expression changes. Comparing unique kmers first and extracting the most significantly changing kmers should enrich the transcriptome for the genes that are changing the most.
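To make the idea concrete, here is a minimal sketch of the kmer selection step. It is a simplification, not a worked-out method: it compares only two pooled conditions rather than 8 time points, and it uses a normalized log2 fold change with a pseudocount as a stand-in for a real significance test. The function names, the threshold `min_abs_log2fc`, and the `pseudocount` are illustrative assumptions. For data of this size a dedicated kmer counter would be needed in place of an in-memory `Counter`.

```python
from collections import Counter
import math

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def canonical(kmer):
    """Return the lexicographically smaller of a kmer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return kmer if kmer <= rc else rc

def count_kmers(reads, k):
    """Count canonical kmers over an iterable of read sequences, skipping kmers with N."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[canonical(kmer)] += 1
    return counts

def changing_kmers(study, control, min_abs_log2fc=1.0, pseudocount=1.0):
    """Select kmers whose library-size-normalized abundance differs between conditions.

    Assumption: log2 fold change is used here as a crude proxy for significance.
    """
    study_total = sum(study.values()) or 1
    control_total = sum(control.values()) or 1
    selected = {}
    for kmer in set(study) | set(control):
        s = (study.get(kmer, 0) + pseudocount) / study_total
        c = (control.get(kmer, 0) + pseudocount) / control_total
        log2fc = math.log2(s / c)
        if abs(log2fc) >= min_abs_log2fc:
            selected[kmer] = log2fc
    return selected
```

Counting canonical kmers means a transcript sequenced from either strand contributes to the same counts, which matters for unstranded libraries.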

I have too much data to do a full de novo assembly. The data are of quite good quality, and even after eliminating low-frequency reads (kmers) I still have too much to feed to an assembler like Trinity.

Assuming I can select a much smaller set of significantly changing kmers, how would I feed that set to an assembler to generate a transcriptome enriched for the changing genes?

I don't mind if the contigs created by this process turn out to be partial exons of the genes that are changing. I can identify them later.
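One way to picture going straight from kmers to contigs is a de Bruijn style walk: join kmers that overlap by k-1 bases for as long as the path is unambiguous, and stop at any branch. The sketch below is only an illustration under simplifying assumptions: it treats kmers as strand-specific (no reverse complements), ignores coverage, and does nothing about sequencing errors, so the contigs it produces will be fragmentary. The names `assemble_unitigs` and `to_fasta` are made up for this example and are not part of Trinity or any other assembler.

```python
def successors(kmer, kmers):
    """Kmers in the set that overlap this kmer's last k-1 bases."""
    return [kmer[1:] + b for b in "ACGT" if kmer[1:] + b in kmers]

def predecessors(kmer, kmers):
    """Kmers in the set that overlap this kmer's first k-1 bases."""
    return [b + kmer[:-1] for b in "ACGT" if b + kmer[:-1] in kmers]

def assemble_unitigs(kmers):
    """Merge kmers into contigs along unambiguous (non-branching) paths."""
    kmers = set(kmers)
    used = set()
    contigs = []
    for seed in sorted(kmers):
        if seed in used:
            continue
        used.add(seed)
        contig = seed
        # Extend to the right while there is exactly one way forward
        # and the next kmer has exactly one way back.
        cur = seed
        while True:
            nxt = successors(cur, kmers)
            if len(nxt) != 1 or nxt[0] in used or len(predecessors(nxt[0], kmers)) != 1:
                break
            cur = nxt[0]
            used.add(cur)
            contig += cur[-1]
        # Extend to the left under the mirrored condition.
        cur = seed
        while True:
            prv = predecessors(cur, kmers)
            if len(prv) != 1 or prv[0] in used or len(successors(prv[0], kmers)) != 1:
                break
            cur = prv[0]
            used.add(cur)
            contig = cur[0] + contig
        contigs.append(contig)
    return contigs

def to_fasta(contigs):
    """Format contigs as FASTA text with sequential names."""
    return "".join(f">contig_{i}\n{seq}\n" for i, seq in enumerate(contigs, 1))
```

Because only a subset of kmers is kept, paths will break wherever a neighbouring kmer fell below the selection threshold, which is consistent with accepting partial exons as output.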


1) How would you process a set of kmers to feed to an assembler, resulting in a FASTA file of contigs?

2) If your answer suggests mapping the kmers back to their source reads, can you also suggest how to do that efficiently (realistically, in a decent time frame)?
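For question 2, one plausible approach is to avoid alignment entirely: load the selected kmers into a hash set and stream the reads once, keeping any read that contains at least `min_hits` selected kmers. The sketch below assumes uncompressed four-line FASTQ records and a selected set small enough to fit in memory. `parse_fastq`, `reads_with_selected_kmers`, and `min_hits` are hypothetical names for this example. The retained reads could then be handed to an ordinary assembler, which expects reads rather than bare kmers.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def canonical(kmer):
    """Return the lexicographically smaller of a kmer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return kmer if kmer <= rc else rc

def parse_fastq(lines):
    """Yield (name, sequence) pairs from an iterable of FASTQ lines (4 lines per record)."""
    it = iter(lines)
    for header in it:
        seq = next(it, "").strip()
        next(it, "")  # '+' separator line
        next(it, "")  # quality line, unused here
        yield header[1:].strip(), seq

def reads_with_selected_kmers(reads, selected, k, min_hits=1):
    """Yield (name, sequence) for reads containing at least min_hits selected kmers.

    Each read costs one set lookup per kmer, so the pass is linear in total bases.
    """
    selected = set(selected)
    for name, seq in reads:
        hits = 0
        for i in range(len(seq) - k + 1):
            if canonical(seq[i:i + k]) in selected:
                hits += 1
                if hits >= min_hits:
                    yield name, seq
                    break
```

Since each sample is an independent file, this pass can be run per lane in parallel and the outputs concatenated.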

All thoughts are welcome

Joe Carl
joeseki