Hi guys. I'm trying to understand the paper "Velvet: Algorithms for de novo short read assembly using de Bruijn graphs" http://genome.cshlp.org/content/18/5/821.long
I'm a university student preparing slides to present this paper in class. I'd really appreciate it if anyone familiar with the Velvet assembler and de Bruijn graphs could answer a few questions, as I simply can't understand the main "Construction" part. The .ppt presentation on the tool's website didn't help either.
1) They wrote "the reads are first hashed according to a predefined k-mer length." What does that mean? Does it mean that, for each k-mer of a read, the hash table holds a record storing the ID of the first read containing that k-mer and the position of the k-mer within that read? So when a k-mer is encountered a second time, you do nothing and leave the hash table as it is, since it already contains that k-mer?
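To make my guess in question 1 concrete, here is a minimal Python sketch of what I imagine the hashing step does. This is only my interpretation, not Velvet's actual code: the function name `hash_reads`, the toy reads, and the `(read_id, offset)` record layout are all made up for illustration, and I'm ignoring reverse complements to keep it short.

```python
def hash_reads(reads, k):
    """Map each k-mer to (read_id, offset) of its first occurrence.

    Hypothetical illustration of the interpretation in question 1;
    not taken from the Velvet source.
    """
    table = {}
    for read_id, seq in enumerate(reads):
        # Slide a window of length k over the read.
        for offset in range(len(seq) - k + 1):
            kmer = seq[offset:offset + k]
            # setdefault stores the record only if the k-mer is new,
            # so a second occurrence leaves the table unchanged.
            table.setdefault(kmer, (read_id, offset))
    return table


# Two toy reads that share the k-mers GTA, TAC and ACG (k = 3).
reads = ["ACGTAC", "GTACGG"]
table = hash_reads(reads, 3)

for kmer, (read_id, offset) in table.items():
    print(kmer, "first seen in read", read_id, "at offset", offset)
```

If this is right, the shared k-mers of the second read would still point back to the first read, and only its new k-mer (CGG) would get a record of its own.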
2) What do they mean by "the ordered set of original k-mers of that read is cut each time an overlap with another read begins or ends"? Do they mean a complete overlap of a k-mer with the first or last k characters of some other read? What does "cutting" the set mean? Could you please show an example?
3) Once the nodes of the graph are constructed, how are the arcs between them determined?
Thanks in advance and happy holidays!