#1 |
Senior Member
Location: berd Join Date: Dec 2010
Posts: 181
|
In Velvet and Trans-ABySS (de novo assembly), we use the k-mer approach.
Why is it better to "break" each contig into a range of k-mers instead of using regular overlaps? Why is it more sensitive? Is it because of SNPs? Thanks in advance.
#2 |
Member
Location: Singapore Join Date: Jan 2009
Posts: 31
|
You mean the de Bruijn approach? Supposedly it resolves repeats better than the overlap-layout-consensus (OLC) method.
|
#3 |
Senior Member
Location: berd Join Date: Dec 2010
Posts: 181
|
Yes, the de Bruijn approach.
The structure of the graph (the hash table) handles repeats. In a transcriptome there are not so many repeats, so why run it over a range of different k-mers, {20..49}?
#4 |
Member
Location: Heidelberg Join Date: Feb 2011
Posts: 69
|
Compared to the overlap method, it needs less RAM when you have a lot of "short" reads as input. An assembler based on de Bruijn graphs is designed for next-generation sequencing output.
Edit: the higher the k-mer, the less RAM you need, because normally the de Bruijn graph will be smaller. With k-mer 49, the overlap between reads must be 49-1 bp! If you want more details about the algorithms, you could get Daniel Zerbino's PhD thesis; it is easy to read and helps a lot with understanding.
Last edited by Thorondor; 03-07-2011 at 12:34 AM.
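To make the overlap point concrete, here is a minimal Python sketch. It is not how Velvet works internally; the helper names and the two toy reads are made up for illustration. It shows that consecutive k-mers from one read always share k-1 bases, and that two reads can only meet in the graph if they have a k-mer in common. Whether the minimum read overlap is counted as k or k-1 depends on how the graph is defined; the sketch only checks for a shared k-mer.

```python
def kmers(seq, k):
    """Return every overlapping k-mer of seq, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def consecutive_overlap_ok(seq, k):
    """True if each k-mer shares its last k-1 bases with the next one."""
    ks = kmers(seq, k)
    return all(a[1:] == b[:-1] for a, b in zip(ks, ks[1:]))


def shared_kmers(read_a, read_b, k):
    """k-mers present in both reads; an empty set means the two reads
    cannot be joined through a common k-mer at this k."""
    return set(kmers(read_a, k)) & set(kmers(read_b, k))


# Hypothetical toy reads whose ends overlap by 5 bases (CAGTT).
read_a = "AGTCAGTT"
read_b = "CAGTTTGG"

print(consecutive_overlap_ok(read_a, 4))
print(sorted(shared_kmers(read_a, read_b, 4)))
print(sorted(shared_kmers(read_a, read_b, 7)))
```

With k=4 the two toy reads share CAGT and AGTT, so they connect; with k=7 they share nothing, because their 5-base overlap is shorter than k.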
#5 |
Member
Location: Singapore Join Date: Jan 2009
Posts: 31
|
In transcriptome assembly, different k-mers are specified to accommodate transcripts with different sizes.
|
#6 |
Senior Member
Location: berd Join Date: Dec 2010
Posts: 181
|
I don't understand.
If I have this contig: AGTCAGTTTGGCCCTTG, assume this is the output of Solexa. Is it all from the same transcript? How do the k-mers accommodate the different sizes of this transcript?
#7 |
Member
Location: Singapore Join Date: Jan 2009
Posts: 31
|
In a transcriptome, the reads come from many different transcripts, which is why the assembler uses different k-mer sizes to try to assemble them correctly. Meanwhile, in de novo genome assembly, the reads come from the whole genomic DNA (one big sequence).
As for how exactly different k-mers accommodate transcripts, you might want to read the Oases paper, "Oases: De novo transcriptome assembler for very short reads".
#8 |
Senior Member
Location: berd Join Date: Dec 2010
Posts: 181
|
If I have a contig of length 50 bp,
how will it help me to break it into pieces of 19 bp with 18 bp overlap to know its transcript size? And so on for {19..49}. Correct me if I'm wrong; I am not sure that I understood you correctly. Did you mean that maybe the size of the current sub-transcript is 19, and if I leave it at 50 bp, I will miss the 19 bp? Is that why I have to use k=19?
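For the numbers in this question, a quick Python sketch may help. The 50 bp sequence below is made up, and `kmers` is just an illustrative helper. A sequence of length L breaks into L - k + 1 overlapping k-mers, each shifted by one base from the previous one.

```python
def kmers(seq, k):
    """Return every overlapping k-mer of seq, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Hypothetical 50 bp sequence, used only to count k-mers.
read = "ACGT" * 12 + "AC"
assert len(read) == 50

for k in (19, 31, 49):
    pieces = kmers(read, k)
    # Every piece has length k, and there are L - k + 1 of them.
    print(f"k={k}: {len(pieces)} k-mers of length {len(pieces[0])}")
```

So a single 50 bp sequence gives 32 pieces at k=19 and only 2 at k=49. The k value is the word size used to build the graph; it is not a guess at the transcript length.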
#9 | |
Senior Member
Location: berd Join Date: Dec 2010
Posts: 181
|
Quote:
If I am using k-mer 49, my de Bruijn graph will be of size 1, but if I am using k-mer 19 it will be much bigger. What did you mean when you said it becomes smaller? (It becomes smaller only in the number of overlaps, but in size it becomes bigger.) Thanks.
|
#10 |
Member
Location: Heidelberg Join Date: Feb 2011
Posts: 69
|
Well, not really. You have 1 READ with length 50! But you should think in HIGH numbers, and there it's different. ;-)
For k-mer 3 there are 4^3 possible nodes for the de Bruijn graph: AAA, AAG, AGG, GGG, GAG, GAC, GCC, ... For higher k-mers like 49 you have 4^49; normally you never reach the maximum number of nodes for such high k-mers. So fewer nodes compared to k-mer 19, fewer overlaps, fewer junctions => smaller de Bruijn graph => easier to calculate. The problem is that you will miss transcripts with low coverage, because the reads won't overlap by 48 bp.
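A small simulation can illustrate the "think in HIGH numbers" point. This is only a sketch under stated assumptions: the transcript is a random made-up sequence, the reads are error-free, and `distinct_kmers` is an illustrative helper, not part of any assembler. It compares the 4^k possible k-mers with the number that a set of reads can actually contribute, which is at most n_reads * (read_len - k + 1).

```python
import random


def distinct_kmers(reads, k):
    """Set of distinct k-mers seen in a collection of reads."""
    seen = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            seen.add(read[i:i + k])
    return seen


random.seed(1)  # fixed seed so the toy run is repeatable

# Made-up 2000 bp "transcript" and error-free 50 bp reads (about 10x coverage).
transcript = "".join(random.choice("ACGT") for _ in range(2000))
read_len, n_reads = 50, 400
starts = [random.randrange(len(transcript) - read_len + 1) for _ in range(n_reads)]
reads = [transcript[s:s + read_len] for s in starts]

for k in (3, 19, 49):
    possible = 4 ** k                       # all k-mers that could exist
    upper = n_reads * (read_len - k + 1)    # most these reads can contribute
    observed = len(distinct_kmers(reads, k))
    print(f"k={k}: possible={possible}, at most {upper} from reads, observed={observed}")
```

For k=3 the 64 possible nodes are easily exhausted, while for k=49 the reads can supply at most 800 k-mers out of 4^49, so the graph stays far below its theoretical maximum. Each read also yields fewer k-mers as k grows, which is one way to see why low-coverage transcripts lose their connections at high k.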
#11 |
Senior Member
Location: berd Join Date: Dec 2010
Posts: 181
|
Now I am really confused.
You said: the higher the k-mer, the less RAM you need, and the graph becomes smaller. And now you said: fewer nodes compared to k-mer 19, fewer overlaps, fewer junctions => smaller de Bruijn graph => easier to calculate. So when does the graph become smaller? When I have fewer overlaps, fewer junctions, a smaller de Bruijn graph?
#12 |
Member
Location: Heidelberg Join Date: Feb 2011
Posts: 69
|
k-mer 49 compared to k-mer 19. You really should read a bit more about the algorithm. :-/ 19 is a really LOW k-mer; you normally choose higher k-mers, but of course this depends on your read length.
|
#13 |
Senior Member
Location: berd Join Date: Dec 2010
Posts: 181
|
Now I understand!
Cheers mate!