SEQanswers

Old 03-06-2011, 11:55 PM   #1
papori
Senior Member
 
Location: berd

Join Date: Dec 2010
Posts: 181

Why using k-mer?

In Velvet and Trans-ABySS (de novo assembly), we use the k-mer approach.

Why is it better to "break" each contig into a range of k-mers instead of using regular overlaps?
Why is it more sensitive? SNPs?

Thanks in advance.
Old 03-07-2011, 12:16 AM   #2
rwenang
Member
 
Location: Singapore

Join Date: Jan 2009
Posts: 31

You mean the de Bruijn approach? Supposedly it resolves repeats better than the overlap method (OLC).
Old 03-07-2011, 12:21 AM   #3
papori
Senior Member
 
Location: berd

Join Date: Dec 2010
Posts: 181

Yes, the de Bruijn approach.
The structure of the graph (the hash table) handles repeats.
In a transcriptome there are not so many repeats,
but why do it over a range of different k-mer sizes? {20..49}
Old 03-07-2011, 12:29 AM   #4
Thorondor
Member
 
Location: Heidelberg

Join Date: Feb 2011
Posts: 69

Compared to the overlap method, it needs less RAM when you have a lot of "short" reads as input. An assembler based on de Bruijn graphs is meant for next-gen sequencing output.

Edit: the higher the k-mer size, the less RAM you need, because normally the de Bruijn graph will be smaller. With k-mer 49, the overlap between reads must be 49-1 bp! If you want more details about the algorithms, you could get Daniel Zerbino's PhD thesis; it is easy to read and helps a lot with understanding.

Last edited by Thorondor; 03-07-2011 at 12:34 AM.
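
As an illustration of the k-1 overlap mentioned in this post, here is a minimal Python sketch of a de Bruijn graph. It assumes the common textbook formulation in which nodes are (k-1)-mers and every k-mer in a read adds one edge. The function names and the two toy reads are made up for this example, and Velvet's real data structures are more involved.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return every k-mer of seq in order (len(seq) - k + 1 of them)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Two hypothetical reads that overlap by exactly 3 bases ("TCA").
reads = ["AGTCA", "TCAGG"]

for k in (4, 5):
    graph = de_bruijn(reads, k)
    print("k =", k)
    for node in sorted(graph):
        print("  ", node, "->", sorted(graph[node]))
```

With k=4 the two reads share the node TCA and join into one path that spells AGTCAGG. With k=5 the 3-base overlap is too short, so the reads share no node and stay as two separate pieces.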
Old 03-07-2011, 12:42 AM   #5
rwenang
Member
 
Location: Singapore

Join Date: Jan 2009
Posts: 31

In transcriptome assembly, different k-mer sizes are specified to accommodate transcripts with different sizes.
Old 03-07-2011, 12:52 AM   #6
papori
Senior Member
 
Location: berd

Join Date: Dec 2010
Posts: 181

I don't understand.
If I have this contig:
AGTCAGTTTGGCCCTTG
assume this is the output of Solexa.

Is it all from the same transcript?

How do the k-mers accommodate the different sizes of this transcript?
Old 03-07-2011, 01:14 AM   #7
rwenang
Member
 
Location: Singapore

Join Date: Jan 2009
Posts: 31

In a transcriptome, the reads come from many different transcripts, which is why the assembler uses different k-mer sizes to try to assemble them correctly. Meanwhile, in de novo genome assembly, the reads come from the whole genomic DNA (one big sequence).

As for how exactly different k-mer sizes accommodate transcripts, you might want to read the Oases paper, "Oases: De novo transcriptome assembler for very short reads".
Old 03-07-2011, 01:38 AM   #8
papori
Senior Member
 
Location: berd

Join Date: Dec 2010
Posts: 181

If I have a contig of length 50 bp,
how will it help me to break it into pieces of 19 bp with 18 bp overlap to know its transcript size?
And so on for {19..49}.

Correct me if I'm wrong;
I am not sure that I understood you correctly.
Did you mean that maybe the size of the current sub-transcript is 19, and if I leave it at 50 bp, I will miss the 19 bp?
Is that why I have to use k=19?
Old 03-07-2011, 01:56 AM   #9
papori
Senior Member
 
Location: berd

Join Date: Dec 2010
Posts: 181

Quote:
Originally Posted by Thorondor
Compared to the overlap method, it needs less RAM when you have a lot of "short" reads as input. An assembler based on de Bruijn graphs is meant for next-gen sequencing output.

Edit: the higher the k-mer size, the less RAM you need, because normally the de Bruijn graph will be smaller. With k-mer 49, the overlap between reads must be 49-1 bp! If you want more details about the algorithms, you could get Daniel Zerbino's PhD thesis; it is easy to read and helps a lot with understanding.

Assuming I have only 1 contig of length 50:
if I use k-mer 49, my de Bruijn graph will be of size 1,
but if I use k-mer 19 it will be much bigger...

What did you mean when you said it becomes smaller?
(It becomes smaller only in the number of overlaps, but in its size it becomes bigger.)

thanks..
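
For the single-sequence case asked about here, the count is simple arithmetic: a sequence of length L contains L - k + 1 k-mers. A small sketch, with a function name made up for illustration:

```python
def kmer_count(read_length, k):
    """Number of k-mers in one sequence: read_length - k + 1 (0 if k is too long)."""
    return max(read_length - k + 1, 0)


# The 50 bp length comes from the post; the k values span the {19..49} range.
for k in (19, 31, 49):
    print("k =", k, "->", kmer_count(50, k), "k-mers")
```

So a single 50 bp sequence gives 32 k-mers at k=19 and only 2 at k=49. This covers one sequence only and says nothing yet about what happens with many overlapping reads.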
Old 03-07-2011, 03:52 AM   #10
Thorondor
Member
 
Location: Heidelberg

Join Date: Feb 2011
Posts: 69

Well, not really. You have 1 READ of length 50! But you should think in HIGH numbers, and there it's different. ;-)

For k-mer 3 there are 4^3 possible nodes in the de Bruijn graph: AAA, AAG, AGG, GGG, GAG, GAC, GCC...

For higher k-mers like 49 you have 4^49; normally you never reach the maximum number of nodes for such high k-mers. So fewer nodes compared to k-mer 19, fewer overlaps, fewer junctions => smaller de Bruijn graph => easier to calculate. The problem is that you will miss transcripts with low coverage, because the reads won't overlap by 48 bp.
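
The numbers in this post can be checked with a few lines of Python. This is only a rough sketch: the data set size of one million reads is a made-up assumption, and the second function is a crude upper bound that pretends no two reads share a k-mer, which real data never does.

```python
def possible_kmers(k):
    """Number of distinct DNA strings of length k over A, C, G, T."""
    return 4 ** k


def max_observed_kmers(num_reads, read_length, k):
    """Upper bound on k-mers in the data: every k-mer of every read counted once."""
    return num_reads * max(read_length - k + 1, 0)


NUM_READS = 1_000_000  # hypothetical data set size, not from the thread
READ_LENGTH = 50       # read length used in the thread

for k in (3, 19, 49):
    print(
        "k =", k,
        "| possible:", possible_kmers(k),
        "| at most in data:", max_observed_kmers(NUM_READS, READ_LENGTH, k),
    )
```

For k=19 and k=49 the number of possible k-mers is far larger than anything the reads can contain, so the graph size is limited by the data and not by 4^k. Under this bound the same reads contribute at most 32 million k-mers at k=19 but only 2 million at k=49.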
Old 03-07-2011, 04:16 AM   #11
papori
Senior Member
 
Location: berd

Join Date: Dec 2010
Posts: 181

Now I am really confused...
You said: the higher the k-mer, the less RAM you need, and the graph becomes smaller.
And now you said: fewer nodes compared to k-mer 19, fewer overlaps, fewer junctions => smaller de Bruijn graph => easier to calculate.

So when does the graph become smaller?
When I have fewer overlaps, fewer junctions, a smaller de Bruijn graph?
Old 03-07-2011, 04:28 AM   #12
Thorondor
Member
 
Location: Heidelberg

Join Date: Feb 2011
Posts: 69

K-mer 49 compared to k-mer 19. You really should read a bit more about the algorithm. :-/ 19 is a really LOW k-mer; you normally choose higher k-mers, but of course this depends on your read length.
Old 03-07-2011, 04:30 AM   #13
papori
Senior Member
 
Location: berd

Join Date: Dec 2010
Posts: 181

Now I understand!
Cheers, mate!
All times are GMT -8.