**rwenang** · 03-07-2011, 12:16 AM

You mean the de Bruijn approach? Supposedly it resolves repeats better than the overlap method (OLC, overlap-layout-consensus).

**papori** · 03-07-2011, 12:21 AM

Yes, the de Bruijn approach.
The structure of the graph (the hash table) is what handles repeats.
In a transcriptome there are not so many repeats.
But why run it over a range of different k-mers? {20..49}

**Thorondor** · 03-07-2011, 12:29 AM

Compared to the overlap method it needs less RAM when you have a lot of "short" reads as input. An assembler with de Bruijn graphs is meant for next-gen sequencing output.

Edit: the higher the k-mer, the less RAM you need, because normally the de Bruijn graph will be smaller. With k-mer 49 the overlap between reads must be 49-1 = 48 bp! If you want more details about the algorithms you could get Daniel Zerbino's PhD thesis; it is easy to read and helps a lot with understanding.
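To make the "49-1 bp" point concrete, here is a minimal Python sketch. The helper name `kmers` and the toy sequence are only illustrative assumptions, not part of any assembler. It splits one read into k-mers and checks that each consecutive pair shares k-1 bases, which is the overlap a de Bruijn graph edge relies on.

```python
def kmers(read, k):
    """Return every k-mer of the read, in left-to-right order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


read = "AGTCAGTTTGGCCCTTG"  # toy 17 bp sequence, not real data
k = 5
words = kmers(read, k)

# Consecutive k-mers overlap by k-1 bases:
# the suffix of one equals the prefix of the next.
for left, right in zip(words, words[1:]):
    assert left[1:] == right[:-1]

print(len(words))   # 17 - 5 + 1 = 13 k-mers
print(words[:3])    # ['AGTCA', 'GTCAG', 'TCAGT']
```

With k = 49 the same check would compare 48 bp suffixes and prefixes, so two reads only connect in the graph if they share at least one full 49-mer.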

**rwenang** · 03-07-2011, 12:42 AM

In transcriptome, specifying different k-mers is applied to accommodate transcripts with different sizes.

**papori** · 03-07-2011, 12:52 AM

I don't understand.
If I have this contig:
AGTCAGTTTGGCCCTTG (assume this is the output of Solexa),

is it all from the same transcript?

How do the k-mers accommodate the different sizes of this transcript?

**rwenang** · 03-07-2011, 01:14 AM

In a transcriptome, the reads come from many different transcripts, which is why the assembler uses different k-mer sizes to try to assemble them correctly. Meanwhile, in de novo genome assembly, the reads come from the whole genomic DNA (one big sequence).

As for how exactly different k-mers accommodate transcripts, you might want to read the Oases paper, "Oases: De novo transcriptome assembler for very short reads".

**papori** · 03-07-2011, 01:38 AM

If I have a contig of length 50 bp,
how will it help me to break it into pieces of 19 bp with 18 bp overlap in order to know its transcript size?
And so on for {19..49}.

Correct me if I'm wrong.
I am not sure that I understood you correctly.
Did you mean that maybe the size of the current sub-transcript is 19, and if I leave it at 50 bp, I will miss the 19 bp one?
Is that why I have to use k=19?

**papori** · 03-07-2011, 01:56 AM

Originally posted by Thorondor:

Compared to the overlap method it needs less RAM when you have a lot of "short" reads as input. An assembler with de Bruijn graphs is meant for next-gen sequencing output.

Edit: the higher the k-mer, the less RAM you need, because normally the de Bruijn graph will be smaller. With k-mer 49 the overlap between reads must be 49-1 = 48 bp! If you want more details about the algorithms you could get Daniel Zerbino's PhD thesis; it is easy to read and helps a lot with understanding.

Assuming I have only 1 contig of length 50:
if I am using k-mer 49, my de Bruijn graph will be of size 1,
but if I am using k-mer 19 it will be much bigger...

What did you mean when you said it becomes smaller?
(It becomes smaller only in the number of overlaps, but in its size it becomes bigger.)

thanks..
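For reference, the arithmetic behind this single-sequence example can be sketched in Python. The function name `kmer_count` is just an illustrative assumption. A sequence of length L contains L - k + 1 k-mers, so a lone 50 bp sequence yields 2 k-mers at k = 49 (not 1) and 32 at k = 19. This only counts k-mers in one sequence and says nothing yet about how the graph behaves for millions of reads.

```python
def kmer_count(length, k):
    """Number of k-mer positions in a sequence of the given length."""
    return max(length - k + 1, 0)


for k in (19, 31, 49):
    print(k, kmer_count(50, k))
# 19 32
# 31 20
# 49 2
```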

**Thorondor** · 03-07-2011, 03:52 AM

Well, not really. You have 1 READ of length 50! But you should think in HIGH numbers, and there it's different. ;-)

For a k-mer of 3 there are 4^3 possible nodes for the de Bruijn graph: AAA, AAG, AGG, GGG, GAG, GAC, GCC.....
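As a small illustration of the 4^k count, the Python sketch below enumerates every possible 3-mer and then prints how large the space of possible nodes becomes for k = 19 and k = 49. The helper `all_kmers` is a hypothetical name used only for this example.

```python
from itertools import product


def all_kmers(k, alphabet="ACGT"):
    """Enumerate every possible k-mer over the given alphabet."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


three_mers = all_kmers(3)
print(len(three_mers))   # 4**3 = 64
print(three_mers[:4])    # ['AAA', 'AAC', 'AAG', 'AAT']

# The space of possible nodes grows as 4**k; enumerating it is only
# feasible for tiny k, so for larger k just print the count.
for k in (19, 49):
    print(k, 4 ** k)
```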

For higher k-mers like 49 you have 4^49 possible nodes, and normally you never reach that maximum for such high k-mers. So fewer nodes compared to k-mer 19, fewer overlaps, fewer junctions => smaller de Bruijn graph => easier to calculate. The problem is that you will miss transcripts with low coverage, because the reads won't overlap by 48 bp.
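The low-coverage problem can be sketched with two simulated reads. Everything here is an assumption made for illustration: the 70 bp "transcript" is pseudo-random, and the names `kmer_set` and `shared_kmers` are hypothetical. Two 50 bp reads that overlap by 30 bp share k-mers at k = 19, so the graph can join them, but share none at k = 49, so the transcript would break apart.

```python
import random


def kmer_set(read, k):
    """Set of distinct k-mers in a read (empty if the read is shorter than k)."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def shared_kmers(read_a, read_b, k):
    """Count the distinct k-mers that two reads have in common."""
    return len(kmer_set(read_a, k) & kmer_set(read_b, k))


rng = random.Random(0)  # fixed seed so the toy data is reproducible
transcript = "".join(rng.choice("ACGT") for _ in range(70))
read_a = transcript[0:50]    # covers positions 0..49
read_b = transcript[20:70]   # covers positions 20..69, so a 30 bp overlap

for k in (19, 49):
    print(k, shared_kmers(read_a, read_b, k))
# At k = 19 the 30 bp overlap holds 30 - 19 + 1 = 12 positions with shared k-mers;
# at k = 49 the overlap is too short to hold even one 49-mer.
```

With deep coverage there would usually be other reads bridging the gap, which is why a high k works for highly expressed transcripts but can lose weakly expressed ones.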

**papori** · 03-07-2011, 04:16 AM

Now I am really confused.
You said: the higher the k-mer, the less RAM you need, and the graph becomes smaller.
And now you said: fewer nodes compared to k-mer 19, fewer overlaps, fewer junctions => smaller de Bruijn graph => easier to calculate.

So when does the graph become smaller?
When I have fewer overlaps, fewer junctions, a smaller de Bruijn graph?

**Thorondor** · 03-07-2011, 04:28 AM

K-mer 49 compared to k-mer 19. You really should read a bit more about the algorithm. :-/ 19 is a really LOW k-mer; you normally choose higher k-mers, but of course this depends on your read length.

**papori** · 03-07-2011, 04:30 AM

Now I understand!
Cheers, mate!

Why using k-mer?