07-12-2017, 05:11 AM
Splinter479
Junior Member
 
Location: Berlin, Germany

Join Date: Aug 2012
Posts: 6
Reconstructing a de Bruijn Graph from Velvet's LastGraph

Hi,

I was wondering if someone could explain, or give me some hints on, how to recover a de Bruijn graph from Velvet's LastGraph representation of nodes and arcs.

I created an example with hash size (k) = 7 and told velvetg to "keep all data" by setting cov_cutoff=1 and min_contig_lgth=1:

Code:
>SEQUENCE_0
TTTTTTCATGCATGCTAGCGTGTGTGTGT
>SEQUENCE_1
GTGTGTGTGTTAGCTGCAGTATGCGGAAC
>SEQUENCE_2
ACACACACACATCGATTGTTATGCGGAAC
>SEQUENCE_3
TATGCGGAACGCATTGCTAACTCGGGGGG
This results in the following LastGraph:

Code:
5	4	7	1
NODE	1	12	13	13	0	0
GCTAGCATGCAT
GCTAGCGTGTGT
NODE	2	1	6	6	0	0
G
C
NODE	3	1	7	7	0	0
T
A
NODE	4	15	15	15	0	0
TAGCTGCAGTATGCG
CTGCAGCTAACACAC
NODE	5	14	14	14	0	0
TCGATTGTTATGCG
ACAATCGATGTGTG
ARC	1	-1	2
ARC	-1	2	1
ARC	2	3	6
ARC	-2	-3	4
ARC	3	4	1
ARC	-3	5	1
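A minimal Python sketch for reading the LastGraph above into plain data structures. It assumes only the layout visible in this output: a header line (number of nodes, number of sequences, hash length k; the fourth value is not interpreted), NODE lines whose third field equals the length of the stored sequence (taken here to be the node length in k-mers), each followed by one sequence line for the node and one for its twin, and ARC lines of start node, end node, multiplicity. `parse_lastgraph`, `Node` and `LastGraph` are hypothetical names, not part of Velvet.

```python
from dataclasses import dataclass, field

# The k = 7 LastGraph from above (whitespace-separated fields).
EXAMPLE = """\
5 4 7 1
NODE 1 12 13 13 0 0
GCTAGCATGCAT
GCTAGCGTGTGT
NODE 2 1 6 6 0 0
G
C
NODE 3 1 7 7 0 0
T
A
NODE 4 15 15 15 0 0
TAGCTGCAGTATGCG
CTGCAGCTAACACAC
NODE 5 14 14 14 0 0
TCGATTGTTATGCG
ACAATCGATGTGTG
ARC 1 -1 2
ARC -1 2 1
ARC 2 3 6
ARC -2 -3 4
ARC 3 4 1
ARC -3 5 1
"""


@dataclass
class Node:
    node_id: int
    length: int      # assumed: number of k-mers (equals len(seq) in this example)
    other: list      # remaining NODE fields, left uninterpreted in this sketch
    seq: str         # stored sequence line of the node
    twin_seq: str    # stored sequence line of its twin (node -id)


@dataclass
class LastGraph:
    n_nodes: int
    n_sequences: int
    k: int
    nodes: dict = field(default_factory=dict)
    arcs: list = field(default_factory=list)   # (start, end, multiplicity)


def parse_lastgraph(text):
    rows = [line.split() for line in text.splitlines() if line.strip()]
    n_nodes, n_seqs, k = (int(x) for x in rows[0][:3])
    graph = LastGraph(n_nodes, n_seqs, k)
    i = 1
    while i < len(rows):
        tag = rows[i][0]
        if tag == "NODE":
            nid, length = int(rows[i][1]), int(rows[i][2])
            other = [int(x) for x in rows[i][3:]]
            # the two lines after a NODE line hold the node and twin sequences
            graph.nodes[nid] = Node(nid, length, other, rows[i + 1][0], rows[i + 2][0])
            i += 3
        elif tag == "ARC":
            start, end, mult = (int(x) for x in rows[i][1:4])
            graph.arcs.append((start, end, mult))
            i += 1
        else:
            i += 1   # any other record type is skipped in this sketch
    return graph


g = parse_lastgraph(EXAMPLE)
print(g.k, len(g.nodes), len(g.arcs))   # 7 5 6
```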
I don't understand exactly HOW to read the sequence off a traversal of the nodes, e.g. (1) --> (-1), (-1) --> (2), and so on.
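One reading of the node lines that is consistent with the output above, sketched in Python under stated assumptions: the stored line of a node holds only the last base of each of its k-mers, so the node's first k-1 bases are the reverse complement of the last k-1 bases of its twin line; a negative id means the twin (reverse-complement) orientation; each ARC line also stands for the twin arc -end --> -start; and consecutive nodes on an arc overlap by k-1 bases. Under these assumptions a traversal spells the full sequence of its first node followed by the stored lines of the remaining nodes. The helper names (`stored`, `full_sequence`, `spell`) are hypothetical.

```python
# Stored LastGraph data from the k = 7 example: id -> (node line, twin line)
K = 7
NODES = {
    1: ("GCTAGCATGCAT", "GCTAGCGTGTGT"),
    2: ("G", "C"),
    3: ("T", "A"),
    4: ("TAGCTGCAGTATGCG", "CTGCAGCTAACACAC"),
    5: ("TCGATTGTTATGCG", "ACAATCGATGTGTG"),
}
# ARC start end (multiplicity dropped); assumed to also imply -end -> -start
ARCS = [(1, -1), (-1, 2), (2, 3), (-2, -3), (3, 4), (-3, 5)]

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def stored(n):
    """Stored line of node n; a negative id selects the twin line."""
    seq, twin = NODES[abs(n)]
    return seq if n > 0 else twin


def full_sequence(n, k=K):
    """Prepend the k-1 bases the stored line omits, recovered from the twin."""
    twin = stored(-n)
    if len(twin) < k - 1:
        raise ValueError(f"node {n} is shorter than k-1; take its prefix from a neighbour")
    return revcomp(twin[-(k - 1):]) + stored(n)


def spell(path, k=K):
    """Nodes on an arc overlap by k-1 bases, so later nodes add only their stored bases."""
    arcs = {(a, b) for a, b in ARCS} | {(-b, -a) for a, b in ARCS}
    for a, b in zip(path, path[1:]):
        if (a, b) not in arcs:
            raise ValueError(f"no arc {a} -> {b}")
    return full_sequence(path[0], k) + "".join(stored(n) for n in path[1:])


print(spell([1, -1]))          # ACACACGCTAGCATGCATGCTAGCGTGTGT
print(spell([-1, 2, 3, 4]))    # ATGCATGCTAGCGTGTGTGTTAGCTGCAGTATGCG
```

As a sanity check of this reading, `full_sequence(-1)` gives ATGCATGCTAGCGTGTGT, which occurs verbatim in SEQUENCE_0, and `full_sequence(4)` gives GTGTGTTAGCTGCAGTATGCG, which occurs in SEQUENCE_1. Nodes 2 and 3 are shorter than k-1, so their leading bases cannot come from the twin line alone; on a path they are supplied by the preceding node.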
Once I know how to read the sequence, I could rebuild a simple de Bruijn graph from it. Or is there perhaps a simpler (back-)transformation from LastGraph to a de Bruijn graph?
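Going from the node graph to a k-mer-level de Bruijn graph could look roughly like this Python sketch. It rests on the same assumptions as any reading of this output (stored line = last base of each k-mer, twin = reverse complement, nodes on an arc overlap by k-1 bases, each ARC line also standing for its twin arc -end --> -start). Nodes shorter than k-1 (2 and 3 here) get their prefix from an already resolved predecessor; then every k-mer becomes a graph node and consecutive k-mers are joined. `full_sequences` and `kmer_graph` are hypothetical names.

```python
K = 7
NODES = {  # id -> (node line, twin line), from the k = 7 LastGraph
    1: ("GCTAGCATGCAT", "GCTAGCGTGTGT"),
    2: ("G", "C"),
    3: ("T", "A"),
    4: ("TAGCTGCAGTATGCG", "CTGCAGCTAACACAC"),
    5: ("TCGATTGTTATGCG", "ACAATCGATGTGTG"),
}
ARCS = [(1, -1), (-1, 2), (2, 3), (-2, -3), (3, 4), (-3, 5)]

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def stored(n):
    seq, twin = NODES[abs(n)]
    return seq if n > 0 else twin


def all_arcs():
    """Each ARC line is assumed to stand for itself plus -end -> -start."""
    return {(a, b) for a, b in ARCS} | {(-b, -a) for a, b in ARCS}


def full_sequences(k=K):
    """Full sequence of every oriented node (id and -id)."""
    full = {}
    for n in [*NODES, *(-m for m in NODES)]:
        twin = stored(-n)
        if len(twin) >= k - 1:
            full[n] = revcomp(twin[-(k - 1):]) + stored(n)
    # Short nodes: borrow the (k-1)-base prefix from a resolved predecessor.
    changed = True
    while changed:
        changed = False
        for a, b in all_arcs():
            if a in full and b not in full:
                full[b] = full[a][-(k - 1):] + stored(b)
                full[-b] = revcomp(full[b])
                changed = True
    return full


def kmer_graph(k=K):
    """Edges between consecutive k-mers inside each node, plus one edge per arc."""
    full = full_sequences(k)
    edges = set()
    for seq in full.values():
        kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        edges.update(zip(kmers, kmers[1:]))
    for a, b in all_arcs():
        if a in full and b in full:
            edges.add((full[a][-k:], full[b][:k]))
    return edges
```

Every edge produced this way joins two k-mers that overlap by k-1 bases, which is a quick check that the node-level reading and the arcs fit together.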

The (very old) Velvet manual is not much help here.

Any help is highly appreciated.

Thanks!


------------------------------
EDIT
------------------------------

Another example:
Take the sequence s = ACTGGACTGAA.
As far as I recall, for k = 3 this gives the following dBG of s:
Code:
ACT --> CTG --> TGG --> GGA --> GAC --> (ACT...)
         | 
         |--> TGA --> GAA
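The graph in the drawing can be generated mechanically. A small Python sketch (single-stranded like the drawing, so reverse complements are ignored; `de_bruijn` is a hypothetical helper):

```python
from collections import defaultdict


def de_bruijn(seq, k):
    """One node per distinct k-mer; an edge for every pair of consecutive k-mers in seq."""
    graph = defaultdict(set)
    for i in range(len(seq) - k):
        graph[seq[i:i + k]].add(seq[i + 1:i + k + 1])
    return graph


g = de_bruijn("ACTGGACTGAA", 3)
for kmer in sorted(g):
    print(kmer, "-->", ", ".join(sorted(g[kmer])))
```

This prints ACT --> CTG, CTG --> TGA, TGG, GAC --> ACT, GGA --> GAC, TGA --> GAA and TGG --> GGA, i.e. the cycle through ACT plus the branch CTG --> TGA --> GAA.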
Velvet's result (k=3, velvetg cov_cutoff=1 min_contig_lgth=1) is:
Code:
1 1 3 1
NODE 1 5 7 7 0 0
TGGAC
CCAGT
ARC 1 1 1

This means I would be able to recover:
Code:
(Node 1 upper seq.)              TGGAC
                                 |||    --> ACTGGAC  (and any number of repeats of it, each overlapping the previous one by k-1 = 2 bases, since ARC (1) --> (1))
(Node 1 lower seq. rev.comp.)  ACTGG
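The same reading written out as a small Python sketch, assuming the first k-1 bases of the node are the reverse complement of the last k-1 bases of the twin line, and that each pass around the self-loop ARC (1) --> (1) appends only the stored bases TGGAC. `full_node` and `walk_self_loop` are hypothetical names.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def full_node(seq, twin_seq, k):
    """Node sequence with its k-1 leading bases recovered from the twin line."""
    return revcomp(twin_seq[-(k - 1):]) + seq


def walk_self_loop(seq, twin_seq, k, times):
    """Follow the self-loop `times` extra times; each pass adds only the stored bases."""
    return full_node(seq, twin_seq, k) + seq * times


print(full_node("TGGAC", "CCAGT", 3))           # ACTGGAC
print(walk_self_loop("TGGAC", "CCAGT", 3, 2))   # ACTGGACTGGACTGGAC
```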
However, it seems to me that Velvet is missing the entire alternative path "TGAA" (k-mers TGA and GAA) of the dBG of s.
I would have expected another node and an additional arc leading to that sequence.
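The missing part can be made explicit by comparing k-mer sets. This Python sketch treats a k-mer and its reverse complement as the same k-mer (an assumption, suggested by the twin lines in Velvet's output) and uses ACTGGAC, the node 1 sequence recovered above, as the only graph sequence; `kmers`, `canonical` and `missing_kmers` are hypothetical helpers.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def canonical(kmer):
    """Strand-independent key: the smaller of a k-mer and its reverse complement."""
    return min(kmer, revcomp(kmer))


def missing_kmers(reference, graph_seqs, k):
    """k-mers of reference that no graph sequence covers on either strand."""
    covered = {canonical(km) for s in graph_seqs for km in kmers(s, k)}
    return {km for km in kmers(reference, k) if canonical(km) not in covered}


print(sorted(missing_kmers("ACTGGACTGAA", ["ACTGGAC"], 3)))   # ['GAA', 'TGA']
```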

Am I right so far? Am I missing something, or should I change some parameter?

Last edited by Splinter479; 07-12-2017 at 11:26 PM.