Hello. Velvet first hashes all the reads, creating so-called "roadmaps". Could someone explain exactly how that helps in the construction of the de Bruijn graph? Thank you.
-
Your question is probably better directed to the velvet-users list. Daniel Zerbino recently posted this answer to a similar question:
Code:
The Roadmap file is not normally meant to be parsed by the user, it is simply internal to velvet. However, if you really want to know the format is:
ROADMAP $query_id
$target_id $prev_kmers $target_start $target_finish
etc.
Where:
$target_id is negative if on reverse strand
[ $target_start, $target_finish [ is the domain of the target sequence being aligned (note that it is inclusive at the start and exclusive at the finish)
$prev_kmers is the number of unaligned k-mers in the query sequence before the alignment to the target (note in the examples below how this value does not necessarily change from alignment to alignment, this means that they are contiguous on the query.)
Also, have you read the Velvet paper and manual?
-
I've read those papers and have searched and thought a lot about this, but I still have no exact idea how it works, only assumptions. I need it for a presentation. Hashing k-mers does not seem to save any time when regions of two reads overlap by only k-1 characters. It only seems to help when two or more reads share an identical sequence of length at least k.
Example:
ACCTCGAT GCTCTAGG
ACCT GCTC
CCTC CTCT
CTCG TCTA
TCGA CTAG
CGAT TAGG
PS. Thanks for the tip, I just found the list and wrote to them.
Last edited by bioinf; 01-11-2011, 04:59 AM.
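The example above can be checked with a short Python sketch. This is not Velvet code; `kmers` is a hypothetical helper, and reverse complements are ignored. It lists the 4-mers of both reads, then looks for k-mers shared by the two reads and for pairs of k-mers from different reads that overlap by k-1 characters.

```python
def kmers(read, k):
    """Return the k-mers of a read in order of appearance."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

K = 4
read1, read2 = "ACCTCGAT", "GCTCTAGG"
kmers1, kmers2 = kmers(read1, K), kmers(read2, K)

# k-mers present in both reads (an exact match of length k)
shared = set(kmers1) & set(kmers2)

# pairs (a, b) where the last k-1 characters of a (from read 1)
# equal the first k-1 characters of b (from read 2)
overlaps = [(a, b) for a in kmers1 for b in kmers2 if a[1:] == b[:-1]]

print(kmers1)
print(kmers2)
print(shared)    # no k-mer is common to both reads
print(overlaps)  # CCTC and CTCT overlap by k-1 characters
```

So the two reads share no 4-mer, yet one k-mer of the first read overlaps one k-mer of the second by k-1 characters, which is exactly the case the question is about.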
-
Daniel was very kind to write to me personally. What he wrote is:
In the assembly process of Velvet, an arc (i.e. a k-1 mer overlap between two k-mers) is only instantiated if a read actually goes from one k-mer to the next.
In your case, no read goes from CCTC to CTCT (simply because sequence CCTCT does not exist) therefore it is not picked up.
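A minimal Python sketch of the rule Daniel describes, assuming forward strands only and using a hypothetical helper `read_supported_arcs` (this is an illustration, not Velvet's implementation). An arc is recorded only when two consecutive k-mers actually occur next to each other inside some read.

```python
from collections import defaultdict

def read_supported_arcs(reads, k):
    """Map each k-mer to the set of k-mers that directly follow it in some read."""
    arcs = defaultdict(set)
    for read in reads:
        # i indexes a k-mer that has a successor k-mer inside the same read
        for i in range(len(read) - k):
            arcs[read[i:i + k]].add(read[i + 1:i + k + 1])
    return arcs

arcs = read_supported_arcs(["ACCTCGAT", "GCTCTAGG"], 4)
has_arc = "CTCT" in arcs["CCTC"]

print(sorted(arcs["CCTC"]))  # successors of CCTC seen in the reads
print(has_arc)               # False: no read contains CCTCT
```

With these two reads the only successor of CCTC is CTCG, so no arc from CCTC to CTCT is created even though the two k-mers overlap by k-1 characters.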
-
Originally posted by bioinf:
Daniel was very kind to write to me personally. What he wrote is:
In the assembly process of Velvet, an arc (i.e. a k-1 mer overlap between two k-mers) is only instantiated if a read actually goes from one k-mer to the next.
In your case, no read goes from CCTC to CTCT (simply because sequence CCTCT does not exist) therefore it is not picked up.
Where does it say you need k identical characters? You need an overlap of k-1 identical characters.
His example means that some read must contain CCTCT before the arc is picked up. Neither of the reads in your example (ACCTCGAT and GCTCTAGG) contains CCTCT.
-
Ok, I see now. If the second read had CCTCT inside it, it would contain the two k-mers CCTC and CTCT, so the same k-mer CCTC would appear in both reads. I guess what he implicitly meant is that two reads must share an identical k-mer X before an arc can be drawn from X in the first read to the k-mer Y that immediately follows X in the second read.
It is then in some sense related to the transitivity rule.
There is X in read 1
There is X in read 2
There is Y in read 2 connected to X in read 2
-----
Conclusion: draw an arc from X of read 1 to Y of read 2; thus you clearly see the connection between read 1 and read 2
Am I warmer?
Last edited by bioinf; 01-12-2011, 02:15 AM.
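The reasoning above can be sketched with a hash table in Python. This is a simplified illustration and not Velvet's actual roadmap data structure; `index_reads` is a hypothetical helper, the second read is made up so that it contains CCTCT, and reverse complements are ignored. Each k-mer is looked up as the reads are scanned, and a hit is recorded whenever a k-mer was already seen in an earlier read.

```python
def index_reads(reads, k):
    """Scan reads in order and report k-mers already seen in an earlier read.

    Returns (table, hits):
      table maps k-mer -> (read_id, offset) of its first sighting
      hits is a list of (read_id, offset, earlier_read_id, earlier_offset)
    """
    table, hits = {}, []
    for read_id, read in enumerate(reads):
        for offset in range(len(read) - k + 1):
            kmer = read[offset:offset + k]
            if kmer in table and table[kmer][0] != read_id:
                hits.append((read_id, offset) + table[kmer])
            else:
                table.setdefault(kmer, (read_id, offset))
    return table, hits

# read 0 is from the thread; read 1 is hypothetical and contains CCTCT
reads = ["ACCTCGAT", "GGCCTCTA"]
table, hits = index_reads(reads, 4)

print(hits)           # X = CCTC: read 1 offset 2 matches read 0 offset 1
print(table["CTCT"])  # Y = CTCT is first seen in read 1, right after X
```

The single hit is the shared k-mer X = CCTC, and Y = CTCT follows it in the second read, which is the situation in the three premises above. One lookup per k-mer finds the shared k-mer without comparing every read against every other read.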
-
Yes, I think that is it. However, since you are already in contact with Daniel Zerbino, it would be better to ask him, because he developed Velvet and knows it better than anyone on this forum.
I tried to make a small demo of how velvet works from the command line. Here is my fasta file:
>SEQUENCE_0
ACCTAG
>SEQUENCE_1
CCTAGAACG
velveth result/ 5 sequence.fst
velvetg result/ -min_contig_lgth 1 -long_mult_cutoff 1
Here is the output:
velveth:
[0.000001] Reading FastA file sequence.fst;
[0.000087] 2 sequences found
[0.000097] Done
[0.000155] Reading read set file result//Sequences;
[0.000192] 2 sequences found
[0.000759] Done
[0.000770] 2 sequences in total.
[0.000805] Writing into roadmap file result//Roadmaps...
[0.000832] Inputting sequences...
[0.000839] Inputting sequence 0 / 2
[0.000960] Done inputting sequences
[0.000969] Destroying splay table
[0.001020] Splay table destroyed
velvetg:
[0.000001] Reading roadmap file result//Roadmaps
[0.000102] 2 roadmaps reads
[0.000128] Creating insertion markers
[0.000138] Ordering insertion markers
[0.000156] Counting preNodes
[0.000165] 2 preNodes counted, creating them now
[0.000220] Adjusting marker info...
[0.000230] Connecting preNodes
[0.000260] Cleaning up memory
[0.000267] Done creating preGraph
[0.000274] Concatenation...
[0.000288] Renumbering preNodes
[0.000294] Initial preNode count 2
[0.000304] Destroyed 1 preNodes
[0.000311] Concatenation over!
[0.000317] Clipping short tips off preGraph
[0.000325] Concatenation...
[0.000330] Renumbering preNodes
[0.000336] Initial preNode count 1
[0.000343] Destroyed 1 preNodes
[0.000349] Concatenation over!
[0.000355] 1 tips cut off
[0.000361] 0 nodes left
[0.000412] Writing into pregraph file result//PreGraph...
[0.000489] Reading read set file result//Sequences;
[0.000533] 2 sequences found
[0.000605] Done
[0.000659] Reading pre-graph file result//PreGraph
[0.000690] Graph has 0 nodes and 2 sequences
[0.000712] Correcting graph with cutoff 0.200000
[0.000945] Determining eligible starting points
[0.000959] Done listing starting nodes
[0.000966] Initializing todo lists
[0.000972] Done with initilization
[0.000979] Activating arc lookup table
[0.000985] Done activating arc lookup table
[0.000992] Concatenation...
[0.000998] Renumbering nodes
[0.001004] Initial node count 0
[0.001012] Removed 0 null nodes
[0.001022] Concatenation over!
[0.001031] Clipping short tips off graph, drastic
[0.001037] Concatenation...
[0.001043] Renumbering nodes
[0.001049] Initial node count 0
[0.001055] Removed 0 null nodes
[0.001062] Concatenation over!
[0.001067] 0 nodes left
[0.001118] Writing into graph file result//Graph...
[0.001168] WARNING: NO COVERAGE CUTOFF PROVIDED
[0.001177] Velvet will probably leave behind many detectable errors
[0.001184] See manual for instructions on how to set the coverage cutoff parameter
[0.001195] Removing contigs with coverage < -1.000000...
[0.001207] Concatenation...
[0.001213] Renumbering nodes
[0.001218] Initial node count 0
[0.001225] Removed 0 null nodes
[0.001232] Concatenation over!
[0.001238] Concatenation...
[0.001244] Renumbering nodes
[0.001249] Initial node count 0
[0.001256] Removed 0 null nodes
[0.001262] Concatenation over!
[0.001271] Clipping short tips off graph, drastic
[0.001278] Concatenation...
[0.001284] Renumbering nodes
[0.001290] Initial node count 0
[0.001296] Removed 0 null nodes
[0.001302] Concatenation over!
[0.001308] 0 nodes left
[0.001314] WARNING: NO EXPECTED COVERAGE PROVIDED
[0.001321] Velvet will be unable to resolve any repeats
[0.001327] See manual for instructions on how to set the expected coverage parameter
[0.001335] Concatenation...
[0.001340] Renumbering nodes
[0.001346] Initial node count 0
[0.001353] Removed 0 null nodes
[0.001359] Concatenation over!
[0.001393] Writing contigs into result//contigs.fa...
[0.001427] Writing into stats file result//stats.txt...
[0.001788] Writing into graph file result//LastGraph...
[0.001859] EMPTY GRAPH
Final graph has 0 nodes and n50 of 0, max 0, total 0, using 0/2 reads
Last edited by bioinf; 01-12-2011, 02:15 AM.
-
This probably has something to do with your coverage.
Try lengthening your first read and duplicating each read to increase coverage:
>SEQUENCE_0
ACCTAGAACGT
>SEQUENCE_1
ACCTAGAACGT
>SEQUENCE_2
CCTAGAACGTT
>SEQUENCE_3
CCTAGAACGTT
This should work...
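To see what duplicating the reads does to coverage, here is a rough Python sketch that counts how often each 5-mer occurs across the four suggested reads. It counts forward-strand k-mers only, so the numbers are an approximation and not necessarily the coverage Velvet itself reports.

```python
from collections import Counter

K = 5
reads = ["ACCTAGAACGT", "ACCTAGAACGT", "CCTAGAACGTT", "CCTAGAACGTT"]

# count every k-mer occurrence over all reads
counts = Counter(
    read[i:i + K] for read in reads for i in range(len(read) - K + 1)
)

for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

The k-mers shared by both sequences are seen four times and the two k-mers at the ends are seen twice, compared with once or twice in the original two-read demo.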
-
Hi all,
I am running a de novo assembly with the color-space build of Velvet, and when I run velveth_de it terminates halfway, before it writes the ./Roadmap file. I have no idea what the problem is. I am running this on a cluster with 12 cores and 2400 MB of memory. The version is 1.1.04.
This is what I see on the terminal:
[2468.477166] 73728280 sequences found
[2468.477184] Done
[2468.477501] Reading FastA file /v/users/okeke/reads/547_3H_doubleEncoded_input.de;
[2641.750344] 103910882 sequences found
[2641.750359] Done
[2641.750610] Reading FastA file /v/users/okeke/reads/547_1D_doubleEncoded_input.de;
[2765.845768] 79934018 sequences found
[2765.845785] Done
[2765.846042] Reading FastA file /v/users/okeke/reads/547_4D_doubleEncoded_input.de;
[3176.325687] 251833816 sequences found
[3176.325703] Done
[3176.434276] Reading read set file pine_denovo/Sequences;
Killed
2804.848u 794.606s 1:04:54.36 92.4% 0+0k 0+0io 30pf+0w
Thanks for your help in advance.
Last edited by urchgene; 06-21-2011, 05:08 AM.