SEQanswers

01-11-2011, 03:06 AM   #1
bioinf
Member
 
Location: Germany

Join Date: Nov 2010
Posts: 25
velvet assembler, roadmaps

Hello. Velvet first hashes all the reads, creating so-called "roadmaps". Could someone explain exactly how that helps in constructing the de Bruijn graph? Thank you.
01-11-2011, 03:18 AM   #2
nickloman
Senior Member
 
Location: Birmingham, UK

Join Date: Jul 2009
Posts: 356

Your question is probably better directed to the velvet-users list. Daniel Zerbino recently posted this answer to a similar question:

Also, have you read the Velvet paper and manual?
http://genome.cshlp.org/content/18/5/821
http://www.ebi.ac.uk/~zerbino/velvet/Manual.pdf

Code:
The Roadmap file is not normally meant to be parsed by the user, it is simply internal to velvet.

However, if you really want to know, the format is:

ROADMAP    $query_id
$target_id    $prev_kmers    $target_start    $target_finish
etc.

Where:
$target_id is negative if on reverse strand
[ $target_start, $target_finish [ is the domain of the target sequence being aligned (note that it is inclusive at the start and exclusive at the finish)
$prev_kmers is the number of unaligned k-mers in the query sequence before the alignment to the target (note in the examples below how this value does not necessarily change from alignment to alignment; this means that they are contiguous on the query.)
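To make the format described above concrete, here is a minimal Python sketch of a reader for it. It assumes only what the quoted description says: whitespace-separated fields, a `ROADMAP $query_id` line followed by four-integer alignment lines. The names `Alignment` and `parse_roadmaps` are hypothetical and not part of Velvet; lines of any other shape are skipped, and the sample lines are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    # Field names follow the quoted description; the class itself is hypothetical.
    target_id: int      # negative if the target is on the reverse strand
    prev_kmers: int     # unaligned query k-mers before this alignment
    target_start: int   # inclusive
    target_finish: int  # exclusive

    @property
    def reverse_strand(self) -> bool:
        return self.target_id < 0

    @property
    def span(self) -> int:
        # Length of the half-open domain [target_start, target_finish[
        return self.target_finish - self.target_start


def parse_roadmaps(lines):
    """Return {query_id: [Alignment, ...]}; lines of any other shape are skipped."""
    roadmaps = {}
    current = None
    for line in lines:
        fields = line.split()
        if len(fields) == 2 and fields[0] == "ROADMAP":
            current = fields[1]  # keep the query id as a string
            roadmaps[current] = []
        elif current is not None and len(fields) == 4:
            target_id, prev_kmers, start, finish = map(int, fields)
            roadmaps[current].append(Alignment(target_id, prev_kmers, start, finish))
    return roadmaps


# Made-up lines for illustration only, not real velveth output.
sample = ["ROADMAP 1", "-2 0 3 10", "ROADMAP 2", "1 4 0 6"]
for query, alignments in parse_roadmaps(sample).items():
    for a in alignments:
        print(query, a.target_id, a.reverse_strand, a.span)
```

Because `target_finish` is exclusive, the span is a plain subtraction with no "+1".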
01-11-2011, 03:45 AM   #3
bioinf
Member
 
Location: Germany

Join Date: Nov 2010
Posts: 25

I've read those papers and searched and thought about it a lot, but I still have no exact idea of how it works, only assumptions. I need it for a presentation. Hashing k-mers seems to bring no speed advantage when regions of two reads overlap in only k-1 characters. Only in cases when you have at least k length identical sequences in two or more reads.

Example (k = 4; each read is followed by its k-mers):
Quote:
Read 1      Read 2
ACCTCGAT    GCTCTAGG
ACCT        GCTC
CCTC        CTCT
CTCG        TCTA
TCGA        CTAG
CGAT        TAGG
CCTC and CTCT overlap in k-1 characters, but roadmaps are no help here because they don't link these two reads, even though the reads are connected through the two aforementioned k-mers.
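The example can be checked mechanically. This Python sketch splits both reads into 4-mers, looks for k-mers shared by the two reads, and separately lists pairs where the last k-1 bases of a read-1 k-mer equal the first k-1 bases of a read-2 k-mer. It is a simplification that looks at the forward strand only (Velvet also considers reverse complements), and `kmers` is just an illustrative helper.

```python
def kmers(read: str, k: int) -> list:
    """All overlapping length-k substrings of read, in order (len(read) - k + 1 of them)."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


K = 4
read1, read2 = "ACCTCGAT", "GCTCTAGG"
k1, k2 = kmers(read1, K), kmers(read2, K)

# Identical k-mers present in both reads: what a k-mer hash table finds directly.
shared = set(k1) & set(k2)

# (x, y) with x from read 1 and y from read 2, overlapping in k-1 bases.
overlaps = [(x, y) for x in k1 for y in k2 if x[1:] == y[:-1]]

print("read 1 k-mers:", k1)
print("read 2 k-mers:", k2)
print("shared k-mers:", shared)   # set(): nothing in common
print("k-1 overlaps:", overlaps)  # [('CCTC', 'CTCT')]
```

The two reads touch only through a (k-1)-overlap, never through a shared k-mer, which is exactly the situation described above.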

PS: Thanks for the tip; I just found the list and wrote to them.

Last edited by bioinf; 01-11-2011 at 03:59 AM.
01-11-2011, 06:39 AM   #4
nickloman
Senior Member
 
Location: Birmingham, UK

Join Date: Jul 2009
Posts: 356

Quote:
Originally Posted by bioinf
Only in cases when you have at least k length identical sequences in two or more reads.
My limited understanding is that this needs to be the case. Either way, I'm sure you'll get an answer on the Velvet mailing list.
01-11-2011, 06:45 AM   #5
bioinf
Member
 
Location: Germany

Join Date: Nov 2010
Posts: 25

Daniel was very kind to write to me personally. What he wrote is:
Quote:
In the assembly process of Velvet, an arc (i.e. a k-1 mer overlap between two k-mers) is only instantiated if a read actually goes from one k-mer to the next.

In your case, no read goes from CCTC to CTCT (simply because sequence CCTCT does not exist) therefore it is not picked up.
Now I have to understand his reply. But why is it required that we have k identical chars? De Bruijn graph requires k-1 overlaps and throughout the paper we're told that arcs represent overlaps in k-1 chars.
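Daniel's rule can be contrasted with the textbook definition in a short Python sketch. `read_arcs` creates an arc between two k-mers only when they are consecutive inside some read, as in his description; `overlap_arcs` creates one for every (k-1)-overlap between any k-mers seen. Both functions are hypothetical illustrations, ignore reverse complements, and are not Velvet's code.

```python
def kmers(read, k):
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def read_arcs(reads, k):
    """Arcs only between consecutive k-mers inside a read (the rule Daniel describes)."""
    arcs = set()
    for read in reads:
        ks = kmers(read, k)
        arcs.update(zip(ks, ks[1:]))
    return arcs


def overlap_arcs(reads, k):
    """Arcs between every pair of k-mers that overlap in k-1 bases (naive definition)."""
    nodes = {km for read in reads for km in kmers(read, k)}
    return {(x, y) for x in nodes for y in nodes if x[1:] == y[:-1]}


reads = ["ACCTCGAT", "GCTCTAGG"]
missing = overlap_arcs(reads, 4) - read_arcs(reads, 4)
print(sorted(missing))  # [('CCTC', 'CTCT'), ('GCTC', 'CTCG')]
```

Both arcs that would link the two reads exist only under the naive definition, because no read contains CCTCT (or GCTCG). An arc still represents a k-1 overlap; the extra condition is that some read must actually contain the k+1 bases spanning both k-mers.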
01-11-2011, 11:52 PM   #6
boetsie
Senior Member
 
Location: NL, Leiden

Join Date: Feb 2010
Posts: 245

Quote:
Originally Posted by bioinf
Daniel was very kind to write to me personally. What he wrote is:

Quote:
In the assembly process of Velvet, an arc (i.e. a k-1 mer overlap between two k-mers) is only instantiated if a read actually goes from one k-mer to the next.

In your case, no read goes from CCTC to CTCT (simply because sequence CCTCT does not exist) therefore it is not picked up.
Now I have to understand his reply. But why is it required that we have k identical chars? De Bruijn graph requires k-1 overlaps and throughout the paper we're told that arcs represent overlaps in k-1 chars.

Where does it say you need k identical chars? You need a k-1 identical overlap.

The example he gives means that some read has to contain CCTCT before the arc is picked up. Neither of the reads in your example (ACCTCGAT and GCTCTAGG) contains CCTCT.
01-12-2011, 12:12 AM   #7
bioinf
Member
 
Location: Germany

Join Date: Nov 2010
Posts: 25

OK, I see now. If the second read contains CCTCT, it has the two k-mers CCTC and CTCT, so the same k-mer CCTC occurs in both reads. I guess what he implicitly meant is that the two reads must actually share an identical k-mer X before you can draw an arc from X in the first read to the k-mer Y that immediately follows X in the second read.

It is then in some sense related to the transitivity rule.
There is X in read 1
There is X in read 2
There is Y in read 2 connected to X in read 2
-----
Conclusion: draw an arc from X of read 1 to Y of read 2; this makes the connection between read 1 and read 2 explicit.

Am I warmer?
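This transitivity reading can be tested with a small reachability check. The Python sketch below builds arcs only between consecutive k-mers within a read (forward strand only) and runs a breadth-first search from the first 4-mer of read 1 to the last 4-mer of read 2. `GCCTCTAGG` is a hypothetical variant of read 2 that contains CCTCT; it is not part of the data in this thread.

```python
from collections import deque


def kmers(read, k):
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def read_arcs(reads, k):
    """Directed arcs between consecutive k-mers inside each read, as an adjacency dict."""
    graph = {}
    for read in reads:
        ks = kmers(read, k)
        for x, y in zip(ks, ks[1:]):
            graph.setdefault(x, set()).add(y)
    return graph


def reachable(graph, start, goal):
    """Breadth-first search along the directed arcs."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


K = 4
read1 = "ACCTCGAT"
read2 = "GCTCTAGG"
read2_variant = "GCCTCTAGG"  # hypothetical: contains CCTCT, so it shares CCTC (X) with read 1

start, goal = read1[:K], read2[-K:]  # ACCT and TAGG
print(reachable(read_arcs([read1, read2], K), start, goal))          # False
print(reachable(read_arcs([read1, read2_variant], K), start, goal))  # True
```

With the variant, the shared k-mer CCTC plays the role of X and CTCT the role of Y, giving the path ACCT, CCTC, CTCT, TCTA, CTAG, TAGG.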

Last edited by bioinf; 01-12-2011 at 01:15 AM.
01-12-2011, 01:01 AM   #8
boetsie
Senior Member
 
Location: NL, Leiden

Join Date: Feb 2010
Posts: 245

Yes, I think that is it. However, since you have already been in contact with Daniel Zerbino, you would do better to ask him, because he is the one who developed Velvet and knows it better than anyone here on this forum.
01-12-2011, 01:07 AM   #9
bioinf
Member
 
Location: Germany

Join Date: Nov 2010
Posts: 25

Quote:
Yes, I think that is it. However, since you have already been in contact with Daniel Zerbino, you would do better to ask him, because he is the one who developed Velvet and knows it better than anyone here on this forum.
But thanks for the help, guys. You helped a lot anyway.


I tried to make a small demo of how Velvet works from the command line. Here is my FASTA file:
Quote:
>SEQUENCE_0
ACCTAG
>SEQUENCE_1
CCTAGAACG
When I try to assemble these two reads by doing:
Quote:
velveth result/ 5 sequence.fst
velvetg result/ -min_contig_lgth 1 -long_mult_cutoff 1
What I get is an empty contigs.fa. Why does this happen?

Here is the output:
Quote:
velveth:
[0.000001] Reading FastA file sequence.fst;
[0.000087] 2 sequences found
[0.000097] Done
[0.000155] Reading read set file result//Sequences;
[0.000192] 2 sequences found
[0.000759] Done
[0.000770] 2 sequences in total.
[0.000805] Writing into roadmap file result//Roadmaps...
[0.000832] Inputting sequences...
[0.000839] Inputting sequence 0 / 2
[0.000960] Done inputting sequences
[0.000969] Destroying splay table
[0.001020] Splay table destroyed
Quote:
velvetg:
[0.000001] Reading roadmap file result//Roadmaps
[0.000102] 2 roadmaps reads
[0.000128] Creating insertion markers
[0.000138] Ordering insertion markers
[0.000156] Counting preNodes
[0.000165] 2 preNodes counted, creating them now
[0.000220] Adjusting marker info...
[0.000230] Connecting preNodes
[0.000260] Cleaning up memory
[0.000267] Done creating preGraph
[0.000274] Concatenation...
[0.000288] Renumbering preNodes
[0.000294] Initial preNode count 2
[0.000304] Destroyed 1 preNodes
[0.000311] Concatenation over!
[0.000317] Clipping short tips off preGraph
[0.000325] Concatenation...
[0.000330] Renumbering preNodes
[0.000336] Initial preNode count 1
[0.000343] Destroyed 1 preNodes
[0.000349] Concatenation over!
[0.000355] 1 tips cut off
[0.000361] 0 nodes left
[0.000412] Writing into pregraph file result//PreGraph...
[0.000489] Reading read set file result//Sequences;
[0.000533] 2 sequences found
[0.000605] Done
[0.000659] Reading pre-graph file result//PreGraph
[0.000690] Graph has 0 nodes and 2 sequences
[0.000712] Correcting graph with cutoff 0.200000
[0.000945] Determining eligible starting points
[0.000959] Done listing starting nodes
[0.000966] Initializing todo lists
[0.000972] Done with initilization
[0.000979] Activating arc lookup table
[0.000985] Done activating arc lookup table
[0.000992] Concatenation...
[0.000998] Renumbering nodes
[0.001004] Initial node count 0
[0.001012] Removed 0 null nodes
[0.001022] Concatenation over!
[0.001031] Clipping short tips off graph, drastic
[0.001037] Concatenation...
[0.001043] Renumbering nodes
[0.001049] Initial node count 0
[0.001055] Removed 0 null nodes
[0.001062] Concatenation over!
[0.001067] 0 nodes left
[0.001118] Writing into graph file result//Graph...
[0.001168] WARNING: NO COVERAGE CUTOFF PROVIDED
[0.001177] Velvet will probably leave behind many detectable errors
[0.001184] See manual for instructions on how to set the coverage cutoff parameter
[0.001195] Removing contigs with coverage < -1.000000...
[0.001207] Concatenation...
[0.001213] Renumbering nodes
[0.001218] Initial node count 0
[0.001225] Removed 0 null nodes
[0.001232] Concatenation over!
[0.001238] Concatenation...
[0.001244] Renumbering nodes
[0.001249] Initial node count 0
[0.001256] Removed 0 null nodes
[0.001262] Concatenation over!
[0.001271] Clipping short tips off graph, drastic
[0.001278] Concatenation...
[0.001284] Renumbering nodes
[0.001290] Initial node count 0
[0.001296] Removed 0 null nodes
[0.001302] Concatenation over!
[0.001308] 0 nodes left
[0.001314] WARNING: NO EXPECTED COVERAGE PROVIDED
[0.001321] Velvet will be unable to resolve any repeats
[0.001327] See manual for instructions on how to set the expected coverage parameter
[0.001335] Concatenation...
[0.001340] Renumbering nodes
[0.001346] Initial node count 0
[0.001353] Removed 0 null nodes
[0.001359] Concatenation over!
[0.001393] Writing contigs into result//contigs.fa...
[0.001427] Writing into stats file result//stats.txt...
[0.001788] Writing into graph file result//LastGraph...
[0.001859] EMPTY GRAPH
Final graph has 0 nodes and n50 of 0, max 0, total 0, using 0/2 reads

Last edited by bioinf; 01-12-2011 at 01:15 AM.
01-12-2011, 01:34 AM   #10
boetsie
Senior Member
 
Location: NL, Leiden

Join Date: Feb 2010
Posts: 245

It probably has something to do with your coverage.

Try making your reads longer and duplicating each one to increase coverage:

>SEQUENCE_0
ACCTAGAACGT
>SEQUENCE_1
ACCTAGAACGT
>SEQUENCE_2
CCTAGAACGTT
>SEQUENCE_3
CCTAGAACGTT

This should work...
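One way to see what this suggestion changes is to count how often each 5-mer occurs in the two data sets, a rough proxy for the k-mer coverage that Velvet's coverage values refer to. This Python sketch counts forward-strand k-mers only and does not reproduce velvetg's tip clipping or coverage cutoffs, so it illustrates the coverage difference rather than predicting velvetg's output. `kmer_counts` is an illustrative helper, not a Velvet function.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Multiplicity of each length-k substring across all reads (forward strand only)."""
    return Counter(read[i:i + k] for read in reads for i in range(len(read) - k + 1))


K = 5  # the hash length used in the velveth command earlier in the thread
original = ["ACCTAG", "CCTAGAACG"]
suggested = ["ACCTAGAACGT", "ACCTAGAACGT", "CCTAGAACGTT", "CCTAGAACGTT"]

for name, reads in (("original", original), ("suggested", suggested)):
    counts = kmer_counts(reads, K)
    # original: 6 distinct, max 2; suggested: 8 distinct, max 4
    print(name, len(counts), "distinct k-mers, max multiplicity", max(counts.values()))
```

Every 5-mer in the original pair is seen at most twice, whereas six of the eight 5-mers in the suggested set are seen four times.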
01-12-2011, 01:40 AM   #11
bioinf
Member
 
Location: Germany

Join Date: Nov 2010
Posts: 25

Yes, that worked for me. Thanks!
06-21-2011, 04:44 AM   #12
urchgene
Member
 
Location: helsinki

Join Date: Oct 2010
Posts: 14

Hi all,

I am running a de novo assembly with the Velvet color-space build, and when I run velveth_de it terminates partway through, before it writes the ./Roadmaps file. I have no idea what the problem is. I am running this on a cluster with 12 cores and 2400 MB of memory. The version is 1.1.04.

This is what I see in the terminal:

[2468.477166] 73728280 sequences found
[2468.477184] Done
[2468.477501] Reading FastA file /v/users/okeke/reads/547_3H_doubleEncoded_input.de;
[2641.750344] 103910882 sequences found
[2641.750359] Done
[2641.750610] Reading FastA file /v/users/okeke/reads/547_1D_doubleEncoded_input.de;
[2765.845768] 79934018 sequences found
[2765.845785] Done
[2765.846042] Reading FastA file /v/users/okeke/reads/547_4D_doubleEncoded_input.de;
[3176.325687] 251833816 sequences found
[3176.325703] Done
[3176.434276] Reading read set file pine_denovo/Sequences;
Killed
2804.848u 794.606s 1:04:54.36 92.4% 0+0k 0+0io 30pf+0w

Thanks for your help in advance.

Last edited by urchgene; 06-21-2011 at 05:08 AM.
02-06-2013, 07:03 AM   #13
A_Morozov
Member
 
Location: Russia, Irkutsk

Join Date: Feb 2011
Posts: 40

I observed the same behavior as in urchgene's post with Velvet 1.2.08.

It seems the problem is simply a lack of RAM. I moved to a cluster with more memory and everything works fine. It would be nice if Velvet wrote proper crash logs, though.

Last edited by A_Morozov; 02-12-2013 at 08:39 PM.

All times are GMT -8.