  • bioinf
    Member
    • Nov 2010
    • 25

    velvet assembler, roadmaps

    Hello. Velvet first hashes all the reads, creating so-called "roadmaps". Could someone explain how exactly that helps in the construction of the de Bruijn graph? Thank you.
  • nickloman
    Senior Member
    • Jul 2009
    • 355

    #2
    Your question is probably better directed to the velvet-users list. Also, have you read the Velvet paper and manual?

    Daniel Zerbino recently posted this answer to a similar question:



    Code:
    The Roadmap file is not normally meant to be parsed by the user, it is simply internal to velvet.
    
    However, if you really want to know the format is:
    
    ROADMAP    $query_id
    $target_id    $prev_kmers    $target_start    $target_finish
    etc.
    
    Where:
    $target_id is negative if on reverse strand
    [ $target_start, $target_finish [ is the domain of the target sequence being aligned (note that it is inclusive at the start and exclusive at the finish)
    $prev_kmers is the number of unaligned k-mers in the query sequence before the alignment to the target (note in the examples below how this value does not necessarily change from alignment to alignment, this means that they are contiguous on the query.)
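For anyone who does want to inspect the file, here is a minimal Python sketch of a reader for the layout described above. It assumes whitespace-separated integer fields. The `Annotation` class, the `parse_roadmaps` function and the sample values are made up for illustration and are not part of Velvet. Lines before the first ROADMAP record are skipped, on the assumption that a real file may start with header information.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """One alignment line under a ROADMAP record (field names follow the post)."""
    target_id: int      # negative if the target is on the reverse strand
    prev_kmers: int     # unaligned k-mers in the query before this alignment
    target_start: int   # inclusive
    target_finish: int  # exclusive

    @property
    def reverse_strand(self):
        return self.target_id < 0


def parse_roadmaps(text):
    """Return {query_id: [Annotation, ...]} for the layout described above."""
    roadmaps = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "ROADMAP":
            current = int(fields[1])
            roadmaps[current] = []
        elif current is not None and len(fields) == 4:
            roadmaps[current].append(Annotation(*map(int, fields)))
        # anything else (e.g. a header line before the first record) is ignored
    return roadmaps


# Hypothetical sample, invented for this sketch (not real Velvet output).
sample = """ROADMAP 1
ROADMAP 2
1 0 0 3
-1 0 5 7
"""

roadmaps = parse_roadmaps(sample)
for query_id, annotations in roadmaps.items():
    print(query_id, annotations)
```

A query with no annotation lines (query 1 in the sample) simply maps to an empty list.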


    • bioinf
      Member
      • Nov 2010
      • 25

      #3
      I've read those papers and searched and thought a lot about this, but I still have no idea how exactly it works, only assumptions. I need it for a presentation. Hashing k-mers does not seem to save any time when regions in two reads overlap by only k-1 chars. Only in cases when you have at least k length identical sequences in two or more reads.

      Example:
      ACCTCGAT GCTCTAGG
      ACCT GCTC
      CCTC CTCT
      CTCG TCTA
      TCGA CTAG
      CGAT TAGG
      CCTC and CTCT overlap in k-1 chars, but roadmaps are helpless here because they don't link these two reads, although the reads are connected through the aforementioned two k-mers.
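The example can be checked with a few lines of Python. This is only a sketch of the k-mer decomposition with k = 4, not Velvet's hashing code, and it looks at the forward strand only. The helper name `kmers` is invented here.

```python
def kmers(read, k):
    """Return the k-mers of a read in order of appearance."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


K = 4
read1 = "ACCTCGAT"
read2 = "GCTCTAGG"

kmers1 = kmers(read1, K)
kmers2 = kmers(read2, K)
shared = set(kmers1) & set(kmers2)

print(kmers1)
print(kmers2)
print("shared k-mers:", shared)
```

The two reads have no 4-mer in common, so a lookup keyed on whole k-mers has nothing that ties them together.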

      PS. Thx for the tip, just found the list and wrote to them.
      Last edited by bioinf; 01-11-2011, 04:59 AM.


      • nickloman
        Senior Member
        • Jul 2009
        • 355

        #4
        Originally posted by bioinf View Post
        Only in cases when you have at least k length identical sequences in two or more reads.
        My limited understanding is that needs to be the case. Either way I'm sure you'll get the answer on the Velvet mailing list.


        • bioinf
          Member
          • Nov 2010
          • 25

          #5
          Daniel was very kind to write to me personally. What he wrote is:
          In the assembly process of Velvet, an arc (i.e. a k-1 mer overlap between two k-mers) is only instantiated if a read actually goes from one k-mer to the next.

          In your case, no read goes from CCTC to CTCT (simply because sequence CCTCT does not exist) therefore it is not picked up.
          Now I have to understand his reply. But why is it required that we have k identical chars? De Bruijn graph requires k-1 overlaps and throughout the paper we're told that arcs represent overlaps in k-1 chars.
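The difference between the two notions of "arc" can be sketched in Python. `read_arcs` follows the rule quoted above (an arc exists only where a read moves from one k-mer to the next), while `overlap_arcs` links every pair of observed k-mers that overlap by k-1 chars. Both function names are invented for this illustration, which ignores reverse strands and is not Velvet's implementation.

```python
def kmers(read, k):
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def read_arcs(reads, k):
    """Arcs between consecutive k-mers inside a read."""
    arcs = set()
    for read in reads:
        ks = kmers(read, k)
        arcs.update(zip(ks, ks[1:]))
    return arcs


def overlap_arcs(reads, k):
    """Arcs between any two observed k-mers whose suffix and prefix of length k-1 match."""
    nodes = {kmer for read in reads for kmer in kmers(read, k)}
    return {(a, b) for a in nodes for b in nodes if a[1:] == b[:-1]}


reads = ["ACCTCGAT", "GCTCTAGG"]
supported = read_arcs(reads, 4)
possible = overlap_arcs(reads, 4)
unsupported = possible - supported

print("arcs supported by a read:", len(supported))
print("overlap-only arcs:", sorted(unsupported))
```

For these two reads the overlap-only arcs are CCTC to CTCT and GCTC to CTCG. Neither is traversed by a read, so under the rule in the reply neither would be created.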


          • boetsie
            Senior Member
            • Feb 2010
            • 245

            #6
            Originally posted by bioinf View Post
            Daniel was very kind to write to me personally. What he wrote is:

            In the assembly process of Velvet, an arc (i.e. a k-1 mer overlap between two k-mers) is only instantiated if a read actually goes from one k-mer to the next.

            In your case, no read goes from CCTC to CTCT (simply because sequence CCTCT does not exist) therefore it is not picked up.
            Now I have to understand his reply. But why is it required that we have k identical chars? De Bruijn graph requires k-1 overlaps and throughout the paper we're told that arcs represent overlaps in k-1 chars.

            Where does it say you need k identical chars? You need an identical overlap of k-1 chars.

            His example means that some read needs to contain CCTCT before the arc is picked up. Neither of the reads in your example (ACCTCGAT and GCTCTAGG) contains CCTCT.


            • bioinf
              Member
              • Nov 2010
              • 25

              #7
              OK, I see now. If the second read contains CCTCT, it contains the two k-mers CCTC and CTCT, so the same k-mer CCTC occurs in both reads. I guess what he implicitly meant was that two reads must share an identical k-mer X before an arc can be drawn from X in the first read to the k-mer Y that immediately follows X in the second read.

              It is then in some sense related to the transitivity rule.
              There is X in read 1
              There is X in read 2
              There is Y in read 2 connected to X in read 2
              -----
              Conclusion: draw arc from X of read 1 to Y of 2; thus you clearly see the connection between read 1 and read 2

              Am I warmer?
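The rule above can be tried out with a small Python sketch. The second read here, GCCTCTAGG, is a hypothetical variant invented for this example so that it contains CCTCT. The graph is a plain adjacency dict built from arcs that a read actually traverses. `build_graph` and `reachable` are illustrative names, not Velvet functions, and reverse strands are ignored.

```python
from collections import defaultdict


def build_graph(reads, k):
    """Adjacency sets: k-mer -> k-mers that directly follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        ks = [read[i:i + k] for i in range(len(read) - k + 1)]
        for a, b in zip(ks, ks[1:]):
            graph[a].add(b)
    return graph


def reachable(graph, start, goal):
    """Depth-first search along arcs from start to goal."""
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


K = 4
read1 = "ACCTCGAT"

# Original read 2: shares no 4-mer with read 1.
unlinked = reachable(build_graph([read1, "GCTCTAGG"], K), "ACCT", "TAGG")

# Hypothetical read 2 containing CCTCT: X = CCTC is in both reads, Y = CTCT follows it.
linked = reachable(build_graph([read1, "GCCTCTAGG"], K), "ACCT", "TAGG")

print("original reads connected:", unlinked)
print("with shared k-mer CCTC:", linked)
```

With the shared k-mer, a walk that starts at the first k-mer of read 1 can continue into read 2 through the arc from CCTC to CTCT.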
              Last edited by bioinf; 01-12-2011, 02:15 AM.


              • boetsie
                Senior Member
                • Feb 2010
                • 245

                #8
                Yes, I think that is it. However, since you are already in contact with Daniel Zerbino, you would do better to ask him, because he is the one who developed Velvet and knows it better than anyone here on this forum.


                • bioinf
                  Member
                  • Nov 2010
                  • 25

                  #9
                  Yes, I think that is it. However, since you are already in contact with Daniel Zerbino, you would do better to ask him, because he is the one who developed Velvet and knows it better than anyone here on this forum.
                  But thx for help guys. You helped a lot anyway.


                  I tried to make a small demo of how velvet works from the command line. Here is my fasta file:
                  >SEQUENCE_0
                  ACCTAG
                  >SEQUENCE_1
                  CCTAGAACG
                  When I try to assemble these two reads by doing:
                  velveth result/ 5 sequence.fst
                  velvetg result/ -min_contig_lgth 1 -long_mult_cutoff 1
                  What I get is an empty contigs.fa. Why does this happen?

                  Here is the output:
                  velveth:
                  [0.000001] Reading FastA file sequence.fst;
                  [0.000087] 2 sequences found
                  [0.000097] Done
                  [0.000155] Reading read set file result//Sequences;
                  [0.000192] 2 sequences found
                  [0.000759] Done
                  [0.000770] 2 sequences in total.
                  [0.000805] Writing into roadmap file result//Roadmaps...
                  [0.000832] Inputting sequences...
                  [0.000839] Inputting sequence 0 / 2
                  [0.000960] Done inputting sequences
                  [0.000969] Destroying splay table
                  [0.001020] Splay table destroyed
                  velvetg:
                  [0.000001] Reading roadmap file result//Roadmaps
                  [0.000102] 2 roadmaps reads
                  [0.000128] Creating insertion markers
                  [0.000138] Ordering insertion markers
                  [0.000156] Counting preNodes
                  [0.000165] 2 preNodes counted, creating them now
                  [0.000220] Adjusting marker info...
                  [0.000230] Connecting preNodes
                  [0.000260] Cleaning up memory
                  [0.000267] Done creating preGraph
                  [0.000274] Concatenation...
                  [0.000288] Renumbering preNodes
                  [0.000294] Initial preNode count 2
                  [0.000304] Destroyed 1 preNodes
                  [0.000311] Concatenation over!
                  [0.000317] Clipping short tips off preGraph
                  [0.000325] Concatenation...
                  [0.000330] Renumbering preNodes
                  [0.000336] Initial preNode count 1
                  [0.000343] Destroyed 1 preNodes
                  [0.000349] Concatenation over!
                  [0.000355] 1 tips cut off
                  [0.000361] 0 nodes left
                  [0.000412] Writing into pregraph file result//PreGraph...
                  [0.000489] Reading read set file result//Sequences;
                  [0.000533] 2 sequences found
                  [0.000605] Done
                  [0.000659] Reading pre-graph file result//PreGraph
                  [0.000690] Graph has 0 nodes and 2 sequences
                  [0.000712] Correcting graph with cutoff 0.200000
                  [0.000945] Determining eligible starting points
                  [0.000959] Done listing starting nodes
                  [0.000966] Initializing todo lists
                  [0.000972] Done with initilization
                  [0.000979] Activating arc lookup table
                  [0.000985] Done activating arc lookup table
                  [0.000992] Concatenation...
                  [0.000998] Renumbering nodes
                  [0.001004] Initial node count 0
                  [0.001012] Removed 0 null nodes
                  [0.001022] Concatenation over!
                  [0.001031] Clipping short tips off graph, drastic
                  [0.001037] Concatenation...
                  [0.001043] Renumbering nodes
                  [0.001049] Initial node count 0
                  [0.001055] Removed 0 null nodes
                  [0.001062] Concatenation over!
                  [0.001067] 0 nodes left
                  [0.001118] Writing into graph file result//Graph...
                  [0.001168] WARNING: NO COVERAGE CUTOFF PROVIDED
                  [0.001177] Velvet will probably leave behind many detectable errors
                  [0.001184] See manual for instructions on how to set the coverage cutoff parameter
                  [0.001195] Removing contigs with coverage < -1.000000...
                  [0.001207] Concatenation...
                  [0.001213] Renumbering nodes
                  [0.001218] Initial node count 0
                  [0.001225] Removed 0 null nodes
                  [0.001232] Concatenation over!
                  [0.001238] Concatenation...
                  [0.001244] Renumbering nodes
                  [0.001249] Initial node count 0
                  [0.001256] Removed 0 null nodes
                  [0.001262] Concatenation over!
                  [0.001271] Clipping short tips off graph, drastic
                  [0.001278] Concatenation...
                  [0.001284] Renumbering nodes
                  [0.001290] Initial node count 0
                  [0.001296] Removed 0 null nodes
                  [0.001302] Concatenation over!
                  [0.001308] 0 nodes left
                  [0.001314] WARNING: NO EXPECTED COVERAGE PROVIDED
                  [0.001321] Velvet will be unable to resolve any repeats
                  [0.001327] See manual for instructions on how to set the expected coverage parameter
                  [0.001335] Concatenation...
                  [0.001340] Renumbering nodes
                  [0.001346] Initial node count 0
                  [0.001353] Removed 0 null nodes
                  [0.001359] Concatenation over!
                  [0.001393] Writing contigs into result//contigs.fa...
                  [0.001427] Writing into stats file result//stats.txt...
                  [0.001788] Writing into graph file result//LastGraph...
                  [0.001859] EMPTY GRAPH
                  Final graph has 0 nodes and n50 of 0, max 0, total 0, using 0/2 reads
                  Last edited by bioinf; 01-12-2011, 02:15 AM.


                  • boetsie
                    Senior Member
                    • Feb 2010
                    • 245

                    #10
                    It probably has something to do with your coverage.

                    Try increasing the length of your first read and duplicating each read for increased coverage:

                    >SEQUENCE_0
                    ACCTAGAACGT
                    >SEQUENCE_1
                    ACCTAGAACGT
                    >SEQUENCE_2
                    CCTAGAACGTT
                    >SEQUENCE_3
                    CCTAGAACGTT

                    This should work...
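A quick way to compare the two inputs is to count k-mers directly. The sketch below is plain Python, not Velvet code, and `kmer_coverage` is a name invented here. It tallies forward-strand 5-mers only, whereas Velvet also accounts for the reverse strand, so treat the numbers as a rough guide. In the log of the failed run, the single remaining preNode is removed in the tip-clipping step ("1 tips cut off"), which is consistent with the original input producing a very short, thinly covered graph.

```python
from collections import Counter


def kmer_coverage(reads, k):
    """Count how often each k-mer occurs across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


K = 5
original = ["ACCTAG", "CCTAGAACG"]
suggested = ["ACCTAGAACGT", "ACCTAGAACGT", "CCTAGAACGTT", "CCTAGAACGTT"]

cov_original = kmer_coverage(original, K)
cov_suggested = kmer_coverage(suggested, K)

for name, cov in (("original", cov_original), ("suggested", cov_suggested)):
    print(name,
          "distinct k-mers:", len(cov),
          "min coverage:", min(cov.values()),
          "max coverage:", max(cov.values()))
```

The suggested input has more distinct 5-mers and every one of them is seen at least twice, while most 5-mers of the original input are seen only once.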


                    • bioinf
                      Member
                      • Nov 2010
                      • 25

                      #11
                      Yes, worked for me. Thx!


                      • urchgene
                        Member
                        • Oct 2010
                        • 14

                        #12
                        Hi all,

                        I am running a de novo assembly with the color space build of Velvet, and when I run velveth_de it terminates halfway, before it writes the ./Roadmap file. I have no idea what the problem is. I am running this on a cluster with 12 cores and 2400 MB of memory. The version is 1.1.04.

                        this is what i see on the terminal:

                        [2468.477166] 73728280 sequences found
                        [2468.477184] Done
                        [2468.477501] Reading FastA file /v/users/okeke/reads/547_3H_doubleEncoded_input.de;
                        [2641.750344] 103910882 sequences found
                        [2641.750359] Done
                        [2641.750610] Reading FastA file /v/users/okeke/reads/547_1D_doubleEncoded_input.de;
                        [2765.845768] 79934018 sequences found
                        [2765.845785] Done
                        [2765.846042] Reading FastA file /v/users/okeke/reads/547_4D_doubleEncoded_input.de;
                        [3176.325687] 251833816 sequences found
                        [3176.325703] Done
                        [3176.434276] Reading read set file pine_denovo/Sequences;
                        Killed
                        2804.848u 794.606s 1:04:54.36 92.4% 0+0k 0+0io 30pf+0w

                        Thanks for your help in advance.
                        Last edited by urchgene; 06-21-2011, 05:08 AM.


                        • A_Morozov
                          Member
                          • Feb 2011
                          • 40

                          #13
                          I observed the same behaviour as in urchgene's post with velvet 1.2.08.

                          It seems the problem is simply a lack of RAM. I moved to a cluster with more of it and everything works fine. It would be nice if velvet could write proper crash logs, though.
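A back-of-the-envelope check in Python supports the RAM explanation for urchgene's run. It assumes that each "sequences found" line in that log is a per-file count, that the four visible lines are a lower bound on the input (the log excerpt starts partway through), and that "2400mb" means 2400 MiB. The variable names are made up for this sketch.

```python
# Per-file read counts copied from the visible part of urchgene's log.
sequences_found = [73728280, 103910882, 79934018, 251833816]

total_reads = sum(sequences_found)

# Assumption: 2400 MB is taken as 2400 MiB.
memory_bytes = 2400 * 1024 * 1024

bytes_per_read = memory_bytes / total_reads

print("reads (at least):", total_reads)
print("memory available per read, in bytes:", round(bytes_per_read, 2))
```

Under these assumptions there are fewer than five bytes of memory per read, before counting any hash table or graph structures.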
                          Last edited by A_Morozov; 02-12-2013, 09:39 PM.
