
  • velvet assembler, roadmaps

    Hello. Velvet first hashes all the reads, creating so-called "roadmaps". Could someone explain exactly how that helps in constructing the de Bruijn graph? Thank you.

  • #2
    Your question is probably better directed to the velvet-users list. Daniel Zerbino recently posted this answer to a similar question:

    Also, have you read the Velvet paper and manual?



    Code:
    The Roadmap file is not normally meant to be parsed by the user, it is simply internal to velvet.
    
    However, if you really want to know the format is:
    
    ROADMAP    $query_id
    $target_id    $prev_kmers    $target_start    $target_finish
    etc.
    
    Where:
    $target_id is negative if on reverse strand
    [ $target_start, $target_finish [ is the domain of the target sequence being aligned (note that it is inclusive at the start and exclusive at the finish)
    $prev_kmers is the number of unaligned k-mers in the query sequence before the alignment to the target (note in the examples below how this value does not necessarily change from alignment to alignment, this means that they are contiguous on the query.)
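
The record layout quoted above can be read with a few lines of Python. This is only a minimal sketch that assumes whitespace-separated fields exactly as described in the quoted answer; the names `Annotation` and `parse_roadmaps` and the sample text are hypothetical, and any line before the first ROADMAP record is simply skipped rather than interpreted.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Annotation:
    target_id: int      # negative when the target is on the reverse strand
    prev_kmers: int     # unaligned query k-mers before this alignment
    target_start: int   # inclusive
    target_finish: int  # exclusive

    @property
    def reverse_strand(self) -> bool:
        return self.target_id < 0


def parse_roadmaps(text: str) -> Dict[int, List[Annotation]]:
    """Group alignment lines under the ROADMAP record that precedes them."""
    roadmaps: Dict[int, List[Annotation]] = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "ROADMAP":
            current = roadmaps.setdefault(int(fields[1]), [])
        elif current is not None and len(fields) == 4:
            current.append(Annotation(*map(int, fields)))
    return roadmaps


# Made-up sample in the quoted layout, not taken from a real Velvet run.
sample = "ROADMAP 1\nROADMAP 2\n1\t0\t1\t3\n-1\t4\t0\t2\n"
for query_id, annotations in parse_roadmaps(sample).items():
    print(query_id, annotations)
```

As the quoted answer says, the file is internal to Velvet, so a parser like this is useful for inspection only.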



    • #3
      I've read those papers and searched and thought a lot about this, but I still have no idea how it works exactly, only assumptions. I need it for a presentation. Hashing k-mers does not seem to save any time when regions of two reads overlap by only k-1 characters; it helps only when two or more reads share identical sequences of at least length k.

      Example:
      ACCTCGAT GCTCTAGG
      ACCT GCTC
      CCTC CTCT
      CTCG TCTA
      TCGA CTAG
      CGAT TAGG
      CCTC and CTCT overlap by k-1 characters, but the roadmaps are of no help here because they don't link these two reads, even though the reads are connected through those two k-mers.

      PS. Thx for the tip, just found the list and wrote to them.
      Last edited by bioinf; 01-11-2011, 04:59 AM.



      • #4
        Originally posted by bioinf View Post
        Only in cases when you have at least k length identical sequences in two or more reads.
        My limited understanding is that needs to be the case. Either way I'm sure you'll get the answer on the Velvet mailing list.



        • #5
          Daniel was very kind to write to me personally. What he wrote is:
          In the assembly process of Velvet, an arc (i.e. a k-1 mer overlap between two k-mers) is only instantiated if a read actually goes from one k-mer to the next.

          In your case, no read goes from CCTC to CTCT (simply because sequence CCTCT does not exist) therefore it is not picked up.
          Now I have to understand his reply. But why is it required that we have k identical chars? De Bruijn graph requires k-1 overlaps and throughout the paper we're told that arcs represent overlaps in k-1 chars.



          • #6
            Originally posted by bioinf View Post
            Daniel was very kind to write to me personally. What he wrote is:

            In the assembly process of Velvet, an arc (i.e. a k-1 mer overlap between two k-mers) is only instantiated if a read actually goes from one k-mer to the next.

            In your case, no read goes from CCTC to CTCT (simply because sequence CCTCT does not exist) therefore it is not picked up.
            Now I have to understand his reply. But why is it required that we have k identical chars? De Bruijn graph requires k-1 overlaps and throughout the paper we're told that arcs represent overlaps in k-1 chars.

            Where does it say you need k identical chars? You need an identical overlap of k-1 characters.

            The example he gives means that some read needs to contain CCTCT before the arc is picked up. Neither of the reads in your example (ACCTCGAT and GCTCTAGG) contains CCTCT.
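
The rule described in the quoted reply (an arc exists only if some read actually runs from one k-mer to the next) can be checked on the two example reads. This is a minimal sketch, not Velvet's code: it works on the forward strand only, and the function names `kmers` and `read_supported_arcs` are made up for illustration.

```python
def kmers(read, k):
    """All k-mers of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def read_supported_arcs(reads, k):
    """Arcs (kmer, next_kmer) that are traversed by at least one read."""
    arcs = set()
    for read in reads:
        ks = kmers(read, k)
        arcs.update(zip(ks, ks[1:]))
    return arcs


reads = ["ACCTCGAT", "GCTCTAGG"]
arcs = read_supported_arcs(reads, 4)
print(("CCTC", "CTCG") in arcs)  # True: read 1 goes from CCTC to CTCG
print(("CCTC", "CTCT") in arcs)  # False: no read contains CCTCT

# A hypothetical third read that does contain CCTCT creates the arc.
arcs = read_supported_arcs(reads + ["ACCTCTAG"], 4)
print(("CCTC", "CTCT") in arcs)  # True
```

So a k-1 overlap between two k-mers is necessary for an arc but not sufficient; the (k+1)-mer spanning both must occur in a read.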



            • #7
              OK, I see now. If the second read contains CCTCT, it contains the two k-mers CCTC and CTCT, so the same k-mer CCTC occurs in both reads. I guess what he implicitly meant is that two reads must share an identical k-mer X before an arc can be drawn from X in the first read to the k-mer Y that directly follows X in the second read.

              It is then in some sense related to the transitivity rule.
              There is X in read 1
              There is X in read 2
              There is Y in read 2 connected to X in read 2
              -----
              Conclusion: draw arc from X of read 1 to Y of 2; thus you clearly see the connection between read 1 and read 2

              Am I warmer?
              Last edited by bioinf; 01-12-2011, 02:15 AM.
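
The shared-k-mer reasoning in this post is where a hash table helps: each k-mer of a new read can be looked up in constant expected time to find an earlier read that already contains it. The sketch below is a deliberately simplified illustration under stated assumptions: it uses a plain Python dict, considers the forward strand only, and records every shared k-mer individually. Velvet itself uses a splay table (see the velveth log later in the thread) and its own Roadmaps format; `build_roadmaps` is a hypothetical name.

```python
def build_roadmaps(reads, k):
    """For each read, list the k-mers already seen in an earlier read.

    Each hit is (offset in this read, k-mer, earlier read index,
    offset in the earlier read).
    """
    first_seen = {}  # k-mer -> (read index, offset) of its first occurrence
    roadmaps = []
    for read_id, read in enumerate(reads):
        hits = []
        for offset in range(len(read) - k + 1):
            kmer = read[offset:offset + k]
            if kmer in first_seen:
                owner_id, owner_offset = first_seen[kmer]
                if owner_id != read_id:
                    hits.append((offset, kmer, owner_id, owner_offset))
            else:
                first_seen[kmer] = (read_id, offset)
        roadmaps.append(hits)
    return roadmaps


# Reads that share the 5-mer CCTAG: the second read points back to the first.
print(build_roadmaps(["ACCTAG", "CCTAGAACG"], 5))
# [[], [(0, 'CCTAG', 0, 1)]]

# The earlier example: no shared 4-mer, so nothing links the two reads.
print(build_roadmaps(["ACCTCGAT", "GCTCTAGG"], 4))
# [[], []]
```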



              • #8
                Yes, I think that is it. However, since you are already in contact with Daniel Zerbino, it would be better to ask him: he developed Velvet and knows it better than anyone on this forum.



                • #9
                  Yes, I think that is it. However, since you are already in contact with Daniel Zerbino, it would be better to ask him: he developed Velvet and knows it better than anyone on this forum.
                  But thx for help guys. You helped a lot anyway.


                  I tried to make a small demo of how velvet works from the command line. Here is my fasta file:
                  >SEQUENCE_0
                  ACCTAG
                  >SEQUENCE_1
                  CCTAGAACG
                  When I try to assemble these two reads by doing:
                  velveth result/ 5 sequence.fst
                  velvetg result/ -min_contig_lgth 1 -long_mult_cutoff 1
                  What I get is an empty contigs.fa. Why does this happen?

                  Here is the output:
                  velveth:
                  [0.000001] Reading FastA file sequence.fst;
                  [0.000087] 2 sequences found
                  [0.000097] Done
                  [0.000155] Reading read set file result//Sequences;
                  [0.000192] 2 sequences found
                  [0.000759] Done
                  [0.000770] 2 sequences in total.
                  [0.000805] Writing into roadmap file result//Roadmaps...
                  [0.000832] Inputting sequences...
                  [0.000839] Inputting sequence 0 / 2
                  [0.000960] Done inputting sequences
                  [0.000969] Destroying splay table
                  [0.001020] Splay table destroyed
                  velvetg:
                  [0.000001] Reading roadmap file result//Roadmaps
                  [0.000102] 2 roadmaps reads
                  [0.000128] Creating insertion markers
                  [0.000138] Ordering insertion markers
                  [0.000156] Counting preNodes
                  [0.000165] 2 preNodes counted, creating them now
                  [0.000220] Adjusting marker info...
                  [0.000230] Connecting preNodes
                  [0.000260] Cleaning up memory
                  [0.000267] Done creating preGraph
                  [0.000274] Concatenation...
                  [0.000288] Renumbering preNodes
                  [0.000294] Initial preNode count 2
                  [0.000304] Destroyed 1 preNodes
                  [0.000311] Concatenation over!
                  [0.000317] Clipping short tips off preGraph
                  [0.000325] Concatenation...
                  [0.000330] Renumbering preNodes
                  [0.000336] Initial preNode count 1
                  [0.000343] Destroyed 1 preNodes
                  [0.000349] Concatenation over!
                  [0.000355] 1 tips cut off
                  [0.000361] 0 nodes left
                  [0.000412] Writing into pregraph file result//PreGraph...
                  [0.000489] Reading read set file result//Sequences;
                  [0.000533] 2 sequences found
                  [0.000605] Done
                  [0.000659] Reading pre-graph file result//PreGraph
                  [0.000690] Graph has 0 nodes and 2 sequences
                  [0.000712] Correcting graph with cutoff 0.200000
                  [0.000945] Determining eligible starting points
                  [0.000959] Done listing starting nodes
                  [0.000966] Initializing todo lists
                  [0.000972] Done with initilization
                  [0.000979] Activating arc lookup table
                  [0.000985] Done activating arc lookup table
                  [0.000992] Concatenation...
                  [0.000998] Renumbering nodes
                  [0.001004] Initial node count 0
                  [0.001012] Removed 0 null nodes
                  [0.001022] Concatenation over!
                  [0.001031] Clipping short tips off graph, drastic
                  [0.001037] Concatenation...
                  [0.001043] Renumbering nodes
                  [0.001049] Initial node count 0
                  [0.001055] Removed 0 null nodes
                  [0.001062] Concatenation over!
                  [0.001067] 0 nodes left
                  [0.001118] Writing into graph file result//Graph...
                  [0.001168] WARNING: NO COVERAGE CUTOFF PROVIDED
                  [0.001177] Velvet will probably leave behind many detectable errors
                  [0.001184] See manual for instructions on how to set the coverage cutoff parameter
                  [0.001195] Removing contigs with coverage < -1.000000...
                  [0.001207] Concatenation...
                  [0.001213] Renumbering nodes
                  [0.001218] Initial node count 0
                  [0.001225] Removed 0 null nodes
                  [0.001232] Concatenation over!
                  [0.001238] Concatenation...
                  [0.001244] Renumbering nodes
                  [0.001249] Initial node count 0
                  [0.001256] Removed 0 null nodes
                  [0.001262] Concatenation over!
                  [0.001271] Clipping short tips off graph, drastic
                  [0.001278] Concatenation...
                  [0.001284] Renumbering nodes
                  [0.001290] Initial node count 0
                  [0.001296] Removed 0 null nodes
                  [0.001302] Concatenation over!
                  [0.001308] 0 nodes left
                  [0.001314] WARNING: NO EXPECTED COVERAGE PROVIDED
                  [0.001321] Velvet will be unable to resolve any repeats
                  [0.001327] See manual for instructions on how to set the expected coverage parameter
                  [0.001335] Concatenation...
                  [0.001340] Renumbering nodes
                  [0.001346] Initial node count 0
                  [0.001353] Removed 0 null nodes
                  [0.001359] Concatenation over!
                  [0.001393] Writing contigs into result//contigs.fa...
                  [0.001427] Writing into stats file result//stats.txt...
                  [0.001788] Writing into graph file result//LastGraph...
                  [0.001859] EMPTY GRAPH
                  Final graph has 0 nodes and n50 of 0, max 0, total 0, using 0/2 reads
                  Last edited by bioinf; 01-12-2011, 02:15 AM.



                  • #10
                    This probably has something to do with your coverage.

                    Try lengthening your first read and duplicating the reads to increase coverage:

                    >SEQUENCE_0
                    ACCTAGAACGT
                    >SEQUENCE_1
                    ACCTAGAACGT
                    >SEQUENCE_2
                    CCTAGAACGTT
                    >SEQUENCE_3
                    CCTAGAACGTT

                    This should work...
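
To see how the two inputs differ, one can count distinct 5-mers and how many times each is observed. This is only a rough comparison under simplifying assumptions (forward strand only, no reverse complements) and it does not reproduce Velvet's own coverage calculation or its tip-clipping rules; `kmer_coverage` is a hypothetical helper.

```python
from collections import Counter


def kmer_coverage(reads, k):
    """Count how many times each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


original = ["ACCTAG", "CCTAGAACG"]
suggested = ["ACCTAGAACGT", "ACCTAGAACGT", "CCTAGAACGTT", "CCTAGAACGTT"]

for name, reads in (("original", original), ("suggested", suggested)):
    cov = kmer_coverage(reads, 5)
    print(name, len(cov), min(cov.values()), max(cov.values()))
# original 6 1 2
# suggested 8 2 4
```

The suggested reads yield a longer path through the graph (8 distinct 5-mers instead of 6) and every 5-mer is seen at least twice, whereas in the original input all but one 5-mer is seen only once.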



                    • #11
                      Yes, worked for me. Thx!



                      • #12
                        Hi all,

                        I am running a de novo assembly with the Velvet color-space build, and when I run velveth_de it terminates halfway, before it writes the Roadmaps file. I have no idea what the problem is. I am running this on a cluster with 12 cores and 2400 MB of memory. The version is 1.1.04.

                        This is what I see on the terminal:

                        [2468.477166] 73728280 sequences found
                        [2468.477184] Done
                        [2468.477501] Reading FastA file /v/users/okeke/reads/547_3H_doubleEncoded_input.de;
                        [2641.750344] 103910882 sequences found
                        [2641.750359] Done
                        [2641.750610] Reading FastA file /v/users/okeke/reads/547_1D_doubleEncoded_input.de;
                        [2765.845768] 79934018 sequences found
                        [2765.845785] Done
                        [2765.846042] Reading FastA file /v/users/okeke/reads/547_4D_doubleEncoded_input.de;
                        [3176.325687] 251833816 sequences found
                        [3176.325703] Done
                        [3176.434276] Reading read set file pine_denovo/Sequences;
                        Killed
                        2804.848u 794.606s 1:04:54.36 92.4% 0+0k 0+0io 30pf+0w

                        Thanks for your help in advance.
                        Last edited by urchgene; 06-21-2011, 05:08 AM.



                        • #13
                          I observed the same behaviour as in urchgene's post with Velvet 1.2.08.

                          The problem seems to be simply a lack of RAM. I moved to a cluster with more memory and everything works fine. It would be nice if Velvet wrote proper crash logs, though.
                          Last edited by A_Morozov; 02-12-2013, 09:39 PM.
