  • #16
    Can someone please explain why we need the HashMap, and why we store in it the id of the first read where each k-mer was encountered? Is it not sufficient to walk the graph and write down the k-mers to rebuild the original sequence? What else is this HashMap used for?



    • #17
      I don't know where you got the word "HashMap" from - I think that is Java. Any association between reads and their kmers is for the purposes of paired-end resolution and read usage statistics.

      Are you going to put this presentation up somewhere?
      --
      Jeremy Leipzig
      Bioinformatics Programmer
      --
      My blog
      Twitter



      • #18
        I'm not a biologist, I'm a programmer. A hash map is not tied to any specific language (Java, C++, etc.); it is a data structure that gives O(1) constant-time access to an element, at least on average. The article says that we keep the information about the first occurrence of each k-mer in the hash map. What I don't get is why we would need this information for a traceback. I can assemble the sequence by just following the arcs and writing down the k-mers. Why would I need information about the reads represented by those k-mers after the graph is already constructed? Does it mean that the hash map is needed only for the construction itself? (Question to all who might know.)

        "Are you going to put this presentation up somewhere?"
        Of course. This is my seminar presentation at the Uni.

        "Any association between reads and their kmers is for the purposes of paired-end resolution and read usage statistics."
        It can't be used for usage statistics, since the hash map holds only the first read in which a given k-mer was found. There might be several reads with the same k-mer, but we have the location of only one such read at our disposal.

        Intuitively, I think it is done to link up all the reads that share a k-mer. The read set is processed one read at a time, and each k-mer is added to the hash map together with the id of the first read where it was found. Any later request from another read to store the same k-mer is denied. Afterwards, when all the information is stored, we walk through all the reads again. Each time a k-mer of some read is retrieved, it is looked up in the hash map, where we find the id of the read in which it first appeared, so we can link the two reads. The same is done for every following read, which gives a one-to-many correspondence. That is what I assume from the paper, since it is not stated clearly there, but I can't present my assumptions on the slides.
        Last edited by bioinf; 01-06-2011, 10:57 AM.
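The two-pass idea described in this post can be sketched in a few lines of Python. This is only an illustration of the poster's reading of the paper, not Velvet's actual code: the function names `build_first_occurrence` and `link_reads` are made up for this example, reverse complements are ignored, and a plain `dict` stands in for the hash map.

```python
def kmers(read, k):
    """Yield (offset, k-mer) for every k-mer in the read."""
    for i in range(len(read) - k + 1):
        yield i, read[i:i + k]


def build_first_occurrence(reads, k):
    """First pass: map each k-mer to (read_id, offset) of the first read holding it."""
    first = {}
    for read_id, read in enumerate(reads):
        for offset, kmer in kmers(read, k):
            # setdefault keeps the existing entry, so later reads are "denied".
            first.setdefault(kmer, (read_id, offset))
    return first


def link_reads(reads, k, first):
    """Second pass: link each read to the earlier reads that first held its k-mers."""
    links = {}
    for read_id, read in enumerate(reads):
        for _, kmer in kmers(read, k):
            origin, _ = first[kmer]
            if origin != read_id:
                links.setdefault(read_id, set()).add(origin)
    return links


# Toy reads (hypothetical data): read 1 overlaps read 0 on the k-mer GTAC.
reads = ["ACGTAC", "GTACGG", "TTTTTT"]
first = build_first_occurrence(reads, 4)
links = link_reads(reads, 4, first)
print(first["GTAC"])  # (0, 2): first seen in read 0 at offset 2
print(links)          # {1: {0}}: read 1 overlaps read 0
```

Under this reading, the map is a construction-time index: it lets each read find the earlier read it overlaps in one lookup, without comparing every read against every other read.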



        • #19
          Going back to the biological details: could you please explain how repeats in the DNA lead to gaps between contigs? Yes, repeated regions get overlapped although they shouldn't be, but how does that lead to "gaps"? Since Velvet cuts all tips longer than 2k, whenever a repeat followed by a long stretch of sequence is overlapped onto a k-mer that was found earlier, such a "tip" will be discarded.
          Last edited by bioinf; 01-08-2011, 11:31 AM.



          • #20
            @bioinf: I am not sure I fully get your question, but here are my two cents. If there is a repeat, then either a node will be reported with a coverage higher than the expected coverage, or there will be a loop. In the latter case the assembler, while making contigs, does not know how many times the repeat occurs, so it cannot connect the contigs to the right and left of the repeat and reports them as two different contigs with a gap in between...
            As far as tips are concerned, I couldn't connect "tips" with "repeats": as I understand it, tips occur when there is a sequencing error at the end of a read, which has nothing to do with repeats.
            Please do correct me if I am wrong, as I am also trying to understand the logic of Velvet.
            Can you also post your presentation or email me?

            - Parit
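To illustrate the point about repeats breaking contigs, here is a toy Python sketch. It is a heavily simplified model and not how Velvet is implemented: the graph is built from a single error-free forward-strand sequence, there is no coverage or paired-end information, and `de_bruijn`, `unitigs`, and the example sequence are all invented for this illustration. Contigs are taken to be maximal non-branching paths, and isolated cycles are ignored.

```python
from collections import defaultdict


def de_bruijn(seq, k):
    """Nodes are k-mers; an edge joins each k-mer to the one starting one base later."""
    succ, pred = defaultdict(set), defaultdict(set)
    for i in range(len(seq) - k):
        a, b = seq[i:i + k], seq[i + 1:i + k + 1]
        succ[a].add(b)
        pred[b].add(a)
    return succ, pred


def unitigs(succ, pred):
    """Spell out every maximal non-branching path (isolated cycles are skipped)."""
    nodes = set(succ) | set(pred)

    def is_start(n):
        # A path starts where the incoming side branches or is missing.
        return len(pred[n]) != 1 or len(succ[next(iter(pred[n]))]) != 1

    contigs = []
    for n in nodes:
        if not is_start(n):
            continue
        path, cur = n, n
        while len(succ[cur]) == 1:
            nxt = next(iter(succ[cur]))
            if len(pred[nxt]) != 1:
                break  # nxt is entered from several places: stop before it
            path += nxt[-1]
            cur = nxt
        contigs.append(path)
    return sorted(contigs)


# Hypothetical sequence in which CTTACA occurs twice with different flanks.
genome = "ATGGCTTACAGGATCTTACACCGAT"
succ, pred = de_bruijn(genome, 4)
contigs = unitigs(succ, pred)
print(contigs)  # ['ACACCGAT', 'ACAGGATCTT', 'ATGGCTT', 'CTTACA']
```

Both copies of CTTACA collapse onto the same nodes, so that stretch has two ways in and two ways out. Without extra information the walk cannot tell which exit follows which entrance, and the sequence comes out as four pieces instead of one.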



            • #21
              yes please post it
              --
              Jeremy Leipzig
              Bioinformatics Programmer
              --
              My blog
              Twitter



              • #22
                For repeats, you can have a look at his dissertation

                See Chapter 4. Hope this makes it more clear.

                Boetsie



                • #23
                  Is this presentation available?
                  --
                  Jeremy Leipzig
                  Bioinformatics Programmer
                  --
                  My blog
                  Twitter



                  • #24
                    The dude seems to have vanished :O Hope the presentation went fine.



                    • #25
                      Hey guys,
                      was anyone able to compile Velvet 1.1.04, released yesterday by D. Zerbino?

                      Code:
                      src/readSet.c:34: fatal error: zlib.h: File or directory not found
                      compilation terminated.
                      Hope someone has an idea, thanks a lot!

                      Edit: Problem is solved, thanks a lot!
                      Last edited by Jenzo; 05-20-2011, 12:40 AM. Reason: Problem solved



                      • #26
                        Originally posted by Jenzo View Post
                        Hey guys,
                        was anyone able to compile Velvet 1.1.04, released yesterday by D. Zerbino?

                        Code:
                        src/readSet.c:34: fatal error: zlib.h: File or directory not found
                        compilation terminated.
                        Hope someone has an idea, thanks a lot!

                        Edit: Problem is solved, thanks a lot!
                        So what was the solution?



                        • #27
                          You can copy the *.o files in third-party/zlib-1.2.3 from an older Velvet version. I am pretty sure that they have not changed.



                          • #28
                            Originally posted by nilshomer View Post
                            So what was the solution?
                            I'm going to hazard a guess that they had to either install zlib or modify the makefile to link up correctly.



                            • #29
                              Daniel Zerbino wrote today:
                              Dear all,

                              my sincere apologies for the compilation bug which was lying in the
                              recently updated code. I have just updated the repositories. Thanks to
                              Sylvain Forêt for quickly correcting it.
                              [...]
                              Regards,

                              Daniel



                              • #30
                                Yup Jenzo, I also got this email, but the Oases compilation bug "src/readSet.c:34: fatal error: zlib.h: File or directory not found compilation terminated." is still there. ;-)
