  • N50 or nodes?

    In de novo assembly, should we go for the highest N50 or the smallest number of generated nodes?

    Also, VelvetOptimiser might be good for optimizing the parameters, but I have never used it. Can you advise me how to call the VelvetOptimiser Perl script in the contrib directory? What is the command, please?

  • #2
    Also, is there a limit on the k-mer length for 250 bp reads? Is it OK to try up to 200 or above?

    • #3
      The L50 (length of the contig at which 50% of the assembly is in contigs at least that long) is important. The total number of contigs or nodes is not.
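As a sketch of the statistic described above: sort the contigs longest first and walk down the list until half of the total assembly length is covered; the contig you stop at gives the length. Many tools report this length as N50 and use L50 for the number of contigs needed to reach it (QUAST, for example), so this illustrative function (the name is my own) returns both.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths.

    N50: length of the contig at which the running total of lengths
    (longest first) first reaches half the assembly size.
    L50: how many contigs it takes to get there.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half:
            return length, count
    return 0, 0  # empty assembly


if __name__ == "__main__":
    # Toy contig lengths, invented for illustration.
    print(n50_l50([100, 200, 300, 400, 500]))
```

With these toy lengths the total is 1500, so the walk stops at the 400 bp contig, the second one visited.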

      As for k-mer length, just use whatever length gives the best continuity; the longest you can use depends on read length, read quality and sequencing depth. I think Velvet needs to be compiled with a maximum k-mer length, though, so if you try ~200 and it fails to run you may need to recompile. Also, it's generally good to avoid even k-mer lengths because of palindromes.
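The palindrome point can be checked directly. This sketch (plain Python, no assembler involved; the function names are illustrative) counts the DNA k-mers that equal their own reverse complement. For odd k there are none, because the middle base would have to be its own complement; for even k there are 4^(k/2). This is the commonly given rationale for odd hash lengths in de Bruijn assemblers such as Velvet, since a k-mer identical to its reverse complement is ambiguous about which strand it came from.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of an uppercase ACGT string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_rc_palindrome(kmer):
    """True if the k-mer is identical to its own reverse complement."""
    return kmer == reverse_complement(kmer)


def count_rc_palindromes(k):
    """Brute-force count over all 4**k k-mers (keep k small)."""
    return sum(is_rc_palindrome("".join(p)) for p in product("ACGT", repeat=k))


if __name__ == "__main__":
    for k in range(1, 7):
        print(k, count_rc_palindromes(k))
```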

      • #4
        A good paper on the subject of K-mer optimisation is:

        Zerbino, D. R. (2010). Using the Velvet de novo assembler for short-read sequencing technologies. Curr Protoc Bioinformatics, Chapter 11.

        It sets out best practices for assessing your de novo assembly. Below is the information you need for VelvetOptimiser:

        Home page:



        Manual:



        The manual gives the command and parameters you need.

        • #5
          For 250 bp reads, is there a limit on the k-mer value? The paper says the best k-mer should be between 21 bp and (average read length minus 10), so for 250 bp reads should k be between 21 and 100?
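Taking the rule exactly as quoted above (21 bp up to average read length minus 10) and keeping only odd values as post #3 suggests, the candidate k-mers can be listed as below. By that rule the upper bound for 250 bp reads is 250 - 10 = 240, so the largest odd k is 239. `max_k` is a hypothetical parameter standing in for the compile-time maximum mentioned in post #3 (Velvet's `MAXKMERLENGTH` build setting); the function name is illustrative.

```python
def kmer_range(read_length, min_k=21, margin=10, max_k=None):
    """Odd k-mer lengths from min_k up to (read_length - margin).

    max_k optionally caps the range, e.g. at the maximum k-mer length
    the assembler was compiled with (hypothetical parameter).
    """
    upper = read_length - margin
    if max_k is not None:
        upper = min(upper, max_k)
    start = min_k if min_k % 2 == 1 else min_k + 1  # keep k odd
    return list(range(start, upper + 1, 2))


if __name__ == "__main__":
    ks = kmer_range(250)
    print(ks[0], ks[-1], len(ks))
```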

          • #6
            K-mer length is an important parameter. The longer the k-mer, the less likely you are to see that sequence by chance; however, as k-mer length increases towards the read length, coverage drops and more gaps appear. It is a trade-off.
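The coverage side of this trade-off can be made concrete with the relation given in the Velvet manual between nucleotide coverage C and k-mer coverage Ck for reads of length L: Ck = C * (L - k + 1) / L. The sketch below assumes a hypothetical 50x nucleotide coverage with 250 bp reads and uses k values suggested in this thread; the function name is illustrative.

```python
def kmer_coverage(base_coverage, read_length, k):
    """k-mer coverage Ck from nucleotide coverage C, read length L and k,
    using Ck = C * (L - k + 1) / L (relation from the Velvet manual)."""
    if not 1 <= k <= read_length:
        raise ValueError("k must be between 1 and the read length")
    return base_coverage * (read_length - k + 1) / read_length


if __name__ == "__main__":
    # Hypothetical 50x nucleotide coverage with 250 bp reads.
    for k in (21, 95, 125, 145):
        print(k, round(kmer_coverage(50, 250, k), 1))
```

At k = 125 (half the read length) the k-mer coverage is already about half the nucleotide coverage, which is why very long k-mers tend to leave gaps.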

            Maybe start with a k-mer of half your read length: 125.

            Then run assemblies with k-mer lengths of 95, 115, 135 and 145 and look at what happens. This should help you understand what is going on under the hood when you change these parameters (provided you have the computing resources).

            The number of nodes and N50 go hand in hand: with fewer contigs you are likely to have a higher N50. So pick the k-mer length that assembles your genome into the smallest number of contigs with a good N50. Hope this helps.
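One way to encode this selection rule, as a minimal sketch: given the contig lengths from each k-mer run, rank runs by fewest contigs and break ties by higher N50. The contig lengths below are invented, not real assemblies, and the function names are illustrative. Post #3 weights N50 over contig count; swapping the two parts of the sort key would encode that view instead.

```python
def n50(lengths):
    """Contig length at which the running total (longest first)
    reaches half of the assembly size."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def pick_kmer(assemblies):
    """assemblies maps k -> list of contig lengths.
    Rank by fewest contigs first, then by highest N50."""
    def key(k):
        contigs = assemblies[k]
        return (len(contigs), -n50(contigs))
    return min(assemblies, key=key)


if __name__ == "__main__":
    # Invented contig lengths, for illustration only.
    toy = {
        95: [5000, 4000, 3000, 2000, 1000],
        115: [8000, 4000, 2000, 1000],
        125: [9000, 5000, 1000],
        135: [7000, 6000, 2000, 500],
    }
    best = pick_kmer(toy)
    print(best, len(toy[best]), n50(toy[best]))
```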

            • #7
              Yes, this is very useful. I will give it a go, thank you.

              • #8
                Good luck. I would definitely encourage reading as much of the literature on genome assembly as you can; there are many reviews out there. This advice is in no way comprehensive: to trust your assembly you will need to look at the coverage across the contigs to make sure there are no spikes, and perhaps compare it (e.g. with MUMmer) to a closely related genome.
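One simple way to act on the "no spikes" check is to compare each contig's mean read depth with the assembly-wide median and flag large outliers; contigs with much higher depth than the rest can indicate collapsed repeats. This sketch assumes you already have per-contig mean depths from a read mapping; the 2x threshold, the contig names and the function name are hypothetical.

```python
from statistics import median


def flag_coverage_spikes(contig_coverage, fold=2.0):
    """Return names of contigs whose mean depth exceeds fold x the median.

    contig_coverage: dict of contig name -> mean read depth.
    fold: hypothetical threshold; tune it for your data.
    """
    if not contig_coverage:
        return []
    typical = median(contig_coverage.values())
    return sorted(name for name, depth in contig_coverage.items()
                  if depth > fold * typical)


if __name__ == "__main__":
    # Invented depths, for illustration only.
    depths = {"NODE_1": 30.0, "NODE_2": 32.5, "NODE_3": 95.0, "NODE_4": 28.0}
    print(flag_coverage_spikes(depths))
```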

                • #9
                  Thanks again. I would like to try the VelvetOptimiser script, but how do I install BioPerl? (I already have Perl.)

                  • #10

                    • #11
                      Now how can I be sure that the assembly I got is the best? I tried different k-mers, picked the one with the highest N50, then adjusted exp_cov, then cov_cutoff, then ins_length until I got the highest N50.
                      How can we be sure this is the best assembly? Should I map the raw reads to the contigs file?
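The tuning procedure described above amounts to a greedy, one-parameter-at-a-time (coordinate-wise) search. A minimal sketch, assuming a `score` callable that in practice would run the assembly and return its N50; here `toy_score` is an invented stand-in so the example runs on its own, and its dictionary keys simply mirror the velvetg options named in the post. Because each parameter is fixed before the next is tuned, this kind of search can miss better combinations when parameters interact, and a higher score alone does not show that the assembly is correct.

```python
def coordinate_search(score, start, grid):
    """Greedy one-parameter-at-a-time search.

    score: callable mapping a parameter dict to a number to maximise
           (e.g. N50 of the resulting assembly).
    start: initial parameter dict.
    grid:  dict of parameter name -> candidate values; parameters are
           tuned in insertion order, each with the others held fixed.
    """
    best = dict(start)
    best_score = score(best)
    for name, values in grid.items():
        for value in values:
            trial = {**best, name: value}
            trial_score = score(trial)
            if trial_score > best_score:
                best, best_score = trial, trial_score
    return best, best_score


def toy_score(p):
    # Hypothetical stand-in for "run velveth/velvetg and report N50":
    # a smooth, separable function with an arbitrarily placed peak.
    return -((p["k"] - 61) ** 2 + (p["exp_cov"] - 30) ** 2
             + (p["cov_cutoff"] - 5) ** 2 + (p["ins_length"] - 350) ** 2 / 100)


if __name__ == "__main__":
    start = {"k": 31, "exp_cov": 20, "cov_cutoff": 2, "ins_length": 300}
    grid = {
        "k": range(31, 92, 10),
        "exp_cov": [20, 25, 30, 35],
        "cov_cutoff": [2, 3, 4, 5, 6],
        "ins_length": [300, 350, 400],
    }
    print(coordinate_search(toy_score, start, grid))
```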

                      • #12
                        Originally posted by mmmm
                        should I map the raw reads to the contigs file??
                        Yes. A better assembly will have a higher mapping rate, higher pairing rate, lower ambiguous mapping rate, and lower error rate.

                        • #13
                          After mapping the raw reads to the de novo assembled contigs from two different k-mers (27 and 85), which assembly is better according to the statistics below? Is it the k = 85 one? Annotation of the two contig sets gives slightly different results, and I am not sure which one to rely on.

                          k-mer 85 remapping statistics:
                          768224 + 0 in total (QC-passed reads + QC-failed reads)
                          0 + 0 duplicates
                          766555 + 0 mapped (99.78%:-nan%)
                          768224 + 0 paired in sequencing
                          384185 + 0 read1
                          384039 + 0 read2
                          760726 + 0 properly paired (99.02%:-nan%)
                          765163 + 0 with itself and mate mapped
                          1392 + 0 singletons (0.18%:-nan%)
                          3816 + 0 with mate mapped to a different chr
                          3576 + 0 with mate mapped to a different chr (mapQ>=5)

                          k-mer 27 remapping statistics:
                          773803 + 0 in total (QC-passed reads + QC-failed reads)
                          0 + 0 duplicates
                          772153 + 0 mapped (99.79%:-nan%)
                          773803 + 0 paired in sequencing
                          387022 + 0 read1
                          386781 + 0 read2
                          763731 + 0 properly paired (98.70%:-nan%)
                          770783 + 0 with itself and mate mapped
                          1370 + 0 singletons (0.18%:-nan%)
                          8196 + 0 with mate mapped to a different chr
                          7755 + 0 with mate mapped to a different chr (mapQ>=5)
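To compare the two runs on the criteria from post #12, the flagstat numbers above can be turned into rates. This sketch parses the `samtools flagstat` lines exactly as pasted in this post (newer samtools releases print additional lines, such as secondary and supplementary counts, which this parser would simply store under their own descriptions) and reports the mapped and properly paired percentages, plus the share of reads with both mates mapped whose mate lands on a different contig ("chr" in flagstat's wording). The function names are illustrative.

```python
import re

# samtools flagstat output copied from the post above.
K85 = """\
768224 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
766555 + 0 mapped (99.78%:-nan%)
768224 + 0 paired in sequencing
384185 + 0 read1
384039 + 0 read2
760726 + 0 properly paired (99.02%:-nan%)
765163 + 0 with itself and mate mapped
1392 + 0 singletons (0.18%:-nan%)
3816 + 0 with mate mapped to a different chr
3576 + 0 with mate mapped to a different chr (mapQ>=5)
"""

K27 = """\
773803 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
772153 + 0 mapped (99.79%:-nan%)
773803 + 0 paired in sequencing
387022 + 0 read1
386781 + 0 read2
763731 + 0 properly paired (98.70%:-nan%)
770783 + 0 with itself and mate mapped
1370 + 0 singletons (0.18%:-nan%)
8196 + 0 with mate mapped to a different chr
7755 + 0 with mate mapped to a different chr (mapQ>=5)
"""

LINE = re.compile(r"^(\d+) \+ (\d+) (.*)$")
PCT = re.compile(r"\s*\([^)]*%[^)]*\)")  # e.g. " (99.78%:-nan%)"


def parse_flagstat(text):
    """Map each flagstat description (percentages stripped) to its
    QC-passed count."""
    counts = {}
    for line in text.strip().splitlines():
        m = LINE.match(line.strip())
        if m:
            counts[PCT.sub("", m.group(3)).strip()] = int(m.group(1))
    return counts


def summarize(counts):
    """Percentages of mapped and properly paired reads, and of
    both-mates-mapped reads whose mate is on another contig."""
    total = counts["in total (QC-passed reads + QC-failed reads)"]
    both_mapped = counts["with itself and mate mapped"]
    return {
        "total": total,
        "mapped_pct": 100 * counts["mapped"] / total,
        "properly_paired_pct": 100 * counts["properly paired"] / total,
        "mate_other_contig_pct":
            100 * counts["with mate mapped to a different chr"] / both_mapped,
    }


if __name__ == "__main__":
    keys = ("mapped_pct", "properly_paired_pct", "mate_other_contig_pct")
    for label, text in (("k=85", K85), ("k=27", K27)):
        s = summarize(parse_flagstat(text))
        print(label, s["total"], [round(s[key], 2) for key in keys])
```

On these numbers the mapped rates are essentially equal (99.78% vs 99.79%), the properly paired rate is 99.02% for k = 85 vs 98.70% for k = 27, and the share of mates on a different contig is about 0.5% vs 1.06%.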

                          • #14
                            Why are the total read counts different, even though the same FASTQ files were used (only the k-mer differs)? I don't think that makes sense.

                            • #15
                              Any advice on this, please? Thanks.
