  • Is high k-mer coverage in PE reads good?

    Hi,
    I am doing a de novo assembly of a bacterial genome from paired-end reads using Velvet.
    If I use a k-mer length of 75 in the k-mer coverage formula, along with my other parameters, I get:

    Ck = C * (L - k + 1) / L
    Ck = (2935775 * 250 * 2 / 2900000) * (250 - 75 + 1) / 250
       ≈ 356

    where read length L = 250, number of reads = 2935775 (per file, paired), and expected genome size = 2900000.
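
    A quick way to re-check that arithmetic from the shell (all values taken from above; just a sanity check, nothing Velvet-specific):

    awk 'BEGIN {
        reads = 2935775                # reads per file (paired)
        L = 250; k = 75; G = 2900000   # read length, k-mer size, genome size
        C = reads * L * 2 / G          # nucleotide coverage, ~506x
        Ck = C * (L - k + 1) / L       # k-mer coverage, ~356x
        printf "C = %.1f, Ck = %.1f\n", C, Ck
    }'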

    The Velvet manual says: if Ck is above 20, you might be "wasting" coverage. What does that mean, and which k-mer length should I choose?

  • #2
    Typically, the longer the kmer, the better the assembly, until you hit the point of too little coverage. Since you have nice long reads and fairly high coverage, you will probably get a better assembly with longer kmers, maybe K=127 or even higher. The assembly should be fast, so just try with a range of kmers and look at the L50 to see which appears to be best.
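
    For example, a sweep could look something like this (a rough sketch: directory names are placeholders, Velvet wants odd k, and your binary must be compiled with MAXKMERLENGTH at least as large as the biggest k):

    for k in 95 111 127 143 159; do
        velveth asm_k$k $k -shortPaired -separate -fastq file1.fastq file2.fastq
        velvetg asm_k$k -exp_cov auto -cov_cutoff auto
        grep "n50" asm_k$k/Log   # the final summary line reports n50, max and total length
    done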

    However, I find that Velvet assemblies often get worse when the coverage is really high, so you may want to reduce it with normalization or subsampling.
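
    Depth normalization could look something like this (an illustrative sketch assuming BBTools' bbnorm.sh is installed; the 100x target is arbitrary):

    bbnorm.sh in=file1.fastq in2=file2.fastq out=norm1.fastq out2=norm2.fastq target=100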



    • #3
      Your genome coverage is pretty high, so I would recommend subsampling the data and then trying VelvetOptimiser to find the best assembly parameters. The benefit of subsampling is usually a better assembly, but it also means the job will have lower computational requirements and faster run times.
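
      If you need a subsampling tool, seqtk is one option (a sketch; note that the -s seed must be identical for both files so the pairs stay in sync):

      seqtk sample -s100 file1.fastq 0.4 > sub1.fastq
      seqtk sample -s100 file2.fastq 0.4 > sub2.fastq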



      • #4
        Hi, I tried subsampling (40% of the original number of reads) and ran VelvetOptimiser on it, but got this result:
        Velvet details:
        Velvet version: 1.2.08
        Compiled categories: 10
        Compiled max kmer length: 191
        Maximum number of velvetinstances to run: 1
        Will run velvet optimiser with the following paramters:
        Velveth parameter string:
        -shortPaired -fastq file1.fastq -shortPaired2 -fastq file2.fastq
        Velveth start hash values: 151
        Velveth end hash value: 153
        Velveth hash step value: 2
        Velvetg minimum coverage cutoff to use: 0

        Read tracking for final assembly off.
        File: file1.fastq has 1174310 reads of length 250
        File: file2.fastq has 1174310 reads of length 250
        Total reads: 2.3 million. Avg length: 250.0

        Memory use estimated to be: 230512.9GB for 1 threads.

        You probably won't have enough memory to run this job.
        Try decreasing the maximum number of threads used.
        (use the -t option to set max threads.)

        Any ideas what should be done about this?
        Thanks



        • #5
          Apart from this, I ran velvetg on the subsample for k-mer hash lengths 75-151, with k-mer coverage 320-370 for each k-mer length, and then compared all the results by N50. Below is the result:
          row  file                   n50    total.length  longest  ncontig
          154  stats_h151_cov350.txt  20880  33577         20880    6
          155  stats_h151_cov360.txt  20880  33577         20880    6
          156  stats_h151_cov370.txt  20880  33577         20880    6
          151  stats_h151_cov320.txt  20880  33538         20880    6
          152  stats_h151_cov330.txt  20880  33518         20880    6
          153  stats_h151_cov340.txt  20880  33518         20880    6
          148  stats_h149_cov350.txt  20878  33569         20878    6
          149  stats_h149_cov360.txt  20878  33569         20878    6
          150  stats_h149_cov370.txt  20878  33569         20878    6
          146  stats_h149_cov330.txt  20878  33533         20878    6
          147  stats_h149_cov340.txt  20878  33508         20878    6
          145  stats_h149_cov320.txt  20878  33468         20878    6


          The highest N50 comes from the longest k-mer lengths, but the total length is very small (about 33.5 kb for an expected ~2.9 Mb genome). Is something wrong?


          • #6
            Originally posted by konika View Post
            Velvet details:
            Velvet version: 1.2.08
            Compiled categories: 10
            Try recompiling Velvet with 'CATEGORIES=1'. I am assuming you only have 1 set of PE reads.


            Originally posted by konika View Post
            Will run velvet optimiser with the following paramters:
            Velveth parameter string:
            -shortPaired -fastq file1.fastq -shortPaired2 -fastq file2.fastq
            What parameters did you use to run VelvetOptimiser? It looks like you should have

            '-shortPaired -separate -fastq file1.fastq file2.fastq'

            if what you have is PE reads.



            • #7
              Originally posted by konika View Post
              Apart from this, I ran velvetg on the subsample for k-mer hash lengths 75-151, with k-mer coverage 320-370 for each k-mer length, and then compared all the results by N50. Below is the result:
              row  file                   n50    total.length  longest  ncontig
              154  stats_h151_cov350.txt  20880  33577         20880    6
              155  stats_h151_cov360.txt  20880  33577         20880    6
              156  stats_h151_cov370.txt  20880  33577         20880    6
              151  stats_h151_cov320.txt  20880  33538         20880    6
              152  stats_h151_cov330.txt  20880  33518         20880    6
              153  stats_h151_cov340.txt  20880  33518         20880    6
              148  stats_h149_cov350.txt  20878  33569         20878    6
              149  stats_h149_cov360.txt  20878  33569         20878    6
              150  stats_h149_cov370.txt  20878  33569         20878    6
              146  stats_h149_cov330.txt  20878  33533         20878    6
              147  stats_h149_cov340.txt  20878  33508         20878    6
              145  stats_h149_cov320.txt  20878  33468         20878    6


              The highest N50 comes from the longest k-mer lengths, but the total length is very small (about 33.5 kb for an expected ~2.9 Mb genome). Is something wrong?


              The differences in N50 and total length are very small; I doubt they are significant. How did you calculate the coverage? Did VelvetOptimiser calculate it? Velvet doesn't do so well with very high coverage.


              • #8
                Originally posted by mastal View Post
                Try recompiling Velvet with 'CATEGORIES=1'. I am assuming you only have 1 set of PE reads.




                What parameters did you use to run VelvetOptimiser? It looks like you should have

                '-shortPaired -separate -fastq file1.fastq file2.fastq'

                if what you have is PE reads.
                Hi, yes, I am using just one set of paired-end reads; I will try the corrected command and add 'CATEGORIES=1'. The new command will be:


                VelvetOptimiser.pl -s 75 -e 159 -t 1 -f '-shortPaired -separate -fastq file1.fastq file2.fastq' --optFuncKmer 'n50' -g 2800000 -o '-exp_cov 350' 'CATEGORIES=1'
                Does this look OK?



                • #9
                  Originally posted by mastal View Post
                  The differences in N50 and total length are very small; I doubt they are significant. How did you calculate the coverage? Did VelvetOptimiser calculate it? Velvet doesn't do so well with very high coverage.
                  Hi, I described the k-mer coverage calculation in my first post. This result was from a subsample of the total reads. Yes, it is quite high coverage; I am waiting for VelvetOptimiser to run.


                  • #10
                    I am not sure you can adjust the number of categories when running VelvetOptimiser; it may just be reporting the settings your Velvet binary was compiled with.

                    You may need to recompile Velvet. Reducing MAXKMERLENGTH (to the longest k-mer you are actually going to use) when recompiling should also reduce the memory usage.
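
                    For reference, those options are passed to make when (re)building Velvet, along these lines (a sketch; set MAXKMERLENGTH to cover your largest planned k):

                    cd velvet/
                    make clean
                    make 'CATEGORIES=1' 'MAXKMERLENGTH=160'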



                    • #11
                      Originally posted by konika View Post
                      Hi, I described the k-mer coverage calculation in my first post. This result was from a subsample of the total reads. Yes, it is quite high coverage; I am waiting for VelvetOptimiser to run.
                      OK, if you have subsampled your data, then you are using fewer reads, so your coverage will be proportionally lower. Can VelvetOptimiser give you the correct value to use for coverage?
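
                      (Concretely: with the 40% subsample, the Ck = 356 from the first post scales to roughly 0.4 * 356 ≈ 142 at k = 75, and lower still at larger k, so the 320-370 coverage range swept in #5 would be far too high for the subsampled data.)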



                      • #12
                        I solved the memory problem; it was due to the wrong genome size. The -g value should be in megabases, so 2.8. I will post if I get a useful result from VelvetOptimiser. Thanks.
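
                        For anyone finding this later, the corrected call from #8 would then look something like this (a sketch; 'CATEGORIES=1' is dropped because it is a compile-time option, per #10, and the -o '-exp_cov 350' may itself need revisiting for subsampled data, per #11):

                        VelvetOptimiser.pl -s 75 -e 159 -t 1 -f '-shortPaired -separate -fastq file1.fastq file2.fastq' --optFuncKmer 'n50' -g 2.8 -o '-exp_cov 350'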
