  • Extending contigs N50 (SOAPdenovo)?

    Hello fellows,

    I've been assembling a snake genome with SOAPdenovo from a large set of paired-end reads sequenced on an Illumina HiScan.
    I have already run many assemblies, tuning the parameters and using both filtered and unfiltered reads, but I always get a low contig N50 of about 1.4 kb (against >30 kb for the references).

    I've read about breaking the scaffolds into contigs and then mapping the reads again so that one end of a pair lands in the gap, but I couldn't find a detailed explanation of how to do that. All the assemblies that took this approach were able to extend contig lengths and, as a consequence, N50.

    Could someone recommend something to read about that? Have any of you ever done this?


    Another question I have is about evaluating scaffolds with QUAST (http://quast.bioinf.spbau.ru/manual.html). I see it analyzes the scaffold file in two ways, and one of them gives me a bigger N50 (3.5 kb), but I don't know what the difference between the two results is.
    There is no problem when evaluating contigs, though.
    Any clues here too?
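
One possible reading of the two results, offered here only as an assumption, is that one is computed on the scaffolds as given and the other on the scaffolds broken back into contigs at their gaps. A minimal Python sketch of that breaking step follows; the function name `split_scaffold` and the 10-N gap threshold are illustrative choices, not taken from this thread.

```python
import re


def split_scaffold(sequence, min_gap=10):
    """Break one scaffold into contigs at runs of N.

    min_gap is an assumed threshold: runs of fewer than min_gap N
    characters are treated as ambiguous bases and kept inside the contig.
    """
    gap = re.compile(r"[Nn]{%d,}" % min_gap)
    # re.split can yield empty strings when a gap sits at either end;
    # those are dropped here.
    return [piece for piece in gap.split(sequence) if piece]


# Toy scaffold: one real gap (12 Ns) and one short run (3 Ns).
scaffold = "ACGT" + "N" * 12 + "GGCC" + "N" * 3 + "TT"
print(split_scaffold(scaffold))
```

Computing N50 on the pieces instead of on the whole scaffolds would naturally give a smaller value, which would be consistent with seeing two different numbers for the same file.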


    Thanks a lot in advance!

    Condomitti.

  • #2
    Does no one have any tips on this?



    • #3
      There are no different ways of calculating N50, only one. Find a Perl script that does it.

      Doesn't SOAP already provide you with scaffold and contig files? There should be no need to break scaffolds into contigs.

      Also, doesn't SOAP give you the stats, such as the N50 and the longest contig?

      What you are suggesting to do with the reads is called read walking, where you map reads to the ends of contigs to extend them. You will generate misassemblies due to repeats while doing that.

      The best thing you can try is other assemblers and other parameters.
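
For reference, the N50 calculation mentioned above can be sketched in a few lines of Python instead of Perl. This is a minimal illustration using the usual definition (the length of the sequence at which the running total of lengths, sorted from longest to shortest, first reaches half of the assembly size); the function name `n50` is just an illustrative choice.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running total to at least
        # half of the assembly size defines the N50.
        if running >= half:
            return length
    return 0  # empty input


# Toy example: total length 32, half is 16, reached after the two 8s.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # prints 8
```

The same function works for contig lengths and for scaffold lengths; only the input list changes.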



      • #4
        Thank you for your reply, AdrianP!

        Yes, SOAP gives me all the assembly stats, including the N50. My concern is how low the N50 value is.

        I have tried SGA, which only increased that value by a few units.



        cheers,
        Condomitti.



        • #5
          You have Illumina data, so SGA might not be your best choice. A few questions:

          What is the rough nucleotide coverage of the genome?
          What is the genome size?
          How long are your reads, and what is the insert size?



          • #6
            The genome size is 2.2 Gbp.

            Considering contigs > 800 bp, the nucleotide coverage is 524,993,890 bp.


            Reads vary from 40 to 100 bp, and the insert size is 300 bp.


            N50: 1,549 bp
            Largest contig: 28,559 bp


            Cheers,
            Condomitti.



            • #7
              Originally posted by condomitti
              The genome size is 2.2 Gbp.

              Considering contigs > 800 bp, the nucleotide coverage is 524,993,890 bp.


              Reads vary from 40 to 100 bp, and the insert size is 300 bp.


              N50: 1,549 bp
              Largest contig: 28,559 bp


              Cheers,
              Condomitti.
              I think you gave me the assembly size rather than the nucleotide coverage. Nucleotide coverage is how many reads, on average, overlap any given position in the genome.

              Your reads vary in length, which is not normal for Illumina sequencing. Did you trim them, or do you have different libraries?

              What k-mer values did you use when assembling with SOAP?
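
To make the coverage definition above concrete, here is a small Python sketch assuming the common estimate of coverage as total sequenced bases divided by genome size. The function names are illustrative, and the 130x figure in the example is a hypothetical value used only for the arithmetic; the 2.2 Gbp genome size is the one quoted above.

```python
def fold_coverage(total_read_bases, genome_size):
    """Average number of times each genome position is sequenced."""
    return total_read_bases / genome_size


def bases_needed(coverage, genome_size):
    """Total read bases implied by a given fold coverage."""
    return coverage * genome_size


genome_size = 2_200_000_000  # 2.2 Gbp, as quoted above

# Hypothetical example: 130x coverage of a 2.2 Gbp genome.
print(bases_needed(130, genome_size))  # prints 286000000000
```

By this measure, a value of roughly 525 Mbp is far too small to be the read total for a 2.2 Gbp genome, which fits the suspicion that it is the assembly size instead.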



              • #8
                Oh, you are right, sorry about that...

                I'm working with ~130x coverage.

                I did trim them, and I applied some filters to remove duplicates, etc.

                I have tried several different k-mer values, using both single and multi-k-mer strategies.

                With a single k-mer, the value that gave the best results was 65.
                Using multi-k-mer 61-71, I got the result I wrote above.



                • #9
                  Using untrimmed libraries, try SPAdes with k-mers 23,33,43,53,63,73.



                  • #10
                    Thanks AdrianP! I'll take a look.

                    Cheers,
                    Condomitti.

