  • N50 less than 2000

    Dear all,
    I am trying to assemble a bacterial genome. I have trimmed each read and removed low-quality reads, as well as any reads containing "N". The genome coverage is above 150x. I have used Velvet and SOAPdenovo with different parameters and k-mer sizes, but the N50 is always less than 2000. Any suggestions? Thanks in advance.

  • #2
    I recommend ABySS. Is your data paired end or single read? How long are the reads? ABySS-PE is particularly great if you iterate through a good range of kmers and find the optimum one.

    Also, 150x may be too high. You might want to split your data set in half and repeat the assembly on each half. These days it is difficult to do an Illumina run that yields less than 100x on a bacterial genome, but most assemblers are not designed to work with coverage this high.
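
A minimal sketch of the "use only part of the data" idea, assuming the two read files have already been parsed into equal-length lists of records in the same order. The function name `subsample_pairs` and the in-memory record format are illustrative only. The key point is that one random draw decides the fate of both mates, so the pairing is never broken.

```python
import random

def subsample_pairs(reads1, reads2, fraction, seed=0):
    """Keep roughly `fraction` of the read pairs, always keeping or
    dropping both mates together so the two outputs stay in sync."""
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    kept1, kept2 = [], []
    for mate1, mate2 in zip(reads1, reads2):
        if rng.random() < fraction:  # one draw per pair, not per read
            kept1.append(mate1)
            kept2.append(mate2)
    return kept1, kept2

# Toy records: (read id, sequence); real data would come from FASTQ files.
forward = [(f"read{i}/1", "ACGT") for i in range(1000)]
reverse = [(f"read{i}/2", "TGCA") for i in range(1000)]
half1, half2 = subsample_pairs(forward, reverse, fraction=0.5)
```

Changing `seed` gives a different random subset, which is one way to check whether the assembly result is stable across subsamples.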

    --
    Phillip


    • #3
      @pmiguel,
      Thanks for your response. My reads are paired-end and 72 nucleotides long. I trimmed 12 nucleotides from the ends.


      • #4
        Yes, try using 1/3rd or 1/2 of the data set and/or using ABySS-PE. For ABySS, kmers of 40-63 are worth trying.
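
One way to organise a sweep over that k range is to generate one `abyss-pe` command per k value and run them in turn. This is only a sketch: the file names, the `asm` prefix, and the step size of 4 are placeholders, not values from this thread, and the helper only builds the command strings rather than running anything.

```python
def abyss_sweep_commands(reads1, reads2, k_min=40, k_max=63, step=4, prefix="asm"):
    """Build one abyss-pe command line per k value in [k_min, k_max].
    Each run gets its own name so the outputs do not overwrite each other."""
    commands = []
    for k in range(k_min, k_max + 1, step):
        commands.append(
            f"abyss-pe k={k} name={prefix}_k{k} in='{reads1} {reads2}'"
        )
    return commands

# Hypothetical input file names, for illustration only.
commands = abyss_sweep_commands("reads_1.fq", "reads_2.fq")
for command in commands:
    print(command)
```

After the runs finish, comparing the contig N50 of each output is the usual way to pick the best k.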

        --
        Phillip


        • #5
          Originally posted by sarbashis
          I am trying to assemble a bacterial genome. I have trimmed each read and also removed low quality reads. I also removed reads having "N".
          I'd suggest something like FastQC to check how good the input data really is.

          Then I would recommend quality-based trimming (probably Q20 or above, given how much data you have) rather than fixed-length trimming, and don't bin a read just because of an N at the end. I also strongly suggest trimming for adapters (though that may not help the N50, it will help correctness), and be careful that you don't 'unpair' the files by dropping one read of a pair.
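
To make the difference from fixed-length trimming concrete, here is a rough sketch of 3' quality trimming that keeps pairs together. It is not how Trimmomatic works internally; the function names, the Phred+33 offset, and the `min_len` cutoff are assumptions for illustration (older Illumina data may use Phred+64). A trailing N normally carries a very low quality score, so it gets trimmed off instead of causing the whole read to be discarded.

```python
def trim_3prime(seq, qual, min_q=20, offset=33):
    """Remove bases from the 3' end until one has quality >= min_q.
    `offset` is the ASCII offset of the quality encoding (assumed Phred+33)."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]

def trim_pair(read1, read2, min_len=30, min_q=20, offset=33):
    """Trim both mates; return None if either becomes too short, so that
    both mates are dropped together and the files never get 'unpaired'."""
    seq1, qual1 = trim_3prime(*read1, min_q=min_q, offset=offset)
    seq2, qual2 = trim_3prime(*read2, min_q=min_q, offset=offset)
    if len(seq1) < min_len or len(seq2) < min_len:
        return None
    return (seq1, qual1), (seq2, qual2)

# 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
trimmed = trim_3prime("ACGTN", "IIII#")
```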

          Blatant plug: Trimmomatic will do all you need - you can get it here. PM me if you need help with it.

          Originally posted by sarbashis
          The genome coverage is above 150. I used velvet and SOAPdenovo with different parameters and k-mers but always getting N50 less than 2000. Any suggestion ?? Thanks in advance
          Are these SOAP numbers based on scaffolding or just assembly? SOAP, AFAIK, doesn't use paired information at all in the assembly stage, so it looks very weak if judged by contigs. Drop anything shorter than 2 * k if you're checking the output of SOAP - most of the smaller shrapnel is junk.
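
For reference, a small sketch of how N50 can be computed with and without the short contigs. N50 here is the usual definition: the length of the contig at which the running total, going from longest to shortest, first reaches half of the total assembled length. The function names and the toy lengths are illustrative.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # reached half of the total length
            return length
    return 0

def n50_without_shrapnel(lengths, k):
    """N50 after dropping contigs shorter than 2 * k."""
    return n50([length for length in lengths if length >= 2 * k])

# Toy example: two real contigs plus fifty tiny fragments.
contig_lengths = [1000, 500] + [40] * 50
raw_n50 = n50(contig_lengths)
filtered_n50 = n50_without_shrapnel(contig_lengths, k=31)
```

In this toy case the many 40 nt fragments drag the raw N50 down to 40, while the filtered value is 1000, which shows how much the small contigs can distort the statistic.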

          BTW SOAP seems to work best with a k of around 45% of the read length.
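
As a quick worked example of that rule of thumb, the sketch below turns a read length into a starting k. Forcing the result to be odd is an assumption added here (de Bruijn graph assemblers commonly expect an odd k), and `suggest_k` is a hypothetical helper name.

```python
def suggest_k(read_length, fraction=0.45):
    """Starting k of about `fraction` of the read length, rounded to an
    integer and then lowered by one if even (odd k is an assumption)."""
    k = round(read_length * fraction)
    if k % 2 == 0:
        k -= 1
    return k

k_untrimmed = suggest_k(72)  # the original 72 nt reads
k_trimmed = suggest_k(60)    # if 12 nt were removed in total
```

This gives 31 for 72 nt reads and 27 for 60 nt reads; it is only a starting point for a sweep over neighbouring k values.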

          You can also try the SOAP corrector - it helps, if sometimes marginally. If nothing else, the kmer frequency graph can be enlightening.
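
The k-mer frequency graph is simply a histogram of how often each k-mer occurs in the reads. A naive in-memory sketch follows; it is not the SOAP corrector's implementation, it does not merge reverse complements, and it will not scale to a full 150x data set, but it shows what is being plotted. The name `kmer_spectrum` is illustrative.

```python
from collections import Counter

def kmer_spectrum(reads, k):
    """Map each multiplicity to the number of distinct k-mers seen that
    many times. K-mers containing N are skipped."""
    counts = Counter()
    for read in reads:
        for start in range(len(read) - k + 1):
            kmer = read[start:start + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return Counter(counts.values())

spectrum = kmer_spectrum(["ACGTA", "ACGTA", "CGTAC"], k=3)
for multiplicity in sorted(spectrum):
    print(multiplicity, spectrum[multiplicity])
```

On real data, a large peak at multiplicity 1 or 2 usually reflects sequencing errors, while the main peak sits near the effective k-mer coverage of the genome.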
