  • Validating a PacBio Assembly

    Hi!
    I'm working on a hybrid de novo assembly of a 2 Gb genome using PacBio and Illumina paired-end reads.
    I would then like to validate my assembly: run checks such as detecting collapsed repeats and misassemblies, and improve the consensus accuracy.
    Could you please tell me which tool is best for this and suitable for a large genome like mine?

  • #2
    It is going to be difficult to validate an assembly if the only data you have is the data used for the assembly itself.
    What do your current assembly stats look like? What is your coverage of PacBio and Illumina data, the N50 of the PacBio reads, and the insert sizes of your Illumina libraries?
    If you can assemble leaving out a specific set of data you could then use that for validation.
    For fully de novo validation, you could look at annotations / gene predictions if you have a good gene model, or at something like ALE, but I haven't seen it used on such a large genome.
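For readers unsure how the read N50 asked about above is defined, here is a minimal sketch in Python. It is not taken from any assembler; the function name `n50` and the example lengths are hypothetical. It uses the common definition: the length L such that reads (or contigs) of length L or longer contain at least half of all bases.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total number of bases.
    """
    if not lengths:
        raise ValueError("no lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy read lengths, for illustration only.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(example_lengths))
```

The same function works for contig lengths, so it can be used for the assembly stats as well as for the raw reads.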


    • #3
      Dear rhall,

      My genome assembly is not finished yet, but I can give you some statistics:
      - Estimated genome size: 2 Gb
      - Illumina paired-end: 40x, insert size 400 bp
      - Illumina mate pair, 5 kb: 4x
      - Illumina mate pair, 20 kb: 1x
      - PacBio reads: 14x, N50 8331 bp
      I found Pilon from the Broad Institute, which looks like an interesting tool, but it appears to be time- and memory-consuming.


      • #4
        The mate pair libraries will not add much, if anything, to the hybrid assembly; leave them out and use them to test consistency.
        Pushing the PacBio coverage to 20X (http://wgs-assembler.sourceforge.net..._Release_Notes) would allow you to do an assembly with only the PacBio data, and then use the Illumina data for validation.
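As a rough back-of-the-envelope check on what going from 14x to 20x means for the 2 Gb genome described in this thread, the arithmetic can be sketched as below. The function names are hypothetical, and the calculation assumes coverage is simply total sequenced bases divided by estimated genome size.

```python
def bases_for_coverage(genome_size_bp, coverage):
    """Total sequenced bases needed to reach a given fold coverage."""
    return genome_size_bp * coverage


def extra_bases_needed(genome_size_bp, current_cov, target_cov):
    """Additional bases needed to go from current to target coverage."""
    return max(0, (target_cov - current_cov) * genome_size_bp)


GENOME_SIZE = 2_000_000_000  # 2 Gb estimate quoted in the thread

current = bases_for_coverage(GENOME_SIZE, 14)    # existing PacBio data
extra = extra_bases_needed(GENOME_SIZE, 14, 20)  # shortfall to reach 20x
print(f"current PacBio data: {current / 1e9:.0f} Gb")
print(f"additional data for 20x: {extra / 1e9:.0f} Gb")
```

Under these assumptions the existing data is about 28 Gb, and reaching 20x would take roughly 12 Gb more of PacBio sequence.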


        • #5
          In fact, I would use the mate pair libraries for ultra-scaffolding the resulting assembly with SSPACE. Then I would close gaps using a stand-alone program like GapFiller. I have noticed that Pilon also closes gaps, in addition to validating the assembly.
          Does GapFiller correct misassemblies?


          • #6
            I'm not familiar with GapFiller.
            Conceptually, given the N50 of the PacBio data, I would assume that the PacBio coverage at 5 kb and 20 kb carries more information than the 4x of 5 kb and 1x of 20 kb mate pair data. I would therefore be surprised if the mate pair data added much for scaffolding; it may be more useful for validation.


            • #7
              OK, right. Which tool would you suggest for validating the assembly with Illumina reads?


              • #8
                I'm not too familiar with it myself, but AMOS contains tools that can be used for assembly validation using mate pair data, in particular the asmQC tool.


                • #9
                  Can I use paired-end and mate pair reads with the asmQC tool?
