  • HGAP and PBcR self-correction

    I'd like to know the difference between HGAP and PBcR self-correction.
    Both pipelines perform de novo assembly using PacBio reads only.
    Both also have a preassembly step and require running Quiver after Celera Assembler.

    What is the difference between them?

    Many thanks,
    Last edited by kende; 05-28-2015, 02:46 AM.

  • #2
    The basic difference is the algorithm used to calculate the overlaps / alignments for the preassembly step. Using the MHAP algorithm in PBcR should be significantly quicker for large genomes than HGAP. The Celera Assembler OLC (overlap layout consensus) step that runs after the preassembled reads are generated is essentially the same in both pipelines. One difference with Quiver is that it is run automatically in HGAP, but with PBcR you will have to run the resequencing and Quiver steps independently.
    Check out https://github.com/PacificBiosciences/Bioinformatics-Training/wiki for more info.
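
As a rough illustration of why a MinHash-based overlapper can be quick, here is a toy sketch of the general MinHash idea: each read is reduced to a short list of minimum hash values over its k-mers, and the fraction of matching positions between two lists estimates how many k-mers the reads share. This is not MHAP's actual implementation; the function names, the k-mer size, the number of hashes and the use of BLAKE2 as the hash are all assumptions made for the example.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def seeded_hash(kmer, seed):
    """Hash a k-mer to a 64-bit integer; each seed acts as a separate hash function."""
    data = f"{seed}:{kmer}".encode("ascii")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_sketch(seq, k=8, num_hashes=64):
    """Build a sketch: for each hash function keep the minimum value over all k-mers."""
    kmer_set = kmers(seq, k)
    if not kmer_set:
        raise ValueError("sequence is shorter than k")
    return [min(seeded_hash(km, seed) for km in kmer_set)
            for seed in range(num_hashes)]


def estimate_jaccard(sketch_a, sketch_b):
    """Estimate k-mer set similarity as the fraction of matching sketch positions."""
    matches = sum(1 for a, b in zip(sketch_a, sketch_b) if a == b)
    return matches / len(sketch_a)


def exact_jaccard(seq_a, seq_b, k=8):
    """Exact Jaccard similarity of the two k-mer sets, for comparison."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Two made-up reads that share a 30-base region.
    read_a = "ACGTTGCATGCCGATAGCTAGGCTTAACGGATCGATTACG"
    read_b = read_a[10:] + "TTGACCGTAA"
    est = estimate_jaccard(minhash_sketch(read_a), minhash_sketch(read_b))
    print("estimated:", est, "exact:", exact_jaccard(read_a, read_b))
```

Comparing two fixed-size sketches costs the same regardless of read length, which is the intuition behind the speed-up for large genomes; a real overlapper still has to confirm and locate the candidate overlaps afterwards.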


    • #3
      Thank you very much for your reply!
      I understand.


      • #4
        I only worked with bacterial genomes <10MB using , does it make sense to switch to PBcR-MHAP?

        I read the MHAP paper and noticed the following:

        PBcR-MHAP seems to be 10x faster but lower in sensitivity. True?

        The MHAP paper says it used -minReadLength 2000 for BLASR. I found that by default, HGAP sets -minReadLength to 200. Would it make sense for me to set it to 2000?

        HGAP takes bax.h5 files but PBcR takes filtered fastq. Does that mean PBcR is missing some preprocessing steps?

        What are the advantages of using PBcR-BLASR over HGAP?


        • #5
          I'll try to give my opinion one question at a time.
          I only worked with bacterial genomes <10MB using , does it make sense to switch to PBcR-MHAP?
          Probably not, unless you sequence enough that HGAP is a computational bottleneck, you have difficulties running HGAP on your system for technical reasons, or you are struggling to get sufficient coverage, which is unlikely given one bacterium per cell. The latest version of PBcR-MHAP may give better low-coverage assemblies (see release notes).
          PBcR-MHAP seems to be 10x faster but lower in sensitivity. True?
          I believe this was true when the paper was initially published, but the latest release of PBcR-MHAP has a sensitive setting that I think is comparable to HGAP, although I have not compared the two.
          The MHAP paper says it used -minReadLength 2000 for BLASR. I found that by default, HGAP sets -minReadLength to 200. Would it make sense for me to set it to 2000?
          Given sufficient coverage, which is likely for genomes <10Mb, increasing the minReadLength will probably increase the robustness of HGAP, but in most cases you will probably see very little difference.
          HGAP takes bax.h5 files but PBcR takes filtered fastq. Does that mean PBcR is missing some preprocessing steps?
          No. The filtered.fastq is a simple subset of the data in the bax.h5; HGAP also runs the same filtering process before running the assembly, but this step is largely hidden. This does get to one of the major reasons I would stick with HGAP: Quiver correction requires a cmp.h5, which has to be generated from the bax.h5. In HGAP the Quiver correction is automatic; when running PBcR, the Quiver correction has to be done 'manually' at the end.
          What are the advantages of using PBcR-BLASR over HGAP?
          I would expect very similar results; I don't see any advantage other than that PBcR may be easier to install and maintain. The disadvantage is that you don't get the Quiver correction and the nice SMRT Portal GUI for job management.
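
To make the minReadLength trade-off discussed above more concrete, here is a minimal sketch that applies a length cutoff to a filtered FASTQ and reports how much coverage survives. It is not part of HGAP or PBcR; the function names are hypothetical, the parser assumes plain four-line FASTQ records, and the genome size is something you supply yourself.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ handle."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or a trailing blank line)
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual  # drop the leading '@' from the name


def filter_by_length(records, min_len):
    """Keep only the reads whose sequence is at least min_len bases long."""
    return [rec for rec in records if len(rec[1]) >= min_len]


def coverage(records, genome_size):
    """Total bases in the reads divided by the (assumed) genome size."""
    return sum(len(rec[1]) for rec in records) / genome_size


if __name__ == "__main__":
    # Tiny made-up input; in practice you would open your filtered FASTQ file.
    toy = "@r1\nACG\n+\nIII\n@r2\nACGTA\n+\nIIIII\n@r3\nACGTACGTAC\n+\nIIIIIIIIII\n"
    reads = list(read_fastq(io.StringIO(toy)))
    for cutoff in (1, 5):
        kept = filter_by_length(reads, cutoff)
        print(cutoff, len(kept), coverage(kept, genome_size=10))
```

Running this with cutoffs of 200 and 2000 on your own data shows how much coverage the stricter setting would discard, which is the figure that matters when deciding whether the change is safe for a small genome.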


          • #6
            Thanks rhall for your very detailed reply!
