  • FALCON assembler

    I am trying to figure out the new diploid assembler (FALCON) from PacBio. I have a really silly question. The first step parameters (according to devnet) are:

    python get_rdata.py queries.fofn targets.fofn m4.fofn 72 0 16 8 64 50 50 | falcon_wrap.py > p-reads-0.fa

    There are three "file of files" requested but it is unclear which smrtcell files need to be in them. I guess that one file should have the "bax.h5" files, another the "bas.h5" (perhaps) but after that I am a bit stuck...

    If anyone has got this to work could you post an example of which files should be linked in the queries, targets and m4 files?

    Thanks

    Edward

  • #2
    The developer says he is working on a step-by-step tutorial. The short answer is that the three fofn files are generated by the HBAR_WF2.py script from the HBAR-DTK repo on GitHub, so you can check it out to see what it is doing.

    • #3
      I'm not aware of anyone other than the developer who has run this, so you are definitely on the bleeding edge. I'm going to give it a try myself and will post my experiences. As the previous poster pointed out, the first step is to generate the overlap/alignment information for the raw reads using HBAR_WF2.py: https://github.com/PacificBiosciences/HBAR-DTK

      • #4
        Do we have any updates on this? Has the OP figured out how to get FALCON working?

        • #5
          I have been using FALCON and it is relatively straightforward. My notes:

          Install HBAR-DTK into a virtual env: https://github.com/PacificBioscience...BAR_README.rst

          Then install FALCON. I had to correct the installed versions of pyparsing and rdflib:
          Code:
          pip install pyparsing==1.5.7
          pip install rdflib==4.0.1
          pip install git+https://github.com/PacificBiosciences/FALCON
          Code:
          cp <SMRT_analysis>/analysis/bin/sawriter <virtual env>/bin/
          Then run HBAR_WF3.py using the following cfg file. Many of the options are not required for FALCON, but I've left them in:
          Code:
          [General]
          # list of files of the initial bas.h5 files
          input_fofn = input.fofn
          
          # The length cutoff used for seed reads used for initial mapping
          length_cutoff = 6000 
          
          # The length cutoff used for seed reads used for pre-assembly
          length_cutoff_pr = 6000
          
          # The read quality cutoff used for seed reads
          RQ_threshold = 0.75
          
          # SGE job option for distributed mapping 
          sge_option_dm = -pe smp 8 -q secondary 
          
          # SGE job option for m4 filtering
          sge_option_mf = -pe smp 4 -q secondary
          
          # SGE job option for pre-assembly
          sge_option_pa = -pe smp 16 -q secondary
          
          # SGE job option for CA 
          sge_option_ca = -pe smp 4 -q secondary
          
          # SGE job option for Quiver
          sge_option_qv = -pe smp 16 -q secondary
          
          # SGE job option for "qsub -sync y" to sync jobs in the different stages
          sge_option_ck = -pe smp 1 -q secondary
          
          sge_option_qf = -pe smp 8 -q secondary
          
          # blasr for initial read-read mapping for each chunk (do not specify the "-out" option).
          # One might need to tune the bestn parameter to match the number of distributed chunks to get more optimized results 
          blasr_opt = -nCandidates 50 -minMatch 12 -maxLCPLength 15 -bestn 24 -minPctIdentity 70.0 -maxScore -1000 -nproc 8
          
          #This is used for running quiver, not required for FALCON
          SEYMOUR_HOME = <SMRT Analysis install>
          
          #The number of best alignment hits used for pre-assembly
          #It should be about the same as the final PLR coverage; slightly higher might be OK.
          bestn = 36
          
          # target choices are "pre_assembly", "draft_assembly", "all"
          # "pre_assembly" : generate pre_assembly for any long read assembler to use
          # "draft_assembly": automatic submit CA assembly job when pre-assembly is done
          # "all" : submit job for using Quiver to do final polish
          target = mapping 
          
          # number of chunks for distributed mapping
          preassembly_num_chunk = 8 
          
          # number of chunks for pre-assembly. 
          # One might want to use bigger chunk data sizes (smaller dist_map_num_chunk) to 
          # take the advantage of the suffix array index used by blasr
          dist_map_num_chunk = 2
          
          # "tmpdir" is for preassembly. A lot of small files are created and deleted during this process. 
          # It would be great to use ramdisk for this. Setting tmpdir to an NFS mount will probably give very bad performance.
          tmpdir = /tmp
          
          # "big_tmpdir" is for quiver, better in a big disk
          big_tmpdir = /tmp
          
          # various trimming parameters
          min_cov = 8
          max_cov = 64
          trim_align = 50
          trim_plr = 50
          
          # number of processes used by blasr during the preassembly process
          q_nproc = 16
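
The `input_fofn` entry in the cfg names a "file of file names": a plain-text list with one path per line. A minimal sketch for generating it, assuming the bas.h5 files all sit somewhere under one directory. The function name `write_fofn` and the directory layout are hypothetical and not part of HBAR-DTK; writing absolute paths is just a choice that keeps the list valid from any working directory.

```python
from pathlib import Path


def write_fofn(data_dir, fofn_path, pattern="*.bas.h5"):
    """Write one absolute path per line for every file matching `pattern`.

    Hypothetical helper, not part of HBAR-DTK. Returns the sorted list of
    paths that were written.
    """
    # Search the directory tree recursively and sort for a reproducible order.
    paths = sorted(str(p.resolve()) for p in Path(data_dir).rglob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files matching {pattern} under {data_dir}")
    Path(fofn_path).write_text("\n".join(paths) + "\n")
    return paths
```

For example, `write_fofn("/path/to/smrtcells", "input.fofn")` would produce the `input.fofn` referenced by the cfg.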
          Code:
          python <virtual env>/bin/HBAR_WF3.py HBAR.cfg
          You should now have the m4 file for input into FALCON.

          The following runs the 16 chunks consecutively as separate jobs on a single node. The chunks are independent of each other, so they can also be distributed using a queuing system:
          Code:
          for i in {0..15}; do
          python <virtual env>/bin/get_rdata.py ./0-fasta_files/queries.fofn ./0-fasta_files/targets.fofn ./2-preads-falcon/m4_files.fofn 72 ${i} 16 8 64 50 50 > p-reads-${i}.fasta
          done
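
If the chunks are to be submitted to a queue rather than run in a loop, it can help to generate the command lines first. The sketch below only rebuilds the same command strings as the loop above. The helper name and its parameter names are hypothetical. The last four numbers happen to match `min_cov`, `max_cov`, `trim_align` and `trim_plr` in the cfg, which suggests what they mean, but that mapping is an inference from the values rather than something taken from the FALCON documentation.

```python
import shlex


def get_rdata_commands(n_chunks=16,
                       script="get_rdata.py",
                       queries="./0-fasta_files/queries.fofn",
                       targets="./0-fasta_files/targets.fofn",
                       m4="./2-preads-falcon/m4_files.fofn",
                       leading_arg=72,
                       tail_args=(8, 64, 50, 50)):
    """Return one shell command string per chunk (hypothetical helper).

    The positional order mirrors the loop in the post:
    <leading_arg> <chunk index> <chunk count> <tail_args...>
    """
    tail = " ".join(str(a) for a in tail_args)
    files = " ".join(shlex.quote(p) for p in (queries, targets, m4))
    commands = []
    for i in range(n_chunks):
        commands.append(
            f"python {shlex.quote(script)} {files} "
            f"{leading_arg} {i} {n_chunks} {tail} > p-reads-{i}.fasta"
        )
    return commands
```

Each returned string can then be written to a job script or passed to whatever submission command the local scheduler uses.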
          Join all the preassembled reads:
          Code:
          cat p-reads-*.fasta > preads.fa
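
Before moving on, it may be worth checking that the joined file is not empty. This is a minimal sketch of a sanity check, not part of FALCON: a hypothetical `fasta_stats` helper that counts records and bases in a FASTA file.

```python
def fasta_stats(path):
    """Return (number of records, total bases) for a FASTA file.

    Hypothetical sanity-check helper; header lines start with '>' and
    every other non-empty line is counted as sequence.
    """
    n_records = 0
    n_bases = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                n_records += 1
            else:
                n_bases += len(line)
    return n_records, n_bases
```

Calling it on the joined preads file and on the individual chunk files gives a quick way to confirm that no chunk silently produced empty output.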
          Generate overlaps:
          Code:
          falcon_overlap.py --min_len 4000 --n_core 24 --d_core 3 preads.fa > preads.ovlp
          Assemble:
          Code:
          falcon_asm.py preads.ovlp  preads.fa
          Hopefully this will allow people to get started with FALCON. A better howto is in the works.
          Last edited by rhall; 02-18-2014, 01:41 PM. Reason: mistake in the code
