01-18-2014, 02:41 AM   #1
Junior Member
Location: Switzerland

Join Date: Jul 2010
Posts: 3
FALCON assembler

I am trying to figure out the new diploid assembler (FALCON) from PacBio. I have a really silly question. The parameters for the first step (according to DevNet) are:

python queries.fofn targets.fofn m4.fofn 72 0 16 8 64 50 50 | > p-reads-0.fa

Three "file of files" (fofn) are requested, but it is unclear which SMRT Cell files need to go in them. I guess that one should list the "bax.h5" files and another perhaps the "bas.h5" files, but after that I am a bit stuck...

If anyone has gotten this to work, could you post an example of which files should be listed in the queries, targets and m4 files?


oakeley
01-21-2014, 05:56 PM   #2
Junior Member
Location: San Francisco

Join Date: Sep 2008
Posts: 8

The developer says he is working on a step-by-step tutorial, but the short answer is that the three fofn files are generated by the script in the HBAR-DTK repository on GitHub, so you can check it out to see what it is doing.
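A fofn ("file of file names") is simply a plain-text file with one path per line. As a minimal sketch, assuming your SMRT Cell data sit somewhere under one directory, the hypothetical helper below builds such a list, for example the `input_fofn` of bas.h5 files that the HBAR-DTK cfg later in this thread expects. The contents of queries.fofn, targets.fofn and m4.fofn are still produced by the HBAR-DTK script itself; this only shows the file format.

```python
from pathlib import Path


def write_fofn(root, fofn_path, suffix=".bas.h5"):
    """Write every file under `root` whose name ends in `suffix` to a fofn.

    One absolute path per line, sorted. The ".bas.h5" default follows the cfg
    comment later in the thread; pass ".bax.h5" if your workflow wants the
    individual bax files instead (an assumption to check against your setup).
    """
    paths = sorted(
        str(p.resolve()) for p in Path(root).rglob(f"*{suffix}") if p.is_file()
    )
    with open(fofn_path, "w") as out:
        out.writelines(f"{p}\n" for p in paths)
    return paths


def read_fofn(fofn_path):
    """Read a fofn back into a list of paths, ignoring blank lines."""
    with open(fofn_path) as fh:
        return [line.strip() for line in fh if line.strip()]
```

For example, `write_fofn("/path/to/smrtcells", "input.fofn")` (hypothetical path) would collect every bas.h5 file below that directory.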
phenotype
01-22-2014, 09:58 AM   #3
Senior Member
Location: San Francisco

Join Date: Aug 2012
Posts: 318

I'm not aware of anyone other than the developer who has run this; you are definitely on the bleeding edge. I'm going to give it a try myself and will post my experiences. As the previous poster pointed out, the first step is to generate the overlap/alignment information for the raw reads using
rhall
02-14-2014, 06:58 AM   #4
Location: USA

Join Date: Oct 2013
Posts: 10

Do we have any updates on this? Has the OP figured out how to get FALCON working?
curious.genome
02-14-2014, 11:30 AM   #5
Senior Member
Location: San Francisco

Join Date: Aug 2012
Posts: 318

I have been using FALCON; it is relatively straightforward. My notes:

First, install HBAR-DTK into a virtual env.

Then install FALCON. I had to pin specific versions of pyparsing and rdflib:
pip install pyparsing==1.5.7
pip install rdflib==4.0.1
pip install git+
cp <SMRT_analysis>/analysis/bin/sawriter <virtual env>/bin/
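Pinned versions are easy to lose if another package later upgrades them. A minimal pre-flight sketch, assuming you save the output of `pip freeze` from inside the virtual env to a text file and run the check with the env's `bin` directory on `PATH` (so the copied `sawriter` can be found). `check_environment` and `parse_freeze` are hypothetical helpers, not part of HBAR-DTK or FALCON; parsing `pip freeze` text rather than importing the packages keeps the check independent of which Python version the env itself uses.

```python
import shutil

# Version pins taken from the pip commands above.
PINS = {"pyparsing": "1.5.7", "rdflib": "4.0.1"}


def parse_freeze(text):
    """Map lower-cased package names to versions from `pip freeze` lines of the form name==version."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        if "==" in line and not line.startswith("#"):
            name, version = line.split("==", 1)
            versions[name.strip().lower()] = version.strip()
    return versions


def check_environment(freeze_text, pins=PINS, tools=("sawriter",), which=shutil.which):
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    installed = parse_freeze(freeze_text)
    for package, wanted in pins.items():
        found = installed.get(package.lower())
        if found != wanted:
            problems.append(f"{package}: expected {wanted}, found {found}")
    for tool in tools:
        if which(tool) is None:
            problems.append(f"{tool}: not found on PATH")
    return problems
```

After `pip freeze > freeze.txt` inside the env, `check_environment(open("freeze.txt").read())` should return an empty list.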
Then run HBAR-DTK using the following cfg file (HBAR.cfg). Note that a lot of the options are not required for FALCON, but I've left them in:
# fofn listing the initial bas.h5 files
input_fofn = input.fofn

# The length cutoff for seed reads used for initial mapping
length_cutoff = 6000 

# The length cutoff for seed reads used for pre-assembly
length_cutoff_pr = 6000

# The read quality cutoff used for seed reads
RQ_threshold = 0.75

# SGE job option for distributed mapping 
sge_option_dm = -pe smp 8 -q secondary 

# SGE job option for m4 filtering
sge_option_mf = -pe smp 4 -q secondary

# SGE job option for pre-assembly
sge_option_pa = -pe smp 16 -q secondary

# SGE job option for CA 
sge_option_ca = -pe smp 4 -q secondary

# SGE job option for Quiver
sge_option_qv = -pe smp 16 -q secondary

# SGE job option for "qsub -sync y" to sync jobs in the different stages
sge_option_ck = -pe smp 1 -q secondary

sge_option_qf = -pe smp 8 -q secondary

# blasr options for the initial read-to-read mapping of each chunk (do not specify the "-out" option).
# One might need to tune the bestn parameter to match the number of distributed chunks to get better results
blasr_opt = -nCandidates 50 -minMatch 12 -maxLCPLength 15 -bestn 24 -minPctIdentity 70.0 -maxScore -1000 -nproc 8

#This is used for running quiver, not required for FALCON
SEYMOUR_HOME = <SMRT Analysis install>

#The number of best alignment hits used for pre-assembly
#It should be about the same as the final PLR coverage; slightly higher might be OK.
bestn = 36

# target choices are "mapping", "pre_assembly", "draft_assembly", "all"
# "mapping" : stop after the read-to-read mapping (used here to get the m4 files for FALCON)
# "pre_assembly" : generate pre-assembled reads for any long-read assembler to use
# "draft_assembly": automatically submit a CA assembly job when pre-assembly is done
# "all" : also submit jobs that use Quiver for the final polish
target = mapping 

# number of chunks for pre-assembly
preassembly_num_chunk = 8

# number of chunks for distributed mapping.
# One might want to use bigger chunk data sizes (smaller dist_map_num_chunk) to
# take advantage of the suffix array index used by blasr
dist_map_num_chunk = 2

# "tmpdir" is for pre-assembly. Many small files are created and deleted during this process.
# A ramdisk would be ideal here; setting tmpdir to an NFS mount will probably give very bad performance.
tmpdir = /tmp

# "big_tmpdir" is for Quiver and is better placed on a big disk
big_tmpdir = /tmp

# various trimming parameters
min_cov = 8
max_cov = 64
trim_align = 50
trim_plr = 50

# number of processes used by blasr during the pre-assembly process
q_nproc = 16
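Because a typo in this file may only surface once jobs are submitted, it can be worth sanity-checking it first. A minimal sketch using Python's `configparser`: the cfg as posted has no `[section]` header, so the sketch adds a dummy `General` section when none is present (an assumption about the file layout, not a statement about how HBAR-DTK reads it). The checks (integer fields, `RQ_threshold` within 0..1, a known `target`, no `-out` in `blasr_opt`) come from the comments above, plus a basic min_cov < max_cov consistency check; `load_cfg` and `validate_cfg` are hypothetical names.

```python
import configparser

# "mapping" comes from the posted cfg; the other three are listed in its comments.
KNOWN_TARGETS = {"mapping", "pre_assembly", "draft_assembly", "all"}

# Keys that the posted cfg sets to whole numbers.
INT_KEYS = ("length_cutoff", "length_cutoff_pr", "bestn", "preassembly_num_chunk",
            "dist_map_num_chunk", "min_cov", "max_cov", "trim_align", "trim_plr",
            "q_nproc")


def _new_parser():
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keys such as RQ_threshold case-sensitive
    return parser


def load_cfg(text, section="General"):
    """Parse cfg text into a dict, adding a dummy [section] header if the text has none."""
    parser = _new_parser()
    try:
        parser.read_string(text)
    except configparser.MissingSectionHeaderError:
        parser = _new_parser()
        parser.read_string(f"[{section}]\n{text}")
    return dict(parser[parser.sections()[0]])


def validate_cfg(cfg):
    """Return a list of problems (an empty list means every check passed)."""
    problems = []
    ints = {}
    for key in INT_KEYS:
        if key not in cfg:
            problems.append(f"missing {key}")
            continue
        try:
            ints[key] = int(cfg[key])
        except ValueError:
            problems.append(f"{key} is not an integer: {cfg[key]!r}")
    try:
        rq = float(cfg["RQ_threshold"])
        if not 0.0 <= rq <= 1.0:
            problems.append(f"RQ_threshold outside 0..1: {rq}")
    except (KeyError, ValueError):
        problems.append("RQ_threshold missing or not a number")
    if cfg.get("target") not in KNOWN_TARGETS:
        problems.append(f"unexpected target {cfg.get('target')!r}")
    if "-out" in cfg.get("blasr_opt", "").split():
        problems.append('blasr_opt must not contain "-out"')
    if "min_cov" in ints and "max_cov" in ints and ints["min_cov"] >= ints["max_cov"]:
        problems.append("min_cov should be smaller than max_cov")
    return problems
```

Running `validate_cfg(load_cfg(open("HBAR.cfg").read()))` on the file above should return an empty list.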
Then start the run:
python <virtual env>/bin/ HBAR.cfg
You should now have the m4 file for input into FALCON.
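Before moving on, it can be useful to see how many alignments each seed read received, since the cfg comment says `bestn` should roughly match the final pre-assembled read coverage. A sketch, assuming the m4 files follow blasr's `-m 4` column layout (query name, target name, score, percent similarity, ...) and that the seed reads are the targets in column 2; `hits_per_target` and `summarize` are hypothetical helpers.

```python
from collections import Counter
from statistics import median


def read_fofn(fofn_path):
    """Return the non-blank lines of a fofn as a list of paths."""
    with open(fofn_path) as fh:
        return [line.strip() for line in fh if line.strip()]


def hits_per_target(m4_paths, min_identity=0.0):
    """Count alignments per target name (column 2), keeping rows whose column 4 >= min_identity."""
    counts = Counter()
    for path in m4_paths:
        with open(path) as fh:
            for line in fh:
                fields = line.split()
                if len(fields) < 4:
                    continue  # skip blank or truncated lines
                if float(fields[3]) >= min_identity:
                    counts[fields[1]] += 1
    return counts


def summarize(counts, bestn=36):
    """Number of targets, median hits per target, and targets with fewer hits than bestn."""
    values = list(counts.values())
    if not values:
        return {"targets": 0, "median_hits": 0, "below_bestn": 0}
    return {
        "targets": len(values),
        "median_hits": median(values),
        "below_bestn": sum(v < bestn for v in values),
    }
```

With the layout used below, `summarize(hits_per_target(read_fofn("2-preads-falcon/m4_files.fofn")))` gives a quick coverage overview.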

To run the chunks consecutively as separate jobs on a single node (this can also be distributed using a queuing system):
for i in {0..15}; do
  python <virtual env>/bin/ ./0-fasta_files/queries.fofn ./0-fasta_files/targets.fofn ./2-preads-falcon/m4_files.fofn 72 ${i} 16 8 64 50 50 > p-reads-${i}.fasta
done
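When distributing the loop through a queuing system, one script per chunk keeps each job independent. A minimal sketch that writes SGE job scripts, assuming SGE's embedded `#$` directive syntax, the `sge_option_pa` value from the cfg above, and that the jobs are submitted from the HBAR-DTK run directory (the relative fofn paths depend on it, hence `-cwd`). The pre-assembly script name is missing from the command above, so `PREASSEMBLY_SCRIPT` is a hypothetical placeholder that must be replaced; the numeric arguments are copied from the loop unchanged.

```python
import shlex
from pathlib import Path

# Hypothetical placeholder: the script name is not given in the post.
PREASSEMBLY_SCRIPT = "<virtual env>/bin/PREASSEMBLY_SCRIPT"


def chunk_command(i, n_chunks=16, script=PREASSEMBLY_SCRIPT):
    """Build the command for chunk `i`, mirroring the loop above (72 i 16 8 64 50 50)."""
    if not 0 <= i < n_chunks:
        raise ValueError(f"chunk index {i} outside 0..{n_chunks - 1}")
    args = ["python", script,
            "./0-fasta_files/queries.fofn", "./0-fasta_files/targets.fofn",
            "./2-preads-falcon/m4_files.fofn",
            "72", str(i), str(n_chunks), "8", "64", "50", "50"]
    return " ".join(shlex.quote(a) for a in args) + f" > p-reads-{i}.fasta"


def write_job_scripts(outdir, n_chunks=16, sge_option="-pe smp 16 -q secondary"):
    """Write one bash script per chunk with embedded '#$' SGE directives; return their paths."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(n_chunks):
        path = outdir / f"preasm_{i:02d}.sh"
        path.write_text(
            "#!/bin/bash\n"
            f"#$ {sge_option}\n"
            "#$ -cwd\n"
            f"{chunk_command(i, n_chunks)}\n"
        )
        paths.append(path)
    return paths
```

Each generated script can then be passed to `qsub` from the run directory.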
Join all the preassembled reads:
cat p-reads-*.fasta > preads.fa
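The shell glob expands in lexical order (p-reads-10 before p-reads-2), which is usually harmless here, but `cat` also joins whatever happens to exist without complaint. A minimal sketch that joins the chunks in numeric order, stops if a chunk file is absent, and reports chunks that came out empty (the `>` redirection in the loop creates the output file even when the command fails). `join_preads` and `chunk_index` are hypothetical helpers.

```python
import re
from pathlib import Path


def chunk_index(path):
    """Extract the chunk number from a name like p-reads-12.fasta."""
    match = re.search(r"p-reads-(\d+)\.fasta$", Path(path).name)
    if match is None:
        raise ValueError(f"unexpected file name: {path}")
    return int(match.group(1))


def join_preads(directory, out_path, expected_chunks=16):
    """Concatenate p-reads-*.fasta in numeric chunk order.

    Returns (total_records, empty_chunks); raises FileNotFoundError if any
    chunk file in 0..expected_chunks-1 is absent.
    """
    parts = sorted(Path(directory).glob("p-reads-*.fasta"), key=chunk_index)
    missing = sorted(set(range(expected_chunks)) - {chunk_index(p) for p in parts})
    if missing:
        raise FileNotFoundError(f"missing chunk files: {missing}")
    total, empty = 0, []
    with open(out_path, "w") as out:
        for part in parts:
            records = 0
            with open(part) as fh:
                for line in fh:
                    if line.startswith(">"):
                        records += 1
                    # Guard against a chunk whose last line lacks a newline.
                    out.write(line if line.endswith("\n") else line + "\n")
            if records == 0:
                empty.append(chunk_index(part))
            total += records
    return total, empty
```

For example, `join_preads(".", "preads.fa")` replaces the `cat` line and returns the read count together with any empty chunks.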
Generate overlaps:
--min_len 4000 --n_core 24 --d_core 3 preads.fa > preads.ovlp
preads.ovlp  preads.fa
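`--min_len` presumably sets a minimum read length for the overlap step (an assumption based on the option name; the sketch also assumes reads of exactly that length are kept). To see what a given cutoff keeps from `preads.fa` before running it, a sketch reporting read count, fraction of bases kept and N50 before and after the cutoff; the helper names are hypothetical.

```python
def fasta_lengths(path):
    """Return the sequence length of every record in a FASTA file."""
    lengths, current = [], None
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Largest length L such that reads of length >= L hold at least half of all bases."""
    total, running = sum(lengths), 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def cutoff_report(lengths, min_len=4000):
    """Summarize what a minimum-length cutoff keeps (cutoff assumed inclusive)."""
    kept = [n for n in lengths if n >= min_len]
    total = sum(lengths)
    return {
        "reads": len(lengths),
        "reads_kept": len(kept),
        "bases_kept_fraction": sum(kept) / total if total else 0.0,
        "n50": n50(lengths),
        "n50_kept": n50(kept),
    }
```

For example, `cutoff_report(fasta_lengths("preads.fa"), min_len=4000)` summarizes the setting used above.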
Hopefully this will help people get started with FALCON; a better how-to is in the works.

Last edited by rhall; 02-18-2014 at 01:41 PM. Reason: mistake in the code
rhall


All times are GMT -8.
