HGAP_Assembly_Advanced
HGAP (Hierarchical Genome Assembly Process) performs high-quality de novo assembly using a single PacBio library prep. HGAP consists of pre-assembly, de novo assembly with Celera Assembler, and assembly polishing with Quiver.
1
active
reference
True
common/protocols/preprocessing/Fetch.1.xml
common/protocols/filtering/PreAssemblerSFilter.1.xml
common/protocols/assembly/PreAssemblerHGAP.1.xml
--overlapTolerance 100 --trimHit 50
common/protocols/assembly/CeleraAssemblerHGAP.1.xml
common/protocols/referenceuploader/ReferenceUploaderHGAP.1.xml
common/protocols/mapping/BLASR.1.xml
common/protocols/consensus/AssemblyPolishing.1.xml

Sets up inputs.

Filter reads for use in the pre-assembly step of HGAP, the Hierarchical Genome Assembly Process.
whitelist.txt
The minimum subread length. Shorter subreads are filtered out and excluded from further analysis.
500
The minimum polymerase read quality sets the read quality cutoff. Polymerase reads with lower quality are filtered out and excluded from further analysis.
0.80
The minimum polymerase read length. Shorter polymerase reads are excluded from further analysis.
500

Pre-assemble long reads as the first step of the Hierarchical Genome Assembly Process (HGAP).
False False False False
Compute "Minimum Seed Read Length"
True
Minimum length of reads to use as seeds for pre-assembly.
500
The -bestn and -nCandidates options should be approximately equal to the expected seed read coverage.
-minReadLength 200 -maxScore -1000 -bestn 24 -maxLCPLength 16 -nCandidates 100 -L --overlapTolerance 100 --trimHit 50
60
Allows partially aligned reads to participate in the pre-assembled read consensus.
False
Trims the low-quality regions from the FASTQ sequence entries.
True
--qvCut=50 --minSeqLen=500
Assemble with CCS reads instead of subreads. In most cases, assembling with subreads is preferred.
False

This module wraps Celera Assembler v7.0.
False False False False False False False False
Approximate genome size in base pairs.
320000
pacbioReads
True
500
Seconds to wait for runCA outputs to be copied into the job directory.
600
Fold coverage to target when picking frgMinLen for assembly. Typically 15 to 25.
15
Overlapper error rate.
0.06
Overlaps shorter than this length are not computed.
40
Sets the length of the seeds used by the seed-and-extend algorithm.
14
Enter the server path to an existing spec file.
True
False

reference
sawriter -blt 8 -welter
createSequenceDictionary
samtools faidx
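The read filter and length-cutoff parameters above can be illustrated with a short Python sketch. It first applies the three filter thresholds: subread length 500, polymerase read quality 0.80, and polymerase read length 500. It then picks the largest length cutoff whose reads still reach a target fold coverage. That second step is one plausible reading of "Fold coverage to target when picking frgMinLen". The protocol text does not spell out the exact rule, so treat it as an assumption. The `PolymeraseRead` record and all function names are hypothetical and are not part of SMRT Pipe.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional

# Thresholds taken from the filter values in this protocol.
MIN_SUBREAD_LENGTH = 500
MIN_READ_SCORE = 0.80
MIN_POLYMERASE_READ_LENGTH = 500


@dataclass
class PolymeraseRead:
    """Hypothetical in-memory record for one polymerase read."""
    name: str
    read_score: float            # polymerase read quality, 0..1
    length: int                  # polymerase read length in bases
    subread_lengths: List[int]   # lengths of the subreads it contains


def passing_subread_lengths(reads: Iterable[PolymeraseRead]) -> Iterator[int]:
    """Yield lengths of subreads that survive all three filters."""
    for read in reads:
        # Low-quality or short polymerase reads are dropped entirely.
        if read.read_score < MIN_READ_SCORE or read.length < MIN_POLYMERASE_READ_LENGTH:
            continue
        for sub_len in read.subread_lengths:
            if sub_len >= MIN_SUBREAD_LENGTH:
                yield sub_len


def length_cutoff_for_coverage(
    lengths: Iterable[int], genome_size: int, target_coverage: float
) -> Optional[int]:
    """Largest cutoff L such that reads of length >= L hold at least
    target_coverage * genome_size bases; None if the target is unreachable."""
    needed = genome_size * target_coverage
    total = 0
    for length in sorted(lengths, reverse=True):  # longest reads first
        total += length
        if total >= needed:
            return length
    return None
```

Take the genome size (320000 bp) and fold coverage (15) set above. With those values, `length_cutoff_for_coverage(lengths, 320000, 15)` returns the cutoff at which the longest reads together supply about 4.8 Mb.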
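The QV trimming step (`--qvCut=50 --minSeqLen=500`) can be pictured with the following simplified Python sketch. It assumes Phred+33 FASTQ quality strings and applies a simple rule. First, keep the longest stretch in which every base has a QV of at least qvCut. Then discard any record shorter than minSeqLen. The trimming tool HGAP actually runs may use a different rule, for example windowed QV averages. The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def longest_high_quality_run(qualities: List[int], qv_cut: int) -> Tuple[int, int]:
    """Return (start, end), end exclusive, of the longest run with every QV >= qv_cut."""
    best = (0, 0)
    start = None
    for i, qv in enumerate(qualities):
        if qv >= qv_cut:
            if start is None:
                start = i  # a new high-quality run begins here
            if i + 1 - start > best[1] - best[0]:
                best = (start, i + 1)
        else:
            start = None  # a low-quality base ends the current run
    return best


def trim_fastq_record(
    seq: str, qual: str, qv_cut: int = 50, min_seq_len: int = 500, offset: int = 33
) -> Optional[Tuple[str, str]]:
    """Trim one FASTQ entry; return None if the trimmed entry is too short."""
    qualities = [ord(c) - offset for c in qual]  # decode Phred+33 characters
    start, end = longest_high_quality_run(qualities, qv_cut)
    if end - start < min_seq_len:
        return None
    return seq[start:end], qual[start:end]
```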
BLASR maps reads to genomes by finding the highest-scoring local alignment, or set of local alignments, between the read and the genome. An initial set of candidate alignments is found by querying an index of the reference genome. These candidates are then refined until only high-scoring alignments remain. Additional pulse metrics are loaded into the resulting cmp.h5 file to enable downstream use of the Quiver algorithm.
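BLASR's reference index is a suffix array, which the reference uploader's `sawriter` command builds. The toy Python sketch below shows the basic idea of querying such an index. It builds a naive suffix array, then uses binary search to find every reference position where a read k-mer of the minimum anchor size (12 in this protocol) occurs exactly. It is an illustration only, not BLASR's implementation, which uses efficient index construction, a lookup table, and further anchor clustering and alignment refinement. The function names are hypothetical.

```python
from typing import List, Tuple


def build_suffix_array(text: str) -> List[int]:
    """Naive suffix array: suffix start positions in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """Reference positions where `pattern` occurs, via two binary searches on `sa`."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-character prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose m-character prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


def find_anchors(reference: str, read: str, min_match: int = 12) -> List[Tuple[int, int]]:
    """(read_offset, ref_offset) pairs where a min_match-long read k-mer matches exactly."""
    sa = build_suffix_array(reference)
    anchors = []
    for offset in range(len(read) - min_match + 1):
        kmer = read[offset:offset + min_match]
        for ref_pos in find_occurrences(reference, sa, kmer):
            anchors.append((offset, ref_pos))
    return anchors
```

Building the array by sorting suffix slices is fine for a toy string. For a genome-sized reference it would exhaust time and memory, which is why a dedicated indexer is used.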
The maximum number of matches of each read to the reference sequence that will be evaluated. maxHits should be greater than the expected number of repeats if you want to spread hits out across the genome.
10
The maximum allowed divergence of a read from the reference sequence.
30
The minimum anchor size defines the length of read sequence that must match the reference sequence.
12
True True True
--seed=1 --minAccuracy=0.75 --minLength=50 --useQuality
DeletionQV,IPD,InsertionQV,PulseWidth,QualityValue,MergeQV,SubstitutionQV,DeletionTag
The default loadPulses option is 'byread'. Option 'bymetric' is designed to trade memory for speed, especially for jobs with a large number of reference contigs.
bymetric

Polish a pure-PacBio assembly for maximum accuracy using the Quiver algorithm.
Filter out reads with Map QV less than 10. Coverage in repeat regions shorter than the read length will be reduced.
True
HGAP_Assembly_Advanced.1.xml
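The alignment filters (`--minAccuracy=0.75 --minLength=50`) and the Map QV cutoff of 10 used before Quiver can be sketched as a simple per-alignment predicate in Python. The sketch rests on two assumptions. First, each threshold is applied independently to each alignment. Second, Map QV is Phred-scaled, so a Map QV of 10 corresponds to an estimated 10% chance that the read is placed in the wrong location. The `Alignment` record and the function names are hypothetical and are not part of the cmp.h5 API.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Alignment:
    """Hypothetical summary of one read-to-reference alignment."""
    read_id: str
    aligned_length: int  # aligned read bases
    accuracy: float      # fraction of aligned bases that match, 0..1 (assumed definition)
    map_qv: int          # Phred-scaled mapping quality (assumed)


def mismap_probability(map_qv: float) -> float:
    """Invert the Phred scale: QV = -10 * log10(p), so p = 10 ** (-QV / 10)."""
    return 10 ** (-map_qv / 10)


def filter_alignments(
    alignments: Iterable[Alignment],
    min_accuracy: float = 0.75,
    min_length: int = 50,
    min_map_qv: int = 10,
) -> List[Alignment]:
    """Keep alignments that meet all three thresholds from this protocol."""
    return [
        aln
        for aln in alignments
        if aln.accuracy >= min_accuracy
        and aln.aligned_length >= min_length
        # Ambiguously placed reads (e.g. in repeats) tend to get low Map QV.
        and aln.map_qv >= min_map_qv
    ]
```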