RS_HGAP_Assembly
HGAP version 3. PacBio de novo assembler optimized for speed.
3
active
reference
True
/PHShome/ry077/bin/smrtanalysis/userdata/references/2kb_control
common/protocols/preprocessing/Fetch.1.xml
common/protocols/filtering/PreAssemblerSFilter.1.xml
common/protocols/control/KeepControlReads.1.xml
common/protocols/assembly/PreAssemblerHGAP.3.xml
common/protocols/assembly/AssembleUnitig.1.xml
common/protocols/referenceuploader/ReferenceUploaderUnitig.1.xml
common/protocols/mapping/BLASR_Resequencing.1.xml
common/protocols/consensus/AssemblyPolishing.1.xml
Sets up inputs.
Filter reads for use in the pre-assembly step of HGAP, the hierarchical genome assembly process.
3000
Subreads shorter than this value (in base pairs) are filtered out and excluded from analysis.
0.80
Polymerase reads with lower quality than this value are filtered out and excluded from analysis.
100
Polymerase reads shorter than this value (in base pairs) are filtered out and excluded from analysis.
/data/talkowski/Samples/XDP/PacBioBAC/BACassembly/blasr/whitelist.txt
Using a DAG-based consensus algorithm, pre-assembles long reads as the first step of the Hierarchical Genome Assembly Process (HGAP). Version 2 is a stepping stone for scaling to much larger genomes.
False
Specify whether to compute the minimum seed read length at which the longest subreads give at least 30X coverage of the target genome. The calculation is based on the genome size you specified.
2000
The minimum length of reads (in base pairs) to use as seeds for pre-assembly.
6
The number of pieces to split the data files into while running PreAssembler.
10
The number of alignments to consider for each read within a particular chunk.
24
The number of potential alignments BLASR should consider for a particular read across all chunks.
6
The minimum coverage needed to keep correcting a read. If the coverage falls below this threshold, the read is broken at that junction.
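The automatic seed-length cutoff and the minimum-coverage read breaking described above can be sketched in Python. This is a minimal illustration under two assumptions: the cutoff walks the subreads from longest to shortest until their summed length reaches the target fold coverage of the genome, and "breaking" a read means keeping only the stretches whose per-base coverage meets the minimum. The function names `seed_length_cutoff` and `split_at_low_coverage` are hypothetical and are not part of SMRT Analysis.

```python
from typing import List, Optional, Tuple


def seed_length_cutoff(subread_lengths: List[int], genome_size: int,
                       target_coverage: float = 30.0) -> int:
    """Hypothetical sketch: return the shortest subread length such that all
    subreads at least that long sum to target_coverage * genome_size bases.
    Falls back to the shortest subread if the target cannot be reached."""
    if not subread_lengths:
        raise ValueError("no subreads")
    needed = target_coverage * genome_size
    total = 0
    for length in sorted(subread_lengths, reverse=True):
        total += length
        if total >= needed:
            return length
    return min(subread_lengths)


def split_at_low_coverage(coverage: List[int],
                          min_cov: int = 6) -> List[Tuple[int, int]]:
    """Hypothetical sketch: return half-open (start, end) intervals of a read
    where per-base coverage stays >= min_cov; the read is broken wherever
    coverage drops below the threshold."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, c in enumerate(coverage):
        if c >= min_cov and start is None:
            start = i                      # a well-covered stretch begins
        elif c < min_cov and start is not None:
            segments.append((start, i))    # coverage dropped: break here
            start = None
    if start is not None:
        segments.append((start, len(coverage)))
    return segments


if __name__ == "__main__":
    # Toy values only, not taken from any real run.
    lengths = [12000, 9000, 8000, 7000, 5000, 4000, 3000]
    print(seed_length_cutoff(lengths, genome_size=1000, target_coverage=30))  # 7000
    print(split_at_low_coverage([8, 8, 7, 3, 2, 9, 9, 9, 6, 5], min_cov=6))  # [(0, 3), (5, 9)]
```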
-noSplitSubreads -minReadLength 200 -maxScore -1000 -maxLCPLength 16
The -bestn and -nCandidates options should be approximately equal to the expected seed read coverage.
This module runs Celera Assembler v8.1 to the unitig step, then finishes with our custom unitig consensus caller.
124000
The approximate genome size, in base pairs.
pacbioReads
500
25
Fold coverage to target when picking the minimum fragment length for assembly; typically 15 to 25.
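The note on -bestn and -nCandidates ties both values to the expected seed read coverage. A minimal Python sketch of that estimate follows, assuming seed coverage is simply the total length of reads at or above the minimum seed length divided by the genome size. `expected_seed_coverage` is a hypothetical helper, and the read lengths in the example are invented.

```python
from typing import Iterable


def expected_seed_coverage(read_lengths: Iterable[int], min_seed_length: int,
                           genome_size: int) -> float:
    """Hypothetical sketch: fold coverage of the genome contributed by reads
    long enough to act as pre-assembly seeds."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    seed_bases = sum(n for n in read_lengths if n >= min_seed_length)
    return seed_bases / genome_size


if __name__ == "__main__":
    # Invented read lengths; 2000 bp and 124000 bp mirror this protocol's
    # minimum seed length and approximate genome size settings.
    reads = [1500, 2500, 6000, 9000] * 300
    cov = expected_seed_coverage(reads, min_seed_length=2000, genome_size=124000)
    print(f"expected seed coverage: {cov:.1f}X")  # expected seed coverage: 42.3X
```

If the estimate lands far from the configured -bestn (10) and -nCandidates (24) values, that would be a reason to revisit them.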
0.06
Trimming and assembly overlaps with an error rate above this limit are not detected.
40
Overlaps shorter than this length (in base pairs) are not computed.
14
The length of the seeds (in base pairs) used by the seed-and-extend algorithm.
The path to an existing specification file used to run the assembly program.
1
analysis/etc/celeraAssembler/unitig.spec
False
False
True
reference
sawriter -blt 8 -welter
samtools faidx
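The overlap thresholds and seed length above can be illustrated with a small Python sketch. It assumes each candidate overlap is described only by its length and error rate, and that overlaps exactly at a threshold are kept; the `Overlap` type and the `keep_overlap` and `shared_kmers` functions are hypothetical and do not reproduce Celera Assembler's internals.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class Overlap:
    """Hypothetical record for a candidate overlap between two reads."""
    length: int        # overlap length in base pairs
    error_rate: float  # fraction of differing positions, 0.0 to 1.0


def keep_overlap(ov: Overlap, max_error_rate: float = 0.06,
                 min_length: int = 40) -> bool:
    """Keep an overlap only if it is long enough and within the error limit
    (boundary values are treated as passing, which is an assumption)."""
    return ov.length >= min_length and ov.error_rate <= max_error_rate


def shared_kmers(a: str, b: str, k: int = 14) -> Set[str]:
    """Return the k-mers present in both sequences: the candidate seeds a
    seed-and-extend overlapper would try to extend into full overlaps."""
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    return {b[i:i + k] for i in range(len(b) - k + 1)} & kmers_a


if __name__ == "__main__":
    candidates = [Overlap(500, 0.03), Overlap(35, 0.01), Overlap(900, 0.08)]
    print([keep_overlap(ov) for ov in candidates])  # [True, False, False]
    print(shared_kmers("ACGTACGTACGTAC", "TTACGTACGTACGTACTT"))  # {'ACGTACGTACGTAC'}
```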
BLASR maps reads to genomes by finding the highest-scoring local alignment, or set of local alignments, between the read and the genome. The first set of alignments is found by querying an index of the reference genome and is then refined until only high-scoring alignments remain. Additional pulse metrics are loaded into the resulting cmp.h5 file to enable downstream use of the Quiver algorithm.
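To make "highest-scoring local alignment" concrete, here is the classic Smith-Waterman local alignment score in Python. This is a swapped-in textbook technique for illustration only: BLASR itself finds candidate hits by querying an index of the reference and then refining them, and the scores used here (match +2, mismatch -1, gap -2) are arbitrary assumptions rather than BLASR's scoring scheme.

```python
def local_alignment_score(read: str, ref: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Smith-Waterman: best score of any local alignment between read and ref
    (linear gap penalty, score only, no traceback)."""
    prev = [0] * (len(ref) + 1)
    best = 0
    for i in range(1, len(read) + 1):
        curr = [0] * (len(ref) + 1)
        for j in range(1, len(ref) + 1):
            diag = prev[j - 1] + (match if read[i - 1] == ref[j - 1] else mismatch)
            # Clamping at 0 lets an alignment start anywhere: this is what
            # makes the alignment local rather than global.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(local_alignment_score("ACGT", "TTACGTTT"))  # 8: exact 4-base match
```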
10
The maximum number of matches of each read to the reference sequence that will be evaluated. maxHits should be greater than the expected number of repeats if you want to spread hits out on the genome.
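A minimal Python sketch of the maxHits cap, assuming candidate hits are ranked by alignment score and only the top maxHits are evaluated further. The `(contig, position, score)` tuples and the `top_hits` name are hypothetical.

```python
import heapq
from typing import List, Tuple

Hit = Tuple[str, int, int]  # hypothetical (contig, position, score)


def top_hits(hits: List[Hit], max_hits: int = 10) -> List[Hit]:
    """Keep at most max_hits candidate hits, highest score first
    (ties keep their original order)."""
    return heapq.nlargest(max_hits, hits, key=lambda h: h[2])


if __name__ == "__main__":
    # A read from a three-copy repeat plus one weak off-target hit.
    hits = [("contig1", 100, 50), ("contig1", 5000, 50),
            ("contig1", 9000, 50), ("contig1", 20000, 12)]
    print(top_hits(hits, max_hits=2))
```

With `max_hits=2`, one copy of the three-copy repeat is never evaluated, which is the situation the maxHits description warns about.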
30
The maximum allowed divergence (in %) of a read from the reference sequence.
12
The minimum size of the read (in base pairs) that must match against the reference.
True
Specify whether or not to output a BAM representation of the cmp.h5 file.
True
Specify whether or not to output a BED representation of the depth of coverage summary.
True
Specify that if BLASR maps a read to more than one location with equal probability, it randomly selects one of those locations as the best location. If not set, the first location in the list of matches is used.
--seed=1 --minAccuracy=0.75 --minLength=50 --concordant --algorithmOptions="-useQuality"
Specify additional Pbalign options. For advanced users only.
DeletionQV,IPD,InsertionQV,PulseWidth,QualityValue,MergeQV,SubstitutionQV,DeletionTag
bymetric
The default option of loadPulses is 'byread'. Option 'bymetric' is designed to trade higher memory use for increased speed, especially for jobs with a large number of reference contigs.
Polish a pure-PacBio assembly for maximum accuracy using the Quiver algorithm.
True
Specify whether or not to filter out reads with a Map QV below 10. This reduces coverage in repeat regions that are shorter than the read length.
settings.xml
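The random placement of equally good repeat hits and the Map QV filter can be sketched together in Python. This is a minimal illustration, assuming each hit carries an alignment score and a Map QV. The `Hit` type and the `choose_primary` and `passes_map_qv` functions are hypothetical; seeding Python's `random.Random` with 1 only echoes the `--seed=1` Pbalign option and does not reproduce Pbalign's random number generator. The sketch also treats a higher score as better, which is an assumption and may not match BLASR's own score convention.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    """Hypothetical alignment hit for one read."""
    contig: str
    position: int
    score: int   # assumption: higher is better in this sketch
    map_qv: int


def choose_primary(hits: List[Hit], random_ties: bool = True,
                   rng: Optional[random.Random] = None) -> Optional[Hit]:
    """Pick the best-scoring hit. Ties are broken at random when random_ties
    is set; otherwise the first tied hit in the list is used."""
    if not hits:
        return None
    best_score = max(h.score for h in hits)
    tied = [h for h in hits if h.score == best_score]
    if random_ties:
        if rng is None:
            rng = random.Random(1)  # fixed seed for reproducible choices
        return rng.choice(tied)
    return tied[0]


def passes_map_qv(hit: Hit, min_map_qv: int = 10) -> bool:
    """Keep a read only if its mapping quality reaches the threshold."""
    return hit.map_qv >= min_map_qv


if __name__ == "__main__":
    # A read in a two-copy repeat: both copies score equally, Map QV is low.
    hits = [Hit("contig1", 100, 80, 3), Hit("contig1", 7000, 80, 3),
            Hit("contig1", 12000, 40, 3)]
    primary = choose_primary(hits, rng=random.Random(1))
    print(primary, passes_map_qv(primary))
```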