Google Health recently released best (Bam Error Stats Tool), a command-line tool that uses accurate reference assemblies to characterize and quantify sequencing errors.
Similar methods for examining sequencing data quality already exist; best improves on them and introduces additional metrics for characterizing sequencing errors that can be applied to each sequencing technology.
best works in three steps: it iterates through reads aligned to an accurate reference assembly, counts the type and number of errors, and groups the values into different summary files.
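The counting step can be illustrated with a minimal Python sketch. This is not best's actual implementation: it assumes each alignment is given as an extended CIGAR string (where `=` marks matches and `X` marks mismatches), and the function names `count_errors` and `summarize` are hypothetical.

```python
import re
from collections import Counter

# Extended CIGAR operations: '=' match, 'X' mismatch, 'I' insertion, 'D' deletion.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def count_errors(cigar):
    """Count matched bases and error events in one extended CIGAR string."""
    counts = Counter()
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "=":
            counts["match"] += n
        elif op == "X":
            counts["mismatch"] += n
        elif op == "I":
            counts["insertion"] += 1  # one event, whatever its length
        elif op == "D":
            counts["deletion"] += 1   # one event, whatever its length
        elif op == "M":
            # 'M' covers both matches and mismatches, so errors cannot be counted.
            raise ValueError("expected '=' / 'X' operations, found 'M'")
    return counts

def summarize(cigars):
    """Sum the per-read counts over all aligned reads."""
    total = Counter()
    for cigar in cigars:
        total.update(count_errors(cigar))
    return total

# Two toy alignments (assumed data, for illustration only).
totals = summarize(["10=1X5=2I4=", "8=1D7=1X3="])
print(dict(totals))
```

In this sketch, insertions and deletions are counted as events rather than bases; counting affected bases instead would be an equally reasonable choice.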
The output can include counts of errors such as mismatches and indels, the distribution of indel lengths, and the sequencing yield at a given error threshold. best can also provide error distributions arranged by GC content, read length, quality score, and additional variables.
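Two of these summaries, the indel-length distribution and the yield at an error threshold, can be sketched as follows. The sketch makes its own assumptions and does not reproduce best's output format or exact definitions: a read's quality is taken to be the Phred-scaled error rate, -10 * log10(errors / bases), and the names `indel_length_histogram`, `empirical_q`, and `yield_at_threshold` are hypothetical.

```python
import math
from collections import Counter

def indel_length_histogram(indel_lengths):
    """Map each indel length to the number of times it was observed."""
    return Counter(indel_lengths)

def empirical_q(errors, bases):
    """Phred-scaled error rate of a read; infinite when no errors were seen."""
    if errors == 0:
        return math.inf
    return -10 * math.log10(errors / bases)

def yield_at_threshold(reads, min_q):
    """Total bases contributed by reads whose empirical Q is at least min_q."""
    return sum(
        r["bases"] for r in reads
        if empirical_q(r["errors"], r["bases"]) >= min_q
    )

# Toy per-read statistics (assumed data, for illustration only).
reads = [
    {"bases": 10000, "errors": 1},   # about Q40
    {"bases": 10000, "errors": 10},  # about Q30
    {"bases": 5000, "errors": 50},   # about Q20
]
hist = indel_length_histogram([1, 1, 2, 1, 3, 2])
yield_q25 = yield_at_threshold(reads, 25)
yield_q35 = yield_at_threshold(reads, 35)
print(dict(hist), yield_q25, yield_q35)
```

Raising the threshold keeps only the most accurate reads, so the yield can only stay the same or fall as the threshold increases.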
In addition, best is helpful for evaluating DeepConsensus and can be used to assess basecalling and other sequencing methods.
Further information about best can be found on GitHub, and the current preprint is also available.