Variant Analysis and Genome Assembly: Recommended Tools for Next-Level Sequencing Analysis

Published: 05-19-2023, 10:10 AM

Continuing from our previous article, we share variant analysis and genome assembly tools recommended by our experts Dr. Medhat Mahmoud, Postdoctoral Research Fellow at Baylor College of Medicine, and Dr. Ming "Tommy" Tang, Director of Computational Biology at Immunitas and author of From Cell Line to Command Line.

Variant detection and analysis tools
Mahmoud classifies variant detection work into two main groups: short variants (<50 base pairs), which include single nucleotide variants (SNVs) and insertions and deletions (indels); and longer variants (≥50 bp), known as structural variants (SVs). Similarly, he divides variant analysis tools into two categories: those tailored for short-read data and those designed to handle long-read data.
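The size-based split above can be sketched in a few lines of Python. This is a minimal illustration, assuming variants are given as VCF-style REF and ALT allele strings; the function name, the constant name, and the "other" label for equal-length multi-base substitutions are hypothetical choices made for this example and do not come from any of the tools discussed here.

```python
SV_MIN_LENGTH = 50  # bp threshold taken from the article's definition of an SV


def classify_variant(ref: str, alt: str) -> str:
    """Label a variant as 'SNV', 'indel', 'SV', or 'other' from its alleles.

    Assumes explicit (non-symbolic) alleles, as in a simple VCF record.
    """
    if len(ref) == len(alt):
        # A single-base substitution is an SNV; longer equal-length
        # substitutions fall outside the article's scheme.
        return "SNV" if len(ref) == 1 else "other"
    size = abs(len(ref) - len(alt))  # number of inserted or deleted bases
    return "SV" if size >= SV_MIN_LENGTH else "indel"
```

Real VCF files often encode SVs with symbolic alleles such as <DEL> plus an SVLEN field, which this sketch does not handle.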

One exception to this separation is PRINCESS, a comprehensive variant analysis tool that takes the reads, aligns them using one of several available aligners, and then calls both short and structural variants while also phasing them. PRINCESS can detect haplotype-resolved SNVs, SVs, and methylation events. Mahmoud is a developer of this tool, which provides a framework for both QC and long-read analysis.

Short variants with short reads
Our next recommendation comes from Tang, who suggests using GATK (Genome Analysis Toolkit) for variant analysis. This toolkit is an industry standard for variant discovery, and it provides a wide range of tools for different variant workflows. In addition, Tang explains that Illumina's analysis platform, DRAGEN (Dynamic Read Analysis for GENomics), is another great tool if one has access to it. The combination of these two resources forms DRAGEN-GATK, which can further streamline and improve the variant analysis process.

Mahmoud recommends two more resources for short variant work using short reads. The first is FreeBayes, a haplotype-based variant detector. It can detect variants in regions with low read coverage and is well-suited for large-scale sequencing projects. The other recommendation is samtools, one of the most widely used toolsets in sequencing analysis. Rather than a single tool, samtools is a collection of programs for processing DNA sequence alignment data, supporting operations such as format conversion, sorting, indexing, and filtering. Variant calling from these alignments is handled by its companion package, bcftools.

Short variants with long reads
Beginning with DeepVariant, Mahmoud suggests several tools that can be used with sequencing data generated from long-read instruments. DeepVariant works with both short- and long-read data, and it is a deep learning-based variant caller capable of detecting variants in complex regions. The next tool, Clair, is specifically used for calling variants with single-molecule sequencing data. It is a germline small variant caller that uses pileup data and deep neural networks. The creators of Clair have more recently released an updated version, Clair3, as well as Clair3-trio, a Nanopore-specific variant caller designed for trio variant calling.

Two other widely used variant callers for long reads are Longshot and Medaka. Longshot uses haplotype information from the long-read data to detect and phase SNVs in diploid genomes. Medaka, by contrast, is an ONT-specific tool designed for creating consensus sequences and variant calls. Users should note that the diploid variant calling workflow for Medaka has been deprecated, and Clair3 is recommended instead.

Structural variants with short reads
Parliament2 is a consensus SV framework that combines multiple top-performing methods to efficiently identify high-quality SVs from short-read DNA sequencing data at scale. Another popular tool, DELLY, is built for detecting various types of SVs, including deletions, tandem duplications, inversions, and translocations. It uses paired-end and split-read data to identify these structural variations.

LUMPY, another commonly used SV caller, combines paired-end and split-read signals and can also incorporate read-depth information, which improves its ability to identify SVs accurately. Finally, Manta is a versatile solution for SV detection that uses both paired-end and split-read data to detect a wide range of structural variants, such as deletions, insertions, inversions, and complex rearrangements.
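The idea of a consensus across several SV callers can be illustrated with a small Python sketch. It keeps a call only when at least two callers report an interval with sufficient reciprocal overlap. This is a generic illustration, not the merging procedure used by Parliament2 or any other tool named above; the function names, the caller labels, and the 50% overlap and two-caller thresholds are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# (chromosome, start, end) with end > start; one SV type is assumed throughout
Interval = Tuple[str, int, int]


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Return the smaller of the two overlap fractions, or 0.0 if disjoint."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def consensus_calls(calls_by_caller: Dict[str, List[Interval]],
                    min_callers: int = 2,
                    min_overlap: float = 0.5) -> List[Interval]:
    """Keep one representative per call supported by >= min_callers callers."""
    kept: List[Interval] = []
    for caller, calls in calls_by_caller.items():
        for call in calls:
            # Count this caller plus every other caller with a matching call.
            support = 1 + sum(
                any(reciprocal_overlap(call, other_call) >= min_overlap
                    for other_call in other_calls)
                for other, other_calls in calls_by_caller.items()
                if other != caller
            )
            # Skip calls that duplicate a representative already kept.
            duplicate = any(reciprocal_overlap(call, k) >= min_overlap
                            for k in kept)
            if support >= min_callers and not duplicate:
                kept.append(call)
    return kept
```

Requiring agreement between callers trades some sensitivity for precision, which is the general motivation behind consensus approaches.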

Structural variants with long reads
The first tool Mahmoud suggests for detecting structural variants from long-read data is Sniffles. Its newer version, Sniffles2, is a complete redesign with enhanced capabilities for germline SV calling. It also supports family- and population-scale SV calling and introduces new approaches for identifying mosaic SVs. In addition, cuteSV is a long-read-based approach that enables in-depth analysis of the complex signatures of structural variants inferred from read alignments. Originally developed for constructing the Syndip benchmark dataset, Dipcall is a reference-based variant-calling pipeline designed for a pair of phased haplotype assemblies. The last resource, pbsv, is a suite of tools for PacBio long-read sequencing data. These tools call and analyze SVs in diploid genomes and support both single-sample and joint (multi-sample) calling.

Genome assembly and analysis tools
Genome assembly requires different tools depending on the read lengths involved. True to their name, short-read assemblies are built from smaller DNA fragments, which are generally sequenced at high coverage but have a limited ability to resolve complex genomic regions. Conversely, long-read assemblies are built from longer DNA fragments, which resolve complex genomic regions more effectively but are typically sequenced at lower coverage.
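Coverage in this context is commonly estimated as the total number of sequenced bases divided by the genome size. The Python sketch below shows that arithmetic; the function and parameter names are hypothetical and chosen only for this example.

```python
def expected_coverage(num_reads: int,
                      mean_read_length: float,
                      genome_size: int) -> float:
    """Estimate average depth as (reads x mean read length) / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * mean_read_length / genome_size
```

For instance, 1,000 reads of 100 bp over a 10,000 bp genome give an expected coverage of 10x. This is an average; actual depth varies along the genome.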

Short-read assemblies
For short-read genome assemblies, Mahmoud recommends SPAdes, ABySS, Velvet, and SOAPdenovo2. SPAdes is known for its ability to handle diverse sequencing data types and produce high-quality assemblies. ABySS employs a de Bruijn graph approach and is particularly adept at handling large and complex genomes. Velvet stands out for its fast and memory-efficient performance, making it suitable for small to medium-sized genomes. Additionally, SOAPdenovo2 is specifically designed to handle large and complex genomes while aiming to minimize errors during the assembly process. Each of these assemblers offers valuable tools for researchers working with different genomic data types and sizes, catering to various assembly needs.
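To make the de Bruijn graph approach mentioned above more concrete, here is a toy Python sketch that builds such a graph from a set of reads. Each k-mer contributes an edge from its (k-1)-base prefix to its (k-1)-base suffix. This is only an illustration of the data structure under simplifying assumptions (error-free reads, a single strand, everything held in memory); it is not how ABySS or any other assembler listed here is implemented, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_de_bruijn_graph(reads: Iterable[str], k: int) -> Dict[str, List[str]]:
    """Map each (k-1)-mer to the sorted (k-1)-mers that follow it in a read."""
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of length k along the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # edge: prefix -> suffix
    return {node: sorted(successors) for node, successors in graph.items()}
```

An assembler would then walk paths through this graph to reconstruct contigs; production tools add steps such as error correction and repeat resolution that are omitted here.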

Long-read assemblies
Mahmoud recommends several influential tools for long-read assembly. Canu is a popular choice that can effectively handle various types of long-read data and produce high-quality assemblies. Shasta, along with its polishing algorithms MarginPolish and HELEN, is a de novo long-read assembler that offers reliable assembly solutions. Specifically designed for long-read data, Flye is a tool recognized for its ability to generate highly accurate assemblies. For metagenome assembly, metaFlye provides a scalable solution using repeat graphs. Lastly, wtdbg2 is a de novo assembler that employs a fuzzy Bruijn graph approach, making it well-suited for handling long-read data.

Attached is a PDF containing links to the websites, GitHub pages, and original publications for each resource. If you use a tool that wasn’t listed in this article, log in and tell us about the tool in the comments below! And don’t forget to read our final article on tool recommendations.

Attached Files

Sequencing Analysis Tools2.pdf (384.5 KB)