Very Short Read aligner

Rupinder

Junior Member

Join Date: May 2009

Posts: 1
#1


05-31-2009, 12:47 PM

Hi All,

I am in the process of building my own short read aligner for a lab working on cancer-genome sequencing, and I have the following questions:

1. Our reads range from 18 to 22 bp in length.

I am aware of several aligners currently available, and I was wondering whether there are any issues with using them on reads in this length range. In all the aligner papers I have read, the read lengths have been 30 bp and upwards.

It would be really helpful if other members who have worked with similar read lengths could advise me.

2. Also, are there any known species-specific statistical heuristics for short read alignment? To be more specific: for a non-mammalian sequence source, for example, I would like to use a different set of parameters than for a mammalian source.

3. As I mentioned, I am in the process of developing my own aligner, so any advice or suggestions from members who have embarked on the same would be very valuable. I am still in the brainstorming phase and trying to scope the range of problems this short read aligner would address. For now, the aligner should be able to:
a. align reads of 18-22 bp to a reference genome
b. perform ungapped alignment
c. apply a species-specific statistical scoring heuristic (if it makes sense to use one in the first place)
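
To make requirements (a) and (b) concrete, here is a minimal Python sketch of ungapped alignment using an exact-match k-mer index followed by a Hamming-distance check. It is only an illustration under stated assumptions, not how any existing aligner is implemented: the function names (`build_index`, `align_ungapped`), the seed length `k`, and the `max_mismatches` parameter are all hypothetical, only the forward strand is searched, and a plain dictionary index like this would not scale to a mammalian genome.

```python
from collections import defaultdict


def build_index(reference, k):
    """Map every k-mer in the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def align_ungapped(read, reference, index, k, max_mismatches=1):
    """Return sorted (start, mismatches) hits for an ungapped, forward-strand alignment.

    Assumption: k <= len(read) // (max_mismatches + 1), so the read holds
    max_mismatches + 1 non-overlapping seeds and at least one of them must
    match exactly whenever the read has at most max_mismatches mismatches.
    """
    if k > len(read) // (max_mismatches + 1):
        raise ValueError("seed length k is too long for this mismatch limit")
    hits = {}
    # Try non-overlapping seeds at offsets 0, k, 2k, ...
    for offset in range(0, len(read) - k + 1, k):
        seed = read[offset:offset + k]
        for pos in index.get(seed, ()):
            start = pos - offset  # implied start of the whole read
            if start < 0 or start + len(read) > len(reference) or start in hits:
                continue
            mismatches = hamming(read, reference[start:start + len(read)])
            if mismatches <= max_mismatches:
                hits[start] = mismatches
    return sorted(hits.items())
```

For a 20 bp read and one allowed mismatch, `k = 10` gives two non-overlapping seeds, so a read with a single mismatch is still found through whichever seed is unaffected.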

I am looking forward to any advice or suggestions, or perhaps even a general discussion on the process of developing a short read aligner from scratch.

thank you

regards
Rupinder
Tags: aligners, alignment tool, short read length
Torst

Senior Member

Join Date: Apr 2008

Posts: 275
#2

06-02-2009, 07:10 PM

Rupinder,

Originally posted by Rupinder

I am looking forward to any advice or suggestions, or perhaps even a general discussion on the process of developing a short read aligner from scratch.

My advice is not to write (yet another) short read aligner, unless you are doing it to "learn" or as part of a student project.

There are so many already: MAQ, SHRiMP, SOAP, Bowtie, ELAND, Novocraft, etc. They all do essentially the same thing, most of them are very efficient indeed, and you are unlikely to better them. I'm sure you could find appropriate parameters to suit your data, and 20-24 bp reads are not a problem. The reason most of the papers use 30+ bp reads is that most Illumina data is around that length, but SOLiD 3 data is often 24 bp and the aligners work well. SHRiMP and others have post-processing scripts to help correct for bias in sequences with non-uniform statistics.
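
As a rough sanity check on how read length affects uniqueness, the expected number of purely chance exact matches can be estimated with a back-of-the-envelope model. The sketch below assumes a random genome with all four bases equally likely and independent, which real genomes (repeats, skewed base composition) do not satisfy, so treat the numbers as an optimistic lower bound. The function name and the 3 Gb genome size are illustrative assumptions.

```python
def expected_chance_hits(read_length, genome_size, both_strands=True):
    """Expected exact chance matches of one read in a uniform random genome.

    Assumes independent, equiprobable bases: each of the
    (genome_size - read_length + 1) start positions matches with
    probability 4 ** -read_length, doubled when both strands are searched.
    """
    positions = genome_size - read_length + 1
    strands = 2 if both_strands else 1
    return strands * positions / 4 ** read_length


# Assumed genome size of roughly 3 Gb (human-sized), for illustration only.
for length in (18, 20, 22, 30):
    print(length, expected_chance_hits(length, 3_000_000_000))
```

Under this model an 18 bp read has on the order of 0.09 expected chance matches in a 3 Gb genome, and the figure drops by a factor of 16 for every 2 bp added. Allowing mismatches raises these numbers considerably, which is worth keeping in mind when choosing settings for 18-22 bp reads.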

I think you would be better off using the existing tools with appropriate settings, rather than writing your own, and getting on with the science further down the line.

(But if you have the time, and want to learn more about alignment, optimization, programming, etc., go for it! Especially if you want to write a GPU-enabled version.)

--Torsten