Seqanswers Leaderboard Ad

**NicoBxl** · 04-02-2012, 02:26 AM

Same question for BWA.

**Rocketknight** · 04-02-2012, 03:39 AM

Because a short sequence like 7 bases would map all over the place, it's very unlikely that any read aligner will handle it properly. The algorithms they use are mostly designed to handle sequences no shorter than the shortest reads that come from Illumina sequencers (32bp I think).

The good news is that since you're looking for a relatively small number of specific 7-base sequences without gaps or mismatches, a simple string search should do the job. A Python or Perl script could just read through the reference genome and print out every location where it finds one of the target strings (joining the wrapped lines of each FASTA record first, so that matches spanning a line break aren't missed). If you're not sure how to code one, let me know and I'll write one for you when I have a few spare minutes.
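A minimal sketch of that kind of script in Python follows. It assumes the reference is a plain FASTA file and that only the forward strand needs searching; both assumptions are mine, not from the thread, and the function names `read_fasta` and `find_exact` are illustrative. Each record's wrapped sequence lines are joined before searching, so a 7-mer split across a line break is still found, and positions are reported 0-based, with overlapping hits kept.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining each record's wrapped sequence lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())  # normalise soft-masked (lowercase) bases
    if name is not None:
        yield name, "".join(chunks)


def find_exact(seq: str, motifs: List[str]) -> List[Tuple[int, str]]:
    """Return sorted (0-based position, motif) pairs for every exact hit in seq."""
    hits = []
    for motif in motifs:
        motif = motif.upper()
        start = seq.find(motif)
        while start != -1:
            hits.append((start, motif))
            # Resume one base later so overlapping occurrences are also reported.
            start = seq.find(motif, start + 1)
    return sorted(hits)
```

To use it, open the genome file, pass the handle to `read_fasta`, and call `find_exact` on each sequence with your list of 7-mers. Holding a whole chromosome as one string keeps the code simple but costs memory on large genomes.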

**yuelics** · 04-04-2012, 04:30 AM

Hi Rocketknight,

Thanks a lot for your reply. I actually managed to get Bowtie working on the short 7-mer with a few additional options. The tricky thing about writing a script to do it is that the alignment does not need to be exact (i.e. two mismatches anywhere in the 7-mer are allowed).

**Rocketknight** · 04-04-2012, 05:57 AM

You're going to get a huge number of matches if you search a large genome with those parameters. By my back-of-the-envelope calculations, a 7 bp string with two allowed mismatches will hit by chance more than 0.1% of the time in a random genome with uniform base composition. In other words, for a 1 Gb genome you should be seeing over one million matches for each 7-mer on average. Does Bowtie really report all of those matches?

Edit: If it doesn't, all is not lost: it's definitely possible to write a string searcher that tolerates mismatches in Python (though I give no guarantees about running time). I'm willing to help if you're stuck; it sounds like an interesting problem.
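Here is one way such a mismatch-tolerant search could look in Python. It is a sketch under my own assumptions: substitutions only (no gaps), a query made of A/C/G/T, an uppercase sequence, the forward strand only, and 0-based positions; the helper names `neighborhood` and `approximate_hits` are hypothetical. Rather than comparing the query base by base against every window, it first enumerates every string within `max_mismatches` substitutions of the query (211 strings for a 7-mer with two mismatches) and then looks each window of the sequence up in that dictionary, so each query needs a single pass over the sequence. This is a simple brute-force approach, not the indexed search Bowtie uses.

```python
from itertools import combinations, product
from typing import Dict, List, Tuple

BASES = "ACGT"


def neighborhood(query: str, max_mismatches: int) -> Dict[str, int]:
    """Map every string within max_mismatches substitutions of query to its distance."""
    query = query.upper()
    result: Dict[str, int] = {}
    for d in range(max_mismatches + 1):
        for positions in combinations(range(len(query)), d):
            # At each chosen position, substitute one of the bases that differ from the query.
            choices = [[b for b in BASES if b != query[p]] for p in positions]
            for subs in product(*choices):
                variant = list(query)
                for p, b in zip(positions, subs):
                    variant[p] = b
                result["".join(variant)] = d
    return result


def approximate_hits(seq: str, query: str, max_mismatches: int = 2) -> List[Tuple[int, int]]:
    """Return (0-based position, mismatches) for each window of seq close enough to query.

    seq is assumed to be uppercase A/C/G/T (e.g. as produced by a FASTA reader).
    """
    targets = neighborhood(query, max_mismatches)
    k = len(query)
    hits = []
    for start in range(len(seq) - k + 1):
        d = targets.get(seq[start:start + k])
        if d is not None:
            hits.append((start, d))
    return hits
```

Precomputing the neighbourhood trades a little memory for speed: the dictionary stays tiny for short queries, but it grows quickly with longer queries or more allowed mismatches, at which point a per-window comparison or a proper indexed aligner makes more sense.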

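Extra edit: Whoops, I made a mistake in my calculations. Of the 4^7 = 16,384 possible 7-mers, 211 lie within two mismatches of any given query (1 exact match, 7 × 3 = 21 with one mismatch and 21 × 9 = 189 with two), so the random hit rate is about 1.3% per position. For the mouse genome (~2.7 Gb) you should expect to see around 35 million hits per 7-mer by chance on one strand, and roughly twice that counting both strands.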
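The chance hit rate can be checked by counting. Of the 4^k possible k-mers, exactly C(k, d) · 3^d differ from a fixed query at exactly d positions, so for k = 7 and up to two mismatches the total is 1 + 21 + 189 = 211 out of 16,384. The sketch below does that arithmetic, assuming independent, uniformly distributed bases; real genomes have skewed composition and repeats, so actual counts will differ. The function names are illustrative, and the 2.7 Gb mouse genome length is an approximation.

```python
from math import comb


def chance_match_probability(k: int, max_mismatches: int) -> float:
    """Probability that a random k-mer lies within max_mismatches substitutions of a fixed query.

    Assumes each base is independently A, C, G or T with probability 1/4.
    """
    # The number of k-mers at Hamming distance exactly d is C(k, d) * 3**d.
    neighbours = sum(comb(k, d) * 3 ** d for d in range(max_mismatches + 1))
    return neighbours / 4 ** k


def expected_chance_hits(genome_length: int, k: int, max_mismatches: int, strands: int = 1) -> float:
    """Expected number of chance hits over every k-mer window of the genome."""
    windows = genome_length - k + 1
    return strands * windows * chance_match_probability(k, max_mismatches)


if __name__ == "__main__":
    p = chance_match_probability(7, 2)
    print(f"per-position hit rate: {p:.4%}")
    # Mouse genome taken as roughly 2.7 Gb (an approximation).
    print(f"expected chance hits, one strand: {expected_chance_hits(2_700_000_000, 7, 2):,.0f}")
```

Run as a script, it reports a per-position rate of about 1.29% and roughly 35 million expected hits on one strand; passing `strands=2` also counts the reverse complement.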

**hanshart** · 03-22-2013, 02:15 AM

Originally posted by Rocketknight

... it's definitely possible to write a string-searcher with mismatching in Python (though I give no guarantees about running time). I'm willing to help if you're stuck, it sounds like an interesting problem.

It's possible to use fqgrep for approximate sequence searches.

minimal read length accepted by Bowtie
