SEQanswers

11-06-2009, 06:03 AM   #1
biterbilen
Junior Member
 
Location: Basel

Join Date: Jun 2009
Posts: 6
BWA concise format output - edit distance wrong

Hi Heng,

We ran BWA on simulated reads of length 40/80 nc, with at most 2 errors in the first 20 nc and at most 3 errors in total, using the following parameters:

$HOME/bin64/bwa-0.5.4/bwa aln -i 0 -n $(($n+1)) -o $(($n+1)) -e $(($n+1)) -N -t 8 -l 20 -k 2
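For readers unfamiliar with shell arithmetic, here is a minimal sketch of how the `$(($n+1))` expressions in the command above expand. The helper name `aln_options` is hypothetical, and only the flags shown in the command are used; the index and read-file arguments are left out because the post does not show them.

```python
def aln_options(n):
    """Return the bwa aln options from the post with $(($n+1)) expanded."""
    limit = str(n + 1)  # shell arithmetic: $(($n+1))
    return ["-i", "0", "-n", limit, "-o", limit, "-e", limit,
            "-N", "-t", "8", "-l", "20", "-k", "2"]


# Print the expanded option string for the two values of n tried in the post.
for n in (2, 3):
    print(f"n={n}: " + " ".join(aln_options(n)))
```

With n=3 the options become `-n 4 -o 4 -e 4`, which is the form quoted in the follow-up post later in the thread.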

First, we got no hits with $n=2, although the number of introduced errors is supposed to be <=3.
After running with $n=3, we got hits with a reported edit distance >=4; when we aligned the reads against the regions reported by BWA, these turned out to be hits with edit distance 3. (See the 2 examples below; I can provide more if necessary.)

Is this due to a bug in the software?


>chr10|101603578|101603617|-|seqTGA_7580|39ins-30ins-39ins 5 5
chr10 -101603576 4
chr10 -101603575 4
chr20 +52093242 4
chr10 -101603577 4
chr10 -101603578 4
>chr10|102009379|102009418|+|seqTGT_3501|39ins-6ins-14ins 2 2
chr10 +102009377 4
chr10 +102009376 4


chr10|101603578|101603617|-|seqTGA_7580|39ins-30ins-39ins (43 nc) 43..1 chr10 101603578..101603617
chr10
errors: 3 orientation: -
ANTNCATCGTCANTCATCATCTGCATCATCATCATCATCATCA
| | |||||||| ||||||||||||||||||||||||||||||
A-T-CATCGTCA-TCATCATCTGCATCATCATCATCATCATCA

chr10|102009379|102009418|+|seqTGT_3501|39ins-6ins-14ins (43 nc) 1..43 chr10 102009379..102009418
chr10
errors: 3 orientation: +
GCTAAANCAAGTGTNGTACCGGGTTGTGGGAACGCAACGATNA
|||||| ||||||| |||||||||||||||||||||||||| |
GCTAAA-CAAGTGT-GTACCGGGTTGTGGGAACGCAACGAT-A

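The "errors: 3" figures in the two alignments above can be checked independently. The following is a minimal sketch, not the tool used in the post: it computes a plain unit-cost Levenshtein distance between each read and the reference row with the gap characters removed. The function name `edit_distance` and the `PAIRS` dictionary are illustrative, and `N` is treated as an ordinary character that mismatches every base.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit-cost substitutions, insertions, deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # match or substitute
        prev = curr
    return prev[-1]


# Read row and gapped reference row, copied from the two alignments above.
PAIRS = {
    "seqTGA_7580": ("ANTNCATCGTCANTCATCATCTGCATCATCATCATCATCATCA",
                    "A-T-CATCGTCA-TCATCATCTGCATCATCATCATCATCATCA"),
    "seqTGT_3501": ("GCTAAANCAAGTGTNGTACCGGGTTGTGGGAACGCAACGATNA",
                    "GCTAAA-CAAGTGT-GTACCGGGTTGTGGGAACGCAACGAT-A"),
}

for name, (read, ref_row) in PAIRS.items():
    ref = ref_row.replace("-", "")  # drop alignment gaps to get the reference bases
    print(name, edit_distance(read, ref))
```

Both reads come out at distance 3 against the 40-base reference window, which agrees with the alignments shown. Note that this compares against the window from the alignment, not against the start coordinates BWA reported.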

Biter Bilen
PhD Student
Zavolan Lab
Biozentrum
Klingelbergstrasse 50
4056 Basel Switzerland
11-06-2009, 01:23 PM   #2
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693

Do you map to the human genome? Is it possible to give a small test set to me?

Also note that bwa disallows gaps within 5bp of either end of a read, so a gap in these regions is likely to show up as additional mismatches instead. This strategy helps partly with speed and partly with accuracy.
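To see which gaps in the two example alignments from the first post fall in that end region, here is a minimal sketch. The function name `gaps_near_ends`, the `margin` parameter, and the inclusive boundary are assumptions made for illustration; this is not BWA's actual implementation, and the original run set -i to 0.

```python
def gaps_near_ends(read_row, ref_row, margin=5):
    """Return 1-based read positions of gap columns within `margin` bases of a read end."""
    read_len = sum(c != "-" for c in read_row)
    hits = []
    pos = 0  # number of read bases consumed so far
    for r, g in zip(read_row, ref_row):
        if r != "-":
            pos += 1
        if r == "-" or g == "-":
            # Assumed boundary: the first and last `margin` read bases count as "near an end".
            if pos <= margin or pos > read_len - margin:
                hits.append(pos)
    return hits


# Rows copied from the alignments in the first post (read row, reference row).
print(gaps_near_ends("ANTNCATCGTCANTCATCATCTGCATCATCATCATCATCATCA",
                     "A-T-CATCGTCA-TCATCATCTGCATCATCATCATCATCATCA"))
print(gaps_near_ends("GCTAAANCAAGTGTNGTACCGGGTTGTGGGAACGCAACGATNA",
                     "GCTAAA-CAAGTGT-GTACCGGGTTGTGGGAACGCAACGAT-A"))
```

Under these assumptions the first read has two gap columns near one end (positions 2 and 4 of the row as displayed) and the second read has one (position 42 of 43). Because the check is symmetric, the orientation in which the row is displayed only mirrors the reported positions and does not change which gaps are flagged.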
11-06-2009, 02:55 PM   #3
biterbilen
Junior Member
 
Location: Basel

Join Date: Jun 2009
Posts: 6

Hi,

Yes, I map to the human genome.

I set -i to 0, so why should we expect the problem to be caused by gaps at the ends of the tags? I also wonder why the tool reports matches with edit distance >5: with -n set to 4, the maximum edit distance should be 5 (that is, 4+1 edit-distance hits for non-repetitive tags).

The attached archive contains a fastq file, the BWA output for it in the concise format produced by samse, and the minimum-edit-distance alignment file for the same fastq file. I used exactly this command:

$HOME/bin64/bwa-0.5.4/bwa aln -i 0 -n 4 -o 4 -e 4 -N -t 8 -l 20 -k 2

Attached Files
File Type: zip bwa_edit_distance_problem.zip (22.6 KB, 5 views)

Tags
bug, bwa, edit distance
