Go Back   SEQanswers > Bioinformatics > Bioinformatics

Similar Threads
Thread Thread Starter Forum Replies Last Post
BWA with multi read files thh32 Bioinformatics 6 02-26-2014 05:56 AM
BWA XT:A:U and XA for same read? SFAB Bioinformatics 1 11-19-2013 10:42 AM
Minimum Read Length for BWA persorrels Bioinformatics 6 10-05-2013 12:03 PM
Problem with BWA mapping second read pbluescript Bioinformatics 3 10-12-2011 08:52 AM
BWA Read Length AnamikaDarwin Bioinformatics 1 04-11-2009 12:47 AM

Thread Tools
Old 05-11-2015, 06:03 PM   #1
Location: Canada, Quebec

Join Date: Apr 2012
Posts: 53
Default BWA - read identity


I'm using BWA to map my short sequencing reads (Illumina MiSeq) on a reference sequence. Is this possible to set the minimum % of identity with BWA? For exemple, if I set 90%, then all reads under this threshold will be discarded. I know the parameter "-n" can do something like this but the manual is not really clear (for me). Is "-n" the minimal absolute mismatch in term of base pairs?

Thank you!

Antony03 is offline   Reply With Quote
Old 05-12-2015, 01:11 PM   #2
Brian Bushnell
Super Moderator
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Since nobody else has answered this, I will mention that I do not know of a way to generically do that in BWA. The manual says:
-n NUM Maximum edit distance if the value is INT, or the fraction of missing alignments given 2% uniform base error rate if FLOAT. In the latter case, the maximum edit distance is automatically chosen for different read lengths. [0.04]
...which is clear in the case of int but unclear in the case of float. If all your reads are the same length you can calculate the edit distance that would correspond to that identity, though: e.g. 150bp * (1-0.90) = 15.

Alternately, you can use BBMap, which has an "idfilter" flag - e.g. "idfilter=0.90", which will discard reads that don't map with at least 90% identity, regardless of their length. It also has an "idtag" flag, which will annotate all the alignments in the sam file with the percent identity, so you can manually filter them if you want.
Brian Bushnell is offline   Reply With Quote
Old 05-13-2015, 01:44 AM   #3
Senior Member
Location: Cambridge, UK

Join Date: May 2010
Posts: 311

[This is a hack. I'm putting it here more for curiosity than anything else]

On the aligned bam file you could capture the read length and the NM tag ("Edit distance to the reference, including ambiguous bases but excluding clipping") and print the alignment if L/NM < threshold. With samtools/awk:

samtools view -h aln.bam \
| awk -v maxpct=0.05 -v OFS='\t' -v FS='\t' '{
    if($0 ~ /^@/){
        print $0
    } else {
        xnm=gensub(/.*\tNM:i:/, "", "g", $0); 
        nm=gensub(/\t.*/, "", "g", xnm); 
        if(pct < maxpct) {
            print $0, "XP:f:"pct
}' | less
(The L/NM goes in tag XP)
dariober is offline   Reply With Quote

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

All times are GMT -8. The time now is 06:11 PM.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO