SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Odd bwa-mem alignment myronpeto Bioinformatics 16 03-12-2015 11:40 AM
bwa mem -R option dakin Bioinformatics 2 05-02-2013 07:42 AM
BWA conflicting flags ekageyama Bioinformatics 0 09-10-2012 03:47 AM
velveth_de running time and mem Ori Bioinformatics 10 11-03-2011 04:39 PM
Sam flags for bwa-aligned paired end reads with identical + / - strand coordinates spark Bioinformatics 0 03-09-2011 04:00 AM

Reply
 
Thread Tools
Old 05-20-2013, 03:06 PM   #1
mturchin20
Junior Member
 
Location: Chicago

Join Date: May 2013
Posts: 2
Default BWA-MEM and X0/X1/etc flags

Hey everyone,

I'm currently running bwa mem on some 454 single-end reads I have. My past experience with the bwa/samtools setup was with Illumina paired-end reads and using bwa aln/sampe (different projects). After looking at some of my first output for the bwa mem algorithm, I noticed there are no X0/X1 flags, which in the sampe output represent qualities such as number of best hits and number of suboptimal hits, respectively. Looking at the description of the bwa mem algorithm though on the bwa webpage (http://bio-bwa.sourceforge.net/bwa.shtml), it sounds like maybe the mem algorithm will only produce exact/unique matches, such that the X0/X1 flags are no longer needed (MEM -- maximal exact matches)? I'm interested in knowing this since I'd like to filter reads based on whether they have matches to multiple locations in the genome, and the X0/X1 tags were useful for doing this previously.

Has anyone played around with bwa mem yet and/or can comment on whether they still got X0/X1 flags when using it?

Thanks!

Last edited by mturchin20; 05-20-2013 at 03:18 PM.
mturchin20 is offline   Reply With Quote
Old 05-21-2013, 01:39 PM   #2
mturchin20
Junior Member
 
Location: Chicago

Join Date: May 2013
Posts: 2
Default

So I've looked into this a little further and thought I'd just put up what I'm currently thinking in case anyone has something else to contribute.

It looks like while the bwa samse/sampe produces X0/X1 tags, the bwa mem/bwa bwasw do not produce such tags. Instead (and now just specifically referring to bwa mem), it looks like the XP tag is used to identify both 1) long reads that have split alignments and 2) reads that have multiple alignments overall. In the former situation (of a long read being split), both reads will have a XP tag. Using the -M flag will cause the shorter read of the two to be flagged as "the alignment is not primary" in the bitwise FLAG field (column 2). Not including the -M flag will leave both reads as appearing 'primary' (which I'm guessing is what causes the issues with Picard/GATK that the documentation mentions?). In the latter situation (of a read having multiple hits overall), you can see the other sequence hits using the -a flag. In these situations, the 'primary' read does not contain a XP tag but the sub-primary reads do.

Given this situation, I still think it isn't straight forward to determine whether a read is 'uniquely mapped' or not. I've read about using the MAPQ value as a way to determine this, e.g. a MAPQ of 0 means the read is not uniquely mapped, but I feel like I see examples where the MAPQ is >0 and there are multiple alignments for that read. For example:

HXBHIOF01CJXWW 0 10 69647562 22 23M1D5M2I39M * 0 0 AACCTCAGGTGATCCGTCCGCCTGGCCTGCCCCAAAGTGCTGGGCTTACAGGCATGTGCCACCTTGCCC EE???:5332=CCC==<=>B@@><;5557444244;??CCC;;;;AED???EFECED>==?;:766854 NM:i:5 AS:i:46 XS:i:36
HXBHIOF01CJXWW 256 X 94956752 0 23M1D5M2I33M6S * 0 0 * * NM:i:5 AS:i:36 XP:Z:10,+69647562,23M1D5M2I39M,22,5;
HXBHIOF01CJXWW 272 19 16193041 0 39M2I5M1D23M * 0 0 * * NM:i:7 AS:i:36 XP:Z:10,+69647562,23M1D5M2I39M,22,5;
HXBHIOF01CJXWW 256 3 169612909 0 23M1D5M2I39M * 0 0 * * NM:i:8 AS:i:35 XP:Z:10,+69647562,23M1D5M2I39M,22,5;
HXBHIOF01CJXWW 256 12 122144914 0 22M1D6M2I26M13S * 0 0 * * NM:i:4 AS:i:34 XP:Z:10,+69647562,23M1D5M2I39M,22,5;

The first read has a MAPQ of 22 even though it maps to other regions in the genome (all with MAPQ of 0, admittedly). Additionally this is not a long read that has been split (excluding the -a flag only produces the first read).

So I guess my current thinking is that I should still use a MAPQ threshold (10 or 30?), but that this may not produce a full set of 'uniquely mapping' reads?
mturchin20 is offline   Reply With Quote
Old 07-05-2013, 05:24 PM   #3
rpique
Junior Member
 
Location: Detroit

Join Date: Jun 2013
Posts: 1
Default

Can you use the difference between the AS tag and the XS tag?
I can't see the XS tag in your output, it may come with the newest version.
rpique is offline   Reply With Quote
Old 08-08-2013, 09:11 PM   #4
Kennels
Senior Member
 
Location: Sydney

Join Date: Feb 2011
Posts: 149
Default

The "SA" tag is now used (v0.7.5) to identify reads which have chimeric alignment:
https://github.com/lh3/bwa/blob/master/NEWS
Kennels is offline   Reply With Quote
Old 04-16-2014, 11:57 AM   #5
tnp
Junior Member
 
Location: Denmark

Join Date: Nov 2008
Posts: 1
Default bam tags AS and XS

A comment to how uniquely mapped reads can be extracted. As far as I understand a read is uniquely mapped when AS and XS tags differ - pls correct me if im wrong. Actually I have seen XS values being bigger than AS values which I dont really understand - Cant find a thorough description of what XS orhter than its the score for a suboptimal alignment.

ex with XS bigger than AS:
HWUSI-EAS664L:24:64FGCAAXX:4:11:3337:6807 99 gi|407955691|dbj|AP012495.1| 1359455 0 101M = 1359704 351 CCAATGGCCGCTGCCAGTGGCGTTTTGTCGTGTCTTTCCGGGTTGGACTCAAGTTGATAGTTACCGGA
TAAGGCGCAGCGGTCGGGCTGAACGGGGGGTTC IIHGDIIIIGIIDIIIIEHIIIIGIIGHIIIIHHHIIIIIHIEIIIHIIIHGHIIIFGHIFIHHHIICIGHHHIIIBCGHGEGHGGDGHBEGGIGEFEAD? NM:i:8 MD:Z:3G4T13A1A0A10A16A0C46 AS:i:62 XS:i:68

So is this an example of a uniquely mapped read ?
tnp is offline   Reply With Quote
Old 07-15-2015, 06:54 AM   #6
trotos
Member
 
Location: Cologne,Germany

Join Date: May 2012
Posts: 12
Default

Quote:
Originally Posted by Kennels View Post
The "SA" tag is now used (v0.7.5) to identify reads which have chimeric alignment:
https://github.com/lh3/bwa/blob/master/NEWS
hello, I dig up this post because of your comment on the SA: tag
could you please elaborate, the provided link is broken.
the main question is:
are the reads with the SA: code still unique? what about the ones that also incorporate the XA: code?
Here are some examples
Code:
BIOMICS-HISEQHI:522:HCYM5ADXX:2:1102:3525:68438 256     chr2    161581931       0       32M17H  *       0       0       TATCCATTTCATCTGGATTTTCTAGTTTATTT        CAC<E?E<E????DDF@DBEB<D?9??D99D<        NM:i:0  MD:Z:32 AS:i:32 XS:i:20 SA:Z:chr12,87311999,-,32M17S,0,0;
BIOMICS-HISEQHI:522:HCYM5ADXX:2:1212:1519:12286 256     chr2    161581931       1       32M19H  *       0       0       TATCCATTTCATCTGGATTTTCTAGTTTATTT        .):639=>?>?8??<)7;7)=8=8)0>8>?.6        NM:i:0  MD:Z:32 AS:i:32 XS:i:27 SA:Z:chr12,87311997,-,34M17S,1,1;       XA:Z:chr12,+46991828,32M19S,1;
BIOMICS-HISEQHI:522:HCYM5ADXX:2:2115:10843:5627 256     chr2    161581931       1       32M19H  *       0       0       TATCCATTTCATCTGGATTTTCTAGTTTATTT        B?@AB@AABBBBABBB<ABBB?BAB?BB?BBB        NM:i:0  MD:Z:32 AS:i:32 XS:i:27 SA:Z:chr12,87311997,-,34M17S,1,1;       XA:Z:chr12,+46991828,32M19S,1;
BIOMICS-HISEQHI:522:HCYM5ADXX:2:2115:14837:88360        256     chr2    161581931       1       32M19H  *       0       0       TATCCATTTCATCTGGATTTTCTAGTTTATTT        B?BBBBBBBB=ABBBBBBB?A?ABB0=A?BBB        NM:i:0  MD:Z:32 AS:i:32 XS:i:27 SA:Z:chr12,87311997,-,34M17S,1,1;       XA:Z:chr12,+46991828,32M19S,1;
BIOMICS-HISEQHI:522:HCYM5ADXX:2:1209:6904:71888 0       chr11   22585338        0       18S33M  *       0       0       TTTTCCCATTCTGTAGATTGTCTGTTTACTCTGATTATAGTTTATTTTGCT     IGGIIIIIIIIGA@GGGHFIIIIIFFHEIHIHFGIHIIIIFEIGHEGHCHF     NM:i:0  MD:Z:33 AS:i:33 XS:i:33 SA:Z:chr17,59076122,+,33M18S,0,0;       XA:Z:chr3,+175713108,51M,4;chr2,-116989725,16S35M,1;chr19,+21698327,7S29M15S,0;
BIOMICS-HISEQHI:522:HCYM5ADXX:2:1212:1519:12286 16      chr12   87311997        1       34M17S  *       0       0       TCTATAAACACCTCTAAGAAAATAAACTAGAAAATCCAGATGAAATGGATA     ??>?9@?=.:().))?>.)6.?>8>0)8=8=)7;7)<??8?>?>=936:).     NM:i:1  MD:Z:0A33       AS:i:33 XS:i:31 SA:Z:chr2,161581931,+,32M19S,1,0;
       XA:Z:chr9,-104599379,51M,5;chr3,+170653467,51M,5;
BIOMICS-HISEQHI:522:HCYM5ADXX:2:2115:10843:5627 16      chr12   87311997        1       34M17S  *       0       0       TCTATAAACACCTCTAAGAAAATAAACTAGAAAATCCAGATGAAATGGATA     BBBBBBABBABABABBBBBBBB?BB?BAB?BBBA<BBBABBBBAA@BA@?B     NM:i:1  MD:Z:0A33       AS:i:33 XS:i:31 SA:Z:chr2,161581931,+,32M19S,1,0;
       XA:Z:chr9,-104599379,51M,5;chr3,+170653467,51M,5;
BIOMICS-HISEQHI:522:HCYM5ADXX:2:2115:14837:88360        16      chr12   87311997        1       34M17S  *       0       0       TCTATAAACACCTCTAAGAAAATAAACTAGAAAATCCAGATGAAATGGATA     BBBBBBABB>A<B>B>A?7BBB?A=0BBA?A?BBBBBBBA=BBBBBBBB?B     NM:i:1  MD:Z:0A33       AS:i:33 XS:i:31 SA:Z:chr2,161581931,+,32M19S,1,0;       XA:Z:chr9,-104599379,51M,5;chr3,+170653467,51M,5;
BIOMICS-HISEQHI:522:HCYM5ADXX:2:1102:3525:68438 16      chr12   87311999        0       32M17S  *       0       0       TATAAACACCTCTAAGAAAATAAACTAGAAAATCCAGATGAAATGGATA       AA==5.DB8BB8=/??*<D99D??9?D<BEBD@FDD????E<E?E<CAC       NM:i:0  MD:Z:32 AS:i:32 XS:i:30 SA:Z:chr2,161581931,+,32M17S,0,0;       XA:Z:chr9,-104599381,49M,4;chr3,+170653467,49M,4;chr12,+46991828,49M,5;
BIOMICS-HISEQHI:522:HCYM5ADXX:2:1209:6904:71888 256     chr17   59076122        0       33M18H  *       0       0       TTTTCCCATTCTGTAGATTGTCTGTTTACTCTG       IGGIIIIIIIIGA@GGGHFIIIIIFFHEIHIHF       NM:i:0  MD:Z:33 AS:i:33 XS:i:32 SA:Z:chr11,22585338,+,18S33M,0,0;       XA:Z:chr4,-151801895,19S32M,0;
trotos is offline   Reply With Quote
Reply

Tags
bwa mem x0 x1

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 09:08 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO